audit PRD: J answered 5 open questions — fold into §10, revise phase plan

Conversation 2026-05-03, J confirmed:
- Photos/video YES → BIPA in full force ($1k-$5k per violation)
- Langfuse self-hosted → drops GDPR Art. 44 cross-border concern
- EU not in scope now but placeholder needed → design EU-compatible
- Healthcare vertical YES → HIPAA BAA needed with model providers,
  PHI redaction at gateway boundary OR local-only routing for those
  requests, vertical-detection at boundary is Phase 2 requirement
- Training/RAG MAY re-run on outcomes → design as if it will,
  training-safe export interface needed, crypto-erasure becomes
  load-bearing evidence chain

§10 updated with answered/pending status per question. New §10.5
"Effect on phase plan" introduces:
- Phase 1.5 (NEW): BIPA photo/video schema audit + Langfuse boundary
  scoping + outcomes.jsonl content sample, BEFORE Phase 2 design
- Phase 2 design must now include: EU-placeholder fields, vertical
  detection, training-safe export, BIPA consent metadata
- Phase 9 rehearsal must cover discrimination + BIPA + healthcare PHI

3 questions still pending J's call before Phase 2 design ships:
identity service daemon vs in-process, JSON vs signed PDF for legal
export, audit endpoint auth model.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:16:27 -05:00 · 2026-05-03 01:16:27 -05:00 · 64bda21614
commit 64bda21614
parent 627a5f0c3d
1 changed file with 39 additions and 10 deletions
--- a/docs/AUDIT_TRAIL_PRD.md
+++ b/docs/AUDIT_TRAIL_PRD.md
@@ -211,19 +211,48 @@ The identity service is the new shared substrate — both runtimes call it; the

 ---

-## 10. Open questions blocking phase 1
+## 10. Open questions blocking Phase 2 (partially resolved 2026-05-03)

-These are the things I need J to decide before phase 1 can start, OR I need to investigate-and-propose:
+**Items 1-7 were the original Phase 1 open questions. J answered the five load-bearing questions (items 3 and 8-11) in conversation on 2026-05-03, and the answers are folded in below. Items 1, 4, and 6 are still pending and need J's call before Phase 2 design ships.**

-1. **Identity service: separate daemon vs in-process?** Recommend separate. Confirm.
-2. **Retention period N years?** Recommend 4. Need staffing client's legal call.
-3. **Photo / surname / zip-code policy?** These are inferred-attribute risks. Need policy decision.
-4. **JSON or signed PDF for legal export?** Different downstream costs.
-5. **Right-to-be-forgotten under append-only logs:** cryptographic erasure (proposed) or hard delete (breaks integrity)? Confirm crypto-erasure approach.
-6. **Audit endpoint auth model:** legal-only credential, or shared with admin? Recommend legal-only with separate token rotation.
-7. **The "indexed before search" concern:** matrix indexer + pathway memory currently fingerprint by code, not subject. Do we (a) make them subject-aware (more audit completeness, more PII surface area), (b) keep them code-only and assert in audit response that "no subject-specific compounding state was used," or (c) something else?
+1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
+2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
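+3. ✅ **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full to any biometric identifiers derived from them, such as face-geometry scans (the statute excludes photographs themselves). Statutory damages are $1k per negligent and $5k per intentional or reckless violation. A written release and a public retention schedule are mandatory. **This becomes a Phase 1.5 priority**; see §10.5/§13 for revised phase ordering.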
+4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
+5. ✅ **Right to be forgotten (RTBF) under append-only logs:** Cryptographic erasure approach approved in principle (J 2026-05-03, implicitly, through approval of the overall plan rather than an explicit answer).
+6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
+7. ✅ **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), the matrix indexer is a suspected PII sink (trace-body contents unverified). Action: sample `state.json` before choosing between (a) keeping it code-only and adding PII redaction on write to trace bodies, or (b) removing subject-summary text from trace bodies entirely. Decision deferred until the §8.1 sampling completes.

-Items 1-6 can be resolved by J's call. Item 7 needs design discussion — the safest answer for legal defense is (b), but it loses the "pathway learns about THIS candidate" signal that may be load-bearing for the staffing UX.
+### Newly answered 2026-05-03 (J)
+
+8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.
+
+9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.
+
+10. ✅ **Healthcare vertical / HIPAA:** **Same framework: yes.** Healthcare staffing IS in scope, and PHI realistically appears in `resume_text`, `communications`, and `call_log`. **Implications:**
+    - Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
+    - opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
+    - PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
+    - Vertical detection at the gateway boundary becomes a Phase 2 requirement
+
+11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
+    - `outcomes.jsonl` and `overseer_corrections.jsonl` cannot hold raw PII indefinitely. Anything that lands in a RAG re-index is hard to purge, and anything that lands in a training corpus becomes effectively impossible to delete (PII in model weights)
+    - Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
+    - Crypto-erasure of historical outcomes becomes load-bearing: if a candidate exercises RTBF after their data has already trained a model, we must be able to evidence that "the source was destroyed; whatever the model retains is indistinguishable from synthetic patterns"
+
+### Effect on the §8 phase plan
+
+The user-confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):
+
+- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
+- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
+- **Phase 3** (audit endpoint skeleton) — unchanged
+- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
+- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
+- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
+- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
+- **Phase 8** (legal export) — unchanged
+- **Phase 9** (rehearsal) — must include three scenarios: EEOC discrimination, BIPA biometric, and healthcare PHI breach

 ---