diff --git a/docs/AUDIT_TRAIL_PRD.md b/docs/AUDIT_TRAIL_PRD.md
index f91ecb6..2d3a20f 100644
--- a/docs/AUDIT_TRAIL_PRD.md
+++ b/docs/AUDIT_TRAIL_PRD.md
@@ -211,19 +211,48 @@ The identity service is the new shared substrate — both runtimes call it; the
 
 ---
 
-## 10. Open questions blocking phase 1
+## 10. Open questions blocking phase 2 (resolved 2026-05-03)
 
-These are the things I need J to decide before phase 1 can start, OR I need to investigate-and-propose:
+**These were the original Phase 1 open questions. J answered the load-bearing 5 in conversation 2026-05-03; answers folded in below. Remaining items 1, 4, 6 still need J's call before Phase 2 design ships.**
 
-1. **Identity service: separate daemon vs in-process?** Recommend separate. Confirm.
-2. **Retention period N years?** Recommend 4. Need staffing client's legal call.
-3. **Photo / surname / zip-code policy?** These are inferred-attribute risks. Need policy decision.
-4. **JSON or signed PDF for legal export?** Different downstream costs.
-5. **Right-to-be-forgotten under append-only logs:** cryptographic erasure (proposed) or hard delete (breaks integrity)? Confirm crypto-erasure approach.
-6. **Audit endpoint auth model:** legal-only credential, or shared with admin? Recommend legal-only with separate token rotation.
-7. **The "indexed before search" concern:** matrix indexer + pathway memory currently fingerprint by code, not subject. Do we (a) make them subject-aware (more audit completeness, more PII surface area), (b) keep them code-only and assert in audit response that "no subject-specific compounding state was used," or (c) something else?
+1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
+2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
+3. ✅ **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full. Per-violation $1k-$5k statutory damages, written consent + retention schedule mandatory. **This becomes a Phase 1.5 priority** — see §10.5/§13 for revised phase ordering.
+4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
+5. ✅ **RTBF under append-only:** Cryptographic erasure approach approved in principle (J 2026-05-03 implicit via approval of plan).
+6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
+7. ✅ **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), matrix-indexer is a suspected PII sink (trace bodies unverified). Action: sample state.json before deciding (a) keep code-only + add PII-redact-on-write to trace bodies, OR (b) remove subject-summary text from trace bodies entirely. Decision deferred until §8.1 sampling completes.
 
-Items 1-6 can be resolved by J's call. Item 7 needs design discussion — the safest answer for legal defense is (b), but it loses the "pathway learns about THIS candidate" signal that may be load-bearing for the staffing UX.
+### Newly answered 2026-05-03 (J)
+
+8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.
+
+9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.
+
+10. ✅ **Healthcare vertical / HIPAA:** **Same framework — yes.** Healthcare staffing IS in scope. PHI in `resume_text`, `communications`, and `call_log` is realistic. **Implications:**
+    - Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
+    - opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
+    - PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
+    - Vertical detection at the gateway boundary becomes a Phase 2 requirement
+
+11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
+    - `outcomes.jsonl` and `overseer_corrections.jsonl` cannot remain raw-PII forever — anything that lands in a training corpus or RAG re-index becomes effectively impossible to delete (PII in model weights)
+    - Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
+    - Crypto-erasure of historical outcomes becomes load-bearing — if a candidate exercises RTBF and their data already trained a model, we must be able to evidence "the source was destroyed; the model retains it indistinguishably from synthetic patterns"
+
+### Effect on the §8 phase plan
+
+The user-confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):
+
+- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
+- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
+- **Phase 3** (audit endpoint skeleton) — unchanged
+- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
+- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
+- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
+- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
+- **Phase 8** (legal export) — unchanged
+- **Phase 9** (rehearsal) — must include both EEOC discrimination scenario AND BIPA biometric scenario AND healthcare PHI breach scenario
 
 ---
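
Reviewer sketch (not part of the diff): item 10 and the Phase 4 entry both describe a healthcare-vertical routing decision at the gateway boundary. The rule as stated could look roughly like the following. This is a minimal illustration under assumptions, not the gateway's actual code: `route_request`, `RoutingDecision`, the `baa_signed` parameter, and the `ollama_local` label for on-box Ollama are all hypothetical names. The three external provider names come from the PRD text. Vertical detection itself and the PHI-redaction alternative are out of scope here; the vertical is passed in as an already-determined string.

```python
from dataclasses import dataclass

# External providers are the ones named in the PRD; "ollama_local" is an
# assumed label for the on-box Ollama runtime.
EXTERNAL_PROVIDERS = {"opencode", "ollama_cloud", "openrouter"}
LOCAL_PROVIDERS = {"ollama_local"}


@dataclass(frozen=True)
class RoutingDecision:
    provider: str  # provider the request is actually sent to
    reason: str    # human-readable justification, useful for the audit trail


def route_request(
    vertical: str,
    preferred: str,
    baa_signed: frozenset = frozenset(),
) -> RoutingDecision:
    """Healthcare requests never reach an external provider without a BAA."""
    if preferred not in EXTERNAL_PROVIDERS | LOCAL_PROVIDERS:
        raise ValueError(f"unknown provider: {preferred}")
    if vertical != "healthcare":
        return RoutingDecision(preferred, "non-healthcare: no PHI routing constraint")
    if preferred in LOCAL_PROVIDERS:
        return RoutingDecision(preferred, "healthcare: already on-box")
    if preferred in baa_signed:
        return RoutingDecision(preferred, "healthcare: external provider with BAA")
    # No BAA on file: fall back to the on-box model instead of leaving the box.
    return RoutingDecision("ollama_local", "healthcare: no BAA, forced on-box")
```

Returning a reason string alongside the provider is a deliberate choice in this sketch: the Phase 9 rehearsal would presumably need to show why a given healthcare request did or did not leave the box.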