audit PRD: J answered 5 open questions — fold into §10, revise phase plan

Conversation 2026-05-03 — J confirmed:
- Photos/video YES → BIPA in full force ($1k-$5k per violation)
- Langfuse self-hosted → drops GDPR Art. 44 cross-border concern
- EU not in scope now but placeholder needed → design EU-compatible
- Healthcare vertical YES → HIPAA BAA needed with model providers,
  PHI redaction at gateway boundary OR local-only routing for those
  requests, vertical-detection at boundary is Phase 2 requirement
- Training/RAG MAY re-run on outcomes → design as if it will, training-
  safe export interface needed, crypto-erasure becomes load-bearing
  evidence chain

§10 updated with answered/pending status per question. New §10.5
"Effect on phase plan" introduces:
- Phase 1.5 (NEW) — BIPA photo/video schema audit + Langfuse boundary
  scoping + outcomes.jsonl content sample, BEFORE Phase 2 design
- Phase 2 design must now include: EU-placeholder fields, vertical
  detection, training-safe export, BIPA consent metadata
- Phase 9 rehearsal must cover discrimination + BIPA + healthcare PHI

3 questions still pending J's call before Phase 2 design ships:
identity service daemon vs in-process, JSON vs signed PDF for legal
export, audit endpoint auth model.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-05-03 01:16:27 -05:00
parent 627a5f0c3d
commit 64bda21614


@@ -211,19 +211,48 @@ The identity service is the new shared substrate: both runtimes call it; the
---
## 10. Open questions blocking phase 1
## 10. Open questions blocking phase 2 (partially resolved 2026-05-03)
These are the things I need J to decide before phase 1 can start, OR I need to investigate-and-propose:
**These were the original Phase 1 open questions. J answered the load-bearing 5 in conversation 2026-05-03; answers folded in below. Remaining items 1, 4, 6 still need J's call before Phase 2 design ships.**
1. **Identity service: separate daemon vs in-process?** Recommend separate. Confirm.
2. **Retention period N years?** Recommend 4. Need staffing client's legal call.
3. **Photo / surname / zip-code policy?** These are inferred-attribute risks. Need policy decision.
4. **JSON or signed PDF for legal export?** Different downstream costs.
5. **Right-to-be-forgotten under append-only logs:** cryptographic erasure (proposed) or hard delete (breaks integrity)? Confirm crypto-erasure approach.
6. **Audit endpoint auth model:** legal-only credential, or shared with admin? Recommend legal-only with separate token rotation.
7. **The "indexed before search" concern:** matrix indexer + pathway memory currently fingerprint by code, not subject. Do we (a) make them subject-aware (more audit completeness, more PII surface area), (b) keep them code-only and assert in audit response that "no subject-specific compounding state was used," or (c) something else?
1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
3. **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full. Statutory damages are $1,000 per negligent violation and $5,000 per intentional or reckless violation; written consent and a retention schedule are mandatory. **This becomes a Phase 1.5 priority**; see §10.5/§13 for revised phase ordering.
4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
5. **RTBF under append-only:** Cryptographic erasure approach approved in principle (J 2026-05-03, implicit in approval of the plan).
6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
7. **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), matrix-indexer is a suspected PII sink (trace bodies unverified). Action: sample state.json before deciding whether to (a) keep it code-only and add PII-redact-on-write to trace bodies, OR (b) remove subject-summary text from trace bodies entirely. Decision deferred until §8.1 sampling completes.
Items 1-6 can be resolved by J's call. Item 7 needs design discussion — the safest answer for legal defense is (b), but it loses the "pathway learns about THIS candidate" signal that may be load-bearing for the staffing UX.
### Newly answered 2026-05-03 (J)
8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.
9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.
10. ✅ **Healthcare vertical / HIPAA:** **Same framework — yes.** Healthcare staffing IS in scope. PHI in `resume_text`, `communications`, and `call_log` is realistic. **Implications:**
- Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
- opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
- PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
- Vertical detection at the gateway boundary becomes a Phase 2 requirement
11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
- `outcomes.jsonl` and `overseer_corrections.jsonl` cannot remain raw-PII forever: anything that lands in a training corpus or RAG re-index becomes effectively impossible to delete (PII in model weights)
- Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
- Crypto-erasure of historical outcomes becomes load-bearing: if a candidate exercises RTBF and their data already trained a model, we must be able to produce evidence that the source was destroyed and that whatever the model retains is indistinguishable from synthetic patterns
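
A minimal sketch of what the "training-safe export" in item 11 might look like, offered as an illustration rather than the Phase 2 design. It assumes an allow-list approach (only fields known to be non-identifying survive) and a keyed HMAC pseudonym in place of the subject identifier. The field names `subject_id`, `outcome`, `role_code`, and `recorded_at`, and the function names, are hypothetical; the real `outcomes.jsonl` schema is not shown in this PRD and is only sampled in Phase 1.5.

```python
import hashlib
import hmac
import json

# Hypothetical allow-list: only these fields may reach a training/RAG path.
# Everything else (resume_text, communications, call_log, names, ...) is dropped.
SAFE_FIELDS = {"outcome", "role_code", "recorded_at"}


def pseudonymize(subject_id: str, key: bytes) -> str:
    """Return a keyed HMAC-SHA256 digest standing in for the subject ID."""
    return hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def training_safe_record(record: dict, key: bytes) -> dict:
    """Keep allow-listed fields only and attach a pseudonymous subject reference."""
    safe = {k: v for k, v in record.items() if k in SAFE_FIELDS}
    safe["subject_ref"] = pseudonymize(record["subject_id"], key)
    return safe


def export_lines(lines, key: bytes):
    """Yield training-safe JSON lines from raw outcomes.jsonl lines, skipping blanks."""
    for line in lines:
        if line.strip():
            record = json.loads(line)
            yield json.dumps(training_safe_record(record, key), sort_keys=True)
```

An allow-list is used here because a deny-list silently passes any new free-text field that happens to carry PII. Whether the HMAC key is global or held per subject by the identity service is an open design choice; this sketch does not address PII that has already reached model weights.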
### Effect on the §8 phase plan
The user-confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):
- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
- **Phase 3** (audit endpoint skeleton) — unchanged
- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
- **Phase 8** (legal export) — unchanged
- **Phase 9** (rehearsal) must include three scenarios: EEOC discrimination, BIPA biometric, and healthcare PHI breach
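
The healthcare-vertical routing decision that Phases 4 and 6 add at the gateway boundary could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the gateway's actual code: `GatewayRequest`, `choose_provider`, and the `ollama_local` provider name are hypothetical, and it takes the conservative reading that a healthcare request may leave the box only when the provider has a signed BAA and PHI has already been redacted. The PRD phrases these conditions as alternatives, so the exact rule still needs J's call.

```python
from dataclasses import dataclass

LOCAL_PROVIDER = "ollama_local"  # hypothetical name for the on-box Ollama route


@dataclass(frozen=True)
class GatewayRequest:
    vertical: str             # set by vertical detection, e.g. "healthcare" or "general"
    preferred_provider: str   # e.g. "opencode", "ollama_cloud", "openrouter"
    phi_redacted: bool = False


def choose_provider(req: GatewayRequest, baa_signed: dict[str, bool]) -> str:
    """Pick a model provider, forcing healthcare traffic on-box unless it is safe to leave."""
    if req.vertical != "healthcare":
        return req.preferred_provider
    if req.preferred_provider == LOCAL_PROVIDER:
        return LOCAL_PROVIDER
    # Conservative assumption: external routing needs BOTH a BAA and prior redaction.
    if baa_signed.get(req.preferred_provider, False) and req.phi_redacted:
        return req.preferred_provider
    return LOCAL_PROVIDER
```

Defaulting to the local route whenever a condition is unknown keeps a missing BAA entry or a skipped redaction step from silently sending PHI to an external provider.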
---