audit PRD: J answered 5 open questions — fold into §10, revise phase plan
Conversation 2026-05-03, J confirmed:
- Photos/video YES → BIPA in full force ($1k-$5k per violation)
- Langfuse self-hosted → drops GDPR Art. 44 cross-border concern
- EU not in scope now but placeholder needed → design EU-compatible
- Healthcare vertical YES → HIPAA BAA needed with model providers, PHI redaction at gateway boundary OR local-only routing for those requests, vertical-detection at boundary is Phase 2 requirement
- Training/RAG MAY re-run on outcomes → design as if it will, training-safe export interface needed, crypto-erasure becomes load-bearing evidence chain

§10 updated with answered/pending status per question. New §10.5 "Effect on phase plan" introduces:
- Phase 1.5 (NEW): BIPA photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample, BEFORE Phase 2 design
- Phase 2 design must now include: EU-placeholder fields, vertical detection, training-safe export, BIPA consent metadata
- Phase 9 rehearsal must cover discrimination + BIPA + healthcare PHI

3 questions still pending J's call before Phase 2 design ships: identity service daemon vs in-process, JSON vs signed PDF for legal export, audit endpoint auth model.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 627a5f0c3d
commit 64bda21614
@ -211,19 +211,48 @@ The identity service is the new shared substrate — both runtimes call it; the
---
## 10. Open questions blocking phase 1
## 10. Open questions blocking Phase 2 (partially resolved 2026-05-03)
These are the things I need J to decide before phase 1 can start, OR I need to investigate-and-propose:
**These were the original Phase 1 open questions. J answered the load-bearing 5 in conversation 2026-05-03; answers folded in below. Remaining items 1, 4, 6 still need J's call before Phase 2 design ships.**
1. **Identity service: separate daemon vs in-process?** Recommend separate. Confirm.
2. **Retention period N years?** Recommend 4. Need staffing client's legal call.
3. **Photo / surname / zip-code policy?** These are inferred-attribute risks. Need policy decision.
4. **JSON or signed PDF for legal export?** Different downstream costs.
5. **Right-to-be-forgotten under append-only logs:** cryptographic erasure (proposed) or hard delete (breaks integrity)? Confirm crypto-erasure approach.
6. **Audit endpoint auth model:** legal-only credential, or shared with admin? Recommend legal-only with separate token rotation.
7. **The "indexed before search" concern:** matrix indexer + pathway memory currently fingerprint by code, not subject. Do we (a) make them subject-aware (more audit completeness, more PII surface area), (b) keep them code-only and assert in audit response that "no subject-specific compounding state was used," or (c) something else?
1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
3. ✅ **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full: statutory damages of $1,000 per negligent violation and $5,000 per intentional or reckless violation, with written consent and a retention schedule mandatory. **This becomes a Phase 1.5 priority**; see §10.5/§13 for revised phase ordering.
4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
5. ✅ **RTBF under append-only:** Cryptographic erasure approach approved in principle (J 2026-05-03, implicit via approval of the plan).
6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
7. ✅ **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), matrix-indexer is a suspected PII sink (trace bodies unverified). Action: sample `state.json` before deciding whether to (a) keep it code-only and add PII-redact-on-write to trace bodies, OR (b) remove subject-summary text from trace bodies entirely. Decision deferred until §8.1 sampling completes.
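To make the crypto-erasure approach in item 5 concrete, here is a minimal sketch of how erasure can coexist with an append-only, hash-chained log. Everything in it is an assumption for illustration: the `AuditLog` class, the in-memory key store, and the entry layout are hypothetical, and the SHA-256 counter-mode keystream is a standard-library stand-in for a real AEAD cipher such as AES-GCM.

```python
import hashlib
import json
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT production crypto;
    a stand-in for a real AEAD so the sketch runs on the stdlib alone."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def _entry_hash(prev_hash: str, subject_id: str, nonce_hex: str, ct_hex: str) -> str:
    body = f"{prev_hash}|{subject_id}|{nonce_hex}|{ct_hex}"
    return hashlib.sha256(body.encode()).hexdigest()


class AuditLog:
    """Hypothetical append-only log: one encryption key per subject."""

    def __init__(self) -> None:
        self.entries: list[dict] = []      # entries are never edited or removed
        self.keys: dict[str, bytes] = {}   # assumed key store, kept outside the log
        self.erased: set[str] = set()

    def append(self, subject_id: str, payload: dict) -> None:
        if subject_id in self.erased:
            raise ValueError("subject was erased; refusing to create a new key")
        key = self.keys.setdefault(subject_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        plaintext = json.dumps(payload, sort_keys=True).encode()
        ct_hex = _keystream_xor(key, nonce, plaintext).hex()
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        self.entries.append({
            "subject_id": subject_id,      # assumed to be a pseudonymous ID
            "nonce": nonce.hex(),
            "ciphertext": ct_hex,
            "prev_hash": prev_hash,
            "hash": _entry_hash(prev_hash, subject_id, nonce.hex(), ct_hex),
        })

    def erase_subject(self, subject_id: str) -> None:
        """RTBF: destroy the key; the log itself is untouched."""
        self.keys.pop(subject_id, None)
        self.erased.add(subject_id)

    def read(self, index: int):
        entry = self.entries[index]
        key = self.keys.get(entry["subject_id"])
        if key is None:
            return None                    # key destroyed, payload unrecoverable
        raw = _keystream_xor(key, bytes.fromhex(entry["nonce"]),
                             bytes.fromhex(entry["ciphertext"]))
        return json.loads(raw)

    def verify_chain(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            expected = _entry_hash(prev, e["subject_id"], e["nonce"], e["ciphertext"])
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

The design point is that the chain hashes ciphertext, not plaintext, so destroying a subject's key makes their payloads unreadable while `verify_chain()` still passes. A hard delete would instead break every hash after the removed entry.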
Items 1-6 can be resolved by J's call. Item 7 needs design discussion — the safest answer for legal defense is (b), but it loses the "pathway learns about THIS candidate" signal that may be load-bearing for the staffing UX.
### Newly answered 2026-05-03 (J)
8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.
9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.
10. ✅ **Healthcare vertical / HIPAA:** **Same framework — yes.** Healthcare staffing IS in scope. PHI in `resume_text`, `communications`, and `call_log` is realistic. **Implications:**
- Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
- opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
- PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
- Vertical detection at the gateway boundary becomes a Phase 2 requirement
11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
- `outcomes.jsonl` and `overseer_corrections.jsonl` cannot remain raw-PII forever: anything that lands in a training corpus or RAG re-index becomes effectively impossible to delete (PII in model weights)
- Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
- Crypto-erasure of historical outcomes becomes load-bearing: if a candidate exercises RTBF and their data has already trained a model, we must be able to show evidence that "the source was destroyed; whatever the model retains is indistinguishable from synthetic patterns"
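The "training-safe export" requirement in item 11 could take roughly the following shape. This is a sketch under stated assumptions, not the Phase 2 design: the field names in `SAFE_FIELDS`, the `subject_id` key, and the helper names are all hypothetical, since the real `outcomes.jsonl` schema has not been sampled yet. It uses an allowlist (drop everything not known to be safe) rather than a blocklist, and replaces the subject identifier with a keyed HMAC pseudonym.

```python
import hashlib
import hmac
import json

# Hypothetical allowlist: only fields believed to be free of PII survive export.
SAFE_FIELDS = {"outcome", "role_category", "model", "latency_ms"}


def pseudonymize(subject_id: str, export_key: bytes) -> str:
    """Keyed pseudonym; unlinkable to the subject once export_key is destroyed."""
    return hmac.new(export_key, subject_id.encode(), hashlib.sha256).hexdigest()[:16]


def training_safe_record(record: dict, export_key: bytes) -> dict:
    """Keep allowlisted fields only; free text such as resume_text is dropped."""
    safe = {k: v for k, v in record.items() if k in SAFE_FIELDS}
    if "subject_id" in record:
        safe["subject_ref"] = pseudonymize(record["subject_id"], export_key)
    return safe


def export_lines(jsonl_lines, export_key: bytes):
    """Yield training-safe JSON lines from raw outcome lines, skipping blanks."""
    for line in jsonl_lines:
        line = line.strip()
        if line:
            record = training_safe_record(json.loads(line), export_key)
            yield json.dumps(record, sort_keys=True)
```

An allowlist fails closed: a new field added upstream stays out of the training corpus until someone reviews it. Note that a keyed pseudonym is still pseudonymous rather than anonymous data while the key exists, so the export key would need the same erasure handling as the per-subject keys.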
### Effect on the §8 phase plan
The user-confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):
- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
- **Phase 3** (audit endpoint skeleton) — unchanged
- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
- **Phase 8** (legal export) — unchanged
- **Phase 9** (rehearsal): must include all three of an EEOC discrimination scenario, a BIPA biometric scenario, and a healthcare PHI breach scenario
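The healthcare-vertical routing decision that Phases 4 and 6 add at the gateway boundary might be sketched as below. This is one conservative reading of the HIPAA implications in item 10, not a settled design: it assumes a healthcare request may leave the box only when the provider has a BAA AND PHI redaction is available, and otherwise stays local. The `Provider` and `Route` types, the `has_baa` flag, and the `vertical` string are hypothetical; how the vertical is actually detected is left open here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL_ONLY = "local_only"                # model call stays on-box
    EXTERNAL_REDACTED = "external_redacted"  # PHI redacted, then sent out
    EXTERNAL = "external"                    # sent out as-is


@dataclass(frozen=True)
class Provider:
    name: str
    external: bool
    has_baa: bool = False  # hypothetical flag: a signed BAA is on file


def route_request(vertical: str, provider: Provider, redactor_available: bool) -> Route:
    """Decide where a model call may go, given the detected vertical."""
    if not provider.external:
        return Route.LOCAL_ONLY
    if vertical != "healthcare":
        return Route.EXTERNAL
    # Healthcare + external provider: require BOTH a BAA and redaction
    # (assumption); anything less falls back to the on-box model.
    if provider.has_baa and redactor_available:
        return Route.EXTERNAL_REDACTED
    return Route.LOCAL_ONLY
```

For example, with no BAA in place, `route_request("healthcare", Provider("openrouter", external=True), True)` returns `Route.LOCAL_ONLY`, so a healthcare request falls back to the on-box model instead of reaching an external provider.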
---