lakehouse/docs/AUDIT_TRAIL_PRD.md

# PRD: Production-Ready Audit Trail

**Status:** Draft — 2026-05-03 · **Owner:** J · **Drafted by:** working session 2026-05-03

> **Why this document exists.** The staffing client won't sign until we can prove the AI system can defend a discrimination claim. We've been claiming "production-ready" off smoke + parity tests; those prove the surface compiles, NOT that an audit response can be produced for a specific person. This PRD writes the audit-trail capability down before we start building it, so the phases are accountable and the scope doesn't drift mid-implementation.

---

## 1. The legal use case (worked example)

**Scenario.** John Martinez worked at Warehouse B as a placed candidate. Six months later he files a complaint claiming discrimination during the hiring process. His lawyer requests an audit under EEOC discovery: produce every AI-system decision affecting John between dates D1 and D2.

**What we must produce.** A response that proves either:
- (a) John was treated identically to other candidates with comparable qualifications — same scoring criteria, same model invocations, same decision rules — and the outcome differences are explained by non-protected factors, OR
- (b) The system surfaces *exactly* what factors led to outcomes, in a form a court can verify, so the claim can be defended on documented criteria rather than "trust the AI."

**What we must NOT produce.**
- Other subjects' data (response leaks if even one other candidate's name appears)
- Internal infrastructure details (DB paths, server names, internal IDs that aren't candidate-shaped)
- Raw model prompts/completions that contain protected attributes (race, gender, age, etc.) — even if the model didn't *use* them, their presence in the audit log creates new evidence

**The defensibility chain.** The audit shows:
1. **Indexing-time decisions** — when John was added to the candidate pool, what embedding the model produced, what features were extracted, what categories he was placed into
2. **Search-time decisions** — every query that included him in candidate sets, what rank he received, what the model used to compute that rank
3. **Recommendation-time decisions** — every fill/recommendation event involving him, what scoring drove it, what validators ran, what they returned
4. **Iteration decisions** — any iterate retries that touched him (validator failures, model self-corrections)
5. **Outcome decisions** — final fills, rejections, hand-offs

For each, the audit row must show: timestamp, decision type, model + provider, input features (sanitized of protected attributes — see §4), output decision, rationale.

---

## 2. The subject audit response — output format

```
GET /audit/subject/{candidate_id}?from=D1&to=D2
→ JSON or signed PDF (legal preference TBD)
```

**Header section:**
- subject identifier (candidate_id), date range, response generation timestamp, signing daemon, integrity hash
- pre-translation note: candidate_id ↔ PII mapping is held by the identity service (§5), NOT by this audit endpoint. Legal counsel re-correlates separately under their own access controls.

**Per-decision row schema** (shape, not exhaustive):
```json
{
  "ts": "ISO-8601 UTC",
  "decision_kind": "embedding_create | search_inclusion | search_rank | fill_recommendation | validation_outcome | iterate_attempt | observer_signal",
  "daemon": "gateway | validatord | observerd | matrixd | ingestd",
  "model": "kimi-k2.6 | deepseek-v3.2 | ...",
  "provider": "ollama_cloud | opencode | openrouter",
  "input_features": { /* what the model SAW — sanitized per §4 */ },
  "output": { /* what the model decided */ },
  "rationale": "model's natural-language explanation, or rule-based justification",
  "trace_id": "X-Lakehouse-Trace-Id linking to Langfuse trace tree",
  "session_id": "iterate session that produced this row"
}
```
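
A minimal sketch of how a row in this shape might be validated before it is admitted to a response, assuming Python tooling. The `AuditRow` class name is hypothetical; the enumerated values are copied from the schema above, and `model` is left unchecked because the schema lists it as open-ended.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Enumerated values copied from the schema above; `model` is open-ended there.
DECISION_KINDS = frozenset({
    "embedding_create", "search_inclusion", "search_rank",
    "fill_recommendation", "validation_outcome", "iterate_attempt",
    "observer_signal",
})
DAEMONS = frozenset({"gateway", "validatord", "observerd", "matrixd", "ingestd"})
PROVIDERS = frozenset({"ollama_cloud", "opencode", "openrouter"})


@dataclass(frozen=True)
class AuditRow:  # hypothetical name, not an existing type in the codebase
    ts: str
    decision_kind: str
    daemon: str
    model: str
    provider: str
    input_features: dict
    output: dict
    rationale: str
    trace_id: str
    session_id: str

    def __post_init__(self):
        # fromisoformat raises ValueError on malformed input. Use an explicit
        # "+00:00" offset; a trailing "Z" is only parsed on Python 3.11+.
        offset = datetime.fromisoformat(self.ts).utcoffset()
        if offset != timedelta(0):  # also rejects naive timestamps (offset None)
            raise ValueError(f"ts must be ISO-8601 in UTC, got {self.ts!r}")
        for name, allowed in (("decision_kind", DECISION_KINDS),
                              ("daemon", DAEMONS),
                              ("provider", PROVIDERS)):
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not in the schema")
```

Rejecting unknown enum values at construction time means a typo in a daemon's instrumentation surfaces as an error rather than as a silently unqueryable row.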

**Footer section:**
- Coverage attestation: "this response includes ALL decisions about candidate_id between D1 and D2 that are retained per §6 retention policy"
- Sign-off: cryptographic signature from a daemon whose key is in escrow (proves audit was generated by the system, not hand-edited)
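
The header, row, and footer sections could be assembled roughly as follows. This is a sketch under stated assumptions: stored rows carry a `candidate_id` field (the per-decision schema above does not show one), the function and field names are hypothetical, and HMAC-SHA256 stands in for the escrowed-key signature. HMAC is symmetric, so a real deployment would more likely use an asymmetric signature so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json
from datetime import datetime


def canonical_bytes(obj) -> bytes:
    """Serialize deterministically so identical content always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_response(candidate_id, date_from, date_to, rows,
                   generated_at, signing_key: bytes, signing_daemon: str):
    lo, hi = datetime.fromisoformat(date_from), datetime.fromisoformat(date_to)
    # Keep only this subject's rows inside the inclusive window, oldest first.
    selected = sorted(
        (r for r in rows
         if r["candidate_id"] == candidate_id
         and lo <= datetime.fromisoformat(r["ts"]) <= hi),
        key=lambda r: datetime.fromisoformat(r["ts"]),
    )
    header = {
        "candidate_id": candidate_id,
        "from": date_from,
        "to": date_to,
        "generated_at": generated_at,
        "signing_daemon": signing_daemon,
        "integrity_hash": hashlib.sha256(canonical_bytes(selected)).hexdigest(),
    }
    # Signing the header covers the rows too, because it embeds their hash.
    signature = hmac.new(signing_key, canonical_bytes(header), hashlib.sha256).hexdigest()
    footer = {
        "coverage": f"all retained decisions about {candidate_id} "
                    f"between {date_from} and {date_to}",
        "signature": signature,
    }
    return {"header": header, "rows": selected, "footer": footer}


def verify_response(response, signing_key: bytes) -> bool:
    rows_hash = hashlib.sha256(canonical_bytes(response["rows"])).hexdigest()
    expected_sig = hmac.new(signing_key, canonical_bytes(response["header"]),
                            hashlib.sha256).hexdigest()
    return (hmac.compare_digest(rows_hash, response["header"]["integrity_hash"])
            and hmac.compare_digest(expected_sig, response["footer"]["signature"]))
```

Any hand edit to a row changes the recomputed hash, and any edit to the header changes the recomputed signature, so `verify_response` fails in both cases.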

---

## 3. Surface map — where decisions happen

| Decision happens at | Currently logged where | Audit-completeness gap |
|---|---|---|
| Ingestion (candidate added to pool) | `data/_kb/outcomes.jsonl`? `journald` mutation log? | UNKNOWN — needs walk |
| Embedding creation (vector built for candidate) | NOWHERE per-candidate; embed cache hits aren't subject-tagged | **MAJOR GAP** — need to subject-tag every embedding |
| Search inclusion (candidate appeared in a result set) | Pathway memory + session JSONL (?) | Partial — need subject-correlation |
| Search rank (position in result set) | Result set in chat traces, but not indexed by candidate | Partial |
| Fill recommendation | `data/datasets/fill_events.parquet` (per CLAUDE.md decision A) + pathway memory | Probably OK but not verified |
| Validation outcome (FillValidator/EmailValidator pass/fail) | `/v1/iterate` session JSONL — but `validation_kind` not populated per yesterday's misread | Partial — fix today |
| Iterate retry escalations | Session JSONL `attempts[]` array | OK |
| Observer signals | observerd events at :3800 (or :4219 Go side) | UNKNOWN — needs walk |
| Matrix-indexer compounding (semantic flags, bug fingerprints) | `pathway_memory/state.json` (currently 91 traces) | Probably leaks — these are tagged by file/task, not by subject |

**Substantive finding from this walk:** the matrix indexer + pathway memory are tagged by *code* not by *subject*. They surface "this code path failed for this task class" — they don't currently let us answer "every decision matrix-indexer made about John." If matrix-indexer fingerprints leak protected-attribute correlations (e.g., a fingerprint that says "candidates from [zip code with majority demographic X] got outcome Y"), that's a discrimination smoking gun that we currently have no way to audit cleanly.
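
One way the Phase 1 walk could quantify these gaps is to count, per decision kind, how many logged rows carry a subject tag at all. The sketch below assumes each substrate can be read as JSONL and that rows use `candidate_id` and `decision_kind` keys; both key names are assumptions until the walk confirms what the logs actually contain.

```python
import json
from collections import Counter


def subject_tag_coverage(jsonl_lines):
    """Count rows with and without a candidate_id, grouped by decision kind."""
    tagged, untagged = Counter(), Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in append-only logs
        row = json.loads(line)
        kind = row.get("decision_kind", "unknown")
        if row.get("candidate_id"):
            tagged[kind] += 1
        else:
            untagged[kind] += 1
    return {
        kind: {"tagged": tagged[kind], "untagged": untagged[kind]}
        for kind in sorted(set(tagged) | set(untagged))
    }
```

A substrate whose rows all land in `untagged` cannot contribute to a subject audit without new instrumentation.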

---

## 4. PII handling rules

**Tokenization rule:** candidate_id is the only identifier that crosses runtime boundaries (logs, JSONL, traces, pathway memory, observer events, model prompts). Email / name / address / phone / SSN / DOB are NEVER in any of these surfaces.

**Identity service** (§5) holds the candidate_id ↔ PII mapping. Only legal-authorized access reads it.

**Protected-attribute exclusion at decision time:** the model NEVER receives:
- Race, ethnicity, national origin
- Sex, gender, marital status, pregnancy
- Age, date of birth (allowed: years of experience)
- Religion
- Disability, genetic information
- Veteran status (unless legally relevant for the role)
- Sexual orientation, gender identity

If the model never *sees* these, no decision can be predicated on them. The audit row's `input_features` field proves this: by inspecting the row, a lawyer can confirm protected attributes were absent from input.
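
A minimal sketch of such a filter, assuming features arrive as a flat dict. The key names in `PROTECTED_KEYS` are hypothetical spellings of the list above, and the `allow` parameter models the "unless legally relevant for the role" exception. Matching on key names only catches declared attributes; it does nothing about proxy features.

```python
# Hypothetical key spellings for the protected attributes listed above.
PROTECTED_KEYS = frozenset({
    "race", "ethnicity", "national_origin",
    "sex", "gender", "marital_status", "pregnancy",
    "age", "date_of_birth", "dob",
    "religion",
    "disability", "genetic_information",
    "veteran_status",
    "sexual_orientation", "gender_identity",
})


def strip_protected(features: dict, allow=frozenset()):
    """Return (sanitized_features, stripped_keys) for one model invocation.

    `allow` lists protected keys that are legally relevant for this role.
    Only top-level keys are inspected; nested structures are an open gap.
    """
    sanitized, stripped = {}, []
    for key, value in features.items():
        normalized = key.lower()
        if normalized in PROTECTED_KEYS and normalized not in allow:
            stripped.append(key)
        else:
            sanitized[key] = value
    return sanitized, sorted(stripped)
```

Recording `stripped_keys` in the audit row, alongside the sanitized `input_features`, would show both what the model saw and what was withheld from it.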

**Inferred-attribute risk.** A model can infer protected attributes from non-protected proxies (zip code → race, name → ethnicity, photo → multiple). The audit must surface this risk. **Open question:** do we ban photo features from candidate scoring? Do we ban surname tokenization? These are policy calls.

**Audit response sanitization:** the response goes to the candidate's lawyer, not to the world. It contains the candidate's own name (re-correlated by legal). It must NOT contain other candidates' names, even in comparison/ranking rows.
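
For comparison and ranking rows, one option is to replace every other candidate's identifier with a per-response alias so relative position stays visible without identifying anyone else. The sketch assumes rank rows carry an `output.ranked_ids` list, which is a hypothetical field name, and it goes slightly further than the rule above by aliasing other candidates' IDs as well as their names.

```python
def pseudonymize_others(rows, subject_id):
    """Replace every non-subject candidate ID with a stable per-response alias."""
    aliases = {}

    def alias(candidate_id):
        if candidate_id == subject_id:
            return candidate_id
        if candidate_id not in aliases:
            aliases[candidate_id] = f"other-{len(aliases) + 1}"
        return aliases[candidate_id]

    cleaned = []
    for row in rows:
        output = dict(row.get("output", {}))  # copy so stored rows stay untouched
        if "ranked_ids" in output:
            output["ranked_ids"] = [alias(c) for c in output["ranked_ids"]]
        cleaned.append({**row, "output": output})
    return cleaned
```

Aliases are consistent within one response, so a lawyer can see that the same competitor outranked the subject twice, but they carry no meaning across responses.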

---

## 5. Identity service — candidate_id ↔ PII mapping

**Current state:** `data/datasets/workers_500k.parquet` has the full PII (per CLAUDE.md). The `candidates_safe` view (post-fix `c3c9c21`) is the masked projection. **GAP:** `candidate_id` is currently the row position / a derived field — there's no separate identity service. This needs to change.

**Target state:**
- `identity/` subsystem (new) — holds the `candidate_id → {email, name, address, phone, SSN_last4, DOB, ...}` mapping
- All other systems (gateway, validatord, observerd, matrixd, pathwayd) only ever see `candidate_id`
- Identity reads require a separate auth credential held by legal-authorized operators
- Every identity read is itself audited (log who accessed PII for which candidate when)
- Identity service runs as its own daemon, port-isolated from the gateway
- Cross-runtime: same identity service backs both Rust and Go
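
The target-state properties above (opaque IDs, separate credential, audited reads) can be illustrated with an in-memory sketch. Everything here is hypothetical: the class name, the credential check, and the audit record shape are placeholders for whatever the Phase 2 design settles on, and a real daemon would persist both stores and compare credentials in constant time.

```python
import uuid
from datetime import datetime, timezone


class IdentityService:
    """In-memory stand-in for the candidate_id <-> PII mapping."""

    def __init__(self, legal_credentials):
        self._pii = {}
        self._legal_credentials = set(legal_credentials)
        self.read_audit = []  # one entry per read attempt, granted or not

    def register(self, pii: dict) -> str:
        # The ID is random, so it reveals nothing about the person or row order.
        candidate_id = str(uuid.uuid4())
        self._pii[candidate_id] = dict(pii)
        return candidate_id

    def read(self, candidate_id: str, operator: str, credential: str) -> dict:
        granted = credential in self._legal_credentials
        # Log before returning or raising, so denied attempts are also recorded.
        self.read_audit.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "operator": operator,
            "candidate_id": candidate_id,
            "granted": granted,
        })
        if not granted:
            raise PermissionError("identity reads require a legal credential")
        return dict(self._pii[candidate_id])
```

Logging the attempt before the permission check means a denied read still leaves a trace of who asked for which candidate and when.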

**Open question:** does the identity service need to be a separate physical daemon (most defensible) or a logically-separated process within an existing one (easier to ship)? Recommend separate daemon — gives legal a single attestable boundary.

---

## 6. Retention policy

**Current state:** UNKNOWN. Pathway memory is append-only. Session JSONL is append-only. We have no documented retention SLA.

**Target state (proposed):**
- **Active retention:** while client is in the system, all audit rows kept hot (queryable in <1s)
- **Legal hold:** N years after client/candidate leaves the system, audit rows retained on warm storage. **N is TBD** — typical EEOC retention is 1-3 years; some state-level claims have 4-year statutes; Title VII discovery can subpoena older. Recommend 4 years minimum, configurable per client contract.
- **Right to be forgotten:** if a candidate requests deletion under CCPA/GDPR, we apply tombstoning to the identity service (PII removed) BUT preserve the audit-decision rows under candidate_id (anonymized via PII removal at the source). The decision history remains; the human identification is severed.
- **Cryptographic erasure for append-only logs:** pathway memory and matrix indexer can't be selectively deleted without breaking integrity. Encryption-at-rest with per-subject keys lets us "delete" by destroying the key — the encrypted row remains but is unreadable.
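
The key-destruction idea can be sketched as below. The focus is the key lifecycle, not the cipher: `_keystream` is a toy SHA-256 counter-mode construction used only to keep the example dependency-free, and it provides no authentication. It is an assumption-laden stand-in for a vetted AEAD such as AES-GCM and should not be used as written. `SubjectKeyVault` is a hypothetical name.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY cipher for illustration only; swap in a vetted AEAD in real code.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class SubjectKeyVault:
    """Holds one key per subject; destroying the key erases the subject."""

    def __init__(self):
        self._keys = {}
        self._erased = set()

    def encrypt(self, candidate_id: str, plaintext: bytes) -> bytes:
        if candidate_id in self._erased:
            raise KeyError("subject erased: refusing to mint a new key")
        key = self._keys.setdefault(candidate_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)  # fresh per row, stored with the row
        stream = _keystream(key, nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, candidate_id: str, blob: bytes) -> bytes:
        key = self._keys.get(candidate_id)
        if key is None:
            raise KeyError("subject erased or unknown: no key available")
        nonce, body = blob[:16], blob[16:]
        stream = _keystream(key, nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, stream))

    def erase(self, candidate_id: str) -> None:
        # The encrypted rows stay in the append-only log; only the key goes.
        self._keys.pop(candidate_id, None)
        self._erased.add(candidate_id)
```

After `erase`, the append-only log is byte-for-byte unchanged, so its integrity chain still verifies, but the subject's rows can no longer be read.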

**Open question:** does the staffing client want a documented retention SLA in their contract? If yes, this PRD becomes contract-grade and the numbers above need their sign-off.
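
The proposed tiers can be expressed as a small date calculation. This is a sketch assuming the hold period is counted in calendar years from the day the subject leaves the system. The function names and the `"past_hold"` label are hypothetical, and what happens to rows after the hold ends is not defined in this section.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
        return d.replace(year=d.year + years, day=28)


def retention_tier(left_system_on: Optional[date], today: date, hold_years: int = 4) -> str:
    """Classify a subject's audit rows as hot, warm, or past the legal hold."""
    if left_system_on is None or today < left_system_on:
        return "hot"  # subject is still active
    if today <= add_years(left_system_on, hold_years):
        return "warm"  # inside the legal-hold window
    return "past_hold"  # disposition after the hold is a policy decision
```

Keeping `hold_years` as a parameter matches the "configurable per client contract" requirement.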

---

## 7. Current state vs target state

| Capability | Today | Production-ready target | Gap |
|---|---|---|---|
| candidate_id as canonical token | partial (row position?) | UUID, separate from PII | Real — needs identity service |
| Identity service | none | separate daemon, audited reads | Real — build new |
| `/audit/subject/{id}` endpoint | none | live with the §2 schema | Real — build new |
| Subject-tagged embeddings | no | every embed creates an audit row | Real — instrument |
| Subject-tagged search results | partial | every result set logged with subject IDs | Partial — needs walk |
| Subject-tagged validation outcomes | yes (in session JSONL) | yes + integrity-signed | Partial |
| Subject-tagged matrix indexer entries | NO | yes (decide first whether matrix should be subject-aware at all) | Major |
| Protected-attribute filter at decision time | informal | enforced at gateway boundary, audited | Unknown — needs code walk |
| Retention policy | none documented | 4-year hot, configurable cold tier | Real — design + build |
| Right to be forgotten | none | per-subject cryptographic erasure | Real — design + build |
| Cross-runtime parity for all of the above | partial (5 algorithm probes) | new audit-parity probes | Real — extend probe set |

---

## 8. Implementation phases (proposed sequence)

Each phase has an exit criterion the next phase can lean on. Don't start phase N+1 until phase N's exit holds.

### Phase 1 — Discovery walk (read-only, ~3-4 hours)
Walk every daemon and tag every code path that touches subject identifiers. Output: a complete map of where candidate_id lives today, where email/name/PII leak today, what's logged where. **No code changes.** Fills in all "UNKNOWN" entries in §3 and §7 with file:line references.

**Exit:** §3 surface map is fully populated with current-state evidence. §7 gap table has no "Unknown" cells.

### Phase 2 — Identity service design (design doc, ~2 hours)
Write `docs/IDENTITY_SERVICE.md`: schema, port, auth model, read-audit format, cross-runtime contract, migration path from current state. **No code changes.**

**Exit:** J approves the design.

### Phase 3 — Audit response endpoint (skeleton, ~4-6 hours)
Build `/audit/subject/{id}` endpoint that returns ALL information CURRENTLY logged about the subject — even before identity service is built, even if logs leak PII, even if subject-tagging is incomplete. **This is the "what John Martinez would get today" baseline.** Reading the output reveals exactly what's wrong.

**Exit:** endpoint returns a JSON response for any candidate_id in workers_500k. Contents are reviewed; gaps catalogued.

### Phase 4 — Subject tagging across substrates
Instrument the missing decision points (embedding creation, search rank, observer signals, matrix indexer entries) with subject identifiers. Each daemon's instrumentation lands as its own commit. Cross-runtime: each Rust commit ships with a Go-side mirror.

**Exit:** `/audit/subject/{id}` response is *complete* for the worked example (John Martinez at Warehouse B can be reconstructed end-to-end).

### Phase 5 — Identity service build
Stand up the identity daemon. Migrate candidate_id ↔ PII mapping out of `workers_500k.parquet` into the new service. Audit every read. Update all callers to never see PII directly.

**Exit:** PII grep across all log files / JSONL streams / pathway memory state returns 0 hits. Cross-runtime parity probe added: `audit_parity.sh` validates Rust + Go produce identical audit responses for the same subject.
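
The PII grep in this exit criterion could start from a pattern scan like the one below. It is a sketch, not the eventual probe: the patterns cover only US-style emails, SSNs, and phone numbers, they can false-positive on digit-heavy IDs, and names or addresses cannot be caught by regex at all. `PII_PATTERNS` and `scan_lines` are hypothetical names.

```python
import re

# Narrow, illustrative patterns; real coverage needs a reviewed pattern set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def scan_lines(lines):
    """Return (line_number, pattern_name) for every suspected PII hit.

    The matched text is deliberately not returned, so the scan report
    does not become a new copy of the PII it found.
    """
    hits = []
    for line_number, line in enumerate(lines, start=1):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits
```

An empty result from `scan_lines` over every log, JSONL stream, and pathway memory state file would correspond to the "0 hits" condition, within the limits of the patterns used.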

### Phase 6 — Protected-attribute boundary enforcement
Add a hard filter at the gateway: any model invocation must declare the input features it sees, and protected attributes are stripped at the boundary. Audit row's `input_features` becomes load-bearing.

**Exit:** can run discrimination-test scenario: feed protected attribute through, verify it's stripped before model sees it, verify audit row shows the stripping.

### Phase 7 — Retention + right-to-be-forgotten
Document retention SLA. Implement tier-down (hot → warm → cold → encrypted-with-deletable-key). Implement subject-erasure endpoint.

**Exit:** test scenario: subject requests deletion, identity service tombstones, decision rows remain under candidate_id but are unreadable post-erasure, audit response for that subject returns "subject erased" header instead of decision rows.

### Phase 8 — Legal-format export + signing
Decide JSON vs signed PDF for legal output. Build the export pipeline. Sign with a key in escrow.

**Exit:** can produce the John Martinez audit response in the format legal will accept; signature verifies.

### Phase 9 — End-to-end discrimination defense rehearsal
Run the worked example: simulate John Martinez's complaint, generate the audit, walk through what a lawyer would see, identify any remaining gaps, fix them.

**Exit:** J + (eventually) the staffing client's legal team sign off on the format and completeness.

---

## 9. Cross-runtime requirement

**Both Rust legacy and Go rewrite must satisfy every phase's exit criterion.** The 5 existing parity probes (validator, extract_json, session_log, materializer, embed) cover algorithmic equivalence; they do NOT cover audit. New parity probe `audit_parity.sh` lands as part of phase 5.

The identity service is the new shared substrate — both runtimes call it; the daemon itself is one implementation (no per-runtime version).
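
The comparison step inside an audit parity probe might look like the sketch below, shown in Python for illustration even though the probe itself is named as a shell script. It assumes responses have `header` and `rows` keys, that `generated_at` and `signing_daemon` legitimately differ between runtimes, and that row order is not part of the contract. All three are assumptions to settle in the probe's design.

```python
import json

# Header fields expected to differ between two otherwise identical responses.
VOLATILE_HEADER_FIELDS = frozenset({"generated_at", "signing_daemon"})


def normalize(response: dict) -> str:
    """Reduce a response to a canonical string that ignores volatile fields."""
    header = {k: v for k, v in response["header"].items()
              if k not in VOLATILE_HEADER_FIELDS}
    # Sort rows by their canonical form so emission order does not matter.
    rows = sorted(json.dumps(r, sort_keys=True) for r in response["rows"])
    return json.dumps({"header": header, "rows": rows}, sort_keys=True)


def audit_parity(rust_response: dict, go_response: dict) -> bool:
    return normalize(rust_response) == normalize(go_response)
```

The footer is excluded from the comparison because its signature depends on the volatile header fields.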

---

## 10. Open questions blocking phase 2 (partially resolved 2026-05-03)

**These were the original Phase 1 open questions. J answered the load-bearing 5 in conversation 2026-05-03; answers folded in below. Remaining items 1, 4, 6 still need J's call before Phase 2 design ships.**

1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
3. ✅ **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full. Per-violation $1k-$5k statutory damages, written consent + retention schedule mandatory. **This becomes a Phase 1.5 priority**; see "Effect on the §8 phase plan" below and §10.5 for the revised phase ordering.
4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
5. ✅ **RTBF under append-only:** Cryptographic erasure approach approved in principle (J 2026-05-03 implicit via approval of plan).
6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
7. ✅ **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), matrix-indexer is a suspected PII sink (trace bodies unverified). Action: sample state.json before deciding (a) keep code-only + add PII-redact-on-write to trace bodies, OR (b) remove subject-summary text from trace bodies entirely. Decision deferred until §8.1 sampling completes.

### Newly answered 2026-05-03 (J)

8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.

9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.

10. ✅ **Healthcare vertical / HIPAA:** **Same framework — yes.** Healthcare staffing IS in scope. PHI in `resume_text`, `communications`, and `call_log` is realistic. **Implications:**
    - Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
    - opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
    - PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
    - Vertical detection at the gateway boundary becomes a Phase 2 requirement

11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
    - `outcomes.jsonl` and `overseer_corrections.jsonl` cannot remain raw-PII forever: anything that lands in a training corpus or RAG re-index becomes impossible to delete (PII in model weights)
    - Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
    - Crypto-erasure of historical outcomes becomes load-bearing: if a candidate exercises RTBF and their data already trained a model, we must be able to produce evidence for the statement "the source was destroyed; the model retains it indistinguishably from synthetic patterns"
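
The healthcare routing rule from item 10 can be sketched as a fail-closed provider choice. `LOCAL_PROVIDER` is a hypothetical name for the on-box Ollama route, and `baa_signed` is an assumed configuration set of external providers with a BAA in place. The sketch covers the routing option only, not the alternative of redacting PHI at the gateway.

```python
# Hypothetical identifier for the on-box Ollama route.
LOCAL_PROVIDER = "ollama_local"


def route_provider(vertical: str, preferred: str, baa_signed=frozenset()) -> str:
    """Pick the provider for one request, keeping healthcare traffic on-box
    unless the preferred provider is covered by a BAA."""
    if vertical != "healthcare":
        return preferred
    # Fail closed: anything not explicitly cleared is routed to the local model.
    if preferred == LOCAL_PROVIDER or preferred in baa_signed:
        return preferred
    return LOCAL_PROVIDER
```

Because the check is an allow-list, a newly added external provider defaults to on-box for healthcare requests until someone records its BAA.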

### Effect on the §8 phase plan

The user-confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):

- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
- **Phase 3** (audit endpoint skeleton) — unchanged
- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
- **Phase 8** (legal export) — unchanged
- **Phase 9** (rehearsal) — must include all three scenarios: EEOC discrimination, BIPA biometric, and healthcare PHI breach

---

## 10.5 Jurisdictional surface (IL + IN)

> **⚠ Not legal advice.** This is a research-grade checklist for J to take into a conversation with actual employment + privacy counsel. The system is targeting **Chicago (Illinois)** and **Indiana** placements per 2026-05-03 conversation. Counsel needs to verify what currently applies, what's pending, and whether case law has moved any of these in 2026. **Verify with counsel before claiming compliance with any item below.**

### Federal layer (always applies)

| Statute / framework | Relevance to this system |
|---|---|
| Title VII (Civil Rights Act) | Bans discrimination on race, color, religion, sex, national origin in hiring. Discrimination claim defense is the worked example in §1. |
| ADEA (Age Discrimination in Employment) | Bans age-based discrimination for workers 40+. DOB must be excluded from features per §4. |
| ADA (Americans with Disabilities Act) | Bans disability discrimination + requires reasonable accommodation. Disability-inferring features (gait, photo features, medical history) need exclusion. |
| EEOC enforcement | Receives complaints, issues right-to-sue. Audit response per §2 is what defends in EEOC investigation. |
| OFCCP | Applies if our staffing client serves federal contractors. Adds affirmative-action recordkeeping on top of EEOC. |
| FCRA (Fair Credit Reporting Act) | Triggers if background checks are performed. Pre-adverse-action notice + dispute process needed. |
| Section 1981 | Race-based contract discrimination — staffing is contract relationship. |

### Illinois-specific (Chicago jurisdiction)

| Statute | What | What we need |
|---|---|---|
| **BIPA** (Biometric Information Privacy Act, 740 ILCS 14) | Bans collection of biometric identifiers (face geometry, fingerprints, voiceprints) without informed written consent + retention schedule. Penalties: $1,000-$5,000 per violation per person. **Class actions are common and aggressive.** | If we use candidate photos for any feature (face match, headshot rendering, photo-derived attributes), BIPA almost certainly applies. The headshot pool we generate (per CLAUDE.md commit `5d93a71` area) needs careful review — synthetic faces are probably OK; real candidate photos are NOT without explicit BIPA-compliant consent. **Counsel must review.** |
| **Illinois AI Video Interview Act** (820 ILCS 42) | If AI analyzes recorded video interviews, employer must disclose AI use, obtain consent, provide explanation of how AI works, and limit who can review the video. | If we ever ingest video, this applies. Currently we don't, but worth flagging to counsel as a "what if we add this in 12 months" boundary. |
| **Illinois Human Rights Act** (775 ILCS 5) | Broader than federal Title VII — adds protected classes including arrest record, military status, marital status, order of protection, citizenship status (in some cases), unfavorable military discharge. | Protected attribute exclusion list in §4 needs expanding to cover IL-specific classes. |
| **Personal Information Protection Act** (PIPA, 815 ILCS 530) | Breach notification — must notify Illinois residents whose unencrypted PII was breached. | If identity service or workers parquet is breached, notification clock starts. Need incident response runbook. |
| **Illinois Day and Temporary Labor Services Act** (820 ILCS 175) | Specific to staffing/temporary services industry. Includes equal-pay-for-equal-work + record-keeping requirements + worker notification. | Highly relevant — applies directly to staffing-company clients. Audit retention may interact with these recordkeeping requirements. |
| **Workplace Transparency Act** | Restrictions on non-disclosure agreements re: harassment/discrimination | Tangential but worth noting. |
| **City of Chicago Human Rights Ordinance** (Title 6 Chicago Municipal Code) | Adds protected classes beyond IHRA (source of income, parental status, military discharge status, credit history). | Chicago-specific protected attributes list. |
| **Cook County Human Rights Ordinance** | Similar additions county-wide. | Chicago is in Cook County so this stacks. |
| **Possible: AI hiring transparency** | Several states/cities have proposed/passed laws modeled on NYC Local Law 144 (annual bias audit + candidate notification). I do not know whether IL or Chicago has such a law on the books as of 2026-01 cutoff. | **Counsel must check current state.** If it exists, we need annual bias audit reports (which IS what this PRD is building toward, but the report format may have specific requirements). |

### Indiana-specific

| Statute | What | What we need |
|---|---|---|
| **Indiana Data Breach Disclosure** (IC 24-4.9) | Breach notification required "without unreasonable delay" | Same incident response runbook as IL PIPA. |
| **Indiana Civil Rights Law** (IC 22-9) | State-level employment discrimination | Largely tracks federal Title VII, fewer expansions than IL. |
| **Indiana Genetic Information Privacy Act** | Bans use of genetic info in employment | Already in §4 protected list. |
| **General observation** | Indiana is generally less aggressive than Illinois on AI/employment regulation as of cutoff. | The IL bar is higher — if we satisfy IL, IN typically follows. **Counsel must confirm this isn't backwards.** |

### Cross-cutting (security frameworks for SaaS sales)

These aren't laws but are commonly required by enterprise customers (including staffing clients) before sale.

| Framework | What | Relevance |
|---|---|---|
| **SOC 2 Type II** | Auditor attestation of operating effectiveness over 6-12 months across Trust Service Criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy). | The Privacy criterion overlaps heavily with this PRD. Privacy + Security are the two load-bearing TSCs. Effort to first Type II report: 6-9 months. Type I (point-in-time) is faster (weeks) but enterprise buyers usually want Type II. |
| **SOC 3** | Public-facing summary of SOC 2 (no detailed control descriptions). | Nice-to-have for marketing but the staffing client will want SOC 2 Type II report under NDA. |
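| **HIPAA** | Healthcare data protection. | Triggers ONLY if staffing places workers into healthcare roles where they handle PHI. CLAUDE.md lists this as out of scope, but J confirmed on 2026-05-03 that healthcare staffing IS in scope (§10, item 10), so plan for it: BAAs or on-box routing, plus PHI redaction at the gateway. **Counsel must confirm applicability.** |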
| **PCI DSS** | Payment card data | Not currently in scope. |
| **ISO 27001** | International information security management | Alternative to SOC 2; more common in EU. Probably unnecessary for IL/IN-only deployments. |

### What this means for phase ordering

The 9-phase plan in §8 is technically correct but may need re-ordering once counsel weighs in:

- **BIPA risk on photos** is so high and so aggressive that if we use real candidate photos *anywhere*, that may need to be the FIRST thing we resolve — before the audit-trail work starts. Class-action exposure is enormous.
- **SOC 2 Type II prep** runs in parallel with this work, not after. If the staffing client says "show us your SOC 2 report" we need to have started the engagement weeks/months before.
- **Day and Temporary Labor Services Act** may impose recordkeeping that interacts with our retention SLA (§6); counsel may say "no, retention has to be N years for THIS reason, not your default 4."

### Open questions for counsel (one ask)

1. Does the staffing client have an existing SOC 2 report we can leverage, or do we need our own?
2. Are we using any real candidate photos? If yes, is BIPA consent in place?
3. Does Illinois have an AI hiring transparency law on the books in 2026? If yes, what does the bias audit report need to look like?
4. What's the IL Day and Temporary Labor Services Act recordkeeping retention period? Does it interact with our 4-year proposed SLA?
5. Are background checks performed? If yes, do we need FCRA pre-adverse-action workflow integration?
6. Any healthcare placements? (HIPAA scoping)
7. Is the staffing client a federal contractor? (OFCCP scoping)

Counsel's answers shape whether the §8 phase plan ships as-is or needs reordering.

---

## 11. What this PRD is NOT

- Not a contract with the staffing client. That document needs lawyers and signs after this is built.
- Not a regulatory compliance attestation. We can build to the spirit of GDPR/CCPA/EEOC/BIPA/etc — passing actual certification is its own project.
- Not a guarantee against discrimination claims. It's a guarantee that *if* a claim is filed, we can produce evidence about how decisions were made.
- Not a substitute for human review. The audit shows what the AI did; humans still own the final call on hires.
- **Not legal advice.** The §10.5 jurisdictional surface is a research-grade checklist, NOT counsel's analysis. Verify everything with actual employment + privacy counsel licensed in IL + IN before claiming compliance with anything in this document.

---

## 12. Appendix — terms

- **Subject** — a person whose data flows through the system (candidate, worker, applicant). Identified by `candidate_id`.
- **Decision** — any system action that changes a subject's standing (added to candidate pool, ranked in search, recommended for fill, validated, scored, etc.).
- **Audit row** — one record in the audit response per decision, with the schema in §2.
- **PII** — personally identifiable information per the broad CCPA/GDPR definitions. In this system: name, email, phone, address, SSN, DOB, plus inferred-from-photo attributes.
- **Protected attribute** — characteristics that are illegal to discriminate on under federal/state law. The §4 list.
- **Inferred attribute** — a protected attribute the model derives from a non-protected feature (zip → race correlation, name → ethnicity correlation).
- **Identity service** — the daemon that holds candidate_id ↔ PII mapping. Separate auth.
- **Subject tagging** — the practice of labeling every decision/embedding/log row with a candidate_id so the audit endpoint can find it.
- **Cryptographic erasure** — making data unrecoverable by destroying its decryption key, even if the encrypted bytes remain on disk. Used for right-to-be-forgotten on append-only logs.

---

## Change log

- 2026-05-03 — Initial draft. Authored after J flagged the audit-trail gap as the production-readiness blocker.