# PRD: Production-Ready Audit Trail

**Status:** Draft — 2026-05-03 · **Owner:** J · **Drafted by:** working session 2026-05-03

> **Why this document exists.** Staffing client won't sign until we can prove the AI system can defend a discrimination claim. We've been claiming "production-ready" off smoke + parity tests; those prove the surface compiles, NOT that an audit response can be produced for a specific person. This PRD writes the audit-trail capability down before we start building it, so the phases are accountable and the scope doesn't drift mid-implementation.

---

## 1. The legal use case (worked example)

**Scenario.** John Martinez worked Warehouse B as a placed candidate. Six months later he files a complaint claiming discrimination during the hiring process. His lawyer requests an audit under EEOC discovery: produce every AI-system decision affecting John between dates D1 and D2.

**What we must produce.** A response that proves either:
- (a) John was treated identically to other candidates with comparable qualifications — same scoring criteria, same model invocations, same decision rules — and the outcome differences are explained by non-protected factors, OR
- (b) The system surfaces *exactly* what factors led to outcomes, in a form a court can verify, so the claim can be defended on documented criteria rather than "trust the AI."

**What we must NOT produce.**
- Other subjects' data (response leaks if even one other candidate's name appears)
- Internal infrastructure details (DB paths, server names, internal IDs that aren't candidate-shaped)
- Raw model prompts/completions that contain protected attributes (race, gender, age, etc.) — even if the model didn't *use* them, their presence in the audit log creates new evidence

**The defensibility chain.** The audit shows:
1. **Indexing-time decisions** — when John was added to the candidate pool, what embedding the model produced, what features were extracted, what categories he was placed into
2. **Search-time decisions** — every query that included him in candidate sets, what rank he received, what the model used to compute that rank
3. **Recommendation-time decisions** — every fill/recommendation event involving him, what scoring drove it, what validators ran, what they returned
4. **Iteration decisions** — any iterate retries that touched him (validator failures, model self-corrections)
5. **Outcome decisions** — final fills, rejections, hand-offs

For each, the audit row must show: timestamp, decision type, model + provider, input features (sanitized of protected attributes — see §4), output decision, rationale.

---

## 2. The subject audit response — output format

```
GET /audit/subject/{candidate_id}?from=D1&to=D2
→ JSON or signed PDF (legal preference TBD)
```

**Header section:**
- subject identifier (candidate_id), date range, response generation timestamp, signing daemon, integrity hash
- pre-translation note: candidate_id ↔ PII mapping is held by the identity service (§5), NOT by this audit endpoint. Legal counsel re-correlates separately under their own access controls.

**Per-decision row schema** (shape, not exhaustive):
```json
{
  "ts": "ISO-8601 UTC",
  "decision_kind": "embedding_create | search_inclusion | search_rank | fill_recommendation | validation_outcome | iterate_attempt | observer_signal",
  "daemon": "gateway | validatord | observerd | matrixd | ingestd",
  "model": "kimi-k2.6 | deepseek-v3.2 | ...",
  "provider": "ollama_cloud | opencode | openrouter",
  "input_features": { /* what the model SAW — sanitized per §4 */ },
  "output": { /* what the model decided */ },
  "rationale": "model's natural-language explanation, or rule-based justification",
  "trace_id": "X-Lakehouse-Trace-Id linking to Langfuse trace tree",
  "session_id": "iterate session that produced this row"
}
```

**Footer section:**
- Coverage attestation: "this response includes ALL decisions about candidate_id between D1 and D2 that are retained per §6 retention policy"
- Sign-off: cryptographic signature from a daemon whose key is in escrow (proves audit was generated by the system, not hand-edited)
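
To make the header hash and footer sign-off concrete, here is a minimal Python sketch of how a response could be sealed and later checked. It is an illustration under assumptions, not the planned implementation: the function names, the top-level placement of `integrity_hash` and `signature`, and the `auditd` daemon name are hypothetical, and HMAC-SHA256 stands in for whatever signature scheme the escrowed key ends up using (an asymmetric scheme is more likely for legal verification).

```python
import hashlib
import hmac
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so the hash is stable."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_response(candidate_id, date_from, date_to, generated_at, rows, key: bytes) -> dict:
    """Assemble header/rows/footer, then attach a hash and a keyed signature."""
    body = {
        "header": {
            "candidate_id": candidate_id,
            "from": date_from,
            "to": date_to,
            "generated_at": generated_at,
            "signing_daemon": "auditd",  # hypothetical daemon name
        },
        # ISO-8601 UTC strings in one format sort chronologically as text.
        "rows": sorted(rows, key=lambda r: r["ts"]),
        "footer": {"coverage": "all retained decisions in range, per retention policy"},
    }
    digest = hashlib.sha256(canonical_bytes(body)).hexdigest()
    # HMAC is a stand-in; the real sign-off may use an asymmetric signature.
    signature = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {**body, "integrity_hash": digest, "signature": signature}


def verify_response(response: dict, key: bytes) -> bool:
    """Recompute hash and signature; any hand edit to the body breaks both."""
    body = {part: response[part] for part in ("header", "rows", "footer")}
    digest = hashlib.sha256(canonical_bytes(body)).hexdigest()
    expected = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, response["integrity_hash"]) and hmac.compare_digest(
        expected, response["signature"]
    )
```

Hashing a canonical serialization (sorted keys, fixed separators) matters here because the Rust and Go runtimes would otherwise produce different bytes for the same logical response.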

---

## 3. Surface map — where decisions happen

| Decision happens at | Currently logged where | Audit-completeness gap |
|---|---|---|
| Ingestion (candidate added to pool) | `data/_kb/outcomes.jsonl`? `journald` mutation log? | UNKNOWN — needs walk |
| Embedding creation (vector built for candidate) | NOWHERE per-candidate; embed cache hits aren't subject-tagged | **MAJOR GAP** — need to subject-tag every embedding |
| Search inclusion (candidate appeared in a result set) | Pathway memory + session JSONL (?) | Partial — need subject-correlation |
| Search rank (position in result set) | Result set in chat traces, but not indexed by candidate | Partial |
| Fill recommendation | `data/datasets/fill_events.parquet` (per CLAUDE.md decision A) + pathway memory | Probably OK but not verified |
| Validation outcome (FillValidator/EmailValidator pass/fail) | `/v1/iterate` session JSONL — but `validation_kind` not populated per yesterday's misread | Partial — fix today |
| Iterate retry escalations | Session JSONL `attempts[]` array | OK |
| Observer signals | observerd events at :3800 (or :4219 Go side) | UNKNOWN — needs walk |
| Matrix-indexer compounding (semantic flags, bug fingerprints) | `pathway_memory/state.json` (currently 91 traces) | Probably leaks — these are tagged by file/task, not by subject |

**Substantive finding from this walk:** the matrix indexer + pathway memory are tagged by *code* not by *subject*. They surface "this code path failed for this task class" — they don't currently let us answer "every decision matrix-indexer made about John." If matrix-indexer fingerprints leak protected-attribute correlations (e.g., a fingerprint that says "candidates from [zip code with majority demographic X] got outcome Y"), that's a discrimination smoking gun that we currently have no way to audit cleanly.

---

## 4. PII handling rules

**Tokenization rule:** candidate_id is the only identifier that crosses runtime boundaries (logs, JSONL, traces, pathway memory, observer events, model prompts). Email / name / address / phone / SSN / DOB are NEVER in any of these surfaces.

**Identity service** (§5) holds the candidate_id ↔ PII mapping. Only legal-authorized access reads it.

**Protected-attribute exclusion at decision time:** the model NEVER receives:
- Race, ethnicity, national origin
- Sex, gender, marital status, pregnancy
- Age, date of birth (allowed: years of experience)
- Religion
- Disability, genetic information
- Veteran status (unless legally relevant for the role)
- Sexual orientation, gender identity

If the model never *sees* these, no decision can be predicated on them. The audit row's `input_features` field proves this: by inspecting the row, a lawyer can confirm protected attributes were absent from input.
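
A minimal Python sketch of the exclusion rule, assuming features arrive as a flat dict. The key names in `PROTECTED_KEYS` are hypothetical placeholders for whatever field names the real feature schema uses; the point is only that the filter returns both the sanitized features and a record of what was removed, so the audit row can show the stripping.

```python
# Hypothetical feature names; the real list must track the categories above.
PROTECTED_KEYS = frozenset({
    "race", "ethnicity", "national_origin",
    "sex", "gender", "marital_status", "pregnancy",
    "age", "date_of_birth",
    "religion",
    "disability", "genetic_information",
    "veteran_status",
    "sexual_orientation", "gender_identity",
})


def strip_protected(features: dict) -> tuple[dict, list[str]]:
    """Split features into (what the model may see, names of stripped keys)."""
    allowed = {k: v for k, v in features.items() if k.lower() not in PROTECTED_KEYS}
    stripped = sorted(k for k in features if k.lower() in PROTECTED_KEYS)
    return allowed, stripped
```

Only `allowed` would be passed to the model and written to `input_features`; `stripped` records key names without values. A key-name filter like this does nothing about proxies such as zip code or surname, which is the separate risk discussed next.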

**Inferred-attribute risk.** A model can infer protected attributes from non-protected proxies (zip code → race, name → ethnicity, photo → multiple attributes). The audit must surface this risk. **Open question:** do we ban photo features from candidate scoring? Do we ban surname tokenization? These are policy calls.

**Audit response sanitization:** the response goes to the candidate's lawyer, not to the world. It contains the candidate's own name (re-correlated by legal). It must NOT contain other candidates' names, even in comparison/ranking rows.
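
One possible way to keep comparison rows useful without exposing other subjects is to replace every non-subject identifier with an opaque, per-response label. The sketch below assumes a hypothetical `ranked` list inside a search-rank row; neither that field nor the `peer-N` labelling is specified anywhere in this PRD.

```python
def mask_peers(row: dict, subject_id: str, labels: dict) -> dict:
    """Return a copy of a ranking row with non-subject ids replaced by labels.

    `labels` is shared across all rows of one response so the same peer keeps
    the same label, but it is never written into the response itself.
    """
    masked = []
    for entry in row["ranked"]:  # "ranked" is a hypothetical field name
        cid = entry["candidate_id"]
        if cid != subject_id:
            cid = labels.setdefault(cid, f"peer-{len(labels) + 1}")
        masked.append({**entry, "candidate_id": cid})
    return {**row, "ranked": masked}
```

The subject's lawyer can still see that John ranked second behind `peer-1`, while the mapping from `peer-1` back to a real candidate stays inside the system.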

---

## 5. Identity service — candidate_id ↔ PII mapping

**Current state:** `data/datasets/workers_500k.parquet` has the full PII (per CLAUDE.md). The `candidates_safe` view (post-fix `c3c9c21`) is the masked projection. **GAP:** `candidate_id` is currently the row position / a derived field — there's no separate identity service. This needs to change.

**Target state:**
- `identity/` subsystem (new) — holds the `candidate_id → {email, name, address, phone, SSN_last4, DOB, ...}` mapping
- All other systems (gateway, validatord, observerd, matrixd, pathwayd) only ever see `candidate_id`
- Identity reads require a separate auth credential held by legal-authorized operators
- Every identity read is itself audited (log who accessed PII for which candidate when)
- Identity service runs as its own daemon, port-isolated from the gateway
- Cross-runtime: same identity service backs both Rust and Go
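
The "separate credential" and "every read is audited" requirements can be sketched in a few lines of Python. This is an in-memory illustration only: the class name, method names, and log fields are assumptions, and a real daemon would persist the read log and use a proper auth mechanism rather than a shared secret string.

```python
import hmac
from datetime import datetime, timezone


class IdentityStore:
    """Toy candidate_id -> PII store in which every read attempt is logged."""

    def __init__(self, legal_credential: str):
        self._credential = legal_credential.encode("utf-8")
        self._pii = {}
        self.read_log = []  # who asked for which candidate, when, and the result

    def put(self, candidate_id: str, pii: dict) -> None:
        self._pii[candidate_id] = dict(pii)

    def read(self, candidate_id: str, operator: str, credential: str) -> dict:
        granted = hmac.compare_digest(credential.encode("utf-8"), self._credential)
        # Log before deciding, so denied attempts are audited too.
        self.read_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "operator": operator,
            "candidate_id": candidate_id,
            "granted": granted,
        })
        if not granted:
            raise PermissionError("identity read requires the legal credential")
        return dict(self._pii[candidate_id])
```

Logging the attempt before the permission check is a deliberate choice in this sketch: a denied read of John's PII is itself something legal would want to see.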

**Open question:** does the identity service need to be a separate physical daemon (most defensible) or a logically-separated process within an existing one (easier to ship)? Recommend separate daemon — gives legal a single attestable boundary.

---

## 6. Retention policy

**Current state:** UNKNOWN. Pathway memory is append-only. Session JSONL is append-only. We have no documented retention SLA.

**Target state (proposed):**
- **Active retention:** while client is in the system, all audit rows kept hot (queryable in <1s)
- **Legal hold:** N years after client/candidate leaves the system, audit rows retained on warm storage. **N is TBD** — typical EEOC retention is 1-3 years; some state-level claims have 4-year statutes; Title VII discovery can subpoena older. Recommend 4 years minimum, configurable per client contract.
- **Right to be forgotten:** if a candidate requests deletion under CCPA/GDPR, we apply tombstoning to the identity service (PII removed) BUT preserve the audit-decision rows under candidate_id (anonymized via PII removal at the source). The decision history remains; the human identification is severed.
- **Cryptographic erasure for append-only logs:** pathway memory and matrix indexer can't be selectively deleted without breaking integrity. Encryption-at-rest with per-subject keys lets us "delete" by destroying the key — the encrypted row remains but is unreadable.
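
The key-destruction idea can be demonstrated with a small Python sketch. Loud caveat: the cipher here is a toy SHA-256 counter keystream used only so the example runs on the standard library; it is unauthenticated and must not be used for real data. A real build would presumably use an AEAD cipher such as AES-GCM with keys held in a key-management system. The `SubjectVault` name and its methods are hypothetical.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class SubjectVault:
    """Append-only log of encrypted rows with one key per candidate_id."""

    def __init__(self):
        self._keys = {}
        self._erased = set()
        self.log = []  # (candidate_id, nonce, ciphertext); never rewritten

    def append(self, candidate_id: str, plaintext: bytes) -> None:
        if candidate_id in self._erased:
            raise ValueError("subject erased; refusing to write new rows")
        if candidate_id not in self._keys:
            self._keys[candidate_id] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        stream = _keystream(self._keys[candidate_id], nonce, len(plaintext))
        self.log.append((candidate_id, nonce, _xor(plaintext, stream)))

    def read(self, candidate_id: str):
        """Decrypt this subject's rows, or return None if no key exists."""
        key = self._keys.get(candidate_id)
        if key is None:
            return None
        return [
            _xor(ct, _keystream(key, nonce, len(ct)))
            for cid, nonce, ct in self.log
            if cid == candidate_id
        ]

    def erase(self, candidate_id: str) -> None:
        """Destroy the key; the ciphertext rows stay in the log untouched."""
        self._keys.pop(candidate_id, None)
        self._erased.add(candidate_id)
```

After `erase`, the log length is unchanged, which is what preserves append-only integrity, but `read` has nothing to decrypt with. Whether key destruction alone satisfies a deletion request is a question for counsel, not something this sketch settles.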

**Open question:** does the staffing client want a documented retention SLA in their contract? If yes, this PRD becomes contract-grade and the numbers above need their sign-off.

---

## 7. Current state vs target state

| Capability | Today | Production-ready target | Gap |
|---|---|---|---|
| candidate_id as canonical token | partial (row position?) | UUID, separate from PII | Real — needs identity service |
| Identity service | none | separate daemon, audited reads | Real — build new |
| `/audit/subject/{id}` endpoint | none | live with the §2 schema | Real — build new |
| Subject-tagged embeddings | no | every embed creates an audit row | Real — instrument |
| Subject-tagged search results | partial | every result set logged with subject IDs | Partial — needs walk |
| Subject-tagged validation outcomes | yes (in session JSONL) | yes + integrity-signed | Partial |
| Subject-tagged matrix indexer entries | NO | yes (decide first whether matrix should be subject-aware at all) | Major |
| Protected-attribute filter at decision time | informal | enforced at gateway boundary, audited | Unknown — needs code walk |
| Retention policy | none documented | 4-year hot, configurable cold tier | Real — design + build |
| Right to be forgotten | none | per-subject cryptographic erasure | Real — design + build |
| Cross-runtime parity for all of the above | partial (5 algorithm probes) | new audit-parity probes | Real — extend probe set |

---

## 8. Implementation phases (proposed sequence)

Each phase has an exit criterion the next phase can lean on. Don't start phase N+1 until phase N's exit holds.

### Phase 1 — Discovery walk (read-only, ~3-4 hours)
Walk every daemon and tag every code path that touches subject identifiers. Output: a complete map of where candidate_id lives today, where email/name/PII leak today, what's logged where. **No code changes.** Fills in all "UNKNOWN" entries in §3 and §7 with file:line references.

**Exit:** §3 surface map is fully populated with current-state evidence. §7 gap table has no "Unknown" cells.

### Phase 2 — Identity service design (design doc, ~2 hours)
Write `docs/IDENTITY_SERVICE.md`: schema, port, auth model, read-audit format, cross-runtime contract, migration path from current state. **No code changes.**

**Exit:** J approves the design.

### Phase 3 — Audit response endpoint (skeleton, ~4-6 hours)
Build `/audit/subject/{id}` endpoint that returns ALL information CURRENTLY logged about the subject — even before identity service is built, even if logs leak PII, even if subject-tagging is incomplete. **This is the "what John Martinez would get today" baseline.** Reading the output reveals exactly what's wrong.

**Exit:** endpoint returns a JSON response for any candidate_id in workers_500k. Contents are reviewed; gaps catalogued.

### Phase 4 — Subject tagging across substrates
Instrument the missing decision points (embedding creation, search rank, observer signals, matrix indexer entries) with subject identifiers. Each daemon's instrumentation lands as its own commit. Cross-runtime: each Rust commit ships with a Go-side mirror.

**Exit:** `/audit/subject/{id}` response is *complete* for the worked example (John Martinez at Warehouse B can be reconstructed end-to-end).

### Phase 5 — Identity service build
Stand up the identity daemon. Migrate candidate_id ↔ PII mapping out of `workers_500k.parquet` into the new service. Audit every read. Update all callers to never see PII directly.

**Exit:** PII grep across all log files / JSONL streams / pathway memory state returns 0 hits. Cross-runtime parity probe added: `audit_parity.sh` validates Rust + Go produce identical audit responses for the same subject.
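
As a rough illustration of what the "PII grep" exit check might look like, the Python sketch below scans lines for a few pattern-detectable identifiers. The patterns and the `scan_lines` name are assumptions, and the coverage is deliberately narrow: regexes can catch emails, SSN-shaped strings, and US-style phone numbers, but not names or street addresses, which would need a lookup against the identity service's own values.

```python
import re

# Illustrative patterns only; a real check needs a reviewed, broader set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scan_lines(lines):
    """Return (line_number, pattern_name) for every PII-shaped match."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind))
    return hits
```

The phase exit would then be an empty hit list across every log file, JSONL stream, and pathway memory state file, with hits reported by line number rather than by echoing the matched value back into another log.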

### Phase 6 — Protected-attribute boundary enforcement
Add a hard filter at the gateway: any model invocation must declare the input features it sees, and protected attributes are stripped at the boundary. Audit row's `input_features` becomes load-bearing.

**Exit:** can run discrimination-test scenario: feed protected attribute through, verify it's stripped before model sees it, verify audit row shows the stripping.

### Phase 7 — Retention + right-to-be-forgotten
Document retention SLA. Implement tier-down (hot → warm → cold → encrypted-with-deletable-key). Implement subject-erasure endpoint.

**Exit:** test scenario: subject requests deletion, identity service tombstones, decision rows remain under candidate_id but are unreadable post-erasure, audit response for that subject returns "subject erased" header instead of decision rows.

### Phase 8 — Legal-format export + signing
Decide JSON vs signed PDF for legal output. Build the export pipeline. Sign with a key in escrow.

**Exit:** can produce the John Martinez audit response in the format legal will accept; signature verifies.

### Phase 9 — End-to-end discrimination defense rehearsal
Run the worked example: simulate John Martinez's complaint, generate the audit, walk through what a lawyer would see, identify any remaining gaps, fix them.

**Exit:** J + (eventually) the staffing client's legal team sign off on the format and completeness.

---

## 9. Cross-runtime requirement

**Both Rust legacy and Go rewrite must satisfy every phase's exit criterion.** The 5 existing parity probes (validator, extract_json, session_log, materializer, embed) cover algorithmic equivalence; they do NOT cover audit. New parity probe `audit_parity.sh` lands as part of phase 5.

The identity service is the new shared substrate — both runtimes call it; the daemon itself is one implementation (no per-runtime version).
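
The comparison at the heart of an audit parity probe could look something like the Python sketch below. It assumes both runtimes return the same JSON shape with `header` and `rows` keys, and that a few header fields legitimately differ between runs; the `VOLATILE_HEADER_FIELDS` list and function names are hypothetical, and the real `audit_parity.sh` may work quite differently.

```python
import json

# Fields expected to differ between two otherwise identical responses.
VOLATILE_HEADER_FIELDS = ("generated_at", "signing_daemon", "integrity_hash")


def normalize(response: dict) -> str:
    """Drop volatile header fields and order rows deterministically."""
    header = {
        k: v
        for k, v in response.get("header", {}).items()
        if k not in VOLATILE_HEADER_FIELDS
    }
    rows = sorted(
        response.get("rows", []),
        key=lambda row: json.dumps(row, sort_keys=True),
    )
    return json.dumps({"header": header, "rows": rows}, sort_keys=True)


def audit_parity(rust_response: dict, go_response: dict) -> bool:
    """True when both runtimes report the same decisions for the subject."""
    return normalize(rust_response) == normalize(go_response)
```

Normalizing before comparing keeps the probe from failing on generation timestamps or row order while still catching a decision that one runtime logs and the other misses.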

---

## 10. Open questions blocking phase 2 (partially resolved 2026-05-03)

**These were the original Phase 1 open questions. J answered the load-bearing 5 in conversation 2026-05-03; answers folded in below. Remaining items 1, 4, 6 still need J's call before Phase 2 design ships.**

1. **Identity service: separate daemon vs in-process?** _Recommended: separate. **Status: pending J confirmation.**_
2. ~~**Retention period N years?**~~ — Out of scope for now; will be set at deployment time per client contract. Default to 4-year hot retention until set.
3. ✅ **Photos/video in scope?** **YES** (J 2026-05-03). BIPA (740 ILCS 14) applies in full. Per-violation $1k-$5k statutory damages, written consent + retention schedule mandatory. **This becomes a Phase 1.5 priority** — see "Effect on the §8 phase plan" below for revised phase ordering.
4. **JSON or signed PDF for legal export?** _Recommend signed JSON with PDF rendering option. **Status: pending.**_
5. ✅ **RTBF under append-only:** Cryptographic erasure approach approved in principle (J 2026-05-03 implicit via approval of plan).
6. **Audit endpoint auth model:** _Recommend legal-only credential separate from admin token. **Status: pending.**_
7. ✅ **Matrix indexer subject-awareness:** Per scrum review (`AUDIT_PHASE_1_DISCOVERY.md` §10/C1), matrix-indexer is suspected PII sink (trace bodies unverified). Action: sample state.json before deciding (a) keep code-only + add PII-redact-on-write to trace bodies, OR (b) remove subject-summary text from trace bodies entirely. Decision deferred until §8.1 sampling completes.

### Newly answered 2026-05-03 (J)

8. ✅ **Langfuse hosting model:** **Self-hosted.** Removes the GDPR Art. 44 cross-border-transfer concern that 3/3 scrum reviewers flagged. Langfuse retention config + Postgres/ClickHouse access controls still need to be audited as part of Phase 1.5 — but the boundary stays inside J's infrastructure, which is materially better than SaaS Langfuse.

9. ✅ **EU candidates in scope:** **Not currently — may need placeholder later.** Design choice: build the identity-service interface to be EU-compatible (DPIA-shaped fields, lawful-basis tracking, SCC-ready transfer mechanism slots) but DO NOT gate Phase 2 on EU compliance. Phase 2 ships IL+IN-shaped; EU additions are a follow-up phase.

10. ✅ **Healthcare vertical / HIPAA:** **Same framework — yes.** Healthcare staffing IS in scope. PHI in `resume_text`, `communications`, and `call_log` is realistic. **Implications:**
    - Business Associate Agreement (BAA) required with any third-party model provider that processes content from healthcare-vertical staffing requests
    - opencode + ollama_cloud + openrouter (per PR #13 routing) are external — BAAs needed OR healthcare requests must route to local-only models (Ollama on-box)
    - PHI redaction at the gateway boundary becomes mandatory before the model call leaves the box, OR the model call must stay on-box for healthcare requests
    - Vertical detection at the gateway boundary becomes a Phase 2 requirement

11. ✅ **Training / RAG re-runs may use historical outcomes:** **Yes — design as if it WILL.** Implications:
    - `outcomes.jsonl` and `overseer_corrections.jsonl` cannot remain raw-PII forever — anything that lands in a training corpus or RAG re-index becomes impossible to delete (PII in model weights)
    - Phase 2 design must include a "training-safe export" pipeline that strips PII from outcomes before feeding to any training/RAG path
    - Crypto-erasure of historical outcomes becomes load-bearing — if a candidate exercises RTBF and their data already trained a model, we must be able to evidence "the source was destroyed; the model retains it indistinguishably from synthetic patterns"
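
A training-safe export from item 11 might start as a simple allow-nothing-sensitive filter over outcome records, sketched below in Python. The record shape is assumed, and the field list mixes names from this document (`resume_text`, `communications`, `call_log`) with hypothetical ones (`name`, `email`, and so on); dropping free-text fields wholesale is a conservative choice made for the sketch, not a decided policy.

```python
from typing import Iterable, Optional

# Fields never allowed into a training or RAG corpus. The free-text fields
# are dropped wholesale because they may embed PII or PHI.
PII_FIELDS = frozenset({
    "name", "email", "phone", "address", "ssn", "dob",
    "resume_text", "communications", "call_log",
})


def training_safe(record: dict, erased_ids: frozenset) -> Optional[dict]:
    """Return a PII-free copy, or None if the subject has been erased."""
    if record.get("candidate_id") in erased_ids:
        return None
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def export(records: Iterable[dict], erased_ids: frozenset = frozenset()) -> list:
    """Filter a stream of outcome records down to training-safe rows."""
    safe_rows = []
    for record in records:
        safe = training_safe(record, erased_ids)
        if safe is not None:
            safe_rows.append(safe)
    return safe_rows
```

Checking the erased set at export time is what lets a later re-run of training or RAG indexing honor deletions that happened after the outcome was first recorded.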

### Effect on the §8 phase plan

J's confirmed answers shift priorities. Revised ordering (incorporating scrum-driven priority changes from `AUDIT_PHASE_1_DISCOVERY.md` §10):

- **Phase 1.5 (NEW)** — BIPA-specific photo/video schema audit + Langfuse boundary scoping + outcomes.jsonl content sample. Lands BEFORE Phase 2 design starts.
- **Phase 2 (identity service design)** — now must include EU-placeholder fields, vertical-detection (healthcare flag), training-safe export interface, BIPA consent + retention metadata
- **Phase 3** (audit endpoint skeleton) — unchanged
- **Phase 4** (subject tagging) — must include healthcare-vertical routing decision at gateway boundary
- **Phase 5** (identity service build) — must include BIPA-compliant biometric metadata table
- **Phase 6** (protected-attribute boundary) — must include PHI redaction for healthcare-vertical requests
- **Phase 7** (retention + RTBF) — must include training-safe export evidence chain
- **Phase 8** (legal export) — unchanged
- **Phase 9** (rehearsal) — must cover all three: the EEOC discrimination scenario, the BIPA biometric scenario, and the healthcare PHI breach scenario
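
The healthcare-vertical routing decision named for Phase 4 can be sketched as a small pure function in Python. Everything here is an assumption for illustration: the `vertical` values, the `ollama_local` provider name for on-box Ollama, and the idea that BAA coverage is tracked as a set of provider names. It covers only the local-routing branch, not the alternative of redacting PHI before an external call.

```python
LOCAL_PROVIDER = "ollama_local"  # hypothetical name for on-box Ollama


def route(vertical: str, requested_provider: str, baa_providers: frozenset = frozenset()) -> dict:
    """Pick a provider and record why, so the audit row can show the decision."""
    if vertical != "healthcare":
        return {"provider": requested_provider, "reason": "non-healthcare request"}
    if requested_provider in baa_providers:
        return {"provider": requested_provider, "reason": "healthcare request, BAA on file"}
    return {"provider": LOCAL_PROVIDER, "reason": "healthcare request, no BAA: forced local"}
```

For example, `route("healthcare", "openrouter")` falls back to the local provider, while the same call with `frozenset({"openrouter"})` as the BAA set keeps the requested provider. Returning the reason alongside the provider means the routing decision can be written into the decision row rather than reconstructed later.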

---

## 10.5 Jurisdictional surface (IL + IN)

> **⚠ Not legal advice.** This is a research-grade checklist for J to take into a conversation with actual employment + privacy counsel. The system is targeting **Chicago (Illinois)** and **Indiana** placements per 2026-05-03 conversation. Counsel needs to verify what currently applies, what's pending, and whether case law has moved any of these in 2026. **Verify with counsel before claiming compliance with any item below.**

### Federal layer (always applies)

| Statute / framework | Relevance to this system |
|---|---|
| Title VII (Civil Rights Act) | Bans discrimination on race, color, religion, sex, national origin in hiring. Discrimination claim defense is the worked example in §1. |
| ADEA (Age Discrimination in Employment) | Bans age-based discrimination for workers 40+. DOB must be excluded from features per §4. |
| ADA (Americans with Disabilities Act) | Bans disability discrimination + requires reasonable accommodation. Disability-inferring features (gait, photo features, medical history) need exclusion. |
| EEOC enforcement | Receives complaints, issues right-to-sue. Audit response per §2 is what defends in EEOC investigation. |
| OFCCP | Applies if our staffing client serves federal contractors. Adds affirmative-action recordkeeping on top of EEOC. |
| FCRA (Fair Credit Reporting Act) | Triggers if background checks are performed. Pre-adverse-action notice + dispute process needed. |
| Section 1981 | Race-based contract discrimination — staffing is contract relationship. |

### Illinois-specific (Chicago jurisdiction)

| Statute | What | What we need |
|---|---|---|
| **BIPA** (Biometric Information Privacy Act, 740 ILCS 14) | Bans collection of biometric identifiers (face geometry, fingerprints, voiceprints) without informed written consent + retention schedule. Penalties: $1,000-$5,000 per violation per person. **Class actions are common and aggressive.** | If we use candidate photos for any feature (face match, headshot rendering, photo-derived attributes), BIPA almost certainly applies. The headshot pool we generate (per CLAUDE.md commit `5d93a71` area) needs careful review — synthetic faces are probably OK; real candidate photos are NOT without explicit BIPA-compliant consent. **Counsel must review.** |
| **Illinois AI Video Interview Act** (820 ILCS 42) | If AI analyzes recorded video interviews, employer must disclose AI use, obtain consent, provide explanation of how AI works, and limit who can review the video. | If we ever ingest video, this applies. Currently we don't, but worth flagging to counsel as a "what if we add this in 12 months" boundary. |
| **Illinois Human Rights Act** (775 ILCS 5) | Broader than federal Title VII — adds protected classes including arrest record, military status, marital status, order of protection, citizenship status (in some cases), unfavorable military discharge. | Protected attribute exclusion list in §4 needs expanding to cover IL-specific classes. |
| **Personal Information Protection Act** (PIPA, 815 ILCS 530) | Breach notification — must notify Illinois residents whose unencrypted PII was breached. | If identity service or workers parquet is breached, notification clock starts. Need incident response runbook. |
| **Illinois Day and Temporary Labor Services Act** (820 ILCS 175) | Specific to staffing/temporary services industry. Includes equal-pay-for-equal-work + record-keeping requirements + worker notification. | Highly relevant — applies directly to staffing-company clients. Audit retention may interact with these recordkeeping requirements. |
| **Workplace Transparency Act** | Restrictions on non-disclosure agreements re: harassment/discrimination | Tangential but worth noting. |
| **City of Chicago Human Rights Ordinance** (Title 6 Chicago Municipal Code) | Adds protected classes beyond IHRA (source of income, parental status, military discharge status, credit history). | Chicago-specific protected attributes list. |
| **Cook County Human Rights Ordinance** | Similar additions county-wide. | Chicago is in Cook County so this stacks. |
| **Possible: AI hiring transparency** | Several states/cities have proposed/passed laws modeled on NYC Local Law 144 (annual bias audit + candidate notification). I do not know whether IL or Chicago has such a law on the books as of 2026-01 cutoff. | **Counsel must check current state.** If it exists, we need annual bias audit reports (which IS what this PRD is building toward, but the report format may have specific requirements). |

### Indiana-specific

| Statute | What | What we need |
|---|---|---|
| **Indiana Data Breach Disclosure** (IC 24-4.9) | Breach notification "without unreasonable delay" | Same incident response runbook as IL PIPA. |
| **Indiana Civil Rights Law** (IC 22-9) | State-level employment discrimination | Largely tracks federal Title VII, fewer expansions than IL. |
| **Indiana Genetic Information Privacy Act** | Bans use of genetic info in employment | Already in §4 protected list. |
| **General observation** | Indiana is generally less aggressive than Illinois on AI/employment regulation as of cutoff. | The IL bar is higher — if we satisfy IL, IN typically follows. **Counsel must confirm this isn't backwards.** |

### Cross-cutting (security frameworks for SaaS sales)

These aren't laws but are commonly required by enterprise customers (including staffing clients) before sale.

| Framework | What | Relevance |
|---|---|---|
| **SOC 2 Type II** | Auditor attestation of operating effectiveness over 6-12 months across Trust Service Criteria (Security, Availability, Processing Integrity, Confidentiality, Privacy). | The Privacy criterion overlaps heavily with this PRD. Privacy + Security are the two load-bearing TSCs. Effort to first Type II report: 6-9 months. Type I (point-in-time) is faster (weeks) but enterprise buyers usually want Type II. |
| **SOC 3** | Public-facing summary of SOC 2 (no detailed control descriptions). | Nice-to-have for marketing but the staffing client will want SOC 2 Type II report under NDA. |
| **HIPAA** | Healthcare data protection. | Triggers ONLY if staffing places workers into healthcare roles where they handle PHI. J confirmed on 2026-05-03 that healthcare staffing is in scope (§10 item 10), so the BAA and PHI-handling requirements listed there apply. |
| **PCI DSS** | Payment card data | Not currently in scope. |
| **ISO 27001** | International information security management | Alternative to SOC 2; more common in EU. Probably unnecessary for IL/IN-only deployments. |

### What this means for phase ordering

The 9-phase plan in §8 is technically correct but may need re-ordering once counsel weighs in:

- **BIPA risk on photos** is so high and so aggressive that if we use real candidate photos *anywhere*, that may need to be the FIRST thing we resolve — before the audit-trail work starts. Class-action exposure is enormous.
- **SOC 2 Type II prep** runs in parallel with this work, not after. If the staffing client says "show us your SOC 2 report" we need to have started the engagement weeks/months before.
- **Day and Temporary Labor Services Act** may impose recordkeeping that interacts with our retention SLA (§6) — counsel may say "no, retention has to be N years for THIS reason, not your default 4."

### Open questions for counsel (one ask)

1. Does the staffing client have an existing SOC 2 report we can leverage, or do we need our own?
2. Are we using any real candidate photos? If yes, is BIPA consent in place?
3. Does Illinois have an AI hiring transparency law on the books in 2026? If yes, what does the bias audit report need to look like?
4. What's the IL Day and Temporary Labor Services Act recordkeeping retention period? Does it interact with our 4-year proposed SLA?
5. Are background checks performed? If yes, do we need FCRA pre-adverse-action workflow integration?
6. Any healthcare placements? (HIPAA scoping)
7. Is the staffing client a federal contractor? (OFCCP scoping)

Counsel's answers shape whether the §8 phase plan ships as-is or needs reordering.

---

## 11. What this PRD is NOT

- Not a contract with the staffing client. That document needs lawyers and signs after this is built.
- Not a regulatory compliance attestation. We can build to the spirit of GDPR/CCPA/EEOC/BIPA/etc — passing actual certification is its own project.
- Not a guarantee against discrimination claims. It's a guarantee that *if* a claim is filed, we can produce evidence about how decisions were made.
- Not a substitute for human review. The audit shows what the AI did; humans still own the final call on hires.
- **Not legal advice.** The §10.5 jurisdictional surface is a research-grade checklist, NOT counsel's analysis. Verify everything with actual employment + privacy counsel licensed in IL + IN before claiming compliance with anything in this document.

---

## 12. Appendix — terms

- **Subject** — a person whose data flows through the system (candidate, worker, applicant). Identified by `candidate_id`.
- **Decision** — any system action that changes a subject's standing (added to candidate pool, ranked in search, recommended for fill, validated, scored, etc.).
- **Audit row** — one record in the audit response per decision, with the schema in §2.
- **PII** — personally identifiable information per the broad CCPA/GDPR definitions. In this system: name, email, phone, address, SSN, DOB, plus inferred-from-photo attributes.
- **Protected attribute** — characteristics that are illegal to discriminate on under federal/state law. The §4 list.
- **Inferred attribute** — a protected attribute the model derives from a non-protected feature (zip → race correlation, name → ethnicity correlation).
- **Identity service** — the daemon that holds candidate_id ↔ PII mapping. Separate auth.
- **Subject tagging** — the practice of labeling every decision/embedding/log row with a candidate_id so the audit endpoint can find it.
- **Cryptographic erasure** — making data unrecoverable by destroying its decryption key, even if the encrypted bytes remain on disk. Used for right-to-be-forgotten on append-only logs.

---

## Change log

- 2026-05-03 — Initial draft. Authored after J flagged the audit-trail gap as the production-readiness blocker.