# Phase 1.6 — BIPA Pre-Launch Gates

**Status:** Draft — 2026-05-03 · **Owner:** J + outside counsel · **Companion to:** [`AUDIT_TRAIL_PRD.md`](AUDIT_TRAIL_PRD.md), [`AUDIT_PHASE_1_5_BIPA_AND_OUTCOMES.md`](AUDIT_PHASE_1_5_BIPA_AND_OUTCOMES.md), [`IDENTITY_SERVICE_DESIGN.md`](IDENTITY_SERVICE_DESIGN.md)

> **Why this exists.** `IDENTITY_SERVICE_DESIGN.md` v3 §5 Step 0 names Phase 1.6 as a HARD PREREQUISITE: identityd backfill cannot start until Phase 1.6 ships. This doc specifies what Phase 1.6 contains.
>
> **Scope.** BIPA (740 ILCS 14) compliance gates that must be in place BEFORE the system accepts a single real candidate photo. Synthetic-data face pool can keep operating; real-photo intake CANNOT begin without these gates.
>
> **Authority.** This is an engineering scaffold. Sections marked `⚖ COUNSEL` need outside counsel to author the actual legally-binding text. Engineering ships the procedural gates; counsel writes the words.

---

## 1. The five BIPA pre-launch gates

Each gate is a deliverable that must ship before real-photo intake. None is optional. Order shown is the recommended ship sequence.

### Gate 1 — Public retention schedule (BIPA §15(a))

**Required:** A publicly-available, written retention schedule for biometric identifiers and information.

**What ships:**

- `docs/policies/consent/biometric_retention_schedule_v1.md` — public file
- Linked from public privacy policy at the deployment URL
- Specifies:
  - Categories of biometric data collected (facial photograph for staff identification at job sites; classifications deferred per Gate 3b — see `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`)
  - Purpose of collection (identity matching for staffing operations)
  - Maximum retention: BIPA §15(a) requires destruction when the initial purpose for collection has been satisfied or within 3 years of the individual's last interaction with the private entity, whichever occurs first. Recommend 18-24 months as the operational ceiling (provides a safety margin under the 3-year outer limit)
  - Destruction procedure: per Gate 5 below
- Versioned (this is v1; future updates supersede with a new version)

**⚖ COUNSEL** — write the actual schedule. Engineering provides the operational facts; counsel writes the binding language.

**Engineering acceptance:** the file is committed, the public URL renders it, and identityd's `consent_versions` table references it by hash.

---

### Gate 2 — Informed written consent (BIPA §15(b))

**Required:** Informed, written consent BEFORE any biometric collection occurs.

**What ships:**

- `docs/policies/consent/biometric_consent_template_v1.md` — public consent template
- Versioned, hashed, referenced from identityd's `consent_versions` table
- Must disclose, per BIPA §15(b)(1)-(3):
  1. That biometric identifiers/information will be collected
  2. The specific purpose for collection (and the length of term — references Gate 1)
  3. Receipt of a written release authorizing collection
- Consent flow at intake:
  - Candidate sees the disclosure on a UI surface (web form / paper / digital signature)
  - Candidate provides explicit affirmative action (signature, click-acceptance with timestamp, etc.)
  - Identityd records `biometric_consent_status='given'` with `consent_version` reference + `consent_given_at` timestamp
  - **Without identityd recording 'given', no biometric data flows through deepface.**

**⚖ COUNSEL** — write the consent template. Recommended content (engineering view):

- Clear language (not just legal boilerplate)
- Specific to facial-classification (not generic biometrics)
- Includes withdrawal procedure
- Includes data-subject rights enumeration

**Engineering acceptance:** consent gate is enforced in code at the photo-upload endpoint; identityd refuses biometric writes when `biometric_consent_status != 'given'`; pre-existing synthetic-face pool is exempt (no consent needed because no real subject).

---

### Gate 3 — Photo-upload endpoint with consent enforcement

**Required:** Code-level enforcement that real-photo intake checks consent before processing.

**What ships:**

An endpoint at `POST /biometric/subject/{candidate_id}/photo` (catalogd-local — the original v1 spec named this `/v1/identity/subjects/{candidate_id}/photo` under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot. See `IDENTITY_SERVICE_DESIGN.md` deprecation header.) with the following behavior:

1. Caller authenticates with service-tier token
2. Endpoint queries identityd for `subjects.biometric_consent_status`
3. If status ≠ `'given'` → HTTP 403 with reason `"BIPA consent required before biometric processing"`
4. If status = `'given'`:
   a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`)
   b. deepface tagging runs against the photo
   c. Classifications (gender, race, age) — **DEFERRED to Gate 3b** (`docs/specs/GATE_3B_DEEPFACE_DESIGN.md`). `BiometricCollection.classifications` remains `None` in v1.
   d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule
   e. `pii_access_log` row written with `purpose_token='biometric_collection'`
5. Response: `{candidate_id, retention_until, consent_version}`

**Schema (as shipped — catalogd `SubjectManifest.biometric_collection`):**

The original spec proposed JSONB columns on a Postgres `subjects` table under identityd. The shipped implementation collapses this into a per-subject JSON manifest at `data/_catalog/subjects/.json`, with the `BiometricCollection` struct holding `data_path`, `template_hash`, `collected_at`, and `classifications: Option`. See `crates/catalogd/src/subject_manifest.rs` for the canonical type.

```rust
// crates/catalogd/src/subject_manifest.rs (paraphrased)
pub struct BiometricCollection {
    pub data_path: String,       // quarantined path
    pub template_hash: String,   // SHA-256 of original bytes (integrity, NOT re-derivation)
    pub collected_at: DateTime,
    pub classifications: Option, // None until Gate 3b ships (deferred — see GATE_3B_DEEPFACE_DESIGN.md)
}
```

**Engineering acceptance:**

- Endpoint refuses uploads when consent missing (verified by integration test)
- deepface output never lands in the synthetic-face manifest (`data/headshots/manifest.jsonl`)
- Real-photo classifications are isolated to identityd `subjects` table — never flow to JSONL sinks
- The `/headshots/:key` route in mcp-server REMAINS synthetic-only — does NOT serve real candidate photos to LLMs without an explicit allowance (proposed: real photos served only to authenticated staffer UI, never to model context)

---

### Gate 4 — Deprecate name → ethnicity inference

**Required:** The hard-coded `NAMES_HISPANIC` / `SURNAMES_*` lookup tables in `mcp-server/search.html:3375-3432` (per Phase 1.5 §1B walk) get removed.

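
With the lookup tables gone, synthetic face-pool routing needs an input that carries no demographic signal. The following is a minimal sketch, assuming a hash of `candidate_id` is an acceptable selector. The bucket count `FACE_BUCKETS`, the helper names `fnv1a64` and `face_bucket`, and the choice of FNV-1a are illustrative assumptions, not the shipped implementation. FNV-1a is used here only because it is simple and yields the same output on every build, which `std`'s `DefaultHasher` does not guarantee.

```rust
/// Hypothetical bucket count for the synthetic face pool (assumption, not from the spec).
pub const FACE_BUCKETS: u64 = 16;

/// FNV-1a 64-bit hash over raw bytes.
/// Picked for illustration because its output is stable across builds and platforms.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Select a synthetic face bucket from the candidate ID alone.
/// No name, gender, or ethnicity input exists on this code path.
pub fn face_bucket(candidate_id: &str) -> u64 {
    fnv1a64(candidate_id.as_bytes()) % FACE_BUCKETS
}
```

Because the function only ever sees the opaque ID bytes, a candidate's name cannot influence which bucket is chosen, and the same ID always maps to the same bucket.
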
**What ships:**

- A code commit that removes:
  - `FEMALE_NAMES`, `MALE_NAMES` constants
  - `NAMES_HISPANIC`, `NAMES_BLACK`, `NAMES_SOUTH_ASIAN`, `NAMES_EAST_ASIAN`, `NAMES_MIDDLE_EASTERN` constants
  - `SURNAMES_HISPANIC`, `SURNAMES_SOUTH_ASIAN`, `SURNAMES_EAST_ASIAN`, `SURNAMES_MIDDLE_EASTERN`, `SURNAMES_BLACK` constants
  - The `genderFor()` and `guessEthnicityFromFirstName()` functions
  - All call sites that consumed these (face-pool bucket selection)
- Replacement strategy:
  - For SYNTHETIC face pool routing: deterministic hash of candidate_id selects a face bucket, no demographic inference
  - For REAL candidate photos: the candidate's actual photo IS the representation; no inference needed

**Why this is BIPA + Title VII risk separately:** name-based ethnicity classification is BOTH a discriminatory feature engineering practice (Title VII) AND, when combined with photo-based attribute extraction, a "biometric information derived from a biometric identifier" pattern (BIPA broad reading). Removing the lookup tables forecloses both arguments.

**Engineering acceptance:**

- Lookup tables removed from search.html
- Unit test asserts no protected-attribute inference functions exist in search.html or any mcp-server module
- Face-pool routing for synthetic faces uses candidate_id hash exclusively
- Phase 1.5 §1B finding closed

---

### Gate 5 — Documented destruction procedure

**Required:** A written procedure for biometric data destruction at retention expiry OR consent withdrawal OR right-to-be-forgotten request.

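
The three triggers named above can be modeled as one decision function, so that a sweep job and an erasure endpoint agree on when destruction is due. This is a sketch under stated assumptions: `SubjectState`, its field names, the use of epoch seconds, and the priority order among triggers are all hypothetical and are not taken from the shipped catalogd code.

```rust
/// Reason a subject's biometric data must be destroyed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DestructionTrigger {
    RetentionExpired,
    ConsentWithdrawn,
    RightToBeForgotten,
}

/// Hypothetical per-subject state; field names are illustrative only.
pub struct SubjectState {
    /// Retention deadline as seconds since the Unix epoch.
    pub retention_until_epoch_secs: u64,
    pub consent_withdrawn: bool,
    pub rtbf_requested: bool,
}

/// Returns the applicable trigger, or `None` if the data may still be retained.
/// Explicit subject requests are reported ahead of clock-based expiry (an assumed ordering).
pub fn destruction_trigger(
    subject: &SubjectState,
    now_epoch_secs: u64,
) -> Option<DestructionTrigger> {
    if subject.rtbf_requested {
        Some(DestructionTrigger::RightToBeForgotten)
    } else if subject.consent_withdrawn {
        Some(DestructionTrigger::ConsentWithdrawn)
    } else if now_epoch_secs >= subject.retention_until_epoch_secs {
        Some(DestructionTrigger::RetentionExpired)
    } else {
        None
    }
}
```

Returning the trigger rather than a bare boolean lets the caller record why destruction happened alongside the erasure event.
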
**What ships:**

- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` — operator-facing
- Specifies:
  - Triggers: retention expiry (per Gate 1), withdrawal, RTBF request, candidate request
  - Procedure: catalogd-local `POST /biometric/subject/{id}/erase` (legal-tier auth) — formerly proposed under identityd; now serves from catalogd directly
  - Erasure scope: `BiometricCollection` set to `None` on the subject manifest (drops `data_path`, `template_hash`, `classifications` together), quarantined photo files at `data/biometric/uploads//*` securely unlinked, audit row appended BEFORE photo unlink so the chain proves intent even if file delete fails
  - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
- Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
- Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request

**⚖ COUNSEL** — review the runbook for legal sufficiency. Engineering writes the procedure; counsel attests that the procedure satisfies BIPA §15(a) destruction requirements.

**Engineering acceptance:**

- Runbook committed
- `POST /biometric/subject/{id}/erase` endpoint includes biometric-specific erasure path (shipped `848a458` — 21 unit tests, two scopes: biometric_only / full)
- Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep — biometric has stricter clock)
- Erasure events are logged with cryptographic attestation

---

## 2. Cryptographic attestation: no biometric data exists pre-identityd

**Per `IDENTITY_SERVICE_DESIGN` v3-B11.** Plaintiffs may argue that the EXISTENCE of biometric schema fields constitutes constructive notice of intent to collect biometric data — therefore consent should have preceded the schema.
The defense: prove that no biometric data was actually collected from real candidates before identityd + the consent gate.

**What ships:**

- A one-shot script `scripts/staffing/attest_pre_identityd_biometric_state.sh` that:
  - Queries `data/datasets/workers_500k.parquet` schema and confirms NO column named `photo`, `biometric_*`, `face_*`, `image_*` exists
  - Greps `data/_kb/*.jsonl` and `data/_pathway_memory/state.json` for any base64-encoded image bytes (deepface output, photo blobs)
  - Verifies `data/headshots/manifest.jsonl` rows ≤ synthetic face pool size
  - Hashes the schema + summary; commits the hash to S3 Object Lock (per identity service v3 anchor pattern)
- Attestation document `docs/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-XX.md` signed by J + outside counsel

**This is a one-time defense artifact.** It establishes the baseline: "as of this date, no biometric data was collected from real candidates."

---

## 3. Employee training acknowledgment (general BIPA hygiene)

**Required:** People with access to biometric data acknowledge BIPA-handling training.

**What ships:**

- `docs/policies/BIPA_HANDLING_TRAINING_v1.md` — training material covering:
  - What constitutes biometric identifiers / information
  - The consent + retention procedures
  - Destruction obligations
  - Reporting suspected exposure
- Acknowledgment record per individual (initially: J + counsel + named operators)
- Annual refresh

**⚖ COUNSEL** — write training content. Engineering doesn't author legal-compliance training.

---

## 4. Phase 1.6 exit criteria (gates Phase 2 backfill)

All 5 gates must be DONE before identityd backfill begins. Status as of 2026-05-03 — scaffolds vs. counsel sign-off vs.
shipped code:

| # | Gate | Engineering | Counsel | Status |
|---|---|---|---|---|
| 1 | Public retention schedule | scaffolded at `docs/policies/consent/biometric_retention_schedule_v1.md` | pending | **eng-staged** |
| 2 | Consent template | scaffolded at `docs/policies/consent/biometric_consent_template_v1.md` | pending | **eng-staged** |
| 3 | Photo-upload endpoint with consent enforcement | DONE — `crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 11 unit tests, live-verified end-to-end. **Gate 3b DECIDED 2026-05-05: Option C (defer classifications).** `BiometricCollection.classifications` stays `Option = None` in v1; consent + retention docs revised to match. See `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` §6 + change log. | reviewed under Gate 2 (matching consent text) | **DONE — 3a shipped, 3b deferred per design doc** |
| 4 | Name → ethnicity inference removed | DONE — `mcp-server/search.html:3372` removal note + `mcp-server/phase_1_6_gate_4.test.ts` absence test (3/3 green) | none required | **DONE** |
| 5 | Destruction runbook | scaffolded at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint + verify/report scripts marked TODO | pending | **eng-staged** |

PLUS:

| # | Item | Engineering | Counsel | Status |
|---|---|---|---|---|
| 6 | Cryptographic attestation pre-identityd | DONE — `scripts/staffing/attest_pre_identityd_biometric_state.sh` + `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (3/3 evidence checks pass; signature lines pending) | pending signature | **eng-DONE, signature-pending** |
| 7 | Employee training material | scaffold deferred — Gate 5 runbook §7 acknowledgment may serve as substrate | pending | **deferred** |

**Blocking set for Phase 2 backfill:** items **1, 2, 3, 4, 5, 6** must all be DONE.
Item 7 (employee training) is reduced from blocking to "deferred" because the Gate 5 destruction runbook §7 already requires operator acknowledgment before legal-tier credentials are issued — that acknowledgment is procedurally equivalent to the training-record requirement when the operator population is small (J + 1-2 named operators). If the operator population grows beyond that, item 7 re-promotes to blocking and a separate training program must be authored.

⚖ COUNSEL — confirm whether item 7 deferral is acceptable for the expected operator population size, or restore it to the blocking set.

**Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel review of the engineering scaffolds. Gate 3 substrate is fully shipped; Gate 3b deepface classification was DECIDED on 2026-05-05 as Option C (defer) — `BiometricCollection.classifications` stays `None` in v1, consent + retention docs revised to match this narrower scope. If a future product requirement surfaces a real need for classifications, the substrate is forward-compatible (`Option`) and either Option A (~1 day) or Option B (~5 days) of the design doc can be picked up then under a v2 consent template.

---

## 5. Effort estimate

| Gate | Engineering effort | Legal effort |
|---|---|---|
| Gate 1 (retention schedule) | 0.5 day | counsel-dependent (typically 1-2 weeks for review) |
| Gate 2 (consent template) | 0.5 day | counsel-dependent (typically 2-4 weeks for review and consent UX design) |
| Gate 3 (photo-upload endpoint) | 1-2 days | review of endpoint behavior |
| Gate 4 (deprecate name-ethnicity inference) | 0.5 day | none (engineering-only fix) |
| Gate 5 (destruction runbook) | 1 day | counsel sign-off |
| §2 cryptographic attestation | 0.5 day | counsel + J signature |
| §3 employee training | 0.25 day (admin) | counsel-authored content |
| **Total engineering** | **~4-5 days** | — |
| **Total counsel** | — | **~3-6 weeks calendar** (review cycles) |

**The calendar bottleneck is counsel, not engineering.** Engineering can stage all 5 gates ready-to-ship in a week. Counsel sign-off + consent UX rollout is the longer pole.

---

## 6. Open questions for J + counsel

1. **Photo-upload UX:** is there an existing intake form / staffer console where photo upload would happen? Or is this new UI work?
2. **Consent collection mechanism:** electronic signature service (DocuSign, Adobe Sign), in-app click-acceptance, paper form? Each has different evidentiary weight in litigation.
3. **Operator list with biometric access:** who, today, would be on the named-operators list for §3 training?
4. **Counsel for sign-off:** named outside counsel — same or different from the dual-control legal-token party in identity service?
5. **Public privacy policy URL:** does one exist? If yes, where; if no, that's a separate Gate-1.5 deliverable.

---

## 6.5. Post-signature deployment runbook

When counsel returns the countersigned consent template + retention schedule, the engineering side of "flip from permissive to strict mode" is one command:

```bash
# 1. Counsel commits their signature to §7 of the consent template
#    markdown (or J commits the signed PDF + updates §7 with counsel's
#    name + date). The markdown is the BINDING TEXT — the PDF is just
#    a rendering of it.

# 2. Hash the canonical signed text + seed the gateway allowlist.
./scripts/staffing/seed_consent_version.sh \
    docs/policies/consent/biometric_consent_template_v1.md \
    --label "v1 signed YYYY-MM-DD by [counsel name]"

# The script:
#   - computes SHA-256 of the markdown (binding text)
#   - atomically writes /etc/lakehouse/consent_versions.json with
#     the new hash + per-seed audit metadata (timestamp, label,
#     source path)
#   - restarts lakehouse.service so the gateway re-reads the
#     allowlist
#   - probes /biometric/health for clean restart

# 3. Verify strict mode is rejecting unknown hashes:
TOKEN=$(cat /etc/lakehouse/legal_audit.token)
curl -sS -X POST http://localhost:3100/biometric/subject/WORKER-1/consent \
    -H "X-Lakehouse-Legal-Token: $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"consent_version_hash":"deadbeefdeadbeef000000000000000000000000000000000000000000000000","consent_collection_method":"electronic_signature","operator_of_record":"strict_mode_probe"}'
# Expect: HTTP 400 + {"error":"consent_version_unknown", ...}
```

After this, the gateway is in counsel-tier strict mode:

- Any consent grant POST whose `consent_version_hash` doesn't match a known signed template is refused at intake
- Operator typos (mistyped hash) become loud failures, not silent bad records
- Future template revisions (v2, v3, ...)
  require counsel re-sign AND a new `seed_consent_version.sh` run before being accepted — the v1 hash stays in the allowlist for already-collected subjects' audit-trail compatibility

### Pre-counsel demo seed

For deployments that want to exercise strict mode BEFORE counsel signs, the same script works against the eng-staged template:

```bash
./scripts/staffing/seed_consent_version.sh \
    docs/policies/consent/biometric_consent_template_v1.md \
    --label "eng-staged demo seed (NOT counsel-signed)"
```

The hash entry should be replaced (rotate the demo hash out, add the counsel-signed hash) when counsel completes review. The allowlist's `_meta.seeded_at[]` array preserves the seed history.

---

## 7. What this PRD is NOT

- Not legal advice. The `⚖ COUNSEL` markers exist because the binding text needs lawyers, not engineers.
- Not a substitute for a DPIA / PIA. Phase 1.6 satisfies BIPA-specific gates; a Data Protection Impact Assessment is broader and may be required separately.
- Not a SOC2 Type II deliverable. SOC2 is a parallel work stream.
- Not the only gate before production. The full 9-phase audit-trail program continues; Phase 1.6 specifically unblocks Phase 2 (identity service implementation).

---

## Change log

- 2026-05-05 — Reconciled with shipped state: endpoint paths corrected from the legacy identityd v1 spec (`/v1/identity/subjects/*`) to the catalogd-local routes that actually shipped (`/biometric/subject/*`). Schema block rewritten to reflect the JSON `SubjectManifest.biometric_collection` substrate that replaced the proposed Postgres columns. Gate 3b deepface deferral marked in-line where Disclosure 1 / Gate 3 step 5c / Gate 5 erasure scope previously assumed classifications were collected. No legal text changed; this was doc/code drift cleanup.
- 2026-05-03 — Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.