Per 2026-05-03 phase_1_6_bipa_gates scrum (13 findings, 0 convergent).
1 BLOCK verified false positive, 4 real fixes shipped:

False positive (verified):
- opus BLOCK on attest:55 — claimed `set -uo pipefail` without `-e`
  makes the post-python3 `if [ $? -ne 0 ]` check unreachable. Verified
  WRONG: `X=$(false); echo $?` prints 1. Bash propagates
  command-substitution exit through $? on the assignment line. The
  check IS the python3 exit gate. Inline comment added to the script
  noting the false positive so future scrums don't re-flag.

Real fixes:
1. opus WARN attestation:18 — schema fingerprint hashed names ONLY,
   missing column-type changes. A column repurposed to hold base64
   photo bytes under its existing name would pass undetected. Now
   hashes "name<TAB>type<TAB>nullable=bool" per row. Re-run produced
   evidence SHA-256 1fdcc9f1... (vs old 230fffeb..., reflecting the
   broader fingerprint scope).
2. opus WARN gate_4_test:60 — definition regex didn't catch
   object-literal property forms (`const t = { FEMALE_NAMES: [...] }`)
   or TypeScript class fields (`class L { public NAMES_X: string[] = [] }`).
   Added two new patterns + a regression test
   (Gate 4: object-literal and class-field bypasses are caught) that
   exercises 5 bypass forms. 4/4 tests green; 1 minor regex tweak
   needed mid-fix to handle single-line class bodies.
3. kimi WARN python3-reliance — script assumed pyarrow installed and
   would emit a stack trace into the attestation if not. Added
   `python3 -c "import pyarrow"` gate at top with clean install
   instructions on failure.
4. opus INFO PHASE_1_6:200 — item 7 (training) silently dropped from
   blocking set with bare "deferred" rationale. Now explicitly states
   the deferral is conditional on small operator population (J + 1-2
   named ops); item 7 re-promotes to blocking if population grows.
   ⚖ COUNSEL marker added.

Skipped (acceptable as ⚖ COUNSEL placeholders by design):
- kimi WARN consent template:30-day-SLA (counsel decides number)
- kimi WARN consent template:email-placeholder (counsel supplies)
- kimi WARN parquet absence (env override exists; redeployment-aware)
- kimi INFO runbook manual-erasure (marked TODO when /erase ships)
- qwen INFO doc path/status nits (already addressed by file moves)

Tests: 4/4 Gate 4 absence test (incl. new bypass-coverage), 3/3
attestation evidence checks pass on live data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Phase 1.6 — BIPA Pre-Launch Gates

**Status:** Draft — 2026-05-03 · **Owner:** J + outside counsel · **Companion to:** [`AUDIT_TRAIL_PRD.md`](AUDIT_TRAIL_PRD.md), [`AUDIT_PHASE_1_5_BIPA_AND_OUTCOMES.md`](AUDIT_PHASE_1_5_BIPA_AND_OUTCOMES.md), [`IDENTITY_SERVICE_DESIGN.md`](IDENTITY_SERVICE_DESIGN.md)

> **Why this exists.** `IDENTITY_SERVICE_DESIGN.md` v3 §5 Step 0 names Phase 1.6 as a HARD PREREQUISITE: identityd backfill cannot start until Phase 1.6 ships. This doc specifies what Phase 1.6 contains.
>
> **Scope.** BIPA (740 ILCS 14) compliance gates that must be in place BEFORE the system accepts a single real candidate photo. Synthetic-data face pool can keep operating; real-photo intake CANNOT begin without these gates.
>
> **Authority.** This is an engineering scaffold. Sections marked `⚖ COUNSEL` need outside counsel to author the actual legally-binding text. Engineering ships the procedural gates; counsel writes the words.

---

## 1. The five BIPA pre-launch gates

Each gate is a deliverable that must ship before real-photo intake. None is optional. Order shown is the recommended ship sequence.

### Gate 1 — Public retention schedule (BIPA §15(a))

**Required:** A publicly-available, written retention schedule for biometric identifiers and information.

**What ships:**
- `docs/policies/consent/biometric_retention_schedule_v1.md` — public file
- Linked from public privacy policy at the deployment URL
- Specifies:
  - Categories of biometric data collected (facial geometry derived from candidate photos, age estimate, gender classification, race classification — per Phase 1.5 deepface walk)
  - Purpose of collection (identity matching for staffing operations)
  - Maximum retention: BIPA §15(a) requires destruction when the initial purpose for collection has been satisfied or within 3 years of the individual's last interaction with the private entity, whichever occurs first. Recommend 18-24 months as the operational ceiling (provides a safety margin).
  - Destruction procedure: per Gate 5 below
- Versioned (this is v1; future updates supersede with a new version)

**⚖ COUNSEL** — write the actual schedule. Engineering provides the operational facts; counsel writes the binding language.

**Engineering acceptance:** the file is committed, the public URL renders it, and identityd's `consent_versions` table references it by hash.
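
The "references it by hash" check can be sketched as follows. This is a minimal illustration, not the identityd implementation: it assumes the hash is a SHA-256 over the committed file's raw bytes (the text above does not name the algorithm), and the function name `policy_document_hash` is hypothetical.

```python
import hashlib
from pathlib import Path


def policy_document_hash(path: Path) -> str:
    """Return the hex SHA-256 digest of a policy file's raw bytes.

    Assumption: the consent_versions table stores a SHA-256 of the
    committed file; hashing raw bytes means any edit, including
    whitespace, produces a new version hash.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()
```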

---

### Gate 2 — Informed written consent (BIPA §15(b))

**Required:** Informed, written consent BEFORE any biometric collection occurs.

**What ships:**
- `docs/policies/consent/biometric_consent_template_v1.md` — public consent template
- Versioned, hashed, referenced from identityd's `consent_versions` table
- Must disclose, per BIPA §15(b)(1)-(3):
  1. That biometric identifiers/information will be collected
  2. The specific purpose for collection (and the length of term — references Gate 1)
  3. Receipt of a written release authorizing collection
- Consent flow at intake:
  - Candidate sees the disclosure on a UI surface (web form / paper / digital signature)
  - Candidate provides explicit affirmative action (signature, click-acceptance with timestamp, etc.)
  - Identityd records `biometric_consent_status='given'` with `consent_version` reference + `consent_given_at` timestamp
  - **Without identityd recording 'given', no biometric data flows through deepface.**

**⚖ COUNSEL** — write the consent template. Recommended content (engineering view):
- Clear language (not just legal boilerplate)
- Specific to facial-classification (not generic biometrics)
- Includes withdrawal procedure
- Includes data-subject rights enumeration

**Engineering acceptance:** consent gate is enforced in code at the photo-upload endpoint; identityd refuses biometric writes when `biometric_consent_status != 'given'`; pre-existing synthetic-face pool is exempt (no consent needed because no real subject).
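
A rough sketch of the consent record and the write guard described above. The field names mirror the identityd columns named in this section; the class, the function names, and the `"not_given"` default are hypothetical placeholders, since only the `'given'` value is specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SubjectConsent:
    # "not_given" is an assumed placeholder; the doc only defines 'given'.
    biometric_consent_status: str = "not_given"
    consent_version: Optional[str] = None
    consent_given_at: Optional[datetime] = None


def record_consent(subject: SubjectConsent, consent_version: str) -> None:
    """Record an explicit affirmative action against a consent version."""
    subject.biometric_consent_status = "given"
    subject.consent_version = consent_version
    subject.consent_given_at = datetime.now(timezone.utc)


def may_write_biometrics(subject: SubjectConsent, synthetic: bool = False) -> bool:
    """Allow biometric writes only for consenting subjects.

    The synthetic-face pool is exempt because no real subject exists.
    """
    if synthetic:
        return True
    return subject.biometric_consent_status == "given"
```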

---

### Gate 3 — Photo-upload endpoint with consent enforcement

**Required:** Code-level enforcement that real-photo intake checks consent before processing.

**What ships:**

A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) with the following behavior:

1. Caller authenticates with service-tier token
2. Endpoint queries identityd for `subjects.biometric_consent_status`
3. If status ≠ `'given'` → HTTP 403 with reason `"BIPA consent required before biometric processing"`
4. If status = `'given'`:
   a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`)
   b. deepface tagging runs against the photo
   c. Classifications (gender, race, age) stored to `subjects` table fields (NEW columns — see schema additions below)
   d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule
   e. `pii_access_log` row written with `purpose_token='biometric_collection'`
5. Response: `{candidate_id, retention_until, consent_version}`
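
The control flow of steps 3 to 5 can be sketched as a pure function. This is an illustration under stated assumptions, not the endpoint design: the function names, the 200 success status, the timestamp format used for `{ts}`, and the `retention_days` parameter are all hypothetical, and steps 4b to 4e (deepface, column writes, encryption, `pii_access_log`) are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Tuple

CONSENT_REQUIRED = "BIPA consent required before biometric processing"


def quarantine_path(candidate_id: str, ts: datetime, ext: str) -> str:
    """Build the quarantined upload path (never under data/headshots/)."""
    stamp = ts.strftime("%Y%m%dT%H%M%SZ")  # timestamp format is an assumption
    return f"data/biometric/uploads/{candidate_id}/{stamp}.{ext}"


def handle_photo_upload(
    candidate_id: str,
    consent_status: str,
    consent_version: str,
    ext: str,
    now: datetime,
    retention_days: int,
) -> Tuple[int, Dict[str, Any]]:
    # Step 3: refuse before any photo bytes are touched.
    if consent_status != "given":
        return 403, {"reason": CONSENT_REQUIRED}
    # Step 4a: a real handler would write the bytes to this location.
    _stored_at = quarantine_path(candidate_id, now, ext)
    # Steps 4b-4e are omitted from this sketch.
    retention_until = now + timedelta(days=retention_days)
    # Step 5: response body with the three fields named above.
    return 200, {
        "candidate_id": candidate_id,
        "retention_until": retention_until.isoformat(),
        "consent_version": consent_version,
    }
```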

**Schema additions to identityd `subjects`:**

```sql
ALTER TABLE subjects ADD COLUMN biometric_classifications JSONB; -- {gender, race, age} from deepface
ALTER TABLE subjects ADD COLUMN biometric_data_path TEXT; -- quarantined path
ALTER TABLE subjects ADD COLUMN biometric_collected_at TIMESTAMPTZ;
ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT; -- hash of the photo bytes (for integrity, NOT for re-derivation)
```

**Engineering acceptance:**
- Endpoint refuses uploads when consent missing (verified by integration test)
- deepface output never lands in the synthetic-face manifest (`data/headshots/manifest.jsonl`)
- Real-photo classifications are isolated to identityd `subjects` table — never flow to JSONL sinks
- The `/headshots/:key` route in mcp-server REMAINS synthetic-only — does NOT serve real candidate photos to LLMs without an explicit allowance (proposed: real photos served only to authenticated staffer UI, never to model context)

---

### Gate 4 — Deprecate name → ethnicity inference

**Required:** The hard-coded `NAMES_HISPANIC` / `SURNAMES_*` lookup tables in `mcp-server/search.html:3375-3432` (per Phase 1.5 §1B walk) get removed.

**What ships:**
- A code commit that removes:
  - `FEMALE_NAMES`, `MALE_NAMES` constants
  - `NAMES_HISPANIC`, `NAMES_BLACK`, `NAMES_SOUTH_ASIAN`, `NAMES_EAST_ASIAN`, `NAMES_MIDDLE_EASTERN` constants
  - `SURNAMES_HISPANIC`, `SURNAMES_SOUTH_ASIAN`, `SURNAMES_EAST_ASIAN`, `SURNAMES_MIDDLE_EASTERN`, `SURNAMES_BLACK` constants
  - The `genderFor()` and `guessEthnicityFromFirstName()` functions
  - All call sites that consumed these (face-pool bucket selection)
- Replacement strategy:
  - For SYNTHETIC face pool routing: deterministic hash of candidate_id selects a face bucket, no demographic inference
  - For REAL candidate photos: the candidate's actual photo IS the representation; no inference needed
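
The synthetic-pool routing could look roughly like this. It is a sketch under assumptions: SHA-256 as the hash, the first 8 digest bytes as the selector, and the name `face_bucket` are all illustrative choices; the text above only requires that the bucket be a deterministic function of `candidate_id` with no demographic input.

```python
import hashlib


def face_bucket(candidate_id: str, bucket_count: int) -> int:
    """Map a candidate_id to a synthetic face bucket index.

    The only input is the id itself, so the same candidate always gets
    the same bucket and no name-derived attribute can influence it.
    """
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    digest = hashlib.sha256(candidate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count
```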

**Why this is both a BIPA risk and a Title VII risk:** name-based ethnicity classification is BOTH a discriminatory feature-engineering practice (Title VII) AND, when combined with photo-based attribute extraction, a "biometric information derived from a biometric identifier" pattern (BIPA broad reading). Removing the lookup tables forecloses both arguments.

**Engineering acceptance:**
- Lookup tables removed from search.html
- Unit test asserts no protected-attribute inference functions exist in search.html or any mcp-server module
- Face-pool routing for synthetic faces uses candidate_id hash exclusively
- Phase 1.5 §1B finding closed
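
An absence check of this kind can be sketched as a source scan. The patterns below are illustrative and are not the regexes in `mcp-server/phase_1_6_gate_4.test.ts`; they cover top-level bindings, object-literal properties, class fields with a modifier, and the two removed functions. The name `find_banned_definitions` is hypothetical.

```python
import re
from typing import List

# Identifier shapes taken from the removal list above.
NAME = r"(?:FEMALE_NAMES|MALE_NAMES|(?:SUR)?NAMES_[A-Z_]+)"

DEFINITION_PATTERNS = [
    # const / let / var binding
    re.compile(rf"\b(?:const|let|var)\s+{NAME}\b"),
    # object-literal property, e.g. { FEMALE_NAMES: [...] }
    re.compile(rf"(?:^|[{{,])\s*{NAME}\s*:", re.MULTILINE),
    # TypeScript class field with an access or storage modifier
    re.compile(rf"\b(?:public|private|protected|readonly|static)\s+{NAME}\b"),
    # the two removed inference functions
    re.compile(r"\bfunction\s+(?:genderFor|guessEthnicityFromFirstName)\b"),
]


def find_banned_definitions(source: str) -> List[str]:
    """Return every matched definition fragment; empty means clean."""
    hits: List[str] = []
    for pattern in DEFINITION_PATTERNS:
        hits.extend(m.group(0).strip() for m in pattern.finditer(source))
    return hits
```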

---

### Gate 5 — Documented destruction procedure

**Required:** A written procedure for biometric data destruction at retention expiry OR consent withdrawal OR right-to-be-forgotten request.

**What ships:**
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` — operator-facing
- Specifies:
  - Triggers: retention expiry (per Gate 1), consent withdrawal, right-to-be-forgotten (RTBF) request, candidate request
  - Procedure: identityd `POST /v1/identity/subjects/{id}/erase` (legal-tier auth)
  - Erasure scope: `subjects.biometric_*` columns ciphertext-deleted, `biometric_data_path` files securely overwritten + unlinked, deepface classifications nulled
  - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
  - Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
  - Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request

**⚖ COUNSEL** — review the runbook for legal sufficiency. Engineering writes the procedure; counsel attests that the procedure satisfies BIPA §15(a) destruction requirements.

**Engineering acceptance:**
- Runbook committed
- `POST /v1/identity/subjects/{id}/erase` endpoint includes biometric-specific erasure path
- Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep — biometric has stricter clock)
- Erasure events are logged with cryptographic attestation
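
The selection step of the daily sweep might be sketched like this. It assumes the sweep receives a mapping from subject id to `biometric_retention_until`; the function names and the `subject_id` and `at` log fields are hypothetical, and the JWT signature and actual erasure call are omitted.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def due_for_destruction(
    retention_by_subject: Dict[str, Optional[datetime]], now: datetime
) -> List[str]:
    """Return subject ids whose biometric_retention_until is at or before now.

    Subjects with no biometric data (None) are skipped.
    """
    return sorted(
        sid
        for sid, until in retention_by_subject.items()
        if until is not None and until <= now
    )


def erasure_log_row(subject_id: str, now: datetime) -> Dict[str, str]:
    """Shape of a pii_access_log entry for one erasure (fields assumed)."""
    return {
        "subject_id": subject_id,
        "purpose_token": "biometric_erasure",
        "at": now.isoformat(),
    }
```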

---

## 2. Cryptographic attestation: no biometric data exists pre-identityd

**Per `IDENTITY_SERVICE_DESIGN` v3-B11.** Plaintiffs may argue that the EXISTENCE of biometric schema fields constitutes constructive notice of intent to collect biometric data — therefore consent should have preceded the schema. The defense: prove that no biometric data was actually collected from real candidates before identityd + the consent gate.

**What ships:**
- A one-shot script `scripts/staffing/attest_pre_identityd_biometric_state.sh` that:
  - Queries `data/datasets/workers_500k.parquet` schema and confirms NO column named `photo`, `biometric_*`, `face_*`, `image_*` exists
  - Greps `data/_kb/*.jsonl` and `data/_pathway_memory/state.json` for any base64-encoded image bytes (deepface output, photo blobs)
  - Verifies `data/headshots/manifest.jsonl` rows ≤ synthetic face pool size
  - Hashes the schema + summary; commits the hash to S3 Object Lock (per identity service v3 anchor pattern)
- Attestation document `docs/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-XX.md` signed by J + outside counsel
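
The schema check and the schema hash can be sketched without touching the parquet file, given a list of `(name, type, nullable)` tuples already read from it. This is an illustration, not the shipped script: case-insensitive matching, newline-joined rows, and Python's `True`/`False` rendering of the nullable flag are assumptions. The per-row `name<TAB>type<TAB>nullable=bool` layout follows the commit note, which hashes type and nullability so that a column repurposed under its old name changes the fingerprint.

```python
import fnmatch
import hashlib
from typing import Iterable, List, Tuple

# Column-name patterns listed in the attestation bullet above.
FORBIDDEN_PATTERNS = ("photo", "biometric_*", "face_*", "image_*")

Column = Tuple[str, str, bool]  # (name, type, nullable)


def forbidden_columns(names: Iterable[str]) -> List[str]:
    """Return column names that match any forbidden pattern."""
    return [
        n
        for n in names
        if any(fnmatch.fnmatchcase(n.lower(), p) for p in FORBIDDEN_PATTERNS)
    ]


def schema_fingerprint(columns: Iterable[Column]) -> str:
    """Hash one 'name<TAB>type<TAB>nullable=bool' row per column."""
    rows = [
        f"{name}\t{ctype}\tnullable={nullable}"
        for name, ctype, nullable in columns
    ]
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()
```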

**This is a one-time defense artifact.** It establishes the baseline: "as of this date, no biometric data was collected from real candidates."

---

## 3. Employee training acknowledgment (general BIPA hygiene)

**Required:** People with access to biometric data acknowledge BIPA-handling training.

**What ships:**
- `docs/policies/BIPA_HANDLING_TRAINING_v1.md` — training material covering:
  - What constitutes biometric identifiers / information
  - The consent + retention procedures
  - Destruction obligations
  - Reporting suspected exposure
- Acknowledgment record per individual (initially: J + counsel + named operators)
- Annual refresh

**⚖ COUNSEL** — write training content. Engineering doesn't author legal-compliance training.

---

## 4. Phase 1.6 exit criteria (gates Phase 2 backfill)

All 5 gates must be DONE before identityd backfill begins. Status as
of 2026-05-03 — scaffolds vs. counsel sign-off vs. shipped code:

| # | Gate | Engineering | Counsel | Status |
|---|---|---|---|---|
| 1 | Public retention schedule | scaffolded at `docs/policies/consent/biometric_retention_schedule_v1.md` | pending | **eng-staged** |
| 2 | Consent template | scaffolded at `docs/policies/consent/biometric_consent_template_v1.md` | pending | **eng-staged** |
| 3 | Photo-upload endpoint with consent enforcement | NOT STARTED — depends on identityd photo intake design + deepface integration | n/a until eng | **blocked-on-design** |
| 4 | Name → ethnicity inference removed | DONE — `mcp-server/search.html:3372` removal note + `mcp-server/phase_1_6_gate_4.test.ts` absence test (3/3 green) | none required | **DONE** |
| 5 | Destruction runbook | scaffolded at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint + verify/report scripts marked TODO | pending | **eng-staged** |

PLUS:

| # | Item | Engineering | Counsel | Status |
|---|---|---|---|---|
| 6 | Cryptographic attestation pre-identityd | DONE — `scripts/staffing/attest_pre_identityd_biometric_state.sh` + `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (3/3 evidence checks pass; signature lines pending) | pending signature | **eng-DONE, signature-pending** |
| 7 | Employee training material | scaffold deferred — Gate 5 runbook §7 acknowledgment may serve as substrate | pending | **deferred** |

**Blocking set for Phase 2 backfill:** items **1, 2, 3, 4, 5, 6** must
all be DONE. Item 7 (employee training) is reduced from blocking to
"deferred" because the Gate 5 destruction runbook §7 already requires
operator acknowledgment before legal-tier credentials are issued —
that acknowledgment is procedurally equivalent to the training-record
requirement when the operator population is small (J + 1-2 named
operators). If the operator population grows beyond that, item 7
re-promotes to blocking and a separate training program must be authored.
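
The conditional deferral of item 7 can be expressed as a small check. This is only a sketch of the rule as stated: reading "J + 1-2 named operators" as at most 3 people is an assumption, the names are hypothetical, and counsel's answer to the question below may change the rule.

```python
from typing import Set

BLOCKING_ITEMS: Set[int] = {1, 2, 3, 4, 5, 6}
# Assumption: "J + 1-2 named operators" is read as at most 3 people.
SMALL_POPULATION_MAX = 3


def blocking_items(operator_count: int) -> Set[int]:
    """Items that must be DONE; item 7 re-promotes once the team grows."""
    if operator_count > SMALL_POPULATION_MAX:
        return BLOCKING_ITEMS | {7}
    return set(BLOCKING_ITEMS)


def backfill_unblocked(done: Set[int], operator_count: int) -> bool:
    """True when every blocking item for this population size is DONE."""
    return blocking_items(operator_count) <= done
```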

⚖ COUNSEL — confirm whether item 7 deferral is acceptable for the
expected operator population size, or restore it to the blocking set.

**Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
review of the engineering scaffolds. Gate 3 (photo-upload endpoint)
is the only remaining engineering work; it's deferred to its own
session because it crosses into identityd photo intake and deepface
integration scope that hasn't been designed yet.

---

## 5. Effort estimate

| Gate | Engineering effort | Legal effort |
|---|---|---|
| Gate 1 (retention schedule) | 0.5 day | counsel-dependent (typically 1-2 weeks for review) |
| Gate 2 (consent template) | 0.5 day | counsel-dependent (typically 2-4 weeks for review and consent UX design) |
| Gate 3 (photo-upload endpoint) | 1-2 days | review of endpoint behavior |
| Gate 4 (deprecate name-ethnicity inference) | 0.5 day | none (engineering-only fix) |
| Gate 5 (destruction runbook) | 1 day | counsel sign-off |
| §2 cryptographic attestation | 0.5 day | counsel + J signature |
| §3 employee training | 0.25 day (admin) | counsel-authored content |
| **Total engineering** | **~4-5 days** | — |
| **Total counsel** | — | **~3-6 weeks calendar** (review cycles) |

**The calendar bottleneck is counsel, not engineering.** Engineering can stage all 5 gates ready-to-ship in a week. Counsel sign-off + consent UX rollout is the longer pole.

---

## 6. Open questions for J + counsel

1. **Photo-upload UX:** is there an existing intake form / staffer console where photo upload would happen? Or is this new UI work?
2. **Consent collection mechanism:** electronic signature service (DocuSign, Adobe Sign), in-app click-acceptance, paper form? Each has different evidentiary weight in litigation.
3. **Operator list with biometric access:** who, today, would be on the named-operators list for §3 training?
4. **Counsel for sign-off:** named outside counsel — same or different from the dual-control legal-token party in identity service?
5. **Public privacy policy URL:** does one exist? If yes, where; if no, that's a separate Gate-1.5 deliverable.

---

## 7. What this PRD is NOT

- Not legal advice. The `⚖ COUNSEL` markers exist because the binding text needs lawyers, not engineers.
- Not a substitute for a DPIA / PIA. Phase 1.6 satisfies BIPA-specific gates; a Data Protection Impact Assessment is broader and may be required separately.
- Not a SOC2 Type II deliverable. SOC2 is a parallel work stream.
- Not the only gate before production. The full 9-phase audit-trail program continues; Phase 1.6 specifically unblocks Phase 2 (identity service implementation).

---

## Change log

- 2026-05-03 — Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.