The remaining production blocker is the counsel-calendar bottleneck
(review + sign-off). Engineering can't make counsel move faster,
but it CAN reduce the round-trip overhead:
(1) docs/counsel/COUNSEL_HANDOFF_EMAIL_2026-05-05.md — copy-paste
email body J can send to outside counsel. Subject line + body
+ tarball attachment instructions + headline asks (A/B/C/D
in priority order) + post-signature operator runbook. The
pre-flight checklist + post-signature workflow turn what
would have been "I'll figure out the email" into "click send."
(2) scripts/staffing/seed_consent_version.sh — turnkey
post-signature deployment. Takes the path to a (presumably
counsel-signed) consent template markdown, computes SHA-256,
atomically merges into /etc/lakehouse/consent_versions.json
(creating the file if absent, with per-seed audit metadata
in _meta.seeded_at[]), restarts lakehouse.service, probes
/biometric/health post-restart. Idempotent: re-running with
the same hash is a no-op for the versions array but still
appends a [reseed] entry to the audit metadata.
Verified live against the eng-staged template — strict mode
flipped clean, /biometric/health 200 post-restart.
(3) docs/PHASE_1_6_BIPA_GATES.md §6.5 — post-signature deployment
runbook embedded in the gates doc. Three steps: counsel signs
+ commits → seed_consent_version.sh → strict-mode probe.
Plus a "pre-counsel demo seed" subsection documenting how to
exercise strict mode BEFORE counsel signs (using the
eng-staged template hash) so the deployment workflow is
proven before the legal critical path closes.
Strict mode flipped live — verified post-restart:
- /etc/lakehouse/consent_versions.json populated with the
eng-staged template hash:
8b09591a8dc15f59197affac48909ce943d575eee01705b42303acf3b32f5c56
- POST /biometric/subject/WORKER-1/consent with deadbeef hash:
HTTP 400 + error="consent_version_unknown"
- POST with the known eng-staged hash: passes version check
(then 404 subject_not_found on a ghost candidate, proving
the gate is hash-aware not auth-broken)
The hash currently seeded is the ENG-STAGED template
(pre-counsel-signature). When counsel returns the signed text,
operator runs `seed_consent_version.sh` again with the
counsel-signed markdown — the new hash gets appended; the demo
hash stays in for backwards-compat with any consent records
collected during the pre-counsel demo period (none, today).
Production blocker is now genuinely just counsel calendar:
1. J transmits reports/counsel/counsel_packet_2026-05-05.tar.gz
per the handoff email
2. Counsel reviews + signs (their billable time)
3. Counsel returns signed text → operator runs seed script
4. Strict mode flips to canonical hash → cutover complete
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1.6 — BIPA Pre-Launch Gates
Status: Draft — 2026-05-03 · Owner: J + outside counsel · Companion to: AUDIT_TRAIL_PRD.md, AUDIT_PHASE_1_5_BIPA_AND_OUTCOMES.md, IDENTITY_SERVICE_DESIGN.md
Why this exists. IDENTITY_SERVICE_DESIGN.md v3 §5 Step 0 names Phase 1.6 as a HARD PREREQUISITE: identityd backfill cannot start until Phase 1.6 ships. This doc specifies what Phase 1.6 contains.
Scope. BIPA (740 ILCS 14) compliance gates that must be in place BEFORE the system accepts a single real candidate photo. Synthetic-data face pool can keep operating; real-photo intake CANNOT begin without these gates.
Authority. This is an engineering scaffold. Sections marked ⚖ COUNSEL need outside counsel to author the actual legally-binding text. Engineering ships the procedural gates; counsel writes the words.
1. The five BIPA pre-launch gates
Each gate is a deliverable that must ship before real-photo intake. None is optional. Order shown is the recommended ship sequence.
Gate 1 — Public retention schedule (BIPA §15(a))
Required: A publicly-available, written retention schedule for biometric identifiers and information.
What ships:
- `docs/policies/consent/biometric_retention_schedule_v1.md`: public file
- Linked from public privacy policy at the deployment URL
- Specifies:
  - Categories of biometric data collected (facial photograph for staff identification at job sites; classifications deferred per Gate 3b, see `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`)
  - Purpose of collection (identity matching for staffing operations)
  - Maximum retention: BIPA §15(a) requires destruction when the initial purpose for collection has been satisfied or within 3 years of the individual's last interaction with the private entity, whichever occurs first. Recommend 18-24 months as the operational ceiling (provides safety margin)
  - Destruction procedure: per Gate 5 below
- Versioned (this is v1; future updates supersede with a new version)
⚖ COUNSEL — write the actual schedule. Engineering provides the operational facts; counsel writes the binding language.
Engineering acceptance: the file is committed, the public URL renders it, and identityd's consent_versions table references it by hash.
Gate 2 — Informed written consent (BIPA §15(b))
Required: Informed, written consent BEFORE any biometric collection occurs.
What ships:
- `docs/policies/consent/biometric_consent_template_v1.md`: public consent template
- Versioned, hashed, referenced from identityd's `consent_versions` table
- Must disclose, per BIPA §15(b)(1)-(3):
  - That biometric identifiers/information will be collected
  - The specific purpose for collection (and the length of term, which references Gate 1)
  - Receipt of a written release authorizing collection
- Consent flow at intake:
  - Candidate sees the disclosure on a UI surface (web form / paper / digital signature)
  - Candidate provides explicit affirmative action (signature, click-acceptance with timestamp, etc.)
  - Identityd records `biometric_consent_status='given'` with `consent_version` reference + `consent_given_at` timestamp
  - Without identityd recording 'given', no biometric data flows through deepface.
⚖ COUNSEL — write the consent template. Recommended content (engineering view):
- Clear language (not just legal boilerplate)
- Specific to facial-classification (not generic biometrics)
- Includes withdrawal procedure
- Includes data-subject rights enumeration
Engineering acceptance: consent gate is enforced in code at the photo-upload endpoint; identityd refuses biometric writes when biometric_consent_status != 'given'; pre-existing synthetic-face pool is exempt (no consent needed because no real subject).
Gate 3 — Photo-upload endpoint with consent enforcement
Required: Code-level enforcement that real-photo intake checks consent before processing.
What ships:
An endpoint at POST /biometric/subject/{candidate_id}/photo, served from catalogd. The original v1 spec named this /v1/identity/subjects/{candidate_id}/photo under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot (see the IDENTITY_SERVICE_DESIGN.md deprecation header). The endpoint behaves as follows:
- Caller authenticates with service-tier token
- Endpoint queries identityd for `subjects.biometric_consent_status`
- If status ≠ `'given'` → HTTP 403 with reason `"BIPA consent required before biometric processing"`
- If status = `'given'`:
  a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`)
  b. deepface tagging runs against the photo
  c. Classifications (gender, race, age): DEFERRED to Gate 3b (`docs/specs/GATE_3B_DEEPFACE_DESIGN.md`). `BiometricCollection.classifications` remains `None` in v1.
  d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule
  e. `pii_access_log` row written with `purpose_token='biometric_collection'`
- Response: `{candidate_id, retention_until, consent_version}`
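The consent branch of the behavior above can be sketched in a few lines of Rust. This is a minimal illustration, not the shipped handler: `ConsentStatus`, `GateDecision`, and `check_consent` are hypothetical names, and the non-`Given` variants are assumed for the example.

```rust
// Sketch of the consent gate. All type and function names are hypothetical;
// the shipped catalogd handler may model this differently.

/// Consent states a subject can be in. Only `Given` unlocks processing.
/// `NeverCollected` and `Withdrawn` are assumed variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsentStatus {
    NeverCollected,
    Given,
    Withdrawn,
}

/// Outcome of the gate check, carrying the HTTP status the doc specifies.
#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    /// Consent is on record; the upload may proceed.
    Proceed,
    /// No valid consent; the caller should answer with this status and reason.
    Refuse { http_status: u16, reason: &'static str },
}

/// Any status other than `Given` is refused with 403.
pub fn check_consent(status: ConsentStatus) -> GateDecision {
    match status {
        ConsentStatus::Given => GateDecision::Proceed,
        _ => GateDecision::Refuse {
            http_status: 403,
            reason: "BIPA consent required before biometric processing",
        },
    }
}
```

Matching on `Given` and refusing everything else keeps the gate fail-closed: a new status added later is rejected until someone explicitly allows it.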
Schema (as shipped — catalogd SubjectManifest.biometric_collection):
The original spec proposed JSONB columns on a Postgres subjects table under identityd. The shipped implementation collapses this into a per-subject JSON manifest at data/_catalog/subjects/<id>.json, with the BiometricCollection struct holding data_path, template_hash, collected_at, and classifications: Option<JSON>. See crates/catalogd/src/subject_manifest.rs for the canonical type.
// crates/catalogd/src/subject_manifest.rs (paraphrased)
// Imports below are assumed for this paraphrase (chrono timestamps, serde_json values).
use chrono::{DateTime, Utc};
use serde_json::Value;

pub struct BiometricCollection {
    pub data_path: String,              // quarantined path
    pub template_hash: String,          // SHA-256 of original bytes (integrity, NOT re-derivation)
    pub collected_at: DateTime<Utc>,
    pub classifications: Option<Value>, // None until Gate 3b ships (deferred; see GATE_3B_DEEPFACE_DESIGN.md)
}
Engineering acceptance:
- Endpoint refuses uploads when consent missing (verified by integration test)
- deepface output never lands in the synthetic-face manifest (`data/headshots/manifest.jsonl`)
- Real-photo classifications are isolated to identityd `subjects` table and never flow to JSONL sinks
- The `/headshots/:key` route in mcp-server REMAINS synthetic-only: it does NOT serve real candidate photos to LLMs without an explicit allowance (proposed: real photos served only to authenticated staffer UI, never to model context)
Gate 4 — Deprecate name → ethnicity inference
Required: The hard-coded NAMES_HISPANIC / SURNAMES_* lookup tables in mcp-server/search.html:3375-3432 (per Phase 1.5 §1B walk) get removed.
What ships:
- A code commit that removes:
  - `FEMALE_NAMES`, `MALE_NAMES` constants
  - `NAMES_HISPANIC`, `NAMES_BLACK`, `NAMES_SOUTH_ASIAN`, `NAMES_EAST_ASIAN`, `NAMES_MIDDLE_EASTERN` constants
  - `SURNAMES_HISPANIC`, `SURNAMES_SOUTH_ASIAN`, `SURNAMES_EAST_ASIAN`, `SURNAMES_MIDDLE_EASTERN`, `SURNAMES_BLACK` constants
  - The `genderFor()` and `guessEthnicityFromFirstName()` functions
  - All call sites that consumed these (face-pool bucket selection)
- Replacement strategy:
  - For SYNTHETIC face pool routing: deterministic hash of candidate_id selects a face bucket, no demographic inference
  - For REAL candidate photos: the candidate's actual photo IS the representation; no inference needed
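One way to realize the synthetic-pool routing is a fixed, well-known hash of the id reduced modulo the bucket count. The sketch below uses 64-bit FNV-1a purely as an assumption (the doc does not say which hash the shipped code uses), and `fnv1a_64` and `face_bucket` are hypothetical names.

```rust
// Sketch: demographic-free bucket selection for the synthetic face pool.
// FNV-1a is an assumed choice; any stable hash of candidate_id would do.

/// 64-bit FNV-1a over raw bytes. Stable across runs and platforms,
/// unlike std's DefaultHasher, whose algorithm is not guaranteed.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Map a candidate id to one of `bucket_count` face buckets.
/// The only input is the id itself: no name, gender, or ethnicity signal.
pub fn face_bucket(candidate_id: &str, bucket_count: u64) -> u64 {
    assert!(bucket_count > 0, "bucket_count must be non-zero");
    fnv1a_64(candidate_id.as_bytes()) % bucket_count
}
```

Because the function is pure, the same candidate always lands in the same bucket, which keeps synthetic avatars stable across sessions without storing any mapping.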
Why this is BIPA + Title VII risk separately: name-based ethnicity classification is BOTH a discriminatory feature engineering practice (Title VII) AND, when combined with photo-based attribute extraction, a "biometric information derived from a biometric identifier" pattern (BIPA broad reading). Removing the lookup tables forecloses both arguments.
Engineering acceptance:
- Lookup tables removed from search.html
- Unit test asserts no protected-attribute inference functions exist in search.html or any mcp-server module
- Face-pool routing for synthetic faces uses candidate_id hash exclusively
- Phase 1.5 §1B finding closed
Gate 5 — Documented destruction procedure
Required: A written procedure for biometric data destruction at retention expiry OR consent withdrawal OR right-to-be-forgotten request.
What ships:
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`: operator-facing
- Specifies:
  - Triggers: retention expiry (per Gate 1), withdrawal, RTBF request, candidate request
  - Procedure: catalogd-local `POST /biometric/subject/{id}/erase` (legal-tier auth); formerly proposed under identityd, now served from catalogd directly
  - Erasure scope: `BiometricCollection` set to `None` on the subject manifest (drops `data_path`, `template_hash`, `classifications` together), quarantined photo files at `data/biometric/uploads/<id>/*` securely unlinked, audit row appended BEFORE photo unlink so the chain proves intent even if file delete fails
  - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
  - Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
  - Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request
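The audit-before-unlink ordering in the erasure scope can be illustrated with a short std-only sketch. The function name, the flat-file audit log, and the log line format are all assumptions for the example; the shipped endpoint writes to `pii_access_log` and may differ in every detail.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Sketch: record erasure intent durably, then remove the uploaded files.
/// Returns the number of files removed. Hypothetical helper, not the
/// shipped implementation.
pub fn erase_uploads(audit_log: &Path, upload_dir: &Path, subject_id: &str) -> io::Result<usize> {
    // 1. Append the audit row and force it to disk BEFORE touching photos,
    //    so the log proves intent even if a later delete fails.
    let mut log = OpenOptions::new().create(true).append(true).open(audit_log)?;
    writeln!(log, "purpose_token=biometric_erasure subject={subject_id}")?;
    log.sync_all()?;

    // 2. Unlink each file in the quarantined upload directory.
    //    Note: remove_file is a plain unlink, not a secure wipe.
    let mut removed = 0;
    if upload_dir.is_dir() {
        for entry in fs::read_dir(upload_dir)? {
            let path = entry?.path();
            if path.is_file() {
                fs::remove_file(&path)?;
                removed += 1;
            }
        }
    }
    Ok(removed)
}
```

If step 2 returns an error partway through, the audit row already exists, so an operator can see that erasure was authorized and retry the unlink.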
⚖ COUNSEL — review the runbook for legal sufficiency. Engineering writes the procedure; counsel attests that the procedure satisfies BIPA §15(a) destruction requirements.
Engineering acceptance:
- Runbook committed
- `POST /biometric/subject/{id}/erase` endpoint includes biometric-specific erasure path (shipped `848a458`: 21 unit tests, two scopes: biometric_only / full)
- Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep; biometric has stricter clock)
- Erasure events are logged with cryptographic attestation
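The selection step of the daily sweep amounts to a filter over retention deadlines. A minimal sketch follows, assuming deadlines are stored as Unix seconds and that `None` means no biometric data is held; `SubjectRetention` and `due_for_destruction` are hypothetical names.

```rust
/// Sketch of a per-subject retention record. Field layout is assumed.
pub struct SubjectRetention {
    pub subject_id: String,
    /// Unix seconds after which biometric data must be destroyed.
    /// `None` means the subject holds no biometric data.
    pub biometric_retention_until: Option<u64>,
}

/// Return the ids whose biometric retention deadline is strictly in the past.
pub fn due_for_destruction(subjects: &[SubjectRetention], now_unix: u64) -> Vec<&str> {
    subjects
        .iter()
        .filter(|s| matches!(s.biometric_retention_until, Some(t) if t < now_unix))
        .map(|s| s.subject_id.as_str())
        .collect()
}
```

Each returned id would then go through the same erasure path as a manual request, so sweeps and withdrawals leave identical audit evidence.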
2. Cryptographic attestation: no biometric data exists pre-identityd
Per IDENTITY_SERVICE_DESIGN v3-B11. Plaintiffs may argue that the EXISTENCE of biometric schema fields constitutes constructive notice of intent to collect biometric data — therefore consent should have preceded the schema. The defense: prove that no biometric data was actually collected from real candidates before identityd + the consent gate.
What ships:
- A one-shot script `scripts/staffing/attest_pre_identityd_biometric_state.sh` that:
  - Queries `data/datasets/workers_500k.parquet` schema and confirms NO column named `photo`, `biometric_*`, `face_*`, `image_*` exists
  - Greps `data/_kb/*.jsonl` and `data/_pathway_memory/state.json` for any base64-encoded image bytes (deepface output, photo blobs)
  - Verifies `data/headshots/manifest.jsonl` rows ≤ synthetic face pool size
  - Hashes the schema + summary; commits the hash to S3 Object Lock (per identity service v3 anchor pattern)
- Attestation document `docs/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-XX.md` signed by J + outside counsel
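The schema check in the first step reduces to a name-pattern test over column names. The sketch below shows that logic in isolation; it assumes the column names have already been read from the Parquet schema by some other means, and the function names are hypothetical. Case-insensitive matching is an added assumption, not something the doc specifies.

```rust
/// Sketch: does a column name match one of the forbidden patterns
/// (`photo`, `biometric_*`, `face_*`, `image_*`)? Matching is
/// case-insensitive here, which is an assumption of this example.
pub fn is_biometric_column(name: &str) -> bool {
    let n = name.to_ascii_lowercase();
    n == "photo"
        || n.starts_with("biometric_")
        || n.starts_with("face_")
        || n.starts_with("image_")
}

/// Return every column that violates the attestation, in schema order.
/// An empty result means this evidence check passes.
pub fn offending_columns<'a>(columns: &[&'a str]) -> Vec<&'a str> {
    columns
        .iter()
        .copied()
        .filter(|c| is_biometric_column(c))
        .collect()
}
```

Returning the offending names rather than a bare boolean lets the attestation summary list exactly what was found if the check ever fails.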
This is a one-time defense artifact. It establishes the baseline: "as of this date, no biometric data was collected from real candidates."
3. Employee training acknowledgment (general BIPA hygiene)
Required: People with access to biometric data acknowledge BIPA-handling training.
What ships:
- `docs/policies/BIPA_HANDLING_TRAINING_v1.md`: training material covering:
  - What constitutes biometric identifiers / information
  - The consent + retention procedures
  - Destruction obligations
  - Reporting suspected exposure
- Acknowledgment record per individual (initially: J + counsel + named operators)
- Annual refresh
⚖ COUNSEL — write training content. Engineering doesn't author legal-compliance training.
4. Phase 1.6 exit criteria (gates Phase 2 backfill)
All 5 gates must be DONE before identityd backfill begins. Status as of 2026-05-03 — scaffolds vs. counsel sign-off vs. shipped code:
| # | Gate | Engineering | Counsel | Status |
|---|---|---|---|---|
| 1 | Public retention schedule | scaffolded at docs/policies/consent/biometric_retention_schedule_v1.md | pending | eng-staged |
| 2 | Consent template | scaffolded at docs/policies/consent/biometric_consent_template_v1.md | pending | eng-staged |
| 3 | Photo-upload endpoint with consent enforcement | DONE: crates/catalogd/src/biometric_endpoint.rs mounted at /biometric/subject/{id}/photo, 11 unit tests, live-verified end-to-end. Gate 3b DECIDED 2026-05-05: Option C (defer classifications). BiometricCollection.classifications stays Option<JSON> = None in v1; consent + retention docs revised to match. See docs/specs/GATE_3B_DEEPFACE_DESIGN.md §6 + change log. | reviewed under Gate 2 (matching consent text) | DONE: 3a shipped, 3b deferred per design doc |
| 4 | Name → ethnicity inference removed | DONE: mcp-server/search.html:3372 removal note + mcp-server/phase_1_6_gate_4.test.ts absence test (3/3 green) | none required | DONE |
| 5 | Destruction runbook | scaffolded at docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md; erasure endpoint + verify/report scripts marked TODO | pending | eng-staged |
PLUS:
| # | Item | Engineering | Counsel | Status |
|---|---|---|---|---|
| 6 | Cryptographic attestation pre-identityd | DONE: scripts/staffing/attest_pre_identityd_biometric_state.sh + docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md (3/3 evidence checks pass; signature lines pending) | pending signature | eng-DONE, signature-pending |
| 7 | Employee training material | scaffold deferred — Gate 5 runbook §7 acknowledgment may serve as substrate | pending | deferred |
Blocking set for Phase 2 backfill: items 1, 2, 3, 4, 5, 6 must all be DONE. Item 7 (employee training) is reduced from blocking to "deferred" because the Gate 5 destruction runbook §7 already requires operator acknowledgment before legal-tier credentials are issued — that acknowledgment is procedurally equivalent to the training-record requirement when the operator population is small (J + 1-2 named operators). If the operator population grows beyond that, item 7 re-promotes to blocking and a separate training program must be authored.
⚖ COUNSEL — confirm whether item 7 deferral is acceptable for the expected operator population size, or restore it to the blocking set.
Calendar bottleneck: Items 1, 2, 5, 6 (and #7) await counsel
review of the engineering scaffolds. Gate 3 substrate is fully
shipped; Gate 3b deepface classification was DECIDED on 2026-05-05
as Option C (defer) — BiometricCollection.classifications stays
None in v1, consent + retention docs revised to match this
narrower scope. If a future product requirement surfaces a real
need for classifications, the substrate is forward-compatible
(Option<JSON>) and either Option A (~1 day) or Option B (~5 days)
of the design doc can be picked up then under a v2 consent template.
5. Effort estimate
| Gate | Engineering effort | Legal effort |
|---|---|---|
| Gate 1 (retention schedule) | 0.5 day | counsel-dependent (typically 1-2 weeks for review) |
| Gate 2 (consent template) | 0.5 day | counsel-dependent (typically 2-4 weeks for review and consent UX design) |
| Gate 3 (photo-upload endpoint) | 1-2 days | review of endpoint behavior |
| Gate 4 (deprecate name-ethnicity inference) | 0.5 day | none (engineering-only fix) |
| Gate 5 (destruction runbook) | 1 day | counsel sign-off |
| §2 cryptographic attestation | 0.5 day | counsel + J signature |
| §3 employee training | 0.25 day (admin) | counsel-authored content |
| Total engineering | ~4-5 days | — |
| Total counsel | — | ~3-6 weeks calendar (review cycles) |
The calendar bottleneck is counsel, not engineering. Engineering can stage all 5 gates ready-to-ship in a week. Counsel sign-off + consent UX rollout is the longer pole.
6. Open questions for J + counsel
- Photo-upload UX: is there an existing intake form / staffer console where photo upload would happen? Or is this new UI work?
- Consent collection mechanism: electronic signature service (DocuSign, Adobe Sign), in-app click-acceptance, paper form? Each has different evidentiary weight in litigation.
- Operator list with biometric access: who, today, would be on the named-operators list for §3 training?
- Counsel for sign-off: named outside counsel — same or different from the dual-control legal-token party in identity service?
- Public privacy policy URL: does one exist? If yes, where; if no, that's a separate Gate-1.5 deliverable.
6.5. Post-signature deployment runbook
When counsel returns the countersigned consent template + retention schedule, the engineering side of "flip from permissive to strict mode" is one command:
# 1. Counsel commits their signature to §7 of the consent template
# markdown (or J commits the signed PDF + updates §7 with counsel's
# name + date). The markdown is the BINDING TEXT — the PDF is just
# a rendering of it.
# 2. Hash the canonical signed text + seed the gateway allowlist.
./scripts/staffing/seed_consent_version.sh \
docs/policies/consent/biometric_consent_template_v1.md \
--label "v1 signed YYYY-MM-DD by [counsel name]"
# The script:
# - computes SHA-256 of the markdown (binding text)
# - atomically writes /etc/lakehouse/consent_versions.json with
# the new hash + per-seed audit metadata (timestamp, label,
# source path)
# - restarts lakehouse.service so the gateway re-reads the
# allowlist
# - probes /biometric/health for clean restart
# 3. Verify strict mode is rejecting unknown hashes:
TOKEN=$(cat /etc/lakehouse/legal_audit.token)
curl -sS -X POST http://localhost:3100/biometric/subject/WORKER-1/consent \
-H "X-Lakehouse-Legal-Token: $TOKEN" \
-H "Content-Type: application/json" \
-d '{"consent_version_hash":"deadbeefdeadbeef000000000000000000000000000000000000000000000000","consent_collection_method":"electronic_signature","operator_of_record":"strict_mode_probe"}'
# Expect: HTTP 400 + {"error":"consent_version_unknown", ...}
After this, the gateway is in counsel-tier strict mode:
- Any consent grant POST whose `consent_version_hash` doesn't match a known signed template is refused at intake
- Operator typos (mistyped hash) become loud failures, not silent bad records
- Future template revisions (v2, v3, ...) require counsel re-sign AND a new `seed_consent_version.sh` run before being accepted; the v1 hash stays in the allowlist for already-collected subjects' audit-trail compatibility
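The strict-mode check itself is an exact-match lookup against the seeded allowlist. A minimal sketch, assuming the allowlist has already been loaded into memory as a set of hex strings; `VersionCheck` and `check_consent_version` are hypothetical names, and the real gateway's response body carries more fields than shown.

```rust
use std::collections::HashSet;

/// Result of the version check, carrying the HTTP status and error code
/// that the probe above expects for unknown hashes.
#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    Known,
    Unknown { http_status: u16, error: &'static str },
}

/// Sketch: accept a consent grant only if its hash is in the allowlist.
/// Exact string match, so a single mistyped character is rejected.
pub fn check_consent_version(allowlist: &HashSet<String>, hash: &str) -> VersionCheck {
    if allowlist.contains(hash) {
        VersionCheck::Known
    } else {
        VersionCheck::Unknown {
            http_status: 400,
            error: "consent_version_unknown",
        }
    }
}
```

With the eng-staged hash seeded, a probe using the `deadbeef...` placeholder returns the `Unknown` variant, mirroring the HTTP 400 shown in step 3.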
Pre-counsel demo seed
For deployments that want to exercise strict mode BEFORE counsel signs, the same script works against the eng-staged template:
./scripts/staffing/seed_consent_version.sh \
docs/policies/consent/biometric_consent_template_v1.md \
--label "eng-staged demo seed (NOT counsel-signed)"
The hash entry should be replaced (rotate the demo hash out, add
the counsel-signed hash) when counsel completes review. The
allowlist's _meta.seeded_at[] array preserves the seed history.
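The seed history behaves like an append-only log next to a deduplicated version list: re-seeding a known hash leaves the versions unchanged but still records an audit entry. The sketch below models that in memory only. The struct layout and names are assumptions, it omits timestamps and the atomic file write, and it does not model rotating a hash out.

```rust
/// Sketch of one audit entry in the seed history. Fields are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SeedEntry {
    pub hash: String,
    pub label: String,
    /// True when the hash was already present at seed time.
    pub reseed: bool,
}

/// In-memory model of the allowlist file: accepted hashes plus history.
#[derive(Debug, Default)]
pub struct Allowlist {
    pub versions: Vec<String>,
    pub seeded_at: Vec<SeedEntry>,
}

impl Allowlist {
    /// Seed a hash. Returns true if the hash was newly added.
    /// The versions list is idempotent; the history always grows.
    pub fn seed(&mut self, hash: &str, label: &str) -> bool {
        let already_known = self.versions.iter().any(|v| v == hash);
        if !already_known {
            self.versions.push(hash.to_string());
        }
        self.seeded_at.push(SeedEntry {
            hash: hash.to_string(),
            label: label.to_string(),
            reseed: already_known,
        });
        !already_known
    }
}
```

Keeping the history separate from the accepted set means an operator can re-run the seed safely, while an auditor can still see every run and its label.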
7. What this PRD is NOT
- Not legal advice. The ⚖ COUNSEL markers exist because the binding text needs lawyers, not engineers.
- Not a substitute for a DPIA / PIA. Phase 1.6 satisfies BIPA-specific gates; a Data Protection Impact Assessment is broader and may be required separately.
- Not a SOC2 Type II deliverable. SOC2 is a parallel work stream.
- Not the only gate before production. The full 9-phase audit-trail program continues; Phase 1.6 specifically unblocks Phase 2 (identity service implementation).
Change log
- 2026-05-05: Reconciled with shipped state: endpoint paths corrected from the legacy identityd v1 spec (`/v1/identity/subjects/*`) to the catalogd-local routes that actually shipped (`/biometric/subject/*`). Schema block rewritten to reflect the JSON `SubjectManifest.biometric_collection` substrate that replaced the proposed Postgres columns. Gate 3b deepface deferral marked in-line where Disclosure 1 / Gate 3 step 5c / Gate 5 erasure scope previously assumed classifications were collected. No legal text changed; this was doc/code drift cleanup.
- 2026-05-03: Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.