root b2c34b80b3 phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak

Four threads landing together — all driven by the audit J asked for before
production cutover.

(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.

(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).

(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.

(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
A smoke run of `verify_biometric_erasure.sh` against WORKER-2 surfaced a stranded photo
file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
no biometric content); audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
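
A minimal sketch of the guard described in (4), assuming simplified stand-in types. `guard_and_record_upload`, `UploadRejected`, `erase`, and the one-field `BiometricCollection` are hypothetical names for illustration; the shipped `process_upload` in `crates/catalogd/src/biometric_endpoint.rs` may be structured differently.

```rust
// Hypothetical, simplified stand-ins: the real SubjectManifest and
// BiometricCollection in catalogd carry more fields than shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct BiometricCollection {
    pub data_path: String,
}

#[derive(Debug, Default)]
pub struct SubjectManifest {
    pub biometric_collection: Option<BiometricCollection>,
}

/// Rejection the HTTP layer would map to a status code plus an `error` string.
#[derive(Debug, PartialEq)]
pub struct UploadRejected {
    pub status: u16,
    pub error: &'static str,
}

/// Records a new upload only when the manifest holds no collection yet.
/// A second upload is refused, so the existing data_path is never
/// overwritten and the first file cannot be orphaned.
pub fn guard_and_record_upload(
    manifest: &mut SubjectManifest,
    new_path: &str,
) -> Result<(), UploadRejected> {
    if manifest.biometric_collection.is_some() {
        return Err(UploadRejected {
            status: 409,
            error: "biometric_already_collected",
        });
    }
    manifest.biometric_collection = Some(BiometricCollection {
        data_path: new_path.to_string(),
    });
    Ok(())
}

/// Stand-in for the erase endpoint: clears the collection so that a
/// legitimate re-collection can follow.
pub fn erase(manifest: &mut SubjectManifest) {
    manifest.biometric_collection = None;
}
```

Under these assumptions, upload then upload yields a 409 with the first pointer intact, while upload, erase, upload succeeds, which mirrors the two test cases named below.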

Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (it previously
used 1 subject with 2 uploads to test content-type parsing, which would now return 409).

22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.

Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.

Counsel calendar is now the only remaining blocker for first real-photo intake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 06:19:40 -05:00


Biometric Data Retention Schedule — v1

Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 1 (BIPA §15(a))

Status: Engineering scaffold; ⚖ COUNSEL must author the binding text before public publication

Version: v1 (initial; supersession requires a new version + new hash)

Updated 2026-05-05: §1 + §2 revised to match the Gate 3b deferral recommendation in docs/specs/GATE_3B_DEEPFACE_DESIGN.md (Option C: defer classifications). §5 destruction-trigger endpoint corrected to the shipped catalogd-local route. Pending J's product confirmation of Gate 3b.

This is a publicly-available retention schedule for biometric identifiers and biometric information collected by the Lakehouse staffing platform. It is required by 740 ILCS 14/15(a) (the Illinois Biometric Information Privacy Act) before any biometric collection from real candidates begins.

1. What this schedule governs

This schedule applies to:

- Biometric identifiers as defined in 740 ILCS 14/10: candidate photographs from which facial geometry is implicit.
- Biometric information as defined in 740 ILCS 14/10: any information derived from a biometric identifier. In v1 of this schedule, no derived information is collected: automated facial-classification (gender, race, age inference) is deferred per docs/specs/GATE_3B_DEEPFACE_DESIGN.md Option C. If a future version of this schedule introduces classification, that is a superseding v2 schedule with re-consent under the matching v2 consent template.

Out of scope (explicitly NOT biometric data under this schedule):

- Synthetic faces from the pre-existing face pool (data/headshots/). These are computer-generated portraits, not derived from any real individual, and are not "biometric identifiers" under 740 ILCS 14/10.
- Candidate names, email addresses, phone numbers, work history, certifications, or any other non-biometric personal information. These are governed by the general PII retention policy referenced in the SubjectManifest substrate (see docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md).

2. Categories collected

| Category | Source | Storage location |
|---|---|---|
| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/<candidate_id>/<ts>_<uuid>.<ext>`; mode 0700 dir / 0600 file |
| Photograph integrity hash | SHA-256 of the original bytes | `SubjectManifest.biometric_collection.template_hash` (catalogd JSON manifest at `data/_catalog/subjects/<id>.json`) |

We do NOT collect raw biometric template vectors that could be used to re-derive a face from the encoded form. We do NOT run automated facial-classification (gender, race, age inference) in v1 — see docs/specs/GATE_3B_DEEPFACE_DESIGN.md for the deferral rationale. The BiometricCollection.classifications field on the subject manifest exists in the schema but is None for every subject.
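
The storage modes in the table above (0700 directory, 0600 file) can be illustrated with the Rust standard library alone. This is a Unix-only sketch under stated assumptions: `quarantine_photo` and `mode_bits` are hypothetical helpers, the `<ts>_<uuid>.<ext>` file name is supplied by the caller, and the shipped upload path may differ.

```rust
use std::fs::{self, DirBuilder, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt, PermissionsExt};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `<root>/<candidate_id>/<file_name>` with an
/// owner-only directory (0700) and an owner-only file (0600).
/// Hypothetical helper; the caller builds the file name.
pub fn quarantine_photo(
    root: &Path,
    candidate_id: &str,
    file_name: &str,
    bytes: &[u8],
) -> io::Result<PathBuf> {
    let dir = root.join(candidate_id);
    DirBuilder::new().recursive(true).mode(0o700).create(&dir)?;
    // The mode passed at creation is filtered by the umask and is not
    // applied to a directory that already exists, so set it explicitly.
    fs::set_permissions(&dir, Permissions::from_mode(0o700))?;

    let path = dir.join(file_name);
    // create_new refuses to overwrite an existing file.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(&path)?;
    file.set_permissions(Permissions::from_mode(0o600))?;
    file.write_all(bytes)?;
    Ok(path)
}

/// Returns the permission bits (lowest nine bits) of a path.
pub fn mode_bits(path: &Path) -> io::Result<u32> {
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}
```

Setting the permissions explicitly after creation is a design choice in this sketch: it keeps the result independent of the process umask.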

3. Purpose of collection

Photographs are used for:

1. Identity matching during staffing operations. When a worker arrives at a job site, the assigned coordinator may verify identity by comparing the on-file photograph against the person present.
2. Internal record-keeping. Photographs become part of the worker record so coordinators can recognize repeat workers across multiple placements.

Photographs and biometric classifications are NOT used for:

- Demographic targeting in role recommendations (Title VII / IL Human Rights Act compliance).
- Training of any machine-learning model.
- Sharing with third parties, except as required by court order or with the candidate's separate written consent.
- Any purpose beyond those enumerated in §3.1-3.2 above.

4. Retention period

Per 740 ILCS 14/15(a), biometric identifiers and biometric information must be permanently destroyed when the initial purpose for collection has been satisfied OR within three (3) years of the individual's last interaction with the private entity, whichever occurs first.

Operational ceiling: Lakehouse retains biometric data for a maximum of eighteen (18) months from the candidate's last placement or last system interaction, whichever is later. This is more restrictive than the BIPA statutory ceiling and provides a safety margin against accidental over-retention.

The 18-month clock is enforced by the daily retention sweep (crates/catalogd/src/bin/retention_sweep.rs), which checks SubjectManifest.consent.biometric.retention_until on every subject and routes overdue subjects to the destruction queue (see Gate 5 runbook).
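
The 18-month rule can be sketched as date arithmetic. Everything here is an illustrative assumption rather than the code in retention_sweep.rs: the `Date` type, `add_months`, `retention_until`, and `is_overdue` are hypothetical, day-of-month clamping at month end is one possible convention, and "overdue" is taken to mean strictly after the retention date.

```rust
/// Minimal calendar date; the derived ordering compares year, then month, then day.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date {
    pub year: i32,
    pub month: u32, // 1..=12
    pub day: u32,   // 1..=31
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
        _ => 31,
    }
}

/// Adds whole months, clamping the day to the end of the target month
/// (assumed convention, e.g. Aug 31 + 18 months -> Feb 28).
pub fn add_months(d: Date, months: u32) -> Date {
    let zero_based = (d.month - 1) + months;
    let year = d.year + (zero_based / 12) as i32;
    let month = zero_based % 12 + 1;
    let day = d.day.min(days_in_month(year, month));
    Date { year, month, day }
}

/// Operational ceiling from this section, in months.
pub const RETENTION_MONTHS: u32 = 18;

/// Retention ends 18 months after the later of the last placement and
/// the last system interaction.
pub fn retention_until(last_placement: Option<Date>, last_interaction: Date) -> Date {
    let anchor = match last_placement {
        Some(p) => p.max(last_interaction),
        None => last_interaction,
    };
    add_months(anchor, RETENTION_MONTHS)
}

/// A subject is routed to destruction once `today` is past the retention date.
pub fn is_overdue(retention_until: Date, today: Date) -> bool {
    today > retention_until
}
```

For example, a last placement on 2025-01-15 and a last interaction on 2025-03-10 give a retention date of 2026-09-10 under these assumptions, well inside the three-year statutory ceiling.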

⚖ COUNSEL — confirm the 18-month operational ceiling is appropriate for the deployment posture, or specify a different number.

5. Destruction procedure

Per 740 ILCS 14/15(a), Lakehouse follows the BIPA Destruction Runbook (docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md) when:

- Retention period under §4 expires
- Candidate withdraws biometric consent under the consent template (Gate 2)
- Candidate exercises a right-to-be-forgotten request
- A catalogd-local POST /biometric/subject/{id}/erase is invoked under legal-tier authentication (shipped 848a458)

Every destruction event is recorded as an append-only audit row in the affected subject's per-subject HMAC-chained audit log (see crates/catalogd/src/subject_audit.rs), providing tamper-evident proof of compliant destruction.
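
To illustrate why a chained log is tamper-evident, here is a sketch of the chaining idea only. It deliberately uses the standard library's non-cryptographic `DefaultHasher` as a stand-in for HMAC so the example stays dependency-free; a real log would use a keyed cryptographic MAC such as HMAC-SHA-256. `AuditRow`, `toy_mac`, `append`, and `verify` are hypothetical names and do not reflect the layout in subject_audit.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// One append-only row: the event plus the tag of the previous row.
pub struct AuditRow {
    pub event: String,
    pub prev_tag: u64,
    pub tag: u64,
}

/// NOT cryptographic: a toy keyed hash standing in for HMAC so the
/// chaining logic can be shown with the standard library alone.
pub fn toy_mac(key: &[u8], prev_tag: u64, event: &str) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(key);
    h.write(&prev_tag.to_be_bytes());
    h.write(event.as_bytes());
    h.finish()
}

/// Appends a row whose tag covers the event and the previous row's tag.
pub fn append(chain: &mut Vec<AuditRow>, key: &[u8], event: &str) {
    let prev_tag = chain.last().map_or(0, |r| r.tag);
    let tag = toy_mac(key, prev_tag, event);
    chain.push(AuditRow { event: event.to_string(), prev_tag, tag });
}

/// Recomputes every tag and checks each link back to its predecessor.
/// Editing, reordering, or removing a row breaks a link or a tag.
pub fn verify(chain: &[AuditRow], key: &[u8]) -> bool {
    let mut expected_prev = 0u64;
    for row in chain {
        if row.prev_tag != expected_prev
            || row.tag != toy_mac(key, row.prev_tag, &row.event)
        {
            return false;
        }
        expected_prev = row.tag;
    }
    true
}
```

Because each tag covers the previous tag, altering any row invalidates verification from that row onward, which is the property the destruction records rely on.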

6. Versioning

This schedule is version v1. Future revisions:

- Require a new version number (v2, v3, ...).
- Are committed to the repository with a git history showing the revision date.
- Are referenced by SHA-256 hash from consent_versions table rows in identityd, so each subject's consent record points unambiguously at the schedule version that was in force when consent was given.

v1 SHA-256: generated at deployment time by scripts/staffing/hash_consent_v1.sh (to be added when this schedule is finalized by counsel)

7. Public availability

⚖ COUNSEL — specify the public URL where this schedule will be published (typically the privacy policy page on the deployment site) and the disclosure language that links candidates to it from the intake UI.

8. Authority

This schedule is adopted under the authority of J (operator of record) and reviewed by ⚖ COUNSEL. Effective date: TBD pending counsel sign-off.

| Role | Name | Signature | Date |
|---|---|---|---|
| Operator | J | _______________ | _____ |
| Outside counsel | _____________ | _______________ | _____ |
