phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak

Four threads landing together — all driven by the audit J asked for before
production cutover.

(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
    stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
    flipped from "draft / awaits product" to DECIDED. Consent template + retention
    schedule revised to remove all "automated facial-classification" / "deepface"
    language so disclosed scope matches implemented scope.

(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
    `BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
    references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
    identityd daemon, never shipped) — corrected to actual shipped routes
    `/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
    rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
    (not the proposed Postgres `subjects` table).

(3) New operational artifacts:
    - `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
      (manifest cleared, uploads dir empty, audit row matches, chain verified).
      Smoke-tested live against WORKER-2.
    - `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
      destruction-event aggregation. Smoke-tested clean.
    - `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
      packet with per-file SHA-256 manifest.
    - `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
      operationalized after the 2026-05-05 /tmp wipe incident.
    - `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
      all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
      checklist, recommended review sequence.

(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
    `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
    file. Investigation showed the file held 13 bytes of test-fixture data (zero PII,
    no biometric content); audit timeline showed two consecutive uploads followed
    by one erasure — the second upload had silently overwritten manifest.data_path,
    orphaning the first file. Patched `process_upload` to refuse a second upload
    with HTTP 409 + `error: "biometric_already_collected"` when
    `biometric_collection.is_some()` on the manifest. Operator must explicitly
    POST `/biometric/subject/{id}/erase` first.

    Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
    pointer unchanged + first file untouched on disk). Replaced
    `repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
    (covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
    `content_type_with_parameters_accepted` to use 2 distinct subjects (was
    using 1 subject with 2 uploads to test content-type parsing — would now 409).

    22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.

Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.

Counsel calendar is now the only remaining blocker for first real-photo intake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit b2c34b80b3 (parent 03e8a91d97)
root, 2026-05-05 06:19:40 -05:00
13 changed files with 1451 additions and 65 deletions

.gitignore

@@ -61,6 +61,16 @@ data/biometric/
# files stay local for forensics.
reports/scrum/
# Per-event biometric verification reports (timestamp-named, regenerated
# per `verify_biometric_erasure.sh` invocation). Source-of-truth is the
# audit chain itself; these reports are derived views.
reports/biometric/
# Counsel transmission tarballs + manifests are regenerated by
# `bundle_counsel_packet.sh` from the tracked `docs/counsel/` source.
# The bundle is transmittable, not source-of-truth.
reports/counsel/
# Local experiments scratchpad — per the "Test code in main is ACTIVELY
# being cleaned out" policy (commits 6aafd41 + f4ebd22), one-off
# experiments stay out of the tracked tree.

STATE_OF_PLAY.md

@@ -1,12 +1,58 @@
# STATE OF PLAY — Lakehouse
-**Last verified:** 2026-05-03 evening CDT
**Last verified:** 2026-05-05 morning CDT
-**Verified by:** live probe (gateway restarted 2x, all 11 catalogd subject tests + 11 biometric tests + 6 audit tests + 4 mcp-server Gate-4 tests green; cross-runtime parity 6/6 byte-identical against live audit logs; live curl roundtrip on /biometric returned 200 + chained audit row), not memory.
**Verified by:** live probe (`/audit/health` 200, `/biometric/subject/{id}/erase` 21-test substrate + `/audit/subject/{id}` legal-tier endpoint live verified against WORKER-100; new `verify_biometric_erasure.sh` + `biometric_destruction_report.sh` + `bundle_counsel_packet.sh` smoke-tested clean against live data) — not memory.
> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
---
## WHAT LANDED 2026-05-05 (doc reconciliation wave — Gate 3b decision + counsel packet ready)
This was a **doc-only wave**, not code. Background: J asked for an audit of the BIPA/biometric documentation before production cutover. Audit found moderate fragmentation between docs and shipped code (post-`identityd` collapse, post-Gate-3a-ship, pre-Gate-3b-decision). Closed it in one pass.
| Item | What changed | Status |
|---|---|---|
| **Gate 3b — DECIDED: Option C (defer classifications)** | `BiometricCollection.classifications` stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status flipped from "draft / awaits product" to "DECIDED 2026-05-05". | Locked |
| **Endpoint-path drift** | `PHASE_1_6_BIPA_GATES.md` (3 spots), `BIPA_DESTRUCTION_RUNBOOK.md` (2 spots), `biometric_retention_schedule_v1.md` (1 spot) updated from legacy `/v1/identity/subjects/*` (proposed under separate identityd daemon, never shipped) to actual `/biometric/subject/*` (catalogd-local, shipped `848a458`). Schema block in `PHASE_1_6_BIPA_GATES.md` rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate (not the proposed Postgres `subjects` table). | Reconciled |
| **Consent template + retention schedule** | Both revised for Option C: removed all "automated facial-classification" / "deepface" language so disclosed scope matches implemented scope. Pending counsel review — they were already eng-staged with ⚖ markers. | Eng-staged for counsel |
| **`scripts/staffing/verify_biometric_erasure.sh`** (NEW) | Operator-side verification of an erasure event. Curls `/audit/subject/{id}` with legal-tier token, checks: manifest.biometric_collection null, uploads dir empty, last audit row is `biometric_erasure`/`full_erasure` with `erased`/`success`, chain_verified=true. Writes a hashed report to `reports/biometric/`. (Sketched below the table.) | Smoke-tested live |
| **`scripts/staffing/biometric_destruction_report.sh`** (NEW) | Monthly destruction-event aggregation. Anonymizes candidate IDs (sha256-12 prefix), counts by scope + trigger, flags anomalies. Smoke-test on May 2026 data found 1 historical `biometric_erasure`/`consent_withdrawal` event (test fixture). | Smoke-tested live |
| **`docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`** (NEW) | Captures the rotation procedure operationalized after the 2026-05-05 `/tmp` wipe incident. Covers: when to rotate, pre-rotation snapshot, atomic-swap procedure, post-rotation verification (incl. expected pre-rotation chain tamper-detect under new key), recovery from lost key, ⚖ counsel notes. | Authored |
| **`docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md`** + `bundle_counsel_packet.sh` (NEW) | Cover note bundling all eng-staged BIPA docs for counsel review with per-doc questions, sign-off checklist, recommended review sequence. Bundler script tarballs the 8 referenced files + emits a SHA-256 manifest. Tarball ready for transmission: `reports/counsel/counsel_packet_2026-05-05.tar.gz`. | Bundled, ready to send |
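For orientation only — a minimal sketch of the verify script's four checks, NOT the shipped script. The response field names (`manifest.biometric_collection`, `rows[].accessor.kind`, `chain_verified`) follow the descriptions above; the exact outcome-field shape and the legal-token header name are assumptions here.
```bash
# Hedged sketch of verify_biometric_erasure.sh's four post-erasure checks.
set -euo pipefail
ID="$1"
RESP=$(curl -sf "http://localhost:3100/audit/subject/${ID}" \
  -H "X-Lakehouse-Legal-Token: $(cat /etc/lakehouse/legal_audit.token)")
# 1. Manifest cleared — biometric_collection must be null.
[ "$(jq -r '.manifest.biometric_collection' <<<"$RESP")" = "null" ]
# 2. Uploads dir empty — no stranded photo files.
[ -z "$(ls -A "data/biometric/uploads/${ID}" 2>/dev/null)" ]
# 3. Last audit row is an erasure event (field names assumed).
jq -e '.rows[-1].accessor.kind | test("^(biometric_erasure|full_erasure)$")' <<<"$RESP" >/dev/null
# 4. Chain verifies end-to-end under the current key.
[ "$(jq -r '.chain_verified' <<<"$RESP")" = "true" ]
echo "erasure verified for ${ID}"
```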
### Eng follow-up that this wave surfaced
- **Double-upload file leak — DIAGNOSED + FIXED** (2026-05-05 same wave). `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo file. Investigation showed:
- The file was 13 bytes of test fixture (`ff d8 ff d9 + ASCII "TESTBYTES"`), byte-identical to the unit-test fixture at `biometric_endpoint.rs:841`. NO PII, NO biometric content, NO synthetic-face content. Came from manual integration testing on 2026-05-03.
- Audit log timeline showed two consecutive uploads (09:54, 10:04) followed by one erasure (10:22). The erasure unlinked only the SECOND file (which the manifest pointed at by then); the first file was orphaned because the second upload had silently overwritten `manifest.data_path`.
- **Real bug found**: the upload handler did NOT refuse a second upload to a subject with `biometric_collection.is_some()`. Patched `process_upload` to return HTTP 409 + `error: "biometric_already_collected"` when a re-upload is attempted; operator must explicitly POST `/biometric/subject/{id}/erase` first.
- Stranded test file removed (`rm` of the 13-byte fixture).
- New unit test `second_upload_without_erase_returns_409` asserts the 409 + that the first photo's data_path remains unchanged + that the first file remains untouched on disk.
- Existing `repeated_uploads_grow_the_chain` replaced with `upload_erase_upload_grows_the_chain_cleanly` (covers the legitimate re-collection cycle: upload → erase → upload, chain grows to 3 rows).
- Existing `content_type_with_parameters_accepted` test updated to use two distinct subjects (it had used one subject for two uploads to test content-type parsing — now would 409).
- **22 biometric_endpoint tests + 59 catalogd lib tests all green** post-patch (was 21+58 pre-patch).
- Production posture: gateway binary needs rebuild (`cargo build --release`) + `systemctl restart lakehouse.service` to pick up the new 409 behavior in live traffic (smoke-check sketch below).
- **Pre-rotation chain tamper-detect (expected, not a bug).** WORKER-{1..5} had pre-2026-05-05 audit chains under the prior `LH_SUBJECT_AUDIT_KEY`. Under the new key (post-`/tmp` wipe rotation), those chains correctly tamper-detect. The rotation runbook §4.4 documents this as expected; a §2.2 pre-rotation snapshot is what would prove they were intact pre-rotation if defensibility ever needs it.
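A hedged post-restart smoke check (the service-token path and fixture file are illustrative; the 409 shape is from the patch above):
```bash
cargo build --release -p gateway
sudo systemctl restart lakehouse.service
# Second POST to an already-collected subject should now return 409
# with error "biometric_already_collected" (subject ID illustrative).
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST "http://localhost:3100/biometric/subject/WORKER-2/photo" \
  -H "Authorization: Bearer $(cat /etc/lakehouse/service.token)" \
  -H "Content-Type: image/jpeg" \
  --data-binary @fixture.jpg   # expect: 409
```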
### What's blocking production cutover NOW (after this wave)
- **Counsel calendar:** the four sign-off items in `COUNSEL_REVIEW_PACKET_2026-05-05.md` (retention schedule, consent template, destruction runbook, pre-identityd attestation). The packet tarball is ready; ⚖ counsel is the bottleneck.
- **Nothing else.** Engineering is no longer the long pole.
### Phase 1.6 BIPA gates — status table (this is the final post-Option-C state)
| # | Gate | Status |
|---|---|---|
| 1 | Public retention schedule | **eng-staged**, revised for Option C, ready for counsel |
| 2 | Informed consent template | **eng-staged**, revised for Option C, ready for counsel |
| 3a | Photo upload endpoint | **DONE** (shipped `f1fa6e4`, 11 unit tests, live verified) |
| 3b | Deepface classification | **DECIDED 2026-05-05: Option C (defer)** |
| 4 | Name → ethnicity inference removal | **DONE** (shipped, 4/4 mcp-server tests pass) |
| 5 | Destruction runbook + erasure endpoint | **eng-DONE** (`848a458`, 21 tests). Runbook scripts (verify + report) shipped 2026-05-05. Counsel review pending. |
| §2 | Pre-identityd attestation | **eng-DONE** (3/3 evidence checks). Awaits J + counsel signature. |
| §3 | Employee training | **deferred** (consolidated into runbook §7 acknowledgment for current operator population) |
---
## WHAT LANDED 2026-05-03 (16 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates)
The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. Wave was structured as eight ship-then-scrum cycles — every wave caught real bugs, every fix wave landed within the same session.

crates/catalogd/src/biometric_endpoint.rs

@@ -307,6 +307,21 @@ pub async fn process_upload(
consent_status: None,
}));
}
// Refuse double-upload. If a BiometricCollection already exists on
// the manifest, the operator must explicitly erase before re-uploading.
// Without this gate, a second POST silently overwrites manifest.data_path
// and orphans the previous photo file on disk — creating a forever-leak
// pattern and a BIPA defensibility hole ("we said we erased the photo,
// but the previous version of it is still under the same subject dir").
// Caught 2026-05-05 by verify_biometric_erasure.sh against WORKER-2.
if manifest.biometric_collection.is_some() {
return Err((StatusCode::CONFLICT, ErrorResponse {
error: "biometric_already_collected",
detail: "subject already has a BiometricCollection on the manifest; \
POST /biometric/subject/{id}/erase first if you intend to replace the photo".into(),
consent_status: None,
}));
}
let template_hash = {
let mut h = Sha256::new();
@@ -947,15 +962,92 @@ mod tests {
}

#[tokio::test]
-async fn repeated_uploads_grow_the_chain() {
-let state = fixture_state("repeated").await;
-let _ = state.registry.put_subject(fixture_manifest("WORKER-5", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
-let writer = state.writer.clone();
-for _ in 0..2 {
-let _ = process_upload(&state, "WORKER-5", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
-.await.unwrap();
-}
-assert_eq!(writer.verify_chain("WORKER-5").await.unwrap(), 2);
-}
async fn second_upload_without_erase_returns_409() {
// BIPA defensibility: a second upload to a subject that already
// has a BiometricCollection must fail-closed. Without this gate,
// the second upload silently overwrites manifest.data_path and
// orphans the first photo on disk forever (caught 2026-05-05 on
// WORKER-2 by verify_biometric_erasure.sh).
let state = fixture_state("second_upload_409").await;
let _ = state.registry.put_subject(fixture_manifest("WORKER-DUP", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
let storage_root = state.storage_root.clone();
let registry = state.registry.clone();
// First upload succeeds.
let resp1 = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes())
.await.unwrap();
let first_path = storage_root.join(&resp1.data_path);
assert!(first_path.exists(), "first upload should produce a file");
// Second upload refused with 409.
let err = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes())
.await.unwrap_err();
assert_eq!(err.0, StatusCode::CONFLICT);
assert_eq!(err.1.error, "biometric_already_collected");
// Manifest still points at the first upload — pointer was NOT overwritten.
let m = registry.get_subject("WORKER-DUP").await.unwrap();
let bc = m.biometric_collection.as_ref().expect("collection should still be set");
assert_eq!(bc.data_path, resp1.data_path,
"manifest data_path must be unchanged after refused second upload");
// First file remains on disk untouched (refusal must not unlink it).
assert!(first_path.exists(), "first upload's file must remain after refused second upload");
let still_on_disk = tokio::fs::read(&first_path).await.unwrap();
assert_eq!(still_on_disk, jpeg_bytes(),
"first upload's bytes must not have been overwritten");
}

#[tokio::test]
async fn upload_erase_upload_grows_the_chain_cleanly() {
// Prior version of this test allowed repeated uploads to chain;
// that conflated chain growth with allowed re-upload. Under the
// double-upload guard (409 above), the only legitimate way to
// re-collect is upload → erase → upload. Chain grows to 3 rows
// (collection, erasure, collection); on-disk file count returns
// to one after the second upload.
let state = fixture_state("upload_erase_upload").await;
let _ = state.registry.put_subject(fixture_manifest("WORKER-CYCLE", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
let writer = state.writer.clone();
let storage_root = state.storage_root.clone();
// First upload.
let resp1 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
.await.unwrap();
let first_path = storage_root.join(&resp1.data_path);
assert!(first_path.exists());
// Erase. Uses process_erase test helper (the production path
// parses the EraseRequest from request body; tests inject it
// directly). Note: the erase flow flips biometric.status to
// Withdrawn, so the post-erase second upload must reset consent
// first (production flow would require new consent collection).
let _ = process_erase(&state, "WORKER-CYCLE", Some(TEST_TOKEN), "trace-cycle", fixture_erase_request("biometric_only"))
.await.unwrap();
assert!(!first_path.exists(), "first photo file must be unlinked by erase");
// Reset consent + status on the post-erase manifest so the second
// upload can proceed (production flow would require new consent
// collection here; for this test we directly flip the manifest).
let mut post_erase = state.registry.get_subject("WORKER-CYCLE").await.unwrap();
post_erase.consent.biometric.status = BiometricConsentStatus::Given;
post_erase.status = SubjectStatus::Active;
post_erase.biometric_collection = None;
let _ = state.registry.put_subject(post_erase).await;
// Second upload (legitimate, after erase).
let resp2 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
.await.unwrap();
let second_path = storage_root.join(&resp2.data_path);
assert!(second_path.exists(), "second upload should produce a file");
assert_ne!(resp1.data_path, resp2.data_path, "second upload should land at a new path");
// Chain has 3 rows: collection, erasure, collection.
assert_eq!(writer.verify_chain("WORKER-CYCLE").await.unwrap(), 3);
let rows = writer.read_rows_in_range("WORKER-CYCLE", None, None).await.unwrap();
assert_eq!(rows[0].accessor.kind, "biometric_collection");
assert_eq!(rows[1].accessor.kind, "biometric_erasure");
assert_eq!(rows[2].accessor.kind, "biometric_collection");
}

#[tokio::test]
@@ -985,18 +1077,23 @@ mod tests {
// Caught 2026-05-03 opus scrum WARN; regression test ensures
// the bare media type is matched after stripping parameters.
let state = fixture_state("ct_with_params").await;
-let _ = state.registry.put_subject(fixture_manifest("WORKER-CT", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
// Two distinct subjects so each upload exercises the "first upload"
// path. Prior version used one subject and two uploads — now blocked
// by the double-upload guard (409). The test's actual intent is
// content-type parsing, not re-upload tolerance.
let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-A", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-B", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
let resp = process_upload(
-&state, "WORKER-CT", Some(TEST_TOKEN),
&state, "WORKER-CT-A", Some(TEST_TOKEN),
Some("image/jpeg; charset=binary"), "", "", &jpeg_bytes(),
).await.unwrap();
-assert_eq!(resp.candidate_id, "WORKER-CT");
assert_eq!(resp.candidate_id, "WORKER-CT-A");
// Also case-insensitive matching: "Image/JPEG" should work too.
let resp2 = process_upload(
-&state, "WORKER-CT", Some(TEST_TOKEN),
&state, "WORKER-CT-B", Some(TEST_TOKEN),
Some("Image/JPEG"), "", "", &jpeg_bytes(),
).await.unwrap();
-assert_eq!(resp2.candidate_id, "WORKER-CT");
assert_eq!(resp2.candidate_id, "WORKER-CT-B");
}

// ─── Erasure tests (Gate 5) ──────────────────────────────────────

docs/PHASE_1_6_BIPA_GATES.md

@@ -22,7 +22,7 @@ Each gate is a deliverable that must ship before real-photo intake. None is optional.
- `docs/policies/consent/biometric_retention_schedule_v1.md` — public file
- Linked from public privacy policy at the deployment URL
- Specifies:
-  - Categories of biometric data collected (facial geometry derived from candidate photos, age estimate, gender classification, race classification — per Phase 1.5 deepface walk)
  - Categories of biometric data collected (facial photograph for staff identification at job sites; classifications deferred per Gate 3b — see `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`)
  - Purpose of collection (identity matching for staffing operations)
  - Maximum retention: BIPA §15(a) caps at "3 years from the individual's last interaction with the private entity, whichever occurs first" — recommend 18-24 months as the operational ceiling (provides safety margin)
  - Destruction procedure: per Gate 5 below
@@ -67,7 +67,7 @@
**What ships:**
-A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) with the following behavior:
An endpoint at `POST /biometric/subject/{candidate_id}/photo` (catalogd-local — the original v1 spec named this `/v1/identity/subjects/{candidate_id}/photo` under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot. See `IDENTITY_SERVICE_DESIGN.md` deprecation header.) with the following behavior:
1. Caller authenticates with service-tier token
2. Endpoint queries identityd for `subjects.biometric_consent_status`
@@ -75,18 +75,23 @@
4. If status = `'given'`:
   a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`)
   b. deepface tagging runs against the photo
-   c. Classifications (gender, race, age) stored to `subjects` table fields (NEW columns — see schema additions below)
   c. Classifications (gender, race, age) **DEFERRED to Gate 3b** (`docs/specs/GATE_3B_DEEPFACE_DESIGN.md`). `BiometricCollection.classifications` remains `None` in v1.
   d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule
   e. `pii_access_log` row written with `purpose_token='biometric_collection'`
5. Response: `{candidate_id, retention_until, consent_version}`
-**Schema additions to identityd `subjects`:**
**Schema (as shipped — catalogd `SubjectManifest.biometric_collection`):**
-```sql
-ALTER TABLE subjects ADD COLUMN biometric_classifications JSONB; -- {gender, race, age} from deepface
-ALTER TABLE subjects ADD COLUMN biometric_data_path TEXT; -- quarantined path
-ALTER TABLE subjects ADD COLUMN biometric_collected_at TIMESTAMPTZ;
-ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT; -- hash of the photo bytes (for integrity, NOT for re-derivation)
-```
The original spec proposed JSONB columns on a Postgres `subjects` table under identityd. The shipped implementation collapses this into a per-subject JSON manifest at `data/_catalog/subjects/<id>.json`, with the `BiometricCollection` struct holding `data_path`, `template_hash`, `collected_at`, and `classifications: Option<JSON>`. See `crates/catalogd/src/subject_manifest.rs` for the canonical type.

```rust
// crates/catalogd/src/subject_manifest.rs (paraphrased)
pub struct BiometricCollection {
    pub data_path: String,              // quarantined path
    pub template_hash: String,          // SHA-256 of original bytes (integrity, NOT re-derivation)
    pub collected_at: DateTime<Utc>,
    pub classifications: Option<Value>, // None until Gate 3b ships (deferred — see GATE_3B_DEEPFACE_DESIGN.md)
}
```
**Engineering acceptance:**
@@ -130,8 +135,8 @@
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` — operator-facing
- Specifies:
  - Triggers: retention expiry (per Gate 1), withdrawal, RTBF request, candidate request
-  - Procedure: identityd `POST /v1/identity/subjects/{id}/erase` (legal-tier auth)
  - Procedure: catalogd-local `POST /biometric/subject/{id}/erase` (legal-tier auth) — formerly proposed under identityd; now serves from catalogd directly
-  - Erasure scope: `subjects.biometric_*` columns ciphertext-deleted, `biometric_data_path` files securely overwritten + unlinked, deepface classifications nulled
  - Erasure scope: `BiometricCollection` set to `None` on the subject manifest (drops `data_path`, `template_hash`, `classifications` together), quarantined photo files at `data/biometric/uploads/<id>/*` securely unlinked, audit row appended BEFORE photo unlink so the chain proves intent even if file delete fails
  - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
  - Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
  - Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request
@@ -140,7 +145,7 @@
**Engineering acceptance:**
- Runbook committed
-- `POST /v1/identity/subjects/{id}/erase` endpoint includes biometric-specific erasure path
- `POST /biometric/subject/{id}/erase` endpoint includes biometric-specific erasure path (shipped `848a458` — 21 unit tests, two scopes: biometric_only / full)
- Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep — biometric has stricter clock)
- Erasure events are logged with cryptographic attestation
@@ -188,7 +193,7 @@ of 2026-05-03 — scaffolds vs. counsel sign-off vs. shipped code:
|---|---|---|---|---|
| 1 | Public retention schedule | scaffolded at `docs/policies/consent/biometric_retention_schedule_v1.md` | pending | **eng-staged** |
| 2 | Consent template | scaffolded at `docs/policies/consent/biometric_consent_template_v1.md` | pending | **eng-staged** |
-| 3 | Photo-upload endpoint with consent enforcement | DONE for the consent-gate substrate (`crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 10 unit tests, live-verified end-to-end). Deepface classification deferred to **Gate 3b** (own session — needs Python subprocess design after sidecar drop). | n/a until 3b | **3a DONE, 3b deferred** |
| 3 | Photo-upload endpoint with consent enforcement | DONE — `crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 11 unit tests, live-verified end-to-end. **Gate 3b DECIDED 2026-05-05: Option C (defer classifications).** `BiometricCollection.classifications` stays `Option<JSON> = None` in v1; consent + retention docs revised to match. See `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` §6 + change log. | reviewed under Gate 2 (matching consent text) | **DONE — 3a shipped, 3b deferred per design doc** |
| 4 | Name → ethnicity inference removed | DONE — `mcp-server/search.html:3372` removal note + `mcp-server/phase_1_6_gate_4.test.ts` absence test (3/3 green) | none required | **DONE** |
| 5 | Destruction runbook | scaffolded at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint + verify/report scripts marked TODO | pending | **eng-staged** |
@@ -212,10 +217,14 @@
expected operator population size, or restore it to the blocking set.
**Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
-review of the engineering scaffolds. Gate 3 (photo-upload endpoint)
-is the only remaining engineering work; it's deferred to its own
-session because it crosses into identityd photo intake and deepface
-integration scope that hasn't been designed yet.
review of the engineering scaffolds. Gate 3 substrate is fully
shipped; Gate 3b deepface classification was DECIDED on 2026-05-05
as Option C (defer) — `BiometricCollection.classifications` stays
`None` in v1, consent + retention docs revised to match this
narrower scope. If a future product requirement surfaces a real
need for classifications, the substrate is forward-compatible
(`Option<JSON>`) and either Option A (~1 day) or Option B (~5 days)
of the design doc can be picked up then under a v2 consent template.
---
@@ -258,4 +267,5 @@ integration scope that hasn't been designed yet.
## Change log
- 2026-05-05 — Reconciled with shipped state: endpoint paths corrected from the legacy identityd v1 spec (`/v1/identity/subjects/*`) to the catalogd-local routes that actually shipped (`/biometric/subject/*`). Schema block rewritten to reflect the JSON `SubjectManifest.biometric_collection` substrate that replaced the proposed Postgres columns. Gate 3b deepface deferral marked in-line where Disclosure 1 / Gate 3 step 5c / Gate 5 erasure scope previously assumed classifications were collected. No legal text changed; this was doc/code drift cleanup.
- 2026-05-03 — Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.

docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md

@@ -0,0 +1,260 @@
# Counsel Review Packet — Phase 1.6 BIPA Pre-Launch
**Date assembled:** 2026-05-05
**For:** outside counsel
**From:** J, operator of record
**Scope:** documents that engineering has staged for legal sufficiency review
before the staffing platform begins collecting any real candidate
biometric data (BIPA §15(a)(b)).
> **What this packet is.** The Phase 1.6 BIPA gates outline what
> engineering must ship before real-photo intake. As of 2026-05-05,
> all engineering substrate is shipped and verified live (see §1
> below for the inventory). What remains is binding-text authoring
> + counsel sign-off on five documents, plus operational notification
> obligations counsel may want to layer on top.
>
> **What this packet is NOT.** Not a request for counsel to write
> binding text from scratch. The documents are eng-staged in
> reasonable plain language; the request is for counsel to render
> them into legally-sufficient text and attest where signatures
> are required.
---
## 1. Engineering substrate — shipped + verified
For factual context on what counsel is reviewing AGAINST. None of
this requires sign-off here; it's the system the documents bind to.
| Component | Where it lives | Verification |
|---|---|---|
| Subject manifest registry | `crates/catalogd/src/registry.rs`, `data/_catalog/subjects/<id>.json` | 17 unit tests + 100 backfilled WORKER manifests in production |
| Per-subject HMAC audit chain (SHA-256) | `crates/catalogd/src/subject_audit.rs`, `data/_catalog/subjects/<id>.audit.jsonl` | Tamper-detection + concurrent-append race tests pass |
| Photo upload (consent-gated) | `POST /biometric/subject/{id}/photo` | 11 unit tests + live roundtrip 200 |
| Erasure (two-scope) | `POST /biometric/subject/{id}/erase` (`biometric_only` / `full`) | 21 unit tests; transactional rollback on audit failure |
| Legal-tier audit read | `GET /audit/subject/{id}` (X-Lakehouse-Legal-Token header) | Constant-time auth, chain re-verification per request |
| Retention sweep (BIPA-aware clock) | `crates/catalogd/src/bin/retention_sweep` | 8 unit tests; live verified against 100 backfilled subjects |
| Cross-runtime parity (Rust ↔ Go) | `scripts/cutover/parity/subject_audit_parity.sh` | 6/6 byte-identical assertions pass |
**Key insight for counsel:** the audit chain is the BIPA-defensible
substrate. Every state-changing event (consent given, photo uploaded,
photo erased, legal-tier read) appends to a per-subject HMAC-chained
JSONL log. The chain verifies end-to-end on every legal-tier read.
A tampered chain is detectable; a forged chain requires the HMAC
signing key, which is held under root-only mode 0400 at
`/etc/lakehouse/subject_audit.key` and rotated per the runbook in
attachment §6 below.
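To make the chaining concrete, a verification sketch — illustrative only. The real HMAC input shape is defined in `crates/catalogd/src/subject_audit.rs`; the `sig` field name, the `del(.sig)` payload convention, and a text-encoded key are all assumptions here.
```bash
# Recompute each row's HMAC over (previous signature + row payload).
# Field names and key encoding are assumptions; subject_audit.rs is canonical.
KEY="$(cat /etc/lakehouse/subject_audit.key)"
PREV=""
while IFS= read -r row; do
  payload=$(jq -cS 'del(.sig)' <<<"$row")
  want=$(jq -r '.sig' <<<"$row")
  got=$(printf '%s%s' "$PREV" "$payload" \
        | openssl dgst -sha256 -hmac "$KEY" -r | cut -d' ' -f1)
  [ "$got" = "$want" ] || { echo "tamper-detected"; exit 1; }
  PREV="$want"
done < data/_catalog/subjects/WORKER-2.audit.jsonl
echo "chain verified"
```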
**Gate 3b (deepface classification) — decided 2026-05-05: Option C
(defer).** The system collects only the photograph, not derived
demographic information. The consent template + retention schedule
in this packet were revised the same day to match.
---
## 2. Documents requiring counsel review + sign-off
In recommended review order:
| # | Document | Path | Counsel ask | Sign-off |
|---|---|---|---|---|
| A | Biometric Retention Schedule v1 | `docs/policies/consent/biometric_retention_schedule_v1.md` | Render into binding language; confirm 18-month operational ceiling vs. BIPA 3-year statutory cap | Counsel + J |
| B | Biometric Consent Template v1 | `docs/policies/consent/biometric_consent_template_v1.md` | Render Disclosures 1-3 into binding consent language; specify electronic vs. paper signature mechanism | Counsel + J |
| C | BIPA Destruction Runbook | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` | Confirm 30-day SLA from trigger; confirm two-operator (operator + witness) requirement; confirm legal-hold check procedure | Counsel attestation |
| D | BIPA Pre-IdentityD Attestation | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` | Sign as countersigning party; J signs as operator-of-record | Counsel + J |
| E | Legal-Tier Audit Key Rotation | `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` | Confirm rotation cadence; opine on candidate-notification obligation when rotation is compromise-driven | Counsel notes |
| F | Gate 3b Deepface Design (FYI) | `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` | Decision-of-record showing classifications were *deliberately deferred*, not omitted by oversight. No sign-off needed; provided for audit-trail completeness. | None |
The five documents requiring sign-off are A, B, C, D, E. Document F
is included so the audit trail shows the Gate 3b decision was
deliberate.
---
## 3. Specific questions for counsel — by document
### Document A — Retention Schedule
1. The schedule sets an **18-month** operational ceiling against the
BIPA 3-year statutory cap. Is the safety margin appropriate, or
should we move to a tighter window (12 months) given the
plaintiff-friendly Illinois posture?
2. The schedule references the **catalogd-local** storage substrate
rather than a separate identityd Postgres table. Does the
public-facing language need to mention the storage architecture
at all, or is "we keep the photo and a SHA-256 hash" sufficient?
3. Public publication URL — counsel to specify (placeholder marked
in §7 of the schedule).
4. Confirm whether existing consent under v1 carries forward when
a future v2 is published, or whether re-consent is required.
### Document B — Consent Template
1. Disclosure 1 says "we do NOT run automated facial-classification
in v1." Does that disclosure need to mention the *possibility* of
future classification, or is silence-with-supersession-clause
adequate?
2. Plain-language summary in §1 — counsel to confirm it's appropriate
to include alongside the binding disclosure, or recommend an
alternative comprehension aid.
3. Withdrawal SLA is set to **30 days** in §2. Counsel to confirm
against jurisdiction (Illinois primary; secondary deployments
would inherit).
4. Contact for withdrawal — counsel to specify the channel
(placeholder in §3).
5. Sign-off mechanism: electronic signature service, in-app
click-acceptance with timestamp, paper form? Each has different
evidentiary weight.
### Document C — Destruction Runbook
1. Confirm 30-day SLA from each of four triggers (retention expiry,
consent withdrawal, RTBF, court order). Some interpretations
prefer 7 or 14 days for withdrawal/RTBF.
2. Two-operator requirement (operator-of-record + witness): is the
witness role acceptable for counsel's defensibility view, or
should we elevate to dual-control with cryptographic split-key?
3. Legal-hold check procedure (§2 step 3) — counsel to specify the
actual procedure for confirming no hold is in force before
erasing.
4. Backup-window disclosure (§4) — confirm 30-day backup retention
is acceptable.
5. Candidate notification template (§3 step 4) — counsel to supply.
### Document D — Pre-IdentityD Attestation
1. Both signature lines blank — J signs as operator-of-record;
counsel signs as the countersigning legal party.
2. The attestation hash anchors the evidence; once signed, the
hash itself becomes a tamper-evident witness. Counsel to confirm
storage location for the signed copy (firm files?).
### Document E — Key Rotation Runbook
1. Recommended rotation cadence — 90 days suggested in §1.
Counsel to confirm or override.
2. Custody schedule for `/etc/lakehouse/_archived/` raw key files —
§7.2 question; suggested 1-year retention but counsel-driven.
3. Candidate-notification obligation when rotation is
compromise-driven (§7.3) — counsel call.
---
## 4. Engineering changes counsel should know about (recent)
These reconciled doc/code drift after a rapid wave on 2026-05-03:
- **Endpoint paths:** the original v1 spec proposed
`/v1/identity/subjects/*` under a separate identityd daemon. That
daemon was collapsed into catalogd; endpoints actually shipped at
`/biometric/subject/*` (catalogd-local). Documents in this packet
reference the catalogd-local routes; legacy references in
`IDENTITY_SERVICE_DESIGN.md` are flagged "do NOT implement
as-written" in that doc's deprecation header.
- **No identityd Postgres database:** the original spec proposed
encrypted-at-rest Postgres + HashiCorp Vault + S3 Object Lock for
PII storage. The shipped substrate is local JSON manifests +
per-subject HMAC-chained JSONL, sized for J's local-only
deployment per `PRD.md` line 70 ("Everything runs locally — no
cloud APIs").
- **Gate 3b deferral (Option C, 2026-05-05):** classifications
(gender / race / age inference) were deliberately deferred. The
consent template and retention schedule in this packet do NOT
disclose collection of derived demographic data, because we are
not collecting it. If a future product requirement reverses this,
we will publish a v2 consent + v2 retention with re-consent.
- **Key rotation 2026-05-05:** the prior `LH_SUBJECT_AUDIT_KEY` was
lost when a `/tmp` wipe on reboot disabled the audit and biometric
endpoints. The new key is at `/etc/lakehouse/subject_audit.key`
(mode 0400). Pre-rotation audit chains tamper-detect under the
new key — this is correct, expected behavior, not a bug.
---
## 5. Open eng items NOT awaiting counsel
For transparency. These are engineering work items, not legal items:
1. **Residual photo unlink on erasure.** During verification of the
one historical erasure event (`WORKER-2`), the verify script
surfaced a stranded photo file that was not unlinked when
`BiometricCollection` was cleared from the manifest. Engineering
investigates; if the bug is real, the fix is `crates/catalogd/src/biometric_endpoint.rs`
in the erasure handler. This does NOT affect the current packet —
no real candidate photos have been collected yet (per §1
attestation), so the residual is from a synthetic test event.
2. **Phase 1.6 §3 employee training.** Currently deferred per
acknowledgement coverage in §7 of the destruction runbook
(single-operator population). Re-promotes to blocking if the
operator population grows; counsel may want to opine on the
threshold.
---
## 6. Sign-off sequence
Recommended order so a hold-up on one doc doesn't block others:
1. **First wave (parallel):** A (retention schedule) + B (consent
template). These two have the tightest interdependence (consent
v1 references retention v1 by hash); review them together.
2. **Second wave:** C (destruction runbook). Depends on A's retention
period being fixed.
3. **Third wave:** D (pre-identityd attestation). Sign once A + B + C
are settled; the attestation snapshot is the boundary between
pre-Phase-1.6 and post-Phase-1.6 system state.
4. **Fourth wave:** E (key rotation). Independent of A-D; can be
reviewed in parallel any time.
---
## 7. After sign-off — engineering steps
Once each document is signed:
| Document | Engineering action | Trigger |
|---|---|---|
| A retention schedule | Hash + commit; reference in `consent_versions` table | Counsel signature |
| B consent template | Hash + commit; reference in candidate-facing intake UI | Counsel signature |
| C destruction runbook | Adopt; operator acknowledgment recorded in §7 | Counsel attestation |
| D pre-identityd attestation | Anchor hash to filesystem + git; counsel keeps original signature | Both signatures |
| E key rotation | Adopt; rotation event log seeded with counsel-approved cadence | Counsel notes |
The HARD blocker for first real-candidate photo collection is
A + B + D signed. C and E are operationally important but do not
block the *first* photo (they govern destruction + key handling
which apply to any state, not the boundary state).
---
## 8. Cover-note hash
This packet is itself a snapshot. Future-Claude / future-J will refer
back to this packet to know what counsel saw on 2026-05-05.
**Packet attached files (referenced by path):**
- `docs/policies/consent/biometric_retention_schedule_v1.md`
- `docs/policies/consent/biometric_consent_template_v1.md`
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
- `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md`
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`
- `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`
- `docs/PHASE_1_6_BIPA_GATES.md` (the spec they all reference)
Per-file SHA-256 hashes are produced by the bundler script (next
section); the bundler also creates a tarball ready for transmission.
---
## 9. Generating the bundle for transmission
```bash
./scripts/staffing/bundle_counsel_packet.sh
```
Produces `reports/counsel/counsel_packet_<DATE>.tar.gz` with all
referenced documents + a manifest listing per-file SHA-256 hashes.
Counsel can verify file integrity on receipt by re-running
sha256sum against each file in the tarball.
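Receipt-side check, sketched under the assumption that the manifest inside the tarball is a `sha256sum`-format file (exact manifest and extraction-dir names are whatever the bundler emits):
```bash
tar -xzf counsel_packet_2026-05-05.tar.gz
cd counsel_packet_2026-05-05        # extraction dir name assumed
sha256sum -c *.sha256               # every referenced file should print OK
```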

docs/policies/consent/biometric_consent_template_v1.md

@@ -3,6 +3,7 @@
**Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 2 (BIPA §15(b)(1)-(3))
**Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before deployment
**Version:** v1 (initial; supersession requires a new version + new hash)
**Updated 2026-05-05:** Disclosure 1 + plain-language summary revised to match the Gate 3b deferral recommendation in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications). Pending J's product confirmation of Gate 3b; if Gate 3b chooses Option A or B, this template needs counsel re-authoring.
> This is the consent template a candidate signs (electronically or
> on paper) BEFORE Lakehouse collects, stores, or processes any
@@ -25,11 +26,12 @@ content; counsel provides the legally-sufficient wording.
### Disclosure 1 — Notice of collection (§15(b)(1))
-Lakehouse will collect, store, and use my **biometric identifier**
-(facial geometry derived from a photograph of me) and **biometric
-information** (gender, race, and age classifications derived from
-that photograph by an automated facial-classification model called
-deepface).
Lakehouse will collect and store my **biometric identifier** (a
photograph of me from which facial geometry is implicit). The
photograph itself is the data we keep — we do NOT run automated
facial-classification (gender / race / age inference) against it
in v1. If at a later date we add automated classification, we will
re-collect consent under a superseding template before doing so.
### Disclosure 2 — Specific purpose and length of term (§15(b)(2))
@@ -66,9 +68,9 @@ is appropriate to include or whether a different plain-language
section is preferred.
> **What you're agreeing to:** if you upload a photo of yourself,
-> we'll keep that photo and a few descriptive labels about the photo
-> (estimated age, perceived gender, perceived race) to help your
-> staffing coordinator recognize you when you arrive at job sites.
> we'll keep that photo so your staffing coordinator can recognize
> you when you arrive at job sites. We don't run automated guesses
> about your age, gender, or race against the photo.
>
> **How long we keep it:** at most 18 months after your last
> placement or interaction with us, then it's permanently destroyed.

docs/policies/consent/biometric_retention_schedule_v1.md

@@ -3,6 +3,7 @@
**Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 1 (BIPA §15(a))
**Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before public publication
**Version:** v1 (initial; supersession requires a new version + new hash)
**Updated 2026-05-05:** §1 + §2 revised to match the Gate 3b deferral recommendation in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications). §5 destruction-trigger endpoint corrected to the shipped catalogd-local route. Pending J's product confirmation of Gate 3b.
> This is a publicly-available retention schedule for biometric identifiers
> and biometric information collected by the Lakehouse staffing platform.
@@ -15,12 +16,15 @@
This schedule applies to:
-- **Biometric identifiers** as defined in 740 ILCS 14/10: facial geometry
-derived from candidate photographs.
- **Biometric identifiers** as defined in 740 ILCS 14/10: candidate
photographs from which facial geometry is implicit.
- **Biometric information** as defined in 740 ILCS 14/10: any information
-derived from a biometric identifier, including but not limited to
-the gender, race, and age classifications produced by the deepface
-model when applied to a candidate photograph.
derived from a biometric identifier. **In v1 of this schedule, no
derived information is collected** — automated facial-classification
(gender, race, age inference) is deferred per
`docs/specs/GATE_3B_DEEPFACE_DESIGN.md` Option C. If a future version
of this schedule introduces classification, that is a superseding
v2 schedule with re-consent under the matching v2 consent template.
**Out of scope** (explicitly NOT biometric data under this schedule):
@@ -39,14 +43,15 @@ This schedule applies to:
| Category | Source | Storage location |
|---|---|---|
-| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/<candidate_id>/<ts>.<ext>`; encrypted at rest |
-| Facial geometry classifications | deepface inference run against the photograph | `subjects.biometric_classifications` (JSONB on the identityd `subjects` row) |
-| Photograph integrity hash | SHA-256 of the original bytes | `subjects.biometric_template_hash` |
| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/<candidate_id>/<ts>_<uuid>.<ext>`; mode 0700 dir / 0600 file |
| Photograph integrity hash | SHA-256 of the original bytes | `SubjectManifest.biometric_collection.template_hash` (catalogd JSON manifest at `data/_catalog/subjects/<id>.json`) |
We do NOT collect raw biometric template vectors that could be used
-to re-derive a face from the encoded form. The deepface output is
-stored as discrete classification labels (e.g. `{"age_estimate": 32,
-"gender": "...", "race": "..."}`), not as a re-identifiable embedding.
to re-derive a face from the encoded form. We do NOT run automated
facial-classification (gender, race, age inference) in v1 — see
`docs/specs/GATE_3B_DEEPFACE_DESIGN.md` for the deferral rationale.
The `BiometricCollection.classifications` field on the subject
manifest exists in the schema but is `None` for every subject.
---
@@ -104,8 +109,8 @@ Runbook** (`docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`) when:
- Retention period under §4 expires
- Candidate withdraws biometric consent under the consent template (Gate 2)
- Candidate exercises a right-to-be-forgotten request
-- An identityd `POST /v1/identity/subjects/{id}/erase` is invoked under
-legal-tier authentication
- A catalogd-local `POST /biometric/subject/{id}/erase` is invoked
under legal-tier authentication (shipped `848a458`)
Every destruction event is recorded as an append-only audit row in
the affected subject's per-subject HMAC-chained audit log (see

docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md

@@ -66,10 +66,11 @@ Before initiating destruction, the operator MUST:
Invoke the legal-tier erasure endpoint:
```bash
curl -sf -X POST "http://localhost:3100/biometric/subject/${CANDIDATE_ID}/erase" \
-H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
-H "Content-Type: application/json" \
-d '{
"scope": "biometric_only|full",
"trigger": "retention_expiry|consent_withdrawal|rtbf|court_order", "trigger": "retention_expiry|consent_withdrawal|rtbf|court_order",
"trigger_evidence_path": "<path to signed artifact>", "trigger_evidence_path": "<path to signed artifact>",
"operator_of_record": "<operator name>", "operator_of_record": "<operator name>",
@@ -77,17 +78,25 @@ curl -sf -X POST "http://localhost:3100/v1/identity/subjects/${CANDIDATE_ID}/erase" \
}'
```
-⚖ ENGINEERING — `POST /v1/identity/subjects/{id}/erase` is Phase 1.6
-Gate 3 dependent. Until it ships, the manual procedure is:
-a. Set `SubjectManifest.consent.biometric.status = "withdrawn"` and
-`SubjectManifest.status = "erased"` via direct registry write
-(operator-of-record only).
-b. Securely overwrite + unlink the quarantined photo path:
-`shred -uvz data/biometric/uploads/${CANDIDATE_ID}/*.jpg`
-(or equivalent for the configured backend).
-c. NULL the deepface classification fields on the subject row.
-d. Append the destruction-event audit row (Step 2 below).
The endpoint is **shipped** (commit `848a458`, 21 unit tests). It is
served from catalogd-local at `/biometric/subject/{id}/erase` (the
original v1 spec proposed `/v1/identity/subjects/{id}/erase` under a
separate identityd daemon — that daemon was collapsed into catalogd
per the architecture pivot).

The endpoint exposes two scopes:

- **`scope: "biometric_only"`** — clears `BiometricCollection` from
the SubjectManifest (drops `data_path`, `template_hash`, and
`classifications` together) + securely unlinks the quarantined
photo file. Subject manifest itself remains. Use for retention
expiry / consent withdrawal where only biometric data must go.
- **`scope: "full"`** — full subject erasure (manifest + biometric
files). Use for court-ordered erasure or full RTBF requests.

In both scopes, the audit row is appended BEFORE photo unlink so
the chain has legal proof of intent even if the file delete fails
(transactional rollback on audit failure).
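For example, a biometric-only erasure on consent withdrawal (values illustrative; any remaining body fields follow the template above):

```bash
curl -sf -X POST "http://localhost:3100/biometric/subject/WORKER-2/erase" \
  -H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "biometric_only",
    "trigger": "consent_withdrawal",
    "trigger_evidence_path": "docs/evidence/WORKER-2_withdrawal.pdf",
    "operator_of_record": "J"
  }'
```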
### Step 2 — Append the destruction-event audit row
@@ -224,5 +233,12 @@ training program.
## 8. Change log
- 2026-05-05 — Endpoint path reconciled with shipped state:
`/v1/identity/subjects/{id}/erase` (legacy proposal under a
separate identityd daemon) → `/biometric/subject/{id}/erase`
(catalogd-local, shipped `848a458`). Step 1 manual-fallback
block removed (the endpoint is no longer "TODO"). Two-scope
body shape (`biometric_only` / `full`) documented to match
the implementation.
- 2026-05-03 — Initial scaffold. ⚖ COUNSEL review required before adoption.


@@ -0,0 +1,308 @@
# Legal-Tier Audit Key & Token Rotation Runbook
**Spec companion:** `docs/PHASE_1_6_BIPA_GATES.md` §2 + `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
**Audience:** Operators with root on the gateway host (J + named operators)
**Status:** Engineering-authored — ⚖ counsel review encouraged before formal adoption
> This runbook covers rotation of the two crypto-credentials that gate
> the Phase 1.6 audit substrate:
>
> 1. **`LH_SUBJECT_AUDIT_KEY`** — the 32-byte HMAC-SHA256 signing key
> that chains every per-subject audit row. If this key changes, all
> pre-rotation chain rows tamper-detect under the new key. That is
> correct, expected, BIPA-defensible behavior — the chain integrity
> it provided pre-rotation remains intact in the archive of the old
> key, and post-rotation chains remain intact going forward.
>
> 2. **`LH_LEGAL_AUDIT_TOKEN`** — the 32+-character bearer token that
> authorizes calls to `/audit/subject/{id}` and
> `/biometric/subject/{id}/erase`. Rotation does NOT touch any audit
> history; only access to the legal-tier endpoints flips.
>
> Both live at `/etc/lakehouse/` (mode 0400, owned by root) and are
> loaded by the gateway via systemd `Environment=` directives in
> `/etc/systemd/system/lakehouse.service.d/audit_env.conf`. They are
> NOT loaded from `/tmp` — a 2026-05-05 reboot incident wiped a
> `/tmp`-resident key and caused `/audit` + `/biometric` to fail-closed
> (which is what they should do); the rotation fix moved them to the
> persistent path.
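A read-only way to confirm that wiring on a live host; this sketch only inspects the unit and does not assume whether the variables carry file paths or raw bytes:

```bash
# Show the drop-in plus the environment systemd actually passes to the
# gateway. If the values embed raw key material this prints it, so
# treat the output as sensitive.
sudo cat /etc/systemd/system/lakehouse.service.d/audit_env.conf
sudo systemctl show lakehouse.service -p Environment
```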
---
## 1. When to rotate
Rotate **immediately** when any of the following is true:
| Trigger | Urgency | Notes |
|---|---|---|
| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. |
| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. |
| Key/token file's filesystem permissions were ever weakened (mode > 0400, group readable, etc.) | Within 24 hours | Filesystem audit may have leaked the bytes. |
| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. |
| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. |
Do **not** rotate when:
- A subject's audit chain tamper-detects in isolation. That is normal
if the audit log was edited (which would itself be the BIPA finding,
not the key). Investigate the chain, not the key (a triage snippet
follows this list).
- Cross-runtime parity drift appears. That's an HMAC-input-shape bug
(Go vs Rust serialization), not a key issue. See
`STATE_OF_PLAY.md` "three runtime-divergence classes" entry.
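Both non-rotation cases can be triaged read-only before anyone touches the key. A minimal sketch, using the legal-token header from §2.2 (`<candidate_id>` is a placeholder):

```bash
# A key problem breaks EVERY chain at once; an edited log breaks only
# the affected subject. Check the suspect chain in isolation first.
TOKEN=$(cat /etc/lakehouse/legal_audit.token)
curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
  "http://localhost:3100/audit/subject/<candidate_id>" \
  | jq '.audit_log | {chain_verified, chain_verification_error}'
```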
---
## 2. Pre-rotation checks (5 minutes)
Before generating new credentials, capture a clean baseline so you can
prove the rotation cause and sequence afterward.
### 2.1. Take the engineering snapshot
```bash
# Confirm the canonical files exist with correct permissions.
ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
# Hash the existing key + token (NEVER the bytes themselves) so the
# old credential is identifiable in retrospect without storing it.
sha256sum /etc/lakehouse/subject_audit.key
sha256sum /etc/lakehouse/legal_audit.token
# Confirm the gateway is currently using these files.
sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT"
# Verify the audit endpoint is healthy with the current credentials.
curl -sf http://localhost:3100/audit/health
```
If `/audit/health` is already 503, the rotation is **recovery**, not
preventive — note this in the rotation event record (§5).
### 2.2. Capture a known-good chain root
Pick one or two subjects with non-empty audit logs and record their
chain roots **under the current key**:
```bash
TOKEN=$(cat /etc/lakehouse/legal_audit.token)
for cid in WORKER-2 WORKER-100; do
curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
"http://localhost:3100/audit/subject/$cid" \
| jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}'
done
```
Save the output. Post-rotation, those chains will tamper-detect under
the new key — that is **expected** and the saved snapshot is the proof
that the chain WAS intact under the old key, before rotation.
---
## 3. Generation + rotation
### 3.1. Generate the new key
```bash
# 33 random bytes base64-encoded = exactly 44 chars with no '=' padding.
# (64-char hex of 32 bytes would also work for HMAC-SHA256; we keep the
# existing 44-char convention.)
sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \
/etc/lakehouse/subject_audit.key.new
sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \
/etc/lakehouse/legal_audit.token.new
# Sanity: confirm 44-char content + correct mode.
sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
sudo ls -la /etc/lakehouse/*.new
```
Both must be `mode 0400`, owned by root, exactly **44 chars** (the
audit endpoint refuses tokens shorter than 32 chars at load — see
`crates/catalogd/src/audit_endpoint.rs:73`).
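A pre-flight guard an operator might add before the swap (a suggestion, not part of the shipped procedure), enforcing the same floor locally:

```bash
# Refuse to proceed if either new credential is under the 32-char
# minimum the endpoint enforces at load.
for f in /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new; do
  LEN=$(sudo wc -c "$f" | awk '{print $1}')
  [ "$LEN" -ge 32 ] || { echo "FAIL: $f is $LEN bytes (< 32)" >&2; exit 1; }
done
```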
### 3.2. Atomic swap
The gateway reads these files **once at boot** (per
`crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new` and
the equivalent for the writer). Atomic mv → restart is required.
```bash
# Move the old credentials to a quarantine path with timestamp so the
# old hashes remain identifiable post-rotation.
TS=$(date -u +%Y%m%dT%H%M%SZ)
sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived
sudo mv /etc/lakehouse/subject_audit.key /etc/lakehouse/_archived/subject_audit.key.$TS
sudo mv /etc/lakehouse/legal_audit.token /etc/lakehouse/_archived/legal_audit.token.$TS
sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key
sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token
sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
```
### 3.3. Restart the gateway
```bash
sudo systemctl restart lakehouse.service
sleep 2
sudo systemctl status lakehouse.service --no-pager | head -10
```
Wait for the gateway to bind port 3100 cleanly. If it doesn't, check
`journalctl -u lakehouse.service -n 50 --no-pager` for the failure
mode — the most common cause is the new file having wrong mode/owner.
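One more probe worth running before moving to §4 (a suggestion, not a mandated step):

```bash
# Confirm the gateway actually bound port 3100 after the restart.
if ss -ltn | grep -q ':3100 '; then
  echo "gateway listening on :3100"
else
  echo "NOT bound; check journalctl -u lakehouse.service" >&2
fi
```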
---
## 4. Post-rotation verification (5 minutes)
### 4.1. Health probes
```bash
# Audit endpoint must be 200, not 503.
curl -sf http://localhost:3100/audit/health
# Expect: "audit endpoint ready"
# /v1/health must list the gateway's full provider set.
curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count'
```
### 4.2. Confirm the new token works
```bash
NEW_TOKEN=$(cat /etc/lakehouse/legal_audit.token)
curl -sS -o /dev/null -w '%{http_code}\n' \
-H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-100
# Expect: 200
```
If 401, the file the gateway loaded does NOT match the file you wrote.
Check ownership, mode, and trailing-whitespace differences with
`hexdump -C /etc/lakehouse/legal_audit.token | head`.
### 4.3. Confirm the new chain works
Append-only chains are key-tied. Any *new* audit row written
post-rotation is signed under the new key and verifies cleanly:
```bash
# Issue a /v1/validate call against any worker — it spawns an audit row.
curl -sf -X POST http://localhost:3100/v1/validate \
-H 'Content-Type: application/json' \
-d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null
# Read the chain back. Last row must verify under the new key.
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-100 \
| jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}'
```
`chain_verified: true` confirms the new key is signing + verifying.
### 4.4. Confirm pre-rotation chains tamper-detect (expected)
```bash
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-2 \
| jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
```
For any subject whose chain was written under the old key, this
returns `chain_verified: false` with an HMAC-mismatch error. **This
is correct behavior**, not a bug. The old chain was correctly signed
under the old key and verified under it; the new key cannot retroactively
verify rows it didn't sign. The pre-rotation snapshot you captured in
§2.2 is the defensible proof that those rows WERE valid pre-rotation.
If, instead, you see a chain that *should* verify post-rotation
returning `verified: false`, that's the rotation having gone wrong —
likely an old-key file that didn't get archived cleanly. Restore the
timestamped files (`subject_audit.key.<ts>`, `legal_audit.token.<ts>`)
from `/etc/lakehouse/_archived/`, then re-attempt.
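A restore sketch for that failure mode, using the §3.2 archive naming (`<ts>` is the timestamp captured during the swap):

```bash
# Roll back to the archived credentials, then restart and re-verify.
TS="<ts>"
sudo mv "/etc/lakehouse/_archived/subject_audit.key.$TS" /etc/lakehouse/subject_audit.key
sudo mv "/etc/lakehouse/_archived/legal_audit.token.$TS" /etc/lakehouse/legal_audit.token
sudo systemctl restart lakehouse.service
```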
---
## 5. Record the rotation event
Append a row to the rotation log:
```bash
sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
{"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","operator":"<your name>","reason":"<scheduled|compromise|cred_loss|recovery>","old_key_sha256":"<hash from §2.1>","new_key_sha256":"$(sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<hash from §2.1>","new_token_sha256":"$(sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<witness name or N/A for routine>"}
EOF
sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
```
This file is the operator-side record of when the key changed and why.
It does NOT contain the key itself — only hashes — so it is safe to
back up and share with counsel on request.
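A spot-check that ties the newest log row to the live files (both reads need root, given the file modes):

```bash
# The newest rotation row's hashes must match the live credentials.
sudo tail -n 1 /etc/lakehouse/_archived/rotation_log.jsonl \
  | jq -r '"key   " + .new_key_sha256, "token " + .new_token_sha256'
sudo sha256sum /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
```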
---
## 6. Recovery from a lost key
If the active `subject_audit.key` is destroyed (filesystem corruption,
accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
will fail-closed at startup:
- `/audit/subject/{id}` → 503 ("audit endpoint disabled (legal token missing)" or equivalent for the signing key)
- `/biometric/subject/{id}/photo` → 503 (same fail-closed posture)
This is correct behavior — a server that cannot HMAC-sign new audit
rows must not accept new biometric writes.
**Recovery is rotation.** Generate a new key per §3.1, atomic-swap
per §3.2, restart per §3.3, verify per §4. Pre-loss chains tamper-detect
under the new key (the old key is gone — there is no way to verify
them). Treat the loss event as the BIPA-defensible boundary: pre-loss
chain verification was provided by the working key; post-loss new
chains are signed under the new key.
If a counsel-grade attestation of the pre-loss chains is needed,
`/etc/lakehouse/_archived/rotation_log.jsonl` holds the historical
key hashes. Combined with the cross-runtime parity probe (the Go
reader gives the same byte-identical view as Rust), the pre-loss
chain history is preservable as long as the on-disk JSONL files
were not also lost.
---
## 7. ⚖ counsel notes
These are areas where counsel may want to opine before this runbook
is formally adopted:
1. **Rotation cadence.** BIPA itself does not require periodic rotation;
counsel may set a 90-day schedule to satisfy a separate compliance
posture (SOC2, internal policy).
2. **Custody of `/etc/lakehouse/_archived/`.** The rotation-log hashes
do NOT expose the keys, but the archived raw key files DO. Counsel
may want a more aggressive destruction schedule for the raw archived
keys — say 1 year — to reduce a long-tail compromise surface (a
destruction sketch follows this list).
3. **Notification obligations on rotation due to compromise.** §1
triggers a rotation; §1 does not address whether candidates whose
biometric data was protected by the compromised key must be notified.
This is a counsel call.
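If counsel adopts the more aggressive schedule in item 2, a destruction sketch (the 365-day cadence and the exact command are assumptions pending sign-off):

```bash
# Shred raw archived keys older than a year; the rotation log's hashes
# remain, so the rotation history stays attestable.
sudo find /etc/lakehouse/_archived -name 'subject_audit.key.*' -mtime +365 \
  -exec shred -uvz {} \;
sudo find /etc/lakehouse/_archived -name 'legal_audit.token.*' -mtime +365 \
  -exec shred -uvz {} \;
```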
---
## 8. Operator acknowledgment
| Operator | Date acknowledged | Signature |
|---|---|---|
| J | _____ | _______________ |
| _____ | _____ | _______________ |
---
## 9. Change log
- 2026-05-05 — Initial runbook authored after the /tmp wipe incident
on the same day (key was at `/tmp/subject_audit.key` and was deleted
on reboot, disabling `/audit` + `/biometric` until the key was
regenerated at `/etc/lakehouse/subject_audit.key`). Recovery of
that incident produced a working procedure; this runbook captures
it as the canonical playbook for any future rotation.


@@ -1,6 +1,8 @@
# Gate 3b — Deepface Classification Integration (Design)
**Status:** **DECIDED 2026-05-05 — Option C (defer classifications)** · Original design draft 2026-05-03 morning · **Companion to:** [`PHASE_1_6_BIPA_GATES.md`](../PHASE_1_6_BIPA_GATES.md) Gate 3 · **Depends on:** Gate 3a (photo upload) which is shipped (`f1fa6e4`)
> **Decision summary (2026-05-05):** J accepted Option C. `BiometricCollection.classifications` remains `Option<JSON> = None` in v1. The consent template and retention schedule were revised the same day to remove all "automated facial-classification" language so the disclosed scope matches the implemented scope. If a real product requirement for classifications surfaces later, this doc's Option A (Python subprocess) or Option B (ONNX-in-Rust) is picked up under a v2 consent template + v2 retention schedule.
> **What this is.** Three options for how `BiometricCollection.classifications` (currently `Option<JSON>`, always `None`) gets populated by an automated facial-attribute classifier. Phase 1.6 Gate 3a ships the consent-gated upload + audit chain + transactional rollback; Gate 3b adds the classification step. The substrate is ready — what's missing is the design choice for HOW classification happens.
>
@@ -152,6 +154,8 @@ Reasoning:
⚖ J — pick A / B / C. The substrate accommodates any choice; the cost is the design-doc → counsel-coordination → engineering loop, which differs by an order of magnitude across the options.
**[2026-05-05] J's decision: Option C.** Reasoning recorded in change log below. Consent + retention doc revisions for Option C shipped same day; counsel review of revised text is the remaining work.
---

## Open questions for J


@@ -0,0 +1,263 @@
#!/usr/bin/env bash
# biometric_destruction_report — monthly destruction event aggregation.
#
# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5.
# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
#
# Why this exists: counsel and operations review need a periodic
# attestation that destructions have happened in a defensible cadence.
# This script produces an anonymized monthly report aggregating
# per-subject audit logs.
#
# Output is anonymized — counts, timings, scope/trigger breakdowns,
# and chain attestations. Candidate IDs are hashed (sha256-prefix) so
# the report can be shared with counsel without exposing identifiers.
#
# Usage:
# biometric_destruction_report.sh \
# [--month YYYY-MM] \
# [--audit-dir data/_catalog/subjects] \
# [--output reports/biometric/destruction_<month>.md]
#
# Defaults:
# --month — current UTC month (YYYY-MM)
# --audit-dir — data/_catalog/subjects
# --output — reports/biometric/destruction_<month>.md
#
# Exit codes:
# 0 — report written successfully (whether or not events were found)
# 1 — report written but with anomalies that need review
# 2 — script error (missing tools, unreadable audit dir)
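#
# Example (illustrative invocation):
#   ./scripts/staffing/biometric_destruction_report.sh --month 2026-05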
set -uo pipefail
cd "$(dirname "$0")/../.."
MONTH=""
AUDIT_DIR="data/_catalog/subjects"
OUT=""
while [ "$#" -gt 0 ]; do
case "$1" in
--month) MONTH="$2"; shift 2 ;;
--audit-dir) AUDIT_DIR="$2"; shift 2 ;;
--output) OUT="$2"; shift 2 ;;
-h|--help)
sed -n '2,30p' "$0" | sed 's/^# \?//'
exit 0 ;;
*) echo "unknown flag: $1" >&2; exit 2 ;;
esac
done
# Default month = current UTC YYYY-MM. Validate format defensively
# so a malformed --month value (e.g. "May 2026") doesn't silently
# match nothing in the JSONL filter.
if [ -z "$MONTH" ]; then
MONTH=$(date -u +%Y-%m)
fi
if ! echo "$MONTH" | grep -qE '^[0-9]{4}-(0[1-9]|1[0-2])$'; then
echo "[report] FAIL: --month must be YYYY-MM, got '$MONTH'" >&2
exit 2
fi
if [ -z "$OUT" ]; then
OUT="reports/biometric/destruction_${MONTH}.md"
fi
# Dependency gates.
for cmd in jq sha256sum; do
if ! command -v "$cmd" >/dev/null 2>&1; then
echo "[report] FAIL: required tool '$cmd' not found in PATH" >&2
exit 2
fi
done
if [ ! -d "$AUDIT_DIR" ]; then
echo "[report] FAIL: audit dir not found at $AUDIT_DIR" >&2
exit 2
fi
mkdir -p "$(dirname "$OUT")"
# Aggregator storage.
EVENTS=$(mktemp)
ANOMALIES=$(mktemp)
trap 'rm -f "$EVENTS" "$ANOMALIES"' EXIT
# Iterate every per-subject audit log under AUDIT_DIR. Each file is
# JSONL — one row per line. We extract erasure rows in the requested
# month + emit a normalized one-line record per event.
TOTAL_FILES=0
TOTAL_ROWS_SCANNED=0
SHARDS_WITH_EVENTS=0
for f in "$AUDIT_DIR"/*.audit.jsonl; do
[ -e "$f" ] || continue
TOTAL_FILES=$((TOTAL_FILES + 1))
# File-level row count (cheap).
ROWS=$(wc -l < "$f" 2>/dev/null || echo 0)
TOTAL_ROWS_SCANNED=$((TOTAL_ROWS_SCANNED + ROWS))
# Filter rows for the month + erasure kinds.
HAD_EVENT=0
while IFS= read -r line; do
[ -n "$line" ] || continue
KIND=$(printf '%s' "$line" | jq -r '.accessor.kind // ""' 2>/dev/null || echo "")
case "$KIND" in
biometric_erasure|full_erasure) ;;
*) continue ;;
esac
TS=$(printf '%s' "$line" | jq -r '.ts // ""' 2>/dev/null || echo "")
case "$TS" in
"${MONTH}-"*) ;; # only this month
*) continue ;;
esac
HAD_EVENT=1
CID=$(printf '%s' "$line" | jq -r '.candidate_id // ""' 2>/dev/null || echo "")
PURPOSE=$(printf '%s' "$line" | jq -r '.accessor.purpose // ""' 2>/dev/null || echo "")
RESULT=$(printf '%s' "$line" | jq -r '.result // ""' 2>/dev/null || echo "")
# accessor.purpose has shape "trigger=<name>;..." per biometric_endpoint
TRIGGER=$(printf '%s' "$PURPOSE" | sed -nE 's/.*trigger=([a-z_]+).*/\1/p')
[ -n "$TRIGGER" ] || TRIGGER="unknown"
# Hash candidate_id so the report stays anonymized.
CID_HASH=$(printf '%s' "$CID" | sha256sum | awk '{print substr($1,1,12)}')
# Anomaly: erasure row but result not in {erased, success}.
case "$RESULT" in
erased|success) ;;
*)
echo " - candidate_hash=$CID_HASH ts=$TS kind=$KIND result=$RESULT trigger=$TRIGGER (unexpected result)" >> "$ANOMALIES"
;;
esac
# Tab-separated event line: ts, kind, trigger, result, cid_hash
printf '%s\t%s\t%s\t%s\t%s\n' "$TS" "$KIND" "$TRIGGER" "$RESULT" "$CID_HASH" >> "$EVENTS"
done < "$f"
if [ "$HAD_EVENT" = "1" ]; then
SHARDS_WITH_EVENTS=$((SHARDS_WITH_EVENTS + 1))
fi
done
EVENT_COUNT=$(wc -l < "$EVENTS" 2>/dev/null || echo 0)
EVENT_COUNT=$(printf '%s' "$EVENT_COUNT" | tr -d '[:space:]')
: "${EVENT_COUNT:=0}"
# Compute breakdowns.
COUNT_BIOMETRIC_ONLY=0
COUNT_FULL=0
if [ "$EVENT_COUNT" != "0" ]; then
COUNT_BIOMETRIC_ONLY=$(awk -F '\t' '$2=="biometric_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]')
COUNT_FULL=$(awk -F '\t' '$2=="full_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]')
fi
ANOMALY_COUNT=$(wc -l < "$ANOMALIES" 2>/dev/null || echo 0)
ANOMALY_COUNT=$(printf '%s' "$ANOMALY_COUNT" | tr -d '[:space:]')
: "${ANOMALY_COUNT:=0}"
# Render the report.
GENERATED_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)
{
echo "# Biometric Destruction Report — $MONTH"
echo
echo "**Generated:** $GENERATED_AT"
echo "**Audit dir scanned:** \`$AUDIT_DIR\`"
echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5"
echo "**Generator:** scripts/staffing/biometric_destruction_report.sh"
echo
echo "## Scope"
echo
echo "- **Subject audit shards scanned:** $TOTAL_FILES"
echo "- **Audit rows scanned (all kinds):** $TOTAL_ROWS_SCANNED"
echo "- **Shards containing $MONTH erasure events:** $SHARDS_WITH_EVENTS"
echo
echo "## Destruction events in $MONTH"
echo
echo "- **Total events:** $EVENT_COUNT"
echo "- **By scope:**"
echo " - \`biometric_erasure\` (BiometricCollection cleared, manifest retained): $COUNT_BIOMETRIC_ONLY"
echo " - \`full_erasure\` (manifest + biometric data cleared): $COUNT_FULL"
echo
if [ "$EVENT_COUNT" = "0" ]; then
echo "**No destruction events recorded for $MONTH.** This is correct"
echo "for a month with no retention expiries / withdrawal requests"
echo "/ RTBF requests / court orders."
echo
else
echo "### By trigger"
echo
echo "| Trigger | Count |"
echo "|---|---|"
awk -F '\t' '{print $3}' "$EVENTS" | sort | uniq -c | \
sort -rn | awk '{ printf("| %s | %d |\n", $2, $1); }'
echo
echo "### Event detail (anonymized)"
echo
echo "Candidate IDs are hashed (sha256-12-prefix) so this report can"
echo "be shared with outside counsel without exposing identifiers."
echo
echo "| ts | kind | trigger | result | candidate_hash |"
echo "|---|---|---|---|---|"
sort -k1,1 "$EVENTS" | awk -F '\t' '{
printf("| %s | %s | %s | %s | %s |\n", $1, $2, $3, $4, $5);
}'
echo
fi
if [ "$ANOMALY_COUNT" != "0" ]; then
echo "## Anomalies ($ANOMALY_COUNT)"
echo
echo "Events whose audit row deviates from expected shape (kind/result"
echo "mismatch, missing trigger, etc.). These do NOT necessarily mean"
echo "the destruction failed — the BIPA-load-bearing surface is the"
echo "audit chain, which still verifies cryptographically. They are"
echo "logged here so an operator can investigate and confirm."
echo
echo '```'
cat "$ANOMALIES"
echo '```'
echo
fi
echo "## Cryptographic attestation"
echo
echo "This report was produced by aggregating per-subject HMAC-chained"
echo "audit logs. The chain itself is the BIPA-defensible substrate;"
echo "this report is a derived view, not the chain of record. To verify"
echo "any individual event, run:"
echo
echo '```bash'
echo "./scripts/staffing/verify_biometric_erasure.sh <candidate_id>"
echo '```'
echo "(operator must un-hash the candidate ID through their own"
echo " operator log to perform spot-checks)."
echo
echo "**Cross-runtime parity:** the same audit logs are byte-identical"
echo "under Rust + Go (per scripts/cutover/parity/subject_audit_parity.sh)."
echo "If counsel needs cross-runtime attestation, that probe provides it."
echo
EVIDENCE_HASH=$(sha256sum "$EVENTS" 2>/dev/null | awk '{print $1}')
: "${EVIDENCE_HASH:=$(echo -n '' | sha256sum | awk '{print $1}')}"
echo "**Events SHA-256:** \`$EVIDENCE_HASH\`"
echo
echo "---"
echo
echo "**Operator (J):** _______________________________ Date: __________"
echo
} > "$OUT"
echo "[report] $EVENT_COUNT destruction events in $MONTH ($COUNT_BIOMETRIC_ONLY biometric_only, $COUNT_FULL full)"
echo "[report] anomalies: $ANOMALY_COUNT"
echo "[report] output: $OUT"
# Exit 1 if anomalies present (review needed) but report still written.
if [ "$ANOMALY_COUNT" != "0" ]; then
exit 1
fi
exit 0


@@ -0,0 +1,99 @@
#!/usr/bin/env bash
# bundle_counsel_packet — assemble the counsel-review packet tarball.
#
# Specification: docs/counsel/COUNSEL_REVIEW_PACKET_<DATE>.md §9.
#
# Why this exists: the cover note references a list of documents.
# Counsel needs them as a single transmittable artifact, with per-file
# integrity hashes so they can verify nothing changed in transit.
#
# Output:
# reports/counsel/counsel_packet_<DATE>.tar.gz
# reports/counsel/counsel_packet_<DATE>.manifest.txt (sha256 per file)
#
# Usage:
# bundle_counsel_packet.sh [--date YYYY-MM-DD]
#
# Exit codes:
# 0 — packet bundled successfully
# 1 — one or more referenced documents are missing
# 2 — script error (missing tools, write failure)
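#
# Example (illustrative invocation):
#   ./scripts/staffing/bundle_counsel_packet.sh --date 2026-05-05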
set -uo pipefail
cd "$(dirname "$0")/../.."
DATE="$(date -u +%Y-%m-%d)"
while [ "$#" -gt 0 ]; do
case "$1" in
--date) DATE="$2"; shift 2 ;;
-h|--help)
sed -n '2,20p' "$0" | sed 's/^# \?//'
exit 0 ;;
*) echo "unknown flag: $1" >&2; exit 2 ;;
esac
done
# Dependency gate.
for cmd in tar sha256sum; do
if ! command -v "$cmd" >/dev/null 2>&1; then
echo "[bundle] FAIL: required tool '$cmd' not found in PATH" >&2
exit 2
fi
done
# Files in the packet. Order is the recommended counsel-review order
# from the cover note §6.
FILES=(
"docs/counsel/COUNSEL_REVIEW_PACKET_${DATE}.md"
"docs/policies/consent/biometric_retention_schedule_v1.md"
"docs/policies/consent/biometric_consent_template_v1.md"
"docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md"
"docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md"
"docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md"
"docs/specs/GATE_3B_DEEPFACE_DESIGN.md"
"docs/PHASE_1_6_BIPA_GATES.md"
)
# Verify all referenced files exist before opening the tarball.
MISSING=0
for f in "${FILES[@]}"; do
if [ ! -r "$f" ]; then
echo "[bundle] MISSING: $f" >&2
MISSING=$((MISSING + 1))
fi
done
if [ "$MISSING" -gt 0 ]; then
echo "[bundle] FAIL: $MISSING required documents missing — aborting" >&2
exit 1
fi
OUT_DIR="reports/counsel"
mkdir -p "$OUT_DIR"
TARBALL="$OUT_DIR/counsel_packet_${DATE}.tar.gz"
MANIFEST="$OUT_DIR/counsel_packet_${DATE}.manifest.txt"
# Build the manifest first — counsel uses this to verify integrity.
{
echo "# Counsel Packet Manifest — $DATE"
echo "# Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "# Each file is listed with its SHA-256 hash. To verify on receipt:"
echo "# tar xzf counsel_packet_${DATE}.tar.gz"
echo "# sha256sum -c counsel_packet_${DATE}.manifest.txt"
echo "# (re-format the lines below with two spaces between hash and path"
echo "# for sha256sum -c compatibility — sha256sum's strict format)"
echo
for f in "${FILES[@]}"; do
sha256sum "$f"
done
} > "$MANIFEST"
# Build the tarball — include the manifest itself.
tar -czf "$TARBALL" "${FILES[@]}" "$MANIFEST"
PACKET_HASH=$(sha256sum "$TARBALL" | awk '{print $1}')
echo "[bundle] packet: $TARBALL"
echo "[bundle] manifest: $MANIFEST"
echo "[bundle] tarball SHA-256: $PACKET_HASH"
echo "[bundle] files: ${#FILES[@]}"


@@ -0,0 +1,266 @@
#!/usr/bin/env bash
# verify_biometric_erasure — confirm that a biometric erasure completed cleanly.
#
# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3.
# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
#
# Why this exists: when an operator runs the erasure curl call against
# /biometric/subject/{id}/erase, they need a defensible artifact proving
# destruction completed. This script produces that artifact by checking
# four things:
#
# 1. SubjectManifest.biometric_collection is null (catalogd cleared the row)
# 2. data/biometric/uploads/<safe_id>/ is empty or absent (photo file gone)
# 3. Most recent audit row has accessor.kind in {biometric_erasure, full_erasure}
# AND result is "erased" or "success" (the chain logged the erasure intent)
# 4. audit_log.chain_verified is true (HMAC chain still intact end-to-end)
#
# All four must pass for an operator to mark the destruction complete.
#
# Usage:
# verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]
#
# Environment:
# GATEWAY_URL — default http://localhost:3100
# LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token
# UPLOADS_ROOT — default data/biometric/uploads (relative to repo root)
# OUT_DIR — default reports/biometric (where the verification report lands)
#
# Exit codes:
# 0 — all four checks pass; erasure verified
# 1 — one or more checks failed; do NOT mark destruction complete; escalate
# 2 — script error (missing tools, network failure, bad token)
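#
# Example (illustrative invocation):
#   ./scripts/staffing/verify_biometric_erasure.sh WORKER-2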
set -uo pipefail
cd "$(dirname "$0")/../.."
if [ "$#" -lt 1 ]; then
echo "usage: verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]" >&2
exit 2
fi
CANDIDATE_ID="$1"
shift
FROM=""
TO=""
while [ "$#" -gt 0 ]; do
case "$1" in
--from) FROM="$2"; shift 2 ;;
--to) TO="$2"; shift 2 ;;
*) echo "unknown flag: $1" >&2; exit 2 ;;
esac
done
GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
OUT_DIR="${OUT_DIR:-reports/biometric}"
# Dependency gates — fail fast with clear errors rather than producing
# a misleading "evidence" file from missing tools.
for cmd in curl jq sha256sum; do
if ! command -v "$cmd" >/dev/null 2>&1; then
echo "[verify] FAIL: required tool '$cmd' not found in PATH" >&2
exit 2
fi
done
if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
echo "[verify] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
echo "[verify] This script requires legal-tier auth to query /audit/subject/." >&2
exit 2
fi
LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
if [ -z "$LEGAL_TOKEN" ]; then
echo "[verify] FAIL: legal token file is empty" >&2
exit 2
fi
# safe_id matches catalogd::biometric_endpoint::sanitize_for_path:
# any non-[A-Za-z0-9_.-] char is replaced with underscore.
SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.-]/_/g')
mkdir -p "$OUT_DIR"
DATE=$(date -u +%Y-%m-%dT%H-%M-%SZ)
OUT="$OUT_DIR/erasure_verify_${SAFE_ID}_${DATE}.md"
EVIDENCE=$(mktemp)
trap 'rm -f "$EVIDENCE"' EXIT
PASS=0
FAIL=0
note() { echo "$1" >> "$EVIDENCE"; }
mark_pass() { PASS=$((PASS+1)); note " - PASS: $1"; }
mark_fail() { FAIL=$((FAIL+1)); note " - FAIL: $1"; }
note "## Verification target"
note ""
note "- **candidate_id:** \`$CANDIDATE_ID\`"
note "- **safe_id (filesystem):** \`$SAFE_ID\`"
note "- **gateway:** \`$GATEWAY_URL\`"
note "- **uploads root:** \`$UPLOADS_ROOT\`"
note "- **window:** ${FROM:-unbounded}${TO:-unbounded}"
note ""
# ── Fetch the audit response ────────────────────────────────────────
QUERY=""
if [ -n "$FROM" ]; then QUERY="from=$FROM"; fi
if [ -n "$TO" ]; then
if [ -n "$QUERY" ]; then QUERY="${QUERY}&to=$TO"; else QUERY="to=$TO"; fi
fi
URL="$GATEWAY_URL/audit/subject/$CANDIDATE_ID"
if [ -n "$QUERY" ]; then URL="$URL?$QUERY"; fi
RESP_FILE=$(mktemp)
HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
-H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
-H "Accept: application/json" \
"$URL" 2>&1) || HTTP_CODE="000"
if [ "$HTTP_CODE" != "200" ]; then
echo "[verify] FAIL: GET $URL returned HTTP $HTTP_CODE" >&2
echo "[verify] response head:" >&2
head -c 500 "$RESP_FILE" >&2
echo >&2
rm -f "$RESP_FILE"
exit 2
fi
# Schema sanity — refuse to evaluate against an unrecognized response shape.
SCHEMA=$(jq -r '.schema // ""' < "$RESP_FILE")
if [ "$SCHEMA" != "subject_audit_response.v1" ]; then
echo "[verify] FAIL: unexpected response schema '$SCHEMA' (want subject_audit_response.v1)" >&2
rm -f "$RESP_FILE"
exit 2
fi
# ── Check 1: manifest.biometric_collection is null ──────────────────
note "## Check 1 — Subject manifest biometric_collection is null"
note ""
BIO_COLL=$(jq -c '.manifest.biometric_collection // null' < "$RESP_FILE")
note "**manifest.biometric_collection:** \`$BIO_COLL\`"
note ""
if [ "$BIO_COLL" = "null" ]; then
mark_pass "biometric_collection field is null on the subject manifest"
else
mark_fail "biometric_collection is still populated — erasure incomplete"
fi
note ""
# ── Check 2: filesystem uploads dir is empty/absent ─────────────────
note "## Check 2 — Quarantined upload directory empty or absent"
note ""
UPLOAD_DIR="$UPLOADS_ROOT/$SAFE_ID"
note "**path:** \`$UPLOAD_DIR\`"
if [ ! -e "$UPLOAD_DIR" ]; then
note "**state:** absent (directory was removed during erasure or never existed)"
note ""
mark_pass "upload directory is absent"
elif [ ! -d "$UPLOAD_DIR" ]; then
note "**state:** path exists but is not a directory — investigate"
note ""
mark_fail "upload path exists and is not a directory: $UPLOAD_DIR"
else
REMAINING=$(find "$UPLOAD_DIR" -maxdepth 1 -mindepth 1 2>/dev/null | wc -l | tr -d '[:space:]')
: "${REMAINING:=0}"
note "**state:** directory exists with $REMAINING remaining entries"
note ""
if [ "$REMAINING" = "0" ]; then
mark_pass "upload directory is empty (no residual photo files)"
else
mark_fail "$REMAINING file(s) remain under $UPLOAD_DIR — must be unlinked"
note "### Residual files"
note ""
note '```'
find "$UPLOAD_DIR" -maxdepth 2 >> "$EVIDENCE"
note '```'
note ""
fi
fi
# ── Check 3: most recent audit row reflects erasure ─────────────────
note "## Check 3 — Audit log records the erasure event"
note ""
ROW_COUNT=$(jq '.audit_log.rows | length' < "$RESP_FILE")
note "**rows in window:** $ROW_COUNT"
if [ "$ROW_COUNT" = "0" ]; then
mark_fail "no audit rows in the requested window — erasure should have appended one"
note ""
else
LAST_KIND=$(jq -r '.audit_log.rows | last | .accessor.kind // ""' < "$RESP_FILE")
LAST_RESULT=$(jq -r '.audit_log.rows | last | .result // ""' < "$RESP_FILE")
LAST_TS=$(jq -r '.audit_log.rows | last | .ts // ""' < "$RESP_FILE")
note "**last row:** ts=\`$LAST_TS\` accessor.kind=\`$LAST_KIND\` result=\`$LAST_RESULT\`"
note ""
case "$LAST_KIND" in
biometric_erasure|full_erasure)
case "$LAST_RESULT" in
erased|success)
mark_pass "last audit row is an erasure event ($LAST_KIND/$LAST_RESULT)"
;;
*)
mark_fail "last row kind is $LAST_KIND but result is '$LAST_RESULT' (expected erased/success)"
;;
esac
;;
*)
mark_fail "last audit row accessor.kind is '$LAST_KIND' (expected biometric_erasure or full_erasure)"
;;
esac
fi
note ""
# ── Check 4: HMAC chain verifies end-to-end ─────────────────────────
note "## Check 4 — HMAC chain integrity"
note ""
CHAIN_VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE")
CHAIN_ROOT=$(jq -r '.audit_log.chain_root // ""' < "$RESP_FILE")
CHAIN_ROWS=$(jq -r '.audit_log.chain_rows_total // 0' < "$RESP_FILE")
CHAIN_ERR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE")
note "**chain_verified:** \`$CHAIN_VERIFIED\`"
note "**chain_rows_total:** $CHAIN_ROWS"
note "**chain_root:** \`$CHAIN_ROOT\`"
if [ -n "$CHAIN_ERR" ]; then
note "**chain_verification_error:** \`$CHAIN_ERR\`"
fi
note ""
if [ "$CHAIN_VERIFIED" = "true" ]; then
mark_pass "chain verifies end-to-end ($CHAIN_ROWS rows)"
else
mark_fail "chain integrity broken — destruction is NOT defensible until investigated"
fi
note ""
# ── Render report ───────────────────────────────────────────────────
TOTAL=$((PASS + FAIL))
note "## Summary"
note ""
note "**$PASS / $TOTAL** verification checks pass."
note ""
if [ "$FAIL" -gt 0 ]; then
note "**Status: ERASURE NOT VERIFIED.** Do NOT mark destruction complete. Escalate to engineering before responding to candidate / counsel."
note ""
fi
# Hash response body so the report has a tamper-evident anchor.
RESP_HASH=$(sha256sum "$RESP_FILE" | awk '{print $1}')
EVIDENCE_HASH=$(sha256sum "$EVIDENCE" | awk '{print $1}')
{
echo "# Biometric Erasure Verification — $CANDIDATE_ID"
echo
echo "**Date:** $DATE"
echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3"
echo "**Generator:** scripts/staffing/verify_biometric_erasure.sh"
echo
cat "$EVIDENCE"
echo "---"
echo
echo "**Audit response SHA-256:** \`$RESP_HASH\`"
echo "**Evidence summary SHA-256:** \`$EVIDENCE_HASH\`"
echo
} > "$OUT"
rm -f "$RESP_FILE"
echo "[verify] $PASS / $TOTAL checks pass — report: $OUT"
echo "[verify] response hash: $RESP_HASH"
[ "$FAIL" -eq 0 ]