diff --git a/.gitignore b/.gitignore
index 93ce997..37cad58 100644
--- a/.gitignore
+++ b/.gitignore
@@ -61,6 +61,16 @@ data/biometric/
 # files stay local for forensics.
 reports/scrum/
 
+# Per-event biometric verification reports (timestamp-named, regenerated
+# per `verify_biometric_erasure.sh` invocation). Source-of-truth is the
+# audit chain itself; these reports are derived views.
+reports/biometric/
+
+# Counsel transmission tarballs + manifests are regenerated by
+# `bundle_counsel_packet.sh` from the tracked `docs/counsel/` source.
+# The bundle is transmittable, not source-of-truth.
+reports/counsel/
+
 # Local experiments scratchpad — per the "Test code in main is ACTIVELY
 # being cleaned out" policy (commits 6aafd41 + f4ebd22), one-off
 # experiments stay out of the tracked tree.
diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index 33d0361..ee4dd9f 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,12 +1,58 @@
 # STATE OF PLAY — Lakehouse
 
-**Last verified:** 2026-05-03 evening CDT
-**Verified by:** live probe (gateway restarted 2x, all 11 catalogd subject tests + 11 biometric tests + 6 audit tests + 4 mcp-server Gate-4 tests green; cross-runtime parity 6/6 byte-identical against live audit logs; live curl roundtrip on /biometric returned 200 + chained audit row), not memory.
+**Last verified:** 2026-05-05 morning CDT
+**Verified by:** live probe (`/audit/health` 200, `/biometric/subject/{id}/erase` 21-test substrate + `/audit/subject/{id}` legal-tier endpoint live verified against WORKER-100; new `verify_biometric_erasure.sh` + `biometric_destruction_report.sh` + `bundle_counsel_packet.sh` smoke-tested clean against live data) — not memory.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 
 ---
 
+## WHAT LANDED 2026-05-05 (doc reconciliation wave — Gate 3b decision + counsel packet ready)
+
+This was a **doc-only wave**, not code. Background: J asked for an audit of the BIPA/biometric documentation before production cutover. Audit found moderate fragmentation between docs and shipped code (post-`identityd` collapse, post-Gate-3a-ship, pre-Gate-3b-decision). Closed it in one pass.
+
+| Item | What changed | Status |
+|---|---|---|
+| **Gate 3b — DECIDED: Option C (defer classifications)** | `BiometricCollection.classifications` stays `Option<Classifications> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status flipped from "draft / awaits product" to "DECIDED 2026-05-05". | Locked |
+| **Endpoint-path drift** | `PHASE_1_6_BIPA_GATES.md` (3 spots), `BIPA_DESTRUCTION_RUNBOOK.md` (2 spots), `biometric_retention_schedule_v1.md` (1 spot) updated from legacy `/v1/identity/subjects/*` (proposed under separate identityd daemon, never shipped) to actual `/biometric/subject/*` (catalogd-local, shipped `848a458`). Schema block in `PHASE_1_6_BIPA_GATES.md` rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate (not the proposed Postgres `subjects` table). | Reconciled |
+| **Consent template + retention schedule** | Both revised for Option C: removed all "automated facial-classification" / "deepface" language so disclosed scope matches implemented scope. Pending counsel review — they were already eng-staged with ⚖ markers.
| Eng-staged for counsel | +| **`scripts/staffing/verify_biometric_erasure.sh`** (NEW) | Operator-side verification of an erasure event. Curls `/audit/subject/{id}` with legal-tier token, checks: manifest.biometric_collection null, uploads dir empty, last audit row is `biometric_erasure`/`full_erasure` with `erased`/`success`, chain_verified=true. Writes a hashed report to `reports/biometric/`. | Smoke-tested live | +| **`scripts/staffing/biometric_destruction_report.sh`** (NEW) | Monthly destruction-event aggregation. Anonymizes candidate IDs (sha256-12 prefix), counts by scope + trigger, flags anomalies. Smoke-test on May 2026 data found 1 historical `biometric_erasure`/`consent_withdrawal` event (test fixture). | Smoke-tested live | +| **`docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`** (NEW) | Captures the rotation procedure operationalized after the 2026-05-05 `/tmp` wipe incident. Covers: when to rotate, pre-rotation snapshot, atomic-swap procedure, post-rotation verification (incl. expected pre-rotation chain tamper-detect under new key), recovery from lost key, ⚖ counsel notes. | Authored | +| **`docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md`** + `bundle_counsel_packet.sh` (NEW) | Cover note bundling all eng-staged BIPA docs for counsel review with per-doc questions, sign-off checklist, recommended review sequence. Bundler script tarballs the 8 referenced files + emits a SHA-256 manifest. Tarball ready for transmission: `reports/counsel/counsel_packet_2026-05-05.tar.gz`. | Bundled, ready to send | + +### Eng follow-up that this wave surfaced +- **Double-upload file leak — DIAGNOSED + FIXED** (2026-05-05 same wave). `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo file. Investigation showed: + - The file was 13 bytes of test fixture (`ff d8 ff d9 + ASCII "TESTBYTES"`), byte-identical to the unit-test fixture at `biometric_endpoint.rs:841`. NO PII, NO biometric content, NO synthetic-face content. Came from manual integration testing on 2026-05-03. + - Audit log timeline showed two consecutive uploads (09:54, 10:04) followed by one erasure (10:22). The erasure unlinked only the SECOND file (which the manifest pointed at by then); the first file was orphaned because the second upload had silently overwritten `manifest.data_path`. + - **Real bug found**: the upload handler did NOT refuse a second upload to a subject with `biometric_collection.is_some()`. Patched `process_upload` to return HTTP 409 + `error: "biometric_already_collected"` when a re-upload is attempted; operator must explicitly POST `/biometric/subject/{id}/erase` first. + - Stranded test file removed (`rm` of the 13-byte fixture). + - New unit test `second_upload_without_erase_returns_409` asserts the 409 + that the first photo's data_path remains unchanged + that the first file remains untouched on disk. + - Existing `repeated_uploads_grow_the_chain` replaced with `upload_erase_upload_grows_the_chain_cleanly` (covers the legitimate re-collection cycle: upload → erase → upload, chain grows to 3 rows). + - Existing `content_type_with_parameters_accepted` test updated to use two distinct subjects (it had used one subject for two uploads to test content-type parsing — now would 409). + - **22 biometric_endpoint tests + 59 catalogd lib tests all green** post-patch (was 21+58 pre-patch). + - Production posture: gateway binary needs rebuild (`cargo build --release`) + `systemctl restart lakehouse.service` to pick up the new 409 behavior in live traffic. 
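+
+    A rollout-and-probe sketch for that step (not a verified procedure: the unit name and port are the ones used elsewhere in this file; `$SERVICE_TOKEN`, `$CID`, the auth-header shape, and `fixture.jpg` are placeholders for the operator's actual service-tier credential, target subject, and test image):
+
+    ```bash
+    cargo build --release
+    sudo systemctl restart lakehouse.service
+    curl -sf http://localhost:3100/audit/health   # confirm the gateway came back up
+
+    # $CID = a subject that already has a BiometricCollection on its
+    # manifest; a re-upload must now be refused with a 409 and error
+    # "biometric_already_collected" instead of silently overwriting
+    # manifest.data_path.
+    curl -s -o /dev/null -w '%{http_code}\n' -X POST \
+      "http://localhost:3100/biometric/subject/$CID/photo" \
+      -H "Authorization: Bearer $SERVICE_TOKEN" \
+      -H "Content-Type: image/jpeg" \
+      --data-binary @fixture.jpg   # expect: 409
+    ```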
+- **Pre-rotation chain tamper-detect (expected, not a bug).** WORKER-{1..5} had pre-2026-05-05 audit chains under the prior `LH_SUBJECT_AUDIT_KEY`. Under the new key (post-`/tmp` wipe rotation), those chains correctly tamper-detect. The rotation runbook §4.4 documents this as expected; a §2.2 pre-rotation snapshot is what would prove they were intact pre-rotation if defensibility ever needs it. + +### What's blocking production cutover NOW (after this wave) +- **Counsel calendar:** the four sign-off items in `COUNSEL_REVIEW_PACKET_2026-05-05.md` (retention schedule, consent template, destruction runbook, pre-identityd attestation). The packet tarball is ready; ⚖ counsel is the bottleneck. +- **Nothing else.** Engineering is no longer the long pole. + +### Phase 1.6 BIPA gates — status table (this is the final post-Option-C state) + +| # | Gate | Status | +|---|---|---| +| 1 | Public retention schedule | **eng-staged**, revised for Option C, ready for counsel | +| 2 | Informed consent template | **eng-staged**, revised for Option C, ready for counsel | +| 3a | Photo upload endpoint | **DONE** (shipped `f1fa6e4`, 11 unit tests, live verified) | +| 3b | Deepface classification | **DECIDED 2026-05-05: Option C (defer)** | +| 4 | Name → ethnicity inference removal | **DONE** (shipped, 4/4 mcp-server tests pass) | +| 5 | Destruction runbook + erasure endpoint | **eng-DONE** (`848a458`, 21 tests). Runbook scripts (verify + report) shipped 2026-05-05. Counsel review pending. | +| §2 | Pre-identityd attestation | **eng-DONE** (3/3 evidence checks). Awaits J + counsel signature. | +| §3 | Employee training | **deferred** (consolidated into runbook §7 acknowledgment for current operator population) | + +--- + ## WHAT LANDED 2026-05-03 (16 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates) The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. Wave was structured as eight ship-then-scrum cycles — every wave caught real bugs, every fix wave landed within the same session. diff --git a/crates/catalogd/src/biometric_endpoint.rs b/crates/catalogd/src/biometric_endpoint.rs index 034cf91..c4a0109 100644 --- a/crates/catalogd/src/biometric_endpoint.rs +++ b/crates/catalogd/src/biometric_endpoint.rs @@ -307,6 +307,21 @@ pub async fn process_upload( consent_status: None, })); } + // Refuse double-upload. If a BiometricCollection already exists on + // the manifest, the operator must explicitly erase before re-uploading. + // Without this gate, a second POST silently overwrites manifest.data_path + // and orphans the previous photo file on disk — creating a forever-leak + // pattern and a BIPA defensibility hole ("we said we erased the photo, + // but the previous version of it is still under the same subject dir"). + // Caught 2026-05-05 by verify_biometric_erasure.sh against WORKER-2. 
+ if manifest.biometric_collection.is_some() { + return Err((StatusCode::CONFLICT, ErrorResponse { + error: "biometric_already_collected", + detail: "subject already has a BiometricCollection on the manifest; \ + POST /biometric/subject/{id}/erase first if you intend to replace the photo".into(), + consent_status: None, + })); + } let template_hash = { let mut h = Sha256::new(); @@ -947,15 +962,92 @@ mod tests { } #[tokio::test] - async fn repeated_uploads_grow_the_chain() { - let state = fixture_state("repeated").await; - let _ = state.registry.put_subject(fixture_manifest("WORKER-5", BiometricConsentStatus::Given, SubjectStatus::Active)).await; + async fn second_upload_without_erase_returns_409() { + // BIPA defensibility: a second upload to a subject that already + // has a BiometricCollection must fail-closed. Without this gate, + // the second upload silently overwrites manifest.data_path and + // orphans the first photo on disk forever (caught 2026-05-05 on + // WORKER-2 by verify_biometric_erasure.sh). + let state = fixture_state("second_upload_409").await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-DUP", BiometricConsentStatus::Given, SubjectStatus::Active)).await; + let storage_root = state.storage_root.clone(); + let registry = state.registry.clone(); + + // First upload succeeds. + let resp1 = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes()) + .await.unwrap(); + let first_path = storage_root.join(&resp1.data_path); + assert!(first_path.exists(), "first upload should produce a file"); + + // Second upload refused with 409. + let err = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes()) + .await.unwrap_err(); + assert_eq!(err.0, StatusCode::CONFLICT); + assert_eq!(err.1.error, "biometric_already_collected"); + + // Manifest still points at the first upload — pointer was NOT overwritten. + let m = registry.get_subject("WORKER-DUP").await.unwrap(); + let bc = m.biometric_collection.as_ref().expect("collection should still be set"); + assert_eq!(bc.data_path, resp1.data_path, + "manifest data_path must be unchanged after refused second upload"); + + // First file remains on disk untouched (refusal must not unlink it). + assert!(first_path.exists(), "first upload's file must remain after refused second upload"); + let still_on_disk = tokio::fs::read(&first_path).await.unwrap(); + assert_eq!(still_on_disk, jpeg_bytes(), + "first upload's bytes must not have been overwritten"); + } + + #[tokio::test] + async fn upload_erase_upload_grows_the_chain_cleanly() { + // Prior version of this test allowed repeated uploads to chain; + // that conflated chain growth with allowed re-upload. Under the + // double-upload guard (409 above), the only legitimate way to + // re-collect is upload → erase → upload. Chain grows to 3 rows + // (collection, erasure, collection); on-disk file count returns + // to one after the second upload. + let state = fixture_state("upload_erase_upload").await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-CYCLE", BiometricConsentStatus::Given, SubjectStatus::Active)).await; let writer = state.writer.clone(); - for _ in 0..2 { - let _ = process_upload(&state, "WORKER-5", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes()) - .await.unwrap(); - } - assert_eq!(writer.verify_chain("WORKER-5").await.unwrap(), 2); + let storage_root = state.storage_root.clone(); + + // First upload. 
+ let resp1 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes()) + .await.unwrap(); + let first_path = storage_root.join(&resp1.data_path); + assert!(first_path.exists()); + + // Erase. Uses process_erase test helper (the production path + // parses the EraseRequest from request body; tests inject it + // directly). Note: the erase flow flips biometric.status to + // Withdrawn, so the post-erase second upload must reset consent + // first (production flow would require new consent collection). + let _ = process_erase(&state, "WORKER-CYCLE", Some(TEST_TOKEN), "trace-cycle", fixture_erase_request("biometric_only")) + .await.unwrap(); + assert!(!first_path.exists(), "first photo file must be unlinked by erase"); + + // Reset consent + status on the post-erase manifest so the second + // upload can proceed (production flow would require new consent + // collection here; for this test we directly flip the manifest). + let mut post_erase = state.registry.get_subject("WORKER-CYCLE").await.unwrap(); + post_erase.consent.biometric.status = BiometricConsentStatus::Given; + post_erase.status = SubjectStatus::Active; + post_erase.biometric_collection = None; + let _ = state.registry.put_subject(post_erase).await; + + // Second upload (legitimate, after erase). + let resp2 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes()) + .await.unwrap(); + let second_path = storage_root.join(&resp2.data_path); + assert!(second_path.exists(), "second upload should produce a file"); + assert_ne!(resp1.data_path, resp2.data_path, "second upload should land at a new path"); + + // Chain has 3 rows: collection, erasure, collection. + assert_eq!(writer.verify_chain("WORKER-CYCLE").await.unwrap(), 3); + let rows = writer.read_rows_in_range("WORKER-CYCLE", None, None).await.unwrap(); + assert_eq!(rows[0].accessor.kind, "biometric_collection"); + assert_eq!(rows[1].accessor.kind, "biometric_erasure"); + assert_eq!(rows[2].accessor.kind, "biometric_collection"); } #[tokio::test] @@ -985,18 +1077,23 @@ mod tests { // Caught 2026-05-03 opus scrum WARN; regression test ensures // the bare media type is matched after stripping parameters. let state = fixture_state("ct_with_params").await; - let _ = state.registry.put_subject(fixture_manifest("WORKER-CT", BiometricConsentStatus::Given, SubjectStatus::Active)).await; + // Two distinct subjects so each upload exercises the "first upload" + // path. Prior version used one subject and two uploads — now blocked + // by the double-upload guard (409). The test's actual intent is + // content-type parsing, not re-upload tolerance. + let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-A", BiometricConsentStatus::Given, SubjectStatus::Active)).await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-B", BiometricConsentStatus::Given, SubjectStatus::Active)).await; let resp = process_upload( - &state, "WORKER-CT", Some(TEST_TOKEN), + &state, "WORKER-CT-A", Some(TEST_TOKEN), Some("image/jpeg; charset=binary"), "", "", &jpeg_bytes(), ).await.unwrap(); - assert_eq!(resp.candidate_id, "WORKER-CT"); + assert_eq!(resp.candidate_id, "WORKER-CT-A"); // Also case-insensitive matching: "Image/JPEG" should work too. 
let resp2 = process_upload( - &state, "WORKER-CT", Some(TEST_TOKEN), + &state, "WORKER-CT-B", Some(TEST_TOKEN), Some("Image/JPEG"), "", "", &jpeg_bytes(), ).await.unwrap(); - assert_eq!(resp2.candidate_id, "WORKER-CT"); + assert_eq!(resp2.candidate_id, "WORKER-CT-B"); } // ─── Erasure tests (Gate 5) ────────────────────────────────────── diff --git a/docs/PHASE_1_6_BIPA_GATES.md b/docs/PHASE_1_6_BIPA_GATES.md index ddd25c5..5fd2727 100644 --- a/docs/PHASE_1_6_BIPA_GATES.md +++ b/docs/PHASE_1_6_BIPA_GATES.md @@ -22,7 +22,7 @@ Each gate is a deliverable that must ship before real-photo intake. None is opti - `docs/policies/consent/biometric_retention_schedule_v1.md` — public file - Linked from public privacy policy at the deployment URL - Specifies: - - Categories of biometric data collected (facial geometry derived from candidate photos, age estimate, gender classification, race classification — per Phase 1.5 deepface walk) + - Categories of biometric data collected (facial photograph for staff identification at job sites; classifications deferred per Gate 3b — see `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`) - Purpose of collection (identity matching for staffing operations) - Maximum retention: BIPA §15(a) caps at "3 years from the individual's last interaction with the private entity, whichever occurs first" — recommend 18-24 months as the operational ceiling (provides safety margin) - Destruction procedure: per Gate 5 below @@ -67,7 +67,7 @@ Each gate is a deliverable that must ship before real-photo intake. None is opti **What ships:** -A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) with the following behavior: +An endpoint at `POST /biometric/subject/{candidate_id}/photo` (catalogd-local — the original v1 spec named this `/v1/identity/subjects/{candidate_id}/photo` under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot. See `IDENTITY_SERVICE_DESIGN.md` deprecation header.) with the following behavior: 1. Caller authenticates with service-tier token 2. Endpoint queries identityd for `subjects.biometric_consent_status` @@ -75,18 +75,23 @@ A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) wit 4. If status = `'given'`: a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`) b. deepface tagging runs against the photo - c. Classifications (gender, race, age) stored to `subjects` table fields (NEW columns — see schema additions below) + c. Classifications (gender, race, age) — **DEFERRED to Gate 3b** (`docs/specs/GATE_3B_DEEPFACE_DESIGN.md`). `BiometricCollection.classifications` remains `None` in v1. d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule e. `pii_access_log` row written with `purpose_token='biometric_collection'` 5. Response: `{candidate_id, retention_until, consent_version}` -**Schema additions to identityd `subjects`:** +**Schema (as shipped — catalogd `SubjectManifest.biometric_collection`):** -```sql -ALTER TABLE subjects ADD COLUMN biometric_classifications JSONB; -- {gender, race, age} from deepface -ALTER TABLE subjects ADD COLUMN biometric_data_path TEXT; -- quarantined path -ALTER TABLE subjects ADD COLUMN biometric_collected_at TIMESTAMPTZ; -ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT; -- hash of the photo bytes (for integrity, NOT for re-derivation) +The original spec proposed JSONB columns on a Postgres `subjects` table under identityd. 
The shipped implementation collapses this into a per-subject JSON manifest at `data/_catalog/subjects/{candidate_id}.json`, with the `BiometricCollection` struct holding `data_path`, `template_hash`, `collected_at`, and `classifications: Option<Classifications>`. See `crates/catalogd/src/subject_manifest.rs` for the canonical type.
+
+```rust
+// crates/catalogd/src/subject_manifest.rs (paraphrased)
+pub struct BiometricCollection {
+    pub data_path: String,       // quarantined path
+    pub template_hash: String,   // SHA-256 of original bytes (integrity, NOT re-derivation)
+    pub collected_at: DateTime<Utc>,
+    pub classifications: Option<Classifications>, // None until Gate 3b ships (deferred — see GATE_3B_DEEPFACE_DESIGN.md)
+}
 ```
 
 **Engineering acceptance:**
@@ -130,8 +135,8 @@ ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT; -- hash of the
 - `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` — operator-facing
 - Specifies:
   - Triggers: retention expiry (per Gate 1), withdrawal, RTBF request, candidate request
-  - Procedure: identityd `POST /v1/identity/subjects/{id}/erase` (legal-tier auth)
-  - Erasure scope: `subjects.biometric_*` columns ciphertext-deleted, `biometric_data_path` files securely overwritten + unlinked, deepface classifications nulled
+  - Procedure: catalogd-local `POST /biometric/subject/{id}/erase` (legal-tier auth) — formerly proposed under identityd; now serves from catalogd directly
+  - Erasure scope: `BiometricCollection` set to `None` on the subject manifest (drops `data_path`, `template_hash`, `classifications` together), quarantined photo files at `data/biometric/uploads/{candidate_id}/*` securely unlinked, audit row appended BEFORE photo unlink so the chain proves intent even if file delete fails
   - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
   - Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
   - Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request
@@ -140,7 +145,7 @@ ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT; -- hash of the
 
 **Engineering acceptance:**
 - Runbook committed
-- `POST /v1/identity/subjects/{id}/erase` endpoint includes biometric-specific erasure path
+- `POST /biometric/subject/{id}/erase` endpoint includes biometric-specific erasure path (shipped `848a458` — 21 unit tests, two scopes: biometric_only / full)
 - Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep — biometric has stricter clock)
 - Erasure events are logged with cryptographic attestation
 
@@ -188,7 +193,7 @@ of 2026-05-03 — scaffolds vs. counsel sign-off vs. shipped code:
 |---|---|---|---|---|
 | 1 | Public retention schedule | scaffolded at `docs/policies/consent/biometric_retention_schedule_v1.md` | pending | **eng-staged** |
 | 2 | Consent template | scaffolded at `docs/policies/consent/biometric_consent_template_v1.md` | pending | **eng-staged** |
-| 3 | Photo-upload endpoint with consent enforcement | DONE for the consent-gate substrate (`crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 10 unit tests, live-verified end-to-end). Deepface classification deferred to **Gate 3b** (own session — needs Python subprocess design after sidecar drop).
| n/a until 3b | **3a DONE, 3b deferred** |
+| 3 | Photo-upload endpoint with consent enforcement | DONE — `crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 11 unit tests, live-verified end-to-end. **Gate 3b DECIDED 2026-05-05: Option C (defer classifications).** `BiometricCollection.classifications` stays `Option<Classifications> = None` in v1; consent + retention docs revised to match. See `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` §6 + change log. | reviewed under Gate 2 (matching consent text) | **DONE — 3a shipped, 3b deferred per design doc** |
 | 4 | Name → ethnicity inference removed | DONE — `mcp-server/search.html:3372` removal note + `mcp-server/phase_1_6_gate_4.test.ts` absence test (3/3 green) | none required | **DONE** |
-| 5 | Destruction runbook | scaffolded at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint + verify/report scripts marked TODO | pending | **eng-staged** |
+| 5 | Destruction runbook | runbook eng-staged at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint shipped (`848a458`, 21 unit tests); verify + report scripts shipped 2026-05-05 | pending | **eng-DONE** (counsel review pending) |
 
@@ -212,10 +217,14 @@ re-promotes to blocking and a separate training program must be authored.
 expected operator population size, or restore it to the blocking set.
 
 **Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
-review of the engineering scaffolds. Gate 3 (photo-upload endpoint)
-is the only remaining engineering work; it's deferred to its own
-session because it crosses into identityd photo intake and deepface
-integration scope that hasn't been designed yet.
+review of the engineering scaffolds. Gate 3 substrate is fully
+shipped; Gate 3b deepface classification was DECIDED on 2026-05-05
+as Option C (defer) — `BiometricCollection.classifications` stays
+`None` in v1, consent + retention docs revised to match this
+narrower scope. If a future product requirement surfaces a real
+need for classifications, the substrate is forward-compatible
+(`Option<Classifications>`) and either Option A (~1 day) or Option B (~5 days)
+of the design doc can be picked up then under a v2 consent template.
 
 ---
 
@@ -258,4 +267,5 @@
 
 ## Change log
 
+- 2026-05-05 — Reconciled with shipped state: endpoint paths corrected from the legacy identityd v1 spec (`/v1/identity/subjects/*`) to the catalogd-local routes that actually shipped (`/biometric/subject/*`). Schema block rewritten to reflect the JSON `SubjectManifest.biometric_collection` substrate that replaced the proposed Postgres columns. Gate 3b deepface deferral marked in-line where Disclosure 1 / Gate 3 step 5c / Gate 5 erasure scope previously assumed classifications were collected. No legal text changed; this was doc/code drift cleanup.
 - 2026-05-03 — Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.
diff --git a/docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md b/docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md
new file mode 100644
index 0000000..7dc61f1
--- /dev/null
+++ b/docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md
@@ -0,0 +1,260 @@
+# Counsel Review Packet — Phase 1.6 BIPA Pre-Launch
+
+**Date assembled:** 2026-05-05
+**For:** outside counsel
+**From:** J, operator of record
+**Scope:** documents that engineering has staged for legal sufficiency review
+  before the staffing platform begins collecting any real candidate
+  biometric data (BIPA §15(a)(b)).
+
+> **What this packet is.** The Phase 1.6 BIPA gates outline what
+> engineering must ship before real-photo intake. As of 2026-05-05,
+> all engineering substrate is shipped and verified live (see §1
+> below for the inventory).
What remains is binding-text authoring
+> + counsel sign-off on five documents, plus operational notification
+> obligations counsel may want to layer on top.
+>
+> **What this packet is NOT.** Not a request for counsel to write
+> binding text from scratch. The documents are eng-staged in
+> reasonable plain language; the request is for counsel to render
+> them into legally-sufficient text and attest where signatures
+> are required.
+
+---
+
+## 1. Engineering substrate — shipped + verified
+
+For factual context on what counsel is reviewing AGAINST. None of
+this requires sign-off here; it's the system the documents bind to.
+
+| Component | Where it lives | Verification |
+|---|---|---|
+| Subject manifest registry | `crates/catalogd/src/registry.rs`, `data/_catalog/subjects/{candidate_id}.json` | 17 unit tests + 100 backfilled WORKER manifests in production |
+| Per-subject HMAC audit chain (SHA-256) | `crates/catalogd/src/subject_audit.rs`, `data/_catalog/subjects/{candidate_id}.audit.jsonl` | Tamper-detection + concurrent-append race tests pass |
+| Photo upload (consent-gated) | `POST /biometric/subject/{id}/photo` | 11 unit tests + live roundtrip 200 |
+| Erasure (two-scope) | `POST /biometric/subject/{id}/erase` (`biometric_only` / `full`) | 21 unit tests; transactional rollback on audit failure |
+| Legal-tier audit read | `GET /audit/subject/{id}` (X-Lakehouse-Legal-Token header) | Constant-time auth, chain re-verification per request |
+| Retention sweep (BIPA-aware clock) | `crates/catalogd/src/bin/retention_sweep` | 8 unit tests; live verified against 100 backfilled subjects |
+| Cross-runtime parity (Rust ↔ Go) | `scripts/cutover/parity/subject_audit_parity.sh` | 6/6 byte-identical assertions pass |
+
+**Key insight for counsel:** the audit chain is the BIPA-defensible
+substrate. Every state-changing event (consent given, photo uploaded,
+photo erased, legal-tier read) appends to a per-subject HMAC-chained
+JSONL log. The chain verifies end-to-end on every legal-tier read.
+A tampered chain is detectable; a forged chain requires the HMAC
+signing key, which is held under root-only mode 0400 at
+`/etc/lakehouse/subject_audit.key` and rotated per the runbook in
+Document E below.
+
+**Gate 3b (deepface classification) — decided 2026-05-05: Option C
+(defer).** The system collects only the photograph, not derived
+demographic information. The consent template + retention schedule
+in this packet were revised the same day to match.
+
+---
+
+## 2. Documents requiring counsel review + sign-off
+
+In recommended review order:
+
+| # | Document | Path | Counsel ask | Sign-off |
+|---|---|---|---|---|
+| A | Biometric Retention Schedule v1 | `docs/policies/consent/biometric_retention_schedule_v1.md` | Render into binding language; confirm 18-month operational ceiling vs. BIPA 3-year statutory cap | Counsel + J |
+| B | Biometric Consent Template v1 | `docs/policies/consent/biometric_consent_template_v1.md` | Render Disclosures 1-3 into binding consent language; specify electronic vs.
paper signature mechanism | Counsel + J | +| C | BIPA Destruction Runbook | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` | Confirm 30-day SLA from trigger; confirm two-operator (operator + witness) requirement; confirm legal-hold check procedure | Counsel attestation | +| D | BIPA Pre-IdentityD Attestation | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` | Sign as countersigning party; J signs as operator-of-record | Counsel + J | +| E | Legal-Tier Audit Key Rotation | `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` | Confirm rotation cadence; opine on candidate-notification obligation when rotation is compromise-driven | Counsel notes | +| F | Gate 3b Deepface Design (FYI) | `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` | Decision-of-record showing classifications were *deliberately deferred*, not omitted by oversight. No sign-off needed; provided for audit-trail completeness. | None | + +The five documents requiring sign-off are A, B, C, D, E. Document F +is included so the audit trail shows the Gate 3b decision was +deliberate. + +--- + +## 3. Specific questions for counsel — by document + +### Document A — Retention Schedule + +1. The schedule sets an **18-month** operational ceiling against the + BIPA 3-year statutory cap. Is the safety margin appropriate, or + should we move to a tighter window (12 months) given the + plaintiff-friendly Illinois posture? +2. The schedule references the **catalogd-local** storage substrate + rather than a separate identityd Postgres table. Does the + public-facing language need to mention the storage architecture + at all, or is "we keep the photo and a SHA-256 hash" sufficient? +3. Public publication URL — counsel to specify (placeholder marked + in §7 of the schedule). +4. Confirm whether existing consent under v1 carries forward when + a future v2 is published, or whether re-consent is required. + +### Document B — Consent Template + +1. Disclosure 1 says "we do NOT run automated facial-classification + in v1." Does that disclosure need to mention the *possibility* of + future classification, or is silence-with-supersession-clause + adequate? +2. Plain-language summary in §1 — counsel to confirm it's appropriate + to include alongside the binding disclosure, or recommend an + alternative comprehension aid. +3. Withdrawal SLA is set to **30 days** in §2. Counsel to confirm + against jurisdiction (Illinois primary; secondary deployments + would inherit). +4. Contact for withdrawal — counsel to specify the channel + (placeholder in §3). +5. Sign-off mechanism: electronic signature service, in-app + click-acceptance with timestamp, paper form? Each has different + evidentiary weight. + +### Document C — Destruction Runbook + +1. Confirm 30-day SLA from each of four triggers (retention expiry, + consent withdrawal, RTBF, court order). Some interpretations + prefer 7 or 14 days for withdrawal/RTBF. +2. Two-operator requirement (operator-of-record + witness): is the + witness role acceptable for counsel's defensibility view, or + should we elevate to dual-control with cryptographic split-key? +3. Legal-hold check procedure (§2 step 3) — counsel to specify the + actual procedure for confirming no hold is in force before + erasing. +4. Backup-window disclosure (§4) — confirm 30-day backup retention + is acceptable. +5. Candidate notification template (§3 step 4) — counsel to supply. + +### Document D — Pre-IdentityD Attestation + +1. Both signature lines blank — J signs as operator-of-record; + counsel signs as the countersigning legal party. +2. 
The attestation hash anchors the evidence; once signed, the
+   hash itself becomes a tamper-evident witness. Counsel to confirm
+   storage location for the signed copy (firm files?).
+
+### Document E — Key Rotation Runbook
+
+1. Recommended rotation cadence — 90 days suggested in §1.
+   Counsel to confirm or override.
+2. Custody schedule for `/etc/lakehouse/_archived/` raw key files —
+   §7.2 question; suggested 1-year retention but counsel-driven.
+3. Candidate-notification obligation when rotation is
+   compromise-driven (§7.3) — counsel call.
+
+---
+
+## 4. Engineering changes counsel should know about (recent)
+
+These reconciled doc/code drift after a rapid wave on 2026-05-03:
+
+- **Endpoint paths:** the original v1 spec proposed
+  `/v1/identity/subjects/*` under a separate identityd daemon. That
+  daemon was collapsed into catalogd; endpoints actually shipped at
+  `/biometric/subject/*` (catalogd-local). Documents in this packet
+  reference the catalogd-local routes; legacy references in
+  `IDENTITY_SERVICE_DESIGN.md` are flagged "do NOT implement
+  as-written" in that doc's deprecation header.
+- **No identityd Postgres database:** the original spec proposed
+  encrypted-at-rest Postgres + HashiCorp Vault + S3 Object Lock for
+  PII storage. The shipped substrate is local JSON manifests
+  + per-subject HMAC-chained JSONL, sized for J's local-only
+  deployment per `PRD.md` line 70 ("Everything runs locally — no
+  cloud APIs").
+- **Gate 3b deferral (Option C, 2026-05-05):** classifications
+  (gender / race / age inference) were deliberately deferred. The
+  consent template and retention schedule in this packet do NOT
+  disclose collection of derived demographic data, because we are
+  not collecting it. If a future product requirement reverses this,
+  we will publish a v2 consent + v2 retention with re-consent.
+- **Key rotation 2026-05-05:** the prior `LH_SUBJECT_AUDIT_KEY` was
+  lost in a `/tmp` wipe on reboot, which disabled the audit and
+  biometric endpoints until the key was regenerated. The new key is
+  at `/etc/lakehouse/subject_audit.key` (mode 0400). Pre-rotation
+  audit chains tamper-detect under the new key — this is correct,
+  expected behavior, not a bug.
+
+---
+
+## 5. Open eng items NOT awaiting counsel
+
+For transparency. These are engineering work items, not legal items:
+
+1. **Residual photo file found during erasure verification —
+   RESOLVED.** During verification of the one historical erasure
+   event (`WORKER-2`), the verify script surfaced a stranded photo
+   file. Same-day investigation traced the root cause to the upload
+   handler, not the erasure handler: a second upload silently
+   overwrote the manifest's `data_path` and orphaned the first file.
+   The handler now refuses a re-upload with HTTP 409
+   (`biometric_already_collected`) until an explicit erase. This
+   does NOT affect the current packet — no real candidate photos
+   have been collected yet (per §1 attestation); the stranded file
+   was a 13-byte synthetic test fixture with no PII or biometric
+   content, and it has been removed.
+2. **Phase 1.6 §3 employee training.** Currently deferred per
+   acknowledgement coverage in §7 of the destruction runbook
+   (single-operator population). Re-promotes to blocking if the
+   operator population grows; counsel may want to opine on the
+   threshold.
+
+---
+
+## 6. Sign-off sequence
+
+Recommended order so a hold-up on one doc doesn't block others:
+
+1. **First wave (parallel):** A (retention schedule) + B (consent
+   template). These two have the tightest interdependence (consent
+   v1 references retention v1 by hash); review them together.
+2. **Second wave:** C (destruction runbook). Depends on A's retention
+   period being fixed.
+3. **Third wave:** D (pre-identityd attestation).
Sign once A + B + C
+   are settled; the attestation snapshot is the boundary between
+   pre-Phase-1.6 and post-Phase-1.6 system state.
+4. **Fourth wave:** E (key rotation). Independent of A-D; can be
+   reviewed in parallel any time.
+
+---
+
+## 7. After sign-off — engineering steps
+
+Once each document is signed:
+
+| Document | Engineering action | Trigger |
+|---|---|---|
+| A retention schedule | Hash + commit; reference in `consent_versions` table | Counsel signature |
+| B consent template | Hash + commit; reference in candidate-facing intake UI | Counsel signature |
+| C destruction runbook | Adopt; operator acknowledgment recorded in §7 | Counsel attestation |
+| D pre-identityd attestation | Anchor hash to filesystem + git; counsel keeps original signature | Both signatures |
+| E key rotation | Adopt; rotation event log seeded with counsel-approved cadence | Counsel notes |
+
+The HARD blocker for first real-candidate photo collection is
+A + B + D signed. C and E are operationally important but do not
+block the *first* photo (they govern destruction + key handling
+which apply to any state, not the boundary state).
+
+---
+
+## 8. Cover-note hash
+
+This packet is itself a snapshot. Future-Claude / future-J will refer
+back to this packet to know what counsel saw on 2026-05-05.
+
+**Packet attached files (referenced by path):**
+
+- `docs/policies/consent/biometric_retention_schedule_v1.md`
+- `docs/policies/consent/biometric_consent_template_v1.md`
+- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
+- `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md`
+- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`
+- `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`
+- `docs/PHASE_1_6_BIPA_GATES.md` (the spec they all reference)
+
+Per-file SHA-256 hashes are produced by the bundler script (next
+section); the bundler also creates a tarball ready for transmission.
+
+---
+
+## 9. Generating the bundle for transmission
+
+```bash
+./scripts/staffing/bundle_counsel_packet.sh
+```
+
+Produces `reports/counsel/counsel_packet_<YYYY-MM-DD>.tar.gz` with all
+referenced documents + a manifest listing per-file SHA-256 hashes.
+Counsel can verify file integrity on receipt by re-running
+sha256sum against each file in the tarball.
diff --git a/docs/policies/consent/biometric_consent_template_v1.md b/docs/policies/consent/biometric_consent_template_v1.md
index 5753257..445097e 100644
--- a/docs/policies/consent/biometric_consent_template_v1.md
+++ b/docs/policies/consent/biometric_consent_template_v1.md
@@ -3,6 +3,7 @@
 **Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 2 (BIPA §15(b)(1)-(3))
 **Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before deployment
 **Version:** v1 (initial; supersession requires a new version + new hash)
+**Updated 2026-05-05:** Disclosure 1 + plain-language summary revised to match the Gate 3b decision in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications, DECIDED 2026-05-05). If a future Gate 3b revisit picks Option A or B, this template needs counsel re-authoring under a superseding version.
 
 > This is the consent template a candidate signs (electronically or
 > on paper) BEFORE Lakehouse collects, stores, or processes any
@@ -25,11 +26,12 @@ content; counsel provides the legally-sufficient wording.
 
 ### Disclosure 1 — Notice of collection (§15(b)(1))
 
-Lakehouse will collect, store, and use my **biometric identifier**
-(facial geometry derived from a photograph of me) and **biometric
-information** (gender, race, and age classifications derived from
-that photograph by an automated facial-classification model called
-deepface).
+Lakehouse will collect and store my **biometric identifier** (a
+photograph of me from which facial geometry is implicit). The
+photograph itself is the data we keep — we do NOT run automated
+facial-classification (gender / race / age inference) against it
+in v1. If at a later date we add automated classification, we will
+re-collect consent under a superseding template before doing so.
 
 ### Disclosure 2 — Specific purpose and length of term (§15(b)(2))
 
@@ -66,9 +68,9 @@ is appropriate to include or whether a different plain-language section is preferred.
 
 > **What you're agreeing to:** if you upload a photo of yourself,
-> we'll keep that photo and a few descriptive labels about the photo
-> (estimated age, perceived gender, perceived race) to help your
-> staffing coordinator recognize you when you arrive at job sites.
+> we'll keep that photo so your staffing coordinator can recognize
+> you when you arrive at job sites. We don't run automated guesses
+> about your age, gender, or race against the photo.
 >
 > **How long we keep it:** at most 18 months after your last
 > placement or interaction with us, then it's permanently destroyed.
diff --git a/docs/policies/consent/biometric_retention_schedule_v1.md b/docs/policies/consent/biometric_retention_schedule_v1.md
index 5c08b20..6a5cdd4 100644
--- a/docs/policies/consent/biometric_retention_schedule_v1.md
+++ b/docs/policies/consent/biometric_retention_schedule_v1.md
@@ -3,6 +3,7 @@
 **Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 1 (BIPA §15(a))
 **Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before public publication
 **Version:** v1 (initial; supersession requires a new version + new hash)
+**Updated 2026-05-05:** §1 + §2 revised to match the Gate 3b decision in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications, DECIDED 2026-05-05). §5 destruction-trigger endpoint corrected to the shipped catalogd-local route.
 
 > This is a publicly-available retention schedule for biometric identifiers
 > and biometric information collected by the Lakehouse staffing platform.
 
@@ -15,12 +16,15 @@
 
 This schedule applies to:
 
-- **Biometric identifiers** as defined in 740 ILCS 14/10: facial geometry
-  derived from candidate photographs.
+- **Biometric identifiers** as defined in 740 ILCS 14/10: candidate
+  photographs from which facial geometry is implicit.
 - **Biometric information** as defined in 740 ILCS 14/10: any information
-  derived from a biometric identifier, including but not limited to
-  the gender, race, and age classifications produced by the deepface
-  model when applied to a candidate photograph.
+  derived from a biometric identifier. **In v1 of this schedule, no
+  derived information is collected** — automated facial-classification
+  (gender, race, age inference) is deferred per
+  `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` Option C. If a future version
+  of this schedule introduces classification, that is a superseding
+  v2 schedule with re-consent under the matching v2 consent template.
**Out of scope** (explicitly NOT biometric data under this schedule):
 
@@ -39,14 +43,15 @@ This schedule applies to:
 
 | Category | Source | Storage location |
 |---|---|---|
-| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/{candidate_id}/{ts}.{ext}`; encrypted at rest |
-| Facial geometry classifications | deepface inference run against the photograph | `subjects.biometric_classifications` (JSONB on the identityd `subjects` row) |
-| Photograph integrity hash | SHA-256 of the original bytes | `subjects.biometric_template_hash` |
+| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/{candidate_id}/{ts}_{hash}.{ext}`; mode 0700 dir / 0600 file |
+| Photograph integrity hash | SHA-256 of the original bytes | `SubjectManifest.biometric_collection.template_hash` (catalogd JSON manifest at `data/_catalog/subjects/{candidate_id}.json`) |
 
 We do NOT collect raw biometric template vectors that could be used
-to re-derive a face from the encoded form. The deepface output is
-stored as discrete classification labels (e.g. `{"age_estimate": 32,
-"gender": "...", "race": "..."}`), not as a re-identifiable embedding.
+to re-derive a face from the encoded form. We do NOT run automated
+facial-classification (gender, race, age inference) in v1 — see
+`docs/specs/GATE_3B_DEEPFACE_DESIGN.md` for the deferral rationale.
+The `BiometricCollection.classifications` field on the subject
+manifest exists in the schema but is `None` for every subject.
 
 ---
 
@@ -104,8 +109,8 @@ Runbook** (`docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`) when:
 
 - Retention period under §4 expires
 - Candidate withdraws biometric consent under the consent template (Gate 2)
 - Candidate exercises a right-to-be-forgotten request
-- An identityd `POST /v1/identity/subjects/{id}/erase` is invoked under
-  legal-tier authentication
+- A catalogd-local `POST /biometric/subject/{id}/erase` is invoked
+  under legal-tier authentication (shipped `848a458`)
 
 Every destruction event is recorded as an append-only audit row in
 the affected subject's per-subject HMAC-chained audit log (see
diff --git a/docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md b/docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md
index 6062e8e..2a1fe1d 100644
--- a/docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md
+++ b/docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md
@@ -66,10 +66,11 @@ Before initiating destruction, the operator MUST:
 
 Invoke the legal-tier erasure endpoint:
 
```bash
-curl -sf -X POST "http://localhost:3100/v1/identity/subjects/${CANDIDATE_ID}/erase" \
+curl -sf -X POST "http://localhost:3100/biometric/subject/${CANDIDATE_ID}/erase" \
 -H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
 -H "Content-Type: application/json" \
 -d '{
+  "scope": "biometric_only|full",
   "trigger": "retention_expiry|consent_withdrawal|rtbf|court_order",
   "trigger_evidence_path": "<path to trigger evidence>",
   "operator_of_record": "<operator name>"
 }'
```
 
-⚖ ENGINEERING — `POST /v1/identity/subjects/{id}/erase` is Phase 1.6
-Gate 3 dependent. Until it ships, the manual procedure is:
+The endpoint is **shipped** (commit `848a458`, 21 unit tests). It is
+served from catalogd-local at `/biometric/subject/{id}/erase` (the
+original v1 spec proposed `/v1/identity/subjects/{id}/erase` under a
+separate identityd daemon — that daemon was collapsed into catalogd
+per the architecture pivot).
 
-a.
Set `SubjectManifest.consent.biometric.status = "withdrawn"` and - `SubjectManifest.status = "erased"` via direct registry write - (operator-of-record only). -b. Securely overwrite + unlink the quarantined photo path: - `shred -uvz data/biometric/uploads/${CANDIDATE_ID}/*.jpg` - (or equivalent for the configured backend). -c. NULL the deepface classification fields on the subject row. -d. Append the destruction-event audit row (Step 2 below). +The endpoint exposes two scopes: + +- **`scope: "biometric_only"`** — clears `BiometricCollection` from + the SubjectManifest (drops `data_path`, `template_hash`, and + `classifications` together) + securely unlinks the quarantined + photo file. Subject manifest itself remains. Use for retention + expiry / consent withdrawal where only biometric data must go. +- **`scope: "full"`** — full subject erasure (manifest + biometric + files). Use for court-ordered erasure or full RTBF requests. + +In both scopes, the audit row is appended BEFORE photo unlink so +the chain has legal proof of intent even if the file delete fails +(transactional rollback on audit failure). ### Step 2 — Append the destruction-event audit row @@ -224,5 +233,12 @@ training program. ## 8. Change log +- 2026-05-05 — Endpoint path reconciled with shipped state: + `/v1/identity/subjects/{id}/erase` (legacy proposal under a + separate identityd daemon) → `/biometric/subject/{id}/erase` + (catalogd-local, shipped `848a458`). Step 1 manual-fallback + block removed (the endpoint is no longer "TODO"). Two-scope + body shape (`biometric_only` / `full`) documented to match + the implementation. - 2026-05-03 — Initial scaffold. ⚖ COUNSEL review required before adoption. diff --git a/docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md b/docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md new file mode 100644 index 0000000..60dc37c --- /dev/null +++ b/docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md @@ -0,0 +1,308 @@ +# Legal-Tier Audit Key & Token Rotation Runbook + +**Spec companion:** `docs/PHASE_1_6_BIPA_GATES.md` §2 + `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` +**Audience:** Operators with root on the gateway host (J + named operators) +**Status:** Engineering-authored — ⚖ counsel review encouraged before formal adoption + +> This runbook covers rotation of the two crypto-credentials that gate +> the Phase 1.6 audit substrate: +> +> 1. **`LH_SUBJECT_AUDIT_KEY`** — the 32-byte HMAC-SHA256 signing key +> that chains every per-subject audit row. If this key changes, all +> pre-rotation chain rows tamper-detect under the new key. That is +> correct, expected, BIPA-defensible behavior — the chain integrity +> it provided pre-rotation remains intact in the archive of the old +> key, and post-rotation chains remain intact going forward. +> +> 2. **`LH_LEGAL_AUDIT_TOKEN`** — the 32+-character bearer token that +> authorizes calls to `/audit/subject/{id}` and +> `/biometric/subject/{id}/erase`. Rotation does NOT touch any audit +> history; only access to the legal-tier endpoints flips. +> +> Both live at `/etc/lakehouse/` (mode 0400, owned by root) and are +> loaded by the gateway via systemd `Environment=` directives in +> `/etc/systemd/system/lakehouse.service.d/audit_env.conf`. They are +> NOT loaded from `/tmp` — a 2026-05-05 reboot incident wiped a +> `/tmp`-resident key and caused `/audit` + `/biometric` to fail-closed +> (which is what they should do); the rotation fix moved them to the +> persistent path. + +--- + +## 1. 
When to rotate + +Rotate **immediately** when any of the following is true: + +| Trigger | Urgency | Notes | +|---|---|---| +| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. | +| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. | +| Key/token file's filesystem permissions were ever weakened (mode > 0400, group readable, etc.) | Within 24 hours | Filesystem audit may have leaked the bytes. | +| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. | +| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. | + +Do **not** rotate when: + +- A subject's audit chain tamper-detects in isolation. That is normal + if the audit log was edited (which would itself be the BIPA finding, + not the key). Investigate the chain, not the key. +- Cross-runtime parity drift appears. That's an HMAC-input-shape bug + (Go vs Rust serialization), not a key issue. See + `STATE_OF_PLAY.md` "three runtime-divergence classes" entry. + +--- + +## 2. Pre-rotation checks (5 minutes) + +Before generating new credentials, capture a clean baseline so you can +prove the rotation cause and sequence afterward. + +### 2.1. Take the engineering snapshot + +```bash +# Confirm the canonical files exist with correct permissions. +ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token + +# Hash the existing key + token (NEVER the bytes themselves) so the +# old credential is identifiable in retrospect without storing it. +sha256sum /etc/lakehouse/subject_audit.key +sha256sum /etc/lakehouse/legal_audit.token + +# Confirm the gateway is currently using these files. +sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT" + +# Verify the audit endpoint is healthy with the current credentials. +curl -sf http://localhost:3100/audit/health +``` + +If `/audit/health` is already 503, the rotation is **recovery**, not +preventive — note this in the rotation event record (§5). + +### 2.2. Capture a known-good chain root + +Pick one or two subjects with non-empty audit logs and record their +chain roots **under the current key**: + +```bash +TOKEN=$(cat /etc/lakehouse/legal_audit.token) +for cid in WORKER-2 WORKER-100; do + curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \ + "http://localhost:3100/audit/subject/$cid" \ + | jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}' +done +``` + +Save the output. Post-rotation, those chains will tamper-detect under +the new key — that is **expected** and the saved snapshot is the proof +that the chain WAS intact under the old key, before rotation. + +--- + +## 3. Generation + rotation + +### 3.1. Generate the new key + +```bash +# 32 random bytes as hex = 64 chars. Either format works for HMAC-SHA256; +# we follow the existing convention (44-char base64-ish with no padding). +sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \ + /etc/lakehouse/subject_audit.key.new + +sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \ + /etc/lakehouse/legal_audit.token.new + +# Sanity: confirm 44-char content + correct mode. 
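+# (Arithmetic: openssl's 33 random bytes base64-encode to exactly 44
+# chars with no '=' padding — comfortably above the 32-char minimum
+# enforced at load; see the audit_endpoint.rs:73 reference below.)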
+sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new +sudo ls -la /etc/lakehouse/*.new +``` + +Both must be `mode 0400`, owned by root, exactly **44 chars** (the +audit endpoint refuses tokens shorter than 32 chars at load — see +`crates/catalogd/src/audit_endpoint.rs:73`). + +### 3.2. Atomic swap + +The gateway reads these files **once at boot** (per +`crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new` and +the equivalent for the writer). Atomic mv → restart is required. + +```bash +# Move the old credentials to a quarantine path with timestamp so the +# old hashes remain identifiable post-rotation. +TS=$(date -u +%Y%m%dT%H%M%SZ) +sudo mkdir -p /etc/lakehouse/_archived +sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived + +sudo mv /etc/lakehouse/subject_audit.key /etc/lakehouse/_archived/subject_audit.key.$TS +sudo mv /etc/lakehouse/legal_audit.token /etc/lakehouse/_archived/legal_audit.token.$TS + +sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key +sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token + +sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token +``` + +### 3.3. Restart the gateway + +```bash +sudo systemctl restart lakehouse.service +sleep 2 +sudo systemctl status lakehouse.service --no-pager | head -10 +``` + +Wait for the gateway to bind port 3100 cleanly. If it doesn't, check +`journalctl -u lakehouse.service -n 50 --no-pager` for the failure +mode — the most common cause is the new file having wrong mode/owner. + +--- + +## 4. Post-rotation verification (5 minutes) + +### 4.1. Health probes + +```bash +# Audit endpoint must be 200, not 503. +curl -sf http://localhost:3100/audit/health +# Expect: "audit endpoint ready" + +# /v1/health must list the gateway's full provider set. +curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count' +``` + +### 4.2. Confirm the new token works + +```bash +NEW_TOKEN=$(cat /etc/lakehouse/legal_audit.token) +curl -sS -o /dev/null -w '%{http_code}\n' \ + -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \ + http://localhost:3100/audit/subject/WORKER-100 +# Expect: 200 +``` + +If 401, the file the gateway loaded does NOT match the file you wrote. +Check ownership / mode / for trailing whitespace differences with +`hexdump -C /etc/lakehouse/legal_audit.token | head`. + +### 4.3. Confirm the new chain works + +Append-only chains are key-tied. Any *new* audit row written +post-rotation is signed under the new key and verifies cleanly: + +```bash +# Issue a /v1/validate call against any worker — it spawns an audit row. +curl -sf -X POST http://localhost:3100/v1/validate \ + -H 'Content-Type: application/json' \ + -d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null + +# Read the chain back. Last row must verify under the new key. +curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \ + http://localhost:3100/audit/subject/WORKER-100 \ +| jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}' +``` + +`chain_verified: true` confirms the new key is signing + verifying. + +### 4.4. 
+
+```bash
+curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
+  http://localhost:3100/audit/subject/WORKER-2 \
+| jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
+```
+
+For any subject whose chain was written under the old key, this
+returns `chain_verified: false` with an HMAC-mismatch error. **This
+is correct behavior**, not a bug. The old chain was correctly signed
+under the old key and verified under it; the new key cannot
+retroactively verify rows it didn't sign. The pre-rotation snapshot
+you captured in §2.2 is the defensible proof that those rows WERE
+valid pre-rotation.
+
+If, instead, a chain that *should* verify post-rotation returns
+`verified: false`, the rotation has gone wrong — most likely an
+old-key file that didn't get archived cleanly. Restore from
+`/etc/lakehouse/_archived/`, then re-attempt.
+
+---
+
+## 5. Record the rotation event
+
+Append a row to the rotation log:
+
+```bash
+sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
+{"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","operator":"<name>","reason":"<§1 trigger>","old_key_sha256":"<paste from §2.1>","new_key_sha256":"$(sudo sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<paste from §2.1>","new_token_sha256":"$(sudo sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<name>"}
+EOF
+
+sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
+sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
+```
+
+This file is the operator-side record of when the key changed and why.
+It does NOT contain the key itself — only hashes — so it is safe to
+back up and share with counsel on request.
+
+---
+
+## 6. Recovery from a lost key
+
+If the active `subject_audit.key` is destroyed (filesystem corruption,
+accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
+fails closed at startup:
+
+- `/audit/subject/{id}` → 503 ("audit endpoint disabled (legal token missing)" or equivalent for the signing key)
+- `/biometric/subject/{id}/photo` → 503 (same fail-closed posture)
+
+This is correct behavior — a server that cannot HMAC-sign new audit
+rows must not accept new biometric writes.
+
+**Recovery is rotation.** Generate a new key per §3.1, atomic-swap
+per §3.2, restart per §3.3, verify per §4. Pre-loss chains tamper-detect
+under the new key (the old key is gone — there is no way to verify
+them). Treat the loss event as the BIPA-defensible boundary: pre-loss
+chain verification was provided by the working key; post-loss new
+chains are signed under the new key.
+
+If a counsel-grade attestation of the pre-loss chains is needed,
+`rotation_log.jsonl` under `/etc/lakehouse/_archived/` holds the
+historical hashes; combined with the cross-runtime parity probe (the
+Go reader gives the same byte-identical view as Rust), the pre-loss
+chain history is preservable as long as the on-disk JSONL files were
+not also lost.
+
+---
+
+## 7. ⚖ counsel notes
+
+These are areas where counsel may want to opine before this runbook
+is formally adopted:
+
+1. **Rotation cadence.** BIPA itself does not require periodic rotation;
+   counsel may set a 90-day schedule to satisfy a separate compliance
+   posture (SOC2, internal policy).
+2. **Custody of `/etc/lakehouse/_archived/`.** The archived hashes do
+   NOT contain the keys, but the archived raw key files DO. Counsel
+   may want a more aggressive destruction schedule for the raw archived
+   keys — say 1 year — to reduce a long-tail compromise surface (a
+   purge sketch follows this list).
+3. **Notification obligations on rotation due to compromise.** §1's
+   triggers mandate the rotation itself; they do not address whether
+   candidates whose biometric data was protected by the compromised
+   key must be notified. This is a counsel call.
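+
+If counsel adopts such a schedule, the purge is small enough to sketch
+here — a minimal sketch assuming the 1-year figure from note 2 and the
+`$TS`-suffixed filenames from §3.2 (cadence and tooling are counsel/ops
+calls, not fixed by this runbook):
+
+```bash
+# Destroy archived raw key/token files older than 365 days. The sha256
+# hashes in rotation_log.jsonl are untouched, so rotated-out credentials
+# stay identifiable in the history after their bytes are gone.
+# (shred is best-effort on journaling/CoW filesystems; rm -f is the floor.)
+sudo find /etc/lakehouse/_archived -maxdepth 1 -type f \
+  \( -name 'subject_audit.key.*' -o -name 'legal_audit.token.*' \) \
+  -mtime +365 -print -exec shred -u {} \;
+```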
+
+---
+
+## 8. Operator acknowledgment
+
+| Operator | Date acknowledged | Signature |
+|---|---|---|
+| J | _____ | _______________ |
+| _____ | _____ | _______________ |
+
+---
+
+## 9. Change log
+
+- 2026-05-05 — Initial runbook authored after the /tmp wipe incident
+  on the same day (key was at `/tmp/subject_audit.key` and was deleted
+  on reboot, disabling `/audit` + `/biometric` until the key was
+  regenerated at `/etc/lakehouse/subject_audit.key`). Recovering from
+  that incident produced a working procedure; this runbook captures
+  it as the canonical playbook for any future rotation.
diff --git a/docs/specs/GATE_3B_DEEPFACE_DESIGN.md b/docs/specs/GATE_3B_DEEPFACE_DESIGN.md
index 17b2437..a105afd 100644
--- a/docs/specs/GATE_3B_DEEPFACE_DESIGN.md
+++ b/docs/specs/GATE_3B_DEEPFACE_DESIGN.md
@@ -1,6 +1,8 @@
 # Gate 3b — Deepface Classification Integration (Design)
 
-**Status:** Design draft — 2026-05-03 morning · **Companion to:** [`PHASE_1_6_BIPA_GATES.md`](../PHASE_1_6_BIPA_GATES.md) Gate 3 · **Depends on:** Gate 3a (photo upload) which is shipped (`f1fa6e4`)
+**Status:** **DECIDED 2026-05-05 — Option C (defer classifications)** · Original design draft 2026-05-03 morning · **Companion to:** [`PHASE_1_6_BIPA_GATES.md`](../PHASE_1_6_BIPA_GATES.md) Gate 3 · **Depends on:** Gate 3a (photo upload) which is shipped (`f1fa6e4`)
+
+> **Decision summary (2026-05-05):** J accepted Option C. `BiometricCollection.classifications` remains `Option = None` in v1. The consent template and retention schedule were revised the same day to remove all "automated facial-classification" language so the disclosed scope matches the implemented scope. If a real product requirement for classifications surfaces later, this doc's Option A (Python subprocess) or Option B (ONNX-in-Rust) is picked up under a v2 consent template + v2 retention schedule.
 
 > **What this is.** Three options for how `BiometricCollection.classifications` (currently `Option`, always `None`) gets populated by an automated facial-attribute classifier. Phase 1.6 Gate 3a ships the consent-gated upload + audit chain + transactional rollback; Gate 3b adds the classification step. The substrate is ready — what's missing is the design choice for HOW classification happens.
 >
@@ -152,6 +154,8 @@ Reasoning:
 
 ⚖ J — pick A / B / C. The substrate accommodates any choice; the cost is the design-doc → counsel-coordination → engineering loop, which differs by an order of magnitude across the options.
 
+**[2026-05-05] J's decision: Option C.** Reasoning recorded in change log below. Consent + retention doc revisions for Option C shipped same day; counsel review of revised text is the remaining work.
+
 ---
 
 ## Open questions for J
diff --git a/scripts/staffing/biometric_destruction_report.sh b/scripts/staffing/biometric_destruction_report.sh
new file mode 100755
index 0000000..9e63894
--- /dev/null
+++ b/scripts/staffing/biometric_destruction_report.sh
@@ -0,0 +1,263 @@
+#!/usr/bin/env bash
+# biometric_destruction_report — monthly destruction event aggregation.
+#
+# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5.
+# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
+#
+# Why this exists: counsel and operations review need a periodic
+# attestation that destructions have happened at a defensible cadence.
+# This script produces an anonymized monthly report aggregating
+# per-subject audit logs.
+#
+# Output is anonymized — counts, timings, scope/trigger breakdowns,
+# and chain attestations. Candidate IDs are hashed (sha256-prefix) so
+# the report can be shared with counsel without exposing identifiers.
+#
+# Usage:
+#   biometric_destruction_report.sh \
+#     [--month YYYY-MM] \
+#     [--audit-dir data/_catalog/subjects] \
+#     [--output reports/biometric/destruction_<YYYY-MM>.md]
+#
+# Defaults:
+#   --month     — current UTC month (YYYY-MM)
+#   --audit-dir — data/_catalog/subjects
+#   --output    — reports/biometric/destruction_<YYYY-MM>.md
+#
+# Exit codes:
+#   0 — report written successfully (whether or not events were found)
+#   1 — report written but with anomalies that need review
+#   2 — script error (missing tools, unreadable audit dir)
+
+set -uo pipefail
+cd "$(dirname "$0")/../.."
+
+MONTH=""
+AUDIT_DIR="data/_catalog/subjects"
+OUT=""
+
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --month) MONTH="$2"; shift 2 ;;
+    --audit-dir) AUDIT_DIR="$2"; shift 2 ;;
+    --output) OUT="$2"; shift 2 ;;
+    -h|--help)
+      sed -n '2,30p' "$0" | sed 's/^# \?//'
+      exit 0 ;;
+    *) echo "unknown flag: $1" >&2; exit 2 ;;
+  esac
+done
+
+# Default month = current UTC YYYY-MM. Validate format defensively
+# so a malformed --month value (e.g. "May 2026") doesn't silently
+# match nothing in the JSONL filter.
+if [ -z "$MONTH" ]; then
+  MONTH=$(date -u +%Y-%m)
+fi
+if ! echo "$MONTH" | grep -qE '^[0-9]{4}-(0[1-9]|1[0-2])$'; then
+  echo "[report] FAIL: --month must be YYYY-MM, got '$MONTH'" >&2
+  exit 2
+fi
+
+if [ -z "$OUT" ]; then
+  OUT="reports/biometric/destruction_${MONTH}.md"
+fi
+
+# Dependency gates.
+for cmd in jq sha256sum; do
+  if ! command -v "$cmd" >/dev/null 2>&1; then
+    echo "[report] FAIL: required tool '$cmd' not found in PATH" >&2
+    exit 2
+  fi
+done
+
+if [ ! -d "$AUDIT_DIR" ]; then
+  echo "[report] FAIL: audit dir not found at $AUDIT_DIR" >&2
+  exit 2
+fi
+
+mkdir -p "$(dirname "$OUT")"
+
+# Aggregator storage.
+EVENTS=$(mktemp)
+ANOMALIES=$(mktemp)
+trap 'rm -f "$EVENTS" "$ANOMALIES"' EXIT
+
+# Iterate every per-subject audit log under AUDIT_DIR. Each file is
+# JSONL — one row per line. We extract erasure rows in the requested
+# month + emit a normalized one-line record per event.
+TOTAL_FILES=0
+TOTAL_ROWS_SCANNED=0
+SHARDS_WITH_EVENTS=0
+
+for f in "$AUDIT_DIR"/*.audit.jsonl; do
+  [ -e "$f" ] || continue
+  TOTAL_FILES=$((TOTAL_FILES + 1))
+
+  # File-level row count (cheap).
+  ROWS=$(wc -l < "$f" 2>/dev/null || echo 0)
+  TOTAL_ROWS_SCANNED=$((TOTAL_ROWS_SCANNED + ROWS))
+
+  # Filter rows for the month + erasure kinds.
+  HAD_EVENT=0
+  while IFS= read -r line; do
+    [ -n "$line" ] || continue
+    KIND=$(printf '%s' "$line" | jq -r '.accessor.kind // ""' 2>/dev/null || echo "")
+    case "$KIND" in
+      biometric_erasure|full_erasure) ;;
+      *) continue ;;
+    esac
+
+    TS=$(printf '%s' "$line" | jq -r '.ts // ""' 2>/dev/null || echo "")
+    case "$TS" in
+      "${MONTH}-"*) ;;  # only this month
+      *) continue ;;
+    esac
+
+    HAD_EVENT=1
+    CID=$(printf '%s' "$line" | jq -r '.candidate_id // ""' 2>/dev/null || echo "")
+    PURPOSE=$(printf '%s' "$line" | jq -r '.accessor.purpose // ""' 2>/dev/null || echo "")
+    RESULT=$(printf '%s' "$line" | jq -r '.result // ""' 2>/dev/null || echo "")
+    # accessor.purpose has shape "trigger=<trigger>;..." per biometric_endpoint
+    TRIGGER=$(printf '%s' "$PURPOSE" | sed -nE 's/.*trigger=([a-z_]+).*/\1/p')
+    [ -n "$TRIGGER" ] || TRIGGER="unknown"
+
+    # Hash candidate_id so the report stays anonymized.
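+    # (sha256 is deterministic, so the same candidate hashes identically
+    # in every monthly report, and 12 hex chars = 48 bits — ample to keep
+    # collisions negligible at this event volume.)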
+ CID_HASH=$(printf '%s' "$CID" | sha256sum | awk '{print substr($1,1,12)}') + + # Anomaly: erasure row but result not in {erased, success}. + case "$RESULT" in + erased|success) ;; + *) + echo " - candidate_hash=$CID_HASH ts=$TS kind=$KIND result=$RESULT trigger=$TRIGGER (unexpected result)" >> "$ANOMALIES" + ;; + esac + + # Tab-separated event line: ts, kind, trigger, result, cid_hash + printf '%s\t%s\t%s\t%s\t%s\n' "$TS" "$KIND" "$TRIGGER" "$RESULT" "$CID_HASH" >> "$EVENTS" + done < "$f" + + if [ "$HAD_EVENT" = "1" ]; then + SHARDS_WITH_EVENTS=$((SHARDS_WITH_EVENTS + 1)) + fi +done + +EVENT_COUNT=$(wc -l < "$EVENTS" 2>/dev/null || echo 0) +EVENT_COUNT=$(printf '%s' "$EVENT_COUNT" | tr -d '[:space:]') +: "${EVENT_COUNT:=0}" + +# Compute breakdowns. +COUNT_BIOMETRIC_ONLY=0 +COUNT_FULL=0 +if [ "$EVENT_COUNT" != "0" ]; then + COUNT_BIOMETRIC_ONLY=$(awk -F '\t' '$2=="biometric_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]') + COUNT_FULL=$(awk -F '\t' '$2=="full_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]') +fi + +ANOMALY_COUNT=$(wc -l < "$ANOMALIES" 2>/dev/null || echo 0) +ANOMALY_COUNT=$(printf '%s' "$ANOMALY_COUNT" | tr -d '[:space:]') +: "${ANOMALY_COUNT:=0}" + +# Render the report. +GENERATED_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ) + +{ + echo "# Biometric Destruction Report — $MONTH" + echo + echo "**Generated:** $GENERATED_AT" + echo "**Audit dir scanned:** \`$AUDIT_DIR\`" + echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5" + echo "**Generator:** scripts/staffing/biometric_destruction_report.sh" + echo + echo "## Scope" + echo + echo "- **Subject audit shards scanned:** $TOTAL_FILES" + echo "- **Audit rows scanned (all kinds):** $TOTAL_ROWS_SCANNED" + echo "- **Shards containing $MONTH erasure events:** $SHARDS_WITH_EVENTS" + echo + echo "## Destruction events in $MONTH" + echo + echo "- **Total events:** $EVENT_COUNT" + echo "- **By scope:**" + echo " - \`biometric_erasure\` (BiometricCollection cleared, manifest retained): $COUNT_BIOMETRIC_ONLY" + echo " - \`full_erasure\` (manifest + biometric data cleared): $COUNT_FULL" + echo + + if [ "$EVENT_COUNT" = "0" ]; then + echo "**No destruction events recorded for $MONTH.** This is correct" + echo "for a month with no retention expiries / withdrawal requests" + echo "/ RTBF requests / court orders." + echo + else + echo "### By trigger" + echo + echo "| Trigger | Count |" + echo "|---|---|" + awk -F '\t' '{print $3}' "$EVENTS" | sort | uniq -c | \ + sort -rn | awk '{ printf("| %s | %d |\n", $2, $1); }' + echo + echo "### Event detail (anonymized)" + echo + echo "Candidate IDs are hashed (sha256-12-prefix) so this report can" + echo "be shared with outside counsel without exposing identifiers." + echo + echo "| ts | kind | trigger | result | candidate_hash |" + echo "|---|---|---|---|---|" + sort -k1,1 "$EVENTS" | awk -F '\t' '{ + printf("| %s | %s | %s | %s | %s |\n", $1, $2, $3, $4, $5); + }' + echo + fi + + if [ "$ANOMALY_COUNT" != "0" ]; then + echo "## Anomalies ($ANOMALY_COUNT)" + echo + echo "Events whose audit row deviates from expected shape (kind/result" + echo "mismatch, missing trigger, etc.). These do NOT necessarily mean" + echo "the destruction failed — the BIPA-load-bearing surface is the" + echo "audit chain, which still verifies cryptographically. They are" + echo "logged here so an operator can investigate and confirm." 
+    echo
+    echo '```'
+    cat "$ANOMALIES"
+    echo '```'
+    echo
+  fi
+
+  echo "## Cryptographic attestation"
+  echo
+  echo "This report was produced by aggregating per-subject HMAC-chained"
+  echo "audit logs. The chain itself is the BIPA-defensible substrate;"
+  echo "this report is a derived view, not the chain of record. To verify"
+  echo "any individual event, run:"
+  echo
+  echo '```bash'
+  echo "./scripts/staffing/verify_biometric_erasure.sh <candidate_id>"
+  echo '```'
+  echo "(operators must map a candidate_hash back to its candidate ID via"
+  echo " their own operator log to perform spot-checks)."
+  echo
+  echo "**Cross-runtime parity:** the same audit logs are byte-identical"
+  echo "under Rust + Go (per scripts/cutover/parity/subject_audit_parity.sh)."
+  echo "If counsel needs cross-runtime attestation, that probe provides it."
+  echo
+
+  EVIDENCE_HASH=$(sha256sum "$EVENTS" 2>/dev/null | awk '{print $1}')
+  : "${EVIDENCE_HASH:=$(echo -n '' | sha256sum | awk '{print $1}')}"
+  echo "**Events SHA-256:** \`$EVIDENCE_HASH\`"
+  echo
+  echo "---"
+  echo
+  echo "**Operator (J):** _______________________________ Date: __________"
+  echo
+} > "$OUT"
+
+echo "[report] $EVENT_COUNT destruction events in $MONTH ($COUNT_BIOMETRIC_ONLY biometric_only, $COUNT_FULL full)"
+echo "[report] anomalies: $ANOMALY_COUNT"
+echo "[report] output: $OUT"
+
+# Exit 1 if anomalies present (review needed) but report still written.
+if [ "$ANOMALY_COUNT" != "0" ]; then
+  exit 1
+fi
+exit 0
diff --git a/scripts/staffing/bundle_counsel_packet.sh b/scripts/staffing/bundle_counsel_packet.sh
new file mode 100755
index 0000000..0e7a8fb
--- /dev/null
+++ b/scripts/staffing/bundle_counsel_packet.sh
@@ -0,0 +1,99 @@
+#!/usr/bin/env bash
+# bundle_counsel_packet — assemble the counsel-review packet tarball.
+#
+# Specification: docs/counsel/COUNSEL_REVIEW_PACKET_<YYYY-MM-DD>.md §9.
+#
+# Why this exists: the cover note references a list of documents.
+# Counsel needs them as a single transmittable artifact, with per-file
+# integrity hashes so they can verify nothing changed in transit.
+#
+# Output:
+#   reports/counsel/counsel_packet_<YYYY-MM-DD>.tar.gz
+#   reports/counsel/counsel_packet_<YYYY-MM-DD>.manifest.txt (sha256 per file)
+#
+# Usage:
+#   bundle_counsel_packet.sh [--date YYYY-MM-DD]
+#
+# Exit codes:
+#   0 — packet bundled successfully
+#   1 — one or more referenced documents are missing
+#   2 — script error (missing tools, write failure)
+
+set -uo pipefail
+cd "$(dirname "$0")/../.."
+
+DATE="$(date -u +%Y-%m-%d)"
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --date) DATE="$2"; shift 2 ;;
+    -h|--help)
+      sed -n '2,20p' "$0" | sed 's/^# \?//'
+      exit 0 ;;
+    *) echo "unknown flag: $1" >&2; exit 2 ;;
+  esac
+done
+
+# Dependency gate.
+for cmd in tar sha256sum; do
+  if ! command -v "$cmd" >/dev/null 2>&1; then
+    echo "[bundle] FAIL: required tool '$cmd' not found in PATH" >&2
+    exit 2
+  fi
+done
+
+# Files in the packet. Order is the recommended counsel-review order
+# from the cover note §6.
+FILES=(
+  "docs/counsel/COUNSEL_REVIEW_PACKET_${DATE}.md"
+  "docs/policies/consent/biometric_retention_schedule_v1.md"
+  "docs/policies/consent/biometric_consent_template_v1.md"
+  "docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md"
+  "docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md"
+  "docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md"
+  "docs/specs/GATE_3B_DEEPFACE_DESIGN.md"
+  "docs/PHASE_1_6_BIPA_GATES.md"
+)
+
+# Verify all referenced files exist before creating the tarball.
+MISSING=0
+for f in "${FILES[@]}"; do
+  if [ ! -r "$f" ]; then
-r "$f" ]; then + echo "[bundle] MISSING: $f" >&2 + MISSING=$((MISSING + 1)) + fi +done +if [ "$MISSING" -gt 0 ]; then + echo "[bundle] FAIL: $MISSING required documents missing — aborting" >&2 + exit 1 +fi + +OUT_DIR="reports/counsel" +mkdir -p "$OUT_DIR" + +TARBALL="$OUT_DIR/counsel_packet_${DATE}.tar.gz" +MANIFEST="$OUT_DIR/counsel_packet_${DATE}.manifest.txt" + +# Build the manifest first — counsel uses this to verify integrity. +{ + echo "# Counsel Packet Manifest — $DATE" + echo "# Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" + echo "# Each file is listed with its SHA-256 hash. To verify on receipt:" + echo "# tar xzf counsel_packet_${DATE}.tar.gz" + echo "# sha256sum -c counsel_packet_${DATE}.manifest.txt" + echo "# (re-format the lines below with two spaces between hash and path" + echo "# for sha256sum -c compatibility — sha256sum's strict format)" + echo + for f in "${FILES[@]}"; do + sha256sum "$f" + done +} > "$MANIFEST" + +# Build the tarball — include the manifest itself. +tar -czf "$TARBALL" "${FILES[@]}" "$MANIFEST" + +PACKET_HASH=$(sha256sum "$TARBALL" | awk '{print $1}') + +echo "[bundle] packet: $TARBALL" +echo "[bundle] manifest: $MANIFEST" +echo "[bundle] tarball SHA-256: $PACKET_HASH" +echo "[bundle] files: ${#FILES[@]}" diff --git a/scripts/staffing/verify_biometric_erasure.sh b/scripts/staffing/verify_biometric_erasure.sh new file mode 100755 index 0000000..1e01a93 --- /dev/null +++ b/scripts/staffing/verify_biometric_erasure.sh @@ -0,0 +1,266 @@ +#!/usr/bin/env bash +# verify_biometric_erasure — confirm that a biometric erasure completed cleanly. +# +# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3. +# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5. +# +# Why this exists: when an operator runs the erasure curl call against +# /biometric/subject/{id}/erase, they need a defensible artifact proving +# destruction completed. This script produces that artifact by checking +# four things: +# +# 1. SubjectManifest.biometric_collection is null (catalogd cleared the row) +# 2. data/biometric/uploads// is empty or absent (photo file gone) +# 3. Most recent audit row has accessor.kind in {biometric_erasure, full_erasure} +# AND result is "erased" or "success" (the chain logged the erasure intent) +# 4. audit_log.chain_verified is true (HMAC chain still intact end-to-end) +# +# All four must pass for an operator to mark the destruction complete. +# +# Usage: +# verify_biometric_erasure.sh [--from ISO] [--to ISO] +# +# Environment: +# GATEWAY_URL — default http://localhost:3100 +# LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token +# UPLOADS_ROOT — default data/biometric/uploads (relative to repo root) +# OUT_DIR — default reports/biometric (where the verification report lands) +# +# Exit codes: +# 0 — all four checks pass; erasure verified +# 1 — one or more checks failed; do NOT mark destruction complete; escalate +# 2 — script error (missing tools, network failure, bad token) + +set -uo pipefail +cd "$(dirname "$0")/../.." 
+
+if [ "$#" -lt 1 ]; then
+  echo "usage: verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]" >&2
+  exit 2
+fi
+
+CANDIDATE_ID="$1"
+shift
+FROM=""
+TO=""
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --from) FROM="$2"; shift 2 ;;
+    --to) TO="$2"; shift 2 ;;
+    *) echo "unknown flag: $1" >&2; exit 2 ;;
+  esac
+done
+
+GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
+LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
+UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
+OUT_DIR="${OUT_DIR:-reports/biometric}"
+
+# Dependency gates — fail fast with clear errors rather than producing
+# a misleading "evidence" file from missing tools.
+for cmd in curl jq sha256sum; do
+  if ! command -v "$cmd" >/dev/null 2>&1; then
+    echo "[verify] FAIL: required tool '$cmd' not found in PATH" >&2
+    exit 2
+  fi
+done
+
+if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
+  echo "[verify] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
+  echo "[verify] This script requires legal-tier auth to query /audit/subject/." >&2
+  exit 2
+fi
+LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
+if [ -z "$LEGAL_TOKEN" ]; then
+  echo "[verify] FAIL: legal token file is empty" >&2
+  exit 2
+fi
+
+# safe_id matches catalogd::biometric_endpoint::sanitize_for_path:
+# any non-[A-Za-z0-9_.-] char is replaced with underscore. ('-' sits
+# last in the bracket so sed reads it as a literal, not a range.)
+SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.-]/_/g')
+
+mkdir -p "$OUT_DIR"
+DATE=$(date -u +%Y-%m-%dT%H-%M-%SZ)
+OUT="$OUT_DIR/erasure_verify_${SAFE_ID}_${DATE}.md"
+EVIDENCE=$(mktemp)
+trap 'rm -f "$EVIDENCE"' EXIT
+
+PASS=0
+FAIL=0
+note() { echo "$1" >> "$EVIDENCE"; }
+mark_pass() { PASS=$((PASS+1)); note "  - PASS: $1"; }
+mark_fail() { FAIL=$((FAIL+1)); note "  - FAIL: $1"; }
+
+note "## Verification target"
+note ""
+note "- **candidate_id:** \`$CANDIDATE_ID\`"
+note "- **safe_id (filesystem):** \`$SAFE_ID\`"
+note "- **gateway:** \`$GATEWAY_URL\`"
+note "- **uploads root:** \`$UPLOADS_ROOT\`"
+note "- **window:** ${FROM:-unbounded} → ${TO:-unbounded}"
+note ""
+
+# ── Fetch the audit response ────────────────────────────────────────
+QUERY=""
+if [ -n "$FROM" ]; then QUERY="from=$FROM"; fi
+if [ -n "$TO" ]; then
+  if [ -n "$QUERY" ]; then QUERY="${QUERY}&to=$TO"; else QUERY="to=$TO"; fi
+fi
+URL="$GATEWAY_URL/audit/subject/$CANDIDATE_ID"
+if [ -n "$QUERY" ]; then URL="$URL?$QUERY"; fi
+
+RESP_FILE=$(mktemp)
+HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
+  -H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
+  -H "Accept: application/json" \
+  "$URL" 2>/dev/null) || HTTP_CODE="000"
+
+if [ "$HTTP_CODE" != "200" ]; then
+  echo "[verify] FAIL: GET $URL returned HTTP $HTTP_CODE" >&2
+  echo "[verify] response head:" >&2
+  head -c 500 "$RESP_FILE" >&2
+  echo >&2
+  rm -f "$RESP_FILE"
+  exit 2
+fi
+
+# Schema sanity — refuse to evaluate against an unrecognized response shape.
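+# (subject_audit_response.v1 is the shape this script was written against;
+# any other value likely means a gateway schema bump, so bail rather than
+# mis-read fields.)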
+SCHEMA=$(jq -r '.schema // ""' < "$RESP_FILE") +if [ "$SCHEMA" != "subject_audit_response.v1" ]; then + echo "[verify] FAIL: unexpected response schema '$SCHEMA' (want subject_audit_response.v1)" >&2 + rm -f "$RESP_FILE" + exit 2 +fi + +# ── Check 1: manifest.biometric_collection is null ────────────────── +note "## Check 1 — Subject manifest biometric_collection is null" +note "" +BIO_COLL=$(jq -c '.manifest.biometric_collection // null' < "$RESP_FILE") +note "**manifest.biometric_collection:** \`$BIO_COLL\`" +note "" +if [ "$BIO_COLL" = "null" ]; then + mark_pass "biometric_collection field is null on the subject manifest" +else + mark_fail "biometric_collection is still populated — erasure incomplete" +fi +note "" + +# ── Check 2: filesystem uploads dir is empty/absent ───────────────── +note "## Check 2 — Quarantined upload directory empty or absent" +note "" +UPLOAD_DIR="$UPLOADS_ROOT/$SAFE_ID" +note "**path:** \`$UPLOAD_DIR\`" +if [ ! -e "$UPLOAD_DIR" ]; then + note "**state:** absent (directory was removed during erasure or never existed)" + note "" + mark_pass "upload directory is absent" +elif [ ! -d "$UPLOAD_DIR" ]; then + note "**state:** path exists but is not a directory — investigate" + note "" + mark_fail "upload path exists and is not a directory: $UPLOAD_DIR" +else + REMAINING=$(find "$UPLOAD_DIR" -maxdepth 1 -mindepth 1 2>/dev/null | wc -l | tr -d '[:space:]') + : "${REMAINING:=0}" + note "**state:** directory exists with $REMAINING remaining entries" + note "" + if [ "$REMAINING" = "0" ]; then + mark_pass "upload directory is empty (no residual photo files)" + else + mark_fail "$REMAINING file(s) remain under $UPLOAD_DIR — must be unlinked" + note "### Residual files" + note "" + note '```' + find "$UPLOAD_DIR" -maxdepth 2 >> "$EVIDENCE" + note '```' + note "" + fi +fi + +# ── Check 3: most recent audit row reflects erasure ───────────────── +note "## Check 3 — Audit log records the erasure event" +note "" +ROW_COUNT=$(jq '.audit_log.rows | length' < "$RESP_FILE") +note "**rows in window:** $ROW_COUNT" +if [ "$ROW_COUNT" = "0" ]; then + mark_fail "no audit rows in the requested window — erasure should have appended one" + note "" +else + LAST_KIND=$(jq -r '.audit_log.rows | last | .accessor.kind // ""' < "$RESP_FILE") + LAST_RESULT=$(jq -r '.audit_log.rows | last | .result // ""' < "$RESP_FILE") + LAST_TS=$(jq -r '.audit_log.rows | last | .ts // ""' < "$RESP_FILE") + note "**last row:** ts=\`$LAST_TS\` accessor.kind=\`$LAST_KIND\` result=\`$LAST_RESULT\`" + note "" + case "$LAST_KIND" in + biometric_erasure|full_erasure) + case "$LAST_RESULT" in + erased|success) + mark_pass "last audit row is an erasure event ($LAST_KIND/$LAST_RESULT)" + ;; + *) + mark_fail "last row kind is $LAST_KIND but result is '$LAST_RESULT' (expected erased/success)" + ;; + esac + ;; + *) + mark_fail "last audit row accessor.kind is '$LAST_KIND' (expected biometric_erasure or full_erasure)" + ;; + esac +fi +note "" + +# ── Check 4: HMAC chain verifies end-to-end ───────────────────────── +note "## Check 4 — HMAC chain integrity" +note "" +CHAIN_VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE") +CHAIN_ROOT=$(jq -r '.audit_log.chain_root // ""' < "$RESP_FILE") +CHAIN_ROWS=$(jq -r '.audit_log.chain_rows_total // 0' < "$RESP_FILE") +CHAIN_ERR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE") +note "**chain_verified:** \`$CHAIN_VERIFIED\`" +note "**chain_rows_total:** $CHAIN_ROWS" +note "**chain_root:** \`$CHAIN_ROOT\`" +if [ -n "$CHAIN_ERR" ]; then + note 
"**chain_verification_error:** \`$CHAIN_ERR\`" +fi +note "" +if [ "$CHAIN_VERIFIED" = "true" ]; then + mark_pass "chain verifies end-to-end ($CHAIN_ROWS rows)" +else + mark_fail "chain integrity broken — destruction is NOT defensible until investigated" +fi +note "" + +# ── Render report ─────────────────────────────────────────────────── +TOTAL=$((PASS + FAIL)) +note "## Summary" +note "" +note "**$PASS / $TOTAL** verification checks pass." +note "" +if [ "$FAIL" -gt 0 ]; then + note "**Status: ERASURE NOT VERIFIED.** Do NOT mark destruction complete. Escalate to engineering before responding to candidate / counsel." + note "" +fi + +# Hash response body so the report has a tamper-evident anchor. +RESP_HASH=$(sha256sum "$RESP_FILE" | awk '{print $1}') +EVIDENCE_HASH=$(sha256sum "$EVIDENCE" | awk '{print $1}') + +{ + echo "# Biometric Erasure Verification — $CANDIDATE_ID" + echo + echo "**Date:** $DATE" + echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3" + echo "**Generator:** scripts/staffing/verify_biometric_erasure.sh" + echo + cat "$EVIDENCE" + echo "---" + echo + echo "**Audit response SHA-256:** \`$RESP_HASH\`" + echo "**Evidence summary SHA-256:** \`$EVIDENCE_HASH\`" + echo +} > "$OUT" + +rm -f "$RESP_FILE" +echo "[verify] $PASS / $TOTAL checks pass — report: $OUT" +echo "[verify] response hash: $RESP_HASH" +[ "$FAIL" -eq 0 ]