phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak
Four threads landing together — all driven by the audit J asked for before
production cutover.
(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.
(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).
(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.
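The bundling approach in `bundle_counsel_packet.sh` can be sketched roughly as follows — a hedged illustration of the tarball + per-file SHA-256 manifest idea, not the shipped script; the fixture file names and the `MANIFEST.sha256` name are invented for the example:

```shell
#!/usr/bin/env bash
# Sketch only (NOT bundle_counsel_packet.sh itself; file names are invented).
set -euo pipefail

packet_dir=$(mktemp -d)   # stand-in for the tracked docs/counsel/ sources
out_dir=$(mktemp -d)

printf 'retention schedule draft\n' > "$packet_dir/retention_schedule.md"
printf 'consent template draft\n'   > "$packet_dir/consent_template.md"

# The per-file SHA-256 manifest travels inside the tarball so counsel can
# verify integrity independently of the transmission channel.
( cd "$packet_dir" && sha256sum *.md > MANIFEST.sha256 )
tar -czf "$out_dir/counsel_packet.tar.gz" -C "$packet_dir" .

# Round-trip check: extract and re-verify every hash before transmitting.
check_dir=$(mktemp -d)
tar -xzf "$out_dir/counsel_packet.tar.gz" -C "$check_dir"
( cd "$check_dir" && sha256sum -c MANIFEST.sha256 )
```

Keeping the manifest inside the tarball (rather than alongside it) means a single artifact is both transmittable and self-verifying.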
(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
`verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
file. Investigation showed the file held 13 bytes of test-fixture data (zero
PII, no biometric content); the audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (it had
used 1 subject with 2 uploads to test content-type parsing — that would now 409).
22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.
Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.
Counsel calendar is now the only remaining blocker for first real-photo intake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 03e8a91d97
commit b2c34b80b3

.gitignore (vendored) — 10 lines changed
@@ -61,6 +61,16 @@ data/biometric/
 # files stay local for forensics.
 reports/scrum/
 
+# Per-event biometric verification reports (timestamp-named, regenerated
+# per `verify_biometric_erasure.sh` invocation). Source-of-truth is the
+# audit chain itself; these reports are derived views.
+reports/biometric/
+
+# Counsel transmission tarballs + manifests are regenerated by
+# `bundle_counsel_packet.sh` from the tracked `docs/counsel/` source.
+# The bundle is transmittable, not source-of-truth.
+reports/counsel/
+
 # Local experiments scratchpad — per the "Test code in main is ACTIVELY
 # being cleaned out" policy (commits 6aafd41 + f4ebd22), one-off
 # experiments stay out of the tracked tree.
@@ -1,12 +1,58 @@
 # STATE OF PLAY — Lakehouse
 
-**Last verified:** 2026-05-03 evening CDT
-**Verified by:** live probe (gateway restarted 2x, all 11 catalogd subject tests + 11 biometric tests + 6 audit tests + 4 mcp-server Gate-4 tests green; cross-runtime parity 6/6 byte-identical against live audit logs; live curl roundtrip on /biometric returned 200 + chained audit row), not memory.
+**Last verified:** 2026-05-05 morning CDT
+**Verified by:** live probe (`/audit/health` 200, `/biometric/subject/{id}/erase` 21-test substrate + `/audit/subject/{id}` legal-tier endpoint live verified against WORKER-100; new `verify_biometric_erasure.sh` + `biometric_destruction_report.sh` + `bundle_counsel_packet.sh` smoke-tested clean against live data) — not memory.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 
 ---
 
+## WHAT LANDED 2026-05-05 (doc reconciliation wave — Gate 3b decision + counsel packet ready)
+
+This was a **doc-only wave**, not code. Background: J asked for an audit of the BIPA/biometric documentation before production cutover. Audit found moderate fragmentation between docs and shipped code (post-`identityd` collapse, post-Gate-3a-ship, pre-Gate-3b-decision). Closed it in one pass.
+
+| Item | What changed | Status |
+|---|---|---|
+| **Gate 3b — DECIDED: Option C (defer classifications)** | `BiometricCollection.classifications` stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status flipped from "draft / awaits product" to "DECIDED 2026-05-05". | Locked |
+| **Endpoint-path drift** | `PHASE_1_6_BIPA_GATES.md` (3 spots), `BIPA_DESTRUCTION_RUNBOOK.md` (2 spots), `biometric_retention_schedule_v1.md` (1 spot) updated from legacy `/v1/identity/subjects/*` (proposed under separate identityd daemon, never shipped) to actual `/biometric/subject/*` (catalogd-local, shipped `848a458`). Schema block in `PHASE_1_6_BIPA_GATES.md` rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate (not the proposed Postgres `subjects` table). | Reconciled |
+| **Consent template + retention schedule** | Both revised for Option C: removed all "automated facial-classification" / "deepface" language so disclosed scope matches implemented scope. Pending counsel review — they were already eng-staged with ⚖ markers. | Eng-staged for counsel |
+| **`scripts/staffing/verify_biometric_erasure.sh`** (NEW) | Operator-side verification of an erasure event. Curls `/audit/subject/{id}` with legal-tier token, checks: manifest.biometric_collection null, uploads dir empty, last audit row is `biometric_erasure`/`full_erasure` with `erased`/`success`, chain_verified=true. Writes a hashed report to `reports/biometric/`. | Smoke-tested live |
+| **`scripts/staffing/biometric_destruction_report.sh`** (NEW) | Monthly destruction-event aggregation. Anonymizes candidate IDs (sha256-12 prefix), counts by scope + trigger, flags anomalies. Smoke-test on May 2026 data found 1 historical `biometric_erasure`/`consent_withdrawal` event (test fixture). | Smoke-tested live |
+| **`docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`** (NEW) | Captures the rotation procedure operationalized after the 2026-05-05 `/tmp` wipe incident. Covers: when to rotate, pre-rotation snapshot, atomic-swap procedure, post-rotation verification (incl. expected pre-rotation chain tamper-detect under new key), recovery from lost key, ⚖ counsel notes. | Authored |
+| **`docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md`** + `bundle_counsel_packet.sh` (NEW) | Cover note bundling all eng-staged BIPA docs for counsel review with per-doc questions, sign-off checklist, recommended review sequence. Bundler script tarballs the 8 referenced files + emits a SHA-256 manifest. Tarball ready for transmission: `reports/counsel/counsel_packet_2026-05-05.tar.gz`. | Bundled, ready to send |
+
+### Eng follow-up that this wave surfaced
+
+- **Double-upload file leak — DIAGNOSED + FIXED** (2026-05-05 same wave). `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo file. Investigation showed:
+  - The file was 13 bytes of test fixture (`ff d8 ff d9 + ASCII "TESTBYTES"`), byte-identical to the unit-test fixture at `biometric_endpoint.rs:841`. NO PII, NO biometric content, NO synthetic-face content. Came from manual integration testing on 2026-05-03.
+  - Audit log timeline showed two consecutive uploads (09:54, 10:04) followed by one erasure (10:22). The erasure unlinked only the SECOND file (which the manifest pointed at by then); the first file was orphaned because the second upload had silently overwritten `manifest.data_path`.
+  - **Real bug found**: the upload handler did NOT refuse a second upload to a subject with `biometric_collection.is_some()`. Patched `process_upload` to return HTTP 409 + `error: "biometric_already_collected"` when a re-upload is attempted; operator must explicitly POST `/biometric/subject/{id}/erase` first.
+  - Stranded test file removed (`rm` of the 13-byte fixture).
+  - New unit test `second_upload_without_erase_returns_409` asserts the 409 + that the first photo's data_path remains unchanged + that the first file remains untouched on disk.
+  - Existing `repeated_uploads_grow_the_chain` replaced with `upload_erase_upload_grows_the_chain_cleanly` (covers the legitimate re-collection cycle: upload → erase → upload, chain grows to 3 rows).
+  - Existing `content_type_with_parameters_accepted` test updated to use two distinct subjects (it had used one subject for two uploads to test content-type parsing — now would 409).
+  - **22 biometric_endpoint tests + 59 catalogd lib tests all green** post-patch (was 21+58 pre-patch).
+  - Production posture: gateway binary needs rebuild (`cargo build --release`) + `systemctl restart lakehouse.service` to pick up the new 409 behavior in live traffic.
+- **Pre-rotation chain tamper-detect (expected, not a bug).** WORKER-{1..5} had pre-2026-05-05 audit chains under the prior `LH_SUBJECT_AUDIT_KEY`. Under the new key (post-`/tmp` wipe rotation), those chains correctly tamper-detect. The rotation runbook §4.4 documents this as expected; a §2.2 pre-rotation snapshot is what would prove they were intact pre-rotation if defensibility ever needs it.
+
+### What's blocking production cutover NOW (after this wave)
+
+- **Counsel calendar:** the four sign-off items in `COUNSEL_REVIEW_PACKET_2026-05-05.md` (retention schedule, consent template, destruction runbook, pre-identityd attestation). The packet tarball is ready; ⚖ counsel is the bottleneck.
+- **Nothing else.** Engineering is no longer the long pole.
+
+### Phase 1.6 BIPA gates — status table (this is the final post-Option-C state)
+
+| # | Gate | Status |
+|---|---|---|
+| 1 | Public retention schedule | **eng-staged**, revised for Option C, ready for counsel |
+| 2 | Informed consent template | **eng-staged**, revised for Option C, ready for counsel |
+| 3a | Photo upload endpoint | **DONE** (shipped `f1fa6e4`, 11 unit tests, live verified) |
+| 3b | Deepface classification | **DECIDED 2026-05-05: Option C (defer)** |
+| 4 | Name → ethnicity inference removal | **DONE** (shipped, 4/4 mcp-server tests pass) |
+| 5 | Destruction runbook + erasure endpoint | **eng-DONE** (`848a458`, 21 tests). Runbook scripts (verify + report) shipped 2026-05-05. Counsel review pending. |
+| §2 | Pre-identityd attestation | **eng-DONE** (3/3 evidence checks). Awaits J + counsel signature. |
+| §3 | Employee training | **deferred** (consolidated into runbook §7 acknowledgment for current operator population) |
+
+---
+
 ## WHAT LANDED 2026-05-03 (16 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates)
 
 The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. Wave was structured as eight ship-then-scrum cycles — every wave caught real bugs, every fix wave landed within the same session.
@@ -307,6 +307,21 @@ pub async fn process_upload(
             consent_status: None,
         }));
     }
+    // Refuse double-upload. If a BiometricCollection already exists on
+    // the manifest, the operator must explicitly erase before re-uploading.
+    // Without this gate, a second POST silently overwrites manifest.data_path
+    // and orphans the previous photo file on disk — creating a forever-leak
+    // pattern and a BIPA defensibility hole ("we said we erased the photo,
+    // but the previous version of it is still under the same subject dir").
+    // Caught 2026-05-05 by verify_biometric_erasure.sh against WORKER-2.
+    if manifest.biometric_collection.is_some() {
+        return Err((StatusCode::CONFLICT, ErrorResponse {
+            error: "biometric_already_collected",
+            detail: "subject already has a BiometricCollection on the manifest; \
+                POST /biometric/subject/{id}/erase first if you intend to replace the photo".into(),
+            consent_status: None,
+        }));
+    }
 
     let template_hash = {
         let mut h = Sha256::new();
@@ -947,15 +962,92 @@ mod tests {
     }
 
     #[tokio::test]
-    async fn repeated_uploads_grow_the_chain() {
-        let state = fixture_state("repeated").await;
-        let _ = state.registry.put_subject(fixture_manifest("WORKER-5", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
-        let writer = state.writer.clone();
-        for _ in 0..2 {
-            let _ = process_upload(&state, "WORKER-5", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
-                .await.unwrap();
-        }
-        assert_eq!(writer.verify_chain("WORKER-5").await.unwrap(), 2);
-    }
+    async fn second_upload_without_erase_returns_409() {
+        // BIPA defensibility: a second upload to a subject that already
+        // has a BiometricCollection must fail-closed. Without this gate,
+        // the second upload silently overwrites manifest.data_path and
+        // orphans the first photo on disk forever (caught 2026-05-05 on
+        // WORKER-2 by verify_biometric_erasure.sh).
+        let state = fixture_state("second_upload_409").await;
+        let _ = state.registry.put_subject(fixture_manifest("WORKER-DUP", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
+        let storage_root = state.storage_root.clone();
+        let registry = state.registry.clone();
+
+        // First upload succeeds.
+        let resp1 = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes())
+            .await.unwrap();
+        let first_path = storage_root.join(&resp1.data_path);
+        assert!(first_path.exists(), "first upload should produce a file");
+
+        // Second upload refused with 409.
+        let err = process_upload(&state, "WORKER-DUP", Some(TEST_TOKEN), Some("image/jpeg"), "v1", "", &jpeg_bytes())
+            .await.unwrap_err();
+        assert_eq!(err.0, StatusCode::CONFLICT);
+        assert_eq!(err.1.error, "biometric_already_collected");
+
+        // Manifest still points at the first upload — pointer was NOT overwritten.
+        let m = registry.get_subject("WORKER-DUP").await.unwrap();
+        let bc = m.biometric_collection.as_ref().expect("collection should still be set");
+        assert_eq!(bc.data_path, resp1.data_path,
+            "manifest data_path must be unchanged after refused second upload");
+
+        // First file remains on disk untouched (refusal must not unlink it).
+        assert!(first_path.exists(), "first upload's file must remain after refused second upload");
+        let still_on_disk = tokio::fs::read(&first_path).await.unwrap();
+        assert_eq!(still_on_disk, jpeg_bytes(),
+            "first upload's bytes must not have been overwritten");
+    }
+
+    #[tokio::test]
+    async fn upload_erase_upload_grows_the_chain_cleanly() {
+        // Prior version of this test allowed repeated uploads to chain;
+        // that conflated chain growth with allowed re-upload. Under the
+        // double-upload guard (409 above), the only legitimate way to
+        // re-collect is upload → erase → upload. Chain grows to 3 rows
+        // (collection, erasure, collection); on-disk file count returns
+        // to one after the second upload.
+        let state = fixture_state("upload_erase_upload").await;
+        let _ = state.registry.put_subject(fixture_manifest("WORKER-CYCLE", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
+        let writer = state.writer.clone();
+        let storage_root = state.storage_root.clone();
+
+        // First upload.
+        let resp1 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
+            .await.unwrap();
+        let first_path = storage_root.join(&resp1.data_path);
+        assert!(first_path.exists());
+
+        // Erase. Uses process_erase test helper (the production path
+        // parses the EraseRequest from request body; tests inject it
+        // directly). Note: the erase flow flips biometric.status to
+        // Withdrawn, so the post-erase second upload must reset consent
+        // first (production flow would require new consent collection).
+        let _ = process_erase(&state, "WORKER-CYCLE", Some(TEST_TOKEN), "trace-cycle", fixture_erase_request("biometric_only"))
+            .await.unwrap();
+        assert!(!first_path.exists(), "first photo file must be unlinked by erase");
+
+        // Reset consent + status on the post-erase manifest so the second
+        // upload can proceed (production flow would require new consent
+        // collection here; for this test we directly flip the manifest).
+        let mut post_erase = state.registry.get_subject("WORKER-CYCLE").await.unwrap();
+        post_erase.consent.biometric.status = BiometricConsentStatus::Given;
+        post_erase.status = SubjectStatus::Active;
+        post_erase.biometric_collection = None;
+        let _ = state.registry.put_subject(post_erase).await;
+
+        // Second upload (legitimate, after erase).
+        let resp2 = process_upload(&state, "WORKER-CYCLE", Some(TEST_TOKEN), Some("image/jpeg"), "", "", &jpeg_bytes())
+            .await.unwrap();
+        let second_path = storage_root.join(&resp2.data_path);
+        assert!(second_path.exists(), "second upload should produce a file");
+        assert_ne!(resp1.data_path, resp2.data_path, "second upload should land at a new path");
+
+        // Chain has 3 rows: collection, erasure, collection.
+        assert_eq!(writer.verify_chain("WORKER-CYCLE").await.unwrap(), 3);
+        let rows = writer.read_rows_in_range("WORKER-CYCLE", None, None).await.unwrap();
+        assert_eq!(rows[0].accessor.kind, "biometric_collection");
+        assert_eq!(rows[1].accessor.kind, "biometric_erasure");
+        assert_eq!(rows[2].accessor.kind, "biometric_collection");
+    }
 
     #[tokio::test]
@@ -985,18 +1077,23 @@ mod tests {
         // Caught 2026-05-03 opus scrum WARN; regression test ensures
         // the bare media type is matched after stripping parameters.
         let state = fixture_state("ct_with_params").await;
-        let _ = state.registry.put_subject(fixture_manifest("WORKER-CT", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
+        // Two distinct subjects so each upload exercises the "first upload"
+        // path. Prior version used one subject and two uploads — now blocked
+        // by the double-upload guard (409). The test's actual intent is
+        // content-type parsing, not re-upload tolerance.
+        let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-A", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
+        let _ = state.registry.put_subject(fixture_manifest("WORKER-CT-B", BiometricConsentStatus::Given, SubjectStatus::Active)).await;
         let resp = process_upload(
-            &state, "WORKER-CT", Some(TEST_TOKEN),
+            &state, "WORKER-CT-A", Some(TEST_TOKEN),
             Some("image/jpeg; charset=binary"), "", "", &jpeg_bytes(),
         ).await.unwrap();
-        assert_eq!(resp.candidate_id, "WORKER-CT");
+        assert_eq!(resp.candidate_id, "WORKER-CT-A");
         // Also case-insensitive matching: "Image/JPEG" should work too.
         let resp2 = process_upload(
-            &state, "WORKER-CT", Some(TEST_TOKEN),
+            &state, "WORKER-CT-B", Some(TEST_TOKEN),
             Some("Image/JPEG"), "", "", &jpeg_bytes(),
         ).await.unwrap();
-        assert_eq!(resp2.candidate_id, "WORKER-CT");
+        assert_eq!(resp2.candidate_id, "WORKER-CT-B");
     }
 
     // ─── Erasure tests (Gate 5) ──────────────────────────────────────
 
@@ -22,7 +22,7 @@ Each gate is a deliverable that must ship before real-photo intake. None is opti
 - `docs/policies/consent/biometric_retention_schedule_v1.md` — public file
 - Linked from public privacy policy at the deployment URL
 - Specifies:
-  - Categories of biometric data collected (facial geometry derived from candidate photos, age estimate, gender classification, race classification — per Phase 1.5 deepface walk)
+  - Categories of biometric data collected (facial photograph for staff identification at job sites; classifications deferred per Gate 3b — see `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`)
  - Purpose of collection (identity matching for staffing operations)
  - Maximum retention: BIPA §15(a) caps at "3 years from the individual's last interaction with the private entity, whichever occurs first" — recommend 18-24 months as the operational ceiling (provides safety margin)
  - Destruction procedure: per Gate 5 below
@@ -67,7 +67,7 @@ Each gate is a deliverable that must ship before real-photo intake. None is opti
 
 **What ships:**
 
-A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) with the following behavior:
+An endpoint at `POST /biometric/subject/{candidate_id}/photo` (catalogd-local — the original v1 spec named this `/v1/identity/subjects/{candidate_id}/photo` under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot. See `IDENTITY_SERVICE_DESIGN.md` deprecation header.) with the following behavior:
 
 1. Caller authenticates with service-tier token
 2. Endpoint queries identityd for `subjects.biometric_consent_status`
@@ -75,18 +75,23 @@ A new endpoint (proposed: `POST /v1/identity/subjects/{candidate_id}/photo`) wit
 4. If status = `'given'`:
    a. Photo bytes accepted, stored to a quarantined path under `data/biometric/uploads/{candidate_id}/{ts}.{ext}` (NOT `data/headshots/`)
    b. deepface tagging runs against the photo
-   c. Classifications (gender, race, age) stored to `subjects` table fields (NEW columns — see schema additions below)
+   c. Classifications (gender, race, age) — **DEFERRED to Gate 3b** (`docs/specs/GATE_3B_DEEPFACE_DESIGN.md`). `BiometricCollection.classifications` remains `None` in v1.
    d. Original photo bytes encrypted under DEK + retained per Gate 1 schedule
    e. `pii_access_log` row written with `purpose_token='biometric_collection'`
 5. Response: `{candidate_id, retention_until, consent_version}`
 
-**Schema additions to identityd `subjects`:**
+**Schema (as shipped — catalogd `SubjectManifest.biometric_collection`):**
 
-```sql
-ALTER TABLE subjects ADD COLUMN biometric_classifications JSONB; -- {gender, race, age} from deepface
-ALTER TABLE subjects ADD COLUMN biometric_data_path TEXT;        -- quarantined path
-ALTER TABLE subjects ADD COLUMN biometric_collected_at TIMESTAMPTZ;
-ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT;    -- hash of the photo bytes (for integrity, NOT for re-derivation)
-```
+The original spec proposed JSONB columns on a Postgres `subjects` table under identityd. The shipped implementation collapses this into a per-subject JSON manifest at `data/_catalog/subjects/<id>.json`, with the `BiometricCollection` struct holding `data_path`, `template_hash`, `collected_at`, and `classifications: Option<JSON>`. See `crates/catalogd/src/subject_manifest.rs` for the canonical type.
+
+```rust
+// crates/catalogd/src/subject_manifest.rs (paraphrased)
+pub struct BiometricCollection {
+    pub data_path: String,              // quarantined path
+    pub template_hash: String,          // SHA-256 of original bytes (integrity, NOT re-derivation)
+    pub collected_at: DateTime<Utc>,
+    pub classifications: Option<Value>, // None until Gate 3b ships (deferred — see GATE_3B_DEEPFACE_DESIGN.md)
+}
+```
 
 **Engineering acceptance:**
@@ -130,8 +135,8 @@ ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT;    -- hash of the
 - `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` — operator-facing
 - Specifies:
   - Triggers: retention expiry (per Gate 1), withdrawal, RTBF request, candidate request
-  - Procedure: identityd `POST /v1/identity/subjects/{id}/erase` (legal-tier auth)
-  - Erasure scope: `subjects.biometric_*` columns ciphertext-deleted, `biometric_data_path` files securely overwritten + unlinked, deepface classifications nulled
+  - Procedure: catalogd-local `POST /biometric/subject/{id}/erase` (legal-tier auth) — formerly proposed under identityd; now serves from catalogd directly
+  - Erasure scope: `BiometricCollection` set to `None` on the subject manifest (drops `data_path`, `template_hash`, `classifications` together), quarantined photo files at `data/biometric/uploads/<id>/*` securely unlinked, audit row appended BEFORE photo unlink so the chain proves intent even if file delete fails
   - Backup window: per `IDENTITY_SERVICE_DESIGN` v3-B12, residual exists in DB backups for 30 days max; subject is informed
   - Witnessed: every erasure event written to `pii_access_log` with `purpose_token='biometric_erasure'` and the legal-tier JWT signature (proves authorized destruction)
   - Reporting: monthly internal report of erasures + retention-expiry sweeps; available to counsel on request
@@ -140,7 +145,7 @@ ALTER TABLE subjects ADD COLUMN biometric_template_hash TEXT;    -- hash of the
 
 **Engineering acceptance:**
 - Runbook committed
-- `POST /v1/identity/subjects/{id}/erase` endpoint includes biometric-specific erasure path
+- `POST /biometric/subject/{id}/erase` endpoint includes biometric-specific erasure path (shipped `848a458` — 21 unit tests, two scopes: biometric_only / full)
 - Daily sweep job destroys biometric data past `biometric_retention_until` (separate from general retention sweep — biometric has stricter clock)
 - Erasure events are logged with cryptographic attestation
 
@@ -188,7 +193,7 @@ of 2026-05-03 — scaffolds vs. counsel sign-off vs. shipped code:
 |---|---|---|---|---|
 | 1 | Public retention schedule | scaffolded at `docs/policies/consent/biometric_retention_schedule_v1.md` | pending | **eng-staged** |
 | 2 | Consent template | scaffolded at `docs/policies/consent/biometric_consent_template_v1.md` | pending | **eng-staged** |
-| 3 | Photo-upload endpoint with consent enforcement | DONE for the consent-gate substrate (`crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 10 unit tests, live-verified end-to-end). Deepface classification deferred to **Gate 3b** (own session — needs Python subprocess design after sidecar drop). | n/a until 3b | **3a DONE, 3b deferred** |
+| 3 | Photo-upload endpoint with consent enforcement | DONE — `crates/catalogd/src/biometric_endpoint.rs` mounted at `/biometric/subject/{id}/photo`, 11 unit tests, live-verified end-to-end. **Gate 3b DECIDED 2026-05-05: Option C (defer classifications).** `BiometricCollection.classifications` stays `Option<JSON> = None` in v1; consent + retention docs revised to match. See `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` §6 + change log. | reviewed under Gate 2 (matching consent text) | **DONE — 3a shipped, 3b deferred per design doc** |
 | 4 | Name → ethnicity inference removed | DONE — `mcp-server/search.html:3372` removal note + `mcp-server/phase_1_6_gate_4.test.ts` absence test (3/3 green) | none required | **DONE** |
 | 5 | Destruction runbook | scaffolded at `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`; erasure endpoint + verify/report scripts marked TODO | pending | **eng-staged** |
@@ -212,10 +217,14 @@ re-promotes to blocking and a separate training program must be authored.
 expected operator population size, or restore it to the blocking set.
 
 **Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
-review of the engineering scaffolds. Gate 3 (photo-upload endpoint)
-is the only remaining engineering work; it's deferred to its own
-session because it crosses into identityd photo intake and deepface
-integration scope that hasn't been designed yet.
+review of the engineering scaffolds. Gate 3 substrate is fully
+shipped; Gate 3b deepface classification was DECIDED on 2026-05-05
+as Option C (defer) — `BiometricCollection.classifications` stays
+`None` in v1, consent + retention docs revised to match this
+narrower scope. If a future product requirement surfaces a real
+need for classifications, the substrate is forward-compatible
+(`Option<JSON>`) and either Option A (~1 day) or Option B (~5 days)
+of the design doc can be picked up then under a v2 consent template.
 
 ---
 
@@ -258,4 +267,5 @@ integration scope that hasn't been designed yet.
 
 ## Change log
 
+- 2026-05-05 — Reconciled with shipped state: endpoint paths corrected from the legacy identityd v1 spec (`/v1/identity/subjects/*`) to the catalogd-local routes that actually shipped (`/biometric/subject/*`). Schema block rewritten to reflect the JSON `SubjectManifest.biometric_collection` substrate that replaced the proposed Postgres columns. Gate 3b deepface deferral marked in-line where Disclosure 1 / Gate 3 step 5c / Gate 5 erasure scope previously assumed classifications were collected. No legal text changed; this was doc/code drift cleanup.
 - 2026-05-03 — Initial draft. Authored after `IDENTITY_SERVICE_DESIGN` v3 §5 Step 0 named Phase 1.6 as a hard prerequisite to backfill.
docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md (new file, 260 lines)
@@ -0,0 +1,260 @@
# Counsel Review Packet — Phase 1.6 BIPA Pre-Launch

**Date assembled:** 2026-05-05
**For:** outside counsel
**From:** J, operator of record
**Scope:** documents that engineering has staged for legal sufficiency review
before the staffing platform begins collecting any real candidate
biometric data (BIPA §15(a)(b)).

> **What this packet is.** The Phase 1.6 BIPA gates outline what
> engineering must ship before real-photo intake. As of 2026-05-05,
> all engineering substrate is shipped and verified live (see §1
> below for the inventory). What remains is binding-text authoring
> + counsel sign-off on five documents, plus operational notification
> obligations counsel may want to layer on top.
>
> **What this packet is NOT.** Not a request for counsel to write
> binding text from scratch. The documents are eng-staged in
> reasonable plain language; the request is for counsel to render
> them into legally-sufficient text and attest where signatures
> are required.

---

## 1. Engineering substrate — shipped + verified

Factual context on what counsel is reviewing against. None of
this requires sign-off here; it's the system the documents bind to.

| Component | Where it lives | Verification |
|---|---|---|
| Subject manifest registry | `crates/catalogd/src/registry.rs`, `data/_catalog/subjects/<id>.json` | 17 unit tests + 100 backfilled WORKER manifests in production |
| Per-subject HMAC audit chain (SHA-256) | `crates/catalogd/src/subject_audit.rs`, `data/_catalog/subjects/<id>.audit.jsonl` | Tamper-detection + concurrent-append race tests pass |
| Photo upload (consent-gated) | `POST /biometric/subject/{id}/photo` | 11 unit tests + live roundtrip 200 |
| Erasure (two-scope) | `POST /biometric/subject/{id}/erase` (`biometric_only` / `full`) | 21 unit tests; transactional rollback on audit failure |
| Legal-tier audit read | `GET /audit/subject/{id}` (`X-Lakehouse-Legal-Token` header) | Constant-time auth, chain re-verification per request |
| Retention sweep (BIPA-aware clock) | `crates/catalogd/src/bin/retention_sweep` | 8 unit tests; live verified against 100 backfilled subjects |
| Cross-runtime parity (Rust ↔ Go) | `scripts/cutover/parity/subject_audit_parity.sh` | 6/6 byte-identical assertions pass |

**Key insight for counsel:** the audit chain is the BIPA-defensible
substrate. Every state-changing event (consent given, photo uploaded,
photo erased, legal-tier read) appends to a per-subject HMAC-chained
JSONL log. The chain verifies end-to-end on every legal-tier read.
A tampered chain is detectable; a forged chain requires the HMAC
signing key, which is held root-only (mode 0400) at
`/etc/lakehouse/subject_audit.key` and rotated per the runbook in
attachment §6 below.
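The chaining idea can be sketched in a few lines of shell with `openssl`; this is a toy under stated assumptions (placeholder key, simplified row shape; the real row fields and key handling live in `crates/catalogd/src/subject_audit.rs`):

```shell
#!/bin/sh
# Toy per-subject chain: each row's MAC covers the event payload AND the
# previous row's MAC, so editing any earlier row breaks every later MAC.
set -eu
KEY="placeholder-not-the-real-key"   # real key: /etc/lakehouse/subject_audit.key

mac() { printf '%s' "$1" | openssl dgst -sha256 -hmac "$KEY" -r | cut -d' ' -f1; }

prev="genesis"
for event in consent_given photo_uploaded photo_erased; do
  prev=$(mac "${prev}|${event}")                 # chain link
  printf '{"event":"%s","mac":"%s"}\n' "$event" "$prev"
done
```

Verification replays the same fold over the stored rows and compares each recomputed MAC to the stored one; the first mismatch marks the chain tampered.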

**Gate 3b (deepface classification) — decided 2026-05-05: Option C
(defer).** The system collects only the photograph, not derived
demographic information. The consent template + retention schedule
in this packet were revised the same day to match.

---

## 2. Documents requiring counsel review + sign-off

In recommended review order:

| # | Document | Path | Counsel ask | Sign-off |
|---|---|---|---|---|
| A | Biometric Retention Schedule v1 | `docs/policies/consent/biometric_retention_schedule_v1.md` | Render into binding language; confirm 18-month operational ceiling vs. BIPA 3-year statutory cap | Counsel + J |
| B | Biometric Consent Template v1 | `docs/policies/consent/biometric_consent_template_v1.md` | Render Disclosures 1-3 into binding consent language; specify electronic vs. paper signature mechanism | Counsel + J |
| C | BIPA Destruction Runbook | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` | Confirm 30-day SLA from trigger; confirm two-operator (operator + witness) requirement; confirm legal-hold check procedure | Counsel attestation |
| D | BIPA Pre-IdentityD Attestation | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` | Sign as countersigning party; J signs as operator-of-record | Counsel + J |
| E | Legal-Tier Audit Key Rotation | `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` | Confirm rotation cadence; opine on candidate-notification obligation when rotation is compromise-driven | Counsel notes |
| F | Gate 3b Deepface Design (FYI) | `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` | Decision-of-record showing classifications were *deliberately deferred*, not omitted by oversight. No sign-off needed; provided for audit-trail completeness. | None |

The five documents requiring sign-off are A, B, C, D, E. Document F
is included so the audit trail shows the Gate 3b decision was
deliberate.

---

## 3. Specific questions for counsel — by document

### Document A — Retention Schedule

1. The schedule sets an **18-month** operational ceiling against the
   BIPA 3-year statutory cap. Is the safety margin appropriate, or
   should we move to a tighter window (12 months) given the
   plaintiff-friendly Illinois posture?
2. The schedule references the **catalogd-local** storage substrate
   rather than a separate identityd Postgres table. Does the
   public-facing language need to mention the storage architecture
   at all, or is "we keep the photo and a SHA-256 hash" sufficient?
3. Public publication URL — counsel to specify (placeholder marked
   in §7 of the schedule).
4. Confirm whether existing consent under v1 carries forward when
   a future v2 is published, or whether re-consent is required.

### Document B — Consent Template

1. Disclosure 1 says "we do NOT run automated facial-classification
   in v1." Does that disclosure need to mention the *possibility* of
   future classification, or is silence-with-supersession-clause
   adequate?
2. Plain-language summary in §1 — counsel to confirm it's appropriate
   to include alongside the binding disclosure, or recommend an
   alternative comprehension aid.
3. Withdrawal SLA is set to **30 days** in §2. Counsel to confirm
   against jurisdiction (Illinois primary; secondary deployments
   would inherit).
4. Contact for withdrawal — counsel to specify the channel
   (placeholder in §3).
5. Sign-off mechanism: electronic signature service, in-app
   click-acceptance with timestamp, or paper form? Each has different
   evidentiary weight.

### Document C — Destruction Runbook

1. Confirm the 30-day SLA from each of four triggers (retention expiry,
   consent withdrawal, RTBF, court order). Some interpretations
   prefer 7 or 14 days for withdrawal/RTBF.
2. Two-operator requirement (operator-of-record + witness): is the
   witness role acceptable for counsel's defensibility view, or
   should we elevate to dual-control with a cryptographic split key?
3. Legal-hold check procedure (§2 step 3) — counsel to specify the
   actual procedure for confirming no hold is in force before
   erasing.
4. Backup-window disclosure (§4) — confirm 30-day backup retention
   is acceptable.
5. Candidate notification template (§3 step 4) — counsel to supply.

### Document D — Pre-IdentityD Attestation

1. Both signature lines are blank — J signs as operator-of-record;
   counsel signs as the countersigning legal party.
2. The attestation hash anchors the evidence; once signed, the
   hash itself becomes a tamper-evident witness. Counsel to confirm
   storage location for the signed copy (firm files?).

### Document E — Key Rotation Runbook

1. Recommended rotation cadence — 90 days suggested in §1.
   Counsel to confirm or override.
2. Custody schedule for `/etc/lakehouse/_archived/` raw key files —
   §7.2 question; 1-year retention suggested, but counsel-driven.
3. Candidate-notification obligation when rotation is
   compromise-driven (§7.3) — counsel call.

---

## 4. Engineering changes counsel should know about (recent)

These reconciled doc/code drift after a rapid wave on 2026-05-03:

- **Endpoint paths:** the original v1 spec proposed
  `/v1/identity/subjects/*` under a separate identityd daemon. That
  daemon was collapsed into catalogd; endpoints actually shipped at
  `/biometric/subject/*` (catalogd-local). Documents in this packet
  reference the catalogd-local routes; legacy references in
  `IDENTITY_SERVICE_DESIGN.md` are flagged "do NOT implement
  as-written" in that doc's deprecation header.
- **No identityd Postgres database:** the original spec proposed
  encrypted-at-rest Postgres + HashiCorp Vault + S3 Object Lock for
  PII storage. The shipped substrate is local JSON manifests +
  per-subject HMAC-chained JSONL, sized for J's local-only
  deployment per `PRD.md` line 70 ("Everything runs locally — no
  cloud APIs").
- **Gate 3b deferral (Option C, 2026-05-05):** classifications
  (gender / race / age inference) were deliberately deferred. The
  consent template and retention schedule in this packet do NOT
  disclose collection of derived demographic data, because we are
  not collecting it. If a future product requirement reverses this,
  we will publish a v2 consent + v2 retention with re-consent.
- **Key rotation 2026-05-05:** the prior `LH_SUBJECT_AUDIT_KEY` was
  lost in a `/tmp` wipe on reboot, which disabled the audit and
  biometric endpoints. The new key is at `/etc/lakehouse/subject_audit.key`
  (mode 0400). Pre-rotation audit chains tamper-detect under the
  new key — this is correct, expected behavior, not a bug.

---

## 5. Open eng items NOT awaiting counsel

For transparency. These are engineering work items, not legal items:

1. **Residual photo unlink on erasure.** During verification of the
   one historical erasure event (`WORKER-2`), the verify script
   surfaced a stranded photo file that was not unlinked when
   `BiometricCollection` was cleared from the manifest. Engineering
   is investigating; if the bug is real, the fix lands in
   `crates/catalogd/src/biometric_endpoint.rs` in the erasure
   handler. This does NOT affect the current packet —
   no real candidate photos have been collected yet (per §1
   attestation), so the residual is from a synthetic test event.
2. **Phase 1.6 §3 employee training.** Currently deferred per
   acknowledgement coverage in §7 of the destruction runbook
   (single-operator population). Re-promotes to blocking if the
   operator population grows; counsel may want to opine on the
   threshold.

---

## 6. Sign-off sequence

Recommended order so a hold-up on one doc doesn't block others:

1. **First wave (parallel):** A (retention schedule) + B (consent
   template). These two have the tightest interdependence (consent
   v1 references retention v1 by hash); review them together.
2. **Second wave:** C (destruction runbook). Depends on A's retention
   period being fixed.
3. **Third wave:** D (pre-identityd attestation). Sign once A + B + C
   are settled; the attestation snapshot is the boundary between
   pre-Phase-1.6 and post-Phase-1.6 system state.
4. **Fourth wave:** E (key rotation). Independent of A-D; can be
   reviewed in parallel any time.

---

## 7. After sign-off — engineering steps

Once each document is signed:

| Document | Engineering action | Trigger |
|---|---|---|
| A retention schedule | Hash + commit; reference in `consent_versions` table | Counsel signature |
| B consent template | Hash + commit; reference in candidate-facing intake UI | Counsel signature |
| C destruction runbook | Adopt; operator acknowledgment recorded in §7 | Counsel attestation |
| D pre-identityd attestation | Anchor hash to filesystem + git; counsel keeps original signature | Both signatures |
| E key rotation | Adopt; rotation event log seeded with counsel-approved cadence | Counsel notes |

The HARD blocker for first real-candidate photo collection is
A + B + D signed. C and E are operationally important but do not
block the *first* photo: they govern destruction and key handling,
which apply throughout the data lifecycle rather than only at the
collection boundary.

---

## 8. Cover-note hash

This packet is itself a snapshot. Future-Claude / future-J will refer
back to this packet to know what counsel saw on 2026-05-05.

**Packet attached files (referenced by path):**

- `docs/policies/consent/biometric_retention_schedule_v1.md`
- `docs/policies/consent/biometric_consent_template_v1.md`
- `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
- `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md`
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`
- `docs/specs/GATE_3B_DEEPFACE_DESIGN.md`
- `docs/PHASE_1_6_BIPA_GATES.md` (the spec they all reference)

Per-file SHA-256 hashes are produced by the bundler script (next
section); the bundler also creates a tarball ready for transmission.

---

## 9. Generating the bundle for transmission

```bash
./scripts/staffing/bundle_counsel_packet.sh
```

Produces `reports/counsel/counsel_packet_<DATE>.tar.gz` with all
referenced documents + a manifest listing per-file SHA-256 hashes.
Counsel can verify file integrity on receipt by re-running
`sha256sum` against each file in the tarball.
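A self-contained toy of the bundle-then-verify round trip (file names and layout here are invented for illustration; the real layout is whatever `bundle_counsel_packet.sh` emits):

```shell
set -eu
work=$(mktemp -d)

# Sender side: stage a toy packet, write a per-file SHA-256 manifest,
# and tarball the directory (stand-in for bundle_counsel_packet.sh).
mkdir "$work/packet"
printf 'biometric retention schedule v1\n' > "$work/packet/retention.md"
printf 'biometric consent template v1\n'   > "$work/packet/consent.md"
( cd "$work/packet" && sha256sum ./*.md > MANIFEST.sha256 )
tar -czf "$work/counsel_packet.tar.gz" -C "$work" packet

# Receipt side: unpack into a fresh dir and verify every file against
# the manifest; any byte change fails the check.
recv=$(mktemp -d)
tar -xzf "$work/counsel_packet.tar.gz" -C "$recv"
( cd "$recv/packet" && sha256sum -c MANIFEST.sha256 )
```

`sha256sum -c` reports one `OK` line per file and a non-zero exit on any mismatch, which is what makes the manifest usable as a receipt check.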

docs/policies/consent/biometric_consent_template_v1.md

@@ -3,6 +3,7 @@
**Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 2 (BIPA §15(b)(1)-(3))
**Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before deployment
**Version:** v1 (initial; supersession requires a new version + new hash)
**Updated 2026-05-05:** Disclosure 1 + plain-language summary revised to match the Gate 3b deferral recommendation in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications). Pending J's product confirmation of Gate 3b; if Gate 3b chooses Option A or B, this template needs counsel re-authoring.

> This is the consent template a candidate signs (electronically or
> on paper) BEFORE Lakehouse collects, stores, or processes any
@@ -25,11 +26,12 @@ content; counsel provides the legally-sufficient wording.

### Disclosure 1 — Notice of collection (§15(b)(1))

Lakehouse will collect, store, and use my **biometric identifier**
(facial geometry derived from a photograph of me) and **biometric
information** (gender, race, and age classifications derived from
that photograph by an automated facial-classification model called
deepface).
Lakehouse will collect and store my **biometric identifier** (a
photograph of me from which facial geometry is implicit). The
photograph itself is the data we keep — we do NOT run automated
facial-classification (gender / race / age inference) against it
in v1. If at a later date we add automated classification, we will
re-collect consent under a superseding template before doing so.

### Disclosure 2 — Specific purpose and length of term (§15(b)(2))

@@ -66,9 +68,9 @@ is appropriate to include or whether a different plain-language
section is preferred.

> **What you're agreeing to:** if you upload a photo of yourself,
> we'll keep that photo and a few descriptive labels about the photo
> (estimated age, perceived gender, perceived race) to help your
> staffing coordinator recognize you when you arrive at job sites.
> we'll keep that photo so your staffing coordinator can recognize
> you when you arrive at job sites. We don't run automated guesses
> about your age, gender, or race against the photo.
>
> **How long we keep it:** at most 18 months after your last
> placement or interaction with us, then it's permanently destroyed.

docs/policies/consent/biometric_retention_schedule_v1.md

@@ -3,6 +3,7 @@
**Spec:** docs/PHASE_1_6_BIPA_GATES.md §1 Gate 1 (BIPA §15(a))
**Status:** Engineering scaffold — ⚖ COUNSEL must author the binding text before public publication
**Version:** v1 (initial; supersession requires a new version + new hash)
**Updated 2026-05-05:** §1 + §2 revised to match the Gate 3b deferral recommendation in `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` (Option C — defer classifications). §5 destruction-trigger endpoint corrected to the shipped catalogd-local route. Pending J's product confirmation of Gate 3b.

> This is a publicly-available retention schedule for biometric identifiers
> and biometric information collected by the Lakehouse staffing platform.
@@ -15,12 +16,15 @@

This schedule applies to:

- **Biometric identifiers** as defined in 740 ILCS 14/10: facial geometry
  derived from candidate photographs.
- **Biometric identifiers** as defined in 740 ILCS 14/10: candidate
  photographs from which facial geometry is implicit.
- **Biometric information** as defined in 740 ILCS 14/10: any information
  derived from a biometric identifier, including but not limited to
  the gender, race, and age classifications produced by the deepface
  model when applied to a candidate photograph.
  derived from a biometric identifier. **In v1 of this schedule, no
  derived information is collected** — automated facial-classification
  (gender, race, age inference) is deferred per
  `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` Option C. If a future version
  of this schedule introduces classification, that is a superseding
  v2 schedule with re-consent under the matching v2 consent template.

**Out of scope** (explicitly NOT biometric data under this schedule):

@@ -39,14 +43,15 @@ This schedule applies to:

| Category | Source | Storage location |
|---|---|---|
| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/<candidate_id>/<ts>.<ext>`; encrypted at rest |
| Facial geometry classifications | deepface inference run against the photograph | `subjects.biometric_classifications` (JSONB on the identityd `subjects` row) |
| Photograph integrity hash | SHA-256 of the original bytes | `subjects.biometric_template_hash` |
| Photograph (raw bytes) | Candidate upload via the consent-gated photo endpoint | Quarantined under `data/biometric/uploads/<candidate_id>/<ts>_<uuid>.<ext>`; mode 0700 dir / 0600 file |
| Photograph integrity hash | SHA-256 of the original bytes | `SubjectManifest.biometric_collection.template_hash` (catalogd JSON manifest at `data/_catalog/subjects/<id>.json`) |

We do NOT collect raw biometric template vectors that could be used
to re-derive a face from the encoded form. The deepface output is
stored as discrete classification labels (e.g. `{"age_estimate": 32,
"gender": "...", "race": "..."}`), not as a re-identifiable embedding.
to re-derive a face from the encoded form. We do NOT run automated
facial-classification (gender, race, age inference) in v1 — see
`docs/specs/GATE_3B_DEEPFACE_DESIGN.md` for the deferral rationale.
The `BiometricCollection.classifications` field on the subject
manifest exists in the schema but is `None` for every subject.
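That invariant is cheap to assert fleet-wide. A toy sketch, with fixture manifests invented for illustration (real manifests live at `data/_catalog/subjects/*.json`):

```shell
set -eu
# Fixture manifests standing in for data/_catalog/subjects/*.json.
root=$(mktemp -d)
printf '{"candidate_id":"WORKER-1","biometric_collection":{"template_hash":"ab12","classifications":null}}\n' \
  > "$root/WORKER-1.json"
printf '{"candidate_id":"WORKER-2","biometric_collection":null}\n' \
  > "$root/WORKER-2.json"

# v1 invariant: wherever a classifications key exists, it must be null.
for m in "$root"/*.json; do
  if grep -q '"classifications"' "$m"; then
    grep -q '"classifications":null' "$m" || { echo "VIOLATION: $m"; exit 1; }
  fi
done
echo "classifications deferred in all manifests"
```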

---

@@ -104,8 +109,8 @@ Runbook** (`docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`) when:
- Retention period under §4 expires
- Candidate withdraws biometric consent under the consent template (Gate 2)
- Candidate exercises a right-to-be-forgotten request
- An identityd `POST /v1/identity/subjects/{id}/erase` is invoked under
  legal-tier authentication
- A catalogd-local `POST /biometric/subject/{id}/erase` is invoked
  under legal-tier authentication (shipped `848a458`)

Every destruction event is recorded as an append-only audit row in
the affected subject's per-subject HMAC-chained audit log (see

docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md

@@ -66,10 +66,11 @@ Before initiating destruction, the operator MUST:
Invoke the legal-tier erasure endpoint:

```bash
curl -sf -X POST "http://localhost:3100/v1/identity/subjects/${CANDIDATE_ID}/erase" \
curl -sf -X POST "http://localhost:3100/biometric/subject/${CANDIDATE_ID}/erase" \
  -H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "biometric_only|full",
    "trigger": "retention_expiry|consent_withdrawal|rtbf|court_order",
    "trigger_evidence_path": "<path to signed artifact>",
    "operator_of_record": "<operator name>",
@@ -77,17 +78,25 @@ curl -sf -X POST "http://localhost:3100/v1/identity/subjects/${CANDIDATE_ID}/era
  }'
```

⚖ ENGINEERING — `POST /v1/identity/subjects/{id}/erase` is Phase 1.6
Gate 3 dependent. Until it ships, the manual procedure is:
The endpoint is **shipped** (commit `848a458`, 21 unit tests). It is
served from catalogd-local at `/biometric/subject/{id}/erase` (the
original v1 spec proposed `/v1/identity/subjects/{id}/erase` under a
separate identityd daemon — that daemon was collapsed into catalogd
per the architecture pivot).

a. Set `SubjectManifest.consent.biometric.status = "withdrawn"` and
   `SubjectManifest.status = "erased"` via direct registry write
   (operator-of-record only).
b. Securely overwrite + unlink the quarantined photo path:
   `shred -uvz data/biometric/uploads/${CANDIDATE_ID}/*.jpg`
   (or equivalent for the configured backend).
c. NULL the deepface classification fields on the subject row.
d. Append the destruction-event audit row (Step 2 below).
The endpoint exposes two scopes:

- **`scope: "biometric_only"`** — clears `BiometricCollection` from
  the SubjectManifest (drops `data_path`, `template_hash`, and
  `classifications` together) + securely unlinks the quarantined
  photo file. Subject manifest itself remains. Use for retention
  expiry / consent withdrawal where only biometric data must go.
- **`scope: "full"`** — full subject erasure (manifest + biometric
  files). Use for court-ordered erasure or full RTBF requests.

In both scopes, the audit row is appended BEFORE photo unlink so
the chain has legal proof of intent even if the file delete fails
(transactional rollback on audit failure).
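Post-erasure state can be spot-checked the way `scripts/staffing/verify_biometric_erasure.sh` does. A toy sketch of the first two of its four checks, against an invented fixture tree (field names follow the SubjectManifest shape above):

```shell
set -eu
CID=WORKER-2
root=$(mktemp -d)   # stand-in for the repo's data/ tree

# Fixture: the state a correct biometric_only erasure leaves behind.
mkdir -p "$root/data/_catalog/subjects" "$root/data/biometric/uploads/$CID"
printf '{"candidate_id":"%s","biometric_collection":null}\n' "$CID" \
  > "$root/data/_catalog/subjects/$CID.json"

# Check 1: manifest cleared, biometric_collection must be null.
grep -q '"biometric_collection":null' "$root/data/_catalog/subjects/$CID.json"

# Check 2: uploads dir empty. A leftover file here is exactly the
# stranded-photo leak surfaced on WORKER-2.
[ -z "$(ls -A "$root/data/biometric/uploads/$CID")" ]

echo "erasure spot-checks passed for $CID"
```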

### Step 2 — Append the destruction-event audit row

@@ -224,5 +233,12 @@ training program.

## 8. Change log

- 2026-05-05 — Endpoint path reconciled with shipped state:
  `/v1/identity/subjects/{id}/erase` (legacy proposal under a
  separate identityd daemon) → `/biometric/subject/{id}/erase`
  (catalogd-local, shipped `848a458`). Step 1 manual-fallback
  block removed (the endpoint is no longer "TODO"). Two-scope
  body shape (`biometric_only` / `full`) documented to match
  the implementation.
- 2026-05-03 — Initial scaffold. ⚖ COUNSEL review required before
  adoption.

308
docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md
Normal file
@@ -0,0 +1,308 @@
# Legal-Tier Audit Key & Token Rotation Runbook

**Spec companion:** `docs/PHASE_1_6_BIPA_GATES.md` §2 + `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
**Audience:** Operators with root on the gateway host (J + named operators)
**Status:** Engineering-authored — ⚖ counsel review encouraged before formal adoption

> This runbook covers rotation of the two crypto-credentials that gate
> the Phase 1.6 audit substrate:
>
> 1. **`LH_SUBJECT_AUDIT_KEY`** — the 32-byte HMAC-SHA256 signing key
>    that chains every per-subject audit row. If this key changes, all
>    pre-rotation chain rows tamper-detect under the new key. That is
>    correct, expected, BIPA-defensible behavior — the chain integrity
>    provided pre-rotation remains provable against the archived old
>    key, and post-rotation chains remain intact going forward.
>
> 2. **`LH_LEGAL_AUDIT_TOKEN`** — the 32+-character bearer token that
>    authorizes calls to `/audit/subject/{id}` and
>    `/biometric/subject/{id}/erase`. Rotation does NOT touch any audit
>    history; only access to the legal-tier endpoints flips.
>
> Both live at `/etc/lakehouse/` (mode 0400, owned by root) and are
> loaded by the gateway via systemd `Environment=` directives in
> `/etc/systemd/system/lakehouse.service.d/audit_env.conf`. They are
> NOT loaded from `/tmp` — a 2026-05-05 reboot incident wiped a
> `/tmp`-resident key and caused `/audit` + `/biometric` to fail closed
> (which is what they should do); the rotation fix moved them to the
> persistent path.

---

## 1. When to rotate

Rotate when any of the following is true, at the listed urgency:

| Trigger | Urgency | Notes |
|---|---|---|
| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. |
| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. |
| Key/token file's filesystem permissions were ever weakened (mode > 0400, group readable, etc.) | Within 24 hours | Filesystem audit may have leaked the bytes. |
| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. |
| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. |

Do **not** rotate when:

- A subject's audit chain tamper-detects in isolation. That is normal
  if the audit log was edited (which would itself be the BIPA finding,
  not the key). Investigate the chain, not the key.
- Cross-runtime parity drift appears. That's an HMAC-input-shape bug
  (Go vs Rust serialization), not a key issue. See the
  `STATE_OF_PLAY.md` "three runtime-divergence classes" entry.

---

## 2. Pre-rotation checks (5 minutes)

Before generating new credentials, capture a clean baseline so you can
prove the rotation cause and sequence afterward.

### 2.1. Take the engineering snapshot

```bash
# Confirm the canonical files exist with correct permissions.
ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token

# Record a hash of the existing key + token (never the raw bytes) so
# the old credential is identifiable in retrospect without storing it.
sha256sum /etc/lakehouse/subject_audit.key
sha256sum /etc/lakehouse/legal_audit.token

# Confirm the gateway is currently using these files.
sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT"

# Verify the audit endpoint is healthy with the current credentials.
curl -sf http://localhost:3100/audit/health
```

If `/audit/health` is already 503, the rotation is **recovery**, not
preventive — note this in the rotation event record (§5).

### 2.2. Capture a known-good chain root

Pick one or two subjects with non-empty audit logs and record their
chain roots **under the current key**:

```bash
TOKEN=$(cat /etc/lakehouse/legal_audit.token)
for cid in WORKER-2 WORKER-100; do
  curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
    "http://localhost:3100/audit/subject/$cid" \
    | jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}'
done
```

Save the output. Post-rotation, those chains will tamper-detect under
the new key — that is **expected** and the saved snapshot is the proof
that the chain WAS intact under the old key, before rotation.

---

## 3. Generation + rotation

### 3.1. Generate the new key

```bash
# 32 random bytes would be 64 hex chars; either hex or base64 works for
# HMAC-SHA256. We follow the existing convention: 44 base64-ish chars,
# no padding.
sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \
  /etc/lakehouse/subject_audit.key.new

sudo install -m 0400 -o root -g root <(openssl rand -base64 33 | tr -d '\n=' | head -c 44) \
  /etc/lakehouse/legal_audit.token.new

# Sanity: confirm 44-char content + correct mode.
sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
sudo ls -la /etc/lakehouse/*.new
```

Both must be `mode 0400`, owned by root, exactly **44 chars** (the
audit endpoint refuses tokens shorter than 32 chars at load — see
`crates/catalogd/src/audit_endpoint.rs:73`).

### 3.2. Atomic swap

The gateway reads these files **once at boot** (per
`crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new` and
the equivalent for the writer). Atomic mv → restart is required.

```bash
# Move the old credentials to a quarantine path with a timestamp so the
# old hashes remain identifiable post-rotation.
TS=$(date -u +%Y%m%dT%H%M%SZ)
sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived

sudo mv /etc/lakehouse/subject_audit.key /etc/lakehouse/_archived/subject_audit.key.$TS
sudo mv /etc/lakehouse/legal_audit.token /etc/lakehouse/_archived/legal_audit.token.$TS

sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key
sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token

sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
```

### 3.3. Restart the gateway

```bash
sudo systemctl restart lakehouse.service
sleep 2
sudo systemctl status lakehouse.service --no-pager | head -10
```

Wait for the gateway to bind port 3100 cleanly. If it doesn't, check
`journalctl -u lakehouse.service -n 50 --no-pager` for the failure
mode — the most common cause is the new file having wrong mode/owner.

---
|
||||
|
||||
## 4. Post-rotation verification (5 minutes)
|
||||
|
||||
### 4.1. Health probes
|
||||
|
||||
```bash
|
||||
# Audit endpoint must be 200, not 503.
|
||||
curl -sf http://localhost:3100/audit/health
|
||||
# Expect: "audit endpoint ready"
|
||||
|
||||
# /v1/health must list the gateway's full provider set.
|
||||
curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count'
|
||||
```
|
||||
|
||||
### 4.2. Confirm the new token works
|
||||
|
||||
```bash
|
||||
NEW_TOKEN=$(cat /etc/lakehouse/legal_audit.token)
|
||||
curl -sS -o /dev/null -w '%{http_code}\n' \
|
||||
-H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
|
||||
http://localhost:3100/audit/subject/WORKER-100
|
||||
# Expect: 200
|
||||
```
|
||||
|
||||
If 401, the file the gateway loaded does NOT match the file you wrote.
|
||||
Check ownership / mode / for trailing whitespace differences with
|
||||
`hexdump -C /etc/lakehouse/legal_audit.token | head`.
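The usual culprit is a stray trailing newline. A runnable sketch of the detection idea — comparing raw vs whitespace-stripped byte counts — using a scratch file (on the live host, point it at `/etc/lakehouse/legal_audit.token` instead):

```shell
# Sketch: detect stray whitespace bytes in a credential file.
TOKEN_FILE=$(mktemp)                 # demo stand-in; printf adds a newline here
printf 'abc123\n' > "$TOKEN_FILE"
RAW=$(wc -c < "$TOKEN_FILE")
CLEAN=$(tr -d '[:space:]' < "$TOKEN_FILE" | wc -c)
if [ "$RAW" -ne "$CLEAN" ]; then
  echo "whitespace detected ($((RAW - CLEAN)) byte(s)); strip before retrying"
fi
rm -f "$TOKEN_FILE"
```

To strip in place on the live host: `printf '%s' "$(tr -d '[:space:]' < FILE)" | sudo tee FILE >/dev/null`, then restart the gateway.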

### 4.3. Confirm the new chain works

Append-only chains are key-tied. Any *new* audit row written
post-rotation is signed under the new key and verifies cleanly:

```bash
# Issue a /v1/validate call against any worker — it spawns an audit row.
curl -sf -X POST http://localhost:3100/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null

# Read the chain back. Last row must verify under the new key.
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-100 \
  | jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}'
```

`chain_verified: true` confirms the new key is signing + verifying.

### 4.4. Confirm pre-rotation chains tamper-detect (expected)

```bash
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-2 \
  | jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
```

For any subject whose chain was written under the old key, this
returns `chain_verified: false` with an HMAC-mismatch error. **This
is correct behavior**, not a bug. The old chain was correctly signed
and verified under the old key; the new key cannot retroactively
verify rows it didn't sign. The pre-rotation snapshot you captured in
§2.2 is the defensible proof that those rows WERE valid pre-rotation.

If, instead, a chain that *should* verify post-rotation returns
`verified: false`, the rotation has gone wrong — likely an old-key
file that didn't get archived cleanly. Restore the timestamped files
from `/etc/lakehouse/_archived/`, then re-attempt.

---

## 5. Record the rotation event

Append a row to the rotation log:

```bash
sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
{"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","operator":"<your name>","reason":"<scheduled|compromise|cred_loss|recovery>","old_key_sha256":"<hash from §2.1>","new_key_sha256":"$(sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<hash from §2.1>","new_token_sha256":"$(sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<witness name or N/A for routine>"}
EOF

sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
```

This file is the operator-side record of when the key changed and why.
It does NOT contain the key itself — only hashes — so it is safe to
back up and share with counsel on request.

---

## 6. Recovery from a lost key

If the active `subject_audit.key` is destroyed (filesystem corruption,
accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
fails closed at startup:

- `/audit/subject/{id}` → 503 ("audit endpoint disabled (legal token missing)" or equivalent for the signing key)
- `/biometric/subject/{id}/photo` → 503 (same fail-closed posture)

This is correct behavior — a server that cannot HMAC-sign new audit
rows must not accept new biometric writes.

**Recovery is rotation.** Generate a new key per §3.1, atomic-swap
per §3.2, restart per §3.3, verify per §4. Pre-loss chains tamper-detect
under the new key (the old key is gone — there is no way to verify
them). Treat the loss event as the BIPA-defensible boundary: pre-loss
chain verification was provided by the working key; post-loss new
chains are signed under the new key.

If a counsel-grade attestation of the pre-loss chains is needed, the
`/etc/lakehouse/_archived/` folder contains the historical hashes;
combined with the cross-runtime parity probe (the Go reader gives the
same byte-identical view as Rust), the pre-loss chain history is
preservable as long as the on-disk JSONL files were not also lost.
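A counsel-grade snapshot of the surviving chains is just per-file SHA-256 over the on-disk JSONL. A runnable sketch — the `*.audit.jsonl` layout mirrors what the report script scans (`data/_catalog/subjects`); a temp dir stands in here:

```shell
# Sketch: deterministic per-file hash snapshot of on-disk audit chains.
AUDIT_DIR=$(mktemp -d)    # stand-in for data/_catalog/subjects
printf '{"ts":"2026-05-01T00:00:00Z"}\n' > "$AUDIT_DIR/WORKER-2.audit.jsonl"
# Sorted ordering so two runs over identical trees produce identical snapshots.
find "$AUDIT_DIR" -name '*.audit.jsonl' | sort | xargs sha256sum
rm -rf "$AUDIT_DIR"
```

Archiving that output (hashes only, no subject data) alongside the rotation log gives counsel a tamper-evident anchor for the pre-loss state.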

---

## 7. ⚖ counsel notes

These are areas where counsel may want to opine before this runbook
is formally adopted:

1. **Rotation cadence.** BIPA itself does not require periodic rotation;
   counsel may set a 90-day schedule to satisfy a separate compliance
   posture (SOC2, internal policy).
2. **Custody of `/etc/lakehouse/_archived/`.** The archived hashes do
   NOT contain the keys, but the archived raw key files DO. Counsel
   may want a more aggressive destruction schedule for the raw archived
   keys — say 1 year — to reduce the long-tail compromise surface.
3. **Notification obligations on rotation due to compromise.** A §1
   compromise event triggers a rotation, but §1 does not address whether
   candidates whose biometric data was protected by the compromised key
   must be notified. This is a counsel call.
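Counsel note 2's destruction schedule could be operationalized as a cron-able `find`. A sketch under stated assumptions: the filename pattern follows §3.2's `subject_audit.key.<TS>` naming, the 365-day cutoff is counsel's call, and a temp dir stands in for `/etc/lakehouse/_archived`:

```shell
# Sketch: delete archived raw key files older than ~1 year while keeping
# rotation_log.jsonl (which holds only hashes and is safe to retain).
ARCHIVE=$(mktemp -d)    # stand-in for /etc/lakehouse/_archived
touch -d '400 days ago' "$ARCHIVE/subject_audit.key.20250101T000000Z"
touch "$ARCHIVE/rotation_log.jsonl"
find "$ARCHIVE" -name 'subject_audit.key.*' -mtime +365 -print -delete
ls "$ARCHIVE"    # only rotation_log.jsonl remains
rm -rf "$ARCHIVE"
```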

---

## 8. Operator acknowledgment

| Operator | Date acknowledged | Signature |
|---|---|---|
| J | _____ | _______________ |
| _____ | _____ | _______________ |

---

## 9. Change log

- 2026-05-05 — Initial runbook authored after the /tmp wipe incident
  on the same day (key was at `/tmp/subject_audit.key` and was deleted
  on reboot, disabling `/audit` + `/biometric` until the key was
  regenerated at `/etc/lakehouse/subject_audit.key`). Recovery from
  that incident produced a working procedure; this runbook captures
  it as the canonical playbook for any future rotation.
@@ -1,6 +1,8 @@
# Gate 3b — Deepface Classification Integration (Design)

**Status:** Design draft — 2026-05-03 morning · **Companion to:** [`PHASE_1_6_BIPA_GATES.md`](../PHASE_1_6_BIPA_GATES.md) Gate 3 · **Depends on:** Gate 3a (photo upload) which is shipped (`f1fa6e4`)
**Status:** **DECIDED 2026-05-05 — Option C (defer classifications)** · Original design draft 2026-05-03 morning · **Companion to:** [`PHASE_1_6_BIPA_GATES.md`](../PHASE_1_6_BIPA_GATES.md) Gate 3 · **Depends on:** Gate 3a (photo upload) which is shipped (`f1fa6e4`)

> **Decision summary (2026-05-05):** J accepted Option C. `BiometricCollection.classifications` remains `Option<JSON> = None` in v1. The consent template and retention schedule were revised the same day to remove all "automated facial-classification" language so the disclosed scope matches the implemented scope. If a real product requirement for classifications surfaces later, this doc's Option A (Python subprocess) or Option B (ONNX-in-Rust) is picked up under a v2 consent template + v2 retention schedule.

> **What this is.** Three options for how `BiometricCollection.classifications` (currently `Option<JSON>`, always `None`) gets populated by an automated facial-attribute classifier. Phase 1.6 Gate 3a ships the consent-gated upload + audit chain + transactional rollback; Gate 3b adds the classification step. The substrate is ready — what's missing is the design choice for HOW classification happens.
>
@@ -152,6 +154,8 @@ Reasoning:

⚖ J — pick A / B / C. The substrate accommodates any choice; the cost is the design-doc → counsel-coordination → engineering loop, which differs by an order of magnitude across the options.

**[2026-05-05] J's decision: Option C.** Reasoning recorded in change log below. Consent + retention doc revisions for Option C shipped same day; counsel review of revised text is the remaining work.

---

## Open questions for J

263 scripts/staffing/biometric_destruction_report.sh (Executable file)

@@ -0,0 +1,263 @@
#!/usr/bin/env bash
# biometric_destruction_report — monthly destruction event aggregation.
#
# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5.
# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
#
# Why this exists: counsel and operations review need a periodic
# attestation that destructions have happened on a defensible cadence.
# This script produces an anonymized monthly report aggregating
# per-subject audit logs.
#
# Output is anonymized — counts, timings, scope/trigger breakdowns,
# and chain attestations. Candidate IDs are hashed (sha256 prefix) so
# the report can be shared with counsel without exposing identifiers.
#
# Usage:
#   biometric_destruction_report.sh \
#     [--month YYYY-MM] \
#     [--audit-dir data/_catalog/subjects] \
#     [--output reports/biometric/destruction_<month>.md]
#
# Defaults:
#   --month     — current UTC month (YYYY-MM)
#   --audit-dir — data/_catalog/subjects
#   --output    — reports/biometric/destruction_<month>.md
#
# Exit codes:
#   0 — report written successfully (whether or not events were found)
#   1 — report written but with anomalies that need review
#   2 — script error (missing tools, unreadable audit dir)

set -uo pipefail
cd "$(dirname "$0")/../.."

MONTH=""
AUDIT_DIR="data/_catalog/subjects"
OUT=""

while [ "$#" -gt 0 ]; do
  case "$1" in
    --month) MONTH="$2"; shift 2 ;;
    --audit-dir) AUDIT_DIR="$2"; shift 2 ;;
    --output) OUT="$2"; shift 2 ;;
    -h|--help)
      sed -n '2,30p' "$0" | sed 's/^# \?//'
      exit 0 ;;
    *) echo "unknown flag: $1" >&2; exit 2 ;;
  esac
done

# Default month = current UTC YYYY-MM. Validate format defensively
# so a malformed --month value (e.g. "May 2026") doesn't silently
# match nothing in the JSONL filter.
if [ -z "$MONTH" ]; then
  MONTH=$(date -u +%Y-%m)
fi
if ! echo "$MONTH" | grep -qE '^[0-9]{4}-(0[1-9]|1[0-2])$'; then
  echo "[report] FAIL: --month must be YYYY-MM, got '$MONTH'" >&2
  exit 2
fi

if [ -z "$OUT" ]; then
  OUT="reports/biometric/destruction_${MONTH}.md"
fi

# Dependency gates.
for cmd in jq sha256sum; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "[report] FAIL: required tool '$cmd' not found in PATH" >&2
    exit 2
  fi
done

if [ ! -d "$AUDIT_DIR" ]; then
  echo "[report] FAIL: audit dir not found at $AUDIT_DIR" >&2
  exit 2
fi

mkdir -p "$(dirname "$OUT")"

# Aggregator storage.
EVENTS=$(mktemp)
ANOMALIES=$(mktemp)
trap 'rm -f "$EVENTS" "$ANOMALIES"' EXIT

# Iterate every per-subject audit log under AUDIT_DIR. Each file is
# JSONL — one row per line. We extract erasure rows in the requested
# month + emit a normalized one-line record per event.
TOTAL_FILES=0
TOTAL_ROWS_SCANNED=0
SHARDS_WITH_EVENTS=0

for f in "$AUDIT_DIR"/*.audit.jsonl; do
  [ -e "$f" ] || continue
  TOTAL_FILES=$((TOTAL_FILES + 1))

  # File-level row count (cheap).
  ROWS=$(wc -l < "$f" 2>/dev/null || echo 0)
  TOTAL_ROWS_SCANNED=$((TOTAL_ROWS_SCANNED + ROWS))

  # Filter rows for the month + erasure kinds.
  HAD_EVENT=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    KIND=$(printf '%s' "$line" | jq -r '.accessor.kind // ""' 2>/dev/null || echo "")
    case "$KIND" in
      biometric_erasure|full_erasure) ;;
      *) continue ;;
    esac

    TS=$(printf '%s' "$line" | jq -r '.ts // ""' 2>/dev/null || echo "")
    case "$TS" in
      "${MONTH}-"*) ;; # only this month
      *) continue ;;
    esac

    HAD_EVENT=1
    CID=$(printf '%s' "$line" | jq -r '.candidate_id // ""' 2>/dev/null || echo "")
    PURPOSE=$(printf '%s' "$line" | jq -r '.accessor.purpose // ""' 2>/dev/null || echo "")
    RESULT=$(printf '%s' "$line" | jq -r '.result // ""' 2>/dev/null || echo "")
    # accessor.purpose has shape "trigger=<name>;..." per biometric_endpoint
    TRIGGER=$(printf '%s' "$PURPOSE" | sed -nE 's/.*trigger=([a-z_]+).*/\1/p')
    [ -n "$TRIGGER" ] || TRIGGER="unknown"

    # Hash candidate_id so the report stays anonymized.
    CID_HASH=$(printf '%s' "$CID" | sha256sum | awk '{print substr($1,1,12)}')

    # Anomaly: erasure row but result not in {erased, success}.
    case "$RESULT" in
      erased|success) ;;
      *)
        echo " - candidate_hash=$CID_HASH ts=$TS kind=$KIND result=$RESULT trigger=$TRIGGER (unexpected result)" >> "$ANOMALIES"
        ;;
    esac

    # Tab-separated event line: ts, kind, trigger, result, cid_hash
    printf '%s\t%s\t%s\t%s\t%s\n' "$TS" "$KIND" "$TRIGGER" "$RESULT" "$CID_HASH" >> "$EVENTS"
  done < "$f"

  if [ "$HAD_EVENT" = "1" ]; then
    SHARDS_WITH_EVENTS=$((SHARDS_WITH_EVENTS + 1))
  fi
done

EVENT_COUNT=$(wc -l < "$EVENTS" 2>/dev/null || echo 0)
EVENT_COUNT=$(printf '%s' "$EVENT_COUNT" | tr -d '[:space:]')
: "${EVENT_COUNT:=0}"

# Compute breakdowns.
COUNT_BIOMETRIC_ONLY=0
COUNT_FULL=0
if [ "$EVENT_COUNT" != "0" ]; then
  COUNT_BIOMETRIC_ONLY=$(awk -F '\t' '$2=="biometric_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]')
  COUNT_FULL=$(awk -F '\t' '$2=="full_erasure"' "$EVENTS" | wc -l | tr -d '[:space:]')
fi

ANOMALY_COUNT=$(wc -l < "$ANOMALIES" 2>/dev/null || echo 0)
ANOMALY_COUNT=$(printf '%s' "$ANOMALY_COUNT" | tr -d '[:space:]')
: "${ANOMALY_COUNT:=0}"

# Render the report.
GENERATED_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)

{
  echo "# Biometric Destruction Report — $MONTH"
  echo
  echo "**Generated:** $GENERATED_AT"
  echo "**Audit dir scanned:** \`$AUDIT_DIR\`"
  echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §5"
  echo "**Generator:** scripts/staffing/biometric_destruction_report.sh"
  echo
  echo "## Scope"
  echo
  echo "- **Subject audit shards scanned:** $TOTAL_FILES"
  echo "- **Audit rows scanned (all kinds):** $TOTAL_ROWS_SCANNED"
  echo "- **Shards containing $MONTH erasure events:** $SHARDS_WITH_EVENTS"
  echo
  echo "## Destruction events in $MONTH"
  echo
  echo "- **Total events:** $EVENT_COUNT"
  echo "- **By scope:**"
  echo "  - \`biometric_erasure\` (BiometricCollection cleared, manifest retained): $COUNT_BIOMETRIC_ONLY"
  echo "  - \`full_erasure\` (manifest + biometric data cleared): $COUNT_FULL"
  echo

  if [ "$EVENT_COUNT" = "0" ]; then
    echo "**No destruction events recorded for $MONTH.** This is correct"
    echo "for a month with no retention expiries / withdrawal requests"
    echo "/ RTBF requests / court orders."
    echo
  else
    echo "### By trigger"
    echo
    echo "| Trigger | Count |"
    echo "|---|---|"
    awk -F '\t' '{print $3}' "$EVENTS" | sort | uniq -c | \
      sort -rn | awk '{ printf("| %s | %d |\n", $2, $1); }'
    echo
    echo "### Event detail (anonymized)"
    echo
    echo "Candidate IDs are hashed (sha256 12-char prefix) so this report can"
    echo "be shared with outside counsel without exposing identifiers."
    echo
    echo "| ts | kind | trigger | result | candidate_hash |"
    echo "|---|---|---|---|---|"
    sort -k1,1 "$EVENTS" | awk -F '\t' '{
      printf("| %s | %s | %s | %s | %s |\n", $1, $2, $3, $4, $5);
    }'
    echo
  fi

  if [ "$ANOMALY_COUNT" != "0" ]; then
    echo "## Anomalies ($ANOMALY_COUNT)"
    echo
    echo "Events whose audit row deviates from expected shape (kind/result"
    echo "mismatch, missing trigger, etc.). These do NOT necessarily mean"
    echo "the destruction failed — the BIPA-load-bearing surface is the"
    echo "audit chain, which still verifies cryptographically. They are"
    echo "logged here so an operator can investigate and confirm."
    echo
    echo '```'
    cat "$ANOMALIES"
    echo '```'
    echo
  fi

  echo "## Cryptographic attestation"
  echo
  echo "This report was produced by aggregating per-subject HMAC-chained"
  echo "audit logs. The chain itself is the BIPA-defensible substrate;"
  echo "this report is a derived view, not the chain of record. To verify"
  echo "any individual event, run:"
  echo
  echo '```bash'
  echo "./scripts/staffing/verify_biometric_erasure.sh <candidate_id>"
  echo '```'
  echo "(operator must un-hash the candidate ID through their own"
  echo " operator log to perform spot-checks)."
  echo
  echo "**Cross-runtime parity:** the same audit logs are byte-identical"
  echo "under Rust + Go (per scripts/cutover/parity/subject_audit_parity.sh)."
  echo "If counsel needs cross-runtime attestation, that probe provides it."
  echo

  EVIDENCE_HASH=$(sha256sum "$EVENTS" 2>/dev/null | awk '{print $1}')
  : "${EVIDENCE_HASH:=$(echo -n '' | sha256sum | awk '{print $1}')}"
  echo "**Events SHA-256:** \`$EVIDENCE_HASH\`"
  echo
  echo "---"
  echo
  echo "**Operator (J):** _______________________________  Date: __________"
  echo
} > "$OUT"

echo "[report] $EVENT_COUNT destruction events in $MONTH ($COUNT_BIOMETRIC_ONLY biometric_only, $COUNT_FULL full)"
echo "[report] anomalies: $ANOMALY_COUNT"
echo "[report] output: $OUT"

# Exit 1 if anomalies present (review needed) but report still written.
if [ "$ANOMALY_COUNT" != "0" ]; then
  exit 1
fi
exit 0
99 scripts/staffing/bundle_counsel_packet.sh (Executable file)

@@ -0,0 +1,99 @@
#!/usr/bin/env bash
# bundle_counsel_packet — assemble the counsel-review packet tarball.
#
# Specification: docs/counsel/COUNSEL_REVIEW_PACKET_<DATE>.md §9.
#
# Why this exists: the cover note references a list of documents.
# Counsel needs them as a single transmittable artifact, with per-file
# integrity hashes so they can verify nothing changed in transit.
#
# Output:
#   reports/counsel/counsel_packet_<DATE>.tar.gz
#   reports/counsel/counsel_packet_<DATE>.manifest.txt (sha256 per file)
#
# Usage:
#   bundle_counsel_packet.sh [--date YYYY-MM-DD]
#
# Exit codes:
#   0 — packet bundled successfully
#   1 — one or more referenced documents are missing
#   2 — script error (missing tools, write failure)

set -uo pipefail
cd "$(dirname "$0")/../.."

DATE="$(date -u +%Y-%m-%d)"
while [ "$#" -gt 0 ]; do
  case "$1" in
    --date) DATE="$2"; shift 2 ;;
    -h|--help)
      sed -n '2,20p' "$0" | sed 's/^# \?//'
      exit 0 ;;
    *) echo "unknown flag: $1" >&2; exit 2 ;;
  esac
done

# Dependency gate.
for cmd in tar sha256sum; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "[bundle] FAIL: required tool '$cmd' not found in PATH" >&2
    exit 2
  fi
done

# Files in the packet. Order is the recommended counsel-review order
# from the cover note §6.
FILES=(
  "docs/counsel/COUNSEL_REVIEW_PACKET_${DATE}.md"
  "docs/policies/consent/biometric_retention_schedule_v1.md"
  "docs/policies/consent/biometric_consent_template_v1.md"
  "docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md"
  "docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md"
  "docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md"
  "docs/specs/GATE_3B_DEEPFACE_DESIGN.md"
  "docs/PHASE_1_6_BIPA_GATES.md"
)

# Verify all referenced files exist before building the tarball.
MISSING=0
for f in "${FILES[@]}"; do
  if [ ! -r "$f" ]; then
    echo "[bundle] MISSING: $f" >&2
    MISSING=$((MISSING + 1))
  fi
done
if [ "$MISSING" -gt 0 ]; then
  echo "[bundle] FAIL: $MISSING required documents missing — aborting" >&2
  exit 1
fi

OUT_DIR="reports/counsel"
mkdir -p "$OUT_DIR"

TARBALL="$OUT_DIR/counsel_packet_${DATE}.tar.gz"
MANIFEST="$OUT_DIR/counsel_packet_${DATE}.manifest.txt"

# Build the manifest first — counsel uses this to verify integrity.
{
  echo "# Counsel Packet Manifest — $DATE"
  echo "# Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "# Each file is listed with its SHA-256 hash. To verify on receipt:"
  echo "#   tar xzf counsel_packet_${DATE}.tar.gz"
  echo "#   sha256sum -c counsel_packet_${DATE}.manifest.txt"
  echo "# (sha256sum -c checks the hash lines below directly; it will warn"
  echo "#  that these leading comment lines are improperly formatted, which"
  echo "#  is harmless)"
  echo
  for f in "${FILES[@]}"; do
    sha256sum "$f"
  done
} > "$MANIFEST"

# Build the tarball — include the manifest itself.
tar -czf "$TARBALL" "${FILES[@]}" "$MANIFEST"

PACKET_HASH=$(sha256sum "$TARBALL" | awk '{print $1}')

echo "[bundle] packet:   $TARBALL"
echo "[bundle] manifest: $MANIFEST"
echo "[bundle] tarball SHA-256: $PACKET_HASH"
echo "[bundle] files: ${#FILES[@]}"
266 scripts/staffing/verify_biometric_erasure.sh (Executable file)

@@ -0,0 +1,266 @@
#!/usr/bin/env bash
# verify_biometric_erasure — confirm that a biometric erasure completed cleanly.
#
# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3.
# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
#
# Why this exists: when an operator runs the erasure curl call against
# /biometric/subject/{id}/erase, they need a defensible artifact proving
# destruction completed. This script produces that artifact by checking
# four things:
#
#   1. SubjectManifest.biometric_collection is null (catalogd cleared the row)
#   2. data/biometric/uploads/<safe_id>/ is empty or absent (photo file gone)
#   3. Most recent audit row has accessor.kind in {biometric_erasure, full_erasure}
#      AND result is "erased" or "success" (the chain logged the erasure intent)
#   4. audit_log.chain_verified is true (HMAC chain still intact end-to-end)
#
# All four must pass for an operator to mark the destruction complete.
#
# Usage:
#   verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]
#
# Environment:
#   GATEWAY_URL      — default http://localhost:3100
#   LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token
#   UPLOADS_ROOT     — default data/biometric/uploads (relative to repo root)
#   OUT_DIR          — default reports/biometric (where the verification report lands)
#
# Exit codes:
#   0 — all four checks pass; erasure verified
#   1 — one or more checks failed; do NOT mark destruction complete; escalate
#   2 — script error (missing tools, network failure, bad token)

set -uo pipefail
cd "$(dirname "$0")/../.."

if [ "$#" -lt 1 ]; then
  echo "usage: verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]" >&2
  exit 2
fi

CANDIDATE_ID="$1"
shift
FROM=""
TO=""
while [ "$#" -gt 0 ]; do
  case "$1" in
    --from) FROM="$2"; shift 2 ;;
    --to) TO="$2"; shift 2 ;;
    *) echo "unknown flag: $1" >&2; exit 2 ;;
  esac
done

GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
OUT_DIR="${OUT_DIR:-reports/biometric}"

# Dependency gates — fail fast with clear errors rather than producing
# a misleading "evidence" file from missing tools.
for cmd in curl jq sha256sum; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "[verify] FAIL: required tool '$cmd' not found in PATH" >&2
    exit 2
  fi
done

if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
  echo "[verify] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
  echo "[verify] This script requires legal-tier auth to query /audit/subject/." >&2
  exit 2
fi
LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
if [ -z "$LEGAL_TOKEN" ]; then
  echo "[verify] FAIL: legal token file is empty" >&2
  exit 2
fi

# safe_id matches catalogd::biometric_endpoint::sanitize_for_path:
# any non-[A-Za-z0-9_.-] char is replaced with underscore.
SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.-]/_/g')

mkdir -p "$OUT_DIR"
DATE=$(date -u +%Y-%m-%dT%H-%M-%SZ)
OUT="$OUT_DIR/erasure_verify_${SAFE_ID}_${DATE}.md"
EVIDENCE=$(mktemp)
trap 'rm -f "$EVIDENCE"' EXIT

PASS=0
FAIL=0
note() { echo "$1" >> "$EVIDENCE"; }
mark_pass() { PASS=$((PASS+1)); note "  - PASS: $1"; }
mark_fail() { FAIL=$((FAIL+1)); note "  - FAIL: $1"; }

note "## Verification target"
note ""
note "- **candidate_id:** \`$CANDIDATE_ID\`"
note "- **safe_id (filesystem):** \`$SAFE_ID\`"
note "- **gateway:** \`$GATEWAY_URL\`"
note "- **uploads root:** \`$UPLOADS_ROOT\`"
note "- **window:** ${FROM:-unbounded} → ${TO:-unbounded}"
note ""

# ── Fetch the audit response ────────────────────────────────────────
QUERY=""
if [ -n "$FROM" ]; then QUERY="from=$FROM"; fi
if [ -n "$TO" ]; then
  if [ -n "$QUERY" ]; then QUERY="${QUERY}&to=$TO"; else QUERY="to=$TO"; fi
fi
URL="$GATEWAY_URL/audit/subject/$CANDIDATE_ID"
if [ -n "$QUERY" ]; then URL="$URL?$QUERY"; fi

RESP_FILE=$(mktemp)
HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
  -H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
  -H "Accept: application/json" \
  "$URL" 2>/dev/null) || HTTP_CODE="000"

if [ "$HTTP_CODE" != "200" ]; then
  echo "[verify] FAIL: GET $URL returned HTTP $HTTP_CODE" >&2
  echo "[verify] response head:" >&2
  head -c 500 "$RESP_FILE" >&2
  echo >&2
  rm -f "$RESP_FILE"
  exit 2
fi

# Schema sanity — refuse to evaluate against an unrecognized response shape.
SCHEMA=$(jq -r '.schema // ""' < "$RESP_FILE")
if [ "$SCHEMA" != "subject_audit_response.v1" ]; then
  echo "[verify] FAIL: unexpected response schema '$SCHEMA' (want subject_audit_response.v1)" >&2
  rm -f "$RESP_FILE"
  exit 2
fi

# ── Check 1: manifest.biometric_collection is null ──────────────────
note "## Check 1 — Subject manifest biometric_collection is null"
note ""
BIO_COLL=$(jq -c '.manifest.biometric_collection // null' < "$RESP_FILE")
note "**manifest.biometric_collection:** \`$BIO_COLL\`"
note ""
if [ "$BIO_COLL" = "null" ]; then
  mark_pass "biometric_collection field is null on the subject manifest"
else
  mark_fail "biometric_collection is still populated — erasure incomplete"
fi
note ""

# ── Check 2: filesystem uploads dir is empty/absent ─────────────────
note "## Check 2 — Quarantined upload directory empty or absent"
note ""
UPLOAD_DIR="$UPLOADS_ROOT/$SAFE_ID"
note "**path:** \`$UPLOAD_DIR\`"
if [ ! -e "$UPLOAD_DIR" ]; then
  note "**state:** absent (directory was removed during erasure or never existed)"
  note ""
  mark_pass "upload directory is absent"
elif [ ! -d "$UPLOAD_DIR" ]; then
  note "**state:** path exists but is not a directory — investigate"
  note ""
  mark_fail "upload path exists and is not a directory: $UPLOAD_DIR"
else
  REMAINING=$(find "$UPLOAD_DIR" -maxdepth 1 -mindepth 1 2>/dev/null | wc -l | tr -d '[:space:]')
  : "${REMAINING:=0}"
  note "**state:** directory exists with $REMAINING remaining entries"
  note ""
  if [ "$REMAINING" = "0" ]; then
    mark_pass "upload directory is empty (no residual photo files)"
  else
    mark_fail "$REMAINING file(s) remain under $UPLOAD_DIR — must be unlinked"
    note "### Residual files"
    note ""
    note '```'
    find "$UPLOAD_DIR" -maxdepth 2 >> "$EVIDENCE"
    note '```'
    note ""
  fi
fi

# ── Check 3: most recent audit row reflects erasure ─────────────────
note "## Check 3 — Audit log records the erasure event"
note ""
ROW_COUNT=$(jq '.audit_log.rows | length' < "$RESP_FILE")
note "**rows in window:** $ROW_COUNT"
if [ "$ROW_COUNT" = "0" ]; then
  mark_fail "no audit rows in the requested window — erasure should have appended one"
  note ""
else
  LAST_KIND=$(jq -r '.audit_log.rows | last | .accessor.kind // ""' < "$RESP_FILE")
  LAST_RESULT=$(jq -r '.audit_log.rows | last | .result // ""' < "$RESP_FILE")
  LAST_TS=$(jq -r '.audit_log.rows | last | .ts // ""' < "$RESP_FILE")
  note "**last row:** ts=\`$LAST_TS\` accessor.kind=\`$LAST_KIND\` result=\`$LAST_RESULT\`"
  note ""
  case "$LAST_KIND" in
    biometric_erasure|full_erasure)
      case "$LAST_RESULT" in
        erased|success)
          mark_pass "last audit row is an erasure event ($LAST_KIND/$LAST_RESULT)"
          ;;
        *)
          mark_fail "last row kind is $LAST_KIND but result is '$LAST_RESULT' (expected erased/success)"
          ;;
      esac
      ;;
    *)
      mark_fail "last audit row accessor.kind is '$LAST_KIND' (expected biometric_erasure or full_erasure)"
      ;;
  esac
fi
note ""

# ── Check 4: HMAC chain verifies end-to-end ─────────────────────────
note "## Check 4 — HMAC chain integrity"
note ""
CHAIN_VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE")
CHAIN_ROOT=$(jq -r '.audit_log.chain_root // ""' < "$RESP_FILE")
CHAIN_ROWS=$(jq -r '.audit_log.chain_rows_total // 0' < "$RESP_FILE")
CHAIN_ERR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE")
note "**chain_verified:** \`$CHAIN_VERIFIED\`"
note "**chain_rows_total:** $CHAIN_ROWS"
note "**chain_root:** \`$CHAIN_ROOT\`"
if [ -n "$CHAIN_ERR" ]; then
  note "**chain_verification_error:** \`$CHAIN_ERR\`"
fi
note ""
if [ "$CHAIN_VERIFIED" = "true" ]; then
  mark_pass "chain verifies end-to-end ($CHAIN_ROWS rows)"
else
  mark_fail "chain integrity broken — destruction is NOT defensible until investigated"
fi
note ""

# ── Render report ───────────────────────────────────────────────────
TOTAL=$((PASS + FAIL))
note "## Summary"
note ""
note "**$PASS / $TOTAL** verification checks pass."
note ""
if [ "$FAIL" -gt 0 ]; then
  note "**Status: ERASURE NOT VERIFIED.** Do NOT mark destruction complete. Escalate to engineering before responding to candidate / counsel."
  note ""
fi

# Hash response body so the report has a tamper-evident anchor.
RESP_HASH=$(sha256sum "$RESP_FILE" | awk '{print $1}')
EVIDENCE_HASH=$(sha256sum "$EVIDENCE" | awk '{print $1}')

{
  echo "# Biometric Erasure Verification — $CANDIDATE_ID"
  echo
  echo "**Date:** $DATE"
  echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3"
  echo "**Generator:** scripts/staffing/verify_biometric_erasure.sh"
  echo
  cat "$EVIDENCE"
  echo "---"
  echo
  echo "**Audit response SHA-256:** \`$RESP_HASH\`"
  echo "**Evidence summary SHA-256:** \`$EVIDENCE_HASH\`"
  echo
} > "$OUT"

rm -f "$RESP_FILE"
echo "[verify] $PASS / $TOTAL checks pass — report: $OUT"
echo "[verify] response hash: $RESP_HASH"
[ "$FAIL" -eq 0 ]