Four threads landing together — all driven by the audit J asked for before
production cutover.
(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.
(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).
(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.
(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
`verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
no biometric content); audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (was
using 1 subject with 2 uploads to test ct parsing — would now 409).
22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.
Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.
Counsel calendar is now the only remaining blocker for first real-photo intake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Legal-Tier Audit Key & Token Rotation Runbook
Spec companion: docs/PHASE_1_6_BIPA_GATES.md §2 + docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md
Audience: Operators with root on the gateway host (J + named operators)
Status: Engineering-authored — ⚖ counsel review encouraged before formal adoption
This runbook covers rotation of the two crypto-credentials that gate the Phase 1.6 audit substrate:
- LH_SUBJECT_AUDIT_KEY: the 32-byte HMAC-SHA256 signing key that chains every per-subject audit row. If this key changes, all pre-rotation chain rows tamper-detect under the new key. That is correct, expected, BIPA-defensible behavior: the chain integrity it provided pre-rotation remains intact in the archive of the old key, and post-rotation chains remain intact going forward.
- LH_LEGAL_AUDIT_TOKEN: the 32+-character bearer token that authorizes calls to /audit/subject/{id} and /biometric/subject/{id}/erase. Rotation does NOT touch any audit history; only access to the legal-tier endpoints flips.
Both live at /etc/lakehouse/ (mode 0400, owned by root) and are loaded by the gateway via systemd Environment= directives in /etc/systemd/system/lakehouse.service.d/audit_env.conf. They are NOT loaded from /tmp: a 2026-05-05 reboot incident wiped a /tmp-resident key and caused /audit + /biometric to fail closed (which is what they should do); the rotation fix moved them to the persistent path.
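To see why rotating LH_SUBJECT_AUDIT_KEY makes older rows tamper-detect, consider a toy HMAC chain. This is a sketch only: the gateway's real row format and HMAC input shape are not specified in this runbook, and the row strings, the `genesis` seed, and the `chain_root` helper are all hypothetical. It illustrates just one property, that a chain root depends on both the rows and the key.

```bash
#!/usr/bin/env bash
# Toy HMAC-SHA256 chain. Hypothetical row format, for illustration only.
set -euo pipefail

# chain_root KEY ROW...
# Each link is HMAC(key, previous_link | row), so the final link depends
# on every row and on the key.
chain_root() {
  local key=$1 prev="genesis" row
  shift
  for row in "$@"; do
    prev=$(printf '%s|%s' "$prev" "$row" \
      | openssl dgst -sha256 -hmac "$key" | awk '{print $NF}')
  done
  printf '%s\n' "$prev"
}

rows=("upload" "read" "erase")
root_old=$(chain_root "demo-old-key" "${rows[@]}")
root_new=$(chain_root "demo-new-key" "${rows[@]}")

# Same rows, different key: the roots differ, so a verifier holding only
# the new key reports the old chain as tampered.
if [ "$root_old" != "$root_new" ]; then
  echo "roots differ under different keys"
fi
```

Recomputing the root with the old key still reproduces `root_old`, which is why the archived key and the §2.2 snapshot together remain meaningful after rotation.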
1. When to rotate
Rotate immediately when any of the following is true:
| Trigger | Urgency | Notes |
|---|---|---|
| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. |
| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. |
| Key/token file's filesystem permissions were ever weakened (mode > 0400, group readable, etc.) | Within 24 hours | Filesystem audit may have leaked the bytes. |
| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. |
| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. |
Do not rotate when:
- A subject's audit chain tamper-detects in isolation. That is normal if the audit log was edited (which would itself be the BIPA finding, not the key). Investigate the chain, not the key.
- Cross-runtime parity drift appears. That's an HMAC-input-shape bug
(Go vs Rust serialization), not a key issue. See the
"three runtime-divergence classes" entry in STATE_OF_PLAY.md.
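Two of the triggers in the table above, weakened file permissions and the 90-day schedule, can be checked mechanically. The following is a minimal sketch, assuming GNU `stat` (as found on a systemd host); the helper names `perms_ok` and `age_days` are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# perms_ok FILE [OWNER]
# Succeed only if FILE is exactly mode 400 and owned by OWNER (default root).
perms_ok() {
  local file=$1 owner=${2:-root}
  [ "$(stat -c '%a %U' "$file")" = "400 $owner" ]
}

# age_days FILE
# Print the whole number of days since FILE was last modified.
age_days() {
  local file=$1 now mtime
  now=$(date +%s)
  mtime=$(stat -c '%Y' "$file")
  echo $(( (now - mtime) / 86400 ))
}
```

A possible use, run with enough privilege to stat the files: `perms_ok /etc/lakehouse/subject_audit.key || echo "rotate within 24 hours"`, and `[ "$(age_days /etc/lakehouse/subject_audit.key)" -lt 90 ] || echo "scheduled rotation due"`. File age is only a proxy for the last rotation date; the rotation log in §5 is the authoritative record.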
2. Pre-rotation checks (5 minutes)
Before generating new credentials, capture a clean baseline so you can prove the rotation cause and sequence afterward.
2.1. Take the engineering snapshot
# Confirm the canonical files exist with correct permissions.
ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
# Hash the existing key + token (NEVER the bytes themselves) so the
# old credential is identifiable in retrospect without storing it.
sudo sha256sum /etc/lakehouse/subject_audit.key
sudo sha256sum /etc/lakehouse/legal_audit.token
# Confirm the gateway is currently using these files.
sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT"
# Verify the audit endpoint is healthy with the current credentials.
curl -sf http://localhost:3100/audit/health
If /audit/health is already 503, the rotation is recovery, not
preventive — note this in the rotation event record (§5).
2.2. Capture a known-good chain root
Pick one or two subjects with non-empty audit logs and record their chain roots under the current key:
TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
for cid in WORKER-2 WORKER-100; do
curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
"http://localhost:3100/audit/subject/$cid" \
| jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}'
done
Save the output. Post-rotation, those chains will tamper-detect under the new key — that is expected and the saved snapshot is the proof that the chain WAS intact under the old key, before rotation.
3. Generation + rotation
3.1. Generate the new key
# 33 random bytes base64-encode to exactly 44 chars with no padding, which
# matches the existing convention. Hex would also work for HMAC-SHA256.
# The bytes are piped through /dev/stdin because sudo closes inherited file
# descriptors above 2 by default, so <(...) is not readable under sudo.
openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
| sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/subject_audit.key.new
openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
| sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/legal_audit.token.new
# Sanity: confirm 44-char content + correct mode.
sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
sudo ls -la /etc/lakehouse/*.new
Both must be mode 0400, owned by root, exactly 44 chars (the
audit endpoint refuses tokens shorter than 32 chars at load — see
crates/catalogd/src/audit_endpoint.rs:73).
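The 44-character length follows from base64 arithmetic: 33 bytes are 264 bits, and 264 / 6 = 44 characters, with no `=` padding because 33 is a multiple of 3. A content check can complement the `wc -c` sanity step. This is a sketch; `cred_content_ok` is a hypothetical helper, and it assumes the standard base64 alphabet that `openssl rand -base64` emits.

```bash
#!/usr/bin/env bash
set -euo pipefail

# cred_content_ok FILE
# Succeed if FILE holds exactly 44 bytes, all from the standard base64
# alphabet, with no padding and no trailing newline.
cred_content_ok() {
  local file=$1 bytes content
  bytes=$(wc -c < "$file")
  [ "$bytes" -eq 44 ] || return 1
  # $(cat) drops trailing newlines, so a 43-char + newline file fails here.
  content=$(cat "$file")
  [[ $content =~ ^[A-Za-z0-9+/]{44}$ ]]
}
```

Because the real files are mode 0400 and owned by root, the function would need to run in a root shell, for example from a script invoked with `sudo bash`.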
3.2. Atomic swap
The gateway reads these files once at boot (per
crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new and
the equivalent for the writer). Atomic mv → restart is required.
# Move the old credentials to a quarantine path with timestamp so the
# old hashes remain identifiable post-rotation.
TS=$(date -u +%Y%m%dT%H%M%SZ)
sudo mkdir -p /etc/lakehouse/_archived
sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived
sudo mv /etc/lakehouse/subject_audit.key /etc/lakehouse/_archived/subject_audit.key.$TS
sudo mv /etc/lakehouse/legal_audit.token /etc/lakehouse/_archived/legal_audit.token.$TS
sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key
sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token
sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
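The sequence above archives the old file before promoting the new one, so a missing `.new` file would leave the gateway with no credential at all. One way to guard against that is to wrap the two moves in a function that checks first. This is a sketch under that assumption; `swap_cred` is a hypothetical helper, not an existing script in the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# swap_cred DIR NAME TS
# Archive DIR/NAME as DIR/_archived/NAME.TS, then promote DIR/NAME.new.
# Refuses to touch anything unless the .new file exists and is non-empty.
swap_cred() {
  local dir=$1 name=$2 ts=$3
  if [ ! -s "$dir/$name.new" ]; then
    echo "refusing swap: $dir/$name.new missing or empty" >&2
    return 1
  fi
  install -d -m 0700 "$dir/_archived"
  # A recovery rotation (§6) may have no old file left to archive.
  if [ -e "$dir/$name" ]; then
    mv "$dir/$name" "$dir/_archived/$name.$ts"
  fi
  mv "$dir/$name.new" "$dir/$name"
}
```

Run as root, the equivalent of the commands above would be `swap_cred /etc/lakehouse subject_audit.key "$TS"` followed by `swap_cred /etc/lakehouse legal_audit.token "$TS"`.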
3.3. Restart the gateway
sudo systemctl restart lakehouse.service
sleep 2
sudo systemctl status lakehouse.service --no-pager | head -10
Wait for the gateway to bind port 3100 cleanly. If it doesn't, check
journalctl -u lakehouse.service -n 50 --no-pager for the failure
mode — the most common cause is the new file having wrong mode/owner.
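A fixed `sleep 2` can be too short on a slow start. A bounded polling loop is one alternative; the sketch below assumes nothing about the gateway beyond the /audit/health probe already used in this runbook, and `wait_for` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# wait_for TIMEOUT_SECONDS CMD [ARGS...]
# Re-run CMD once per second until it succeeds or the timeout elapses.
# Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1 waited=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `wait_for 30 curl -sf http://localhost:3100/audit/health || journalctl -u lakehouse.service -n 50 --no-pager`.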
4. Post-rotation verification (5 minutes)
4.1. Health probes
# Audit endpoint must be 200, not 503.
curl -sf http://localhost:3100/audit/health
# Expect: "audit endpoint ready"
# /v1/health must list the gateway's full provider set.
curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count'
4.2. Confirm the new token works
NEW_TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
curl -sS -o /dev/null -w '%{http_code}\n' \
-H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-100
# Expect: 200
If 401, the file the gateway loaded does NOT match the file you wrote.
Check ownership and mode, and look for trailing-whitespace differences with
sudo hexdump -C /etc/lakehouse/legal_audit.token | head.
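One concrete way the two can differ: shell command substitution strips trailing newlines, so the value in `$NEW_TOKEN` can be shorter than the bytes on disk. Whether the gateway trims whitespace when it loads the file is not stated in this runbook, so the sketch below simply measures the difference. `stripped_bytes` is a hypothetical helper and assumes ASCII content.

```bash
#!/usr/bin/env bash
set -euo pipefail

# stripped_bytes FILE
# Print how many trailing bytes $(cat FILE) silently drops, i.e. the
# number of trailing newlines in FILE. Assumes ASCII content.
stripped_bytes() {
  local file=$1 on_disk via_shell
  on_disk=$(wc -c < "$file")
  via_shell=$(cat "$file")
  echo $(( on_disk - ${#via_shell} ))
}
```

A result of 0 for the token file means the header value sent by curl is byte-identical to the file; anything else points at a stray newline introduced when the file was written.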
4.3. Confirm the new chain works
Append-only chains are key-tied. Any new audit row written post-rotation is signed under the new key and verifies cleanly:
# Issue a /v1/validate call against any worker — it spawns an audit row.
curl -sf -X POST http://localhost:3100/v1/validate \
-H 'Content-Type: application/json' \
-d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null
# Read the chain back. Last row must verify under the new key.
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-100 \
| jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}'
chain_verified: true confirms the new key is signing + verifying.
4.4. Confirm pre-rotation chains tamper-detect (expected)
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
http://localhost:3100/audit/subject/WORKER-2 \
| jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
For any subject whose chain was written under the old key, this
returns chain_verified: false with an HMAC-mismatch error. This
is correct behavior, not a bug. The old chain was correctly signed
under the old key and verified under it; the new key cannot retroactively
verify rows it didn't sign. The pre-rotation snapshot you captured in
§2.2 is the defensible proof that those rows WERE valid pre-rotation.
If, instead, you see a chain that should verify post-rotation
returning verified: false, the rotation has gone wrong, most
likely because an old-key file didn't get archived cleanly. Restore the files
archived in §3.2 (/etc/lakehouse/_archived/subject_audit.key.<ts> and legal_audit.token.<ts>), then re-attempt.
5. Record the rotation event
Append a row to the rotation log:
# The heredoc is expanded by the invoking shell, not by tee, so sha256sum
# needs its own sudo to read the 0400 root-owned files.
sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
{"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","operator":"<your name>","reason":"<scheduled|compromise|cred_loss|recovery>","old_key_sha256":"<hash from §2.1>","new_key_sha256":"$(sudo sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<hash from §2.1>","new_token_sha256":"$(sudo sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<witness name or N/A for routine>"}
EOF
sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
This file is the operator-side record of when the key changed and why. It does NOT contain the key itself — only hashes — so it is safe to back up and share with counsel on request.
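Hand-editing the placeholders invites typos in the `reason` field. A small builder that validates the reason against the four values listed above is one option. This is a simplified sketch that covers only the key fields (token and witness fields would follow the same pattern); `sha_of` and `rotation_record` are hypothetical helpers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# sha_of FILE: hex SHA-256 of FILE's bytes.
sha_of() { sha256sum "$1" | awk '{print $1}'; }

# rotation_record OPERATOR REASON OLD_KEY_SHA NEW_KEY_FILE
# Emit one JSON line. Values are interpolated as-is, so they must not
# contain double quotes or backslashes.
rotation_record() {
  local operator=$1 reason=$2 old_sha=$3 new_file=$4
  case $reason in
    scheduled|compromise|cred_loss|recovery) ;;
    *) echo "unknown reason: $reason" >&2; return 1 ;;
  esac
  printf '{"ts":"%s","operator":"%s","reason":"%s","old_key_sha256":"%s","new_key_sha256":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$operator" "$reason" \
    "$old_sha" "$(sha_of "$new_file")"
}
```

The output could then be appended with `| sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl`, with the function itself run from a root shell so it can read the key file.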
6. Recovery from a lost key
If the active subject_audit.key is destroyed (filesystem corruption,
accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
will fail-closed at startup:
- /audit/subject/{id} → 503 ("audit endpoint disabled (legal token missing)" or equivalent for the signing key)
- /biometric/subject/{id}/photo → 503 (same fail-closed posture)
This is correct behavior — a server that cannot HMAC-sign new audit rows must not accept new biometric writes.
Recovery is rotation. Generate a new key per §3.1, atomic-swap per §3.2, restart per §3.3, verify per §4. Pre-loss chains tamper-detect under the new key (the old key is gone — there is no way to verify them). Treat the loss event as the BIPA-defensible boundary: pre-loss chain verification was provided by the working key; post-loss new chains are signed under the new key.
If a counsel-grade attestation of the pre-loss chains is needed, the
/etc/lakehouse/_archived/ folder contains the historical hashes;
combined with the cross-runtime parity probe (Go reader gives the
same byte-identical view as Rust), the chain history pre-loss is
preservable as long as the on-disk JSONL files were not also lost.
7. ⚖ counsel notes
These are areas where counsel may want to opine before this runbook is formally adopted:
- Rotation cadence. BIPA itself does not require periodic rotation; counsel may set a 90-day schedule to satisfy a separate compliance posture (SOC2, internal policy).
- Custody of /etc/lakehouse/_archived/. The archived hashes do NOT contain the keys, but the archived raw key files DO. Counsel may want a more aggressive destruction schedule for the raw archived keys (say, 1 year) to reduce a long-tail compromise surface.
- Notification obligations on rotation due to compromise. §1 triggers a rotation; §1 does not address whether candidates whose biometric data was protected by the compromised key must be notified. This is a counsel call.
8. Operator acknowledgment
| Operator | Date acknowledged | Signature |
|---|---|---|
| J | _____ | _______________ |
| _____ | _____ | _______________ |
9. Change log
- 2026-05-05: Initial runbook authored after the /tmp wipe incident on the same day (key was at /tmp/subject_audit.key and was deleted on reboot, disabling /audit + /biometric until the key was regenerated at /etc/lakehouse/subject_audit.key). Recovery from that incident produced a working procedure; this runbook captures it as the canonical playbook for any future rotation.