Four threads landing together — all driven by the audit J asked for before
production cutover.
(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.
(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).
(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.
(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
`verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
no biometric content); audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (was
using 1 subject with 2 uploads to test ct parsing — would now 409).
22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.
Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.
Counsel calendar is now the only remaining blocker for first real-photo intake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
309 lines
12 KiB
Markdown
# Legal-Tier Audit Key & Token Rotation Runbook

**Spec companion:** `docs/PHASE_1_6_BIPA_GATES.md` §2 + `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`
**Audience:** Operators with root on the gateway host (J + named operators)
**Status:** Engineering-authored; ⚖ counsel review encouraged before formal adoption

> This runbook covers rotation of the two crypto-credentials that gate
> the Phase 1.6 audit substrate:
>
> 1. **`LH_SUBJECT_AUDIT_KEY`**: the 32-byte HMAC-SHA256 signing key
>    that chains every per-subject audit row. If this key changes, all
>    pre-rotation chain rows tamper-detect under the new key. That is
>    correct, expected, BIPA-defensible behavior: the chain integrity
>    it provided pre-rotation remains intact in the archive of the old
>    key, and post-rotation chains remain intact going forward.
>
> 2. **`LH_LEGAL_AUDIT_TOKEN`**: the 32+-character bearer token that
>    authorizes calls to `/audit/subject/{id}` and
>    `/biometric/subject/{id}/erase`. Rotation does NOT touch any audit
>    history; only access to the legal-tier endpoints flips.
>
> Both live at `/etc/lakehouse/` (mode 0400, owned by root) and are
> loaded by the gateway via systemd `Environment=` directives in
> `/etc/systemd/system/lakehouse.service.d/audit_env.conf`. They are
> NOT loaded from `/tmp`: a 2026-05-05 reboot incident wiped a
> `/tmp`-resident key and caused `/audit` + `/biometric` to fail-closed
> (which is what they should do); the rotation fix moved them to the
> persistent path.

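The key-tied behavior described above can be seen in miniature with a toy chain. This is only a sketch: the `prev_mac|row` input shape and the `chain_mac` / `verify_chain` helpers are assumptions made for illustration, not the gateway's actual serialization or code.

```bash
#!/usr/bin/env bash
# Illustration only: a toy HMAC-SHA256 chain. The "prev_mac|row" input shape
# and both helpers are hypothetical, not the gateway's real format.

# chain_mac KEY PREV_MAC ROW -> hex HMAC-SHA256 of "PREV_MAC|ROW".
# (Passing a key via -hmac exposes it in the process list; acceptable for a
# demo key, never for a production key.)
chain_mac() {
  printf '%s|%s' "$2" "$3" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# verify_chain KEY FILE -> exit 0 if every "mac row..." line verifies in order.
verify_chain() {
  local key=$1 prev="" mac row
  while read -r mac row; do
    [ "$(chain_mac "$key" "$prev" "$row")" = "$mac" ] || return 1
    prev=$mac
  done < "$2"
}

# Build a two-row chain under a demo "old" key.
chain=$(mktemp)
prev=""
for row in "upload WORKER-2" "erase WORKER-2"; do
  mac=$(chain_mac "old-key" "$prev" "$row")
  printf '%s %s\n' "$mac" "$row" >> "$chain"
  prev=$mac
done

verify_chain "old-key" "$chain" && echo "old key: chain verifies"
verify_chain "new-key" "$chain" || echo "new key: chain tamper-detects (expected after rotation)"
```
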
---

## 1. When to rotate

Rotate when any of the following is true, within the stated urgency:

| Trigger | Urgency | Notes |
|---|---|---|
| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. |
| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. |
| Key/token file's filesystem permissions were ever weakened (mode more permissive than 0400, e.g. group-readable) | Within 24 hours | A weakened mode may have exposed the bytes to other accounts or tooling. |
| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. |
| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. |

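For the permissions trigger in the table above, a periodic check along the following lines may help. It is a sketch under stated assumptions: `check_cred_file` is a hypothetical helper, `stat -c` is GNU coreutils syntax, and the check only sees the current state, so it cannot prove the mode was never weakened in the past.

```bash
#!/usr/bin/env bash
# Sketch: flag credential files whose mode or owner has drifted.
# check_cred_file is a hypothetical helper; stat -c is GNU-specific.

# check_cred_file PATH [EXPECTED_UID] -> exit 0 only if the mode is exactly
# 400 and the file is owned by EXPECTED_UID (default 0, i.e. root).
check_cred_file() {
  local path=$1 want_uid=${2:-0} mode uid
  mode=$(stat -c '%a' "$path") || return 2
  uid=$(stat -c '%u' "$path") || return 2
  if [ "$mode" != "400" ] || [ "$uid" != "$want_uid" ]; then
    echo "ROTATE: $path is mode $mode uid $uid (want 400 uid $want_uid)" >&2
    return 1
  fi
}

# Check the two canonical files when they are present on this host.
for f in /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token; do
  if [ -e "$f" ]; then
    check_cred_file "$f" || echo "permission check failed for $f"
  fi
done
```
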
Do **not** rotate when:

- A subject's audit chain tamper-detects in isolation. That is the
  expected result if the audit log was edited (the edit would itself be
  the BIPA finding, not the key). Investigate the chain, not the key.
- Cross-runtime parity drift appears. That's an HMAC-input-shape bug
  (Go vs Rust serialization), not a key issue. See
  `STATE_OF_PLAY.md` "three runtime-divergence classes" entry.

---

## 2. Pre-rotation checks (5 minutes)

Before generating new credentials, capture a clean baseline so you can
prove the rotation cause and sequence afterward.

### 2.1. Take the engineering snapshot

```bash
# Confirm the canonical files exist with correct permissions.
ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token

# Record the hash of the existing key + token (NEVER the bytes themselves)
# so the old credential is identifiable in retrospect without storing it.
# sudo is needed because the files are mode 0400, owned by root.
sudo sha256sum /etc/lakehouse/subject_audit.key
sudo sha256sum /etc/lakehouse/legal_audit.token

# Confirm the gateway is currently using these files.
sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT"

# Verify the audit endpoint is healthy with the current credentials.
curl -sf http://localhost:3100/audit/health
```

If `/audit/health` is already 503, the rotation is **recovery**, not
preventive. Note this in the rotation event record (§5), and keep the
two hashes at hand: §5 asks for them.

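Since the old hashes are needed again when the rotation event is recorded, it can help to capture just the digest. A minimal sketch follows; `file_sha256` and the variable names in the comments are hypothetical, not existing tooling.

```bash
#!/usr/bin/env bash
# Sketch: capture only the hex digest of a credential file so it can be
# reused in the rotation record. file_sha256 is a hypothetical helper.

# file_sha256 PATH -> prints the 64-char hex SHA-256 of the file's bytes.
file_sha256() {
  sha256sum "$1" | awk '{print $1}'
}

# On the gateway host this would run as root (the files are mode 0400), e.g.:
#   OLD_KEY_SHA=$(file_sha256 /etc/lakehouse/subject_audit.key)
#   OLD_TOKEN_SHA=$(file_sha256 /etc/lakehouse/legal_audit.token)

# Self-contained demo against a throwaway file holding the bytes "abc":
demo=$(mktemp)
printf 'abc' > "$demo"
file_sha256 "$demo"
# -> ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
rm -f "$demo"
```

Keeping the digests in shell variables for the duration of the session avoids copying them by hand later.
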
### 2.2. Capture a known-good chain root

Pick one or two subjects with non-empty audit logs and record their
chain roots **under the current key**:

```bash
# sudo is needed to read the token file (mode 0400, owned by root).
TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
for cid in WORKER-2 WORKER-100; do
  curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
    "http://localhost:3100/audit/subject/$cid" \
    | jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}'
done
```

Save the output. Post-rotation, those chains will tamper-detect under
the new key. That is **expected**, and the saved snapshot is the proof
that the chain WAS intact under the old key, before rotation.

---

## 3. Generation + rotation

### 3.1. Generate the new key and token

```bash
# 33 random bytes, base64-encoded, give exactly 44 chars with no padding.
# (32 bytes as 64 hex chars would work equally well for HMAC-SHA256;
# we follow the existing 44-char convention.)
# The value is piped into `sudo install` via /dev/stdin because sudo closes
# inherited file descriptors by default, which breaks <(...) substitution.
openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
  | sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/subject_audit.key.new

openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
  | sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/legal_audit.token.new

# Sanity: confirm 44-char content + correct mode.
sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
sudo ls -la /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
```

Both must be mode `0400`, owned by root, and exactly **44 chars**. (The
audit endpoint refuses tokens shorter than 32 chars at load; see
`crates/catalogd/src/audit_endpoint.rs:73`.)

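The sanity check above can be tightened into a small pre-flight validator. This is a sketch only: `validate_credential` is a hypothetical helper, and the 44-char length and base64 alphabet simply mirror the generation step rather than any rule enforced by the gateway.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight validation of a freshly generated credential file.
# validate_credential is a hypothetical helper, not gateway tooling.

# validate_credential PATH [EXPECTED_LEN] -> exit 0 if the file holds exactly
# EXPECTED_LEN (default 44) base64-alphabet bytes and nothing else.
validate_credential() {
  local path=$1 want=${2:-44} content bytes
  bytes=$(wc -c < "$path") || return 2
  content=$(cat "$path")
  # A raw byte count that differs from the target length catches, among
  # other things, a trailing newline added by an editor.
  [ "$((bytes))" -eq "$want" ] || { echo "$path: $((bytes)) bytes, want $want" >&2; return 1; }
  [ "${#content}" -eq "$want" ] || { echo "$path: unexpected whitespace or NUL" >&2; return 1; }
  case $content in
    *[!A-Za-z0-9+/]*) echo "$path: character outside base64 alphabet" >&2; return 1 ;;
  esac
}

# Demo against throwaway data (44 'A' characters, then the same plus a newline).
demo=$(mktemp)
head -c 44 /dev/zero | tr '\0' 'A' > "$demo"
validate_credential "$demo" && echo "ok"
printf '\n' >> "$demo"
validate_credential "$demo" || echo "rejected"
rm -f "$demo"
```

On the gateway host the function would have to run as root, since the `.new` files are mode 0400.
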
### 3.2. Atomic swap

The gateway reads these files **once at boot** (per
`crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new` and
the equivalent for the writer), so an atomic `mv` followed by a restart
is required.

```bash
# Move the old credentials to a quarantine path with timestamp so the
# old hashes remain identifiable post-rotation.
TS=$(date -u +%Y%m%dT%H%M%SZ)
# install -d creates the directory if missing and enforces mode + owner.
sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived

sudo mv /etc/lakehouse/subject_audit.key "/etc/lakehouse/_archived/subject_audit.key.$TS"
sudo mv /etc/lakehouse/legal_audit.token "/etc/lakehouse/_archived/legal_audit.token.$TS"

sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key
sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token

sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
```

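One way to make the swap harder to get wrong is to refuse to move the active file unless its replacement is already staged. The following is a sketch under that assumption; `swap_credential` and its arguments are hypothetical, and the demo runs in a temp directory rather than `/etc/lakehouse`.

```bash
#!/usr/bin/env bash
# Sketch: archive-then-promote for one credential, refusing to touch the
# active file unless the replacement is staged. swap_credential is a
# hypothetical helper; on a real host it would run as root.

# swap_credential DIR NAME TS
#   DIR/NAME      -> DIR/_archived/NAME.TS
#   DIR/NAME.new  -> DIR/NAME
swap_credential() {
  local dir=$1 name=$2 ts=$3
  [ -s "$dir/$name.new" ] || { echo "missing or empty $dir/$name.new" >&2; return 1; }
  mkdir -p "$dir/_archived" && chmod 0700 "$dir/_archived" || return 1
  if [ -e "$dir/$name" ]; then
    mv "$dir/$name" "$dir/_archived/$name.$ts" || return 1
  fi
  # A rename within one filesystem is atomic, so a reader never sees a
  # half-written credential.
  mv "$dir/$name.new" "$dir/$name"
}

# Demo in a throwaway directory.
d=$(mktemp -d)
printf 'old' > "$d/subject_audit.key"
printf 'new' > "$d/subject_audit.key.new"
swap_credential "$d" subject_audit.key 20260505T000000Z
cat "$d/subject_audit.key"                              # prints: new
cat "$d/_archived/subject_audit.key.20260505T000000Z"   # prints: old
```

The existence check on the active file also covers the recovery case, where there is no old key left to archive.
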
### 3.3. Restart the gateway

```bash
sudo systemctl restart lakehouse.service
sleep 2
sudo systemctl status lakehouse.service --no-pager | head -10
```

Wait for the gateway to bind port 3100 cleanly. If it doesn't, check
`journalctl -u lakehouse.service -n 50 --no-pager` for the failure
mode. The most common cause is the new file having the wrong mode or owner.

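Where a fixed `sleep 2` turns out to be too short, polling a readiness check is an alternative. The sketch below assumes nothing about the gateway beyond the `/audit/health` probe used in §2.1; `wait_until` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: poll a readiness check instead of relying on a fixed sleep.
# wait_until is a hypothetical helper, not part of the gateway tooling.

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND about once per second; exit 0 on first success,
# exit 1 once TIMEOUT_SECONDS attempts have failed.
wait_until() {
  local remaining=$1
  shift
  while [ "$remaining" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    sleep 1
    remaining=$((remaining - 1))
  done
  return 1
}

# Intended use right after `systemctl restart`:
#   wait_until 30 curl -sf http://localhost:3100/audit/health \
#     || journalctl -u lakehouse.service -n 50 --no-pager
```
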
---

## 4. Post-rotation verification (5 minutes)

### 4.1. Health probes

```bash
# Audit endpoint must be 200, not 503.
curl -sf http://localhost:3100/audit/health
# Expect: "audit endpoint ready"

# /v1/health must list the gateway's full provider set.
curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count'
```

### 4.2. Confirm the new token works

```bash
# sudo is needed to read the token file (mode 0400, owned by root).
NEW_TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-100
# Expect: 200
```

If 401, the token the gateway loaded does NOT match the file you wrote.
Check ownership and mode, and look for trailing-whitespace differences with
`sudo hexdump -C /etc/lakehouse/legal_audit.token | head`.

### 4.3. Confirm the new chain works

Append-only chains are key-tied. Any *new* audit row written
post-rotation is signed under the new key and verifies cleanly:

```bash
# Issue a /v1/validate call against any worker; it spawns an audit row.
curl -sf -X POST http://localhost:3100/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null

# Read the chain back. Last row must verify under the new key.
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-100 \
  | jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}'
```

`chain_verified: true` confirms the new key is signing + verifying.
Note that WORKER-100 is also one of the §2.2 snapshot subjects, so its
chain contains pre-rotation rows. If `chain_verified` covers the whole
chain rather than only the new row, expect `false` here for the reason
given in §4.4, and repeat the check against a subject with no
pre-rotation rows.

### 4.4. Confirm pre-rotation chains tamper-detect (expected)

```bash
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-2 \
  | jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
```

For any subject whose chain was written under the old key, this
returns `chain_verified: false` with an HMAC-mismatch error. **This
is correct behavior**, not a bug. The old chain was correctly signed
under the old key and verified under it; the new key cannot retroactively
verify rows it didn't sign. The pre-rotation snapshot you captured in
§2.2 is the defensible proof that those rows WERE valid pre-rotation.

If, instead, a chain that *should* verify post-rotation returns
`verified: false`, the rotation has gone wrong, likely because an
old-key file didn't get archived cleanly. Restore the archived
`subject_audit.key.<ts>` and `legal_audit.token.<ts>` files from
`/etc/lakehouse/_archived/` (the names written in §3.2), then re-attempt.

---

## 5. Record the rotation event

Append a row to the rotation log:

```bash
# The $(...) substitutions run in your own shell, not under tee's sudo,
# so the hash commands need their own sudo to read the 0400 files.
sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
{"ts":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","operator":"<your name>","reason":"<scheduled|compromise|cred_loss|recovery>","old_key_sha256":"<hash from §2.1>","new_key_sha256":"$(sudo sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<hash from §2.1>","new_token_sha256":"$(sudo sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<witness name or N/A for routine>"}
EOF

sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
```

This file is the operator-side record of when the key changed and why.
It does NOT contain the key itself, only hashes, so it is safe to
back up and share with counsel on request.

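Because the record is filled in by hand, a small builder that rejects malformed fields before anything is appended may be worth having. This is a sketch: `rotation_record` and `is_sha256` are hypothetical helpers, the field names follow the record format shown in this section, and free-text fields are rejected rather than JSON-escaped.

```bash
#!/usr/bin/env bash
# Sketch: build the rotation-log line from arguments and reject malformed
# input. rotation_record and is_sha256 are hypothetical helpers.

# is_sha256 STRING -> exit 0 if STRING is 64 lowercase hex characters.
is_sha256() {
  [ "${#1}" -eq 64 ] || return 1
  case $1 in *[!0-9a-f]*) return 1 ;; esac
}

# rotation_record TS OPERATOR REASON OLD_KEY NEW_KEY OLD_TOKEN NEW_TOKEN WITNESS
rotation_record() {
  [ "$#" -eq 8 ] || { echo "need 8 fields" >&2; return 2; }
  case $3 in
    scheduled|compromise|cred_loss|recovery) ;;
    *) echo "unknown reason: $3" >&2; return 1 ;;
  esac
  local h
  for h in "$4" "$5" "$6" "$7"; do
    is_sha256 "$h" || { echo "not a sha256 hex digest: $h" >&2; return 1; }
  done
  # No JSON escaping is done here, so quotes and backslashes are refused.
  case "$1$2$8" in *[\"\\]*) echo "quote or backslash in a text field" >&2; return 1 ;; esac
  printf '{"ts":"%s","operator":"%s","reason":"%s","old_key_sha256":"%s","new_key_sha256":"%s","old_token_sha256":"%s","new_token_sha256":"%s","witness":"%s"}\n' "$@"
}

# Intended use (the *_SHA variables are assumed to hold the recorded hashes):
#   rotation_record "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "J" scheduled \
#     "$OLD_KEY_SHA" "$NEW_KEY_SHA" "$OLD_TOKEN_SHA" "$NEW_TOKEN_SHA" "N/A" \
#     | sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl
```
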
---

## 6. Recovery from a lost key

If the active `subject_audit.key` is destroyed (filesystem corruption,
accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
will fail-closed at startup:

- `/audit/subject/{id}` → 503 ("audit endpoint disabled (legal token missing)" or equivalent for the signing key)
- `/biometric/subject/{id}/photo` → 503 (same fail-closed posture)

This is correct behavior: a server that cannot HMAC-sign new audit
rows must not accept new biometric writes.

**Recovery is rotation.** Generate a new key per §3.1, atomic-swap
per §3.2 (there is no old key file to archive, so that `mv` is skipped),
restart per §3.3, verify per §4. Pre-loss chains tamper-detect
under the new key (the old key is gone, so there is no way to verify
them). Treat the loss event as the BIPA-defensible boundary: pre-loss
chain verification was provided by the working key; post-loss new
chains are signed under the new key.

If a counsel-grade attestation of the pre-loss chains is needed, the
rotation log in `/etc/lakehouse/_archived/` contains the historical
hashes; combined with the cross-runtime parity probe (the Go reader
gives a byte-identical view to Rust), the pre-loss chain history is
preservable as long as the on-disk JSONL files were not also lost.

---

## 7. ⚖ counsel notes

These are areas where counsel may want to opine before this runbook
is formally adopted:

1. **Rotation cadence.** BIPA itself does not require periodic rotation;
   counsel may set a 90-day schedule to satisfy a separate compliance
   posture (SOC2, internal policy).
2. **Custody of `/etc/lakehouse/_archived/`.** The rotation log holds
   only hashes, but the archived raw key files hold the keys themselves.
   Counsel may want a more aggressive destruction schedule for the raw
   archived keys (for example 1 year) to reduce a long-tail compromise
   surface.
3. **Notification obligations on rotation due to compromise.** §1
   triggers a rotation but does not address whether candidates whose
   biometric data was protected by the compromised key must be notified.
   This is a counsel call.

---

## 8. Operator acknowledgment

| Operator | Date acknowledged | Signature |
|---|---|---|
| J | _____ | _______________ |
| _____ | _____ | _______________ |

---

## 9. Change log

- 2026-05-05: Initial runbook authored after the /tmp wipe incident
  on the same day (key was at `/tmp/subject_audit.key` and was deleted
  on reboot, disabling `/audit` + `/biometric` until the key was
  regenerated at `/etc/lakehouse/subject_audit.key`). Recovery of
  that incident produced a working procedure; this runbook captures
  it as the canonical playbook for any future rotation.