# Legal-Tier Audit Key & Token Rotation Runbook

**Spec companion:** `docs/PHASE_1_6_BIPA_GATES.md` §2 + `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md`

**Audience:** Operators with root on the gateway host (J + named operators)

**Status:** Engineering-authored; ⚖ counsel review encouraged before formal adoption

> This runbook covers rotation of the two crypto-credentials that gate
> the Phase 1.6 audit substrate:
>
> 1. **`LH_SUBJECT_AUDIT_KEY`**: the 32-byte HMAC-SHA256 signing key
>    that chains every per-subject audit row. If this key changes, all
>    pre-rotation chain rows fail verification ("tamper-detect") under
>    the new key. That is correct, expected, BIPA-defensible behavior:
>    the chain integrity it provided pre-rotation remains intact in the
>    archive of the old key, and post-rotation chains remain intact
>    going forward.
>
> 2. **`LH_LEGAL_AUDIT_TOKEN`**: the 32+-character bearer token that
>    authorizes calls to `/audit/subject/{id}` and
>    `/biometric/subject/{id}/erase`. Rotation does NOT touch any audit
>    history; only access to the legal-tier endpoints flips.
>
> Both live at `/etc/lakehouse/` (mode 0400, owned by root) and are
> loaded by the gateway via systemd `Environment=` directives in
> `/etc/systemd/system/lakehouse.service.d/audit_env.conf`. They are
> NOT loaded from `/tmp`: a 2026-05-05 reboot incident wiped a
> `/tmp`-resident key and caused `/audit` + `/biometric` to fail-closed
> (which is what they should do); the rotation fix moved them to the
> persistent path.

---

## 1. When to rotate

Rotate **immediately** when any of the following is true:

| Trigger | Urgency | Notes |
|---|---|---|
| Suspected operator credential compromise | Within 1 hour | Token mismatch is fail-closed by default; immediate rotation closes the window. |
| Operator with legal-tier access leaves the team | Within 24 hours | Treat as compromise. |
| Key/token file's filesystem permissions were ever weakened (mode > 0400, group readable, etc.) | Within 24 hours | Filesystem audit may have leaked the bytes. |
| Token was ever transmitted over an untrusted channel (printed in CI log, sent over SMS, etc.) | Within 24 hours | Same reasoning. |
| Scheduled rotation (recommended) | Every 90 days | BIPA does not mandate a rotation cadence; counsel may set one. |

Do **not** rotate when:

- A subject's audit chain tamper-detects in isolation. That is normal
  if the audit log was edited (which would itself be the BIPA finding,
  not the key). Investigate the chain, not the key.
- Cross-runtime parity drift appears. That's an HMAC-input-shape bug
  (Go vs Rust serialization), not a key issue. See `STATE_OF_PLAY.md`
  "three runtime-divergence classes" entry.

---

## 2. Pre-rotation checks (5 minutes)

Before generating new credentials, capture a clean baseline so you can
prove the rotation cause and sequence afterward.

### 2.1. Take the engineering snapshot

```bash
# Confirm the canonical files exist with correct permissions.
ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token

# Hash the existing key + token (record the hash, NEVER the bytes
# themselves) so the old credential is identifiable in retrospect
# without storing it. The files are 0400 root, so reading needs sudo.
sudo sha256sum /etc/lakehouse/subject_audit.key
sudo sha256sum /etc/lakehouse/legal_audit.token

# Confirm the gateway is currently using these files.
sudo systemctl cat lakehouse.service | grep -E "Environment.*AUDIT"

# Verify the audit endpoint is healthy with the current credentials.
curl -sf http://localhost:3100/audit/health
```

If `/audit/health` is already 503, the rotation is **recovery**, not
preventive; note this in the rotation event record (§5).

### 2.2. Capture a known-good chain root

Pick one or two subjects with non-empty audit logs and record their
chain roots **under the current key**:

```bash
TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
for cid in WORKER-2 WORKER-100; do
  curl -sf -H "X-Lakehouse-Legal-Token: $TOKEN" \
    "http://localhost:3100/audit/subject/$cid" \
    | jq '{cid: .candidate_id, verified: .audit_log.chain_verified, root: .audit_log.chain_root, rows: .audit_log.chain_rows_total}'
done
```

Save the output. Post-rotation, those chains will tamper-detect under
the new key. That is **expected**, and the saved snapshot is the proof
that the chain WAS intact under the old key, before rotation.

---

## 3. Generation + rotation

### 3.1. Generate the new key

```bash
# 33 random bytes encode to exactly 44 base64 chars with no padding,
# which matches the existing convention. (32 random bytes as hex, 64
# chars, would also work for HMAC-SHA256.)
# The bytes are piped through /dev/stdin because sudo closes inherited
# file descriptors, so a <(...) process substitution is not readable
# by the install command that sudo launches.
openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
  | sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/subject_audit.key.new
openssl rand -base64 33 | tr -d '\n=' | head -c 44 \
  | sudo install -m 0400 -o root -g root /dev/stdin /etc/lakehouse/legal_audit.token.new

# Sanity: confirm 44-char content + correct mode.
sudo wc -c /etc/lakehouse/subject_audit.key.new /etc/lakehouse/legal_audit.token.new
sudo ls -la /etc/lakehouse/*.new
```

Both must be `mode 0400`, owned by root, exactly **44 chars** (the
audit endpoint refuses tokens shorter than 32 chars at load; see
`crates/catalogd/src/audit_endpoint.rs:73`).

### 3.2. Atomic swap

The gateway reads these files **once at boot** (per
`crates/catalogd/src/audit_endpoint.rs::AuditEndpointState::new` and
the equivalent for the writer). An atomic `mv` followed by a restart
is required.

```bash
# Move the old credentials to a quarantine path with timestamp so the
# old hashes remain identifiable post-rotation.
TS=$(date -u +%Y%m%dT%H%M%SZ)
sudo mkdir -p /etc/lakehouse/_archived
sudo install -d -m 0700 -o root -g root /etc/lakehouse/_archived

sudo mv /etc/lakehouse/subject_audit.key /etc/lakehouse/_archived/subject_audit.key.$TS
sudo mv /etc/lakehouse/legal_audit.token /etc/lakehouse/_archived/legal_audit.token.$TS

sudo mv /etc/lakehouse/subject_audit.key.new /etc/lakehouse/subject_audit.key
sudo mv /etc/lakehouse/legal_audit.token.new /etc/lakehouse/legal_audit.token

sudo ls -la /etc/lakehouse/subject_audit.key /etc/lakehouse/legal_audit.token
```

### 3.3. Restart the gateway

```bash
sudo systemctl restart lakehouse.service
sleep 2
sudo systemctl status lakehouse.service --no-pager | head -10
```

Wait for the gateway to bind port 3100 cleanly. If it doesn't, check
`journalctl -u lakehouse.service -n 50 --no-pager` for the failure
mode. The most common cause is the new file having the wrong mode or
owner.

---

## 4. Post-rotation verification (5 minutes)

### 4.1. Health probes

```bash
# Audit endpoint must be 200, not 503.
curl -sf http://localhost:3100/audit/health
# Expect: "audit endpoint ready"

# /v1/health must list the gateway's full provider set.
curl -sf http://localhost:3100/v1/health | jq '.providers, .worker_count'
```

### 4.2. Confirm the new token works

```bash
NEW_TOKEN=$(sudo cat /etc/lakehouse/legal_audit.token)
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-100
# Expect: 200
```

If 401, the file the gateway loaded does NOT match the file you wrote.
Check ownership and mode, and look for trailing whitespace with
`sudo hexdump -C /etc/lakehouse/legal_audit.token | head`.

### 4.3. Confirm the new chain works

Append-only chains are key-tied. Any *new* audit row written
post-rotation is signed under the new key and verifies cleanly:

```bash
# Issue a /v1/validate call against any worker; it spawns an audit row.
curl -sf -X POST http://localhost:3100/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"mode":"fill","candidate_id":"WORKER-100","worker_id":"WORKER-100","fields":["exists"]}' >/dev/null

# Read the chain back. Last row must verify under the new key.
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-100 \
  | jq '.audit_log | {verified: .chain_verified, rows: .chain_rows_total, last_kind: .rows[-1].accessor.kind}'
```

`chain_verified: true` confirms the new key is signing + verifying.

### 4.4. Confirm pre-rotation chains tamper-detect (expected)

```bash
curl -sf -H "X-Lakehouse-Legal-Token: $NEW_TOKEN" \
  http://localhost:3100/audit/subject/WORKER-2 \
  | jq '.audit_log | {verified: .chain_verified, error: .chain_verification_error}'
```

For any subject whose chain was written under the old key, this
returns `chain_verified: false` with an HMAC-mismatch error. **This is
correct behavior**, not a bug. The old chain was correctly signed under
the old key and verified under it; the new key cannot retroactively
verify rows it didn't sign. The pre-rotation snapshot you captured in
§2.2 is the defensible proof that those rows WERE valid pre-rotation.

If, instead, you see a chain that *should* verify post-rotation
returning `verified: false`, that's the rotation having gone wrong,
likely an old-key file that didn't get archived cleanly. Restore the
timestamped files from `/etc/lakehouse/_archived/`, then re-attempt.

---

## 5. Record the rotation event

Append a row to the rotation log:

```bash
# Fill in every <placeholder> before running; the old hashes come from §2.1.
# `ts` is an assumed name for the timestamp field; match whatever the
# existing rows in the log use.
sudo tee -a /etc/lakehouse/_archived/rotation_log.jsonl <<EOF
{"ts":"<UTC timestamp>","reason":"<trigger from §1>","old_key_sha256":"<from §2.1>","new_key_sha256":"$(sudo sha256sum /etc/lakehouse/subject_audit.key | awk '{print $1}')","old_token_sha256":"<from §2.1>","new_token_sha256":"$(sudo sha256sum /etc/lakehouse/legal_audit.token | awk '{print $1}')","witness":"<witness name>"}
EOF
sudo chmod 0600 /etc/lakehouse/_archived/rotation_log.jsonl
sudo chown root:root /etc/lakehouse/_archived/rotation_log.jsonl
```

This file is the operator-side record of when the key changed and why.
It does NOT contain the key itself, only hashes, so it is safe to back
up and share with counsel on request.

---

## 6. Recovery from a lost key

If the active `subject_audit.key` is destroyed (filesystem corruption,
accidental delete, /tmp wipe per the 2026-05-05 incident), the gateway
will fail-closed at startup:

- `/audit/subject/{id}` → 503 ("audit endpoint disabled (legal token
  missing)" or equivalent for the signing key)
- `/biometric/subject/{id}/photo` → 503 (same fail-closed posture)

This is correct behavior: a server that cannot HMAC-sign new audit
rows must not accept new biometric writes.

**Recovery is rotation.** Generate a new key per §3.1, atomic-swap per
§3.2, restart per §3.3, verify per §4. Pre-loss chains tamper-detect
under the new key (the old key is gone, so there is no way to verify
them). Treat the loss event as the BIPA-defensible boundary: pre-loss
chain verification was provided by the working key; post-loss new
chains are signed under the new key.

If a counsel-grade attestation of the pre-loss chains is needed, the
`/etc/lakehouse/_archived/` folder contains the historical hashes;
combined with the cross-runtime parity probe (Go reader gives the same
byte-identical view as Rust), the pre-loss chain history can be
preserved as long as the on-disk JSONL files were not also lost.

---

## 7. ⚖ counsel notes

These are areas where counsel may want to opine before this runbook is
formally adopted:

1. **Rotation cadence.** BIPA itself does not require periodic
   rotation; counsel may set a 90-day schedule to satisfy a separate
   compliance posture (SOC2, internal policy).
2. **Custody of `/etc/lakehouse/_archived/`.** The archived hashes do
   NOT contain the keys, but the archived raw key files DO. Counsel may
   want a more aggressive destruction schedule for the raw archived
   keys (say, 1 year) to reduce a long-tail compromise surface.
3. **Notification obligations on rotation due to compromise.** §1
   triggers a rotation; §1 does not address whether candidates whose
   biometric data was protected by the compromised key must be
   notified. This is a counsel call.

---

## 8. Operator acknowledgment

| Operator | Date acknowledged | Signature |
|---|---|---|
| J | _____ | _______________ |
| _____ | _____ | _______________ |

---

## 9. Change log

- 2026-05-05: Initial runbook authored after the /tmp wipe incident on
  the same day (key was at `/tmp/subject_audit.key` and was deleted on
  reboot, disabling `/audit` + `/biometric` until the key was
  regenerated at `/etc/lakehouse/subject_audit.key`). Recovery of that
  incident produced a working procedure; this runbook captures it as
  the canonical playbook for any future rotation.
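
The §3.1 sanity check (mode, owner, length) can also be scripted so that it runs the same way on every rotation. The following is a minimal sketch, assuming GNU `stat`; `check_credential` is a hypothetical helper, not something that ships with the gateway, and the whitespace check is an extra guard suggested by the trailing-whitespace failure mode in §4.2.

```bash
# Hypothetical helper: check one credential file before the swap.
# Usage: check_credential FILE [OWNER] [MIN_CHARS]
check_credential() {
  local file="$1" owner="${2:-root}" min_chars="${3:-32}"
  local mode user size stripped

  if [ ! -f "$file" ]; then
    echo "FAIL $file: missing" >&2
    return 1
  fi

  # GNU stat: %a is the octal mode, %U the owner name.
  mode=$(stat -c '%a' "$file")
  user=$(stat -c '%U' "$file")
  if [ "$mode" != "400" ]; then
    echo "FAIL $file: mode $mode, want 400" >&2
    return 1
  fi
  if [ "$user" != "$owner" ]; then
    echo "FAIL $file: owner $user, want $owner" >&2
    return 1
  fi

  # Byte count of the file, and the count with all whitespace removed.
  size=$(wc -c < "$file")
  stripped=$(tr -d '[:space:]' < "$file" | wc -c)
  if [ "$size" -lt "$min_chars" ]; then
    echo "FAIL $file: $size chars, want >= $min_chars" >&2
    return 1
  fi
  if [ "$stripped" -ne "$size" ]; then
    echo "FAIL $file: contains whitespace or a trailing newline" >&2
    return 1
  fi

  echo "OK $file: mode $mode, owner $user, $size chars"
}
```

Because the credential files are readable only by root, the function would need to run from a root shell, for example `check_credential /etc/lakehouse/subject_audit.key.new` before the swap in §3.2. The optional `OWNER` and `MIN_CHARS` arguments exist only so the check can be exercised against scratch files.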