lakehouse/docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md
root b2c34b80b3 phase 1.6: lock Gate 3b = C, reconcile docs to shipped state, fix double-upload file leak
Four threads landing together — all driven by the audit J asked for before
production cutover.

(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
    stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
    flipped from "draft / awaits product" to DECIDED. Consent template + retention
    schedule revised to remove all "automated facial-classification" / "deepface"
    language so disclosed scope matches implemented scope.

(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
    `BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
    references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
    identityd daemon, never shipped) — corrected to actual shipped routes
    `/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
    rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
    (not the proposed Postgres `subjects` table).

(3) New operational artifacts:
    - `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
      (manifest cleared, uploads dir empty, audit row matches, chain verified).
      Smoke-tested live against WORKER-2.
    - `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
      destruction-event aggregation. Smoke-tested clean.
    - `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
      packet with per-file SHA-256 manifest.
    - `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
      operationalized after the 2026-05-05 /tmp wipe incident.
    - `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
      all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
      checklist, recommended review sequence.

(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
    `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
    file. Investigation showed the file was 13-byte test-fixture bytes (zero PII,
    no biometric content); audit timeline showed two consecutive uploads followed
    by one erasure — the second upload had silently overwritten manifest.data_path,
    orphaning the first file. Patched `process_upload` to refuse a second upload
    with HTTP 409 + `error: "biometric_already_collected"` when
    `biometric_collection.is_some()` on the manifest. Operator must explicitly
    POST `/biometric/subject/{id}/erase` first.

    Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
    pointer unchanged + first file untouched on disk). Replaced
    `repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
    (covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
    `content_type_with_parameters_accepted` to use 2 distinct subjects (was
    using 1 subject with 2 uploads to test ct parsing — would now 409).

    22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.

Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.

Counsel calendar is now the only remaining blocker for first real-photo intake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:19:40 -05:00


BIPA Biometric Data Destruction Runbook

Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5 (BIPA §15(a))

Audience: Operators (J + named operators with legal-tier credentials)

Status: Engineering scaffold — ⚖ COUNSEL must review for legal sufficiency before adoption

This runbook tells an operator HOW to destroy biometric data when a destruction trigger fires. It is a procedural document, not a design document. The cryptographic substrate that the destruction writes against (per-subject HMAC audit log + tombstone manifests) already ships in crates/catalogd/.


1. When this runbook fires

Destruction is mandatory when ANY of the following occurs:

| Trigger | Source signal | SLA |
| --- | --- | --- |
| Retention expiry | Daily retention_sweep flags consent.biometric.retention_until < now | 30 days from sweep flagging |
| Consent withdrawal | Candidate submits withdrawal per consent template §2 | 30 days from receipt |
| Right-to-be-forgotten request | Candidate submits RTBF request through documented contact channel | 30 days from receipt |
| Court-ordered erasure | Legal counsel directs erasure via a documented order | Per court order; default 30 days |
⚖ COUNSEL — confirm 30 days is correct for all four. Some deployments have stricter contractual or jurisdictional clocks (CCPA: 45 days but sooner is better; GDPR Art. 17: "without undue delay").
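To make the SLA clock concrete, the deadline for a given trigger can be computed from its receipt (or sweep-flagging) date. The sketch below is illustrative only: the function names `destruction_deadline` and `days_remaining` are hypothetical, GNU `date` is assumed, and the 30-day default simply mirrors the table above pending counsel's confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the shipped scripts.
# Assumes GNU date (the -d flag) and the 30-day default SLA from the table.

# destruction_deadline RECEIPT_DATE [SLA_DAYS]
#   RECEIPT_DATE: YYYY-MM-DD, the day the trigger was received or flagged.
#   SLA_DAYS:     days allowed for destruction; defaults to 30.
# Prints the deadline as YYYY-MM-DD (UTC).
destruction_deadline() {
  local receipt="$1"
  local sla_days="${2:-30}"
  date -u -d "${receipt} +${sla_days} days" +%Y-%m-%d
}

# days_remaining DEADLINE [TODAY]
#   Prints whole days left until DEADLINE; negative means the SLA is breached.
days_remaining() {
  local deadline="$1"
  local today="${2:-$(date -u +%Y-%m-%d)}"
  local d t
  d=$(date -u -d "$deadline" +%s)
  t=$(date -u -d "$today" +%s)
  echo $(( (d - t) / 86400 ))
}
```

For example, `destruction_deadline 2026-05-05` prints `2026-06-04`, and a court order with a shorter clock can pass its own day count as the second argument.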


2. Pre-destruction checks (5 minutes)

Before initiating destruction, the operator MUST:

  1. Verify the trigger. Cross-reference one of the four sources above. If the trigger is a candidate-initiated request, confirm identity per the standard PII verification procedure (knowledge factor + possession factor; see counsel for the threshold).

  2. Pull the current subject record. Hit GET /audit/subject/{candidate_id} with the legal-tier token. The response includes:

    • The current SubjectManifest (including consent.biometric.status)
    • The full HMAC-chained audit log
    • chain_verified: true (if false, STOP — chain integrity issue must be investigated before destruction)

  3. Check for legal hold. ⚖ COUNSEL — if a legal hold can apply to a subject's data (litigation, regulatory inquiry, subpoena), document the procedure for checking that no hold is in force before erasing.

  4. Get the second-operator sign-off. For BIPA defensibility, destruction is a two-operator action (operator-of-record + one witness). The witness records their attestation in the destruction-event audit row (§3, Step 2 below).
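The chain-integrity gate in check 2 can be scripted so that an operator cannot proceed by accident. This is a minimal sketch under stated assumptions: `chain_is_verified` is a hypothetical helper, `jq` is assumed to be installed, `chain_verified` is assumed to be a top-level boolean in the response, and the audit endpoint is assumed to share the host and port used by the erase call in §3.

```shell
#!/usr/bin/env bash
# Hypothetical pre-destruction gate. Requires jq.
# Assumes the response JSON carries a top-level "chain_verified" boolean.

# chain_is_verified: reads the audit response on stdin.
#   Succeeds only when chain_verified is exactly true.
#   A false value, a missing field, or malformed JSON all fail the gate.
chain_is_verified() {
  jq -e '.chain_verified == true' > /dev/null
}

# Possible usage (host, port, and token path copied from the erase call;
# whether the audit route shares them is an assumption):
#
#   curl -sf "http://localhost:3100/audit/subject/${CANDIDATE_ID}" \
#     -H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
#     | chain_is_verified || { echo "STOP: chain not verified" >&2; exit 1; }
```

Failing closed on a missing field is deliberate here: an unexpected response shape is treated the same as a broken chain.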


3. Destruction procedure

Step 1 — Erase via catalogd

Invoke the legal-tier erasure endpoint:

curl -sf -X POST "http://localhost:3100/biometric/subject/${CANDIDATE_ID}/erase" \
  -H "Authorization: Bearer $(cat /etc/lakehouse/legal_audit.token)" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "biometric_only|full",
    "trigger": "retention_expiry|consent_withdrawal|rtbf|court_order",
    "trigger_evidence_path": "<path to signed artifact>",
    "operator_of_record": "<operator name>",
    "witness": "<witness name>"
  }'

The endpoint is shipped (commit 848a458, 21 unit tests). catalogd serves it locally at /biometric/subject/{id}/erase. The original v1 spec proposed /v1/identity/subjects/{id}/erase under a separate identityd daemon; that daemon was collapsed into catalogd per the architecture pivot.

The endpoint exposes two scopes:

  • scope: "biometric_only" — clears BiometricCollection from the SubjectManifest (drops data_path, template_hash, and classifications together) and securely unlinks the quarantined photo file. The subject manifest itself remains. Use for retention expiry / consent withdrawal where only biometric data must go.
  • scope: "full" — full subject erasure (manifest + biometric files). Use for court-ordered erasure or full RTBF requests.

In both scopes, the audit row is appended BEFORE the photo is unlinked, so the chain carries legal proof of intent even if the file delete fails. If the audit append itself fails, the erasure is rolled back and nothing is deleted.
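The audit-before-unlink ordering can be illustrated with a small shell sketch. It is not the catalogd implementation (which lives in Rust and is not reproduced in this runbook): `erase_with_audit` is a hypothetical name, the audit log is modeled as a plain append-only file, and plain `rm` stands in for the secure unlink.

```shell
#!/usr/bin/env bash
# Illustration of the ordering only; not the shipped endpoint logic.

# erase_with_audit AUDIT_LOG PHOTO_PATH ROW_JSON
#   Returns 0 when the row is logged and the photo removed,
#           1 when the audit append failed (photo left untouched),
#           2 when the row was logged but the delete failed.
erase_with_audit() {
  local audit_log="$1" photo="$2" row="$3"

  # Step 1: record intent first. If this fails, abort before touching the file.
  if ! printf '%s\n' "$row" >> "$audit_log"; then
    echo "audit append failed; photo left in place" >&2
    return 1
  fi

  # Step 2: only now remove the file. A failure here still leaves the
  # audit row behind as evidence that destruction was attempted.
  if ! rm -f -- "$photo"; then
    echo "audit row written but delete failed; escalate" >&2
    return 2
  fi
}
```

The distinct return codes let a caller tell "nothing happened" (1) apart from "logged but file still present" (2), which are different escalation cases.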

Step 2 — Append the destruction-event audit row

The erasure endpoint AUTOMATICALLY writes one row to the subject's per-subject audit log:

{
  "schema": "subject_audit.v1",
  "ts": "<ISO-8601>",
  "candidate_id": "<id>",
  "accessor": {
    "kind": "biometric_erasure",
    "daemon": "identityd",
    "purpose": "biometric_erasure",
    "trace_id": "<X-Lakehouse-Trace-Id>"
  },
  "fields_accessed": ["biometric_classifications", "biometric_data_path", "biometric_template_hash"],
  "result": "erased",
  "prev_chain_hash": "<previous row hmac>",
  "row_hmac": "<new chain link>"
}

The HMAC chain extends through the erasure event, so the audit log itself is preserved as anonymous-event proof of compliant destruction even after the underlying biometric data is gone.
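To show why an appended erasure row makes earlier rows tamper-evident, here is a toy HMAC chain. Everything about the encoding is an assumption made for illustration: the real row canonicalization, key handling, and genesis value are defined in crates/catalogd/ and are not shown in this runbook. The helper names, the `prev|payload` input layout, and the one-row-per-line file format are hypothetical. The `openssl` CLI is assumed to be available.

```shell
#!/usr/bin/env bash
# Toy HMAC-SHA256 chain. Not the catalogd wire format.
# Note: passing a key via -hmac exposes it in the process list; acceptable
# for a demo key only, never for the real signing key.

# row_hmac KEY PREV_HASH PAYLOAD: prints hex HMAC over "PREV_HASH|PAYLOAD".
row_hmac() {
  local key="$1" prev="$2" payload="$3"
  printf '%s|%s' "$prev" "$payload" \
    | openssl dgst -sha256 -hmac "$key" \
    | awk '{print $NF}'
}

# append_row KEY FILE PAYLOAD: links a new row to the last row in FILE.
# Each line in FILE is "<row_hmac> <payload>", oldest first.
append_row() {
  local key="$1" file="$2" payload="$3" prev=""
  if [ -s "$file" ]; then
    prev=$(tail -n 1 "$file" | cut -d' ' -f1)
  fi
  printf '%s %s\n' "$(row_hmac "$key" "$prev" "$payload")" "$payload" >> "$file"
}

# verify_chain KEY FILE: recomputes every link; fails on the first mismatch.
verify_chain() {
  local key="$1" file="$2" prev="" mac payload
  while IFS=' ' read -r mac payload; do
    [ "$(row_hmac "$key" "$prev" "$payload")" = "$mac" ] || return 1
    prev="$mac"
  done < "$file"
}
```

Because each row's HMAC covers the previous row's HMAC, editing or deleting any earlier row invalidates every later link, including the erasure row, unless the forger also holds the key.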

Step 3 — Verify destruction

Run the verification script:

./scripts/staffing/verify_biometric_erasure.sh "${CANDIDATE_ID}"

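The script is shipped and was smoke-tested live against WORKER-2. Acceptance:

  • The manifest's biometric_collection is cleared (data_path, template_hash, and classifications are gone)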
  • data/biometric/uploads/${CANDIDATE_ID}/ directory is empty
  • Most recent audit log row has result: "erased", accessor.kind: "biometric_erasure"
  • Chain still verifies (chain_verified: true) under the legal-tier endpoint

If any check fails: STOP, do not mark the destruction complete, escalate to engineering.
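For operators who want to spot-check the filesystem condition by hand, the uploads-directory check can be approximated as below. This is a sketch, not an excerpt from verify_biometric_erasure.sh: the function names are hypothetical, and treating a missing directory as "empty" is an assumption that should be confirmed against the script's actual behavior. The other three checks depend on the audit endpoint's response and are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical manual spot-check for the uploads directory only.

# uploads_dir_is_empty DIR
#   Succeeds when DIR has no entries (hidden files count as entries).
#   ASSUMPTION: a missing DIR is treated as empty, since nothing is stranded.
#   Fails when DIR exists but is not a directory.
uploads_dir_is_empty() {
  local dir="$1"
  if [ ! -e "$dir" ]; then
    return 0
  fi
  [ -d "$dir" ] || return 1
  # -quit stops at the first entry found, so large directories stay cheap.
  [ -z "$(find "$dir" -mindepth 1 -print -quit)" ]
}

# check_uploads_cleared CANDIDATE_ID [UPLOADS_ROOT]
#   UPLOADS_ROOT defaults to the path named in the acceptance list.
check_uploads_cleared() {
  local candidate_id="$1"
  local root="${2:-data/biometric/uploads}"
  uploads_dir_is_empty "${root}/${candidate_id}"
}
```

A non-zero exit from `check_uploads_cleared` means a file is still present, which is the stranded-file condition and should be escalated rather than cleaned up by hand.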

Step 4 — Notify the candidate (when applicable)

For consent-withdrawal and RTBF triggers, the operator notifies the candidate that destruction is complete. ⚖ COUNSEL — supply the notification template (typically email; medium and language are counsel-determined).


4. Backup window disclosure

Per IDENTITY_SERVICE_DESIGN.md v3-B12, biometric data may persist in encrypted system backups for up to 30 days after destruction (rolling backup window). The candidate must be informed of this when destruction is requested, and the destruction-event audit row records the backup-window expiry date so the operator knows when the residual is fully eliminated.

⚖ COUNSEL — confirm whether the 30-day backup window is acceptable under BIPA. Some interpretations require backups to be addressed within a shorter window; some accept the operational reality of backup retention.


5. Reporting cadence

Monthly, the operator-of-record produces a destruction-events report:

./scripts/staffing/biometric_destruction_report.sh \
  --month "$(date +%Y-%m)" \
  --output "reports/biometric/destruction_$(date +%Y_%m).md"

The script is shipped and smoke-tested. The report aggregates:

  • Total destruction events in the month
  • Breakdown by trigger (retention / withdrawal / RTBF / court)
  • Median time-to-destruction from trigger to completion
  • Any failures / escalations

The monthly report is available to outside counsel on request. It does NOT include candidate-identifying details — only the counts, timings, and cryptographic attestations of the events.
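The aggregation itself is simple enough to sanity-check by hand. The sketch below assumes a hypothetical intermediate file with one destruction event per line, as two tab-separated columns: the trigger name and the days from trigger to completion. That file format and the function names are illustrative and are not taken from biometric_destruction_report.sh. No candidate identifier appears in the input, in keeping with the anonymization requirement.

```shell
#!/usr/bin/env bash
# Hypothetical aggregation helpers over "<trigger>\t<days>" lines.

# count_by_trigger FILE: prints "<trigger>\t<count>", sorted by trigger name.
count_by_trigger() {
  cut -f1 "$1" | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# median_days FILE: prints the median of column 2.
#   With an even number of events, averages the two middle values.
#   Fails (non-zero exit) when FILE has no events.
median_days() {
  cut -f2 "$1" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

For four events taking 4, 6, 10, and 20 days, `median_days` prints `8`. The median is less sensitive than the mean to a single slow court-ordered case.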


6. Audit trail attestation

The per-subject HMAC chain is the cryptographic substrate that makes destructions defensible after the fact. To produce an attestation for a specific candidate's destruction:

  1. Hit GET /audit/subject/{candidate_id} with legal-tier token
  2. Confirm chain_verified: true and most-recent row has accessor.kind: "biometric_erasure"
  3. Cross-runtime verify: confirm the audit log is byte-identical under the Rust and Go runtimes (per scripts/cutover/parity/subject_audit_parity.sh)
  4. Counsel signs an attestation referencing the audit log's chain root hash

The chain root hash is itself a tamper-evident anchor. A motivated insider would need the HMAC signing key (held in a separate location from the audit logs themselves, per the spec) AND the original log to forge a clean destruction record — and the cross-runtime parity probe would catch a forgery that touched only one runtime's view.


7. Operator acknowledgment

Operators with legal-tier credentials acknowledge they have read, understood, and will follow this runbook before being granted access to the legal_audit token.

| Operator | Date acknowledged | Signature |
| --- | --- | --- |
| J | _____ | _______________ |
| _____ | _____ | _______________ |

⚖ COUNSEL — adopt this acknowledgment as the substrate for §3 of Phase 1.6 (employee training acknowledgment), or specify a separate training program.


8. Change log

  • 2026-05-05 — Endpoint path reconciled with shipped state: /v1/identity/subjects/{id}/erase (legacy proposal under a separate identityd daemon) → /biometric/subject/{id}/erase (catalogd-local, shipped 848a458). Step 1 manual-fallback block removed (the endpoint is no longer "TODO"). Two-scope body shape (biometric_only / full) documented to match the implementation.
  • 2026-05-03 — Initial scaffold. ⚖ COUNSEL review required before adoption.