Four threads landing together — all driven by the audit J asked for before
production cutover.
(1) Gate 3b DECIDED: Option C (defer classifications). `BiometricCollection.classifications`
stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status
flipped from "draft / awaits product" to DECIDED. Consent template + retention
schedule revised to remove all "automated facial-classification" / "deepface"
language so disclosed scope matches implemented scope.
(2) Endpoint-path drift reconciled across 3 docs. `PHASE_1_6_BIPA_GATES.md`,
`BIPA_DESTRUCTION_RUNBOOK.md`, and `biometric_retention_schedule_v1.md` had
references to legacy `/v1/identity/subjects/*` paths (proposed under a separate
identityd daemon, never shipped) — corrected to actual shipped routes
`/biometric/subject/*` (catalogd-local). Schema block in PHASE_1_6_BIPA_GATES
rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate
(not the proposed Postgres `subjects` table).
(3) New operational artifacts:
- `scripts/staffing/verify_biometric_erasure.sh` — checks 4 things post-erasure
(manifest cleared, uploads dir empty, audit row matches, chain verified).
Smoke-tested live against WORKER-2.
- `scripts/staffing/biometric_destruction_report.sh` — monthly anonymized
destruction-event aggregation. Smoke-tested clean.
- `scripts/staffing/bundle_counsel_packet.sh` — tarballs the counsel-review
packet with per-file SHA-256 manifest.
- `docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md` — formal rotation procedure
operationalized after the 2026-05-05 /tmp wipe incident.
- `docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md` — cover note bundling
all eng-staged BIPA docs for counsel review with per-doc questions, sign-off
checklist, recommended review sequence.
(4) Double-upload file leak fixed in `crates/catalogd/src/biometric_endpoint.rs`.
`verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo
file. Investigation showed the file held 13 bytes of test-fixture data (zero PII,
no biometric content); audit timeline showed two consecutive uploads followed
by one erasure — the second upload had silently overwritten manifest.data_path,
orphaning the first file. Patched `process_upload` to refuse a second upload
with HTTP 409 + `error: "biometric_already_collected"` when
`biometric_collection.is_some()` on the manifest. Operator must explicitly
POST `/biometric/subject/{id}/erase` first.
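The guard can be sketched as follows. This is an illustrative reconstruction, not the actual catalogd code: the type shapes, field names, and the `(status, error_code)` return form are stand-ins for whatever `process_upload` really uses; only the 409 + `biometric_already_collected` behavior is taken from the change described above.

```rust
// Hypothetical sketch of the duplicate-upload guard in process_upload.
// Types and the return shape are illustrative, not the real catalogd API.
#[derive(Default)]
struct BiometricCollection {
    data_path: String, // path to the stored photo file
}

#[derive(Default)]
struct SubjectManifest {
    biometric_collection: Option<BiometricCollection>,
}

/// Refuse a second upload while a collection is still present: the operator
/// must POST /biometric/subject/{id}/erase before re-collecting. Returning
/// early here means the existing manifest pointer and the file on disk are
/// never touched, which is exactly what the new 409 test asserts.
fn guard_second_upload(manifest: &SubjectManifest) -> Result<(), (u16, &'static str)> {
    if manifest.biometric_collection.is_some() {
        return Err((409, "biometric_already_collected"));
    }
    Ok(())
}
```

The key design point is that the check runs before any file I/O, so a duplicate request can never orphan or overwrite the first upload.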
Tests: new `second_upload_without_erase_returns_409` (asserts 409 + manifest
pointer unchanged + first file untouched on disk). Replaced
`repeated_uploads_grow_the_chain` with `upload_erase_upload_grows_the_chain_cleanly`
(covers the legitimate re-collection cycle: chain grows to 3 rows). Updated
`content_type_with_parameters_accepted` to use 2 distinct subjects (it was
using 1 subject with 2 uploads to test content-type parsing — that pattern would now 409).
22/22 biometric_endpoint tests + 59/59 catalogd lib tests green post-patch.
Production posture: gateway needs `cargo build --release -p gateway` +
`systemctl restart lakehouse.service` to pick up the new 409 in live traffic.
Counsel calendar is now the only remaining blocker for first real-photo intake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
267 lines · 9.5 KiB · Bash · executable
#!/usr/bin/env bash
# verify_biometric_erasure — confirm that a biometric erasure completed cleanly.
#
# Specification: docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3.
# Spec: docs/PHASE_1_6_BIPA_GATES.md §1 Gate 5.
#
# Why this exists: when an operator runs the erasure curl call against
# /biometric/subject/{id}/erase, they need a defensible artifact proving
# destruction completed. This script produces that artifact by checking
# four things:
#
#   1. SubjectManifest.biometric_collection is null (catalogd cleared the row)
#   2. data/biometric/uploads/<safe_id>/ is empty or absent (photo file gone)
#   3. Most recent audit row has accessor.kind in {biometric_erasure, full_erasure}
#      AND result is "erased" or "success" (the chain logged the erasure intent)
#   4. audit_log.chain_verified is true (HMAC chain still intact end-to-end)
#
# All four must pass for an operator to mark the destruction complete.
#
# Usage:
#   verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]
#
# Environment:
#   GATEWAY_URL      — default http://localhost:3100
#   LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token
#   UPLOADS_ROOT     — default data/biometric/uploads (relative to repo root)
#   OUT_DIR          — default reports/biometric (where the verification report lands)
#
# Exit codes:
#   0 — all four checks pass; erasure verified
#   1 — one or more checks failed; do NOT mark destruction complete; escalate
#   2 — script error (missing tools, network failure, bad token)

set -uo pipefail
cd "$(dirname "$0")/../.."

if [ "$#" -lt 1 ]; then
  echo "usage: verify_biometric_erasure.sh <candidate_id> [--from ISO] [--to ISO]" >&2
  exit 2
fi

CANDIDATE_ID="$1"
shift
FROM=""
TO=""
while [ "$#" -gt 0 ]; do
  case "$1" in
    --from) FROM="$2"; shift 2 ;;
    --to)   TO="$2";   shift 2 ;;
    *) echo "unknown flag: $1" >&2; exit 2 ;;
  esac
done

GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
OUT_DIR="${OUT_DIR:-reports/biometric}"

# Dependency gates — fail fast with clear errors rather than producing
# a misleading "evidence" file from missing tools.
for cmd in curl jq sha256sum; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "[verify] FAIL: required tool '$cmd' not found in PATH" >&2
    exit 2
  fi
done

if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
  echo "[verify] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
  echo "[verify] This script requires legal-tier auth to query /audit/subject/." >&2
  exit 2
fi
LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
if [ -z "$LEGAL_TOKEN" ]; then
  echo "[verify] FAIL: legal token file is empty" >&2
  exit 2
fi

# safe_id matches catalogd::biometric_endpoint::sanitize_for_path:
# any non-[A-Za-z0-9_.-] char is replaced with underscore.
# (Hyphen goes last in the bracket so sed treats it literally; escaping it
# as `\-` would wrongly admit a literal backslash into the class.)
SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.-]/_/g')

mkdir -p "$OUT_DIR"
DATE=$(date -u +%Y-%m-%dT%H-%M-%SZ)
OUT="$OUT_DIR/erasure_verify_${SAFE_ID}_${DATE}.md"
EVIDENCE=$(mktemp)
trap 'rm -f "$EVIDENCE"' EXIT

PASS=0
FAIL=0
note() { echo "$1" >> "$EVIDENCE"; }
mark_pass() { PASS=$((PASS+1)); note " - PASS: $1"; }
mark_fail() { FAIL=$((FAIL+1)); note " - FAIL: $1"; }

note "## Verification target"
note ""
note "- **candidate_id:** \`$CANDIDATE_ID\`"
note "- **safe_id (filesystem):** \`$SAFE_ID\`"
note "- **gateway:** \`$GATEWAY_URL\`"
note "- **uploads root:** \`$UPLOADS_ROOT\`"
note "- **window:** ${FROM:-unbounded} → ${TO:-unbounded}"
note ""

# ── Fetch the audit response ────────────────────────────────────────
QUERY=""
if [ -n "$FROM" ]; then QUERY="from=$FROM"; fi
if [ -n "$TO" ]; then
  if [ -n "$QUERY" ]; then QUERY="${QUERY}&to=$TO"; else QUERY="to=$TO"; fi
fi
URL="$GATEWAY_URL/audit/subject/$CANDIDATE_ID"
if [ -n "$QUERY" ]; then URL="$URL?$QUERY"; fi

RESP_FILE=$(mktemp)
# Widen the cleanup trap now that the response tempfile exists, so early
# exits (including signals) cannot leave it behind.
trap 'rm -f "$EVIDENCE" "$RESP_FILE"' EXIT
HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
  -H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
  -H "Accept: application/json" \
  "$URL" 2>&1) || HTTP_CODE="000"

if [ "$HTTP_CODE" != "200" ]; then
  echo "[verify] FAIL: GET $URL returned HTTP $HTTP_CODE" >&2
  echo "[verify] response head:" >&2
  head -c 500 "$RESP_FILE" >&2
  echo >&2
  exit 2
fi

# Schema sanity — refuse to evaluate against an unrecognized response shape.
SCHEMA=$(jq -r '.schema // ""' < "$RESP_FILE")
if [ "$SCHEMA" != "subject_audit_response.v1" ]; then
  echo "[verify] FAIL: unexpected response schema '$SCHEMA' (want subject_audit_response.v1)" >&2
  exit 2
fi

# ── Check 1: manifest.biometric_collection is null ──────────────────
note "## Check 1 — Subject manifest biometric_collection is null"
note ""
BIO_COLL=$(jq -c '.manifest.biometric_collection // null' < "$RESP_FILE")
note "**manifest.biometric_collection:** \`$BIO_COLL\`"
note ""
if [ "$BIO_COLL" = "null" ]; then
  mark_pass "biometric_collection field is null on the subject manifest"
else
  mark_fail "biometric_collection is still populated — erasure incomplete"
fi
note ""

# ── Check 2: filesystem uploads dir is empty/absent ─────────────────
note "## Check 2 — Quarantined upload directory empty or absent"
note ""
UPLOAD_DIR="$UPLOADS_ROOT/$SAFE_ID"
note "**path:** \`$UPLOAD_DIR\`"
if [ ! -e "$UPLOAD_DIR" ]; then
  note "**state:** absent (directory was removed during erasure or never existed)"
  note ""
  mark_pass "upload directory is absent"
elif [ ! -d "$UPLOAD_DIR" ]; then
  note "**state:** path exists but is not a directory — investigate"
  note ""
  mark_fail "upload path exists and is not a directory: $UPLOAD_DIR"
else
  REMAINING=$(find "$UPLOAD_DIR" -maxdepth 1 -mindepth 1 2>/dev/null | wc -l | tr -d '[:space:]')
  : "${REMAINING:=0}"
  note "**state:** directory exists with $REMAINING remaining entries"
  note ""
  if [ "$REMAINING" = "0" ]; then
    mark_pass "upload directory is empty (no residual photo files)"
  else
    mark_fail "$REMAINING file(s) remain under $UPLOAD_DIR — must be unlinked"
    note "### Residual files"
    note ""
    note '```'
    find "$UPLOAD_DIR" -maxdepth 2 >> "$EVIDENCE"
    note '```'
    note ""
  fi
fi

# ── Check 3: most recent audit row reflects erasure ─────────────────
note "## Check 3 — Audit log records the erasure event"
note ""
ROW_COUNT=$(jq '.audit_log.rows | length' < "$RESP_FILE")
note "**rows in window:** $ROW_COUNT"
if [ "$ROW_COUNT" = "0" ]; then
  mark_fail "no audit rows in the requested window — erasure should have appended one"
  note ""
else
  LAST_KIND=$(jq -r '.audit_log.rows | last | .accessor.kind // ""' < "$RESP_FILE")
  LAST_RESULT=$(jq -r '.audit_log.rows | last | .result // ""' < "$RESP_FILE")
  LAST_TS=$(jq -r '.audit_log.rows | last | .ts // ""' < "$RESP_FILE")
  note "**last row:** ts=\`$LAST_TS\` accessor.kind=\`$LAST_KIND\` result=\`$LAST_RESULT\`"
  note ""
  case "$LAST_KIND" in
    biometric_erasure|full_erasure)
      case "$LAST_RESULT" in
        erased|success)
          mark_pass "last audit row is an erasure event ($LAST_KIND/$LAST_RESULT)"
          ;;
        *)
          mark_fail "last row kind is $LAST_KIND but result is '$LAST_RESULT' (expected erased/success)"
          ;;
      esac
      ;;
    *)
      mark_fail "last audit row accessor.kind is '$LAST_KIND' (expected biometric_erasure or full_erasure)"
      ;;
  esac
fi
note ""

# ── Check 4: HMAC chain verifies end-to-end ─────────────────────────
note "## Check 4 — HMAC chain integrity"
note ""
CHAIN_VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE")
CHAIN_ROOT=$(jq -r '.audit_log.chain_root // ""' < "$RESP_FILE")
CHAIN_ROWS=$(jq -r '.audit_log.chain_rows_total // 0' < "$RESP_FILE")
CHAIN_ERR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE")
note "**chain_verified:** \`$CHAIN_VERIFIED\`"
note "**chain_rows_total:** $CHAIN_ROWS"
note "**chain_root:** \`$CHAIN_ROOT\`"
if [ -n "$CHAIN_ERR" ]; then
  note "**chain_verification_error:** \`$CHAIN_ERR\`"
fi
note ""
if [ "$CHAIN_VERIFIED" = "true" ]; then
  mark_pass "chain verifies end-to-end ($CHAIN_ROWS rows)"
else
  mark_fail "chain integrity broken — destruction is NOT defensible until investigated"
fi
note ""

# ── Render report ───────────────────────────────────────────────────
TOTAL=$((PASS + FAIL))
note "## Summary"
note ""
note "**$PASS / $TOTAL** verification checks pass."
note ""
if [ "$FAIL" -gt 0 ]; then
  note "**Status: ERASURE NOT VERIFIED.** Do NOT mark destruction complete. Escalate to engineering before responding to candidate / counsel."
  note ""
fi

# Hash the response body so the report has a tamper-evident anchor.
RESP_HASH=$(sha256sum "$RESP_FILE" | awk '{print $1}')
EVIDENCE_HASH=$(sha256sum "$EVIDENCE" | awk '{print $1}')

{
  echo "# Biometric Erasure Verification — $CANDIDATE_ID"
  echo
  echo "**Date:** $DATE"
  echo "**Spec:** docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3 step 3"
  echo "**Generator:** scripts/staffing/verify_biometric_erasure.sh"
  echo
  cat "$EVIDENCE"
  echo "---"
  echo
  echo "**Audit response SHA-256:** \`$RESP_HASH\`"
  echo "**Evidence summary SHA-256:** \`$EVIDENCE_HASH\`"
  echo
} > "$OUT"

echo "[verify] $PASS / $TOTAL checks pass — report: $OUT"
echo "[verify] response hash: $RESP_HASH"
[ "$FAIL" -eq 0 ]