lakehouse/scripts/staffing/subject_timeline.sh
root 87b034f5f9 phase 1.6: ops dashboard + consent_versions allowlist + subject timeline tool
Closes the afternoon's "all four" wave (per J's request to do all the
items in one pass instead of pick-one-of-options):

(1) Live demo on WORKER-100 — full lifecycle exercised end-to-end
    against the running gateway. 3 audit rows landed in correct
    order (consent_grant → biometric_collection →
    consent_withdrawal), chain_verified=true, photo on disk at
    data/biometric/uploads/WORKER-100/1778011967957907731_027b6bb1.jpg
    (180 bytes JFIF). retention_until=2026-06-04 (30d from
    withdrawal per consent template v1 §2).

(2) GET /biometric/stats — read-only aggregate over all subjects.
    Returns counts by biometric.status + subject.status, photo
    count, oldest_active_retention_until, and the last 20
    state-change events (consent_grant / collection / withdrawal /
    erasure — validator_lookup and other noise filtered out).
    Walks per-subject audit logs via the existing writer; cheap
    for 100 subjects, would want an event-stream index at 100k.
    Legal-tier auth (same posture as /audit). 4 unit tests.

(3) /biometric/dashboard mcp-server frontend. Auto-refreshes
    /biometric/stats every 15s, neo-brutalist tile layout for
    the per-status counts + retention horizon block + recent
    events table with kind badges + event-kind breakdown pills.
    sessionStorage-backed token; logout button clears state.
    DOM-built throughout (textContent + createElement) — never
    innerHTML on audit-row values, since trace_id et al. could
    in theory carry operator-supplied strings.

(4) consent_versions allowlist. BiometricEndpointState gains
    `allowed_consent_versions: Option<Arc<HashSet<String>>>`,
    loaded at startup from /etc/lakehouse/consent_versions.json
    (override via LH_CONSENT_VERSIONS_FILE). process_consent
    refuses unknown hashes with HTTP 400 consent_version_unknown
    when configured. Resolution semantics:
      - Missing file → permissive (v1 compat, warn-log)
      - Parse error → permissive (error-log; broken config
        silently going strict would be worse)
      - Empty array → strict, refuse all (deliberate freeze
        mode for "counsel hasn't signed v1 yet")
      - Populated → strict, lowercase-normalized comparison
    5 unit tests (known/unknown/case/empty/none-permissive).
    Example template at ops/consent_versions.example.json with
    a counsel-tier deployment note.

(5) scripts/staffing/subject_timeline.sh — operator one-shot
    pretty-print of any subject's full BIPA lifecycle. Curls
    /audit/subject/{id} with legal token; renders manifest
    summary + on-disk photo state + chronological audit chain
    with kind badges + chain verification status. Smoke-tested
    on WORKER-100 (3 rows verified).

(6) STATE_OF_PLAY.md refresh. New section "afternoon wave"
    captures all four commits (76cb5ac, 7f0f500, 68d226c, this
    one) + the live demo evidence + the v1 endpoint matrix +
    UI/CLI inventory + the production-cutover blocking set
    (counsel calendar only — eng substrate is done).

Verified live post-restart:
- /audit/health + /biometric/health both 200
- /biometric/stats returns 100 subjects, 2 withdrawn (WORKER-2 from
  earlier scrum + WORKER-100 from today's demo), 1 photo on record,
  6 recent state-change events
- /biometric/intake + /biometric/withdraw + /biometric/dashboard
  all 200 on mcp-server :3700
- subject_timeline.sh on WORKER-100: chain_verified=true,
  chain_root=a47563ff937d50de…
- 88/88 catalogd lib tests + 55/55 biometric_endpoint tests green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:27:52 -05:00

152 lines
6.5 KiB
Bash
Executable File

#!/usr/bin/env bash
# subject_timeline — pretty-print a subject's full BIPA lifecycle.
#
# Specification: docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §6
# + docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3.
#
# Why this exists: when an operator gets a question like "what
# happened to candidate X's biometric data" — counsel inquiry,
# subject access request, or just routine triage — they need a
# one-shot view of the full lineage. /audit/subject/{id} returns
# the raw JSON; this wraps it in a human-readable timeline.
#
# Output:
# - Manifest summary (status, biometric status, retention_until)
# - Audit chain (chronological, kind + result + ts + hmac prefix)
# - Chain verification status (HMAC chain integrity)
# - On-disk photo presence + size if applicable
#
# Usage:
# subject_timeline.sh <candidate_id>
#
# Environment:
# GATEWAY_URL — default http://localhost:3100
# LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token
# UPLOADS_ROOT — default data/biometric/uploads (relative to repo)
#
# Exit codes:
# 0 — timeline printed (chain may or may not verify; that's a fact, not a script error)
# 1 — chain verification failed (still prints, but flagged)
# 2 — script error (missing tools, network failure, bad token, subject not found)
set -uo pipefail
cd "$(dirname "$0")/../.."
if [ "$#" -lt 1 ]; then
echo "usage: subject_timeline.sh <candidate_id>" >&2
exit 2
fi
CANDIDATE_ID="$1"
GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
for cmd in curl jq; do
if ! command -v "$cmd" >/dev/null 2>&1; then
echo "[timeline] FAIL: required tool '$cmd' not found" >&2
exit 2
fi
done
if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
echo "[timeline] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
exit 2
fi
LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
[ -n "$LEGAL_TOKEN" ] || { echo "[timeline] FAIL: legal token file is empty" >&2; exit 2; }
# safe_id matches catalogd::biometric_endpoint::sanitize_for_path
SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.\-]/_/g')
RESP_FILE=$(mktemp)
trap 'rm -f "$RESP_FILE"' EXIT
HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
-H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
-H "Accept: application/json" \
"$GATEWAY_URL/audit/subject/$CANDIDATE_ID")
if [ "$HTTP_CODE" != "200" ]; then
echo "[timeline] FAIL: GET /audit/subject/$CANDIDATE_ID returned HTTP $HTTP_CODE" >&2
echo "[timeline] response:" >&2
cat "$RESP_FILE" >&2
echo >&2
exit 2
fi
# ── Header ──────────────────────────────────────────────────────────
printf '\n'
printf '═══ Subject Timeline — %s ═══\n' "$CANDIDATE_ID"
printf '\n'
# ── Manifest summary ───────────────────────────────────────────────
printf 'Manifest\n'
printf ' candidate_id : %s\n' "$(jq -r '.manifest.candidate_id' < "$RESP_FILE")"
printf ' subject status : %s\n' "$(jq -r '.manifest.status' < "$RESP_FILE")"
printf ' vertical : %s\n' "$(jq -r '.manifest.vertical' < "$RESP_FILE")"
printf ' general_pii : %s (until %s)\n' \
"$(jq -r '.manifest.consent.general_pii.status' < "$RESP_FILE")" \
"$(jq -r '.manifest.retention.general_pii_until' < "$RESP_FILE")"
printf ' biometric : %s\n' "$(jq -r '.manifest.consent.biometric.status' < "$RESP_FILE")"
RET=$(jq -r '.manifest.consent.biometric.retention_until // "—"' < "$RESP_FILE")
printf ' biometric retent. : %s\n' "$RET"
BC_PRESENT=$(jq -r '.manifest.biometric_collection != null' < "$RESP_FILE")
if [ "$BC_PRESENT" = "true" ]; then
printf ' photo data_path : %s\n' "$(jq -r '.manifest.biometric_collection.data_path' < "$RESP_FILE")"
printf ' photo template : %s\n' "$(jq -r '.manifest.biometric_collection.template_hash' < "$RESP_FILE")"
printf ' photo collected : %s\n' "$(jq -r '.manifest.biometric_collection.collected_at' < "$RESP_FILE")"
printf ' consent_ver_hash : %s\n' "$(jq -r '.manifest.biometric_collection.consent_version_hash' < "$RESP_FILE")"
fi
# ── On-disk photo state ────────────────────────────────────────────
printf '\nOn disk\n'
PHOTO_DIR="$UPLOADS_ROOT/$SAFE_ID"
if [ -d "$PHOTO_DIR" ]; then
COUNT=$(find "$PHOTO_DIR" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d '[:space:]')
printf ' uploads dir : %s (%s file(s))\n' "$PHOTO_DIR" "${COUNT:-0}"
if [ "${COUNT:-0}" != "0" ]; then
while IFS= read -r f; do
printf ' - %s (%s bytes)\n' "$f" "$(stat -c '%s' "$f" 2>/dev/null || echo '?')"
done < <(find "$PHOTO_DIR" -maxdepth 1 -type f 2>/dev/null)
fi
else
printf ' uploads dir : %s (absent)\n' "$PHOTO_DIR"
fi
# ── Audit chain ────────────────────────────────────────────────────
printf '\nAudit chain\n'
ROWS_TOTAL=$(jq -r '.audit_log.chain_rows_total' < "$RESP_FILE")
VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE")
ROOT=$(jq -r '.audit_log.chain_root // "—"' < "$RESP_FILE")
ERROR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE")
printf ' rows total : %s\n' "$ROWS_TOTAL"
printf ' verified : %s\n' "$VERIFIED"
printf ' chain root (last) : %s\n' "$ROOT"
if [ -n "$ERROR" ] && [ "$ERROR" != "null" ]; then
printf ' verification err : %s\n' "$ERROR"
fi
if [ "$ROWS_TOTAL" != "0" ]; then
printf '\n events (chronological):\n'
jq -r '
.audit_log.rows
| sort_by(.ts)
| .[]
| " \(.ts) | \(.accessor.kind | ascii_upcase) | result=\(.result) | hmac=\(.row_hmac[0:16])… | trace=\(.accessor.trace_id // "—")"
' < "$RESP_FILE"
fi
# ── Footer ─────────────────────────────────────────────────────────
printf '\n'
if [ "$VERIFIED" = "true" ]; then
printf 'Status: chain verified end-to-end.\n'
printf '\n'
exit 0
else
printf 'Status: CHAIN VERIFICATION FAILED. Investigate before quoting this timeline\n'
printf ' in any external response. Likely causes: post-rotation legacy chain\n'
printf ' (expected) or actual tampering (escalate to engineering + counsel).\n'
printf '\n'
exit 1
fi