diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md index ee4dd9f..97ad3f1 100644 --- a/STATE_OF_PLAY.md +++ b/STATE_OF_PLAY.md @@ -7,7 +7,66 @@ --- -## WHAT LANDED 2026-05-05 (doc reconciliation wave — Gate 3b decision + counsel packet ready) +## WHAT LANDED 2026-05-05 (afternoon wave — full BIPA lifecycle endpoints + UIs + ops dashboard) + +After the morning's doc-reconciliation wave (described below), the afternoon wave shipped the full **operational lifecycle** for biometric data. Phase 1.6 now has the complete end-to-end runtime: candidate consent flow → photo upload → withdrawal → retention sweep flag → erase. Five commits, all on `demo/post-pr11-polish-2026-04-28`. + +| Commit | What | Verified | +|---|---|---| +| `76cb5ac` | `POST /biometric/subject/{id}/consent` — Gate 2 backend. Flips status NeverCollected→Given, computes retention_until = given_at + 18mo (per retention schedule v1 §4), captures consent_version_hash + collection_method (closed set: electronic_signature/paper/click_acceptance) + operator + evidence path to audit row. State machine: 409 already_given, 409 post-withdrawal-requires-erase, 403 subject_inactive. | 12 unit tests; live POST returns 401/400/404 on guards | +| `7f0f500` | Candidate intake UI at `/biometric/intake?candidate_id=XXX` (mcp-server :3700, exposed via nginx at devop.live/lakehouse/biometric/intake). 4-screen flow: operator auth → consent template render + click-accept → photo capture (file or webcam getUserMedia) → confirmation showing audit hmac. SHA-256 of rendered consent block becomes consent_version_hash. sessionStorage-backed token (clears on tab close), neo-brutalist style matching onboard.html. | 22631-byte HTML, mcp-server route returns 200 | +| `68d226c` | Three things in one wave:
(a) `POST /biometric/subject/{id}/withdraw` — BIPA right of withdrawal. Flips Given→Withdrawn, accelerates retention_until from 18mo→30d (per consent template v1 §2 SLA), audit kind=biometric_consent_withdrawal. State machine 409s on NeverCollected/Pending (nothing_to_withdraw), Withdrawn (already_withdrawn), Expired (already_expired). 12 unit tests.
(b) Withdraw UI at `/biometric/withdraw` — 3-screen operator flow (token+name auth → reason+evidence form → confirmation showing 30-day clock + verify curl recipe).
(c) `lakehouse-retention-sweep.{service,timer}` systemd units in `ops/systemd/`. Daily 03:00 UTC, Persistent=true, install.sh updated to handle paired timer+oneshot service.
Plus operator_of_record bug fix in intake UI (was hardcoded `'intake_ui_operator'`). | 46/46 biometric_endpoint + 71/71 catalogd lib tests; manual sweep run: 100 subjects, 0 overdue, exit 0 | +| **(current HEAD post-this-wave)** | Stats endpoint `GET /biometric/stats` (legal-tier auth, returns subject counts by status + photo count + oldest active retention + last 20 state-change events with anonymizable trace_ids) + ops dashboard at `/biometric/dashboard` (single-page, polls /stats every 15s, table + status tiles, XSS-safe DOM construction not innerHTML). Plus consent_versions allowlist: `BiometricEndpointState.allowed_consent_versions: Option<Arc<HashSet<String>>>`, loaded from `/etc/lakehouse/consent_versions.json` (`LH_CONSENT_VERSIONS_FILE` override), missing-file = permissive (v1 compat), present + populated = strict mode (refuses unknown hashes with 400 consent_version_unknown). Plus `scripts/staffing/subject_timeline.sh` — pretty-prints any subject's full BIPA lifecycle from /audit/subject/{id} (manifest + on-disk photo + chronological audit chain + verification status). | 5 new allowlist unit tests + 4 stats tests; live demo on WORKER-100 ran end-to-end (consent → photo → withdraw, chain verified=true, chain_root=a47563ff…) | + +### Live demo evidence (WORKER-100, 2026-05-05 20:12 UTC) +The full lifecycle was exercised against the live gateway as a verification artifact. The audit chain on WORKER-100 now contains 3 rows: + +``` +20:12:33.054 BIOMETRIC_CONSENT_GRANT result=given hmac=9c6f4153341e97d2… trace=live-demo-2026-05-05 +20:12:47.957 BIOMETRIC_COLLECTION result=success hmac=856be6173c88277c… trace=live-demo-2026-05-05 +20:15:27.298 BIOMETRIC_CONSENT_WITHDRAWAL result=withdrawn hmac=a47563ff937d50de… trace=live-demo-2026-05-05 +``` + +`chain_verified=true`, chain_root = a47563ff937d50de43b09a0c903cff954233836c219a928ee8ca2aa6792272dd.
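The linkage property the demo verifies — each audit row's MAC covering the previous row's MAC, so the last MAC doubles as a chain root committing to the whole history — can be sketched stand-alone. This is an illustrative toy, not the production code: it substitutes std's `DefaultHasher` for the real keyed HMAC-SHA256 (an assumption) purely so the sketch has no dependencies, and all names (`row_mac`, `verify_chain`, `demo-signing-key`) are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the per-row keyed digest. ASSUMPTION: the real chain
// uses HMAC-SHA256 with the gateway's signing key; DefaultHasher is
// substituted here only so the sketch is dependency-free.
fn row_mac(key: &str, prev: &str, payload: &str) -> String {
    let mut h = DefaultHasher::new();
    (key, prev, payload).hash(&mut h);
    format!("{:016x}", h.finish())
}

// Each row's MAC covers the previous row's MAC, so the final MAC
// (the chain root) commits to the entire history.
fn verify_chain(key: &str, rows: &[(String, String)]) -> bool {
    let mut prev = String::new();
    for (payload, mac) in rows {
        if row_mac(key, &prev, payload) != *mac {
            return false;
        }
        prev = mac.clone();
    }
    true
}

fn main() {
    let key = "demo-signing-key";
    let mut rows = Vec::new();
    let mut prev = String::new();
    for payload in ["consent_grant", "collection", "consent_withdrawal"] {
        let mac = row_mac(key, &prev, payload);
        rows.push((payload.to_string(), mac.clone()));
        prev = mac;
    }
    assert!(verify_chain(key, &rows));
    // Editing any earlier row invalidates every MAC downstream of it,
    // which is why a matching chain root is strong demo evidence.
    rows[1].0 = "tampered".into();
    assert!(!verify_chain(key, &rows));
}
```

Under this model, re-verifying a subject only requires the signing key and the rows in order — which is what `subject_timeline.sh` surfaces as the verification status.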
Photo file at `data/biometric/uploads/WORKER-100/1778011967957907731_027b6bb1.jpg` (180 bytes — a minimal real JFIF JPEG), retention_until=2026-06-04 (= 30 days from withdrawal). Retention sweep will flag this subject on or after that date; operator runs `/biometric/subject/WORKER-100/erase` to destroy. + +To re-verify: `./scripts/staffing/subject_timeline.sh WORKER-100`. + +### Endpoint matrix (v1 BIPA lifecycle complete) + +| Event | Endpoint | Method | Auth | Status flip | retention_until | +|---|---|---|---|---|---| +| consent given | `/biometric/subject/{id}/consent` | POST | legal | NeverCollected/Pending → Given | now + 18mo | +| photo collected | `/biometric/subject/{id}/photo` | POST | legal + consent gate | (no change) | (no change) | +| consent withdrawn | `/biometric/subject/{id}/withdraw` | POST | legal | Given → Withdrawn | now + 30d | +| destruction | `/biometric/subject/{id}/erase` | POST | legal | (manifest cleared) | n/a | +| audit read | `/audit/subject/{id}` | GET | legal | (read-only) | (read-only) | +| ops aggregates | `/biometric/stats` | GET | legal | (read-only) | (read-only) | + +UIs: +- `/biometric/intake?candidate_id=X` — operator-driven consent + photo +- `/biometric/withdraw` — operator-driven withdrawal recording +- `/biometric/dashboard` — read-only ops aggregate, auto-refresh + +CLI tools: +- `scripts/staffing/verify_biometric_erasure.sh ` — post-erasure verification +- `scripts/staffing/biometric_destruction_report.sh --month YYYY-MM` — anonymized monthly report +- `scripts/staffing/subject_timeline.sh ` — full lifecycle pretty-print (NEW 2026-05-05) +- `scripts/staffing/bundle_counsel_packet.sh` — counsel review tarball +- `scripts/staffing/attest_pre_identityd_biometric_state.sh` — defense attestation generator + +### What's blocking production cutover NOW +**Counsel calendar.** Engineering substrate is done end-to-end: every state transition has a defensible endpoint, every endpoint has tests + live verification, every UI is 
reachable, retention sweep is scheduled, allowlist hardening is wired. The remaining work is signature/review: +1. Counsel review of consent template v1 (revised for Option C — classifications deferred) +2. Counsel review of retention schedule v1 (revised for Option C) +3. Counsel review of destruction runbook +4. Counsel + J signatures on §2 attestation +5. Once counsel signs the consent template, populate `/etc/lakehouse/consent_versions.json` with the signed hash to flip the gateway from permissive to strict mode + +Counsel-review packet at `reports/counsel/counsel_packet_2026-05-05.tar.gz` (regenerable via `bundle_counsel_packet.sh` to pick up the latest doc state). + +--- + +## WHAT LANDED 2026-05-05 (morning wave — doc reconciliation + Gate 3b decision + counsel packet) This was a **doc-only wave**, not code. Background: J asked for an audit of the BIPA/biometric documentation before production cutover. Audit found moderate fragmentation between docs and shipped code (post-`identityd` collapse, post-Gate-3a-ship, pre-Gate-3b-decision). Closed it in one pass. diff --git a/crates/catalogd/src/biometric_endpoint.rs b/crates/catalogd/src/biometric_endpoint.rs index 1768684..c5f13d0 100644 --- a/crates/catalogd/src/biometric_endpoint.rs +++ b/crates/catalogd/src/biometric_endpoint.rs @@ -69,6 +69,22 @@ pub struct BiometricEndpointState { /// Default: `/data/biometric/uploads`. Set per host via /// LH_BIOMETRIC_STORAGE_ROOT env var. pub storage_root: PathBuf, + /// Optional allowlist of accepted `consent_version_hash` values. + /// `None` = permissive (any non-empty hash accepted, the v1 + /// behavior). `Some(set)` = strict (refuse anything not in the + /// set). Loaded from a JSON config file at startup if one exists. + /// Counsel-tier production deployments should populate this so + /// operator typos / stale templates / drift can't silently land + /// invalid consent records. 
+ pub allowed_consent_versions: Option<Arc<std::collections::HashSet<String>>>, +} + +#[derive(serde::Deserialize)] +struct ConsentVersionsConfig { + /// Array of accepted SHA-256 hash strings. Each must be 64 hex + /// chars (lowercase). Comments / metadata can ride on the JSON + /// object itself; the parser only consumes "versions". + versions: Vec<String>, } impl BiometricEndpointState { @@ -116,7 +132,76 @@ impl BiometricEndpointState { let _ = std::fs::set_permissions(&storage_root, std::fs::Permissions::from_mode(0o700)); } } - Self { registry, writer, legal_token, storage_root } + Self { + registry, + writer, + legal_token, + storage_root, + allowed_consent_versions: None, + } + } + + /// Load a JSON allowlist of accepted consent_version_hash values. + /// File shape: + /// `{"versions": ["<sha256-hex>", "<sha256-hex>", ...]}` + /// + /// Resolution rules: + /// - File missing or unreadable → permissive (None). Logged as + /// a warning so the operator sees the deployment is in v1 + /// compatibility mode. + /// - File present but parse fails → permissive (None) + ERROR + /// log (broken config silently degrading to permissive is + /// wrong; flag loudly). + /// - File present + parses + has ≥1 entry → strict mode + /// (Some(set)). Each hash is normalized to lowercase before + /// insertion to prevent operator-case drift. + /// - File present + empty `versions` array → strict (refuse + /// all consent grants). This is "intentionally locked, + /// waiting for counsel to publish v1 hash" mode.
+ pub async fn with_consent_versions(mut self, path: &std::path::Path) -> Self { + match tokio::fs::read_to_string(path).await { + Ok(body) => { + match serde_json::from_str::<ConsentVersionsConfig>(&body) { + Ok(cfg) => { + let set: std::collections::HashSet<String> = cfg + .versions + .into_iter() + .map(|s| s.trim().to_ascii_lowercase()) + .filter(|s| !s.is_empty()) + .collect(); + tracing::info!( + "biometric endpoint: consent_versions allowlist loaded from {} ({} entries)", + path.display(), set.len() + ); + self.allowed_consent_versions = Some(Arc::new(set)); + } + Err(e) => { + tracing::error!( + "biometric endpoint: consent_versions file at {} failed to parse ({e}); \ + running in PERMISSIVE mode (any non-empty hash accepted). \ + Fix the file or remove it to silence this warning.", + path.display() + ); + } + } + } + Err(e) if e.kind() == std::io::ErrorKind::NotFound => { + tracing::warn!( + "biometric endpoint: consent_versions allowlist not found at {} — \ + running in PERMISSIVE mode (any non-empty hash accepted). \ + For counsel-tier deployment, create the file with the signed v1 hash.", + path.display() + ); + } + Err(e) => { + tracing::error!( + "biometric endpoint: consent_versions file at {} unreadable ({e}); \ + running in PERMISSIVE mode.", + path.display() + ); + } + } + self + } } @@ -126,6 +211,7 @@ pub fn router(state: BiometricEndpointState) -> Router { .route("/subject/{candidate_id}/consent", post(record_consent)) .route("/subject/{candidate_id}/withdraw", post(withdraw_consent)) .route("/subject/{candidate_id}/erase", post(erase_subject)) + .route("/stats", get(stats_handler)) .route("/health", get(biometric_health)) .layer(DefaultBodyLimit::max(MAX_PHOTO_BYTES)) .with_state(state) @@ -920,6 +1006,26 @@ pub async fn process_consent( consent_status: None, })); } + // Allowlist enforcement (counsel-tier deployment safety). When + // `allowed_consent_versions` is configured, refuse any hash not + // in the set. Compared lowercase to defeat operator-case drift.
+ // Permissive mode (None) keeps v1 compatibility for deployments + // that haven't published a signed allowlist yet. + if let Some(allow) = state.allowed_consent_versions.as_ref() { + let normalized = req.consent_version_hash.trim().to_ascii_lowercase(); + if !allow.contains(&normalized) { + return Err((StatusCode::BAD_REQUEST, ErrorResponse { + error: "consent_version_unknown", + detail: format!( + "consent_version_hash {} not in the configured allowlist; \ + this gateway accepts only hashes published in /etc/lakehouse/consent_versions.json. \ + Contact the deployment operator if a new template version needs to be authorized.", + normalized, + ), + consent_status: None, + })); + } + } if req.operator_of_record.trim().is_empty() { return Err((StatusCode::BAD_REQUEST, ErrorResponse { error: "bad_request", @@ -1342,6 +1448,193 @@ pub async fn process_withdraw( }) } +// ─── Stats endpoint (ops dashboard backend) ────────────────────── +// +// Read-only aggregate over all subjects. Returns counts by biometric +// + subject status, total photo count, oldest active retention_until, +// and the last 20 audit events across all subjects (consent_grant / +// biometric_collection / consent_withdrawal / erasure rows). Used by +// the /biometric/dashboard ops UI page. +// +// Auth: legal-tier (same posture as /audit/subject). The aggregate +// numbers leak nothing per-subject by themselves, but the "recent +// events" list contains candidate_ids — counsel-tier protection +// is appropriate. 
+const STATS_RESPONSE_SCHEMA: &str = "biometric_stats_response.v1"; +const STATS_RECENT_EVENTS_LIMIT: usize = 20; + +#[derive(Serialize, Debug, Default)] +pub struct StatsCounts { + pub never_collected: usize, + pub pending: usize, + pub given: usize, + pub withdrawn: usize, + pub expired: usize, +} + +#[derive(Serialize, Debug, Default)] +pub struct SubjectStatusCounts { + pub pending_consent: usize, + pub active: usize, + pub withdrawn: usize, + pub retention_expired: usize, + pub erased: usize, +} + +#[derive(Serialize, Debug)] +pub struct StatsEvent { + pub ts: chrono::DateTime<chrono::Utc>, + pub candidate_id: String, + pub kind: String, + pub result: String, + pub trace_id: String, +} + +#[derive(Serialize, Debug)] +pub struct StatsResponse { + pub schema: &'static str, + pub generated_at: chrono::DateTime<chrono::Utc>, + pub total_subjects: usize, + pub biometric_status: StatsCounts, + pub subject_status: SubjectStatusCounts, + pub photos_on_record: usize, + pub oldest_active_retention_until: Option<chrono::DateTime<chrono::Utc>>, + pub upcoming_destruction_window_days: Option<i64>, + pub recent_events: Vec<StatsEvent>, + pub recent_event_counts: std::collections::BTreeMap<String, usize>, +} + +async fn stats_handler( + State(state): State<BiometricEndpointState>, + headers: HeaderMap, +) -> impl IntoResponse { + let auth_token = headers + .get(LEGAL_TOKEN_HEADER) + .and_then(|v| v.to_str().ok()) + .map(|s| s.to_string()); + match process_stats(&state, auth_token.as_deref()).await { + Ok(resp) => (StatusCode::OK, Json(resp)).into_response(), + Err((status, err)) => (status, Json(err)).into_response(), + } +} + +pub async fn process_stats( + state: &BiometricEndpointState, + legal_token: Option<&str>, +) -> Result<StatsResponse, (StatusCode, ErrorResponse)> { + // Auth (same posture as upload/erase/consent/withdraw).
+ let configured = state.legal_token.as_ref().ok_or(( + StatusCode::SERVICE_UNAVAILABLE, + ErrorResponse { error: "auth_failed", detail: "no legal token configured".into(), consent_status: None }, + ))?; + let provided = legal_token.ok_or(( + StatusCode::UNAUTHORIZED, + ErrorResponse { error: "auth_failed", detail: "missing X-Lakehouse-Legal-Token".into(), consent_status: None }, + ))?; + if !constant_time_eq(provided.as_bytes(), configured.as_bytes()) { + return Err(( + StatusCode::UNAUTHORIZED, + ErrorResponse { error: "auth_failed", detail: "X-Lakehouse-Legal-Token mismatch".into(), consent_status: None }, + )); + } + + use shared::types::{BiometricConsentStatus, SubjectStatus}; + + let now = chrono::Utc::now(); + let subjects = state.registry.list_subjects().await; + + let mut bio = StatsCounts::default(); + let mut subj = SubjectStatusCounts::default(); + let mut photos_on_record = 0usize; + let mut oldest_active_retention: Option<chrono::DateTime<chrono::Utc>> = None; + + for s in &subjects { + match s.consent.biometric.status { + BiometricConsentStatus::NeverCollected => bio.never_collected += 1, + BiometricConsentStatus::Pending => bio.pending += 1, + BiometricConsentStatus::Given => bio.given += 1, + BiometricConsentStatus::Withdrawn => bio.withdrawn += 1, + BiometricConsentStatus::Expired => bio.expired += 1, + } + match s.status { + SubjectStatus::PendingConsent => subj.pending_consent += 1, + SubjectStatus::Active => subj.active += 1, + SubjectStatus::Withdrawn => subj.withdrawn += 1, + SubjectStatus::RetentionExpired => subj.retention_expired += 1, + SubjectStatus::Erased => subj.erased += 1, + } + if s.biometric_collection.is_some() { + photos_on_record += 1; + } + // Oldest-active = earliest retention_until among Given subjects + // (these are the subjects whose data is currently retained + // under canonical consent — Withdrawn subjects also have + // retention_until set but as a 30-day countdown to destruction, + // surfaced separately).
+ if matches!(s.consent.biometric.status, BiometricConsentStatus::Given) { + if let Some(t) = s.consent.biometric.retention_until { + oldest_active_retention = Some(match oldest_active_retention { + Some(prev) if prev < t => prev, + _ => t, + }); + } + } + } + + let upcoming_destruction_window_days = oldest_active_retention.map(|t| (t - now).num_days()); + + // Walk every subject's audit log; collect erasure / consent / + // upload / withdrawal rows. Sort by ts desc, keep first + // STATS_RECENT_EVENTS_LIMIT. For 100 subjects this is cheap; + // future-proofing against 100k would want a separate event-stream + // index, out of scope for v1. + let mut events: Vec<StatsEvent> = Vec::new(); + let mut event_counts: std::collections::BTreeMap<String, usize> = Default::default(); + for s in &subjects { + let rows = state + .writer + .read_rows_in_range(&s.candidate_id, None, None) + .await + .unwrap_or_default(); + for r in rows { + // Filter: only state-changing events (skip validator_lookup + // / gateway_lookup / etc. — they're noise for an ops dash).
+ match r.accessor.kind.as_str() { + "biometric_consent_grant" + | "biometric_collection" + | "biometric_consent_withdrawal" + | "biometric_erasure" + | "full_erasure" => {} + _ => continue, + } + *event_counts.entry(r.accessor.kind.clone()).or_insert(0) += 1; + events.push(StatsEvent { + ts: r.ts, + candidate_id: r.candidate_id.clone(), + kind: r.accessor.kind.clone(), + result: r.result.clone(), + trace_id: r.accessor.trace_id.clone(), + }); + } + } + events.sort_by(|a, b| b.ts.cmp(&a.ts)); + events.truncate(STATS_RECENT_EVENTS_LIMIT); + + Ok(StatsResponse { + schema: STATS_RESPONSE_SCHEMA, + generated_at: now, + total_subjects: subjects.len(), + biometric_status: bio, + subject_status: subj, + photos_on_record, + oldest_active_retention_until: oldest_active_retention, + upcoming_destruction_window_days, + recent_events: events, + recent_event_counts: event_counts, + }) +} + #[cfg(test)] mod tests { use super::*; @@ -1377,6 +1670,7 @@ mod tests { writer, legal_token: Some(Arc::new(TEST_TOKEN.into())), storage_root, + allowed_consent_versions: None, } } @@ -2249,4 +2543,188 @@ mod tests { // Withdraw row chains off the grant's hmac. assert_eq!(rows[1].prev_chain_hash, grant.audit_row_hmac); } + + // ─── Stats endpoint tests ────────────────────────────────────── + + #[tokio::test] + async fn stats_missing_token_rejected() { + let state = fixture_state("stats_no_token").await; + let err = process_stats(&state, None).await.unwrap_err(); + assert_eq!(err.0, StatusCode::UNAUTHORIZED); + } + + #[tokio::test] + async fn stats_aggregates_subjects_by_status() { + let state = fixture_state("stats_aggregate").await; + // Plant a mix of states. 
+ for (id, bs, ss) in [ + ("WORKER-S1", BiometricConsentStatus::NeverCollected, SubjectStatus::Active), + ("WORKER-S2", BiometricConsentStatus::NeverCollected, SubjectStatus::Active), + ("WORKER-S3", BiometricConsentStatus::Given, SubjectStatus::Active), + ("WORKER-S4", BiometricConsentStatus::Given, SubjectStatus::Active), + ("WORKER-S5", BiometricConsentStatus::Given, SubjectStatus::Active), + ("WORKER-S6", BiometricConsentStatus::Withdrawn, SubjectStatus::Active), + ("WORKER-S7", BiometricConsentStatus::Pending, SubjectStatus::PendingConsent), + ("WORKER-S8", BiometricConsentStatus::Expired, SubjectStatus::Erased), + ] { + let _ = state.registry.put_subject(fixture_manifest(id, bs, ss)).await; + } + let stats = process_stats(&state, Some(TEST_TOKEN)).await.unwrap(); + assert_eq!(stats.total_subjects, 8); + assert_eq!(stats.biometric_status.never_collected, 2); + assert_eq!(stats.biometric_status.given, 3); + assert_eq!(stats.biometric_status.withdrawn, 1); + assert_eq!(stats.biometric_status.pending, 1); + assert_eq!(stats.biometric_status.expired, 1); + assert_eq!(stats.subject_status.active, 6); + assert_eq!(stats.subject_status.pending_consent, 1); + assert_eq!(stats.subject_status.erased, 1); + } + + #[tokio::test] + async fn stats_recent_events_filters_state_changes_only() { + // The stats endpoint surfaces only state-changing event kinds + // (consent_grant / collection / withdrawal / erasure) — not + // gateway_lookup / validator_lookup / etc. which are noise + // for an ops dashboard. + let state = fixture_state("stats_event_filter").await; + // Subject must start at NeverCollected so process_consent can flip + // it to Given (otherwise the consent endpoint returns 409 already_given). + let _ = state.registry.put_subject(fixture_manifest("WORKER-EV", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + + // Plant a noisy validator_lookup row + a state-change row. 
+ use shared::types::AuditAccessor; + let noise_row = SubjectAuditRow { + schema: "subject_audit.v1".into(), + ts: chrono::Utc::now(), + candidate_id: "WORKER-EV".into(), + accessor: AuditAccessor { + kind: "validator_lookup".into(), + daemon: "gateway".into(), + purpose: "noise".into(), + trace_id: String::new(), + }, + fields_accessed: vec!["exists".into()], + result: "not_found".into(), + prev_chain_hash: String::new(), + row_hmac: String::new(), + }; + let _ = state.writer.append(noise_row).await; + // grant a consent (state change) so chain has at least one + // event the dashboard should surface. + let _ = process_consent(&state, "WORKER-EV", Some(TEST_TOKEN), "trace-ev", fixture_consent_request()) + .await.unwrap(); + + let stats = process_stats(&state, Some(TEST_TOKEN)).await.unwrap(); + // recent_events must NOT contain the validator_lookup row. + assert!(stats.recent_events.iter().all(|e| e.kind != "validator_lookup")); + // The consent grant SHOULD be present. + assert!(stats.recent_events.iter().any(|e| e.kind == "biometric_consent_grant" && e.candidate_id == "WORKER-EV")); + // event_counts breakdown must exclude the noise. + assert!(!stats.recent_event_counts.contains_key("validator_lookup")); + assert_eq!(stats.recent_event_counts.get("biometric_consent_grant"), Some(&1)); + } + + // ─── Consent version allowlist tests ────────────────────────── + + async fn fixture_state_with_allowlist(name: &str, hashes: Vec<String>) -> BiometricEndpointState { + let mut s = fixture_state(name).await; + let set: std::collections::HashSet<String> = hashes.into_iter().collect(); + s.allowed_consent_versions = Some(Arc::new(set)); + s + } + + #[tokio::test] + async fn consent_allowlist_accepts_known_hash() { + // Allowlist contains the fixture's consent_version_hash — + // grant succeeds.
+ let allowed = "abcdef0123456789".repeat(4); // matches fixture_consent_request + let state = fixture_state_with_allowlist("al_known", vec![allowed]).await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-AL1", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + let resp = process_consent(&state, "WORKER-AL1", Some(TEST_TOKEN), "trace-al", fixture_consent_request()) + .await.unwrap(); + assert_eq!(resp.status_after, "Given"); + } + + #[tokio::test] + async fn consent_allowlist_rejects_unknown_hash() { + // Allowlist contains a different hash; the fixture's hash + // is rejected with consent_version_unknown. + let other = "0011223344556677".repeat(4); + let state = fixture_state_with_allowlist("al_unknown", vec![other]).await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-AL2", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + let err = process_consent(&state, "WORKER-AL2", Some(TEST_TOKEN), "trace-al", fixture_consent_request()) + .await.unwrap_err(); + assert_eq!(err.0, StatusCode::BAD_REQUEST); + assert_eq!(err.1.error, "consent_version_unknown"); + } + + #[tokio::test] + async fn consent_allowlist_normalizes_case() { + // Operator might paste an uppercase hash; allowlist comparison + // is lowercase to defeat that drift. 
+ let allowed_lower = "abcdef0123456789".repeat(4); + let state = fixture_state_with_allowlist("al_case", vec![allowed_lower]).await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-AL3", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + let mut req = fixture_consent_request(); + req.consent_version_hash = req.consent_version_hash.to_uppercase(); + let resp = process_consent(&state, "WORKER-AL3", Some(TEST_TOKEN), "trace-al", req) + .await.unwrap(); + assert_eq!(resp.status_after, "Given"); + } + + #[tokio::test] + async fn consent_allowlist_empty_set_refuses_all() { + // Strict-locked mode: operator created the file but populated + // it with an empty array. Refuses every grant — used as a + // deliberate freeze (e.g., counsel hasn't signed v1 yet). + let state = fixture_state_with_allowlist("al_empty", vec![]).await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-AL4", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + let err = process_consent(&state, "WORKER-AL4", Some(TEST_TOKEN), "trace-al", fixture_consent_request()) + .await.unwrap_err(); + assert_eq!(err.0, StatusCode::BAD_REQUEST); + assert_eq!(err.1.error, "consent_version_unknown"); + } + + #[tokio::test] + async fn consent_allowlist_none_is_permissive() { + // The default fixture has allowed_consent_versions=None. + // Grant succeeds with any non-empty hash (v1 compat behavior). 
+ let state = fixture_state("consent_allowlist_none").await; + let _ = state.registry.put_subject(fixture_manifest("WORKER-AL5", BiometricConsentStatus::NeverCollected, SubjectStatus::Active)).await; + let resp = process_consent(&state, "WORKER-AL5", Some(TEST_TOKEN), "trace-al", fixture_consent_request()) + .await.unwrap(); + assert_eq!(resp.status_after, "Given"); + } + + #[tokio::test] + async fn stats_oldest_retention_picks_earliest_given() { + let state = fixture_state("stats_retention").await; + let now = chrono::Utc::now(); + // Three Given subjects with different retention dates. + let later = now + chrono::Duration::days(540); // 18 months + let middle = now + chrono::Duration::days(300); + let early = now + chrono::Duration::days(60); // earliest + for (id, ret) in [("WORKER-R1", later), ("WORKER-R2", middle), ("WORKER-R3", early)] { + let mut m = fixture_manifest(id, BiometricConsentStatus::Given, SubjectStatus::Active); + m.consent.biometric.retention_until = Some(ret); + let _ = state.registry.put_subject(m).await; + } + // Plus a Withdrawn subject with an even earlier retention date — + // should NOT count for "oldest active" because it's not Given. + let mut withdrawn = fixture_manifest("WORKER-R4", BiometricConsentStatus::Withdrawn, SubjectStatus::Active); + withdrawn.consent.biometric.retention_until = Some(now + chrono::Duration::days(15)); + let _ = state.registry.put_subject(withdrawn).await; + + let stats = process_stats(&state, Some(TEST_TOKEN)).await.unwrap(); + // The oldest_active should be the earliest Given retention (WORKER-R3, ~60d). + // (Equality on DateTime doesn't suffer the put_subject + // microsecond-skew issue here because retention_until is a + // pre-computed value we set ourselves; it round-trips intact.) + assert_eq!(stats.oldest_active_retention_until, Some(early)); + let days = stats.upcoming_destruction_window_days.unwrap(); + // Allow ±1-day fudge for the now() taken inside process_stats. 
+ assert!((59..=61).contains(&days), + "expected ~60 days, got {} (early={} now-ish={:?})", days, early, stats.generated_at); + } } diff --git a/crates/gateway/src/main.rs b/crates/gateway/src/main.rs index b1ee157..8685d9b 100644 --- a/crates/gateway/src/main.rs +++ b/crates/gateway/src/main.rs @@ -444,16 +444,24 @@ async fn main() { let biometric_storage_root: std::path::PathBuf = std::env::var("LH_BIOMETRIC_STORAGE_ROOT") .map(std::path::PathBuf::from) .unwrap_or_else(|_| std::path::PathBuf::from("./data/biometric/uploads")); + // consent_versions allowlist — counsel-tier deployment safety. + // Defaults to /etc/lakehouse/consent_versions.json; override + // via LH_CONSENT_VERSIONS_FILE. Missing file → permissive + // mode (any non-empty hash accepted, the v1 behavior). + let consent_versions_path: std::path::PathBuf = std::env::var("LH_CONSENT_VERSIONS_FILE") + .map(std::path::PathBuf::from) + .unwrap_or_else(|_| std::path::PathBuf::from("/etc/lakehouse/consent_versions.json")); let biometric_state = catalogd::biometric_endpoint::BiometricEndpointState::new( registry.clone(), writer, std::path::Path::new(&legal_token_path), biometric_storage_root.clone(), - ).await; + ).await + .with_consent_versions(&consent_versions_path).await; app = app.nest("/biometric", catalogd::biometric_endpoint::router(biometric_state)); tracing::info!( - "biometric endpoint mounted at /biometric (storage_root: {}, legal token: {})", - biometric_storage_root.display(), legal_token_path + "biometric endpoint mounted at /biometric (storage_root: {}, legal token: {}, consent_versions file: {})", + biometric_storage_root.display(), legal_token_path, consent_versions_path.display() ); } else { tracing::warn!("/audit + /biometric endpoints NOT mounted — subject_audit writer is None (no signing key)"); diff --git a/mcp-server/biometric_dashboard.html b/mcp-server/biometric_dashboard.html new file mode 100644 index 0000000..fd8860c --- /dev/null +++ b/mcp-server/biometric_dashboard.html @@ 
-0,0 +1,353 @@ + + + + + +Lakehouse — Biometric Ops Dashboard + + + + +
+

⚡ Biometric Ops Dashboard

+ + + + + +
+ +
+ + +
+

Operator authentication

+

Read-only ops dashboard over /biometric/stats. Aggregate counts + recent state-change events. Auth via legal-tier token (sessionStorage; clears on tab close).

+
+ + +
+
+
+
+ + +
+

Biometric consent state — by subject

+
+
Never collected
no biometric data on record
+
Pending
consent in flight
+
Given
active biometric data
+
Withdrawn
awaiting destruction (≤30d)
+
Expired
retention window passed
+
Photos on disk
quarantined uploads
+
+ +
+
+

Subject lifecycle status

+
+
+ Total subjects + Active + Pending consent + Withdrawn + Retention expired + Erased +
+
+
+ +
+

Retention horizon

+
+
+ Oldest active retention + Days until earliest expiry + Sweep scheduledaily 03:00 UTC + Sweep unitlakehouse-retention-sweep.timer +
+
+
+
+ +

Recent state-change events (last 20 across all subjects, newest first)

+
+ + + + + + + + + + + + + +
TimestampCandidateKindResultTrace ID
Loading…
+
+ +

Event-kind breakdown (across recent events shown above)

+
+
+
+ +
+
+
+
+
+
+
+
+
diff --git a/mcp-server/index.ts b/mcp-server/index.ts
index f315eab..c1d74bb 100644
--- a/mcp-server/index.ts
+++ b/mcp-server/index.ts
@@ -789,6 +789,15 @@ async function main() {
     });
   }
 
+  // Biometric ops dashboard — read-only aggregate over
+  // /biometric/stats. Counts by status + recent state-change
+  // events. Auto-refreshes every 15s. Operator-tier auth.
+  if (url.pathname === "/biometric/dashboard") {
+    return new Response(Bun.file(import.meta.dir + "/biometric_dashboard.html"), {
+      headers: { ...cors, "Content-Type": "text/html" },
+    });
+  }
+
   // Workspaces — per-contract state (Phase 8.5). UI layer over the
   // gateway's /workspaces/* routes: list, create, detail, handoff,
   // save-search, shortlist, log-activity. All persisted on the
diff --git a/ops/consent_versions.example.json b/ops/consent_versions.example.json
new file mode 100644
index 0000000..f2f3537
--- /dev/null
+++ b/ops/consent_versions.example.json
@@ -0,0 +1,7 @@
+{
+  "_comment": "consent_versions allowlist for the biometric consent endpoint. Loaded by the gateway at startup from the path in LH_CONSENT_VERSIONS_FILE (default: /etc/lakehouse/consent_versions.json). Hashes are SHA-256 of the rendered consent template text (lowercase hex). Generate from /biometric/intake — the JS computes the hash client-side and you can capture it from a smoke run, OR compute server-side from the markdown file. Counsel-tier deployment SHOULD have at least one entry; absent file = permissive (any non-empty hash accepted, v1 compat).",
+  "_doc": "docs/PHASE_1_6_BIPA_GATES.md §1 Gate 2",
+  "versions": [
+    "REPLACE_WITH_REAL_SHA256_OF_SIGNED_CONSENT_TEMPLATE_v1"
+  ]
+}
diff --git a/scripts/staffing/subject_timeline.sh b/scripts/staffing/subject_timeline.sh
new file mode 100755
index 0000000..ea02377
--- /dev/null
+++ b/scripts/staffing/subject_timeline.sh
@@ -0,0 +1,151 @@
+#!/usr/bin/env bash
+# subject_timeline — pretty-print a subject's full BIPA lifecycle.
+#
+# Specification: docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §6
+#   + docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md §3.
+#
+# Why this exists: when an operator gets a question like "what
+# happened to candidate X's biometric data" — counsel inquiry,
+# subject access request, or just routine triage — they need a
+# one-shot view of the full lineage. /audit/subject/{id} returns
+# the raw JSON; this wraps it in a human-readable timeline.
+#
+# Output:
+# - Manifest summary (status, biometric status, retention_until)
+# - Audit chain (chronological, kind + result + ts + hmac prefix)
+# - Chain verification status (HMAC chain integrity)
+# - On-disk photo presence + size if applicable
+#
+# Usage:
+#   subject_timeline.sh <candidate_id>
+#
+# Environment:
+#   GATEWAY_URL      — default http://localhost:3100
+#   LEGAL_TOKEN_FILE — default /etc/lakehouse/legal_audit.token
+#   UPLOADS_ROOT     — default data/biometric/uploads (relative to repo)
+#
+# Exit codes:
+#   0 — timeline printed (chain may or may not verify; that's a fact, not a script error)
+#   1 — chain verification failed (still prints, but flagged)
+#   2 — script error (missing tools, network failure, bad token, subject not found)
+
+set -uo pipefail
+cd "$(dirname "$0")/../.."
+
+if [ "$#" -lt 1 ]; then
+  echo "usage: subject_timeline.sh <candidate_id>" >&2
+  exit 2
+fi
+
+CANDIDATE_ID="$1"
+GATEWAY_URL="${GATEWAY_URL:-http://localhost:3100}"
+LEGAL_TOKEN_FILE="${LEGAL_TOKEN_FILE:-/etc/lakehouse/legal_audit.token}"
+UPLOADS_ROOT="${UPLOADS_ROOT:-data/biometric/uploads}"
+
+for cmd in curl jq; do
+  if ! command -v "$cmd" >/dev/null 2>&1; then
+    echo "[timeline] FAIL: required tool '$cmd' not found" >&2
+    exit 2
+  fi
+done
+
+if [ ! -r "$LEGAL_TOKEN_FILE" ]; then
+  echo "[timeline] FAIL: cannot read legal token at $LEGAL_TOKEN_FILE" >&2
+  exit 2
+fi
+LEGAL_TOKEN=$(tr -d '[:space:]' < "$LEGAL_TOKEN_FILE")
+[ -n "$LEGAL_TOKEN" ] || { echo "[timeline] FAIL: legal token file is empty" >&2; exit 2; }
+
+# safe_id matches catalogd::biometric_endpoint::sanitize_for_path
+SAFE_ID=$(printf '%s' "$CANDIDATE_ID" | sed 's/[^A-Za-z0-9_.\-]/_/g')
+
+RESP_FILE=$(mktemp)
+trap 'rm -f "$RESP_FILE"' EXIT
+HTTP_CODE=$(curl -sS -o "$RESP_FILE" -w '%{http_code}' \
+  -H "X-Lakehouse-Legal-Token: $LEGAL_TOKEN" \
+  -H "Accept: application/json" \
+  "$GATEWAY_URL/audit/subject/$CANDIDATE_ID")
+
+if [ "$HTTP_CODE" != "200" ]; then
+  echo "[timeline] FAIL: GET /audit/subject/$CANDIDATE_ID returned HTTP $HTTP_CODE" >&2
+  echo "[timeline] response:" >&2
+  cat "$RESP_FILE" >&2
+  echo >&2
+  exit 2
+fi
+
+# ── Header ──────────────────────────────────────────────────────────
+printf '\n'
+printf '═══ Subject Timeline — %s ═══\n' "$CANDIDATE_ID"
+printf '\n'
+
+# ── Manifest summary ───────────────────────────────────────────────
+printf 'Manifest\n'
+printf '  candidate_id      : %s\n' "$(jq -r '.manifest.candidate_id' < "$RESP_FILE")"
+printf '  subject status    : %s\n' "$(jq -r '.manifest.status' < "$RESP_FILE")"
+printf '  vertical          : %s\n' "$(jq -r '.manifest.vertical' < "$RESP_FILE")"
+printf '  general_pii       : %s (until %s)\n' \
+  "$(jq -r '.manifest.consent.general_pii.status' < "$RESP_FILE")" \
+  "$(jq -r '.manifest.retention.general_pii_until' < "$RESP_FILE")"
+printf '  biometric         : %s\n' "$(jq -r '.manifest.consent.biometric.status' < "$RESP_FILE")"
+RET=$(jq -r '.manifest.consent.biometric.retention_until // "—"' < "$RESP_FILE")
+printf '  biometric retent. : %s\n' "$RET"
+BC_PRESENT=$(jq -r '.manifest.biometric_collection != null' < "$RESP_FILE")
+if [ "$BC_PRESENT" = "true" ]; then
+  printf '  photo data_path   : %s\n' "$(jq -r '.manifest.biometric_collection.data_path' < "$RESP_FILE")"
+  printf '  photo template    : %s\n' "$(jq -r '.manifest.biometric_collection.template_hash' < "$RESP_FILE")"
+  printf '  photo collected   : %s\n' "$(jq -r '.manifest.biometric_collection.collected_at' < "$RESP_FILE")"
+  printf '  consent_ver_hash  : %s\n' "$(jq -r '.manifest.biometric_collection.consent_version_hash' < "$RESP_FILE")"
+fi
+
+# ── On-disk photo state ────────────────────────────────────────────
+printf '\nOn disk\n'
+PHOTO_DIR="$UPLOADS_ROOT/$SAFE_ID"
+if [ -d "$PHOTO_DIR" ]; then
+  COUNT=$(find "$PHOTO_DIR" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d '[:space:]')
+  printf '  uploads dir       : %s (%s file(s))\n' "$PHOTO_DIR" "${COUNT:-0}"
+  if [ "${COUNT:-0}" != "0" ]; then
+    while IFS= read -r f; do
+      printf '    - %s (%s bytes)\n' "$f" "$(stat -c '%s' "$f" 2>/dev/null || echo '?')"
+    done < <(find "$PHOTO_DIR" -maxdepth 1 -type f 2>/dev/null)
+  fi
+else
+  printf '  uploads dir       : %s (absent)\n' "$PHOTO_DIR"
+fi
+
+# ── Audit chain ────────────────────────────────────────────────────
+printf '\nAudit chain\n'
+ROWS_TOTAL=$(jq -r '.audit_log.chain_rows_total' < "$RESP_FILE")
+VERIFIED=$(jq -r '.audit_log.chain_verified' < "$RESP_FILE")
+ROOT=$(jq -r '.audit_log.chain_root // "—"' < "$RESP_FILE")
+ERROR=$(jq -r '.audit_log.chain_verification_error // ""' < "$RESP_FILE")
+printf '  rows total        : %s\n' "$ROWS_TOTAL"
+printf '  verified          : %s\n' "$VERIFIED"
+printf '  chain root (last) : %s\n' "$ROOT"
+if [ -n "$ERROR" ] && [ "$ERROR" != "null" ]; then
+  printf '  verification err  : %s\n' "$ERROR"
+fi
+
+if [ "$ROWS_TOTAL" != "0" ]; then
+  printf '\n  events (chronological):\n'
+  jq -r '
+    .audit_log.rows
+    | sort_by(.ts)
+    | .[]
+    | "    \(.ts) | \(.accessor.kind | ascii_upcase) | result=\(.result) | hmac=\(.row_hmac[0:16])… | trace=\(.accessor.trace_id // "—")"
+  ' < "$RESP_FILE"
+fi
+
+# ── Footer ─────────────────────────────────────────────────────────
+printf '\n'
+if [ "$VERIFIED" = "true" ]; then
+  printf 'Status: chain verified end-to-end.\n'
+  printf '\n'
+  exit 0
+else
+  printf 'Status: CHAIN VERIFICATION FAILED. Investigate before quoting this timeline\n'
+  printf '        in any external response. Likely causes: post-rotation legacy chain\n'
+  printf '        (expected) or actual tampering (escalate to engineering + counsel).\n'
+  printf '\n'
+  exit 1
+fi
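The `consent_versions.example.json` allowlist stores SHA-256 hashes of the rendered consent template text (lowercase hex). A minimal sketch of computing an entry server-side; the `sha256sum | awk` pipeline is a suggestion, and `'hello'` is a stand-in for the real signed template text, not the actual template:

```shell
# Sketch: derive a consent_versions.json entry from a template file.
# Assumption: the allowlist entry is the plain SHA-256 of the exact
# rendered template bytes, lowercase hex, matching what the intake
# page's JS computes client-side.
TEMPLATE=$(mktemp)
printf 'hello' > "$TEMPLATE"   # stand-in for the rendered template text
HASH=$(sha256sum "$TEMPLATE" | awk '{print tolower($1)}')
printf '%s\n' "$HASH"
rm -f "$TEMPLATE"
```

Any change in the template bytes (including a trailing newline) produces a different hash, so hash exactly what the intake page renders, or the allowlist check will reject real submissions.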
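`subject_timeline.sh` reports `chain_verified` as computed by the gateway. The row-HMAC chaining idea it relies on can be sketched as follows; the key, payloads, and concatenation layout here are illustrative guesses for exposition, not the actual catalogd wire format:

```shell
# Illustrative HMAC chain: each row's HMAC covers the previous row's
# HMAC plus the row payload, so tampering with any earlier row
# changes every subsequent HMAC (and the final chain root).
KEY='demo-secret'                                                   # made-up key
PREV='0000000000000000000000000000000000000000000000000000000000000000'  # genesis
for payload in 'consent_given' 'photo_upload' 'withdrawal'; do
  PREV=$(printf '%s%s' "$PREV" "$payload" \
    | openssl dgst -sha256 -hmac "$KEY" | awk '{print $NF}')
done
printf 'chain root: %s\n' "$PREV"
```

This is why the script prints only a 16-char HMAC prefix per row but the full chain root once: verifying the root against an independently stored copy is enough to vouch for every row under it.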