lakehouse


root 15cfd76c04 catalogd + gateway: Step 6 — /audit/subject/{id} legal-tier HTTP endpoint

Implementation of docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §5 Step 6
+ §4 (response shape) + §6 (auth model). The audit surface for defending
against EEOC discovery is live: legal counsel calls one URL with one
token and gets back an HMAC-chain-verified audit response naming every
PII access for a subject in a time window.

New module: crates/catalogd/src/audit_endpoint.rs (~340 LOC)
  - AuditEndpointState { registry, writer, legal_token }
  - router() exposes:
      GET /subject/{candidate_id}?from=ISO&to=ISO  (full audit response)
      GET /health                                    (liveness + token check)
  - require_legal_auth() — constant-time-eq compare against the
    X-Lakehouse-Legal-Token header. Avoids timing leaks on the token
    check without pulling in `subtle` for one comparison.
  - Token loaded from /etc/lakehouse/legal_audit.token (env-overridable
    via LH_LEGAL_AUDIT_TOKEN_FILE). Empty file or <16 chars = endpoint
    serves 503 with a clear reason. Token value NEVER logged.
  - Response schema: subject_audit_response.v1 with manifest +
    audit_log (rows + chain verification) + datasets_referenced +
    safe_views_available + completeness_attestation.
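
A minimal sketch of the auth check described above, using only the standard
library. The names AuthOutcome and check_legal_auth are hypothetical, as is
the 503 reason string; the 401 messages are the ones quoted in the live
verification steps. The actual require_legal_auth() in audit_endpoint.rs may
be structured differently. Note that a hand-rolled XOR-accumulate loop avoids
the early exit of `==`, but the compiler gives no formal constant-time
guarantee, and the length check still returns early.

```rust
/// Outcome of the legal-token check (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum AuthOutcome {
    Allowed,
    Rejected { status: u16, reason: &'static str },
}

/// Compare two byte slices without stopping at the first differing byte.
/// A length mismatch returns early, so the token length is not hidden.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of branching per byte.
        diff |= x ^ y;
    }
    diff == 0
}

/// `configured` is the token loaded at startup (None = endpoint disabled);
/// `header` is the X-Lakehouse-Legal-Token value from the request, if any.
pub fn check_legal_auth(configured: Option<&str>, header: Option<&str>) -> AuthOutcome {
    let expected = match configured {
        Some(t) => t,
        None => {
            return AuthOutcome::Rejected {
                status: 503,
                // Assumed wording; the source only says "a clear reason".
                reason: "legal audit token not configured",
            }
        }
    };
    let presented = match header {
        Some(h) => h,
        None => {
            return AuthOutcome::Rejected {
                status: 401,
                reason: "missing X-Lakehouse-Legal-Token",
            }
        }
    };
    if constant_time_eq(expected.as_bytes(), presented.as_bytes()) {
        AuthOutcome::Allowed
    } else {
        AuthOutcome::Rejected {
            status: 401,
            reason: "X-Lakehouse-Legal-Token mismatch",
        }
    }
}
```

Keeping the decision as a pure function over two Option<&str> values makes
the 503/401/pass cases testable without an HTTP stack.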

New helper and behavior change on SubjectAuditWriter:
  - read_rows_in_range(candidate_id, from, to) — returns rows in window,
    used by the endpoint to assemble the response without re-reading
    the entire chain.
  - verify_chain() now returns Ok(0) when the audit log file doesn't
    exist (empty = trivially valid). Prevents legitimate "no PII access
    yet for this subject" from showing as integrity=BROKEN in the
    audit response. Caller can detect "log was deleted" via comparison
    to SubjectManifest.audit_log_chain_root (when that mirror lands).
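
The two behaviors above can be sketched with the standard library alone. This
is an illustration under assumptions, not the SubjectAuditWriter code: AuditRow
is a hypothetical reduced row, timestamps are assumed to be Unix seconds with
an inclusive window, and count_rows_or_zero only shows the missing-file path of
verify_chain() (the per-row HMAC and prev_chain_hash checks are elided).

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical reduced audit row; the real rows carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct AuditRow {
    pub ts_unix: i64,
    pub purpose: String,
}

/// Keep only rows whose timestamp falls inside [from, to].
/// Inclusive bounds are an assumption of this sketch.
pub fn rows_in_range(rows: &[AuditRow], from: i64, to: i64) -> Vec<AuditRow> {
    rows.iter()
        .filter(|r| r.ts_unix >= from && r.ts_unix <= to)
        .cloned()
        .collect()
}

/// Count non-empty JSONL rows, treating a missing file as an empty log.
/// Any other I/O error is still propagated to the caller.
pub fn count_rows_or_zero(path: &Path) -> io::Result<usize> {
    match fs::read_to_string(path) {
        Ok(body) => Ok(body.lines().filter(|l| !l.trim().is_empty()).count()),
        // No file means no PII access recorded yet: trivially valid.
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}
```

Matching on ErrorKind::NotFound specifically keeps permission errors and
other failures visible instead of silently reporting an empty chain.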

main.rs:
  - Audit endpoint mounted at /audit ONLY when both subject_audit
    writer AND legal token are present. Disabled-by-default keeps the
    surface from accidentally serving in dev/bring-up environments
    without proper credentials.
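
The mounting rule can be sketched as a single match on two Options. AuditMount
and audit_mount_if_configured are hypothetical names for this illustration;
the real main.rs builds an AuditEndpointState and nests a router, which is not
reproduced here.

```rust
/// Hypothetical stand-in for the state handed to the /audit router.
#[derive(Debug)]
pub struct AuditMount<W> {
    pub writer: W,
    pub legal_token: String,
}

/// Return Some only when BOTH the audit writer and the legal token exist.
/// Any other combination leaves the endpoint unmounted.
pub fn audit_mount_if_configured<W>(
    writer: Option<W>,
    legal_token: Option<String>,
) -> Option<AuditMount<W>> {
    match (writer, legal_token) {
        (Some(writer), Some(legal_token)) => Some(AuditMount { writer, legal_token }),
        // Missing writer, missing token, or both: stay disabled.
        _ => None,
    }
}
```

A caller would mount /audit only in the Some branch, so a dev environment
with neither file present never exposes the route.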

Tests (9/9 passing):
  - constant_time_eq (correctness on equal/diff/empty/length-mismatch)
  - missing_legal_token_returns_503
  - missing_header_returns_401
  - wrong_token_returns_401
  - correct_token_passes_auth
  - audit_response_assembly_full_path (manifest + 3 rows + chain verify)
  - audit_response_window_filters_rows (time-bounded window)
  - empty_token_file_results_in_disabled_endpoint
  - short_token_file_rejected_at_load (<16 char min)
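
The load-time rules exercised by the last two tests can be sketched as
follows. The function names, the error strings, and the trimming of
surrounding whitespace are assumptions of this sketch; the default path, the
LH_LEGAL_AUDIT_TOKEN_FILE override, and the 16-char minimum come from the
description above.

```rust
use std::env;
use std::path::PathBuf;

pub const DEFAULT_TOKEN_PATH: &str = "/etc/lakehouse/legal_audit.token";
pub const MIN_TOKEN_CHARS: usize = 16;

/// Resolve the token file path, honoring the environment override.
pub fn token_path() -> PathBuf {
    env::var_os("LH_LEGAL_AUDIT_TOKEN_FILE")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_TOKEN_PATH))
}

/// Validate raw file contents. Err carries a reason that is safe to log;
/// the token value itself never appears in it.
pub fn parse_token(raw: &str) -> Result<String, &'static str> {
    // Trimming a trailing newline is an assumption of this sketch.
    let token = raw.trim();
    if token.is_empty() {
        return Err("legal token file is empty");
    }
    if token.chars().count() < MIN_TOKEN_CHARS {
        return Err("legal token is shorter than 16 chars");
    }
    Ok(token.to_string())
}
```

Returning a static reason string rather than echoing the input keeps the
"token value never logged" property even on the rejection path.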

LIVE end-to-end verification:
  1. Plant signing key + legal token in /tmp/lakehouse_audit/
  2. Restart gateway with LH_SUBJECT_AUDIT_KEY + LH_LEGAL_AUDIT_TOKEN_FILE
     pointing at the test files
  3. /audit/health → 200 "audit endpoint ready"
  4. /audit/subject/WORKER-1 (no token) → 401 "missing X-Lakehouse-Legal-Token"
  5. /audit/subject/WORKER-1 (wrong token) → 401 "X-Lakehouse-Legal-Token mismatch"
  6. /audit/subject/WORKER-1 (correct token) → 200 + full manifest + 0 rows
     + chain_verified=true (empty log path)
  7. POST /v1/validate with candidate_id=WORKER-1 → triggers WorkerLookup.find()
     via the AuditingWorkerLookup wrapper from Step 5
  8. data/_catalog/subjects/WORKER-1.audit.jsonl now exists with 1 row
     (accessor.purpose=validator_worker_lookup, result=not_found,
     prev_chain_hash=GENESIS, valid HMAC)
  9. /audit/subject/WORKER-1 (correct token) → 200 + manifest + 1 row +
     chain_verified=true + chain_rows_total=1 + completeness attestation

The full audit-trail loop (PII access → audit row → chain → audit response)
works end-to-end on the live gateway.

NOT in this commit (future steps):
  - Step 7: Daily retention sweep
  - Step 8: Cross-runtime parity (Go side reads the same shapes)
  - Mirror chain root to SubjectManifest.audit_log_chain_root after
    each append (so tampering detection can use the manifest's
    cached root as ground truth)
  - Live row projection from datasets (currently caller follows up
    via /query/sql against the safe_views named in the response)
  - Ed25519 signature on the response (chain verification IS the v1
    attestation; signing is future hardening per spec §10)

cargo build --release: clean. cargo test -p catalogd audit_endpoint:
9/9 PASS. Live verification successful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 03:52:04 -05:00

.archon/workflows

.archon: add lakehouse-architect-review workflow

2026-04-26 18:05:43 -05:00

auditor

auditor: layer-2 path-traversal guard — symlink resolution before read

2026-04-27 08:32:33 -05:00

bot

infra: replace gpt-oss with Ollama Pro + OpenCode Zen across hot paths

2026-04-28 06:13:30 -05:00

config

REVERT cloud routing on hot path — back to local Ollama per PRD line 70