Implementation of docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §5 Step 6
+ §4 (response shape) + §6 (auth model). The EEOC-discovery defense
surface is live: legal counsel hits one URL with one token and gets back
an audit response, verified against the HMAC chain, naming every PII
access for a subject in a time window.
New module: crates/catalogd/src/audit_endpoint.rs (~340 LOC)
- AuditEndpointState { registry, writer, legal_token }
- router() exposes:
GET /subject/{candidate_id}?from=ISO&to=ISO (full audit response)
GET /health (liveness + token check)
- require_legal_auth() — constant-time-eq compare against the
X-Lakehouse-Legal-Token header. Avoids timing leaks on the token
check without pulling in `subtle` for one comparison.
- Token loaded from /etc/lakehouse/legal_audit.token (env-overridable
via LH_LEGAL_AUDIT_TOKEN_FILE). Empty file or <16 chars = endpoint
serves 503 with a clear reason. Token value NEVER logged.
- Response schema: subject_audit_response.v1 with manifest +
audit_log (rows + chain verification) + datasets_referenced +
safe_views_available + completeness_attestation.
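For reviewers, a minimal sketch of the comparison and the auth outcomes described above. This is not the code in audit_endpoint.rs: `AuthOutcome` and `check_legal_token` are hypothetical names, and only the status mapping (503 when no token is configured, 401 on a missing or wrong header) is taken from this message.

```rust
/// Compare two byte slices without returning early at the first differing byte.
/// The length check does return early, so the token length itself is not hidden.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; any difference leaves a non-zero bit.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Hypothetical outcome type; the real handler presumably maps these to HTTP statuses.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthOutcome {
    Disabled,      // no legal token configured -> 503
    MissingHeader, // X-Lakehouse-Legal-Token absent -> 401
    Mismatch,      // header present but wrong -> 401
    Authorized,    // header matches the configured token
}

/// Decide the auth outcome from the configured token and the header value.
pub fn check_legal_token(configured: Option<&str>, header: Option<&str>) -> AuthOutcome {
    let Some(expected) = configured else {
        return AuthOutcome::Disabled;
    };
    let Some(presented) = header else {
        return AuthOutcome::MissingHeader;
    };
    if constant_time_eq(expected.as_bytes(), presented.as_bytes()) {
        AuthOutcome::Authorized
    } else {
        AuthOutcome::Mismatch
    }
}
```

A hand-rolled loop like this is best effort: unlike `subtle`, it does nothing to stop the optimizer from reintroducing a data-dependent branch.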
New helper and behavior change on SubjectAuditWriter:
- read_rows_in_range(candidate_id, from, to) — returns rows in window,
used by the endpoint to assemble the response without re-reading
the entire chain.
- verify_chain() now returns Ok(0) when the audit log file doesn't
exist (empty = trivially valid). Prevents legitimate "no PII access
yet for this subject" from showing as integrity=BROKEN in the
audit response. Caller can detect "log was deleted" via comparison
to SubjectManifest.audit_log_chain_root (when that mirror lands).
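The missing-file behavior of verify_chain() can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `verify_chain_sketch` is a hypothetical name, the log is assumed to hold one row per line, and `verify_row` stands in for the real per-row HMAC check, which is not shown here.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Returns the number of rows verified, or an error if the chain is broken.
/// `verify_row` is a placeholder for the real per-row HMAC verification.
pub fn verify_chain_sketch(
    path: &Path,
    verify_row: impl Fn(&str) -> bool,
) -> io::Result<usize> {
    let contents = match fs::read_to_string(path) {
        Ok(c) => c,
        // No log file yet: nothing was ever appended, so the chain is trivially valid.
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(0),
        // Any other I/O failure is a real error and is passed through.
        Err(e) => return Err(e),
    };

    let mut verified = 0;
    for line in contents.lines().filter(|l| !l.trim().is_empty()) {
        if !verify_row(line) {
            return Err(io::Error::new(ErrorKind::InvalidData, "audit chain broken"));
        }
        verified += 1;
    }
    Ok(verified)
}
```

Note that Ok(0) alone cannot distinguish "never written" from "deleted"; that gap is what the planned chain-root mirror is meant to close.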
main.rs:
- Audit endpoint mounted at /audit ONLY when both subject_audit
writer AND legal token are present. Disabled-by-default keeps the
surface from accidentally serving in dev/bring-up environments
without proper credentials.
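A minimal sketch of the token load and the mount decision, for orientation only. `token_path`, `load_legal_token` and `should_mount_audit` are hypothetical names; trimming surrounding whitespace before the length check is an assumption, while the default path, the override variable and the 16-character minimum come from this message.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Default token location; LH_LEGAL_AUDIT_TOKEN_FILE overrides it.
const DEFAULT_TOKEN_FILE: &str = "/etc/lakehouse/legal_audit.token";
/// Tokens shorter than this are rejected at load.
const MIN_TOKEN_CHARS: usize = 16;

/// Resolve the token file path from an optional override value.
pub fn token_path(override_path: Option<String>) -> PathBuf {
    PathBuf::from(override_path.unwrap_or_else(|| DEFAULT_TOKEN_FILE.to_string()))
}

/// Returns None when the file is missing, unreadable, empty, or too short.
/// The token value is never printed or logged here.
pub fn load_legal_token(path: &Path) -> Option<String> {
    let raw = fs::read_to_string(path).ok()?;
    // Assumption: surrounding whitespace (e.g. a trailing newline) is not part of the token.
    let token = raw.trim();
    if token.chars().count() < MIN_TOKEN_CHARS {
        return None;
    }
    Some(token.to_string())
}

/// Mount /audit only when both the audit writer and a usable token exist.
pub fn should_mount_audit(writer_present: bool, token: Option<&str>) -> bool {
    writer_present && token.is_some()
}
```

In main.rs the override would presumably be read with `std::env::var("LH_LEGAL_AUDIT_TOKEN_FILE").ok()` and passed to `token_path`.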
Tests (9/9 passing):
- constant_time_eq (correctness on equal/diff/empty/length-mismatch)
- missing_legal_token_returns_503
- missing_header_returns_401
- wrong_token_returns_401
- correct_token_passes_auth
- audit_response_assembly_full_path (manifest + 3 rows + chain verify)
- audit_response_window_filters_rows (time-bounded window)
- empty_token_file_results_in_disabled_endpoint
- short_token_file_rejected_at_load (<16 char min)
LIVE end-to-end verification:
1. Plant signing key + legal token in /tmp/lakehouse_audit/
2. Restart gateway with LH_SUBJECT_AUDIT_KEY + LH_LEGAL_AUDIT_TOKEN_FILE
pointing at the test files
3. /audit/health → 200 "audit endpoint ready"
4. /audit/subject/WORKER-1 (no token) → 401 "missing X-Lakehouse-Legal-Token"
5. /audit/subject/WORKER-1 (wrong token) → 401 "X-Lakehouse-Legal-Token mismatch"
6. /audit/subject/WORKER-1 (correct token) → 200 + full manifest + 0 rows
+ chain_verified=true (empty log path)
7. POST /v1/validate with candidate_id=WORKER-1 → triggers WorkerLookup.find()
via the AuditingWorkerLookup wrapper from Step 5
8. data/_catalog/subjects/WORKER-1.audit.jsonl now exists with 1 row
(accessor.purpose=validator_worker_lookup, result=not_found,
prev_chain_hash=GENESIS, valid HMAC)
9. /audit/subject/WORKER-1 (correct token) → 200 + manifest + 1 row +
chain_verified=true + chain_rows_total=1 + completeness attestation
The full audit-trail loop (PII access → audit row → chain → audit response)
works end-to-end on the live gateway.
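The chain linkage exercised in steps 8 and 9 can be sketched roughly as below. This is a simplified illustration, not the writer's code: `ChainRow`, its `payload` and `row_hmac` fields, and `verify_links` are hypothetical, and the `mac` closure stands in for the keyed HMAC, which needs a crate outside the standard library. Only the GENESIS value for the first row's prev_chain_hash is taken from this message.

```rust
/// The first row's prev_chain_hash, as observed in step 8.
pub const GENESIS: &str = "GENESIS";

/// Hypothetical row shape; the real JSONL rows carry more fields.
#[derive(Debug, Clone)]
pub struct ChainRow {
    pub prev_chain_hash: String,
    pub payload: String,
    pub row_hmac: String,
}

/// Walk the rows in order. Each row must point at the previous row's MAC
/// (GENESIS for the first row) and its own MAC must recompute correctly.
/// Returns Ok(row count) or Err(index of the first broken row).
pub fn verify_links(
    rows: &[ChainRow],
    mac: impl Fn(&str, &str) -> String,
) -> Result<usize, usize> {
    let mut expected_prev = GENESIS.to_string();
    for (i, row) in rows.iter().enumerate() {
        if row.prev_chain_hash != expected_prev {
            return Err(i);
        }
        if mac(&row.prev_chain_hash, &row.payload) != row.row_hmac {
            return Err(i);
        }
        expected_prev = row.row_hmac.clone();
    }
    Ok(rows.len())
}
```

Because each MAC covers the previous row's MAC, editing or dropping a row in the middle invalidates every row after it, which is what lets a single chain_verified flag stand in for the whole log.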
NOT in this commit (future steps):
- Step 7: Daily retention sweep
- Step 8: Cross-runtime parity (Go side reads the same shapes)
- Mirror chain root to SubjectManifest.audit_log_chain_root after
each append (so tampering detection can use the manifest's
cached root as ground truth)
- Live row projection from datasets (currently caller follows up
via /query/sql against the safe_views named in the response)
- Ed25519 signature on the response (chain verification IS the v1
attestation; signing is future hardening per spec §10)
cargo build --release clean. cargo test -p catalogd audit_endpoint
9/9 PASS. Live verification successful.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>