Implementation of docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §5 Step 4.
Every tool result that returns rows referencing a subject now produces
audit rows in the per-subject HMAC-chained JSONL log. Audit-write
failures are non-blocking.
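For reviewers unfamiliar with the format, here is a minimal sketch of what "HMAC-chained" means, under one assumption: each row stores its predecessor's MAC plus its own MAC computed over (previous MAC || row payload). ChainedRow, append_row and verify_chain_sketch are hypothetical names and are not the SubjectAuditWriter API. mac_fn stands in for the keyed HMAC (the exact construction, e.g. HMAC-SHA256, is also an assumption), which keeps the sketch crate-free.

```rust
/// Hypothetical row shape; the real SubjectAuditRow fields are not shown in this commit.
#[derive(Debug, Clone)]
struct ChainedRow {
    payload: String,   // serialized row body (one JSONL line minus the chain fields)
    prev_mac: Vec<u8>, // MAC of the previous row; empty for the first row
    mac: Vec<u8>,      // MAC over prev_mac || payload
}

/// Bytes that each link's MAC covers: the previous MAC followed by the payload.
fn link_input(prev_mac: &[u8], payload: &str) -> Vec<u8> {
    let mut input = prev_mac.to_vec();
    input.extend_from_slice(payload.as_bytes());
    input
}

/// Appends a row whose MAC commits to the last row's MAC.
fn append_row<F: Fn(&[u8]) -> Vec<u8>>(chain: &mut Vec<ChainedRow>, payload: &str, mac_fn: &F) {
    let prev_mac = chain.last().map(|r| r.mac.clone()).unwrap_or_default();
    let mac = mac_fn(link_input(&prev_mac, payload).as_slice());
    chain.push(ChainedRow { payload: payload.to_string(), prev_mac, mac });
}

/// Recomputes every link; returns the index of the first row that fails.
fn verify_chain_sketch<F: Fn(&[u8]) -> Vec<u8>>(chain: &[ChainedRow], mac_fn: &F) -> Result<(), usize> {
    let mut expected_prev: Vec<u8> = Vec::new();
    for (i, row) in chain.iter().enumerate() {
        let recomputed = mac_fn(link_input(&row.prev_mac, &row.payload).as_slice());
        if row.prev_mac != expected_prev || recomputed != row.mac {
            return Err(i);
        }
        expected_prev = row.mac.clone();
    }
    Ok(())
}
```

In this sketch, editing a row's payload or removing a row from the middle makes verification fail at that row, because each MAC commits to everything before it.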
Changes:
v1/mod.rs:
  + V1State.subject_audit: Option<Arc<SubjectAuditWriter>>
    (None when the key file is missing: audit becomes a no-op with a
    warning, and PII paths still serve)
main.rs:
+ Construct SubjectAuditWriter at startup from
LH_SUBJECT_AUDIT_KEY env or /etc/lakehouse/subject_audit.key.
    Missing or short key: log a warning and leave None (the gateway
    still boots with audit disabled). Uses the same store as the rest
    of catalogd.
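A minimal sketch of that startup fallback. It assumes LH_SUBJECT_AUDIT_KEY holds the key material itself (the commit does not say whether it holds the key or a path), and it treats "short" as under 32 bytes, a hypothetical threshold. load_subject_audit_key is a hypothetical name; passing the env value in as a parameter keeps the function testable. eprintln! stands in for whatever logging macro the gateway actually uses.

```rust
use std::path::Path;

// Hypothetical minimum; the commit only says a "short" key disables audit.
const MIN_KEY_LEN: usize = 32;

/// Returns the audit key, or None (after a warning) if it is missing or too short.
fn load_subject_audit_key(env_value: Option<String>, key_path: &Path) -> Option<Vec<u8>> {
    // Assumption: a non-empty env var carries the key itself and wins over the file.
    let key = match env_value {
        Some(v) if !v.trim().is_empty() => v.trim().as_bytes().to_vec(),
        _ => match std::fs::read(key_path) {
            Ok(bytes) => bytes,
            Err(e) => {
                eprintln!(
                    "warning: subject audit key not found ({}: {}); audit disabled",
                    key_path.display(),
                    e
                );
                return None;
            }
        },
    };
    if key.len() < MIN_KEY_LEN {
        eprintln!(
            "warning: subject audit key shorter than {} bytes; audit disabled",
            MIN_KEY_LEN
        );
        return None;
    }
    Some(key)
}
```

At startup this might be called as load_subject_audit_key(std::env::var("LH_SUBJECT_AUDIT_KEY").ok(), Path::new("/etc/lakehouse/subject_audit.key")). A None result leaves V1State.subject_audit unset rather than aborting boot.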
execution_loop/mod.rs:
+ audit_subject_hits_in() — called after every successful tool
dispatch. Walks the result JSON, finds candidate_id / worker_id
fields, fires one SubjectAuditRow per (subject, fields) pair.
    Uses tokio::spawn so audit latency never adds to the tool path.
+ collect_subject_hits() — free fn, recursive JSON walker. Handles:
"candidate_id":"X" → audit candidate_id="X"
"worker_id":42 → audit candidate_id="WORKER-42" (matches
backfill convention)
"worker_id":"42" → audit candidate_id="WORKER-42" (string form)
    Other fields in the same object become fields_accessed (so the audit
    row records "this access surfaced name + phone for candidate X").
Ignores objects without id fields. Skips empty id strings. Recurses
through nested objects + arrays.
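The walker rules above can be sketched without crates using a hypothetical four-variant Json enum in place of serde_json::Value. The real function presumably walks serde_json, but that is an assumption. The sketch also assumes two behaviors the commit does not specify: when an object carries more than one id field, the first one wins, and the id key itself is excluded from fields_accessed.

```rust
/// Hypothetical minimal JSON tree (integers only, no floats/bools/null).
#[derive(Debug, Clone)]
enum Json {
    Int(i64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// One audit hit: the subject id plus the sibling field names surfaced with it.
#[derive(Debug, PartialEq)]
struct SubjectHit {
    candidate_id: String,
    fields_accessed: Vec<String>,
}

/// Finds the first usable id field in an object, returning (subject id, id key).
fn subject_id_of(fields: &[(String, Json)]) -> Option<(String, &str)> {
    for (key, value) in fields {
        let id = match (key.as_str(), value) {
            ("candidate_id", Json::Str(s)) if !s.is_empty() => s.clone(),
            ("worker_id", Json::Int(n)) => format!("WORKER-{n}"),
            ("worker_id", Json::Str(s)) if !s.is_empty() => format!("WORKER-{s}"),
            _ => continue, // not an id field, or an empty id string
        };
        return Some((id, key.as_str()));
    }
    None
}

/// Recursively collects subject hits from objects and arrays at any depth.
fn collect_subject_hits(value: &Json, hits: &mut Vec<SubjectHit>) {
    match value {
        Json::Obj(fields) => {
            if let Some((candidate_id, id_key)) = subject_id_of(fields) {
                let fields_accessed = fields
                    .iter()
                    .map(|(k, _)| k.clone())
                    .filter(|k| k.as_str() != id_key)
                    .collect();
                hits.push(SubjectHit { candidate_id, fields_accessed });
            }
            // Recurse into every child so nested subjects are also audited.
            for (_, child) in fields {
                collect_subject_hits(child, hits);
            }
        }
        Json::Arr(items) => {
            for item in items {
                collect_subject_hits(item, hits);
            }
        }
        Json::Int(_) | Json::Str(_) => {}
    }
}
```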
Tests (6/6 passing — gateway::collect_subject_hits_*):
- finds_candidate_id_strings (basic case + fields_accessed extraction)
- prefixes_worker_id_int (int → WORKER-N)
- handles_worker_id_string (string → WORKER-N)
- recurses_through_nested_objects (joins / mixed payloads)
- ignores_objects_without_id_fields (no false positives)
- skips_empty_id_strings (defensive)
Per spec §3.2: failures are logged, never propagated. Better to drop
an audit row than block a tool response. Operators monitor warning
volume to detect audit-write regressions.
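The non-blocking contract (the Option gate from V1State, the spawn in audit_subject_hits_in(), and log-and-continue on failure) can be sketched as follows. This version uses std::thread::spawn in place of tokio::spawn so it runs without a runtime. AuditWriter and audit_in_background are hypothetical stand-ins for SubjectAuditWriter and the real call site, and the simulated failure flag exists only to exercise the warning path.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for SubjectAuditWriter.
struct AuditWriter {
    rows: Mutex<Vec<String>>,
    fail: bool, // simulate a write failure (sketch only)
}

impl AuditWriter {
    fn append(&self, row: String) -> Result<(), String> {
        if self.fail {
            return Err("simulated write failure".to_string());
        }
        self.rows.lock().unwrap().push(row);
        Ok(())
    }
}

/// Fire-and-forget audit: the caller never waits on, or sees an error from, the write.
/// Returns the handle only so tests can join; a real caller would drop it.
fn audit_in_background(
    writer: Option<&Arc<AuditWriter>>,
    rows: Vec<String>,
) -> Option<thread::JoinHandle<()>> {
    // None means the key was missing at startup: audit is a no-op.
    let writer = Arc::clone(writer?);
    Some(thread::spawn(move || {
        for row in rows {
            if let Err(e) = writer.append(row) {
                // Logged, never propagated (spec §3.2).
                eprintln!("warning: subject audit write failed: {e}");
            }
        }
    }))
}
```

Dropping a tokio JoinHandle detaches the task in the same way that dropping a std JoinHandle detaches the thread, so the production path can discard the handle.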
NOT in this commit (future steps):
- Step 5: Wire validator WorkerLookup similarly (each candidate_id
resolved by FillValidator gets an audit row)
- Step 6: /audit/subject/{id} HTTP endpoint
- Step 7: Daily retention sweep
- Mirror chain root to SubjectManifest.audit_log_chain_root after
each append (currently the chain is verifiable via verify_chain()
even without the manifest mirror; the mirror is an optimization)
- Thread X-Lakehouse-Trace-Id from request through to audit row
cargo build --release: clean. cargo test -p gateway collect_subject_hits:
6/6 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>