root fef1efd2ac gateway: Step 4 — wire SubjectAuditWriter into tool dispatch
Implementation of docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md §5 Step 4.
Every tool result that returns rows referencing a subject now produces
audit rows in the per-subject HMAC-chained JSONL. Audit-write failures
are non-blocking.

Changes:

v1/mod.rs:
  + V1State.subject_audit: Option<Arc<SubjectAuditWriter>>
    (None when no usable key is found; audit then becomes a no-op with a
    warning, and PII paths still serve)

main.rs:
  + Construct SubjectAuditWriter at startup from
    LH_SUBJECT_AUDIT_KEY env or /etc/lakehouse/subject_audit.key.
    Missing/short key = log warning + leave None (gateway boots, audit
    disabled). Same store as the rest of catalogd.
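A minimal sketch of that startup lookup. It assumes the env var holds the
raw key bytes and assumes a 32-byte minimum, since this commit does not say
what counts as "short". resolve_audit_key and MIN_KEY_LEN are hypothetical
names; the real constructor may be shaped differently.

```rust
use std::env;
use std::fs;
use std::path::Path;

// Hypothetical threshold for a "short" key; not specified by the commit.
const MIN_KEY_LEN: usize = 32;

/// Env value wins if non-empty, otherwise read the key file.
/// Returns None (after a warning) so the caller leaves
/// V1State.subject_audit as None and the gateway still boots.
fn resolve_audit_key(env_value: Option<String>, key_path: &Path) -> Option<Vec<u8>> {
    let raw: Option<Vec<u8>> = match env_value {
        Some(v) if !v.is_empty() => Some(v.into_bytes()),
        _ => fs::read(key_path).ok(),
    };
    match raw {
        Some(key) if key.len() >= MIN_KEY_LEN => Some(key),
        Some(key) => {
            eprintln!("WARN subject audit key too short ({} bytes); audit disabled", key.len());
            None
        }
        None => {
            eprintln!("WARN no subject audit key found; audit disabled");
            None
        }
    }
}

// Production entry point: wire in the real env var and key path.
fn load_audit_key() -> Option<Vec<u8>> {
    resolve_audit_key(
        env::var("LH_SUBJECT_AUDIT_KEY").ok(),
        Path::new("/etc/lakehouse/subject_audit.key"),
    )
}
```

Splitting the pure resolver from the env/file wiring keeps the
missing/short-key paths testable without touching /etc.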

execution_loop/mod.rs:
  + audit_subject_hits_in() — called after every successful tool
    dispatch. Walks the result JSON, finds candidate_id / worker_id
    fields, fires one SubjectAuditRow per (subject, fields) pair.
    Runs via tokio::spawn so audit latency never adds to the tool path.
  + collect_subject_hits() — free fn, recursive JSON walker. Handles:
      "candidate_id":"X"  → audit candidate_id="X"
      "worker_id":42      → audit candidate_id="WORKER-42" (matches
                              backfill convention)
      "worker_id":"42"    → audit candidate_id="WORKER-42" (string form)
    Other fields in the same object become fields_accessed (so the audit
    row records "this access surfaced name + phone for candidate X").
    Ignores objects without id fields. Skips empty id strings. Recurses
    through nested objects + arrays.
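A dependency-free sketch of the walker rules listed above. Json is a
hypothetical minimal stand-in for serde_json::Value and SubjectHit is
illustrative. Preferring candidate_id when an object carries both id
fields is an assumption this commit does not specify.

```rust
use std::collections::BTreeMap;

// Hypothetical minimal stand-in for serde_json::Value (no external crates).
#[derive(Debug)]
enum Json {
    Num(i64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

// One audit candidate: which subject, and which sibling fields were surfaced.
#[derive(Debug, PartialEq)]
struct SubjectHit {
    candidate_id: String,
    fields_accessed: Vec<String>,
}

// Resolve the subject id for one object, applying the WORKER-N convention.
fn subject_id(obj: &BTreeMap<String, Json>) -> Option<String> {
    if let Some(Json::Str(s)) = obj.get("candidate_id") {
        if !s.is_empty() {
            return Some(s.clone());
        }
    }
    match obj.get("worker_id") {
        Some(Json::Num(n)) => Some(format!("WORKER-{n}")),
        Some(Json::Str(s)) if !s.is_empty() => Some(format!("WORKER-{s}")),
        _ => None, // no id field, or an empty id string
    }
}

// Recursive walk: record one hit per object that has an id, then descend.
fn collect_subject_hits(value: &Json, out: &mut Vec<SubjectHit>) {
    match value {
        Json::Obj(map) => {
            if let Some(candidate_id) = subject_id(map) {
                // Every non-id key in the object counts as an accessed field.
                let fields_accessed = map
                    .keys()
                    .filter(|k| !matches!(k.as_str(), "candidate_id" | "worker_id"))
                    .cloned()
                    .collect();
                out.push(SubjectHit { candidate_id, fields_accessed });
            }
            for child in map.values() {
                collect_subject_hits(child, out);
            }
        }
        Json::Arr(items) => {
            for item in items {
                collect_subject_hits(item, out);
            }
        }
        Json::Num(_) | Json::Str(_) => {}
    }
}
```

Because BTreeMap iterates keys in sorted order, fields_accessed comes out
sorted, which keeps audit rows deterministic across runs.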

Tests (6/6 passing — gateway::collect_subject_hits_*):
  - finds_candidate_id_strings (basic case + fields_accessed extraction)
  - prefixes_worker_id_int (int → WORKER-N)
  - handles_worker_id_string (string → WORKER-N)
  - recurses_through_nested_objects (joins / mixed payloads)
  - ignores_objects_without_id_fields (no false positives)
  - skips_empty_id_strings (defensive)

Per spec §3.2: failures are logged, never propagated. Better to drop
an audit row than block a tool response. Operators monitor warning
volume to detect audit-write regressions.
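A sketch of that fire-and-forget shape. It assumes a hypothetical
AuditSink trait in place of SubjectAuditWriter and uses std::thread::spawn
in place of tokio::spawn so it runs without an async runtime. A None writer
is the disabled-audit case; append errors are counted and logged, never
returned to the caller.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical sink trait; the real SubjectAuditWriter API is not shown here.
trait AuditSink: Send + Sync + 'static {
    fn append(&self, candidate_id: &str, fields: &[String]) -> Result<(), String>;
}

// Swallowed failures, so operators can alert on warning volume.
static AUDIT_WRITE_FAILURES: AtomicU64 = AtomicU64::new(0);

fn audit_hits_detached(
    writer: Option<Arc<dyn AuditSink>>,
    hits: Vec<(String, Vec<String>)>,
) -> Option<thread::JoinHandle<()>> {
    let writer = writer?; // None: audit disabled, nothing to do
    Some(thread::spawn(move || {
        for (id, fields) in hits {
            if let Err(e) = writer.append(&id, &fields) {
                AUDIT_WRITE_FAILURES.fetch_add(1, Ordering::Relaxed);
                eprintln!("WARN subject audit append failed for {id}: {e}");
            }
        }
    }))
}
```

The tool path would simply drop the returned handle; it is returned here
only so a test can join and observe the failure counter.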

NOT in this commit (future steps):
  - Step 5: Wire validator WorkerLookup similarly (each candidate_id
    resolved by FillValidator gets an audit row)
  - Step 6: /audit/subject/{id} HTTP endpoint
  - Step 7: Daily retention sweep
  - Mirror chain root to SubjectManifest.audit_log_chain_root after
    each append (currently the chain is verifiable via verify_chain()
    even without the manifest mirror; the mirror is an optimization)
  - Thread X-Lakehouse-Trace-Id from request through to audit row
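Why the chain stays verifiable without the manifest mirror, as a sketch:
each row stores the previous row's hash, and verification recomputes every
link from the key, so the last hash is the chain root. keyed_hash uses
std's DefaultHasher purely as a dependency-free placeholder; it is NOT a
MAC, and the real log uses HMAC. The row layout and this verify_chain
signature are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// PLACEHOLDER keyed hash, not cryptographically secure. A real chain
// would use HMAC-SHA256; this only keeps the sketch dependency-free.
fn keyed_hash(key: &[u8], prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

struct ChainRow {
    payload: String, // body of one JSONL line
    prev: u64,       // previous row's hash (0 for the first row)
    hash: u64,       // keyed hash over (prev, payload)
}

fn append(chain: &mut Vec<ChainRow>, key: &[u8], payload: &str) {
    let prev = chain.last().map_or(0, |r| r.hash);
    let hash = keyed_hash(key, prev, payload);
    chain.push(ChainRow { payload: payload.to_string(), prev, hash });
}

/// Recompute every link. Ok(root) if intact, Err(index) of the first bad row.
/// The root is the value a manifest mirror would cache.
fn verify_chain(chain: &[ChainRow], key: &[u8]) -> Result<u64, usize> {
    let mut prev = 0u64;
    for (i, row) in chain.iter().enumerate() {
        if row.prev != prev || row.hash != keyed_hash(key, prev, &row.payload) {
            return Err(i);
        }
        prev = row.hash;
    }
    Ok(prev)
}
```

Mirroring the root into the manifest only saves the full recompute; any
edit to an earlier row already breaks every later link.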

cargo build --release clean. cargo test -p gateway collect_subject_hits
6/6 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:29:24 -05:00
2026-04-22 02:41:15 -05:00