Ran cross-lineage scrum on the discovery doc with the new model fleet
(opus + kimi-k2.6 + gemini-3-flash via Go gateway :4110, custom
"senior security architect" prompt). 3/3 reviewers responded with
substantive 800-1200 word reviews. Saved at /tmp/audit_scrum/.
5 convergent findings (≥2 reviewers) added as §10/C1-C5:
C1. §1F matrix-indexer "good for audit defensibility" claim is over-
claimed — walked back in TL;DR. Trace bodies unverified; treat as
SUSPECTED PII sink until §8.1 sampling completes.
C2. §1E (Langfuse) is the most dangerous leak — fix FIRST, ahead of
view-routing. Boundary-crossing leak (GDPR Art. 44 / CPRA sale /
SOC2 disposal). All 3 reviewers converge on this priority.
C3. Discrimination defense requires the FULL CANDIDATE POOL, not just
fills. EEOC UGESP (1978): need adverse-impact stats on everyone
who could have been picked. Phase 1 worked example missed this.
C4. BIPA / biometric exposure understated in findings (in PRD §10.5
but not translated to actionables). $1k-$5k per-violation regime.
C5. candidate_id must be promoted to top-level field in all JSONL
sinks. Grepping natural-language strings is not defensible audit
strategy. 3/3 reviewers converge.
14 single-reviewer high-value catches added as §10 single-reviewer
section: opus on LLM provider egress (8th PII path), Art. 22 right-
to-explanation, special-category data, DPIA/ROPA/DPA inventory; kimi
on sequential ID enumeration risk, Langfuse retention config, CCPA
de-identified-in-place vs crypto-shred, Bun common-mode failure,
cryptographic audit-trail integrity (Merkle/FRE 901), HIPAA BAA,
revised SELECT * effort estimate; gemini on data residency, "culture
fit" reasoning proxies, comparator-pool snapshot.
§9 reordered: sample first → Langfuse boundary second → defense-layer third
(was view-routing first per original draft; boundary-crossing leak is
higher priority per scrum).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Discovery — Subject + PII Surface Map
Status: Draft — 2026-05-03 · Drafted by: working session 2026-05-03 · Companion to: AUDIT_TRAIL_PRD.md
Purpose. Read-only walk of both runtimes. Fills in the "UNKNOWN" cells in the AUDIT_TRAIL_PRD.md §3 surface map and the §7 current-state-vs-target table with file:line evidence. No code changes. Output: a complete picture of where subject identifiers + PII flow today, where they leak, and what an audit response could and could not produce right now.
TL;DR — what the walk found
- A defense layer EXISTS but is BYPASSED. Two PII-masked views (candidates_safe, workers_safe) live in data/_catalog/views/ and correctly drop name/email/phone. No production tool query uses them. The MCP tool registry's search_candidates and get_candidate SQL templates query the raw candidates / workers_500k tables and return full PII to the LLM context. Worth considering: the views are functioning policy artifacts that nobody routes through.
- PII flows freely through the LLM substrate. In a single fill scenario, candidate names + emails + phones traverse: SQL → tool_result → LogEntry → execution_loop log → /v1/respond HTTP response → Langfuse trace input + output → data/_kb/outcomes.jsonl (fills with name) → data/_kb/overseer_corrections.jsonl (operation + correction text). That is at least 7 distinct persistence/transmission paths per scenario.
- candidate_id is a stable token but not an isolated one. It's a column on the same workers_500k.parquet that holds name/email/phone; there is no separate identity service. Joining candidate_id back to PII is one DataFusion query away. The audit-trail PRD's §5 identity-service intent is not yet built, so the architectural separation does not exist.
- No /audit/subject/{id} endpoint exists. Reconstructing "every decision about person X" today requires manual cross-correlation across 4+ JSONL files + Langfuse + pathway memory + observer events. There is no canonical query path; the audit response cannot be produced in any reasonable time today.
- Go side mirrors the Rust pattern. The Go validator's rosterRow carries Name; the Go SessionRecord carries Prompt (truncated to 4000 chars), which contains the natural-language operation including candidate names. Cross-runtime parity extends to the PII leak.
- Append-only persistence is universal. outcomes.jsonl, overseer_corrections.jsonl, pathway_memory/state.json, sessions.jsonl, and Langfuse are all append-only. Right-to-be-forgotten under the current architecture requires the cryptographic-erasure approach from PRD §6, because no hot data store supports per-subject deletion today.
- The matrix-indexer fingerprint is subject-agnostic; trace bodies are UNVERIFIED. pathway_memory::PathwayTrace fingerprints are keyed by task_class + file_prefix + signal_class. None of those are subject identifiers, which is structurally defensive. However, trace bodies (reducer_summary, final_verdict) are written from execution-loop output and are highly likely to leak PII. Per §8.1 these are unverified. Treat the matrix indexer as a SUSPECTED PII SINK until sampled, and do NOT rely on "matrix can't drive discrimination" framing. (Walked back from an earlier draft per cross-lineage scrum §10/C1: that claim was overstated in 3/3 reviews.)
§1 — PII flow paths in the Rust stack (file:line evidence)
1A — MCP tools query raw tables (PII enters the LLM context)
| File:line | What | PII columns returned |
|---|---|---|
| crates/gateway/src/tools/registry.rs:96 | search_candidates SQL template | first_name, last_name, phone, email, city, state |
| crates/gateway/src/tools/registry.rs:107 | get_candidate SQL template | SELECT * returns ALL columns, including hourly_rate_usd, address, and anything else in the table |
| crates/gateway/src/tools/registry.rs:129 | top_recruiters SQL | recruiter names (employee PII rather than candidate PII, but still PII) |
| crates/gateway/src/tools/registry.rs:141 | engaged_unplaced_candidates SQL | c.candidate_id, c.first_name, c.last_name, c.phone, c.vertical |
| mcp-server/index.ts:1722, 2173, 2269 | address concatenation (street_number + street_direction + street_name) | full street addresses surfaced to MCP UI clients |
Architectural gap: the MCP tool registry has no "view-only" mode. Adding _safe view routing would be one sql_template rewrite per tool. Today: every fill scenario invokes search_candidates → LLM sees full PII for every candidate in the result set.
1B — LLM tool results captured in execution log
| File:line | What |
|---|---|
| crates/gateway/src/execution_loop/mod.rs:30-39 | LogEntry struct: {turn, role, model, kind, content: Value, at}. content is the raw tool_result body (carries PII when kind=tool_result). |
| crates/gateway/src/execution_loop/mod.rs:217 | self.append(LogEntry::new(..., "tool_result", trimmed)): every tool result is appended to self.log |
| crates/gateway/src/execution_loop/mod.rs:1319 | per-render cap: kind == "tool_result" gets a 1200-char cap (vs 200 for other kinds). Trimming is for token economy, not PII redaction; names/emails fit in 1200 chars. |
| crates/gateway/src/execution_loop/mod.rs:30-50 | Fill struct (in the same file): {candidate_id: String, name: String, reason: Option<String>}. The name field is load-bearing per the PRD-defined FillProposal artifact contract. |
1C — Execution log returned to HTTP caller
| File:line | What |
|---|---|
| crates/gateway/src/v1/respond.rs:111-119 | RespondResponse { status, artifact, log: outcome.into_log(), iterations, error }: the full log array is returned in the JSON response body to whoever called POST /v1/respond |
| crates/gateway/src/execution_loop/mod.rs:82-93 | RespondOutcome::{Ok,Failed,Blocked} all carry log: Vec<LogEntry>; no path drops the log on its way to the caller |
Audit implication: any HTTP client calling /v1/respond today receives full PII in the response. Authentication on /v1/respond is the only access control. Authorization is binary (either you can call it or you can't); there is no row-level filtering of which candidates you're allowed to see in the response.
1D — Persisted JSONL sinks on disk
| File:line | Path written | PII shape |
|---|---|---|
| crates/gateway/src/execution_loop/mod.rs:508 | data/_kb/outcomes.jsonl (append) | {operation, fills: [{candidate_id, name, reason?}], error, ...}. Observed live: operation carries the natural-language fill request including role + city + state; fills carries the full Fill struct with name. |
| crates/gateway/src/execution_loop/mod.rs:701 | data/_kb/overseer_corrections.jsonl (append) | {operation, correction, sig_hash, ...}. correction is the overseer's free-text guidance, which often references specific candidates by name (e.g. "the executor picked Emily Garcia who is at fill capacity, try Maria Rodriguez instead") |
| mcp-server/observer.ts:138-139 | data/_observer/ops.jsonl (append) | observer event log; PII content depends on what the operation context included. Empty on this box right now (the file doesn't exist), but the writer is wired. |
| mcp-server/observer.ts:161 | data/_kb/observer_escalations.jsonl (append) | sampled live: technical analysis fields (no PII observed in current rows), but the analysis field is free-form LLM output and could include names depending on the escalation trigger. |
| crates/gateway/src/v1/session_log.rs (writer) | /tmp/lakehouse-validator/sessions.jsonl (per lakehouse.toml [gateway].session_log_path) | SessionRecord { ..., prompt, attempts: [{... raw}], artifact }: prompt carries the operation text (PII), raw carries each model attempt's raw text (PII when the model reasoned about specific candidates), and artifact carries the final FillProposal {candidate_id, name} shape on success. |
1E — Langfuse external persistence
| File:line | What gets sent |
|---|---|
| crates/gateway/src/v1/langfuse_trace.rs:166 | body.input.messages = ev.input: the full message array (system + user + assistant + tool messages). Tool messages contain raw PII for tool_result entries. |
| crates/gateway/src/v1/langfuse_trace.rs:191 | body.output = ev.output: the full model generation. When the model emits a Fill or reasons about candidates, names appear here. |
Langfuse runs at :3001 per memory, with credentials in /etc/lakehouse/langfuse.env. The Langfuse storage tier holds these traces. PII leaves the lakehouse infrastructure boundary at this point — Langfuse's storage layer (Postgres + ClickHouse, default config) holds it.
1F — Pathway memory
| File:line | What |
|---|---|
| crates/vectord/src/pathway_memory.rs:169 | PathwayTrace struct definition |
| data/_pathway_memory/state.json | Persisted state; needs a read to confirm whether real names appear in traces. |
Pathway traces are keyed by pathway_id = SHA256(task_class + file_prefix + signal_class) — none of which are subject identifiers. The fingerprint is structurally subject-agnostic. Whether the trace BODY (kb_chunks, observer_signals, reducer_summary, final_verdict) leaks PII depends on what those fields contain at write time. From the struct shape: reducer_summary and final_verdict are strings written from execution loop output — highly likely to leak PII when summarizing fill outcomes. Need to confirm against live state.json.
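The fingerprint-versus-body distinction can be made concrete with a small TypeScript sketch. Two assumptions are not taken from pathway_memory.rs. First, the three key fields are joined with a "|" separator before hashing. Second, the name check is a hypothetical stand-in for the §8.1 sampling pass, with the known-name list assumed to come from a workers_500k sample. Changing the names in the body never changes the pathway_id. That is why a clean fingerprint says nothing about whether the body is clean.

```typescript
import { createHash } from "node:crypto";

// Hypothetical TypeScript mirror of the PathwayTrace fields named in §1F.
interface PathwayTrace {
  taskClass: string;
  filePrefix: string;
  signalClass: string;
  reducerSummary: string;
  finalVerdict: string;
}

// pathway_id = SHA256(task_class + file_prefix + signal_class).
// The "|" separator is an assumption; the Rust code may concatenate differently.
function pathwayId(t: PathwayTrace): string {
  return createHash("sha256")
    .update([t.taskClass, t.filePrefix, t.signalClass].join("|"))
    .digest("hex");
}

// Returns the known names that appear verbatim (case-insensitive) in the trace body.
function bodyLeaks(t: PathwayTrace, knownNames: string[]): string[] {
  const body = `${t.reducerSummary}\n${t.finalVerdict}`.toLowerCase();
  return knownNames.filter((n) => body.includes(n.toLowerCase()));
}

const trace: PathwayTrace = {
  taskClass: "fill",
  filePrefix: "staffing/",
  signalClass: "capacity",
  reducerSummary: "executor picked Emily Garcia, who is at fill capacity",
  finalVerdict: "retry with a different candidate",
};

console.log(pathwayId(trace).length); // 64: no subject identifier went into the hash
console.log(bodyLeaks(trace, ["Emily Garcia", "Maria Rodriguez"])); // [ 'Emily Garcia' ]
```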
§2 — Defense layer that exists but is bypassed
data/_catalog/views/candidates_safe.json:
- Drops: last_name, email, phone, hourly_rate_usd
- Masks: candidate_id (keeps the first 3 and last 2 chars: CAN...01 shape)
- Row filter: status != 'blocked'
- Visibility intent (per description): "Visible to recruiter / mode-runner agents."
data/_catalog/views/workers_safe.json:
- Drops: name, email, phone, zip, communications, resume_text
- Reason given: "resume_text + communications carry verbatim PII (full names) and there's no in-view text scrubber, so they're dropped wholesale"
- Source for the rebuilt workers_500k_v9 vector corpus
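The candidates_safe masking rule (keep the first 3 and last 2 characters) can be sketched as below. The "..." filler is an assumption about the view's exact output. The sketch also exposes a consequence worth planning for. With the CAND-NNNNNN format, the first 3 characters are always "CAN", so the masked form carries only the last 2 digits, and at most 100 distinct masked values exist. Masked IDs therefore cannot serve as join keys, which is why a validator existence check needs a separate resolve path.

```typescript
// Sketch of the candidates_safe candidate_id mask: keep head and tail, elide the rest.
// Inputs too short to mask are fully starred rather than returned in the clear.
function maskCandidateId(id: string, keepHead = 3, keepTail = 2): string {
  if (id.length <= keepHead + keepTail) return "*".repeat(id.length);
  return `${id.slice(0, keepHead)}...${id.slice(-keepTail)}`;
}

console.log(maskCandidateId("CAND-042195")); // CAN...95
console.log(maskCandidateId("CAND-000001")); // CAN...01

// Different candidates collapse to the same masked value.
console.log(maskCandidateId("CAND-000195") === maskCandidateId("CAND-042195")); // true
```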
The _safe views are the right policy artifact. The ARCHITECTURAL gap is at the SQL-template layer in crates/gateway/src/tools/registry.rs — the LLM-facing tools query the RAW tables, not the SAFE views. Three out of three candidate-touching tool templates (search_candidates, get_candidate, engaged_unplaced_candidates) bypass the safe view.
There is no enforcement preventing this. There is no test that asserts "tool SQL must reference only _safe tables." There is no warning logged when raw tables are queried via tool surface.
Quick fix shape (NOT a phase 1 deliverable, just noting the change shape): rewrite the tool sql_templates to FROM candidates_safe (or FROM workers_safe); add a build-time check that crates/gateway/src/tools/registry.rs only references *_safe tables; add a runtime gate in queryd that refuses LLM-attributed queries on raw tables. These would land in Phase 4 (subject tagging across substrates) or possibly Phase 1.5 (defense-layer enforcement) per the AUDIT_TRAIL_PRD.
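The build-time check could be as small as the sketch below. It is written in TypeScript for illustration only; the real test would sit next to registry.rs in Rust. The template strings are simplified stand-ins, not the actual registry SQL, and "placements" is a made-up table. Table detection is a FROM/JOIN regex heuristic, not a SQL parser, so subqueries, quoted identifiers and schema-qualified names would need more work.

```typescript
// Raw tables the LLM-facing tool surface must not read (per §2).
const RAW_PII_TABLES = new Set(["candidates", "workers_500k"]);

// Identifiers that directly follow FROM or JOIN (heuristic, not a SQL parser).
function referencedTables(sql: string): string[] {
  const re = /\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)/gi;
  return [...sql.matchAll(re)].map((m) => (m[1] ?? "").toLowerCase());
}

// One message per tool template that reads a raw PII table; empty means pass.
function rawTableViolations(templates: Record<string, string>): string[] {
  const out: string[] = [];
  for (const [tool, sql] of Object.entries(templates)) {
    for (const table of referencedTables(sql)) {
      if (RAW_PII_TABLES.has(table)) out.push(`${tool}: reads raw table ${table}`);
    }
  }
  return out;
}

// Simplified stand-ins for the registry templates (not the real SQL).
const templates: Record<string, string> = {
  search_candidates: "SELECT first_name, last_name, phone FROM candidates WHERE city = $1",
  get_candidate: "SELECT * FROM candidates_safe WHERE candidate_id = $1",
  engaged_unplaced_candidates:
    "SELECT c.candidate_id FROM workers_500k c JOIN placements p ON p.candidate_id = c.candidate_id",
};

console.log(rawTableViolations(templates));
```

Because the capture takes the whole identifier, candidates_safe is never confused with candidates.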
§3 — Identity / candidate_id provenance
| Question | Finding |
|---|---|
| Is candidate_id a stable token? | Yes. Observed format CAND-NNNNNN (e.g. CAND-000001), stable across the table. Used as a join key in the schema (per crates/ui/src/main.rs:160 cross-table join hints). |
| Is the candidate_id ↔ PII mapping in a separate service? | No. Both live in data/datasets/workers_500k.parquet. A single SQL query (SELECT first_name, last_name, email FROM candidates WHERE candidate_id = '...') resolves the mapping. |
| Is the mapping itself audited? | No. Nothing records "who looked up the PII for which candidate when." This is the identity-service gap from PRD §5. |
| Does anything ever generate a different token (UUID, opaque hash) instead of using candidate_id directly? | No. Every tool, every validator, and every persistence sink uses candidate_id as-is. |
Production-ready implication: the staffing client's lawyer asks "show me the access log for candidate X's PII" → we cannot produce one. Every access happens via SQL against workers_500k; no row-level access log exists.
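For contrast with today's single-table lookup, here is a minimal TypeScript sketch of the PRD §5 shape. Callers hold an opaque token, and every resolution to PII writes an access event first. Everything here is hypothetical: the class name, the token prefix, and the in-memory maps standing in for a real store. Random UUID tokens also avoid the enumeration risk that sequential CAND-NNNNNN IDs carry.

```typescript
import { randomUUID } from "node:crypto";

interface PiiRecord { name: string; email: string; phone: string; }
interface AccessEvent { token: string; actor: string; purpose: string; at: string; }

// Hypothetical identity service: PII lives only here, keyed by an opaque token.
class IdentityVault {
  private readonly byToken = new Map<string, PiiRecord>();
  private readonly log: AccessEvent[] = [];

  register(pii: PiiRecord): string {
    const token = `subj_${randomUUID()}`; // random, so not enumerable
    this.byToken.set(token, pii);
    return token;
  }

  // Every resolution attempt is logged before the lookup, so misses are audited too.
  resolve(token: string, actor: string, purpose: string): PiiRecord | undefined {
    this.log.push({ token, actor, purpose, at: new Date().toISOString() });
    return this.byToken.get(token);
  }

  // The "who looked up candidate X's PII, and when" answer that §3 says is missing.
  accessLogFor(token: string): AccessEvent[] {
    return this.log.filter((e) => e.token === token);
  }
}

const vault = new IdentityVault();
const token = vault.register({ name: "John Martinez", email: "jm@example.com", phone: "555-0100" });
vault.resolve(token, "recruiter:alice", "fill outreach");
console.log(vault.accessLogFor(token).length); // 1
```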
§4 — Go-side parity with the leak surface
| Rust file:line | Equivalent Go file:line | Same shape? |
|---|---|---|
| crates/validator/src/lib.rs (FillProposal carries {candidate_id, name}) | internal/validator/fill.go + internal/validator/lookup_jsonl.go:23-31 (rosterRow carries CandidateID, Name, Status, City, State, Role, BlacklistedClients) | Yes, same PII shape |
| crates/gateway/src/v1/session_log.rs SessionRecord with prompt/artifact | internal/validator/session_log.go SessionRecord with Prompt string, Attempts[].Raw string, Artifact map[string]any | Yes, same shape |
| outcomes.jsonl / overseer_corrections.jsonl writers | Not present on the Go side (Go validatord doesn't currently run the execution_loop with tool dispatch; Phase 4 of the Go rewrite is what wires that) | Asymmetric: the Go side writes less, but only because feature parity isn't done |
| MCP tools registry | Not on the Go side yet (mcp-server is still Bun) | Bun is the surface for tool dispatch in both runtimes today |
Cross-runtime audit implication: even though our 5 parity probes (validator/extract_json/session_log/materializer/embed) all pass 32/32, none of them assert PII handling. A new pii_parity.sh probe would feed identical PII-tagged input through both runtimes and assert identical redaction behavior. As of today, neither runtime does redaction at the substrate level, so the probe would just confirm "both leak identically."
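The proposed pii_parity.sh probe reduces to a simple comparison. It is sketched here in TypeScript with synthetic canary values; all of these names and values are made up, and capturing each runtime's output is left out. "Identical" alone is the wrong pass condition. Today's probe would pass with both runtimes leaking, so a real gate would also require the leak list to be empty.

```typescript
// Synthetic canary PII planted in the probe input. None of these are real.
const CANARIES = ["Zyxwv Canaryperson", "canary.7731@example.invalid", "555-0199"];

interface ParityResult { rust: string[]; go: string[]; identical: boolean; clean: boolean; }

// Which canaries survive into a runtime's output (logs, response body, sinks).
function leakedCanaries(output: string): string[] {
  return CANARIES.filter((c) => output.includes(c));
}

function piiParity(rustOutput: string, goOutput: string): ParityResult {
  const rust = leakedCanaries(rustOutput);
  const go = leakedCanaries(goOutput);
  // filter() preserves CANARIES order, so an index-wise comparison is valid.
  const identical = rust.length === go.length && rust.every((c, i) => c === go[i]);
  return { rust, go, identical, clean: rust.length === 0 && go.length === 0 };
}

const result = piiParity(
  '{"fills":[{"candidate_id":"CAND-999001","name":"Zyxwv Canaryperson"}]}',
  '{"Prompt":"fill for Zyxwv Canaryperson","Artifact":{}}',
);
console.log(result.identical, result.clean); // true false
```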
§5 — Mapping back to AUDIT_TRAIL_PRD §3 surface map
Updated cells with file:line evidence:
| Decision happens at | Currently logged where | Audit-completeness gap (revised) |
|---|---|---|
| Ingestion (candidate added to pool) | data/datasets/workers_*.parquet rows; ingestd writes via crates/ingestd/; no per-subject "added at" event journal found | GAP: no subject-tagged ingest event. When, by whom, and how a candidate entered is not auditable per subject. |
| Embedding creation | crates/aibridge/src/client.rs (post-sidecar-drop direct path); LRU cache crates/aibridge/src/cache.rs (commit 150cc3b). Cached entries are keyed by (model, text); the text is the cached body, which for candidate embeddings IS PII (candidate name + role + skills appear in the source text per the workers_500k_v9 build SQL). The cache itself is in-memory, not persisted, but text-as-key means PII sits in process memory. | MAJOR GAP: no per-embedding audit row. The cache key contains PII. Without subject tagging we cannot answer "what embedding was generated for candidate X." |
| Search inclusion | Tool result entries in the execution log + Langfuse + the outcomes.jsonl fills field | Partial: fills are logged, but you have to grep across outcomes.jsonl + Langfuse for any mention of a candidate_id. No canonical "all search results that included X" view. |
| Search rank | Result set in chat traces (Langfuse), not indexed by candidate | Partial: the Langfuse trace has the result set, but there is no inversion (candidate → ranks received). |
| Fill recommendation | outcomes.jsonl fills array with name + candidate_id | Partial: present, but mixed with PII in the natural-language operation field. |
| Validation outcome | sessions.jsonl per the [gateway].session_log_path config; per-attempt verdict_kind, error | Partial: works per session, not per subject. Finding "all validations that touched candidate X" requires grepping the JSONL for the candidate_id. |
| Iterate retry escalations | Same sessions.jsonl attempts[] array | Same as above |
| Observer signals | data/_observer/ops.jsonl (file not present on this box today; the writer is wired) | UNKNOWN content shape until the writer fires for a fill scenario; needs verification |
| Matrix-indexer compounding | data/_pathway_memory/state.json, fingerprinted by code, NOT by subject | No subject identifier in the fingerprint structure. Trace bodies (reducer_summary, final_verdict) MAY carry PII; treat as a suspected PII sink until sampled (§10/C1). |
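One possible shape for closing the embedding gap, sketched in TypeScript with hypothetical names. The cache key is derived from a SHA-256 digest of (model, text), so the key no longer holds the text, and each embedding call emits a subject-tagged audit row. The only assumption about the cache is that it needs equality on (model, text). A plain hash of low-entropy text such as a name is pseudonymous rather than anonymous, because anyone can confirm a guess by hashing it. The cached vector is also still derived from PII.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-embedding audit row (the §5 "no per-embedding audit row" gap).
interface EmbedAuditRow {
  subject_id: string;
  model: string;
  text_sha256: string; // digest of the source text, never the text itself
  at: string;
}

// Cache key from (model, text). The NUL separator keeps ("ab","c") and ("a","bc") distinct.
function cacheKey(model: string, text: string): string {
  return createHash("sha256").update(model).update("\0").update(text).digest("hex");
}

function embedAuditRow(subjectId: string, model: string, text: string): EmbedAuditRow {
  return {
    subject_id: subjectId,
    model,
    text_sha256: createHash("sha256").update(text).digest("hex"),
    at: new Date().toISOString(),
  };
}

const text = "John Martinez, Welder, Toledo OH"; // synthetic source text
const key = cacheKey("embed-model-x", text); // "embed-model-x" is a made-up model name
console.log(key === cacheKey("embed-model-x", text), key.includes("Martinez")); // true false
console.log(embedAuditRow("CAND-042195", "embed-model-x", text).subject_id); // CAND-042195
```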
§6 — Mapping back to AUDIT_TRAIL_PRD §7 current-state-vs-target gap table
| Capability | Phase 1 finding | Status |
|---|---|---|
| candidate_id as canonical token | Stable CAND-NNNNNN format. The column lives in the same parquet as the PII. | Token exists; isolation does not. |
| Identity service | Doesn't exist. PII + candidate_id are co-located in workers_500k.parquet. | Real gap: needs a new service. |
| /audit/subject/{id} endpoint | Doesn't exist. No audit route in the crates/gateway/src/v1/mod.rs route listing. | Real gap: needs a new endpoint. |
| Subject-tagged embeddings | LRU cache keyed by (model, text); the text contains PII; no audit row per embed. | Real gap. |
| Subject-tagged search results | Langfuse trace contains the result set; not subject-indexed. | Real gap (queryability, not capture). |
| Subject-tagged validation outcomes | Yes: sessions.jsonl carries candidate_id (in the attempt's raw field), but not as a queryable top-level field. | Partial: needs subject_id top-level promotion. |
| Subject-tagged matrix indexer entries | Pathway fingerprints are subject-agnostic by design. Trace bodies may leak. | Decision needed (PRD open question 7): keep code-only OR risk PII surface. |
| Protected-attribute filter at decision time | Not enforced anywhere. SQL templates return whatever columns they SELECT; there is no protected-attribute removal at the gateway boundary. The candidates table schema in the demo SQL includes an age proxy (years_experience), and the call_log/email_log tables likely contain free-text correspondence. | Real gap, major. Requires both a schema audit and boundary enforcement. |
| Retention policy | None enforced. Append-only files grow indefinitely. | Real gap. |
| Right to be forgotten | Not implementable today on append-only logs without cryptographic erasure. | Per the PRD §6 design, real engineering needed. |
| Cross-runtime parity | 5 algorithm probes pass; 0 audit/PII probes exist. | Probe set needs extension. |
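The cryptographic-erasure idea behind the right-to-be-forgotten row can be sketched with Node's built-in AES-256-GCM. This illustrates the technique; it is not the PRD §6 design. The in-memory Map stands in for a real key store, and the subject IDs are examples. Erasure only works for PII that was encrypted before it was written. Plaintext already sitting in outcomes.jsonl or Langfuse is untouched, and key-store backups would need the same deletion.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical per-subject key store; deleting a key "shreds" every ciphertext made with it.
const subjectKeys = new Map<string, Buffer>();

function encryptForSubject(subjectId: string, plaintext: string): string {
  let key = subjectKeys.get(subjectId);
  if (!key) {
    key = randomBytes(32); // AES-256 key
    subjectKeys.set(subjectId, key);
  }
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Stored in the log as "iv.tag.ciphertext", each part base64 (which never contains ".").
  return [iv, tag, ct].map((b) => b.toString("base64")).join(".");
}

function decryptForSubject(subjectId: string, blob: string): string | null {
  const key = subjectKeys.get(subjectId);
  if (!key) return null; // key shredded: the ciphertext is unrecoverable
  const [iv, tag, ct] = blob.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

function forgetSubject(subjectId: string): void {
  subjectKeys.delete(subjectId);
}

const blob = encryptForSubject("CAND-042195", "John Martinez");
console.log(decryptForSubject("CAND-042195", blob)); // John Martinez
forgetSubject("CAND-042195");
console.log(decryptForSubject("CAND-042195", blob)); // null
```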
§7 — Worked example: John Martinez audit trail TODAY (negative result)
If John Martinez (candidate_id CAND-042195) requests an audit:
- Find his candidate_id. Manual SQL: SELECT candidate_id FROM candidates WHERE first_name='John' AND last_name='Martinez' returns 1+ rows. That is already a leak: anyone with SQL access can correlate name → candidate_id ad hoc.
- Find every fill scenario that included him. grep "CAND-042195" data/_kb/outcomes.jsonl returns the rows where he was among the fills. But the row's natural-language operation field may NOT contain his candidate_id (it's the fill request, e.g. "fill: Welder x2 in Toledo, OH"), so we'd miss scenarios where he was a candidate but not a fill. Finding those would require grepping Langfuse traces (off-box) or the per-tool result content in execution logs, which aren't persisted as a separate JSONL; they live in the HTTP response that already left the building.
- Find every validation that touched him. Grep /tmp/lakehouse-validator/sessions.jsonl for CAND-042195. This would catch FillValidator phantom-ID rejections AND successful fills, but the candidate_id would appear in either the prompt or the raw attempt text, not as a top-level queryable field.
- Find every embedding generated for him. The cache is in-memory; nothing is persisted. Cannot answer.
- Find every search result that ranked him. Off-box in Langfuse. Not retrievable from the lakehouse without a separate Langfuse query pipeline (which doesn't exist).
- Find pathway memory traces involving him. Pathway fingerprints don't carry his ID. The reducer_summary strings might mention him (need to grep state.json), but a fingerprint search wouldn't surface them.
- Show which protected attributes were exposed to the model. There is no record of input_features per decision: the LLM saw whatever the SQL returned, and we have no per-decision input_features audit row.
- Format the output for legal. Even if we collected all of the above, there's no signing, no integrity hash, no schema, no template.
Estimated time to produce a complete-and-defensible response today: not possible. Estimated time to produce an INCOMPLETE response by cobbling JSONLs + Langfuse exports: 2-5 hours per request, manual, error-prone, and the response would over-share (other candidates appearing in the same fill scenarios) AND under-share (embedding events, search rankings, pathway traces missed).
This is the Phase 1 result the AUDIT_TRAIL_PRD predicted: today's substrate is not production-ready for a discrimination-defense audit response.
§8 — What this discovery DID NOT cover
Phase 1 was scoped to file:line evidence + sampling of live JSONL state. The following deserve their own subsequent walks before phase 2+ design:
- Live sample of data/_pathway_memory/state.json: confirm whether reducer_summary / final_verdict strings actually leak PII or stay generic. Read 3-5 traces and grep for names from workers_500k.
- Live sample of Langfuse traces: confirm input message PII for a real fill scenario from the past 7 days. Use the Langfuse :3001 query API.
- observerd event content: the writer is wired, but data/_observer/ops.jsonl doesn't exist on this box. Trigger a fill scenario and inspect the resulting event.
- Bun mcp-server tool dispatch: does the Bun server log tool calls anywhere? mcp-server/index.ts is 2900+ lines; only a partial walk here.
- bot/propose.ts: the bot proposal flow likely touches candidates; not walked in this pass.
- crates/journald mutation log: designed for row-level mutations per ADR-012; haven't confirmed whether candidate-table mutations land here with PII.
- Go side observerd + chatd PII surface: the Go cmd/* binaries likely have analogous logging; validator parity is confirmed, but observer/chat logging wasn't walked.
- Process memory + crash dumps: if the gateway dumps core, what PII is in it? Out of scope for a code walk; comes up in the security audit.
- Operator runbook: who has access to logs/, /tmp/lakehouse-validator/, MinIO buckets, Langfuse Postgres? Out of scope for a code walk; comes up in the operational security review.
These are listed so the next phase doesn't accidentally re-walk what's done OR skip what wasn't covered.
§9 — Recommended next moves (not commitments)
Per AUDIT_TRAIL_PRD §8, Phase 2 is the identity service design doc. Before that doc gets written, the cheapest high-value moves discovered here:
- Defense-layer enforcement (1-2 hours of work). Rewrite the 3 tool SQL templates in crates/gateway/src/tools/registry.rs to use _safe views. Add a unit test that asserts no crates/gateway/src/tools/registry.rs template references FROM candidates or FROM workers_500k (only *_safe). This is one commit. It prevents the most-trafficked PII leak path TODAY without waiting on the identity service. Cost: the LLM sees masked candidate_ids (CAN...95 instead of CAND-042195), and some downstream tools (the validator existence check) would need a "resolve to full ID" path that goes through the identity service. That is exactly the architectural shape PRD §5 wants anyway.
- Sample state.json + Langfuse before phase 2 design. §8.1 + §8.2 above. ~30 minutes. Either confirms or refutes the "matrix indexer is subject-clean" finding from §1F.
- Document the Bun mcp-server tool surface. §8.4. The Bun layer is a major PII transit point not fully covered here.
- Identify whether protected attributes (age proxies, photo features, zip-code → race correlations) are currently in any tool-returned column. Schema-level audit of candidates + workers_500k. ~1 hour. Might surface that some "neutral" columns are actually protected-attribute proxies.
These four moves give the phase-2 design doc strong evidence to lean on. None are commitments — J's call on what to do next.
§10 — Cross-lineage scrum review of this discovery doc (2026-05-03)
After the discovery walk, this document was reviewed by three independent model lineages via /v1/chat against the Go gateway (post-PR-#13 model fleet): opus (opencode/claude-opus-4-7), kimi (ollama_cloud/kimi-k2.6), gemini (ollama_cloud/gemini-3-flash-preview). Custom prompt: senior security architect reviewing a discovery report. (DeepSeek-v3.2 timed out; not included.)
Verbatim reviews saved at /tmp/audit_scrum/{opus,kimi,gemini}_review.md. Convergent findings (≥2 reviewers) are treated as high signal per feedback_cross_lineage_review.md.
Convergent findings — must address before phase 2 design
C1. §1F matrix-indexer claim is OVER-CLAIMED (3/3 reviewers). The TL;DR #7 line "good for audit defensibility (matrix index can't drive discrimination)" overstates structural subject-agnosticism as behavioral fairness. Per opus: "until §8.1 is executed, the correct framing is 'fingerprint structurally subject-agnostic; body content unverified — treat as suspected PII sink.'" Per kimi: dangerous reasoning — if reducer_summary says "candidates named Emily Garcia were rejected for fill capacity," the matrix learns proxy variables and future similar names get downranked = temporal discrimination risk. Per gemini: "you cannot claim audit defensibility until you prove the content of the matrix indexer doesn't contain PII." Action: walk back the §1F TL;DR claim; reframe as "fingerprint structure is defensive; body content unverified — treat as suspected PII sink until §8.1 confirms."
C2. §1E (Langfuse) is the MOST DANGEROUS leak — fix FIRST (3/3 reviewers). Opus: "boundary-crossing leak that makes a regulator's eyes light up." Kimi: "Article 44 GDPR transfer if SaaS-hosted; CPRA 'sale/sharing' question; subprocessor notification failure." Gemini: "un-certifiable for SOC2 Type II" + "unauthorized data transfer to third-party storage tier." All three would do Langfuse redaction/sampling BEFORE the §9.1 view-routing fix. Action: revise §9 priority order — Langfuse boundary first, view-routing second.
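A redaction pass in front of the Langfuse POST might look like the TypeScript sketch below. It writes REDACTED-{hash} tokens, the de-identify-in-place form kimi raises, so trace structure survives and equal values stay correlatable. It is hypothetical throughout. The ChatMessage shape simplifies what langfuse_trace.rs sends, the phone pattern is US-only, and names are matched only from a supplied list. Two caveats apply. tool_result JSON that splits first_name and last_name into separate fields needs field-level redaction, not free-text matching. An unsalted hash of a name can also be confirmed by guessing, so a keyed HMAC would be the stronger choice.

```typescript
import { createHash } from "node:crypto";

interface ChatMessage { role: string; content: string; }

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g; // US-style numbers only (assumption)

// Stable token: equal inputs map to equal tokens.
function redactionToken(value: string): string {
  return `REDACTED-${createHash("sha256").update(value).digest("hex").slice(0, 12)}`;
}

function redactText(text: string, knownNames: string[]): string {
  let out = text.replace(EMAIL, redactionToken).replace(PHONE, redactionToken);
  for (const name of knownNames) out = out.split(name).join(redactionToken(name));
  return out;
}

// Applied to every message (system, user, assistant, tool) before the trace is sent.
function redactForLangfuse(messages: ChatMessage[], knownNames: string[]): ChatMessage[] {
  return messages.map((m) => ({ ...m, content: redactText(m.content, knownNames) }));
}

const redacted = redactForLangfuse(
  [{ role: "assistant", content: "Picked Emily Garcia (emily.g@example.com, 419-555-0142)" }],
  ["Emily Garcia"],
);
console.log(/Emily|@|419-555/.test(redacted[0]?.content ?? "")); // false
```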
C3. Discrimination defense requires the FULL CANDIDATE POOL, not just fills (3/3 reviewers). Opus: "you need not just 'what did we do for candidate X' but 'what was the selection rate for protected class Y vs comparator' — that requires capturing protected attributes WITH outcomes." Kimi: "EEOC Uniform Guidelines on Employee Selection Procedures (1978) — matrix index that learns from historical fill outcomes IS a selection procedure under the Guidelines." Gemini: "The system doesn't log the entire candidate pool for a specific search — only the fills (§5). To defend a lawsuit, you must show the stats for everyone who could have been picked, not just the person who was." Action: Phase 1 didn't capture this load-bearing requirement. PRD §1 worked example needs expanding: the audit response must include the comparator pool + adverse-impact statistics, not just the subject's decision row.
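The comparator pool is load-bearing for a concrete reason. The UGESP four-fifths rule compares each group's selection rate to the highest group's rate and treats ratios below 0.8 as evidence of adverse impact. That computation needs everyone who could have been picked, plus their protected-group labels, and the current sinks record neither. Below is a minimal TypeScript sketch with synthetic counts. The guidelines treat four-fifths as a rule of thumb, and small samples also call for a significance test.

```typescript
// One row per candidate in the comparator pool for a decision, selected or not.
interface PoolEntry { group: string; selected: boolean; }
interface ImpactRow { group: string; rate: number; ratio: number; flagged: boolean; }

// Four-fifths rule: a selection rate below 80% of the highest group's rate
// is generally regarded as evidence of adverse impact.
function adverseImpact(pool: PoolEntry[]): ImpactRow[] {
  const counts = new Map<string, { n: number; selected: number }>();
  for (const e of pool) {
    const c = counts.get(e.group) ?? { n: 0, selected: 0 };
    c.n += 1;
    if (e.selected) c.selected += 1;
    counts.set(e.group, c);
  }
  const rates = [...counts].map(([group, c]) => ({ group, rate: c.selected / c.n }));
  const best = Math.max(...rates.map((r) => r.rate));
  return rates.map((r) => {
    const ratio = best > 0 ? r.rate / best : 1;
    return { ...r, ratio, flagged: ratio < 0.8 };
  });
}

// Synthetic pool: group A has 20 of 50 selected, group B has 10 of 40 selected.
const pool: PoolEntry[] = [
  ...Array.from({ length: 50 }, (_, i) => ({ group: "A", selected: i < 20 })),
  ...Array.from({ length: 40 }, (_, i) => ({ group: "B", selected: i < 10 })),
];
for (const r of adverseImpact(pool)) console.log(r.group, r.rate, r.ratio.toFixed(3), r.flagged);
// A 0.4 1.000 false
// B 0.25 0.625 true
```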
C4. BIPA + biometric exposure UNDERSTATED (3/3 reviewers). Already in AUDIT_TRAIL_PRD.md §10.5 jurisdictional checklist but NOT translated into Phase 1 findings actionables. If workers_500k columns include photo paths, video interview references, or anything that could yield biometric inference (per gemini: "even descriptors that could be reconstructed into biometric templates"), BIPA's $1k-$5k per-violation regime applies BEFORE the GDPR analysis matters. Action: add to §8 (what discovery did NOT cover): explicit photo/video/biometric column audit of workers_500k schema.
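The explicit column audit C4 calls for could start from a name-based inventory like the TypeScript sketch below. The patterns and category labels are illustrative assumptions, not a legal classification. Name matching is only a first pass: a generic column can still hold a photo URL, so flagged and unflagged columns both need a content sample.

```typescript
type Risk = "biometric" | "protected-proxy" | "special-category-text";

// Illustrative name patterns; the proxy rule is anchored on "_" segments so short
// words like "age" don't match inside longer column names.
const RULES: Array<[RegExp, Risk]> = [
  [/photo|image|avatar|selfie|video|voice|fingerprint|biometric/i, "biometric"],
  [/(^|_)(age|dob|gender|sex|race|zip|years_experience)(_|$)/i, "protected-proxy"],
  [/resume_text|communications|call_log|email_log/i, "special-category-text"],
];

function classifyColumns(columns: string[]): Map<string, Risk[]> {
  const flagged = new Map<string, Risk[]>();
  for (const col of columns) {
    const risks = RULES.filter(([re]) => re.test(col)).map(([, risk]) => risk);
    if (risks.length > 0) flagged.set(col, risks);
  }
  return flagged;
}

// Hypothetical column list; the real input is the workers_500k / candidates schema.
const columns = [
  "candidate_id", "first_name", "profile_photo_url", "zip",
  "years_experience", "resume_text", "hourly_rate_usd", "video_interview_ref",
];
for (const [col, risks] of classifyColumns(columns)) console.log(col, risks.join(","));
// profile_photo_url biometric
// zip protected-proxy
// years_experience protected-proxy
// resume_text special-category-text
// video_interview_ref biometric
```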
C5. candidate_id must be PROMOTED to top-level field in all JSONL sinks (3/3 reviewers). Opus + kimi + gemini converge: grepping natural-language strings (operation, raw, prompt, reducer_summary) for candidate_id is not a defensible audit strategy. Even if subject_id appears in those strings TODAY, it appears co-mingled with other candidate names, model reasoning, etc. — making subject filtering unsafe. Action: add to PRD §7 target column "subject_id top-level promotion" — change session_log writer + outcomes.jsonl writer + observer event writer to ALL include subject_id (or subject_ids[]) as a first-class top-level field.
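C5's promotion might look like this for outcomes.jsonl, in a TypeScript sketch with a hypothetical record shape. subject_ids lists every candidate the decision touched, which also gives C3's comparator pool a home. The subject query filters on that field instead of grepping operation or raw text. Field names beyond the §1D shape (subject_ids, at) are assumptions.

```typescript
// Hypothetical outcomes.jsonl record with subject_ids promoted to the top level.
interface OutcomeRecord {
  subject_ids: string[]; // every candidate considered, not only the fills
  operation: string;     // natural-language request; no longer a query surface
  fills: { candidate_id: string; name: string; reason?: string }[];
  at: string;
}

function toJsonl(records: OutcomeRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Subject query over the structured field; no free-text matching involved.
function recordsForSubject(jsonl: string, subjectId: string): OutcomeRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as OutcomeRecord)
    .filter((r) => r.subject_ids.includes(subjectId));
}

// Synthetic rows: CAND-042195 was considered once but never filled.
const log = toJsonl([
  {
    subject_ids: ["CAND-042195", "CAND-000001"],
    operation: "fill: Welder x1 in Toledo, OH",
    fills: [{ candidate_id: "CAND-000001", name: "Emily Garcia" }],
    at: "2026-05-03T10:00:00Z",
  },
  {
    subject_ids: ["CAND-000001"],
    operation: "fill: Welder x1 in Toledo, OH",
    fills: [{ candidate_id: "CAND-000001", name: "Emily Garcia" }],
    at: "2026-05-03T11:00:00Z",
  },
]);
console.log(recordsForSubject(log, "CAND-042195").length); // 1
console.log(recordsForSubject(log, "CAND-000001").length); // 2
```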
Single-reviewer findings — verified, worth incorporating
OPUS unique:
- 8th PII path missing: LLM provider egress. PR #13 routes models through opencode + ollama_cloud + openrouter — opencode and openrouter are external services. Cross-border data transfer under GDPR Ch. V; third-party processor relationship requiring DPA under Art. 28. Phase 1 did not enumerate this path. Action: add §1G "LLM provider egress" to §1.
- GDPR Art. 22 / EU AI Act right to explanation. Audit must capture the model's REASONING CHAIN, not just decision output. Phase 1 §7 worked example doesn't include this. Action: add to subject-audit response shape (PRD §2).
- Special-category data under GDPR Art. 9. resume_text + call_log + email_log routinely contain health (accommodation requests), union, religion. Higher legal bar — Art. 9(2) explicit consent required. Phase 1 §6 mentions these tables exist but doesn't flag the special-category exposure. Action: call out in §6.
- DPIA / ROPA / DPA inventory. None of these documents referenced. Some may exist outside code; Phase 1 should at minimum note their absence as Phase 1.5 input.
KIMI unique:
- Sequential CAND-NNNNNN IDs enable enumeration attacks. Predictable IDs let an attacker scan the candidate space. A security finding Phase 1 didn't flag.
- Langfuse retention config unaudited. Default trace retention is 30 days in some versions, indefinite in others. Directly impacts RTBF analysis. Action: check live Langfuse config.
- CCPA "de-identified in place" may be faster than crypto-shred. Replacing PII with REDACTED-{hash} while preserving log structure may satisfy CPRA's de-identified exception. Worth considering vs. crypto-shred.
- Bun MCP server is the cross-runtime bridge, so this is likely a COMMON-MODE failure. Phase 1 framed the Go side as "mirrors Rust pattern," implying independent failure; it's likely a shared-infrastructure failure. Action: add to §4 (common-mode reframe).
- Cryptographic integrity for the audit trail itself. Merkle trees / signed logs / chain-of-custody under FRE 901. An opposing expert can challenge admissibility without this. Action: add to the PRD §2 audit response shape (integrity-signed).
- HIPAA Business Associate Agreement scoping not done. If any candidate is healthcare-vertical, BAA analysis is required.
- get_candidate SELECT * has 40+ load-bearing columns. The §9.1 "1-2 hours" estimate for the view rewrite is irresponsible without scoping downstream consumers. Action: revise the §9.1 estimate; flag dependency analysis as a prerequisite.
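Kimi's integrity point can be illustrated with a hash chain, the simplest tamper-evident structure. It is a simpler relative of the Merkle trees kimi names, chosen here for brevity. Each entry's hash covers the previous hash plus its payload, so editing or deleting an entry breaks verification from that point on. A chain alone does not stop someone from rewriting the whole log, so the head hash would still need to be signed or anchored somewhere independent. This is a hypothetical TypeScript sketch, not a proposed format.

```typescript
import { createHash } from "node:crypto";

interface ChainedEntry { payload: string; prev: string; hash: string; }

const GENESIS = "0".repeat(64);

function sha256Hex(s: string): string {
  return createHash("sha256").update(s).digest("hex");
}

// prev is always 64 hex chars, so prev + payload is an unambiguous encoding.
function appendEntry(chain: ChainedEntry[], payload: string): ChainedEntry[] {
  const prev = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prev, hash: sha256Hex(prev + payload) }];
}

// Index of the first entry that fails verification, or -1 if the chain is intact.
function firstBroken(chain: ChainedEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i];
    if (e.prev !== prev || e.hash !== sha256Hex(prev + e.payload)) return i;
    prev = e.hash;
  }
  return -1;
}

let chain: ChainedEntry[] = [];
for (const p of [
  '{"subject_id":"CAND-042195","event":"search"}',
  '{"subject_id":"CAND-042195","event":"fill"}',
  '{"subject_id":"CAND-042195","event":"validate"}',
]) {
  chain = appendEntry(chain, p);
}
const tampered = chain.map((e, i) => (i === 1 ? { ...e, payload: '{"event":"edited"}' } : e));
console.log(firstBroken(chain), firstBroken(tampered)); // -1 1
```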
GEMINI unique:
- Data residency: JSONL on a US box. If any candidate is an EU resident, GDPR violation without SCC/DPF. Phase 1 didn't ask whether the IL+IN target market includes EU residents (probably not, but staffing-co clients sometimes have international placements).
- "Culture fit" reasoning strings as discrimination proxies. Common LLM-generated phrases ("not a culture fit," "communication concerns," "team chemistry") often correlate with protected-attribute discrimination. Phase 1 didn't audit the actual reasoning text in outcomes.jsonl for these phrases.
- Comparator-pool snapshot for every fill. Need to capture WHO COULD HAVE BEEN PICKED, not just who was. Action: the PRD §2 audit response shape needs a comparator_pool field per decision.
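A first pass at gemini's phrase check over outcomes.jsonl could be as simple as the TypeScript sketch below. The phrase list is just the three examples above. Matching is case-insensitive substring search, and only fills[].reason is scanned (reason? is in the §1D shape); overseer_corrections.jsonl correction text would need the same pass. Hits are prompts for human review, not findings. Rejected candidates' reasoning is not in fills at all, which is the comparator-pool gap again. The sample rows are synthetic.

```typescript
// Phrases from gemini's review; a real list would come from counsel.
const PROXY_PHRASES = ["culture fit", "communication concerns", "team chemistry"];

interface PhraseHit { line: number; phrase: string; }

// Scans fills[].reason in each JSONL line; line numbers are 1-based.
function scanReasons(jsonl: string): PhraseHit[] {
  const hits: PhraseHit[] = [];
  jsonl.split("\n").forEach((raw, idx) => {
    if (raw.trim() === "") return;
    const rec = JSON.parse(raw) as { fills?: { reason?: string }[] };
    for (const fill of rec.fills ?? []) {
      const reason = (fill.reason ?? "").toLowerCase();
      for (const phrase of PROXY_PHRASES) {
        if (reason.includes(phrase)) hits.push({ line: idx + 1, phrase });
      }
    }
  });
  return hits;
}

const sample = [
  '{"operation":"fill: Welder x2 in Toledo, OH","fills":[{"candidate_id":"CAND-000001","name":"Emily Garcia","reason":"certified, available"}]}',
  '{"operation":"fill: Welder x1 in Toledo, OH","fills":[{"candidate_id":"CAND-000002","name":"Maria Rodriguez","reason":"Strong team chemistry with crew"}]}',
].join("\n");
console.log(scanReasons(sample)); // [ { line: 2, phrase: 'team chemistry' } ]
```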
Revised §9 — recommended next moves (reordered by scrum convergence)
- (NEW PRIORITY 1, was P2) Sample state.json + Langfuse content: confirm or refute the matrix-indexer subject-clean claim and quantify Langfuse PII exposure. Cheapest move that resolves the over-claim AND informs the boundary-leak fix.
- (NEW PRIORITY 2) Langfuse boundary audit + redaction: sample the retention config, check DPA status, and scope a redaction/sampling layer that strips PII from message arrays before the Langfuse POST. This is the boundary-crossing leak; fix it BEFORE chasing internal sinks.
- (NEW PRIORITY 3, was P1) Defense-layer enforcement at the SQL template level: rewrite the tool registry to use _safe views. Estimate revised UPWARD per kimi: scope the get_candidate SELECT * downstream consumers first; estimate 4-8 hours including the existence-check resolution path through the (not-yet-built) identity service. Stop-gap until then: add an LLM-attribution flag to queryd and refuse FROM candidates / FROM workers_500k queries that originate from tool dispatch.
- (NEW) subject_id top-level promotion to all JSONL writers: a single architectural change spanning Rust + Go session_log + the observerd event writer + the outcomes/corrections appenders. Makes subject-correlation queries defensible (no more grepping natural-language strings).
- (was P4) Schema audit for protected-attribute proxies: extend it to include "culture fit"-shaped reasoning text in outcomes.jsonl + the comparator-pool retention requirement.
- (NEW) BIPA-specific audit of the workers_500k schema: explicit photo/video/biometric column inventory before any production deployment in IL.
- (NEW) Operational discovery: DPIA, ROPA, DPA inventory, SCC for cross-border, Langfuse retention config. Outside the code walk; needs J + counsel input.
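The queryd stop-gap in the defense-layer item above could be sketched as follows in TypeScript. The origin tag is a hypothetical stand-in for the LLM-attribution flag that item proposes. The regex match is deliberately narrow, so schema-qualified or quoted names, or a view defined over a raw table, would slip past it. That is acceptable only because this is a stop-gap until the identity service exists.

```typescript
// Hypothetical origin tag: the "LLM-attribution flag" proposed for queryd.
type Origin = "tool_dispatch" | "operator" | "pipeline";

interface QueryRequest { sql: string; origin: Origin; }
interface GateDecision { allowed: boolean; reason?: string; }

const RAW_TABLES = ["candidates", "workers_500k"];

// Whole-identifier match after FROM/JOIN, so candidates_safe does not trip it.
function readsRawTable(sql: string): boolean {
  return RAW_TABLES.some((t) => new RegExp(`\\b(?:FROM|JOIN)\\s+${t}\\b`, "i").test(sql));
}

// Only tool-dispatched (LLM-attributed) queries are refused; other origins pass.
function gate(req: QueryRequest): GateDecision {
  if (req.origin === "tool_dispatch" && readsRawTable(req.sql)) {
    return { allowed: false, reason: "tool-dispatched query on a raw PII table; use the _safe view" };
  }
  return { allowed: true };
}

console.log(gate({ sql: "SELECT * FROM candidates WHERE city = 'Toledo'", origin: "tool_dispatch" }).allowed); // false
console.log(gate({ sql: "SELECT * FROM candidates_safe WHERE city = 'Toledo'", origin: "tool_dispatch" }).allowed); // true
console.log(gate({ sql: "SELECT * FROM candidates WHERE city = 'Toledo'", origin: "operator" }).allowed); // true
```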
What I'm walking back
§1F TL;DR claim "matrix indexer is good for audit defensibility" — per all 3 reviewers, this is over-claimed without §8.1 verification. The correct frame is "fingerprint structure is subject-agnostic by design; trace body content unverified — treat as suspected PII sink until sampled."
§9 ordering: view-routing was P1; per all 3 reviewers, the Langfuse boundary fix belongs in front of it. View-routing is the source-side fix and Langfuse is the boundary-crossing fix. Both matter, so do them in BOUNDARY-FIRST order.
§9.1 effort estimate — kimi's "irresponsible without dependency scoping" critique is right. Revised UP to 4-8 hours.
Change log
- 2026-05-03: Phase 1 discovery walk complete. Findings cited above with file:line references. No code changes. Companion to AUDIT_TRAIL_PRD.md.
- 2026-05-03: §10 added, a cross-lineage scrum review of the discovery doc (opus + kimi + gemini). 5 convergent findings (matrix-indexer over-claim, Langfuse first-priority, comparator pool gap, BIPA understated, subject_id top-level promotion), plus single-reviewer high-value catches. §9 reordered.