Adds kimi_architect as a fifth check kind in the auditor. Runs
sequentially after static/dynamic/inference/kb_query, consumes their
findings as context, and asks Kimi For Coding "what did everyone
miss?" — targeting load-bearing issues that deepseek N=3 voting can't
see (compile errors, false telemetry, schema bypasses, determinism
leaks). 7/7 grounded on the distillation v1.0.0 audit experiment
2026-04-27.
Off by default. Enable on the lakehouse-auditor service:
systemctl edit lakehouse-auditor.service
Environment=LH_AUDITOR_KIMI=1
Tunable env (all optional):
LH_AUDITOR_KIMI_MODEL default kimi-for-coding
LH_AUDITOR_KIMI_MAX_TOKENS default 12000
LH_GATEWAY_URL default http://localhost:3100
Guardrails:
- Failure-isolated. Any Kimi error / 429 / TOS revocation returns a
single info-level skip-finding so the existing pipeline never blocks
on a Kimi outage.
- Cost-bounded. Cached verdicts at data/_auditor/kimi_verdicts/<pr>-
<sha>.json with 24h TTL — re-audits within the window return cached
findings instead of re-calling upstream. New commits produce new
SHAs so caching is per-head, not per-day.
- 6min upstream timeout (vs 2min for openrouter inference) — Kimi is
a reasoning model and the audit prompt is large.
- Grounding verification baked in. Every finding's cited file:line is
greppped against the actual file before the verdict is persisted.
Per-finding evidence carries [grounding: verified at FILE:LINE] or
[grounding: line N > EOF] / [grounding: file not found]. Confab-
ulation rate goes into data/_kb/kimi_audits.jsonl as grounding_rate
for "is this still valuable" tracking.
Persisted artifacts:
data/_auditor/kimi_verdicts/<pr>-<sha>.json full verdict + raw
Kimi response + grounding
data/_kb/kimi_audits.jsonl one row per call:
latency, tokens, findings,
grounding rate
Verdict-rendering: kimi_architect now appears in the per-check
sections of the human-readable comment posted to PRs (auditor/audit.ts
checkOrder), after kb_query.
Verification:
bun build auditor/checks/kimi_architect.ts compiles
bun build auditor/audit.ts compiles
parser sanity (3-finding fixture) 3/3 lifted correctly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lakehouse/auditor 13 blocking issues: cloud: claim not backed — "Invariants enforced (proven by tests + real run):"
Architectural simplification leveraging Phase 5 distillation work:
the auditor no longer pre-extracts facts via per-shard summaries
because lakehouse_answers_v1 (gold-standard prior PR audits + observer
escalations corpus) supplies cross-PR context through the mode runner's
matrix retrieval. Same signal, ~50× fewer cloud calls per audit.
Per-audit cost:
Before: 168 gpt-oss:120b shard summaries + 3 final inference calls
After: 3 deepseek-v3.1:671b mode-runner calls (full retrieval included)
Wall-clock on PR #11 (1.36MB diff):
Before: ~25 minutes
After: 88 seconds (3/3 consensus succeeded)
Files:
auditor/checks/inference.ts
- Default MODEL kimi-k2:1t → deepseek-v3.1:671b. kimi-k2 is hitting
sustained Ollama Cloud 500 ISE (verified via repeated trivial
probes; multi-hour outage). deepseek is the proven drop-in from
Phase 5 distillation acceptance testing.
- Dropped treeSplitDiff invocation. Diff truncates to MAX_DIFF_CHARS
and goes straight to /v1/mode/execute task_class=pr_audit; mode
runner pulls cross-PR context from lakehouse_answers_v1 via
matrix retrieval. SHARD_MODEL retained for legacy callCloud
compatibility (default qwen3-coder:480b if it ever runs).
- extractAndPersistFacts now reads from truncated diff (no
scratchpad post-tree-split-removal).
auditor/checks/static.ts
- serde-derived struct exemption (commit 107a682 shipped this; this
commit is the rest of the auditor rebuild it landed alongside)
- multi-line template literal awareness in isInsideQuotedString —
tracks backtick state across lines so todo!() inside docstrings
doesn't trip BLOCK_PATTERNS.
crates/gateway/src/v1/mode.rs
- pr_audit native runner mode added to VALID_MODES + is_native_mode
+ flags_for_mode + framing_text. PrAudit framing produces strict
JSON {claim_verdicts, unflagged_gaps} for the auditor to parse.
config/modes.toml
- pr_audit task class with default_model=deepseek-v3.1:671b and
matrix_corpus=lakehouse_answers_v1. Documents kimi-k2 outage
with link to the swap rationale.
Real-data audit on PR #11 head 1b433a9 (which is the PR with all the
distillation work + auditor rebuild itself):
- Pipeline ran to completion (88s for inference; full audit ~3 min)
- 3/3 consensus runs succeeded on deepseek-v3.1:671b
- 156 findings: 12 block, 23 warn, 121 info
- Block findings are legitimate signal: 12 reviewer claims like
"Invariants enforced (proven by tests + real run):" that the
truncated diff can't directly verify. The auditor is correctly
flagging claim-vs-diff divergence — exactly its job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fields on structs that derive Serialize or Deserialize ARE read — by
the macro, on every JSON round-trip — but the static check only
looked for explicit `.field` references in the diff. Result: every
new response/request struct shipped through `/v1/*` was flagged as
"placeholder state without a consumer."
PR #11 head 0844206 surfaced 8 such false positives across mode.rs,
respond.rs, truth.rs, and profiles/memory.rs — same shape as the
existing string-literal exemption for BLOCK_PATTERNS, just at a
different syntactic layer.
Two helpers added:
- extractNewFieldsWithLine: keeps each field's diff-line index so the
caller can locate the parent struct.
- parentStructHasSerdeDerive: walks back ≤80 lines for a `pub struct`
boundary, then ≤8 lines above it for `#[derive(...)]` lines
containing Serialize or Deserialize. Stops on closing-brace-at-col-0
to avoid escaping the enclosing scope.
Verified on PR #11's actual diff: unread-field warnings dropped from
8 → 0. Synthetic cases confirm the check still fires on plain
(non-serde) structs with no in-diff reader, so the
genuine-placeholder catch is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auditor/checks/static.ts — grep-style scan of PR diffs, no AST,
no LLM. High-signal patterns only.
Severity grading:
- BLOCK — unimplemented!(), todo!(), panic!("not implemented"),
throw new Error("not implemented")
- WARN — TODO/FIXME/XXX/HACK in added lines;
new pub struct fields with <2 mentions in the diff
(added but nobody reads it — placeholder state)
- INFO — hardcoded "placeholder"/"dummy"/"foobar"/"changeme"/"xxx"
strings in added lines
Live-proven — the existential test J asked for:
vs PR #1 (scaffold): 0 findings (all scaffold fields cross-
reference within the diff)
vs commit 2a4b81b (Phase 5 WARN: every DocRef field (tool,
45 first slice — I version_seen, snippet_hash, source_url,
half-admitted placeholder): seen_at) added with 0 read-sites in
the diff
That's the auditor flagging my own "Phase 45 first slice" commit as
state-without-consumer, which is exactly what I half-admitted it
was. If PR #1 had required auditor-pass (branch protection), the
DocRef commit would have been blocked pre-merge. The auditor works
because it agreed with the honest read.
Next: dynamic hybrid test fixture (task #4) — the never-run multi-
layer pipeline test.