# Cross-Lineage Auditor Bake-Off — 2026-04-27 Same diff (`HEAD~5..HEAD~2`, 32KB, 3 commits = the kimi-integration work) audited by three models from three vendor lineages. All three through the lakehouse gateway, all three with the same `kimi_architect` prompt template + grounding verification. ## Results | | Kimi K2.6 (Moonshot) | Haiku 4.5 (Anthropic) | Opus 4.7 (Anthropic) | |---|---|---|---| | Provider | ollama_cloud | opencode/Zen | opencode/Zen | | Latency | 53.7s | **20.5s** | 53.6s | | Findings | 10 | 9 | 10 | | Grounded | 10/10 | 9/9 | 10/10 | | Severity (block/warn/info) | 0 / 9 / 1 | 0 / 5 / 4 | **3 / 5 / 2** | | Cost | flat-sub (Ollama Pro) | ~$0.02 | ~$0.10–0.15 | | Style | Architectural / migration | Boundary / resilience | Escalation / cross-file | ## Severity escalation pattern Only Opus produced `block`-level findings. Haiku and Kimi described the same kind of issues as `warn`. This isn't randomness — it's training. Anthropic's Opus is calibrated to flag merge-stoppers more confidently than the lighter-weight or different-lineage models. ## What ONLY Opus 4.7 caught - `parseFindings` rationale regex truncates on inline `**bold**` inside rationales — neither Haiku nor Kimi noticed - Cache-by-head-SHA survives `LH_AUDITOR_KIMI_MODEL` env flip (silently returns old findings under wrong model name) - Gateway/auditor timeout mismatch: kimi.rs 600s vs auditor curl 900s ## What ALL three caught - `(ev as any).contractor` schema bypass (3/3) - Empty-env `Number("")` returns 0 trap (3/3) - `readFileSync` in async function (3/3) - mode.rs Rust test compile error (3/3) Three-lineage consensus = high-confidence load-bearing real bug. ## What only Kimi K2.6 caught - Schema version bump v1→v2 without explicit migration path - ISO timestamp precision in run_id derivation - Multimodal content array passed verbatim to Kimi (would 400) Kimi favors architectural / API-contract concerns. Useful when the diff is a refactor rather than a feature. ## What only Haiku 4.5 caught - `appendMetrics` mkdir target uses `join(path, "..")` not `dirname` - `KIMI_MODEL` cast in `parseFindings` not validated against type - Truncation of error messages in callKimi at 300 chars loses context Haiku favors boundary cases — what happens when assumptions break. ## Cost-vs-quality verdict | Diff size | Recommended model | Why | |---|---|---| | < 100k chars (normal PRs) | Haiku 4.5 | 80% of the same surface, 5x cheaper, 2.6x faster | | > 100k chars (refactors, multi-file) | Opus 4.7 | Cross-file ramifications + escalation that lighter models miss | Auto-promotion implemented in `auditor/checks/kimi_architect.ts:74` via `selectModel(diffLen)`. Threshold env-overridable (`LH_AUDITOR_KIMI_OPUS_THRESHOLD_CHARS`, default 100000). ## Methodology notes - Same prompt template, same grounding rules, same input bundle - Each call cached at `data/_auditor/kimi_verdicts/-.json` - Per-call metrics appended to `data/_kb/kimi_audits.jsonl` - Wall-clock measured from request POST to response parse - Cost computed as `prompt_tokens * input_rate + completion_tokens * output_rate` - `usage.prompt_tokens` underreports through opencode proxy path (verified ~7k input tokens vs reported 5); cost figures use observed prompt size rather than reported.