auditor: alternate Kimi K2.6 ↔ Haiku 4.5, drop Opus from auto-promotion

Operator can't sustain Opus's ~$0.30/audit on the daemon. New strategy:

  - Even-index audits per PR (index 0, 2, 4: the 1st, 3rd, 5th audit)
    use kimi-k2.6 via ollama_cloud (effectively free under the
    Ollama Pro flat subscription)
  - Odd-index audits (index 1, 3, 5: the 2nd, 4th, 6th audit) use
    claude-haiku-4-5 via opencode/Zen (~$0.04/audit)
  - Frontier models (Opus, GPT-5.5-pro, Gemini 3.1-pro) are NOT in
    auto-promotion. Operator hands distilled findings to a frontier
    model manually when a load-bearing decision needs it.

Mirrors the lakehouse playbook-memory pattern: cheap models do the
volume, the validated subset compounds, only the compounded bundle
gets handed to a frontier model. Same logic at the auditor layer.

Audit-index derivation: count of existing kimi_verdicts files for
the PR. So if the dir has 4 verdicts for PR #11 already, the 5th
audit is index 4 (even) → Kimi, the 6th is index 5 (odd) → Haiku.
Across an active PR's lifetime the audits naturally interleave the
two lineages.

Cost projection at observed cadence (5-10 pushes/day):
  - Old (Haiku default + Opus auto on big diffs): $1-3/day
  - New (Kimi/Haiku alternating, no Opus): $0.10-0.40/day
  - $31.68 budget lasts: ~3 months instead of ~10 days

Override knobs:
  LH_AUDITOR_KIMI_MODEL=<X>         pins to model X (no alternation)
  LH_AUDITOR_KIMI_PROVIDER=<P>      provider for default model
  LH_AUDITOR_KIMI_ALT_MODEL=<X>     sets the odd-index alternate
  LH_AUDITOR_KIMI_ALT_PROVIDER=<P>  provider for alternate

The OPUS_THRESHOLD env knobs from the prior auto-promotion commit
are now no-ops (unset, no longer referenced).

Verification:
  bun build auditor/checks/kimi_architect.ts   compiles
  systemctl restart lakehouse-auditor          active
  systemctl show env                           Haiku pin removed, Kimi default + cap=3 set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
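The cost projection in the message can be checked with simple division. The sketch below only reuses the figures quoted in the message ($31.68 budget, $1-3/day old, $0.10-0.40/day new); `daysOfBudget` and `scenarios` are hypothetical names introduced for illustration, not part of the auditor.

```typescript
// Figures are copied from the commit message; the helper is illustrative only.
const BUDGET_USD = 31.68;

// Days a fixed budget lasts at a constant daily spend.
function daysOfBudget(budgetUsd: number, dailySpendUsd: number): number {
  return budgetUsd / dailySpendUsd;
}

// [label, low $/day, high $/day]
const scenarios: Array<[string, number, number]> = [
  ["old (Haiku + Opus auto-promotion)", 1, 3],
  ["new (Kimi/Haiku alternating)", 0.1, 0.4],
];

for (const [label, low, high] of scenarios) {
  const shortest = daysOfBudget(BUDGET_USD, high); // heaviest spend
  const longest = daysOfBudget(BUDGET_USD, low);   // lightest spend
  console.log(`${label}: ${shortest.toFixed(1)} to ${longest.toFixed(1)} days`);
}
```

Under these figures the old range works out to roughly 10.6 to 31.7 days and the new range to roughly 79 to 317 days, so the quoted "~10 days" and "~3 months" both appear to correspond to the heavy-spend end of each range.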
auditor: per-PR audit cap (default 3) — daemon halts further audits until reset
2026-04-27 07:26:31 -05:00 · 2026-04-27 07:24:23 -05:00
2 changed files with 82 additions and 18 deletions
--- a/auditor/checks/kimi_architect.ts
+++ b/auditor/checks/kimi_architect.ts
@@ -50,22 +50,38 @@ const MAX_PRIOR_FINDINGS = 50;
 // gateway as a fallback for when Ollama Cloud is upstream-broken.
 const KIMI_PROVIDER = process.env.LH_AUDITOR_KIMI_PROVIDER ?? "ollama_cloud";
 const KIMI_MODEL = process.env.LH_AUDITOR_KIMI_MODEL ?? "kimi-k2.6";
-// Big-diff promotion: when the diff exceeds OPUS_THRESHOLD_CHARS, swap
-// to OPUS_MODEL for that audit. 2026-04-27 3-way bake-off (Kimi vs
-// Haiku vs Opus on a 32K diff) showed Opus is the only model that
-// catches cross-file ramifications + escalates `block` severity on
-// real architectural risks. ~5x the spend per audit, only worth it
-// when the diff is big enough to have those risks.
+// Cross-lineage alternation. 2026-04-27 J's call: Opus is too
+// expensive to auto-fire (~$0.30/audit). Kimi K2.6 via Go-sub is
+// effectively free; Haiku 4.5 via Zen is ~$0.04. Alternate between
+// them so we get cross-lineage signal (Moonshot vs Anthropic) on
+// every PR's audit history without burning the budget.
 //
-// Defaults: Haiku for normal diffs (fast, cheap, ~$0.02), Opus for
-// > 100k chars. Disable promotion: set OPUS_THRESHOLD_CHARS very high.
-const OPUS_MODEL = process.env.LH_AUDITOR_KIMI_OPUS_MODEL ?? "claude-opus-4-7";
-const OPUS_PROVIDER = process.env.LH_AUDITOR_KIMI_OPUS_PROVIDER ?? "opencode";
-const OPUS_THRESHOLD_CHARS = Number(process.env.LH_AUDITOR_KIMI_OPUS_THRESHOLD_CHARS) || 100_000;
+// Default: Kimi K2.6 on even audits, Haiku 4.5 on odd. Each PR's
+// audits flip between vendors as new SHAs come in.
+//
+// Frontier models (Opus 4.7, GPT-5.5, Gemini 3.1) are NOT in the
+// auto path. Operator hands distilled findings to a frontier model
+// manually when high-leverage decisions need it. Removing Opus from
+// auto-promotion saves ~$1-3/day on the daemon at our cadence.
+//
+// Override the alternation entirely with LH_AUDITOR_KIMI_MODEL
+// (forces one model regardless of audit count); set
+// LH_AUDITOR_KIMI_ALT_MODEL to change the odd-index alternate.
+const ALT_MODEL = process.env.LH_AUDITOR_KIMI_ALT_MODEL ?? "claude-haiku-4-5";
+const ALT_PROVIDER = process.env.LH_AUDITOR_KIMI_ALT_PROVIDER ?? "opencode";
+const FORCE_DEFAULT = process.env.LH_AUDITOR_KIMI_MODEL !== undefined && process.env.LH_AUDITOR_KIMI_MODEL !== "";

-function selectModel(diffLen: number): { provider: string; model: string; promoted: boolean } {
-  if (diffLen > OPUS_THRESHOLD_CHARS) {
-    return { provider: OPUS_PROVIDER, model: OPUS_MODEL, promoted: true };
+function selectModel(diffLen: number, auditIndex: number = 0): { provider: string; model: string; promoted: boolean } {
+  // Operator override — env-pinned model wins.
+  if (FORCE_DEFAULT) {
+    return { provider: KIMI_PROVIDER, model: KIMI_MODEL, promoted: false };
+  }
+  // Alternate Kimi (default, even index) ↔ Haiku (alt, odd index).
+  // diffLen kept in the signature for future "big diff → Haiku
+  // anyway" logic; not used yet so we don't auto-burn on big PRs.
+  void diffLen;
+  if (auditIndex % 2 === 1) {
+    return { provider: ALT_PROVIDER, model: ALT_MODEL, promoted: true };
  }
  return { provider: KIMI_PROVIDER, model: KIMI_MODEL, promoted: false };
 }
@@ -141,7 +157,20 @@ export async function runKimiArchitectCheck(
      : [{ check: "kimi_architect" as CheckKind, severity: "info", summary: "kimi_architect cached — 0 findings", evidence: [`cache: ${cachePath}`] }];
  }

-  const selected = selectModel(diff.length);
+  // Alternate model based on how many audits this PR has had — gives
+  // cross-lineage signal (Kimi/Moonshot ↔ Haiku/Anthropic) on every
+  // PR's audit history. Count is derived from existing kimi_verdicts
+  // files for this PR; one directory read, O(total verdict files).
+  let auditIndex = 0;
+  try {
+    const dir = "/home/profit/lakehouse/data/_auditor/kimi_verdicts";
+    if (existsSync(dir)) {
+      const all = require("node:fs").readdirSync(dir) as string[];
+      auditIndex = all.filter((f) => f.startsWith(`${ctx.pr_number}-`)).length;
+    }
+  } catch { /* default 0 — Kimi */ }
+
+  const selected = selectModel(diff.length, auditIndex);
  let response: { content: string; usage: any; finish_reason: string; latency_ms: number };
  try {
    response = await callKimi(buildPrompt(diff, priorFindings, ctx), selected.provider, selected.model);
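The alternation in this file can be exercised in isolation. Below is a minimal sketch, assuming verdict file names begin with `<pr_number>-` as the `startsWith` filter implies; `ModelChoice`, `auditIndexFor`, `pickModel`, and the sample file names are hypothetical stand-ins, not the file's actual exports.

```typescript
// Simplified stand-ins for the provider/model pairs in kimi_architect.ts.
interface ModelChoice {
  provider: string;
  model: string;
}

const DEFAULT_CHOICE: ModelChoice = { provider: "ollama_cloud", model: "kimi-k2.6" };
const ALT_CHOICE: ModelChoice = { provider: "opencode", model: "claude-haiku-4-5" };

// Audit index = number of verdict files already written for this PR.
// The trailing dash in the prefix keeps PR 1 from matching PR 11.
function auditIndexFor(prNumber: number, verdictFiles: string[]): number {
  const prefix = `${prNumber}-`;
  return verdictFiles.filter((name) => name.startsWith(prefix)).length;
}

// A pinned model wins; otherwise even index -> default, odd index -> alternate.
function pickModel(auditIndex: number, pinned: boolean = false): ModelChoice {
  if (pinned) return DEFAULT_CHOICE;
  return auditIndex % 2 === 1 ? ALT_CHOICE : DEFAULT_CHOICE;
}

// Hypothetical directory listing: four verdicts for PR 11, one for PR 1.
const files = ["11-a1.json", "11-b2.json", "11-c3.json", "11-d4.json", "1-zz.json"];
const fifthAudit = auditIndexFor(11, files);

console.log(fifthAudit, pickModel(fifthAudit).model);         // 4 kimi-k2.6
console.log(fifthAudit + 1, pickModel(fifthAudit + 1).model); // 5 claude-haiku-4-5
```

Keeping the index derivation a pure function of the directory listing means the selection can be tested without touching the real verdicts directory.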
--- a/auditor/index.ts
+++ b/auditor/index.ts
@@ -24,14 +24,30 @@ const POLL_INTERVAL_MS = 90_000; // 90s — enough budget for audit runs to comp
 const PAUSE_FILE = "/home/profit/lakehouse/auditor.paused";
 const STATE_FILE = "/home/profit/lakehouse/data/_auditor/state.json";

+// Per-PR audit cap. Prevents the daemon from running away on a PR
+// when each push surfaces new findings — operator wants to review
+// in batch, not have the daemon burn budget while they're away.
+// Default 3 audits per PR. Override via LH_AUDITOR_MAX_AUDITS_PER_PR.
+// Set to 0 to disable the cap.
+//
+// Reset (after manual review): edit data/_auditor/state.json and
+// set audit_count_per_pr.<N> = 0 (or delete the key). Daemon picks
+// up the change on the next cycle without restart.
+const MAX_AUDITS_PER_PR = process.env.LH_AUDITOR_MAX_AUDITS_PER_PR?.trim() === "0" ? 0 : Number(process.env.LH_AUDITOR_MAX_AUDITS_PER_PR) || 3;
+
 interface State {
  // Map: PR number → last-audited head SHA. Lets us dedupe audits
  // across restarts (poller can crash/restart without re-auditing
  // all open PRs from scratch).
  last_audited: Record<string, string>;
+  // Map: PR number → number of audits run on that PR since last reset.
+  // Daemon halts auditing a PR once this hits MAX_AUDITS_PER_PR.
+  // Operator clears the entry to resume.
+  audit_count_per_pr: Record<string, number>;
  started_at: string;
  cycles_total: number;
  cycles_skipped_paused: number;
+  cycles_skipped_capped: number;
  audits_run: number;
  last_cycle_at?: string;
 }
@@ -47,17 +63,21 @@ async function loadState(): Promise<State> {
    return {
      last_audited: s.last_audited ?? {},
      started_at: s.started_at ?? new Date().toISOString(),
+      audit_count_per_pr: s.audit_count_per_pr ?? {},
      cycles_total: s.cycles_total ?? 0,
      cycles_skipped_paused: s.cycles_skipped_paused ?? 0,
+      cycles_skipped_capped: s.cycles_skipped_capped ?? 0,
      audits_run: s.audits_run ?? 0,
      last_cycle_at: s.last_cycle_at,
    };
  } catch {
    return {
      last_audited: {},
+      audit_count_per_pr: {},
      started_at: new Date().toISOString(),
      cycles_total: 0,
      cycles_skipped_paused: 0,
+      cycles_skipped_capped: 0,
      audits_run: 0,
    };
  }
@@ -89,12 +109,23 @@ async function runCycle(state: State): Promise<State> {
  console.log(`[auditor] cycle ${state.cycles_total}: ${prs.length} open PR(s)`);

  for (const pr of prs) {
-    const last = state.last_audited[String(pr.number)];
+    const prKey = String(pr.number);
+    const last = state.last_audited[prKey];
    if (last === pr.head_sha) {
      console.log(`[auditor]   skip PR #${pr.number} (SHA ${pr.head_sha.slice(0, 8)} already audited)`);
      continue;
    }
-    console.log(`[auditor]   audit PR #${pr.number} (${pr.head_sha.slice(0, 8)}) — ${pr.title.slice(0, 60)}`);
+    // Per-PR audit cap — once a PR has been audited MAX_AUDITS_PER_PR
+    // times, halt further audits until the operator manually clears
+    // audit_count_per_pr[<N>] in state.json. Prevents runaway burn
+    // when each fix surfaces new findings.
+    const auditedSoFar = state.audit_count_per_pr[prKey] ?? 0;
+    if (MAX_AUDITS_PER_PR > 0 && auditedSoFar >= MAX_AUDITS_PER_PR) {
+      console.log(`[auditor]   skip PR #${pr.number} (capped at ${auditedSoFar}/${MAX_AUDITS_PER_PR} audits — clear state.json audit_count_per_pr.${prKey} to resume)`);
+      state.cycles_skipped_capped += 1;
+      continue;
+    }
+    console.log(`[auditor]   audit PR #${pr.number} (${pr.head_sha.slice(0, 8)}) — ${pr.title.slice(0, 60)} [${auditedSoFar + 1}/${MAX_AUDITS_PER_PR}]`);
    try {
      // Skip dynamic by default: it mutates live playbook state and
      // re-runs on every PR update would pollute quickly. Operator
@@ -106,8 +137,12 @@ async function runCycle(state: State): Promise<State> {
        skip_inference: process.env.LH_AUDITOR_SKIP_INFERENCE === "1",
      });
      console.log(`[auditor]     verdict=${verdict.overall} findings=${verdict.metrics.findings_total} (block=${verdict.metrics.findings_block} warn=${verdict.metrics.findings_warn})`);
-      state.last_audited[String(pr.number)] = pr.head_sha;
+      state.last_audited[prKey] = pr.head_sha;
+      state.audit_count_per_pr[prKey] = auditedSoFar + 1;
      state.audits_run += 1;
+      if (MAX_AUDITS_PER_PR > 0 && state.audit_count_per_pr[prKey] >= MAX_AUDITS_PER_PR) {
+        console.log(`[auditor]     PR #${pr.number} reached cap (${MAX_AUDITS_PER_PR} audits) — daemon will skip further audits until reset`);
+      }
    } catch (e) {
      console.error(`[auditor]     audit failed: ${(e as Error).message}`);
    }
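The per-PR cap and its manual reset can be sketched as two small pure functions. This is an illustration under assumptions, not the daemon's code: `parseCap` and `isCapped` are hypothetical helpers, and `parseCap` deliberately treats an explicit `"0"` as "cap disabled" while falling back to 3 for unset or unparsable values.

```typescript
// Parse the cap from a raw env string. An explicit "0" must survive as 0,
// which a plain `Number(raw) || 3` would not allow.
function parseCap(raw: string | undefined, fallback: number = 3): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

// A PR is capped once its audit count reaches the cap; cap 0 disables the check.
function isCapped(counts: Record<string, number>, prKey: string, cap: number): boolean {
  if (cap === 0) return false;
  return (counts[prKey] ?? 0) >= cap;
}

// Shape mirrors audit_count_per_pr in state.json; the values are made up.
const counts: Record<string, number> = { "11": 3, "12": 1 };

console.log(isCapped(counts, "11", parseCap(undefined))); // true: 3 of 3 used
console.log(isCapped(counts, "12", parseCap("3")));       // false: 1 of 3 used
console.log(isCapped(counts, "11", parseCap("0")));       // false: cap disabled

// Operator reset: deleting the key has the same effect as setting it to 0.
delete counts["11"];
console.log(isCapped(counts, "11", 3)); // false: audits resume
```

Separating parsing from the check keeps the "0 disables the cap" rule in one place, so the log lines and the skip decision cannot disagree about it.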