Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "journal event verified live (total_events_created 0→1 after probe)."
## Infrastructure (scrum loop hardening)
crates/gateway/src/v1/openrouter.rs — new OpenRouter provider
Direct HTTPS to openrouter.ai/api/v1/chat/completions with OpenAI-compatible shape.
Key resolution: OPENROUTER_API_KEY env → /home/profit/.env → /root/llm_team_config.json
(shares LLM Team UI's quota). Added after iter 5 hit repeated Ollama Cloud 502s on
kimi-k2:1t — different provider backbone as rescue rung. Unit tests pin the URL
stripping and OpenAI wire shape.
crates/gateway/src/v1/mod.rs + main.rs
Added `"openrouter" | "openrouter_free"` arm to /v1/chat dispatch.
V1State.openrouter_key loaded at startup via openrouter::resolve_openrouter_key()
mirroring the Ollama Cloud pattern. Startup log:
"v1: OpenRouter key loaded — /v1/chat provider=openrouter enabled"
tests/real-world/scrum_master_pipeline.ts
* 9-rung ladder — kimi-k2:1t → qwen3-coder:480b → deepseek-v3.1:671b →
mistral-large-3:675b → gpt-oss:120b → qwen3.5:397b → openrouter/gpt-oss-120b:free
→ openrouter/gemma-3-27b-it:free → local qwen3.5:latest.
Added qwen3-coder:480b as rung 2 after live probes confirmed it rescues
kimi-k2:1t 502s cleanly (0.9s latency, substantive reviews).
Dropped devstral-2 (displaced by qwen3-coder); dropped kimi-k2.6 (not available);
dropped minimax-m2.7 (returned 0 chars / 400 thinking tokens).
Local fallback promoted qwen3.5:latest per J's direction 2026-04-24.
* MAX_ATTEMPTS bumped 6 → 9 to accommodate the rescue tier.
* Tree-split scratchpad fixed — was concatenating shard markers directly
into the reviewer input, causing kimi-k2:1t to write titles like
"Forensic Audit Report – file.rs (shard 3)". Now uses internal §N§
markers during accumulation and runs a proper reduce step that
collapses per-shard digests into ONE coherent file-level synthesis
with markers stripped. Matches the Phase 21 aibridge::tree_split
map→reduce design. Fallback to stripped scratchpad if reducer returns thin.
tests/real-world/scrum_applier.ts — NEW (737 lines)
The auto-apply pipeline. Reads scrum_reviews.jsonl, filters rows where
gradient_tier ∈ {auto, dry_run} AND confidence_avg ≥ MIN_CONF (default 90),
asks the reviewer model for concrete old_string/new_string patch JSON,
applies via text replacement, runs cargo check after each file, commits
if green and reverts if red. Deny-list: /etc/, config/, ops/, auditor/,
docs/, data/, mcp-server/, ui/, sidecar/, scripts/. Hard caps: per-patch
confidence ≥ MIN_CONF, old_string must be exactly unique, max 20 lines per
patch. Never runs on main without explicit LH_APPLIER_BRANCH override.
Audit trail in data/_kb/auto_apply.jsonl.
Empirical behavior (dry-run over iter 4 reviews):
5 eligible files → 1 green commit-ready, 2 build-red reverts, 2 all-rejected
The build-green gate caught 2 bad patches before they'd have merged.
mcp-server/observer.ts — LLM Team code_review escalation
When a sig_hash accumulates ≥3 failures (ESCALATION_THRESHOLD), fire-and-forget
POST /api/run?mode=code_review at localhost:5000 with the failure cluster context.
Parses facts/entities/relationships/file_hints from the response. Writes to a
new data/_kb/observer_escalations.jsonl surface. Answers J's vision of the
observer triggering richer LLM Team calls when failures pile up.
Non-blocking: runs parallel to existing qwen2.5 analyzer, never replaces it.
Tracks escalated sig_hashes in a session-local Set to avoid re-hammering
LLM Team when a cluster persists across observer cycles.
crates/aibridge/src/context.rs
First auto-applied patch produced by scrum_applier.ts (dry-run path —
applier writes files in dry-run mode but doesn't commit; bug noted for
iter 6 fix). Adds #[deprecated] annotation to the inline estimate_tokens
helper pointing callers to the centralized shared::model_matrix::ModelMatrix
entry point (P21-002 — duplicate token-estimator surfaces). Cargo check
passes with the annotation (verified by applier's own build gate).
## Visual Control Plane (UI)
ui/server.ts — Bun.serve on :3950 with /data/* fan-out:
/data/services, /data/reviews, /data/metrics, /data/trust, /data/overrides,
/data/findings, /data/outcomes, /data/audit_facts, /data/file/:path,
/data/refactor_signals, /data/search?q=, /data/signal_classes,
/data/logs/:svc (journalctl tail per systemd unit), /data/scrum_log.
Bug fix: tryFetch always attempts JSON.parse before falling back to text
— observer's Bun.serve returns JSON without application/json content-type,
which was displaying stats as a raw string ("0 ops" on map) before.
ui/index.html + ui.css — dark neo-brutalist shell. 6 views:
MAP (D3 force-graph + overlays) / TRACE (per-file iter history) /
TRAJECTORY (signal-class cards + refactor-signals table + reverse-index
search box) / METRICS (every card has SOURCE + GOOD lines explaining
where the number comes from and what target trajectory means) /
KB (card grid with tooltips on every field) / CONSOLE (per-service
journalctl tabs).
ui/ui.js — polling client, D3 wiring, signal-class panel, refactor-signals
table, reverse-index search, per-service console tabs. Bug fix:
renderNodeContext had Object.entries() iterating string characters when
/health returned a plain string — now guards with typeof check so
"lakehouse ok" renders as one row instead of "0 l / 1 a / 2 k / ...".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
340 lines
13 KiB
TypeScript
340 lines
13 KiB
TypeScript
// scrum_applier.ts — the auto-apply pipeline.
|
||
//
|
||
// Turns the scrum master's signal into real commits. Reads
|
||
// data/_kb/scrum_reviews.jsonl, filters to rows where the scrum's
|
||
// own confidence is high enough to trust auto-apply (gradient_tier
|
||
// auto OR confidence_avg ≥ 90), asks a patch-emitting model to
|
||
// produce concrete old_string/new_string pairs, applies them via
|
||
// text replacement, runs `cargo check` after each, commits on green
|
||
// and reverts on red.
|
||
//
|
||
// Runs on its own branch (never on main). Every action is recorded
|
||
// in data/_kb/auto_apply.jsonl so the auditor and future iterations
|
||
// can see what landed and what reverted.
|
||
//
|
||
// Usage:
|
||
// bun run tests/real-world/scrum_applier.ts # dry-run, print only
|
||
// LH_APPLIER_COMMIT=1 bun run tests/real-world/scrum_applier.ts # actually apply
|
||
//
|
||
// Env:
|
||
// LH_APPLIER_BRANCH — branch name (default: "scrum/auto-apply-${Date.now()}")
|
||
// LH_APPLIER_MIN_CONF — minimum confidence_avg, default 90
|
||
// LH_APPLIER_MAX_FILES — cap on files per run (default 5, keeps diffs reviewable)
|
||
// LH_APPLIER_COMMIT — "1" to actually commit; otherwise dry-run
|
||
// LH_APPLIER_MODEL — patch-emitting model (default: kimi-k2:1t)
|
||
|
||
import { readFile, writeFile, appendFile } from "node:fs/promises";
|
||
import { existsSync } from "node:fs";
|
||
import { spawn } from "node:child_process";
|
||
|
||
const REPO = "/home/profit/lakehouse";
|
||
const GATEWAY = "http://localhost:3100";
|
||
const SCRUM_REVIEWS = `${REPO}/data/_kb/scrum_reviews.jsonl`;
|
||
const AUDIT_LOG = `${REPO}/data/_kb/auto_apply.jsonl`;
|
||
|
||
const MIN_CONF = Number(process.env.LH_APPLIER_MIN_CONF ?? 90);
|
||
const MAX_FILES = Number(process.env.LH_APPLIER_MAX_FILES ?? 5);
|
||
const COMMIT = process.env.LH_APPLIER_COMMIT === "1";
|
||
const MODEL = process.env.LH_APPLIER_MODEL ?? "kimi-k2:1t";
|
||
const BRANCH = process.env.LH_APPLIER_BRANCH ?? `scrum/auto-apply-${Date.now().toString(36)}`;
|
||
|
||
// Deny-list — anything whose path starts with one of these is skipped
|
||
// regardless of how confident the scrum is. Config / systemd / docs /
|
||
// auditor itself are off limits for auto-apply; they need a human.
|
||
const DENY_PREFIXES = [
|
||
"config/",
|
||
"ops/",
|
||
"auditor/",
|
||
"docs/",
|
||
"data/",
|
||
"/etc/",
|
||
"mcp-server/",
|
||
"ui/",
|
||
"sidecar/",
|
||
"scripts/",
|
||
];
|
||
|
||
function log(msg: string) { console.log(`[applier] ${msg}`); }
|
||
|
||
async function sh(cmd: string[], cwd = REPO): Promise<{ stdout: string; stderr: string; code: number }> {
|
||
return new Promise((resolve) => {
|
||
const p = spawn(cmd[0], cmd.slice(1), { cwd, stdio: ["ignore", "pipe", "pipe"] });
|
||
let out = ""; let err = "";
|
||
p.stdout.on("data", (d) => { out += d.toString(); });
|
||
p.stderr.on("data", (d) => { err += d.toString(); });
|
||
p.on("close", (code) => resolve({ stdout: out, stderr: err, code: code ?? 1 }));
|
||
});
|
||
}
|
||
|
||
async function auditLog(row: Record<string, any>) {
|
||
const line = JSON.stringify({ ...row, ts: new Date().toISOString() }) + "\n";
|
||
await appendFile(AUDIT_LOG, line);
|
||
}
|
||
|
||
async function chat(opts: {
|
||
provider: "ollama_cloud" | "openrouter" | "ollama";
|
||
model: string;
|
||
prompt: string;
|
||
max_tokens?: number;
|
||
}): Promise<{ content: string; error?: string }> {
|
||
try {
|
||
const r = await fetch(`${GATEWAY}/v1/chat`, {
|
||
method: "POST",
|
||
headers: { "content-type": "application/json" },
|
||
body: JSON.stringify({
|
||
provider: opts.provider,
|
||
model: opts.model,
|
||
messages: [{ role: "user", content: opts.prompt }],
|
||
max_tokens: opts.max_tokens ?? 1500,
|
||
temperature: 0.1,
|
||
}),
|
||
signal: AbortSignal.timeout(180000),
|
||
});
|
||
if (!r.ok) return { content: "", error: `${r.status}: ${(await r.text()).slice(0, 300)}` };
|
||
const j: any = await r.json();
|
||
return { content: j.choices?.[0]?.message?.content ?? "" };
|
||
} catch (e) {
|
||
return { content: "", error: String(e) };
|
||
}
|
||
}
|
||
|
||
interface ScrumReview {
|
||
file: string;
|
||
reviewed_at: string;
|
||
accepted_model: string;
|
||
suggestions_preview: string;
|
||
confidences_per_finding?: number[];
|
||
confidence_avg?: number | null;
|
||
confidence_min?: number | null;
|
||
gradient_tier?: string;
|
||
gradient_tier_avg?: string;
|
||
verdict?: string;
|
||
critical_failures_count?: number;
|
||
schema_version?: number;
|
||
}
|
||
|
||
async function loadLatestReviews(): Promise<Map<string, ScrumReview>> {
|
||
// Map of file → latest review for that file. Ordered by reviewed_at.
|
||
if (!existsSync(SCRUM_REVIEWS)) return new Map();
|
||
const text = await readFile(SCRUM_REVIEWS, "utf8");
|
||
const rows: ScrumReview[] = text.split("\n").filter(Boolean).map(l => {
|
||
try { return JSON.parse(l); } catch { return null; }
|
||
}).filter((r): r is ScrumReview => r !== null);
|
||
// Keep the LATEST review per file.
|
||
const latest = new Map<string, ScrumReview>();
|
||
for (const r of rows) {
|
||
if (!r.file) continue;
|
||
const prev = latest.get(r.file);
|
||
if (!prev || (r.reviewed_at > prev.reviewed_at)) latest.set(r.file, r);
|
||
}
|
||
return latest;
|
||
}
|
||
|
||
function passesConfidenceGate(r: ScrumReview): boolean {
|
||
const avg = r.confidence_avg ?? 0;
|
||
const min = r.confidence_min ?? 0;
|
||
// Must be auto or dry_run tier AND confidence_min ≥ MIN_CONF.
|
||
// min is the conservative tier-lower-bound (one weak finding drags
|
||
// the whole file to "simulation" or "block" tier).
|
||
if (r.gradient_tier === "block" || r.gradient_tier === "simulation") return false;
|
||
return avg >= MIN_CONF && min >= 70;
|
||
}
|
||
|
||
function passesDenyList(file: string): boolean {
|
||
return !DENY_PREFIXES.some((p) => file.startsWith(p) || file === p.replace(/\/$/, ""));
|
||
}
|
||
|
||
interface Patch {
|
||
file: string;
|
||
old_string: string;
|
||
new_string: string;
|
||
rationale: string;
|
||
confidence: number;
|
||
}
|
||
|
||
async function requestPatches(file: string, source: string, review: string): Promise<Patch[]> {
|
||
const prompt = `You previously produced this review of ${file}:
|
||
|
||
─── REVIEW ───
|
||
${review}
|
||
─── END REVIEW ───
|
||
|
||
The review is high-confidence and the file is eligible for auto-apply. Produce CONCRETE PATCHES as JSON so they can be applied via string replacement.
|
||
|
||
RULES:
|
||
1. Output ONE JSON object with a "patches" array. NO prose, no markdown fences.
|
||
2. Each patch is {"old_string": "...", "new_string": "...", "rationale": "short", "confidence": 0-100}.
|
||
3. "old_string" MUST appear EXACTLY ONCE in the file (verbatim, including whitespace). If no unique anchor exists, SKIP that suggestion.
|
||
4. Mechanical changes only: wire a function call, add a field, remove #[allow(dead_code)], add a missing use import, rename one call-site. NO architectural rewrites. NO new modules.
|
||
5. Each "new_string" MUST compile in isolation with the same surrounding code. Don't introduce new dependencies.
|
||
6. If you cannot produce at least one high-confidence mechanical patch, output {"patches": []}.
|
||
7. Max 3 patches per file.
|
||
|
||
─── SOURCE (${source.length} bytes) ───
|
||
${source.slice(0, 14000)}
|
||
─── END SOURCE ───
|
||
|
||
Emit ONLY the JSON object.`;
|
||
|
||
const r = await chat({ provider: "ollama_cloud", model: MODEL, prompt, max_tokens: 2500 });
|
||
if (r.error || !r.content) return [];
|
||
|
||
// Strip markdown fences if model wrapped the JSON.
|
||
let raw = r.content.trim();
|
||
const fenceStart = raw.match(/^```(?:json)?\s*/);
|
||
if (fenceStart) raw = raw.slice(fenceStart[0].length);
|
||
if (raw.endsWith("```")) raw = raw.slice(0, -3).trim();
|
||
// Find first { and last } to extract JSON block if there's prose.
|
||
const first = raw.indexOf("{");
|
||
const last = raw.lastIndexOf("}");
|
||
if (first >= 0 && last > first) raw = raw.slice(first, last + 1);
|
||
|
||
try {
|
||
const obj = JSON.parse(raw);
|
||
const patches: Patch[] = (obj.patches ?? []).filter((p: any) =>
|
||
typeof p?.old_string === "string" &&
|
||
typeof p?.new_string === "string" &&
|
||
p.old_string !== p.new_string &&
|
||
p.old_string.length > 0 &&
|
||
typeof p?.confidence === "number"
|
||
).map((p: any) => ({
|
||
file,
|
||
old_string: p.old_string,
|
||
new_string: p.new_string,
|
||
rationale: String(p.rationale ?? ""),
|
||
confidence: p.confidence,
|
||
}));
|
||
return patches;
|
||
} catch (e) {
|
||
log(` ${file}: patch JSON parse failed — ${String(e).slice(0, 100)}`);
|
||
return [];
|
||
}
|
||
}
|
||
|
||
async function applyPatches(file: string, patches: Patch[]): Promise<{ applied: number; rejected: Array<{patch: Patch; reason: string}> }> {
|
||
const full = `${REPO}/${file}`;
|
||
let source = await readFile(full, "utf8");
|
||
const rejected: Array<{patch: Patch; reason: string}> = [];
|
||
let applied = 0;
|
||
for (const p of patches) {
|
||
// Confidence gate at the individual-patch level.
|
||
if (p.confidence < MIN_CONF) { rejected.push({patch: p, reason: `confidence ${p.confidence} < ${MIN_CONF}`}); continue; }
|
||
// Uniqueness gate.
|
||
const occurrences = source.split(p.old_string).length - 1;
|
||
if (occurrences === 0) { rejected.push({patch: p, reason: "old_string not found"}); continue; }
|
||
if (occurrences > 1) { rejected.push({patch: p, reason: `old_string appears ${occurrences}× (not unique)`}); continue; }
|
||
// Size gate — no patch touches > 20 lines (diff discipline).
|
||
const oldLines = p.old_string.split("\n").length;
|
||
const newLines = p.new_string.split("\n").length;
|
||
if (Math.max(oldLines, newLines) > 20) { rejected.push({patch: p, reason: `patch too large (${Math.max(oldLines,newLines)} lines)`}); continue; }
|
||
source = source.replace(p.old_string, p.new_string);
|
||
applied++;
|
||
}
|
||
if (applied > 0) await writeFile(full, source);
|
||
return { applied, rejected };
|
||
}
|
||
|
||
async function cargoCheck(): Promise<boolean> {
|
||
const r = await sh(["cargo", "check", "--workspace"]);
|
||
return r.code === 0;
|
||
}
|
||
|
||
async function gitCommit(file: string, patches: Patch[]): Promise<boolean> {
|
||
if (!COMMIT) { log(` (dry-run) would commit ${file}`); return true; }
|
||
const addR = await sh(["git", "add", file]);
|
||
if (addR.code !== 0) { log(` git add failed: ${addR.stderr.slice(0, 200)}`); return false; }
|
||
const msg = `auto-apply: ${patches.length} high-confidence fix${patches.length === 1 ? "" : "es"} in ${file}\n\n${patches.map(p => `- ${p.rationale} (conf ${p.confidence}%)`).join("\n")}\n\n🤖 scrum_applier.ts`;
|
||
const commitR = await sh(["git", "commit", "-m", msg]);
|
||
if (commitR.code !== 0) { log(` git commit failed: ${commitR.stderr.slice(0, 200)}`); return false; }
|
||
log(` ✓ committed ${file}`);
|
||
return true;
|
||
}
|
||
|
||
async function revertFile(file: string): Promise<void> {
|
||
await sh(["git", "checkout", "--", file]);
|
||
}
|
||
|
||
async function main() {
|
||
log(`starting · min_conf=${MIN_CONF} max_files=${MAX_FILES} model=${MODEL} commit=${COMMIT}`);
|
||
|
||
if (COMMIT) {
|
||
const headR = await sh(["git", "rev-parse", "--abbrev-ref", "HEAD"]);
|
||
const currentBranch = headR.stdout.trim();
|
||
if (currentBranch === "main") {
|
||
log(`refusing to run on main — create a branch first or set LH_APPLIER_BRANCH`);
|
||
const coR = await sh(["git", "checkout", "-b", BRANCH]);
|
||
if (coR.code !== 0) { log(`could not create branch ${BRANCH}: ${coR.stderr.slice(0, 200)}`); process.exit(1); }
|
||
log(`working branch: ${BRANCH}`);
|
||
} else {
|
||
log(`working branch: ${currentBranch}`);
|
||
}
|
||
}
|
||
|
||
const reviews = await loadLatestReviews();
|
||
log(`loaded ${reviews.size} latest reviews`);
|
||
|
||
const eligible = [...reviews.values()].filter(r =>
|
||
passesConfidenceGate(r) && passesDenyList(r.file)
|
||
).sort((a, b) => (b.confidence_avg ?? 0) - (a.confidence_avg ?? 0));
|
||
|
||
log(`${eligible.length} pass confidence gate + deny-list`);
|
||
log(`taking top ${Math.min(MAX_FILES, eligible.length)} by confidence`);
|
||
|
||
let committedFiles = 0;
|
||
let revertedFiles = 0;
|
||
|
||
for (const r of eligible.slice(0, MAX_FILES)) {
|
||
log(`${r.file} (conf_avg=${r.confidence_avg} tier=${r.gradient_tier})`);
|
||
const full = `${REPO}/${r.file}`;
|
||
if (!existsSync(full)) { log(` skip — file not found on disk`); continue; }
|
||
|
||
const source = await readFile(full, "utf8");
|
||
const patches = await requestPatches(r.file, source, r.suggestions_preview ?? "");
|
||
|
||
if (patches.length === 0) {
|
||
log(` no patches produced`);
|
||
await auditLog({ action: "no_patches", file: r.file, reviewer_model: r.accepted_model });
|
||
continue;
|
||
}
|
||
|
||
log(` ${patches.length} candidate patches`);
|
||
const { applied, rejected } = await applyPatches(r.file, patches);
|
||
log(` applied ${applied}, rejected ${rejected.length}`);
|
||
for (const rj of rejected) log(` ✗ ${rj.reason}`);
|
||
|
||
if (applied === 0) {
|
||
await auditLog({ action: "all_rejected", file: r.file, rejected: rejected.map(x => x.reason) });
|
||
continue;
|
||
}
|
||
|
||
log(` running cargo check...`);
|
||
const green = await cargoCheck();
|
||
if (!green) {
|
||
log(` ✗ build red — reverting ${r.file}`);
|
||
await revertFile(r.file);
|
||
revertedFiles++;
|
||
await auditLog({ action: "build_red_reverted", file: r.file, patches_applied: applied });
|
||
continue;
|
||
}
|
||
|
||
log(` ✓ build green`);
|
||
const ok = await gitCommit(r.file, patches.slice(0, applied));
|
||
if (ok) {
|
||
committedFiles++;
|
||
await auditLog({
|
||
action: COMMIT ? "committed" : "dry_run_committed",
|
||
file: r.file,
|
||
patches_applied: applied,
|
||
patches_rejected: rejected.length,
|
||
confidence_avg: r.confidence_avg,
|
||
gradient_tier: r.gradient_tier,
|
||
reviewer_model: r.accepted_model,
|
||
});
|
||
}
|
||
}
|
||
|
||
log(`DONE · committed=${committedFiles} reverted=${revertedFiles}`);
|
||
}
|
||
|
||
await main();
|