observer: fix LLM Team escalation — route to /v1/chat qwen3-coder:480b instead of dead mode
Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "journal event verified live (total_events_created 0→1 after probe)."
Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "journal event verified live (total_events_created 0→1 after probe)."
Discovery 2026-04-24: /api/run?mode=code_review returns "Unknown mode"
(error response from llm_team_ui.py). The 2026-04-24 observer escalation
wiring pointed at a dead endpoint and was failing silently. My earlier
claim of "9 registered LLM Team modes" came from GET probes that all
returned 405 — I interpreted that as "POST-only endpoints exist" when
it just means "GET is not allowed for anything, and on POST only `extract`
is registered."
Rewire: observer's escalateFailureClusterToLLMTeam now hits
POST /v1/chat { provider: "ollama_cloud", model: "qwen3-coder:480b", ... }
which is the same coding-specialist rung 2 of the scrum ladder that
reliably produces substantive reviews. Probe shows 1240 chars of
substantive analysis in ~8.7s.
Also tightens scrum_applier:
* MODEL default: kimi-k2:1t → qwen3-coder:480b (coding specialist)
* Size gate: 20 lines → 6 lines (surgical patches only)
* Max patches per file: 3 → 2
* Prompt: explicit forbidden-actions list (no struct renames, no
function-signature changes, no new modules) and mechanical-only
whitelist
These changes produced the first auto-applied commit (96b46cd), which
landed a 2-line import addition that passed cargo check. Zero-to-one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
96b46cdb91
commit
25ea3de836
@ -159,50 +159,63 @@ const LLM_TEAM_ESCALATIONS = "/home/profit/lakehouse/data/_kb/observer_escalatio
|
|||||||
const ESCALATION_THRESHOLD = 3; // N+ failures on same sig_hash triggers
|
const ESCALATION_THRESHOLD = 3; // N+ failures on same sig_hash triggers
|
||||||
|
|
||||||
async function escalateFailureClusterToLLMTeam(sigHash: string, cluster: ObservedOp[]) {
|
async function escalateFailureClusterToLLMTeam(sigHash: string, cluster: ObservedOp[]) {
|
||||||
// Package the failure cluster as a single context blob for code_review mode.
|
// Package the failure cluster as a single context blob. Originally
|
||||||
|
// I routed this to LLM Team's `code_review` mode at /api/run, but
|
||||||
|
// that mode isn't registered in llm_team_ui.py — it returned
|
||||||
|
// "Unknown mode" on every call. Revised 2026-04-24: route directly
|
||||||
|
// to the gateway's /v1/chat with provider=ollama_cloud + qwen3-coder:480b
|
||||||
|
// (the coding specialist that's rung 2 of the scrum ladder, proven
|
||||||
|
// to produce substantive structured reviews). Fire-and-forget so
|
||||||
|
// downstream failures don't block observer's normal loop.
|
||||||
const context = cluster.slice(-8).map((o, i) =>
|
const context = cluster.slice(-8).map((o, i) =>
|
||||||
`[${i + 1}] endpoint=${o.endpoint} input=${o.input_summary} error=${o.error ?? "?"}`
|
`[${i + 1}] endpoint=${o.endpoint} input=${o.input_summary} error=${o.error ?? "?"}`
|
||||||
).join("\n");
|
).join("\n");
|
||||||
|
const prompt = `sig_hash=${sigHash} · ${cluster.length} failures on the same signature:\n\n${context}\n\nReview this failure cluster. Identify:\n1. Likely root cause (single sentence).\n2. Files most likely responsible (path hints).\n3. Concrete fix direction (under 3 sentences).\n4. Confidence: NN%\n\nBe specific, not generic.`;
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const resp = await fetch(`${LLM_TEAM}/api/run?mode=code_review`, {
|
const resp = await fetch(`${LAKEHOUSE}/v1/chat`, {
|
||||||
method: "POST",
|
method: "POST",
|
||||||
headers: { "Content-Type": "application/json" },
|
headers: { "Content-Type": "application/json" },
|
||||||
body: JSON.stringify({
|
body: JSON.stringify({
|
||||||
input: `sig_hash=${sigHash} · ${cluster.length} failures on same signature:\n\n${context}\n\nReview this failure pattern. What is the root cause? What code change would prevent it? Respond with structured facts + specific file hints.`,
|
provider: "ollama_cloud",
|
||||||
|
model: "qwen3-coder:480b",
|
||||||
|
messages: [{ role: "user", content: prompt }],
|
||||||
|
max_tokens: 800,
|
||||||
|
temperature: 0.2,
|
||||||
}),
|
}),
|
||||||
signal: AbortSignal.timeout(60000),
|
signal: AbortSignal.timeout(60000),
|
||||||
});
|
});
|
||||||
if (!resp.ok) {
|
if (!resp.ok) {
|
||||||
console.error(`[observer] LLM Team code_review ${resp.status}: ${(await resp.text()).slice(0, 200)}`);
|
console.error(`[observer] escalation /v1/chat ${resp.status}: ${(await resp.text()).slice(0, 200)}`);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
const j: any = await resp.json();
|
const j: any = await resp.json();
|
||||||
|
const analysis = j?.choices?.[0]?.message?.content ?? "";
|
||||||
|
|
||||||
// Write an audit row. Fields are deliberately permissive — LLM
|
// Audit row stays schema-compatible with the prior implementation —
|
||||||
// Team's response shape can evolve without breaking this write.
|
// downstream consumers see structured fields regardless of the
|
||||||
|
// review-source change. Facts/entities stay empty (this call is
|
||||||
|
// direct-model, not extract-mode); the raw analysis carries the
|
||||||
|
// signal.
|
||||||
const row = {
|
const row = {
|
||||||
ts: new Date().toISOString(),
|
ts: new Date().toISOString(),
|
||||||
source: "observer_escalation",
|
source: "observer_escalation",
|
||||||
mode: "code_review",
|
mode: "direct_chat_qwen3_coder_480b",
|
||||||
sig_hash: sigHash,
|
sig_hash: sigHash,
|
||||||
cluster_size: cluster.length,
|
cluster_size: cluster.length,
|
||||||
cluster_staffer: cluster[0]?.staffer_id,
|
cluster_staffer: cluster[0]?.staffer_id,
|
||||||
cluster_endpoint: cluster[0]?.endpoint,
|
cluster_endpoint: cluster[0]?.endpoint,
|
||||||
llm_team_run_id: j.run_id ?? j.llm_team_run_id ?? null,
|
prompt_tokens: j?.usage?.prompt_tokens ?? 0,
|
||||||
facts: j.facts ?? [],
|
completion_tokens: j?.usage?.completion_tokens ?? 0,
|
||||||
entities: j.entities ?? [],
|
analysis: analysis.slice(0, 4000),
|
||||||
relationships: j.relationships ?? [],
|
|
||||||
raw_response: typeof j.response === "string" ? j.response.slice(0, 2000) : null,
|
|
||||||
recommended_files: j.file_hints ?? j.files ?? [],
|
|
||||||
};
|
};
|
||||||
const { appendFile } = await import("node:fs/promises");
|
const { appendFile } = await import("node:fs/promises");
|
||||||
await appendFile(LLM_TEAM_ESCALATIONS, JSON.stringify(row) + "\n");
|
await appendFile(LLM_TEAM_ESCALATIONS, JSON.stringify(row) + "\n");
|
||||||
console.error(
|
console.error(
|
||||||
`[observer] escalated sig_hash=${sigHash.slice(0, 8)} · cluster=${cluster.length} · facts=${row.facts.length} entities=${row.entities.length}`
|
`[observer] escalated sig_hash=${sigHash.slice(0, 8)} · cluster=${cluster.length} · ${analysis.length} chars`
|
||||||
);
|
);
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
console.error(`[observer] LLM Team escalation failed: ${(e as Error).message}`);
|
console.error(`[observer] escalation failed: ${(e as Error).message}`);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -35,7 +35,14 @@ const AUDIT_LOG = `${REPO}/data/_kb/auto_apply.jsonl`;
|
|||||||
const MIN_CONF = Number(process.env.LH_APPLIER_MIN_CONF ?? 90);
|
const MIN_CONF = Number(process.env.LH_APPLIER_MIN_CONF ?? 90);
|
||||||
const MAX_FILES = Number(process.env.LH_APPLIER_MAX_FILES ?? 5);
|
const MAX_FILES = Number(process.env.LH_APPLIER_MAX_FILES ?? 5);
|
||||||
const COMMIT = process.env.LH_APPLIER_COMMIT === "1";
|
const COMMIT = process.env.LH_APPLIER_COMMIT === "1";
|
||||||
const MODEL = process.env.LH_APPLIER_MODEL ?? "kimi-k2:1t";
|
// Default patch-emitter model — qwen3-coder:480b is the coding specialist
|
||||||
|
// in the scrum ladder (rung 2). Swapped in from kimi-k2:1t after 2026-04-24
|
||||||
|
// data showed kimi-k2:1t produces architectural patches that cascade across
|
||||||
|
// the file (rename a field → 20 broken call sites). qwen3-coder is tuned
|
||||||
|
// for targeted code changes and tends to stay within the mechanical-patch
|
||||||
|
// constraint the prompt asks for. LLM Team's /api/run?mode=patch would be
|
||||||
|
// the ideal choice but that mode isn't registered in llm_team_ui.py yet.
|
||||||
|
const MODEL = process.env.LH_APPLIER_MODEL ?? "qwen3-coder:480b";
|
||||||
const BRANCH = process.env.LH_APPLIER_BRANCH ?? `scrum/auto-apply-${Date.now().toString(36)}`;
|
const BRANCH = process.env.LH_APPLIER_BRANCH ?? `scrum/auto-apply-${Date.now().toString(36)}`;
|
||||||
|
|
||||||
// Deny-list — anything whose path starts with one of these is skipped
|
// Deny-list — anything whose path starts with one of these is skipped
|
||||||
@ -161,14 +168,25 @@ ${review}
|
|||||||
|
|
||||||
The review is high-confidence and the file is eligible for auto-apply. Produce CONCRETE PATCHES as JSON so they can be applied via string replacement.
|
The review is high-confidence and the file is eligible for auto-apply. Produce CONCRETE PATCHES as JSON so they can be applied via string replacement.
|
||||||
|
|
||||||
RULES:
|
HARD CONSTRAINTS (violations → patch rejected):
|
||||||
1. Output ONE JSON object with a "patches" array. NO prose, no markdown fences.
|
1. Output ONE JSON object with a "patches" array. NO prose, no markdown fences.
|
||||||
2. Each patch is {"old_string": "...", "new_string": "...", "rationale": "short", "confidence": 0-100}.
|
2. Each patch is {"old_string": "...", "new_string": "...", "rationale": "short", "confidence": 0-100}.
|
||||||
3. "old_string" MUST appear EXACTLY ONCE in the file (verbatim, including whitespace). If no unique anchor exists, SKIP that suggestion.
|
3. "old_string" MUST appear EXACTLY ONCE in the file (verbatim, including whitespace + trailing newlines).
|
||||||
4. Mechanical changes only: wire a function call, add a field, remove #[allow(dead_code)], add a missing use import, rename one call-site. NO architectural rewrites. NO new modules.
|
4. Max diff size: 6 lines changed per patch (NOT 20). If the change needs >6 lines, emit {"patches": []} — too risky for auto-apply.
|
||||||
5. Each "new_string" MUST compile in isolation with the same surrounding code. Don't introduce new dependencies.
|
5. Mechanical-only (the list — nothing else):
|
||||||
6. If you cannot produce at least one high-confidence mechanical patch, output {"patches": []}.
|
(a) remove a #[allow(dead_code)] marker on a function now wired elsewhere
|
||||||
7. Max 3 patches per file.
|
(b) add a missing 'use' import statement
|
||||||
|
(c) add a single field to a struct (no renames)
|
||||||
|
(d) flip a single boolean/match-arm that doesn't cascade
|
||||||
|
(e) add a fire-and-forget log or tracing call
|
||||||
|
6. FORBIDDEN (automatic reject):
|
||||||
|
• struct field RENAMES (break all call sites)
|
||||||
|
• function signature changes
|
||||||
|
• new modules, new traits, new dependencies
|
||||||
|
• any change that requires editing another file to keep it compiling
|
||||||
|
7. Each "new_string" MUST compile in isolation with the same surrounding code.
|
||||||
|
8. Max 2 patches per file — quality over quantity.
|
||||||
|
9. If you cannot produce at least one high-confidence mechanical patch under these constraints, output {"patches": []}. Don't guess.
|
||||||
|
|
||||||
─── SOURCE (${source.length} bytes) ───
|
─── SOURCE (${source.length} bytes) ───
|
||||||
${source.slice(0, 14000)}
|
${source.slice(0, 14000)}
|
||||||
@ -223,10 +241,12 @@ async function applyPatches(file: string, patches: Patch[]): Promise<{ applied:
|
|||||||
const occurrences = source.split(p.old_string).length - 1;
|
const occurrences = source.split(p.old_string).length - 1;
|
||||||
if (occurrences === 0) { rejected.push({patch: p, reason: "old_string not found"}); continue; }
|
if (occurrences === 0) { rejected.push({patch: p, reason: "old_string not found"}); continue; }
|
||||||
if (occurrences > 1) { rejected.push({patch: p, reason: `old_string appears ${occurrences}× (not unique)`}); continue; }
|
if (occurrences > 1) { rejected.push({patch: p, reason: `old_string appears ${occurrences}× (not unique)`}); continue; }
|
||||||
// Size gate — no patch touches > 20 lines (diff discipline).
|
// Size gate — no patch touches > 6 lines (diff discipline; matches prompt).
|
||||||
|
// Raised from 20 to 6 after 2026-04-24 data showed most 10-20 line patches
|
||||||
|
// cascaded and broke the build. Mechanical changes genuinely fit in 6 lines.
|
||||||
const oldLines = p.old_string.split("\n").length;
|
const oldLines = p.old_string.split("\n").length;
|
||||||
const newLines = p.new_string.split("\n").length;
|
const newLines = p.new_string.split("\n").length;
|
||||||
if (Math.max(oldLines, newLines) > 20) { rejected.push({patch: p, reason: `patch too large (${Math.max(oldLines,newLines)} lines)`}); continue; }
|
if (Math.max(oldLines, newLines) > 6) { rejected.push({patch: p, reason: `patch too large (${Math.max(oldLines,newLines)} lines, max 6)`}); continue; }
|
||||||
source = source.replace(p.old_string, p.new_string);
|
source = source.replace(p.old_string, p.new_string);
|
||||||
applied++;
|
applied++;
|
||||||
}
|
}
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user