lakehouse

Author	SHA1	Message	Date
profit	77650c4ba3	auditor: inference curation layer + llm_team fact extraction → KB Closes the cycle J asked for: curated cloud output lands structured knowledge in the KB so future audits have architectural context, not just a log of per-finding signatures. Three pieces: 1. Inference curation (tree-split) — when diff > 30KB, shard at 4.5KB, summarize each shard via cloud (temp=0, think=false on small shards; think=true on main call). Merge into scratchpad. The cloud verification then runs against the scratchpad, not truncated raw. Eliminates the 40KB MAX_DIFF_CHARS truncation path for large PRs (PR #8 is 102KB — was losing 62KB). Anti-false-positive guard in the prompt: cloud is told scratchpad absence is NOT diff absence, so it doesn't flag curated-out symbols as missing. unflagged_gaps section is dropped entirely when curated (scratchpad can't ground them). 2. fact_extractor — TS client for llm_team_ui's extract-facts mode at localhost:5000/api/run. Sends curated scratchpad through qwen2.5 extractor + gemma2 verifier, parses SSE stream, returns structured {facts, entities, relationships, verification, llm_team_run_id}. Best-effort: if llm_team is down, extraction fails silently and the audit still completes. AWAITED so CLI tools (audit_one.ts) don't exit before extraction lands — the systemd poller has 90s headroom so the extra ~15s doesn't matter. 3. audit_facts.jsonl + checkAuditFacts() — one row per curated audit with the extraction result. kb_query tails the jsonl, explodes entity rows, aggregates by entity name with distinct-PR counting, surfaces entities recurring in 2+ PRs as info findings. Filters out short names (<3 chars, extractor truncation artifacts) and generic types (string/number/etc.) so signal isn't drowned. Verified end-to-end on PR #8: 102KB diff → 23 shards → 1KB scratchpad → qwen2.5 extracted 4 facts + 6 entities + 6 relationships (real code-level knowledge: AggregateOptions<T> type, aggregate<T> async function with real signature, typed relationships). llm_team_run_id cross-references to llm_team's own team_runs table. Also: audit.ts passes (pr_number, head_sha) as InferenceContext so extracted facts are scope-tagged for the KB index.	2026-04-22 23:09:14 -05:00
profit	9d12a814e3	auditor: kb_index aggregator + nine-consecutive empirical test Some checks failed lakehouse/auditor 1 blocking issue: cloud: claim not backed — "the proven escalation ladder with learning context, collects" Phase 1 — definition-layer over append-only JSONL scratchpads. auditor/kb_index.ts is the single shared aggregator: aggregate<T>(jsonlPath, { keyFn, scopeFn, checkFn, tailLimit }) → Map<signature, {count, distinct_scopes, confidence, first_seen, last_seen, representative_summary, ...}> ratingSeverity(agg) — confidence × count severity policy shared across all KB readers. Kills the "same unfixed PR inflates its own recurrence score" failure mode by design: confidence = distinct_scopes/count, so same-scope noise stays below the 0.3 escalation threshold no matter how many times it repeats. checkAuditLessons now routes through aggregate + ratingSeverity. Net effect: the recurrence detector's bespoke Map/Set bookkeeping is gone; same behavior, shared discipline, reusable by scrum/observer. Also: symbolsExistInRepo now skips files >500KB so the audit can't get stuck slurping a fixture. Phase 2 — nine-consecutive audit runner. tests/real-world/nine_consecutive_audits.ts pushes 9 empty commits, waits for each verdict, captures the audit_lessons aggregate state after each run, reports: - sig_count trajectory (should stabilize, not grow linearly) - max_count trajectory (same-signature repeat rate) - max_confidence trajectory (must stay LOW on same-PR noise) - verdict_stable across runs (must NOT oscillate) This is the empirical proof that the KB compounds favorably: noise doesn't escalate itself, and signal stays distinguishable. Unit-tested both failure modes: same-PR × 9 repeats = conf=0.11 (info); cross-PR × 5 distinct = conf=1.00 (block). The rating function correctly discriminates.	2026-04-22 21:49:46 -05:00
profit	0306dd88c1	auditor: close the verdict→playbook loop + fix rubric-string false positive Some checks failed lakehouse/auditor 2 blocking issues: unimplemented!() macro call in tests/real-world/hard_task_escalation.ts Two changes that fell out of running the auto-loop for real on PR #8: 1. The systemd auditor blocked PR #8 on 'unimplemented!()' / 'todo!()' in tests/real-world/hard_task_escalation.ts — but those strings are the rubric itself, not macro calls. Added isInsideQuotedString() detection in static.ts: BLOCK_PATTERNS now skip matches that fall inside double-quoted / single-quoted / backtick string literals on the added line. WARN/INFO patterns still run — a TODO comment in a string is still a valid signal. 2. Verdicts were being persisted to disk but never fed back as learning signal. Added appendAuditLessons() — every block/warn finding writes a JSONL row to data/_kb/audit_lessons.jsonl with a path-agnostic signature (strips file paths, line numbers, commit hashes) so the SAME class of finding on DIFFERENT files dedups to one signature. kb_query now tails audit_lessons.jsonl and emits recurrence findings: 2 distinct PRs hit a signature = info, 3-4 = warn, 5+ = block. Severity ramps on distinct-PR count, not total rows, so a single unfixed PR being re-audited doesn't inflate its own recurrence score. Fires on post-verdict fire-and-forget (can't break the audit if disk write fails). The learning loop is now closed: each audit contributes to the KB that guides the next audit. Tested: unit tests for normalizedSignature confirmed path-agnostic dedup; static.ts regression tests confirmed rubric strings no longer trip BLOCK while real unquoted unimplemented!() still does.	2026-04-22 21:31:35 -05:00
profit	dc01ba0a3b	auditor: kb_query surfaces scrum-master reviews for files in PR diff Some checks failed lakehouse/auditor 2 blocking issues: unimplemented!() macro call in tests/real-world/hard_task_escalation.ts Wires the cohesion-plan Phase C link: the scrum-master pipeline writes per-file reviews to data/_kb/scrum_reviews.jsonl on accept; the auditor now reads that same file and emits one kb_query finding per scrum review whose `file` matches a path in the PR's diff. Severity heuristic: attempt 1-3 → info, attempt 4+ → warn. Reaching the cloud specialist (attempt 4+) means the ladder had to escalate, which is meaningful signal reviewers should see. Tree-split fired is also surfaced in the finding summary. audit.ts now passes pr.files.map(f => f.path) into runKbCheck (the old signature dropped it on the floor). Also adds auditor/audit_one.ts — a dry-run CLI for auditing a single PR without posting to Gitea, useful for verifying check behavior without spamming review comments. Verified: after writing scrum_reviews for auditor/audit.ts and mcp-server/observer.ts (both in PR #7), audit_one 7 surfaced both as info findings with preview + accepted_model + tree_split flag. A scrum review for playbook_memory.rs (NOT in PR #7) was correctly filtered out.	2026-04-22 21:18:21 -05:00
profit	039ed32411	Auditor: KB query check + verdict orchestrator + Gitea poster All checks were successful lakehouse/auditor all checks passed (4 findings, all info) auditor/checks/kb_query.ts (task #7) — reads data/_kb/outcomes.jsonl, error_corrections.jsonl, data/_observer/ops.jsonl, data/_bot/cycles/. Cheap/offline: no model calls, tail-reads only. Fail-rate >30% in recent scenario outcomes → warn; otherwise info. Live-proven: 1 finding emitted against current KB state (69 scenario runs, 27.7% fail rate — below warn threshold). auditor/audit.ts (task #8) — orchestrator. Runs static + dynamic + inference + kb_query in parallel, calls assembleVerdict, persists to data/_auditor/verdicts/, posts to Gitea (commit status + issue comment). AuditOptions supports skip_dynamic/skip_inference/dry_run for iteration. auditor/gitea.ts — added postIssueComment (author can comment on own PR, unlike postReview which self-review-blocks). static.ts — skip BLOCK_PATTERNS scan on auditor/checks/ and auditor/fixtures/* because those files legitimately contain the patterns as regex/string-literal data. WARN/INFO patterns (TODO comments, hardcoded placeholders) still run. Live-proven: dry-run audit of PR #1 after fix went from 13 block findings to 0 from static; 11 warn from inference still fire on real overreach claims. Dry-run audit against PR #1, skip_dynamic=true: verdict: block (BEFORE the static fix) verdict: request_changes (AFTER — inference correctly flagged "tasks 1-9 complete" as not backed; 0 false-positive blocks from static self-match) 42.5s total across checks (mostly cloud inference: 36s) 26 claims, 39KB diff Tasks 5 + 6 + 7 + 8 complete. Remaining: #9 (poller) + #10 (end-to-end proof) + #12 (upsert UPDATE merge fix).	2026-04-22 03:59:38 -05:00

5 Commits