3 Commits
56dbfb7d03
fact_extractor: project context + fixed verifier-verdict parser
Checks failed: lakehouse/auditor reported 8 warnings (see review).
Two bundled changes. Both came out of J's observation that the
verifier was defaulting to UNVERIFIABLE on domain-specific facts
because it had no idea what Lakehouse was, which project's code it
was reading, or what framework the types belonged to.
1. Project context preamble. Added docs/AUDITOR_CONTEXT.md, a concise
(under 400 words) description of the project (crates, services,
architecture phases, the auditor's own role). fact_extractor
reads it once, caches it, prepends it to the extract prompt as a
"PROJECT CONTEXT (for grounding; do NOT extract from this)"
section. Both extractor and verifier now see this context, so
statements like "aggregate<T> returns Map<string, AggregateRow>"
get grounded as "this is a TypeScript function in the Lakehouse
auditor subsystem" and the verifier can reason about plausibility
instead of guessing.
2. Verifier-verdict parser fix. Gemma2's output format varies between
"**Verdict:** CORRECT" and just "* **CORRECT**" inline (observed
variance across runs). The old regex required "Verdict:" as a
label and missed the second format — causing all verdicts to
stay UNCHECKED. Replaced with a two-pass approach: find each
fact section start ("**N.**" or "N."), slice to the next section,
scan the slice for the first CORRECT|INCORRECT|UNVERIFIABLE
token. Handles both formats plus unfenced fallback.
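The two-pass parse described in item 2 amounts to something like the
sketch below; `parseVerdicts` and `Verdict` are illustrative names,
not the real module API, and the actual regexes may differ:

```typescript
type Verdict = "CORRECT" | "INCORRECT" | "UNVERIFIABLE" | "UNCHECKED";

// Pass 1: locate each fact section start ("**N.**" or "N." at line
// start). Pass 2: slice up to the next section start and scan the
// slice for the first verdict token, so both "**Verdict:** CORRECT"
// and the bare "* **CORRECT**" inline form are caught.
function parseVerdicts(output: string, factCount: number): Verdict[] {
  const verdicts: Verdict[] = new Array(factCount).fill("UNCHECKED");
  const sectionRe = /^\s*(?:\*\*)?(\d+)\.(?:\*\*)?/gm;
  const starts: { n: number; idx: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = sectionRe.exec(output)) !== null) {
    starts.push({ n: Number(m[1]), idx: m.index });
  }
  for (let i = 0; i < starts.length; i++) {
    const { n, idx } = starts[i];
    if (n < 1 || n > factCount) continue; // ignore stray numbers
    const end = i + 1 < starts.length ? starts[i + 1].idx : output.length;
    const slice = output.slice(idx, end);
    const tok = slice.match(/\b(CORRECT|INCORRECT|UNVERIFIABLE)\b/);
    if (tok) verdicts[n - 1] = tok[1] as Verdict;
  }
  return verdicts;
}
```

Facts whose section never yields a token stay UNCHECKED rather than
being guessed, which matches the gate's keep-by-default behavior.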
Verified: 4-fact test extraction went from 0/4 verdicts scored
(pre-fix) to 2/4 CORRECT + 2/4 UNVERIFIABLE (post-fix). The 2
UNVERIFIABLE cases are domain-specific code behavior the verifier
legitimately can't confirm without reading source — correct stance,
not a parser miss.
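The context-preamble mechanism from item 1 can be sketched roughly as
follows; `loadProjectContext` and `buildExtractPrompt` are
illustrative names for this sketch, not the actual fact_extractor API:

```typescript
import { readFileSync } from "fs";

let cachedContext: string | null = null;

// Read docs/AUDITOR_CONTEXT.md once per process and cache it; a
// missing context file degrades gracefully to an empty preamble.
function loadProjectContext(path = "docs/AUDITOR_CONTEXT.md"): string {
  if (cachedContext === null) {
    try {
      cachedContext = readFileSync(path, "utf8").trim();
    } catch {
      cachedContext = "";
    }
  }
  return cachedContext;
}

// Prepend the context as a clearly labeled grounding section so the
// extractor knows not to mine facts from the preamble itself.
function buildExtractPrompt(scratchpad: string, context: string): string {
  if (!context) return scratchpad;
  return [
    "PROJECT CONTEXT (for grounding; do NOT extract from this):",
    context,
    "",
    "CONTENT TO EXTRACT FROM:",
    scratchpad,
  ].join("\n");
}
```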
No new consensus modes yet. J suggested adding codereview or
validator as a second pass; holding until we see whether context
injection alone gives sufficient signal lift.
181c35b829
scrum_master fact extraction + verifier gate + schema_version bump
Three bundled changes that round out the KB enrichment pipeline (PR #9
commits B/C/D compressed into one: they all touch the same persist
surfaces, so splitting them would just add noise).
B. scrum_master reviews now route accepted review bodies through
fact_extractor (the same llm_team extract pipeline used for
inference) and append to data/_kb/audit_facts.jsonl tagged
source:"scrum_review". One KB, two producers; downstream consumers
can filter by source when they care about provenance. Reviews under
120 chars are skipped (one-liners and LGTM-type comments with no
extractable knowledge).
C. Verifier-gated fact persistence. fact_extractor now parses the
verifier's free-form prose into per-fact verdicts (CORRECT /
INCORRECT / UNVERIFIABLE / UNCHECKED). Facts marked INCORRECT are
dropped on write; CORRECT, UNVERIFIABLE, and UNCHECKED are kept.
Dropping UNVERIFIABLE would lose ~90% of real signal: the verifier's
prior-knowledge base doesn't know Lakehouse internals, so
domain-specific facts read as UNVERIFIABLE by default. The
verifier_verdicts array is persisted alongside facts so downstream
queries can surface high-confidence facts (CORRECT) separately from
provisional ones (UNVERIFIABLE). schema_version:2 added to both
scrum_reviews.jsonl and audit_facts.jsonl writes; old (v1) rows
remain readable, and new rows get the field so the forward-compat
reader in kb_query can differentiate.
D. scrum_master_reviewed:true flag added to scrum_reviews.jsonl rows
on accept. Future kb_query surfacing can filter by this (e.g., "show
me PRs where a scrum review exists vs. only inference" as a
governance signal). Also carried into audit_facts.jsonl when the
scrum_review source path writes there.
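The verdict-gated write in item C can be sketched as below; the row
shape and the `gateFacts` helper are assumptions for illustration,
not the actual persist code:

```typescript
type Verdict = "CORRECT" | "INCORRECT" | "UNVERIFIABLE" | "UNCHECKED";

interface FactRow {
  schema_version: 2; // v2 rows carry the field; v1 rows stay readable
  source: "inference" | "scrum_review";
  facts: string[];
  verifier_verdicts: Verdict[];
}

// Drop INCORRECT facts on write; keep CORRECT, UNVERIFIABLE, and
// UNCHECKED, since dropping UNVERIFIABLE would discard most
// domain-specific facts. Verdicts are persisted alongside the facts.
function gateFacts(
  facts: string[],
  verdicts: Verdict[],
  source: FactRow["source"],
): FactRow {
  const kept: string[] = [];
  const keptVerdicts: Verdict[] = [];
  facts.forEach((f, i) => {
    const v = verdicts[i] ?? "UNCHECKED"; // missing verdict = UNCHECKED
    if (v === "INCORRECT") return;
    kept.push(f);
    keptVerdicts.push(v);
  });
  return {
    schema_version: 2,
    source,
    facts: kept,
    verifier_verdicts: keptVerdicts,
  };
}
```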
77650c4ba3
auditor: inference curation layer + llm_team fact extraction → KB
Closes the cycle J asked for: curated cloud output lands structured
knowledge in the KB so future audits have architectural context, not
just a log of per-finding signatures. Three pieces:
1. Inference curation (tree-split). When diff > 30KB, shard at 4.5KB
and summarize each shard via cloud (temp=0, think=false on small
shards; think=true on the main call), then merge into a scratchpad.
Cloud verification then runs against the scratchpad, not truncated
raw diff. This eliminates the 40KB MAX_DIFF_CHARS truncation path for
large PRs (PR #8 is 102KB; we were losing 62KB). Anti-false-positive
guard in the prompt: cloud is told scratchpad absence is NOT diff
absence, so it doesn't flag curated-out symbols as missing. The
unflagged_gaps section is dropped entirely when curated (the
scratchpad can't ground them).
2. fact_extractor: a TS client for llm_team_ui's extract-facts mode
at localhost:5000/api/run. Sends the curated scratchpad through the
qwen2.5 extractor + gemma2 verifier, parses the SSE stream, and
returns structured {facts, entities, relationships, verification,
llm_team_run_id}. Best-effort: if llm_team is down, extraction fails
silently and the audit still completes. AWAITED so CLI tools
(audit_one.ts) don't exit before extraction lands; the systemd poller
has 90s headroom, so the extra ~15s doesn't matter.
3. audit_facts.jsonl + checkAuditFacts(): one row per curated audit
with the extraction result. kb_query tails the jsonl, explodes entity
rows, aggregates by entity name with distinct-PR counting, and
surfaces entities recurring in 2+ PRs as info findings. Filters out
short names (<3 chars, extractor truncation artifacts) and generic
types (string/number/etc.) so signal isn't drowned.
Verified end-to-end on PR #8: 102KB diff → 23 shards → 1KB scratchpad
→ qwen2.5 extracted 4 facts + 6 entities + 6 relationships (real
code-level knowledge: the AggregateOptions<T> type, the aggregate<T>
async function with its real signature, typed relationships).
llm_team_run_id cross-references llm_team's own team_runs table.
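The size thresholds in piece 1 imply a sharding step roughly like
this sketch (`shardDiff` is a hypothetical name; the real splitter
may respect hunk boundaries rather than cutting at fixed offsets):

```typescript
// Constants taken from the commit message: curation kicks in above
// 30KB, and shards are 4.5KB each.
const CURATE_THRESHOLD = 30 * 1024;
const SHARD_SIZE = 4.5 * 1024; // 4608 bytes

// Small diffs pass through unchanged; large diffs become fixed-size
// shards, each summarized independently before the merged-scratchpad
// verification pass.
function shardDiff(diff: string): string[] {
  if (diff.length <= CURATE_THRESHOLD) return [diff];
  const shards: string[] = [];
  for (let i = 0; i < diff.length; i += SHARD_SIZE) {
    shards.push(diff.slice(i, i + SHARD_SIZE));
  }
  return shards;
}
```

At these constants a 102KB diff yields 23 shards, matching the PR #8
numbers quoted above.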
Also: audit.ts passes (pr_number, head_sha) as InferenceContext so extracted facts are scope-tagged for the KB index.
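The kb_query aggregation in piece 3 can be sketched as follows; the
row shape, `recurringEntities` name, and the generic-type list are
assumptions, not the actual kb_query code:

```typescript
interface AuditFactRow {
  pr_number: number;
  entities: string[];
}

// Generic built-in type names carry no project-specific signal.
const GENERIC_TYPES = new Set([
  "string", "number", "boolean", "object", "any", "void",
]);

// Explode entity rows, count distinct PRs per entity, and surface
// entities recurring in 2+ PRs; names under 3 chars (extractor
// truncation artifacts) and generic types are filtered out.
function recurringEntities(rows: AuditFactRow[], minPrs = 2): string[] {
  const prsByEntity = new Map<string, Set<number>>();
  for (const row of rows) {
    for (const name of row.entities) {
      if (name.length < 3 || GENERIC_TYPES.has(name.toLowerCase())) continue;
      if (!prsByEntity.has(name)) prsByEntity.set(name, new Set());
      prsByEntity.get(name)!.add(row.pr_number);
    }
  }
  return [...prsByEntity.entries()]
    .filter(([, prs]) => prs.size >= minPrs)
    .map(([name]) => name);
}
```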