3 Commits

profit
56dbfb7d03 fact_extractor: project context + fixed verifier-verdict parser
Two bundled changes. Both came out of J's observation that the
verifier was defaulting to UNVERIFIABLE on domain-specific facts
because it had no idea what Lakehouse was, which project's code it
was reading, or what framework the types belonged to.

1. Project context preamble. Added docs/AUDITOR_CONTEXT.md — a concise
   description of the project in under 400 words (crates, services,
   architecture phases, the auditor's own role). fact_extractor
   reads it once, caches it, prepends it to the extract prompt as a
   "PROJECT CONTEXT (for grounding; do NOT extract from this)"
   section. Both extractor and verifier now see this context, so
   statements like "aggregate<T> returns Map<string, AggregateRow>"
   get grounded as "this is a TypeScript function in the Lakehouse
   auditor subsystem" and the verifier can reason about plausibility
   instead of guessing.
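
A minimal sketch of the read-once / cache / prepend step (function names, the cache shape, and the graceful fallback for a missing file are assumptions, not the actual fact_extractor code):

```typescript
import { readFileSync } from "fs";

// Cache keyed by path so the context file is read at most once per process.
const contextCache = new Map<string, string>();

function loadProjectContext(path = "docs/AUDITOR_CONTEXT.md"): string {
  if (!contextCache.has(path)) {
    try {
      contextCache.set(path, readFileSync(path, "utf8").trim());
    } catch {
      contextCache.set(path, ""); // missing file degrades to no preamble
    }
  }
  return contextCache.get(path)!;
}

function buildExtractPrompt(scratchpad: string, contextPath?: string): string {
  const ctx = loadProjectContext(contextPath);
  if (!ctx) return scratchpad;
  return `PROJECT CONTEXT (for grounding; do NOT extract from this):\n${ctx}\n\n${scratchpad}`;
}
```

One consequence of the cache: edits to AUDITOR_CONTEXT.md only take effect on the next process start.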

2. Verifier-verdict parser fix. Gemma2's output format varies between
   "**Verdict:** CORRECT" and just "* **CORRECT**" inline (observed
   variance across runs). The old regex required "Verdict:" as a
   label and missed the second format — causing all verdicts to
   stay UNCHECKED. Replaced with a two-pass approach: find each
   fact section start ("**N.**" or "N."), slice to the next section,
   scan the slice for the first CORRECT|INCORRECT|UNVERIFIABLE
   token. Handles both formats plus unfenced fallback.
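
The two-pass scan could be sketched like this (a simplified illustration; the real parser's section detection and fallback handling may differ):

```typescript
type Verdict = "CORRECT" | "INCORRECT" | "UNVERIFIABLE" | "UNCHECKED";

function parseVerdicts(prose: string, factCount: number): Verdict[] {
  const verdicts: Verdict[] = new Array(factCount).fill("UNCHECKED");

  // Pass 1: locate each fact section header, "**N.**" or plain "N.".
  const header = /^\s*(?:\*\*)?(\d+)[.)](?:\*\*)?/gm;
  const starts: { n: number; idx: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = header.exec(prose)) !== null) {
    starts.push({ n: parseInt(m[1], 10), idx: m.index });
  }

  // Pass 2: slice each section to the next header and take the first
  // verdict token, so "**Verdict:** CORRECT" and "* **CORRECT**" both hit.
  for (let i = 0; i < starts.length; i++) {
    const { n, idx } = starts[i];
    if (n < 1 || n > factCount) continue;
    const end = i + 1 < starts.length ? starts[i + 1].idx : prose.length;
    const slice = prose.slice(idx, end);
    const tok = slice.match(/\b(CORRECT|INCORRECT|UNVERIFIABLE)\b/);
    if (tok) verdicts[n - 1] = tok[1] as Verdict;
  }
  return verdicts;
}
```

The `\b` boundaries matter: they keep `CORRECT` from matching inside `INCORRECT`.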

Verified: 4-fact test extraction went from 0/4 verdicts scored
(pre-fix) to 2/4 CORRECT + 2/4 UNVERIFIABLE (post-fix). The 2
UNVERIFIABLE cases are domain-specific code behavior the verifier
legitimately can't confirm without reading source — correct stance,
not a parser miss.

No new consensus modes yet. J suggested adding codereview or
validator as a second pass; holding until we see whether context
injection alone gives sufficient signal lift.
2026-04-23 00:26:01 -05:00
profit
181c35b829 scrum_master fact extraction + verifier gate + schema_version bump
Three bundled changes that round out the KB enrichment pipeline
(PR #9 commits B/C/D compressed into one — they all touch the same
persist surfaces so splitting them would just add noise):

B. scrum_master reviews now route accepted review bodies through
   fact_extractor (same llm_team extract pipeline as inference) and
   append to data/_kb/audit_facts.jsonl tagged source:"scrum_review".
   One KB, two producers — downstream consumers can filter by source
   when they care about provenance. Skips reviews <120 chars
   (one-liners / LGTM-type comments with no extractable knowledge).
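
In sketch form (the 120-char threshold and source tag are from the message; the helper name and row shape are illustrative):

```typescript
import { appendFileSync } from "fs";

const MIN_REVIEW_CHARS = 120; // skip LGTM-style one-liners

function persistReviewFacts(
  reviewBody: string,
  facts: string[],
  path = "data/_kb/audit_facts.jsonl",
): boolean {
  if (reviewBody.trim().length < MIN_REVIEW_CHARS) return false;
  // Same KB file as the inference producer; the source tag carries provenance
  // so downstream consumers can filter by producer.
  const row = { source: "scrum_review", facts };
  appendFileSync(path, JSON.stringify(row) + "\n");
  return true;
}
```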

C. Verifier-gated fact persistence. fact_extractor now parses the
   verifier's free-form prose into per-fact verdicts (CORRECT /
   INCORRECT / UNVERIFIABLE / UNCHECKED). Facts marked INCORRECT are
   dropped on write; CORRECT + UNVERIFIABLE + UNCHECKED are kept
   (dropping UNVERIFIABLE would lose ~90% of real signal — the
   verifier's prior-knowledge base doesn't know Lakehouse internals,
   so domain-specific facts read as UNVERIFIABLE by default).

   verifier_verdicts array is persisted alongside facts so downstream
   queries can surface high-confidence facts (CORRECT) separately
   from provisional ones (UNVERIFIABLE).

   schema_version:2 added to both scrum_reviews.jsonl and
   audit_facts.jsonl writes. Old (v1) rows remain readable; new rows
   get the field so the forward-compat reader in kb_query can
   differentiate.
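
A compact sketch of the gate and the version default (types and names are illustrative; only the drop-INCORRECT rule and the default-to-v1 read come from the message):

```typescript
type Verdict = "CORRECT" | "INCORRECT" | "UNVERIFIABLE" | "UNCHECKED";

interface VerifiedFact { text: string; verdict: Verdict; }

// Drop only INCORRECT; keep provisional verdicts so domain-specific
// facts the verifier can't confirm still land in the KB.
function gateFacts(facts: string[], verdicts: Verdict[]): VerifiedFact[] {
  return facts
    .map((text, i) => ({ text, verdict: verdicts[i] ?? "UNCHECKED" }))
    .filter((f) => f.verdict !== "INCORRECT");
}

// v1 rows predate the field, so a missing schema_version reads as 1.
function rowVersion(row: { schema_version?: number }): number {
  return row.schema_version ?? 1;
}
```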

D. scrum_master_reviewed:true flag added to scrum_reviews.jsonl
   rows on accept. Future kb_query surfacing can filter by this
   (e.g., "show me PRs where a scrum review exists vs only inference"
   as governance signal). Also carried into audit_facts.jsonl when
   the scrum_review source path writes there.
2026-04-22 23:40:21 -05:00
profit
77650c4ba3 auditor: inference curation layer + llm_team fact extraction → KB
Closes the cycle J asked for: curated cloud output lands structured
knowledge in the KB so future audits have architectural context, not
just a log of per-finding signatures.

Three pieces:

1. Inference curation (tree-split) — when diff > 30KB, shard at 4.5KB,
   summarize each shard via cloud (temp=0, think=false on small
   shards; think=true on main call). Merge into scratchpad. The cloud
   verification then runs against the scratchpad, not truncated raw.
   Eliminates the 40KB MAX_DIFF_CHARS truncation path for large PRs
   (PR #8 is 102KB — was losing 62KB). Anti-false-positive guard in
   the prompt: cloud is told scratchpad absence is NOT diff absence,
   so it doesn't flag curated-out symbols as missing. unflagged_gaps
   section is dropped entirely when curated (scratchpad can't ground
   them).
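
The size thresholds as a sketch (constants are from the message; the per-shard cloud summarize step and scratchpad merge are elided):

```typescript
const CURATE_THRESHOLD = 30 * 1024;        // diff > 30KB triggers curation
const SHARD_SIZE = Math.round(4.5 * 1024); // 4.5KB shards

// Small diffs pass through untouched; large ones are cut into fixed-size
// shards for per-shard summarization.
function shardDiff(diff: string): string[] {
  if (diff.length <= CURATE_THRESHOLD) return [diff];
  const shards: string[] = [];
  for (let i = 0; i < diff.length; i += SHARD_SIZE) {
    shards.push(diff.slice(i, i + SHARD_SIZE));
  }
  return shards;
}
```

With these constants a 102KB diff comes out as 23 shards, matching the PR #8 numbers quoted below in this log.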

2. fact_extractor — TS client for llm_team_ui's extract-facts mode at
   localhost:5000/api/run. Sends curated scratchpad through qwen2.5
   extractor + gemma2 verifier, parses SSE stream, returns structured
   {facts, entities, relationships, verification, llm_team_run_id}.
   Best-effort: if llm_team is down, extraction fails silently and
   the audit still completes. The call is awaited so CLI tools
   (audit_one.ts) don't exit before extraction lands; the systemd
   poller has 90s of headroom, so the extra ~15s doesn't matter.
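
A sketch of the best-effort client (the endpoint is from the message; the payload shape and plain-JSON response are simplifications, since the real client parses an SSE stream):

```typescript
interface ExtractResult {
  facts: string[];
  entities: string[];
  relationships: string[];
  llm_team_run_id?: string;
}

// Best-effort: any transport or server error resolves to null, so the
// audit itself never fails because llm_team is down.
async function extractFacts(
  scratchpad: string,
  url = "http://localhost:5000/api/run",
): Promise<ExtractResult | null> {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ mode: "extract-facts", input: scratchpad }),
    });
    if (!res.ok) return null;
    // Simplification: the real client reads and parses an SSE stream here.
    return (await res.json()) as ExtractResult;
  } catch {
    return null;
  }
}
```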

3. audit_facts.jsonl + checkAuditFacts() — one row per curated audit
   with the extraction result. kb_query tails the jsonl, explodes
   entity rows, aggregates by entity name with distinct-PR counting,
   surfaces entities recurring in 2+ PRs as info findings. Filters
   out short names (<3 chars, extractor truncation artifacts) and
   generic types (string/number/etc.) so signal isn't drowned.
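
The aggregation could look roughly like this (row shape and the generic-type list are assumptions; the 2-PR threshold and the <3-char filter are from the message):

```typescript
const GENERIC_TYPES = new Set(["string", "number", "boolean", "object", "any", "void"]);

interface FactRow { pr_number: number; entities: string[]; }

// Count distinct PRs per entity name, filtering noise before surfacing.
function recurringEntities(rows: FactRow[], minPrs = 2): Map<string, number> {
  const prsByEntity = new Map<string, Set<number>>();
  for (const row of rows) {
    for (const name of row.entities) {
      if (name.length < 3 || GENERIC_TYPES.has(name.toLowerCase())) continue;
      if (!prsByEntity.has(name)) prsByEntity.set(name, new Set());
      prsByEntity.get(name)!.add(row.pr_number);
    }
  }
  const out = new Map<string, number>();
  for (const [name, prs] of prsByEntity) {
    if (prs.size >= minPrs) out.set(name, prs.size);
  }
  return out;
}
```

Counting distinct PRs (rather than raw mentions) is what keeps one chatty extraction from promoting an entity to an info finding.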

Verified end-to-end on PR #8: 102KB diff → 23 shards → 1KB scratchpad
→ qwen2.5 extracted 4 facts + 6 entities + 6 relationships (real
code-level knowledge: AggregateOptions<T> type, aggregate<T> async
function with real signature, typed relationships). llm_team_run_id
cross-references to llm_team's own team_runs table.

Also: audit.ts passes (pr_number, head_sha) as InferenceContext so
extracted facts are scope-tagged for the KB index.
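
In sketch form (the field names come from the message; the interface and tagging helper are illustrative, not the real audit.ts types):

```typescript
// Illustrative shape only; the real InferenceContext may carry more fields.
interface InferenceContext {
  pr_number: number;
  head_sha: string;
}

// Stamp each extracted fact with its originating PR and commit so the
// KB index can scope queries.
function scopeTag(facts: string[], ctx: InferenceContext) {
  return facts.map((text) => ({ text, ...ctx }));
}
```
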
2026-04-22 23:09:14 -05:00