Three bundled changes that round out the KB enrichment pipeline
(PR #9 commits B/C/D compressed into one — they all touch the same
persist surfaces so splitting them would just add noise):
B. scrum_master reviews now route accepted review bodies through
fact_extractor (same llm_team extract pipeline as inference) and
append to data/_kb/audit_facts.jsonl tagged source:"scrum_review".
One KB, two producers — downstream consumers can filter by source
when they care about provenance. Skips reviews <120 chars
(one-liners / LGTM-type comments with no extractable knowledge).
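The routing in B can be sketched as follows. This is a minimal Python sketch, not the actual implementation: `persist_review_facts`, the `extract_facts` callable, and the 120-char constant name are illustrative stand-ins for the real fact_extractor plumbing.

```python
import json
import time

MIN_REVIEW_CHARS = 120  # below this, treat as a one-liner / "LGTM" with nothing extractable


def persist_review_facts(review_body, extract_facts,
                         kb_path="data/_kb/audit_facts.jsonl"):
    """Route an accepted scrum review through fact extraction and append
    each fact to the shared KB, tagged with its producer."""
    if len(review_body) < MIN_REVIEW_CHARS:
        return []  # skip short reviews entirely
    facts = extract_facts(review_body)  # same extract pipeline inference uses
    with open(kb_path, "a") as kb:
        for fact in facts:
            kb.write(json.dumps({
                "fact": fact,
                "source": "scrum_review",  # lets consumers filter by provenance
                "persisted_at": time.time(),
            }) + "\n")
    return facts
```

Tagging `source` per row is what makes "one KB, two producers" workable: consumers that care about provenance filter on it, everyone else reads the file as one stream.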
C. Verifier-gated fact persistence. fact_extractor now parses the
verifier's free-form prose into per-fact verdicts (CORRECT /
INCORRECT / UNVERIFIABLE / UNCHECKED). Facts marked INCORRECT are
dropped on write; CORRECT + UNVERIFIABLE + UNCHECKED are kept
(dropping UNVERIFIABLE would lose ~90% of real signal — the
verifier's prior-knowledge base doesn't know Lakehouse internals,
so domain-specific facts read as UNVERIFIABLE by default).
verifier_verdicts array is persisted alongside facts so downstream
queries can surface high-confidence facts (CORRECT) separately
from provisional ones (UNVERIFIABLE).
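The gating logic in C reduces to a filter over (fact, verdict) pairs. A minimal sketch under the stated keep/drop policy; the hard part (parsing the verifier's free-form prose into verdicts) is elided, and the function name is illustrative:

```python
KEEP_VERDICTS = {"CORRECT", "UNVERIFIABLE", "UNCHECKED"}  # only INCORRECT is dropped


def gate_facts(facts, verdicts):
    """Drop facts judged INCORRECT; keep the rest, persisting their verdicts
    alongside so downstream queries can separate high-confidence (CORRECT)
    facts from provisional (UNVERIFIABLE) ones. A fact with no matching
    verdict defaults to UNCHECKED."""
    paired = [
        (fact, verdicts[i] if i < len(verdicts) else "UNCHECKED")
        for i, fact in enumerate(facts)
    ]
    kept = [(f, v) for f, v in paired if v in KEEP_VERDICTS]
    return {
        "facts": [f for f, _ in kept],
        "verifier_verdicts": [v for _, v in kept],
    }
```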
schema_version:2 added to both scrum_reviews.jsonl and
audit_facts.jsonl writes. Old (v1) rows remain readable; new rows
get the field so the forward-compat reader in kb_query can
differentiate.
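The forward-compat read path can be as simple as defaulting the missing field. A sketch, assuming the kb_query reader is line-oriented JSONL; the function name is illustrative:

```python
import json


def read_kb_rows(path):
    """Yield KB rows, defaulting a missing schema_version to 1 so old (v1)
    rows stay readable next to explicitly tagged v2 rows."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            row.setdefault("schema_version", 1)
            yield row
```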
D. scrum_master_reviewed:true flag added to scrum_reviews.jsonl
rows on accept. Future kb_query surfacing can filter by this
(e.g., "show me PRs where a scrum review exists vs only inference"
as governance signal). Also carried into audit_facts.jsonl when
the scrum_review source path writes there.
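The governance query enabled by D is a one-flag partition. A trivial sketch with an illustrative function name:

```python
def split_by_review_coverage(rows):
    """Partition KB rows into those backed by an accepted scrum review
    (scrum_master_reviewed: true) and those produced by inference alone."""
    reviewed = [r for r in rows if r.get("scrum_master_reviewed")]
    inference_only = [r for r in rows if not r.get("scrum_master_reviewed")]
    return reviewed, inference_only
```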
Extends the scrum-master pipeline to handle input overflow on large
source files (>6KB). Previously, the review prompt truncated the file
to its first chunk, which caused false-positive "field is missing"
findings whenever the actual field was past the cutoff.
Now each file >FILE_TREE_SPLIT_THRESHOLD (6000) is sharded at
FILE_SHARD_SIZE (3500), each shard summarized via gpt-oss:120b cloud,
and the distillations merged into a scratchpad. The review then runs
against the scratchpad with an explicit truncation-awareness clause
in the prompt: "DO NOT claim any field, function, or feature is
'missing' based on its absence from this distillation."
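The shard-and-distill step can be sketched like this, using the two constants named above. The summarizer callable stands in for the gpt-oss:120b cloud call; `build_scratchpad` is an illustrative name, not the real function:

```python
FILE_TREE_SPLIT_THRESHOLD = 6000  # chars; above this, shard instead of truncating
FILE_SHARD_SIZE = 3500            # chars per shard


def build_scratchpad(source, summarize):
    """Small files pass through untouched. Larger files are cut into
    fixed-size shards, each shard summarized, and the distillations
    merged into one scratchpad the review runs against."""
    if len(source) <= FILE_TREE_SPLIT_THRESHOLD:
        return source
    shards = [source[i:i + FILE_SHARD_SIZE]
              for i in range(0, len(source), FILE_SHARD_SIZE)]
    return "\n\n".join(summarize(shard) for shard in shards)
```

At these constants a 92KB file lands in the high-20s shard count, consistent with the 27-shard run reported below.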
Also writes each accepted review as a JSONL row to
data/_kb/scrum_reviews.jsonl (file, reviewed_at, accepted_model,
accepted_on_attempt, attempts_made, tree_split_fired, preview).
This is the source the auditor's kb_query reads to surface
per-file scrum reviews on PRs that touch those files (cohesion
plan Phase C).
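A row with the fields listed above could be built like this. Sketch only: the function name and the 200-char preview length are assumptions, not the shipped values.

```python
import time


def scrum_review_row(file, accepted_model, accepted_on_attempt,
                     attempts_made, tree_split_fired, review_text,
                     preview_chars=200):
    """One scrum_reviews.jsonl row: provenance fields plus a short
    preview of the accepted review body."""
    return {
        "file": file,
        "reviewed_at": time.time(),
        "accepted_model": accepted_model,
        "accepted_on_attempt": accepted_on_attempt,
        "attempts_made": attempts_made,
        "tree_split_fired": tree_split_fired,
        "preview": review_text[:preview_chars],
    }
```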
Verified: scrum review of 92KB playbook_memory.rs → 27 shards via
cloud → distilled scratchpad → qwen3.5 local 7B accepted on attempt 1
(5931 chars). Tree-split fires, jsonl row appended, output file
contains structured suggestions.
The orchestrator J described: pulls git repo source + PRD +
suggested-changes doc, chunks them, hands each code piece through
the proven escalation ladder with learning context, collects
per-file suggestions in a consolidated handoff report.
Composes ONLY already-shipped primitives — no new core code:
- chunker with 800-char / 120-overlap windows
- sidecar /embed for real nomic-embed-text embeddings
- in-memory cosine retrieval for top-5 PRD + top-5 proposal
chunks per target file
- escalation ladder (qwen3.5 → qwen3 → gpt-oss:20b → gpt-oss:120b
→ devstral-2:123b → mistral-large-3:675b)
- per-attempt learning-context injection (prior failures as
"do not repeat" block)
- acceptance rubric (length ≥ 200 chars + structured form)
Live run (tests/real-world/runs/scrum_moatqkee/):
targets: 3 files
- crates/vectord/src/playbook_memory.rs (920 lines)
- crates/vectord/src/doc_drift.rs (163 lines)
- auditor/audit.ts (170 lines)
resolved: 3/3 on attempt 1 by qwen3.5:latest local 7B
total duration: 111.7s
output: scrum_report.md + per-file JSON
Sample from scrum_report.md (playbook_memory.rs review):
- Alignment score: 9/10 vs PRD Phase 19
- 4 concrete change suggestions naming specific lines + PLAN/PRD
chunk offsets
- 3 gap analyses with PRD-reference citations
Honest findings from this run:
1. Local 7B handled review-style tasks first-try. The escalation
ladder infrastructure is live but didn't fire — review is an
easier task shape than strict code-generation (see hard_task
test which needed devstral-2 specialist).
2. 6KB file-truncation caused one false positive: model claimed
playbook_memory.rs lacks a `doc_refs` field, but that field
exists past the 6KB cutoff. Trade-off between context-size
and review-depth needs tuning per file.
3. Chunk-offset citations are real: model output includes
`[PRD @27880]` and `[PLAN @16320]` which map to the actual
byte offsets of retrieved context chunks. Auditor pattern could
adopt this for traceable claims.
This is the scrum-master-handoff shape J asked for:
repo + PRD + proposal → chunk → retrieve → escalate → consolidate
→ human-reviewable markdown report
Not shipping: per-PR diff analysis, open-PR integration, Gitea
posting of suggestions. Those compose the same primitives
differently — this proves the core pattern.
Env override: LH_SCRUM_FILES=path1,path2,... to target a different
file set. Default 3 files keeps runtime ~2min.
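Reading the override is straightforward comma-split parsing. A sketch assuming the default target list matches the three files from the live run; the function name is illustrative:

```python
import os

DEFAULT_FILES = [
    "crates/vectord/src/playbook_memory.rs",
    "crates/vectord/src/doc_drift.rs",
    "auditor/audit.ts",
]


def target_files():
    """LH_SCRUM_FILES=path1,path2,... overrides the default file set;
    empty or unset falls back to the 3-file default (~2min runtime)."""
    override = os.environ.get("LH_SCRUM_FILES", "").strip()
    if override:
        return [p.strip() for p in override.split(",") if p.strip()]
    return DEFAULT_FILES
```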
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>