Cross-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder via chatd's
/v1/chat) on the harness's first 4 commits surfaced 5 real bugs;
this commit lands the 4 inside the LLM/validator stack. B5 (scanner
skip-list semantics) ships separately as it changes scan behavior
on every target repo.

B1 (Kimi BLOCK + Opus WARN convergent) — internal/validators:
evidencePresent had two flaws: (1) cursor advanced on match in the
trim-line fallback, breaking same-line repeated matches AND skipping
not-yet-considered lines so out-of-order evidence spuriously failed;
(2) strings.Contains on a single `}` trim-matched any closing brace
in the file, defeating the "evidence quotes real text" contract.
Fix: trivial-evidence guard FIRST (reject anything <4 non-whitespace
chars) + per-line search no longer advances a cursor. New regression
test TestEvidencePresent_RejectsTrivialMatches covers `}`, `{`, `)`,
empty, and out-of-order multi-line evidence (which now passes —
order isn't part of the contract).
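
The B1 contract can be sketched as a small standalone program. This is a condensed illustration, not the committed code: evidenceMatches and nonWhitespaceLen are hypothetical names, the sample file content is made up, and the cases are assumed to resemble what the regression test covers.

```go
package main

import (
	"fmt"
	"strings"
)

// nonWhitespaceLen counts the runes in s that are not whitespace.
// Hypothetical helper for this sketch.
func nonWhitespaceLen(s string) int {
	return len([]rune(strings.Join(strings.Fields(s), "")))
}

// evidenceMatches rejects trivial quotes first, then accepts a verbatim
// hit, then falls back to requiring every evidence line to appear
// (after trimming) on some file line, in any order.
func evidenceMatches(content, evidence string) bool {
	if nonWhitespaceLen(evidence) < 4 {
		return false
	}
	if strings.Contains(content, evidence) {
		return true
	}
	fileLines := strings.Split(content, "\n")
	for _, ev := range strings.Split(evidence, "\n") {
		want := strings.TrimSpace(ev)
		if want == "" {
			continue
		}
		if nonWhitespaceLen(want) < 4 {
			return false
		}
		found := false
		for _, line := range fileLines {
			if strings.Contains(line, want) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	file := "func run() {\n\tpassword := \"hunter2\"\n\tlog.Println(password)\n}\n"
	cases := []struct{ name, evidence string }{
		{"lone brace", "}"},
		{"empty", ""},
		{"out of order", "log.Println(password)\npassword := \"hunter2\""},
		{"not in file", "os.Remove(path)"},
	}
	for _, c := range cases {
		fmt.Printf("%-12s -> %v\n", c.name, evidenceMatches(file, c.evidence))
	}
}
```

Only the out-of-order case is accepted; the lone brace, the empty quote, and the quote that is not in the file are all rejected.
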
B2 (Kimi WARN + Opus WARN convergent) — internal/pipeline:
WriteJSON error for rejected-findings.json was swallowed with
`if err == nil`, so a write failure left the validation phase
reporting status="ok" while the audit trail vanished. Mirror the
validated-findings branch: surface the error in
validatePhase.Errors + bump status to degraded + ExitCode=66.
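
A minimal sketch of the B2 pattern, under stated assumptions: phaseResult, writeJSON, and recordRejected are hypothetical stand-ins for the pipeline's real types and helpers, which this message does not show. Only the "degraded" status and exit code 66 come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// phaseResult is a hypothetical stand-in for the validation phase record.
type phaseResult struct {
	Status   string
	Errors   []string
	ExitCode int
}

// writeJSON marshals v and writes it to path, returning any error.
func writeJSON(path string, v any) error {
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

// recordRejected writes the audit file and degrades the phase when the
// write fails, rather than only handling the err == nil branch.
func recordRejected(phase *phaseResult, path string, rejected any) {
	if err := writeJSON(path, rejected); err != nil {
		phase.Errors = append(phase.Errors,
			fmt.Sprintf("write %s: %v", filepath.Base(path), err))
		phase.Status = "degraded"
		phase.ExitCode = 66
	}
}

func main() {
	phase := &phaseResult{Status: "ok"}
	// A path under a directory that does not exist forces the write to fail.
	bad := filepath.Join(os.TempDir(), "no-such-dir-for-demo", "rejected-findings.json")
	recordRejected(phase, bad, []string{"example"})
	fmt.Println(phase.Status, phase.ExitCode, len(phase.Errors))
}
```

The point of the shape is that the failure branch is the one that carries state: a successful write leaves the phase untouched, a failed one is visible in all three fields.
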
B3 (Kimi BLOCK + Opus BLOCK convergent) — internal/llm/ollama.go:
HealthCheck.basic_prompt_ok was set to true on ANY non-empty
response, so a model emitting `<think>...` traces or apologies
passed silently. Now requires the response to contain "OK"
(uppercase, substring). Substring rather than equality lets minor
whitespace/punctuation variations through (some models add a
trailing period). Errors now record what the model actually said
when it fails the check.
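
The B3 check can be sketched as below. basicPromptOK is a hypothetical helper name and the error wording is assumed; the only behavior taken from this message is the uppercase substring match on "OK" and the echo of the model's reply on failure.

```go
package main

import (
	"fmt"
	"strings"
)

// basicPromptOK reports whether a health-check reply contains the
// expected "OK" token. On failure the error includes the reply so the
// operator can see what the model actually said.
func basicPromptOK(reply string) (bool, error) {
	if strings.Contains(reply, "OK") {
		return true, nil
	}
	return false, fmt.Errorf("health check: expected reply containing %q, got %q", "OK", reply)
}

func main() {
	for _, reply := range []string{"OK.", " OK\n", "<think>let me see", "ok"} {
		ok, err := basicPromptOK(reply)
		fmt.Println(ok, err)
	}
}
```

A substring match is deliberately loose: it tolerates a trailing period or surrounding whitespace, but it also accepts any reply that happens to contain "OK" inside a longer word or sentence.
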
B4 (Opus BLOCK only — same class as today's chatd Anthropic-temp
fix) — internal/llm/ollama.go: chatBody had `if opts.Temperature != 0`
which silently dropped Temperature=0 from the request, so HealthCheck
+ Reviewer (both pass Temperature=0 expecting determinism) actually
ran at Ollama's ~0.8 default. Always forward Temperature now. The
two callers always set explicit values, so "0 means 0" is correct;
if a future caller wants Ollama's default they'll switch
CompleteOptions.Temperature to *float64 like chatd did this morning.
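
For reference, a sketch of the *float64 option mentioned above, which this commit does not adopt. The completeOptions type and the request shape (an "options" object with a "temperature" key) are assumptions for illustration and may differ from the real chatBody.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// completeOptions is an assumed shape with a pointer temperature, so
// "unset" and "explicitly 0" are distinguishable.
type completeOptions struct {
	Temperature *float64
}

// chatBody builds the request options. A nil pointer omits the key and
// leaves the server default in place; a non-nil pointer is always
// forwarded, including an explicit 0.
func chatBody(opts completeOptions) map[string]any {
	options := map[string]any{}
	if opts.Temperature != nil {
		options["temperature"] = *opts.Temperature
	}
	return map[string]any{"options": options}
}

func main() {
	zero := 0.0
	for _, o := range []completeOptions{{Temperature: &zero}, {}} {
		b, _ := json.Marshal(chatBody(o))
		fmt.Println(string(b))
	}
}
```

With a plain float64 field, a `!= 0` guard cannot tell "caller asked for 0" from "caller set nothing"; the pointer makes that distinction explicit at the cost of a slightly clumsier call site.
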
Verified end-to-end: insecure-repo + --enable-llm still produces 25
confirmed findings (16 static + 9 LLM), 0 rejected. Validator unit
tests: 11 pass (added TestEvidencePresent_RejectsTrivialMatches).

Same-day-as-shipping scrum, same-day-as-shipping fixes. The
convergent-≥2 gate caught 3 of these; the 4th was Opus-only but
verified by reading the code (same idiom as today's chatd bug).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
280 lines
9.1 KiB
Go
// Package validators cross-checks LLM-generated findings against
// real repository evidence. PROMPT.md / REVIEW_PIPELINE.md Phase 3:
// "AI may suggest. Code validates." Findings that pass validation
// move from status=suspected → status=confirmed; failures land in a
// separate rejected-findings.json with a per-rejection reason.
//
// V0 implements 3 hard checks per the PROMPT.md "Reject A Finding If"
// list:
//   - file does not exist
//   - cited evidence does not exist verbatim in the file
//   - line hint is impossible (file has fewer lines than claimed)
//
// 3 softer checks from the same list are NOT v0 — documented as
// "open" so the audit trail is honest:
//   - claim is unsupported (semantic, requires another LLM pass)
//   - suggested fix targets unrelated code (semantic)
//   - model invents tests/commands/files (covered by file-exists for
//     files; tests/commands need a Phase D+1 fact-check)
package validators

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"

	"local-review-harness/internal/analyzers"
)

// Reason captures why a finding was rejected. Stable strings so
// reports + receipts can group/sort by reason.
type Reason string

const (
	ReasonFileNotFound    Reason = "file_not_found"
	ReasonNoEvidence      Reason = "evidence_not_in_file"
	ReasonLineHintTooHigh Reason = "line_hint_exceeds_file_length"
	ReasonEmptyEvidence   Reason = "empty_evidence_field"
)

// Result is the validator's output for one finding.
type Result struct {
	Finding         analyzers.Finding `json:"finding"`
	Validated       bool              `json:"validated"`
	RejectionReason Reason            `json:"rejection_reason,omitempty"`
	RejectionDetail string            `json:"rejection_detail,omitempty"`
}

// Outputs split the input list into validated + rejected. Only LLM
// findings (Source == SourceLLM) get validated — static findings
// already have grep-able evidence by construction.
type Outputs struct {
	Validated []analyzers.Finding `json:"-"` // promoted to confirmed
	Rejected  []Result            `json:"rejected"`
	Pass      []Result            `json:"pass"`
}

// Validate runs the 3 hard checks for every LLM finding. Static and
// validator-source findings pass through unchanged (they have their
// own evidence pipeline). Returns the validated set + the rejected
// set with per-rejection reason for the audit trail.
//
// repoPath is the absolute path the LLM was asked to review; finding
// File paths are joined under it.
func Validate(repoPath string, findings []analyzers.Finding) Outputs {
	out := Outputs{}
	contentCache := map[string]string{} // abs path → content (read once)

	for _, f := range findings {
		if f.Source != analyzers.SourceLLM {
			// Non-LLM findings carry their own evidence path; pass through
			// unchanged. The pipeline still ships them as-is.
			f.Status = analyzers.StatusConfirmed
			out.Validated = append(out.Validated, f)
			out.Pass = append(out.Pass, Result{Finding: f, Validated: true})
			continue
		}

		res := check(repoPath, f, contentCache)
		if res.Validated {
			res.Finding.Status = analyzers.StatusConfirmed
			out.Validated = append(out.Validated, res.Finding)
			out.Pass = append(out.Pass, res)
		} else {
			res.Finding.Status = analyzers.StatusRejected
			out.Rejected = append(out.Rejected, res)
		}
	}
	return out
}

// check is the per-finding validation logic. Stops at the first
// failure — operators only need to see one rejection reason.
func check(repoPath string, f analyzers.Finding, cache map[string]string) Result {
	res := Result{Finding: f}

	// Empty evidence is unusable — the model didn't quote anything.
	if strings.TrimSpace(f.Evidence) == "" {
		res.RejectionReason = ReasonEmptyEvidence
		res.RejectionDetail = "finding has no evidence quote — can't be validated"
		return res
	}

	// Resolve absolute path. The validator runs after the scanner has
	// already classified the repo; we trust f.File is repo-relative.
	// Both repoPath AND the joined target are converted to absolute
	// before the path-traversal check — bug fixed 2026-04-30: prior
	// version compared relative-abs to absolute-repoAbs and HasPrefix
	// always failed, rejecting every real finding as file_not_found.
	joined := f.File
	if !filepath.IsAbs(joined) {
		joined = filepath.Join(repoPath, f.File)
	}
	abs, err := filepath.Abs(joined)
	if err != nil {
		res.RejectionReason = ReasonFileNotFound
		res.RejectionDetail = "abs(" + joined + "): " + err.Error()
		return res
	}
	abs = filepath.Clean(abs)

	// Refuse to traverse outside the repo (path-traversal protection
	// — the LLM might have hallucinated a "../../../etc/passwd" file).
	repoAbs, err := filepath.Abs(repoPath)
	if err != nil {
		res.RejectionReason = ReasonFileNotFound
		res.RejectionDetail = "abs(" + repoPath + "): " + err.Error()
		return res
	}
	repoAbs = filepath.Clean(repoAbs)
	if !strings.HasPrefix(abs, repoAbs+string(filepath.Separator)) && abs != repoAbs {
		res.RejectionReason = ReasonFileNotFound
		res.RejectionDetail = fmt.Sprintf("path %q escapes repo root %q (resolved: abs=%q repo_abs=%q)", f.File, repoPath, abs, repoAbs)
		return res
	}

	// Read once + cache.
	content, ok := cache[abs]
	if !ok {
		b, err := os.ReadFile(abs)
		if err != nil {
			res.RejectionReason = ReasonFileNotFound
			res.RejectionDetail = err.Error()
			return res
		}
		content = string(b)
		cache[abs] = content
	}

	// Evidence presence check: the quote must appear verbatim in the
	// file, or, failing that, every non-blank evidence line must
	// trim-match some file line in any order (models often re-indent
	// quotes). See evidencePresent for the exact rules.
	if !evidencePresent(content, f.Evidence) {
		res.RejectionReason = ReasonNoEvidence
		res.RejectionDetail = fmt.Sprintf("evidence %q not found in %s", abbrev(f.Evidence, 80), f.File)
		return res
	}

	// Line hint plausibility: parse "42" or "10-20" or "line 42";
	// reject if file has fewer lines than the highest cited number.
	if hint := strings.TrimSpace(f.LineHint); hint != "" {
		hi, ok := highestLine(hint)
		if ok {
			fileLines := strings.Count(content, "\n") + 1
			if hi > fileLines {
				res.RejectionReason = ReasonLineHintTooHigh
				res.RejectionDetail = fmt.Sprintf("line %d cited but file has only %d lines", hi, fileLines)
				return res
			}
		}
	}

	res.Validated = true
	return res
}

// evidencePresent returns true if the evidence appears verbatim in
// the file, OR every evidence line trim-matches some line in the
// file (models often re-indent quotes when quoting code).
//
// Scrum fix B1 (Kimi BLOCK + Opus WARN, 2026-04-30):
//   - reject trivially-matchable evidence FIRST (empty, lone braces,
//     single-char/punct quotes); strings.Contains on tiny strings
//     hits half the file and lets a non-quoting LLM pass
//   - per-line trim-match no longer advances a cursor on hit; earlier
//     version drove cursor forward unconditionally, both preventing
//     same-line repeated matches and skipping unseen lines, so
//     out-of-order evidence spuriously failed
//
// Order is no longer enforced — every evidence line just needs to
// appear somewhere in the file. The contract is "evidence quotes
// real text from the file," not "evidence quotes contiguous text in
// the same order."
func evidencePresent(content, evidence string) bool {
	trimmed := strings.TrimSpace(evidence)
	if trimmed == "" {
		return false
	}
	// Trivial-match guard: if the *entire* evidence is shorter than 4
	// non-whitespace chars, reject regardless of how it's being matched.
	// Lone `}` / `{` / `)` / `(` would substring-hit any matching brace
	// in the file. Min-length picked at 4 because real verbatim quotes
	// (variable names, function calls) are essentially always longer.
	if nonWSLen(trimmed) < 4 {
		return false
	}
	if strings.Contains(content, evidence) {
		return true
	}
	evLines := strings.Split(trimmed, "\n")
	contentLines := strings.Split(content, "\n")
	for _, ev := range evLines {
		want := strings.TrimSpace(ev)
		if want == "" {
			continue
		}
		// Per-line trivial guard: even within a multi-line evidence
		// block, a line of `}` shouldn't satisfy the "this evidence
		// line appears in the file" check.
		if nonWSLen(want) < 4 {
			return false
		}
		found := false
		for _, cl := range contentLines {
			if strings.Contains(strings.TrimSpace(cl), want) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// nonWSLen returns the number of runes in s that are not a space,
// tab, newline, or carriage return.
func nonWSLen(s string) int {
	n := 0
	for _, r := range s {
		if r != ' ' && r != '\t' && r != '\n' && r != '\r' {
			n++
		}
	}
	return n
}

// lineHintNumRe matches each run of decimal digits in a line hint.
var lineHintNumRe = regexp.MustCompile(`\d+`)

// highestLine extracts the largest line number cited in the hint.
// Accepts "42", "10-20" (returns 20), "line 42", "L42", "42:5".
// Returns (n, true) on parse; (0, false) if no positive number is found.
func highestLine(hint string) (int, bool) {
	matches := lineHintNumRe.FindAllString(hint, -1)
	if len(matches) == 0 {
		return 0, false
	}
	hi := 0
	for _, m := range matches {
		n, err := strconv.Atoi(m)
		if err != nil {
			// Digit runs too large for int are skipped rather than failing the hint.
			continue
		}
		if n > hi {
			hi = n
		}
	}
	return hi, hi > 0
}

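// abbrev trims s and shortens it to at most n runes, appending an
// ellipsis when it truncates. Counting runes rather than bytes keeps
// multi-byte UTF-8 characters intact in rejection details.
func abbrev(s string, n int) string {
	s = strings.TrimSpace(s)
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n]) + "…"
}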