Claude (review-harness setup) 4dc53c5798 Phase D — validator cross-checks LLM findings + 2 close-out fixes
Implements PROMPT.md / docs/REVIEW_PIPELINE.md Phase 3:
"AI may suggest. Code validates."

internal/validators/validate.go — 3 hard checks per the
"Reject A Finding If" list:
- file does not exist (with path-traversal guard against the LLM
  hallucinating ../../../etc/passwd)
- cited evidence does not appear in the file (verbatim or
  trim-line-by-line — models often re-indent quotes when quoting code)
- line hint exceeds file line count

3 soft checks documented as open (claim semantics, suggested-fix
relevance, invented tests/commands — all need another LLM pass).

internal/validators/validate_test.go — 9 tests including:
- TestValidate_RejectsNonexistentFile (gate D1)
- TestValidate_RejectsEvidenceNotInFile
- TestValidate_RejectsLineHintBeyondFile
- TestValidate_AcceptsRealFinding
- TestValidate_AcceptsEvidenceWithDifferentLeadingWhitespace
- TestValidate_RejectsEmptyEvidence
- TestValidate_PassesThroughStaticFindings
- TestValidate_RejectsPathEscapingRepo (path-traversal protection)
- TestValidate_AcceptsRelativeRepoPath (the regression — see below)

Pipeline phase 3 wired between LLM review (Phase C) and report gen
(Phase 4). validated-findings.json contains the confirmed set;
rejected-findings.json contains rejects with per-finding reason +
detail. Receipt phase entry honest about output files + status.

=== Bug J caught ===

First Phase D run rejected EVERY real LLM finding as file_not_found
because the path-traversal check compared a relative joined path
(`tests/fixtures/insecure-repo/src/handler.go`) against an absolute
repoAbs (`/home/profit/share/.../insecure-repo`), so HasPrefix
always returned false. Both sides now resolved via filepath.Abs
before comparison. Regression test
TestValidate_AcceptsRelativeRepoPath locks this in — runs the
validator against a relative repo path AND a relative chdir, the
exact shape that hit the bug.

J's framing was honest: "I don't know what the problem is, but you
know what we're trying to accomplish." The fix-it-yourself signal
let me trace through the rejection details + see the smoking gun
in the detail string ("escapes repo root"). Without that prompt the
9 false rejections might have looked like real LLM bugs.

=== 2 close-out fixes ===

1. .gitignore: changed `/reports/latest/` → `**/reports/latest/`
   (and same for `run-*`). Phase C committed 22 generated files
   from `tests/fixtures/*/reports/latest/` because the original
   pattern was anchored at the harness root only. Existing tracked
   files removed via git rm --cached; new pattern keeps fixture
   reports out of version control going forward.

2. pipeline.cleanOutputDir: pipeline now deletes the bounded list
   of known per-run files at the start of each run. Before this,
   a prior run's rejected-findings.json could linger when the
   current run had no rejections — confused J during the bug hunt
   above. cleanOutputDir is bounded (deletes only files we emit)
   so operator-owned adjacent files stay.

Verified end-to-end: insecure-repo + --enable-llm → 25 confirmed
findings (16 static + 9 LLM), 0 rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:24:02 -05:00

248 lines
7.9 KiB
Go

// Package validators cross-checks LLM-generated findings against
// real repository evidence. PROMPT.md / REVIEW_PIPELINE.md Phase 3:
// "AI may suggest. Code validates." Findings that pass validation
// move from status=suspected → status=confirmed; failures land in a
// separate rejected-findings.json with a per-rejection reason.
//
// V0 implements 3 hard checks per the PROMPT.md "Reject A Finding If"
// list:
// - file does not exist
// - cited evidence does not exist verbatim in the file
// - line hint is impossible (file has fewer lines than claimed)
//
// 3 softer checks from the same list are NOT v0 — documented as
// "open" so the audit trail is honest:
// - claim is unsupported (semantic, requires another LLM pass)
// - suggested fix targets unrelated code (semantic)
// - model invents tests/commands/files (covered by file-exists for
// files; tests/commands need a Phase D+1 fact-check)
package validators
import (
"fmt"
"os"
"path/filepath"
"regexp"
"strconv"
"strings"
"local-review-harness/internal/analyzers"
)
// Reason captures why a finding was rejected. Stable strings so
// reports + receipts can group/sort by reason.
type Reason string
const (
ReasonFileNotFound Reason = "file_not_found"
ReasonNoEvidence Reason = "evidence_not_in_file"
ReasonLineHintTooHigh Reason = "line_hint_exceeds_file_length"
ReasonEmptyEvidence Reason = "empty_evidence_field"
)
// Result is the validator's output for one finding.
type Result struct {
Finding analyzers.Finding `json:"finding"`
Validated bool `json:"validated"`
RejectionReason Reason `json:"rejection_reason,omitempty"`
RejectionDetail string `json:"rejection_detail,omitempty"`
}
// Outputs split the input list into validated + rejected. Only LLM
// findings (Source == SourceLLM) get validated — static findings
// already have grep-able evidence by construction.
type Outputs struct {
Validated []analyzers.Finding `json:"-"` // promoted to confirmed
Rejected []Result `json:"rejected"`
Pass []Result `json:"pass"`
}
// Validate runs the 3 hard checks for every LLM finding. Static and
// validator-source findings pass through unchanged (they have their
// own evidence pipeline). Returns the validated set + the rejected
// set with per-rejection reason for the audit trail.
//
// repoPath is the absolute path the LLM was asked to review; finding
// File paths are joined under it.
func Validate(repoPath string, findings []analyzers.Finding) Outputs {
out := Outputs{}
contentCache := map[string]string{} // abs path → content (read once)
for _, f := range findings {
if f.Source != analyzers.SourceLLM {
// Non-LLM findings carry their own evidence path; pass through
// unchanged. The pipeline still ships them as-is.
f.Status = analyzers.StatusConfirmed
out.Validated = append(out.Validated, f)
out.Pass = append(out.Pass, Result{Finding: f, Validated: true})
continue
}
res := check(repoPath, f, contentCache)
if res.Validated {
res.Finding.Status = analyzers.StatusConfirmed
out.Validated = append(out.Validated, res.Finding)
out.Pass = append(out.Pass, res)
} else {
res.Finding.Status = analyzers.StatusRejected
out.Rejected = append(out.Rejected, res)
}
}
return out
}
// check is the per-finding validation logic. Stops at the first
// failure — operators only need to see one rejection reason.
func check(repoPath string, f analyzers.Finding, cache map[string]string) Result {
res := Result{Finding: f}
// Empty evidence is unusable — the model didn't quote anything.
if strings.TrimSpace(f.Evidence) == "" {
res.RejectionReason = ReasonEmptyEvidence
res.RejectionDetail = "finding has no evidence quote — can't be validated"
return res
}
// Resolve absolute path. The validator runs after the scanner has
// already classified the repo; we trust f.File is repo-relative.
// Both repoPath AND the joined target are converted to absolute
// before the path-traversal check — bug fixed 2026-04-30: prior
// version compared relative-abs to absolute-repoAbs and HasPrefix
// always failed, rejecting every real finding as file_not_found.
joined := f.File
if !filepath.IsAbs(joined) {
joined = filepath.Join(repoPath, f.File)
}
abs, err := filepath.Abs(joined)
if err != nil {
res.RejectionReason = ReasonFileNotFound
res.RejectionDetail = "abs(" + joined + "): " + err.Error()
return res
}
abs = filepath.Clean(abs)
// Refuse to traverse outside the repo (path-traversal protection
// — the LLM might have hallucinated a "../../../etc/passwd" file).
repoAbs, err := filepath.Abs(repoPath)
if err != nil {
res.RejectionReason = ReasonFileNotFound
res.RejectionDetail = "abs(" + repoPath + "): " + err.Error()
return res
}
repoAbs = filepath.Clean(repoAbs)
if !strings.HasPrefix(abs, repoAbs+string(filepath.Separator)) && abs != repoAbs {
res.RejectionReason = ReasonFileNotFound
res.RejectionDetail = fmt.Sprintf("path %q escapes repo root %q (resolved: abs=%q repo_abs=%q)", f.File, repoPath, abs, repoAbs)
return res
}
// Read once + cache.
content, ok := cache[abs]
if !ok {
b, err := os.ReadFile(abs)
if err != nil {
res.RejectionReason = ReasonFileNotFound
res.RejectionDetail = err.Error()
return res
}
content = string(b)
cache[abs] = content
}
// Evidence presence check — the verbatim quote MUST appear in the
// file. Tolerate leading/trailing whitespace differences (models
// often re-indent quotes); compare on trim. Multi-line evidence
// is matched as-is (newlines preserved).
if !evidencePresent(content, f.Evidence) {
res.RejectionReason = ReasonNoEvidence
res.RejectionDetail = fmt.Sprintf("evidence %q not found in %s", abbrev(f.Evidence, 80), f.File)
return res
}
// Line hint plausibility — parse "42" or "10-20" or "line 42";
// reject if file has fewer lines than the highest cited number.
if hint := strings.TrimSpace(f.LineHint); hint != "" {
hi, ok := highestLine(hint)
if ok {
fileLines := strings.Count(content, "\n") + 1
if hi > fileLines {
res.RejectionReason = ReasonLineHintTooHigh
res.RejectionDetail = fmt.Sprintf("line %d cited but file has only %d lines", hi, fileLines)
return res
}
}
}
res.Validated = true
return res
}
// evidencePresent returns true if the evidence appears verbatim in
// the file. Multi-line evidence is checked as-is; single-line evidence
// is also compared trimmed (models often add/drop leading whitespace
// when quoting code).
func evidencePresent(content, evidence string) bool {
if strings.Contains(content, evidence) {
return true
}
// Trim each line of the evidence + match line-by-line. Conservative:
// every evidence line must appear (in order) in the file's trimmed
// lines for the evidence to count as found.
evLines := strings.Split(strings.TrimSpace(evidence), "\n")
if len(evLines) == 0 {
return false
}
contentLines := strings.Split(content, "\n")
cursor := 0
for _, ev := range evLines {
want := strings.TrimSpace(ev)
if want == "" {
continue
}
found := false
for cursor < len(contentLines) {
if strings.Contains(strings.TrimSpace(contentLines[cursor]), want) {
found = true
cursor++
break
}
cursor++
}
if !found {
return false
}
}
return true
}
// highestLine extracts the largest line number cited in the hint.
// Accepts "42", "10-20" (returns 20), "line 42", "L42", "42:5".
// Returns (n, true) on parse; (0, false) if no number found.
var lineHintNumRe = regexp.MustCompile(`\d+`)
func highestLine(hint string) (int, bool) {
matches := lineHintNumRe.FindAllString(hint, -1)
if len(matches) == 0 {
return 0, false
}
hi := 0
for _, m := range matches {
n, err := strconv.Atoi(m)
if err != nil {
continue
}
if n > hi {
hi = n
}
}
return hi, hi > 0
}
func abbrev(s string, n int) string {
s = strings.TrimSpace(s)
if len(s) <= n {
return s
}
return s[:n] + "…"
}