Claude (review-harness setup) ab550a7c5a Apply B5 from 2026-04-30 scrum — scanner skip-list scoped to harness self
Opus-only BLOCK from the cross-lineage scrum: pre-fix SkipDirs
basename-matched bin/build/dist/target/reports for ANY repo,
silently excluding legitimate source dirs on real targets. The
lakehouse Rust repo has reports/ holding markdown; some Java/
Python/Go projects use bin/ as a source dir; target/ is project-
specific. Skipping them globally produced silent false-negative
scans the operator would never know about.

Fix: trim SkipDirs to dirs that are universally not source code —
.git, .hg, .svn (VCS metadata); node_modules, vendor (dep caches);
__pycache__, .venv, venv (Python envs); .idea, .vscode (editor state).
Removed: bin, build, dist, target, reports.

For the harness's own self-skip (it shouldn't scan its own bin/
or reports/), added a path-scoped skip via selfSkipsFor. It detects
"this is the harness repo" by the presence of BOTH
cmd/review-harness/ AND internal/analyzers/ subdirs (a combination
unique to this codebase), then skips only <root>/bin and
<root>/reports, matched by absolute path, for that scan.

Two regression tests:
- TestWalk_DoesNotSkipBinReportsInTargetRepo plants files under
  bin/, reports/, build/, dist/, target/ in a synthetic target
  repo; asserts all 5 appear in scan, while .git/ + node_modules/
  + vendor/ are still skipped.
- TestWalk_SelfSkipsBinReportsInHarnessRepo plants the harness's
  marker dirs (cmd/review-harness/, internal/analyzers/) plus
  bin/ + reports/ + ordinary src/; asserts self-skip fires on
  bin/+reports/ but real src/ scans normally.

Compiled artifacts inside bin/ are filtered out by the analyzers'
isTextLike extension check (.exe / .dll / .so), so target repos
whose bin/ holds compiled output don't waste cycles decoding it.
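
A sketch of an extension-based filter in the spirit of isTextLike. The deny-list holds only the three extensions named above; the real function's list and signature are assumptions here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// binaryExts is a hypothetical deny-list built from the extensions named
// in this commit; the real isTextLike may check more.
var binaryExts = map[string]bool{".exe": true, ".dll": true, ".so": true}

// isTextLike reports whether a file should be handed to the analyzers,
// judged only by its case-insensitive extension.
func isTextLike(path string) bool {
	return !binaryExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"bin/tool.exe", "bin/libfoo.so", "bin/script.go", "bin/README"} {
		fmt.Println(p, isTextLike(p))
	}
}
```

An extension-only check like this one lets extensionless binaries (such as a Linux build of bin/review-harness) through; on the harness repo that case is covered by the path-scoped self-skip instead.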

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:34:45 -05:00


package scanner

import (
	"os"
	"path/filepath"
	"testing"
)

// TestWalk_DoesNotSkipBinReportsInTargetRepo locks in scrum fix B5.
// Pre-fix, the SkipDirs map basename-matched bin/build/dist/target/reports
// for ANY repo, silently excluding legitimate source directories on
// real targets (lakehouse Rust has reports/ with markdown). Now
// only universal noise (.git, node_modules, vendor, .venv, .idea,
// etc.) is basename-skipped; harness self-skip is path-scoped.
func TestWalk_DoesNotSkipBinReportsInTargetRepo(t *testing.T) {
	repo := t.TempDir()

	// Plant files under names that USED to be silently skipped.
	for _, p := range []string{
		"bin/script.go",
		"reports/findings.md",
		"build/notes.txt",
		"dist/release.md",
		"target/output.go",
		// Universal-noise dirs SHOULD still skip.
		".git/HEAD",
		"node_modules/foo/index.js",
		"vendor/dep/lib.go",
	} {
		full := filepath.Join(repo, p)
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			t.Fatal(err)
		}
		if err := os.WriteFile(full, []byte("test\n"), 0o644); err != nil {
			t.Fatal(err)
		}
	}

	res, err := Walk(repo, false)
	if err != nil {
		t.Fatalf("Walk: %v", err)
	}
	seen := map[string]bool{}
	for _, f := range res.Files {
		seen[f.Path] = true
	}

	// These MUST appear (B5 fix: they used to be silently skipped).
	mustHave := []string{
		"bin/script.go",
		"reports/findings.md",
		"build/notes.txt",
		"dist/release.md",
		"target/output.go",
	}
	for _, p := range mustHave {
		if !seen[p] {
			t.Errorf("expected %q in scan; got %v", p, mapKeys(seen))
		}
	}

	// These MUST be skipped (universal noise).
	mustNotHave := []string{
		".git/HEAD",
		"node_modules/foo/index.js",
		"vendor/dep/lib.go",
	}
	for _, p := range mustNotHave {
		if seen[p] {
			t.Errorf("unexpected %q in scan (universal-noise dir)", p)
		}
	}
}

// TestWalk_SelfSkipsBinReportsInHarnessRepo locks in the path-scoped
// self-skip: when the scanner detects it's being run against the
// harness's own tree (cmd/review-harness AND internal/analyzers
// present), it skips bin/ + reports/ to avoid recursing into its
// own build output and run artifacts.
func TestWalk_SelfSkipsBinReportsInHarnessRepo(t *testing.T) {
	repo := t.TempDir()

	// Plant the marker dirs that signal "this is the harness repo",
	// plus build output, run artifacts, and ordinary source.
	for _, p := range []string{
		"cmd/review-harness/main.go",
		"internal/analyzers/checks.go",
		"bin/review-harness",    // build output: should skip
		"reports/latest/x.json", // runtime artifact: should skip
		"src/real_code.go",      // ordinary source: should appear
	} {
		full := filepath.Join(repo, p)
		// Fail fast on setup errors instead of discarding them, so a
		// broken fixture can't masquerade as a self-skip.
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			t.Fatal(err)
		}
		if err := os.WriteFile(full, []byte("x\n"), 0o644); err != nil {
			t.Fatal(err)
		}
	}

	res, err := Walk(repo, false)
	if err != nil {
		t.Fatalf("Walk: %v", err)
	}
	seen := map[string]bool{}
	for _, f := range res.Files {
		seen[f.Path] = true
	}

	for _, want := range []string{
		"cmd/review-harness/main.go",
		"internal/analyzers/checks.go",
		"src/real_code.go",
	} {
		if !seen[want] {
			t.Errorf("expected %q in self-scan; got %v", want, mapKeys(seen))
		}
	}

	// Self-skip should fire on bin/ and reports/.
	if seen["bin/review-harness"] {
		t.Errorf("bin/ should be self-skipped on harness repo")
	}
	if seen["reports/latest/x.json"] {
		t.Errorf("reports/ should be self-skipped on harness repo")
	}
}
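
// mapKeys returns the keys of m in unspecified (map iteration) order;
// it only makes failure messages show what the scan actually saw.
func mapKeys(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}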