Single biggest unblock for using the harness on real targets. The
lakehouse Rust repo has a 67GB data/ directory holding parquet,
JSONL pathway memory, headshots, and other runtime data — all
gitignored. Before the fix, the scanner walked all of it and
stalled; after the fix, the full Rust scan completes in 15s.
internal/scanner/gitignore.go — minimal Matcher that handles the
patterns real .gitignore files use ~99% of the time:
- basename match anywhere (`pattern`)
- dir-only match (`pattern/`)
- root-anchored (`/pattern`)
- path-anchored (`pattern/sub` — interior slash)
- extension globs (`*.ext`)
- path + extension (`path/*.ext`)
- comments + blank lines ignored
Negations (!pattern) are intentionally NOT supported in v0; the
matcher records HasNegations() so callers can surface a warning
when one is encountered.
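For reference, a minimal sketch of a matcher with the pattern shapes
listed above. This is not the code in gitignore.go. Only
HasNegations() is named in this commit; the rule type, Parse, and
Match are hypothetical names, and the sketch assumes path.Match
semantics are close enough for the supported globs.

```go
package main

import (
	"bufio"
	"path"
	"strings"
)

// rule is one parsed .gitignore line (hypothetical type for this sketch).
type rule struct {
	pattern  string // pattern with leading and trailing slashes removed
	dirOnly  bool   // original pattern ended with "/"
	anchored bool   // original pattern had a leading or interior slash
}

// Matcher holds the parsed rules. The field layout is an assumption.
type Matcher struct {
	rules        []rule
	hasNegations bool
}

// Parse reads .gitignore content. Comments and blank lines are dropped;
// negations are recorded but not applied.
func Parse(content string) *Matcher {
	m := &Matcher{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "!") {
			m.hasNegations = true
			continue
		}
		r := rule{}
		if strings.HasSuffix(line, "/") {
			r.dirOnly = true
			line = strings.TrimSuffix(line, "/")
		}
		if strings.HasPrefix(line, "/") {
			r.anchored = true
			line = strings.TrimPrefix(line, "/")
		} else if strings.Contains(line, "/") {
			r.anchored = true
		}
		if line == "" {
			continue
		}
		r.pattern = line
		m.rules = append(m.rules, r)
	}
	return m
}

// HasNegations reports whether any "!pattern" line was seen.
func (m *Matcher) HasNegations() bool { return m.hasNegations }

// Match reports whether rel (slash-separated, relative to the repo root)
// is ignored. Anchored rules match the whole relative path; all others
// match the basename only.
func (m *Matcher) Match(rel string, isDir bool) bool {
	for _, r := range m.rules {
		if r.dirOnly && !isDir {
			continue
		}
		target := path.Base(rel)
		if r.anchored {
			target = rel
		}
		if ok, err := path.Match(r.pattern, target); err == nil && ok {
			return true
		}
	}
	return false
}
```

In this sketch a dir-only rule such as `data/` matches the directory
itself, not the files under it, so it relies on the walker skipping
the whole directory once it matches.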
internal/scanner/gitignore_test.go — 14 cases against a synthetic
.gitignore covering all 6 pattern shapes, plus missing-file and
negation-recording tests.
walk.go integration: gitignore loaded once at scan start; checked
in the dir-skip branch (SkipDir cascades) and the file-emit branch.
Skip layers in order: universal-noise basenames → .gitignore →
path-scoped self-skip → dotfile filter.
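A rough sketch of the skip order above, using fs.WalkDir so it runs
against an in-memory tree. It is illustrative only and not the code
in walk.go: the noise set, the ignoreFunc signature, selfPrefix, and
the demo helpers are all hypothetical, and "path-scoped self-skip"
is assumed here to mean skipping a path prefix owned by the harness.

```go
package main

import (
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// ignoreFunc stands in for the gitignore matcher (hypothetical signature).
type ignoreFunc func(rel string, isDir bool) bool

// noise is a hypothetical universal-noise basename set.
var noise = map[string]bool{".git": true, "node_modules": true, "target": true}

// skip applies the layers in the order given in the commit message:
// universal noise, .gitignore, path-scoped self-skip, dotfile filter.
func skip(rel string, isDir bool, ignored ignoreFunc, selfPrefix string) bool {
	base := path.Base(rel)
	switch {
	case noise[base]:
		return true
	case ignored(rel, isDir):
		return true
	case selfPrefix != "" && (rel == selfPrefix || strings.HasPrefix(rel, selfPrefix+"/")):
		return true
	case strings.HasPrefix(base, "."):
		return true
	}
	return false
}

// Walk returns the files that survive every skip layer. A skipped
// directory returns fs.SkipDir, so nothing beneath it is visited.
func Walk(fsys fs.FS, ignored ignoreFunc, selfPrefix string) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if p == "." {
			return nil
		}
		if skip(p, d.IsDir(), ignored, selfPrefix) {
			if d.IsDir() {
				return fs.SkipDir // dir-skip branch: cascades to children
			}
			return nil // file-emit branch: drop this file only
		}
		if !d.IsDir() {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

// demoTree builds a small in-memory repo layout (made-up paths).
func demoTree() fs.FS {
	return fstest.MapFS{
		"src/main.rs":                 {Data: []byte("fn main() {}")},
		"data/big.parquet":            {Data: []byte("x")},
		"target/debug/app":            {Data: []byte("x")},
		".env":                        {Data: []byte("x")},
		"harness/fixtures/secret.txt": {Data: []byte("x")},
		"notes.log":                   {Data: []byte("x")},
	}
}

// demoIgnored mimics a .gitignore containing "data/" and "*.log".
func demoIgnored(rel string, isDir bool) bool {
	return (isDir && rel == "data") || strings.HasSuffix(rel, ".log")
}
```

Calling Walk(demoTree(), demoIgnored, "harness/fixtures") visits
data/ once, skips it, and never opens anything beneath it, which is
the behavior that keeps a large gitignored directory from stalling
the scan.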
Verified end-to-end:
- lakehouse Rust full repo: 15s scan, 1031 findings, 0 critical
(no committed secrets in source — independently confirms what
scrum2 + the Rust auditor said)
- the 529 hardcoded-path findings ARE the Sprint 4 gap the audit
kept naming; the harness just put a number on it
This was Opus's WARN B5 from the cross-lineage scrum, plus the
"harness stalls on real repos" gap exposed when running it against
the actual Lakehouse repos. Both addressed in one wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>