Closes the harness's feature set per PROMPT.md modes 2 (Diff Review)
and Phase 5 (Memory). Rules subcommand still pending (it needs
operator-authored .review-rules.md content first; documented as
Phase E follow-up).
internal/memory/ — append-only writer:
- AppendKnownRisks: one JSONL line per confirmed finding per run.
O_APPEND only; never O_TRUNC. Empty findings list is a no-op
(doesn't even create the file — keeps clean runs from polluting
.memory/).
- AppendRunHistory: one JSONL line per run. Run summary stats +
receipts hash for cross-link.
- WriteProjectProfile: the ONLY non-versioned memory file; snapshot
semantics, overwrites are explicit + documented.
- 4 unit tests including TestAppendKnownRisks_NeverTruncates which
is the audit's "no silent overwrite" gate — write twice, assert
both writes' content survives.
Pipeline phase 5 wires it. Confirmed findings only — suspected
findings might still be wrong, keeping .memory/ authoritative.
Disabled if review-profile.memory.enabled = false.
internal/git/git.go — ChangedFiles helper:
- Probes unstaged + staged + branch diff against main/master.
- Dedup'd, stable order. Empty result on clean tree.
- Graceful failure: returns error if git binary missing or target
isn't a git repo.
cli/repo.go — Diff subcommand:
- `review-harness diff <path>` runs the same pipeline as scrum but
scoped to changed files only. Pipeline.Inputs gains DiffOnlyFiles
filter applied post-Walk.
- Empty diff (clean tree, no commits ahead of base) → exit 0 with
message; doesn't generate empty reports.
- LLM toggleable via --enable-llm same as scrum.
scanner/walk.go: added .memory to SkipDirs (universal — harness's
own audit trail, scanning it surfaces planted-secret evidence as
new findings — same class as B5 self-skip).
.gitignore tightened: /.memory/ → **/.memory/ to keep test-fixture
.memory dirs from leaking into version control (same fix as
reports/latest pattern).
Verified end-to-end:
- 4 memory unit tests PASS
- Append-only proven: insecure-repo run 1 → 16 known-risks lines;
run 2 → 44 lines (16 + 28 from new run); run-history grew 1 → 2.
- Diff subcommand against this repo (5 uncommitted Phase E files
staged) → exit 0, all reports produced, scoped to those 5 files
only (0 findings on the diff-scoped scan vs 129 on full repo —
changed files don't contain analyzer-flaggable patterns).
Phase A through E shipped today. Rules subcommand + tests for
internal/{config,scanner,git,llm,reporters,pipeline} remain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single biggest unblock for using the harness on real targets. The
lakehouse Rust repo has a 67GB data/ directory holding parquet,
JSONL pathway memory, headshots, and other runtime data — all
gitignored. Pre-fix the scanner walked it all (and stalled). Post-
fix the full Rust scan completes in 15s.
internal/scanner/gitignore.go — minimal Matcher that handles the
patterns real .gitignore files use ~99% of the time:
- basename match anywhere (`pattern`)
- dir-only match (`pattern/`)
- root-anchored (`/pattern`)
- path-anchored (`pattern/sub` — interior slash)
- extension globs (`*.ext`)
- path + extension (`path/*.ext`)
- comments + blank lines ignored
Negations (!pattern) intentionally NOT supported v0; matcher records
HasNegations() so callers can surface a warning if encountered.
internal/scanner/gitignore_test.go — 14 cases against a synthetic
.gitignore covering all 6 pattern shapes, plus missing-file and
negation-recording tests.
walk.go integration: gitignore loaded once at scan start; checked
in the dir-skip branch (SkipDir cascades) and the file-emit branch.
Skip layers in order: universal-noise basenames → .gitignore →
path-scoped self-skip → dotfile filter.
Verified end-to-end:
- lakehouse Rust full repo: 15s scan, 1031 findings, 0 critical
(no committed secrets in source — independently confirms what
scrum2 + the Rust auditor said)
- 529 hardcoded-path findings IS the Sprint 4 gap the audit kept
naming; the harness just put a number on it
This was Opus's WARN B5 from the cross-lineage scrum, plus the
"harness stalls on real repos" gap exposed when running it against
the actual Lakehouse repos. Both addressed in one wave.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Opus-only BLOCK from the cross-lineage scrum: pre-fix SkipDirs
basename-matched bin/build/dist/target/reports for ANY repo,
silently excluding legitimate source dirs on real targets. The
lakehouse Rust repo has reports/ holding markdown; some Java/
Python/Go projects use bin/ as a source dir; target/ is project-
specific. Skipping them globally produced silent false-negative
scans the operator would never know about.
Fix: trim SkipDirs to dirs that are universally not source code —
.git, .hg, .svn (VCS metadata); node_modules, vendor (dep caches);
__pycache__, .venv, venv (Python envs); .idea, .vscode (editor state).
Removed: bin, build, dist, target, reports.
For the harness's own self-skip (it shouldn't scan its own bin/
or reports/), added path-scoped skip via selfSkipsFor — detects
"this is the harness repo" by the presence of BOTH
cmd/review-harness/ AND internal/analyzers/ subdirs (combination
unique to this codebase), then skips the absolute paths bin/ and
reports/ for that scan only.
Two regression tests:
- TestWalk_DoesNotSkipBinReportsInTargetRepo plants files under
bin/, reports/, build/, dist/, target/ in a synthetic target
repo; asserts all 5 appear in scan, while .git/ + node_modules/
+ vendor/ are still skipped.
- TestWalk_SelfSkipsBinReportsInHarnessRepo plants the harness's
marker dirs (cmd/review-harness/, internal/analyzers/) plus
bin/ + reports/ + ordinary src/; asserts self-skip fires on
bin/+reports/ but real src/ scans normally.
Compiled artifacts inside bin/ are filtered by the analyzers'
isTextLike extension check (.exe / .dll / .so), so target repos
with bin/ holding compiled output don't waste cycles decoding it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder via chatd's
/v1/chat) on the harness's first 4 commits surfaced 5 real bugs;
this commit lands the 4 inside the LLM/validator stack. B5 (scanner
skip-list semantics) ships separately as it changes scan behavior
on every target repo.
B1 (Kimi BLOCK + Opus WARN convergent) — internal/validators:
evidencePresent had two flaws: (1) cursor advanced on match in the
trim-line fallback, breaking same-line repeated matches AND skipping
not-yet-considered lines so out-of-order evidence spuriously failed;
(2) strings.Contains on a single `}` trim-matched any closing brace
in the file, defeating the "evidence quotes real text" contract.
Fix: trivial-evidence guard FIRST (reject anything <4 non-whitespace
chars) + per-line search no longer advances a cursor. New regression
test TestEvidencePresent_RejectsTrivialMatches covers `}`, `{`, `)`,
empty, and out-of-order multi-line evidence (which now passes —
order isn't part of the contract).
B2 (Kimi WARN + Opus WARN convergent) — internal/pipeline:
WriteJSON error for rejected-findings.json was swallowed with
`if err == nil`, so a write failure left the validation phase
reporting status="ok" while the audit trail vanished. Mirror the
validated-findings branch: surface the error in
validatePhase.Errors + bump status to degraded + ExitCode=66.
B3 (Kimi BLOCK + Opus BLOCK convergent) — internal/llm/ollama.go:
HealthCheck.basic_prompt_ok was set to true on ANY non-empty
response, so a model emitting `<think>...` traces or apologies
passed silently. Now requires the response to contain "OK"
(uppercase, substring). Substring rather than equality lets minor
whitespace/punctuation variations through (some models add a
trailing period). Errors now record what the model actually said
when it fails the check.
B4 (Opus BLOCK only — same class as today's chatd Anthropic-temp
fix) — internal/llm/ollama.go: chatBody had `if opts.Temperature != 0`
which silently dropped Temperature=0 from the request, so HealthCheck
+ Reviewer (both pass Temperature=0 expecting determinism) actually
ran at Ollama's ~0.8 default. Always forward Temperature now. The
two callers always set explicit values, so "0 means 0" is correct;
if a future caller wants Ollama's default they'll switch
CompleteOptions.Temperature to *float64 like chatd did this morning.
Verified end-to-end: insecure-repo + --enable-llm still produces 25
confirmed findings (16 static + 9 LLM), 0 rejected. Validator unit
tests: 11 pass (added TestEvidencePresent_RejectsTrivialMatches).
Same-day-as-shipping scrum, same-day-as-shipping fixes. The
convergent-≥2 gate caught 3 of these; the 4th was Opus-only but
verified by reading the code (same idiom as today's chatd bug).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements PROMPT.md / docs/REVIEW_PIPELINE.md Phase 3:
"AI may suggest. Code validates."
internal/validators/validate.go — 3 hard checks per the
"Reject A Finding If" list:
- file does not exist (with path-traversal guard against the LLM
hallucinating ../../../etc/passwd)
- cited evidence does not appear in the file (verbatim or
trim-line-by-line — models often re-indent quotes when quoting code)
- line hint exceeds file line count
3 soft checks documented as open (claim semantics, suggested-fix
relevance, invented tests/commands — all need another LLM pass).
internal/validators/validate_test.go — 9 tests including:
- TestValidate_RejectsNonexistentFile (gate D1)
- TestValidate_RejectsEvidenceNotInFile
- TestValidate_RejectsLineHintBeyondFile
- TestValidate_AcceptsRealFinding
- TestValidate_AcceptsEvidenceWithDifferentLeadingWhitespace
- TestValidate_RejectsEmptyEvidence
- TestValidate_PassesThroughStaticFindings
- TestValidate_RejectsPathEscapingRepo (path-traversal protection)
- TestValidate_AcceptsRelativeRepoPath (the regression — see below)
Pipeline phase 3 wired between LLM review (Phase C) and report gen
(Phase 4). validated-findings.json contains the confirmed set;
rejected-findings.json contains rejects with per-finding reason +
detail. Receipt phase entry honest about output files + status.
=== Bug J caught ===
First Phase D run rejected EVERY real LLM finding as file_not_found
because the path-traversal check compared a relative joined path
(`tests/fixtures/insecure-repo/src/handler.go`) against an absolute
repoAbs (`/home/profit/share/.../insecure-repo`), so HasPrefix
always returned false. Both sides now resolved via filepath.Abs
before comparison. Regression test
TestValidate_AcceptsRelativeRepoPath locks this in — runs the
validator against a relative repo path AND a relative chdir, the
exact shape that hit the bug.
J's framing was honest: "I don't know what the problem is, but you
know what we're trying to accomplish." The fix-it-yourself signal
let me trace through the rejection details + see the smoking gun
in the detail string ("escapes repo root"). Without that prompt the
9 false rejections might have looked like real LLM bugs.
=== 2 close-out fixes ===
1. .gitignore: changed `/reports/latest/` → `**/reports/latest/`
(and same for `run-*`). Phase C committed 22 generated files
from `tests/fixtures/*/reports/latest/` because the original
pattern was anchored at the harness root only. Existing tracked
files removed via git rm --cached; new pattern keeps fixture
reports out of version control going forward.
2. pipeline.cleanOutputDir: pipeline now deletes the bounded list
of known per-run files at the start of each run. Before this,
a prior run's rejected-findings.json could linger when the
current run had no rejections — confused J during the bug hunt
above. cleanOutputDir is bounded (deletes only files we emit)
so operator-owned adjacent files stay.
Verified end-to-end: insecure-repo + --enable-llm → 25 confirmed
findings (16 static + 9 LLM), 0 rejected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements PROMPT.md / docs/REVIEW_PIPELINE.md Phase 2:
- internal/llm/ollama.go — real Ollama provider:
- HealthCheck probes /api/tags + a 1-token completion + a JSON-mode
probe ({"ok": true} round-trip), populating the model-doctor.json
schema documented in docs/LOCAL_MODEL_SETUP.md
- Complete + CompleteJSON via /api/chat with stream=false
- think=false set for ALL completions (qwen3.5:latest is reasoning-
capable but the inner-loop hot path wants direct answers, not
reasoning traces consuming the token budget — same finding as
the Lakehouse-Go chatd 2026-04-30 wave)
- internal/llm/review.go — Reviewer wrapper:
- 2-attempt flow: prompt → parse → repair-prompt → parse
- Strict JSON shape enforced; markdown fences stripped before parse
- Severity normalized to enum; out-of-range confidence clamped
- Per-file chunking (file-level for v0; function-level Phase D+)
- Bounded by review-profile max_file_bytes + max_llm_chunk_chars
- pipeline.go — Phase 2 wired between static scan + report gen:
- --enable-llm flag opts in (off by default — static-only is
cheaper and faster)
- Raw output ALWAYS saved to llm-findings.raw.json (forensics)
- Normalized findings → llm-findings.normalized.json
- LLM findings merged into the report findings list (sourced
"llm" so consumers can filter)
- Receipts honestly mark phase status: "ok" | "degraded" | "skipped"
- cli model doctor — real probes replace the Phase A stub.
Verified:
- model doctor: status="ok" with qwen3.5:latest + qwen3:latest both
loaded, basic_prompt_ok=true, json_mode_ok=true
- insecure-repo with --enable-llm: 9 LLM findings; qwen3.5 correctly
flagged SQLi, RCE, hardcoded credentials as critical with verbatim
evidence; 27s wall for 3 chunks
- clean-repo with --enable-llm: 0 LLM findings, 4 parsed chunks, 2.8s
- self-review with --enable-llm: 77 LLM findings + 83 static; 3 of
~30 chunks needed retry (PROMPT.md, REPORT_SCHEMA.md,
SCRUM_TEST_TEMPLATE.md — all eventually parsed); 5min wall
go vet + go test -short clean. Fixture stray.go now `package fixture`
so go-tooling doesn't choke on the orphan.
Phase D (validator cross-check) + Phase E (memory + diff/rules
subcommands) remain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>