Claude (review-harness setup) e346b54e0f Phase C — local-Ollama LLM review wired end-to-end
Implements PROMPT.md / docs/REVIEW_PIPELINE.md Phase 2:
- internal/llm/ollama.go — real Ollama provider:
  - HealthCheck probes /api/tags + a 1-token completion + a JSON-mode
    probe ({"ok": true} round-trip), populating the model-doctor.json
    schema documented in docs/LOCAL_MODEL_SETUP.md
  - Complete + CompleteJSON via /api/chat with stream=false
  - think=false set for ALL completions (qwen3.5:latest is reasoning-
    capable but the inner-loop hot path wants direct answers, not
    reasoning traces consuming the token budget — same finding as
    the Lakehouse-Go chatd 2026-04-30 wave)
- internal/llm/review.go — Reviewer wrapper:
  - 2-attempt flow: prompt → parse → repair-prompt → parse
  - Strict JSON shape enforced; markdown fences stripped before parse
  - Severity normalized to enum; out-of-range confidence clamped
  - Per-file chunking (file-level for v0; function-level Phase D+)
  - Bounded by review-profile max_file_bytes + max_llm_chunk_chars
- pipeline.go — Phase 2 wired between static scan + report gen:
  - --enable-llm flag opts in (off by default — static-only is
    cheaper and faster)
  - Raw output ALWAYS saved to llm-findings.raw.json (forensics)
  - Normalized findings → llm-findings.normalized.json
  - LLM findings merged into the report findings list (sourced
    "llm" so consumers can filter)
  - Receipts honestly mark phase status: "ok" | "degraded" | "skipped"
- cli model doctor — real probes replace the Phase A stub.

Verified:
- model doctor: status="ok" with qwen3.5:latest + qwen3:latest both
  loaded, basic_prompt_ok=true, json_mode_ok=true
- insecure-repo with --enable-llm: 9 LLM findings; qwen3.5 correctly
  flagged SQLi, RCE, hardcoded credentials as critical with verbatim
  evidence; 27s wall for 3 chunks
- clean-repo with --enable-llm: 0 LLM findings, 4 parsed chunks, 2.8s
- self-review with --enable-llm: 77 LLM findings + 83 static; 3 of
  ~30 chunks needed retry (PROMPT.md, REPORT_SCHEMA.md,
  SCRUM_TEST_TEMPLATE.md — all eventually parsed); 5min wall

go vet + go test -short clean. Fixture stray.go now `package fixture`
so go-tooling doesn't choke on the orphan.

Phase D (validator cross-check) + Phase E (memory + diff/rules
subcommands) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:13:39 -05:00

local-review-harness

Local-first code review harness. Walks a repository, runs evidence-bearing static checks, generates Scrum-style reports. No cloud dependencies. LLM review is local-Ollama-only (Phase C, not yet shipped).

Per FIRST_COMMAND_FOR_CLAUDE_CODE.md + PROMPT.md — "AI may suggest. Code validates. Reports must show evidence." Findings without grep-able evidence get rejected; the validator phase rejects model claims that cite missing files.

Status

Phase A + Phase B (MVP) shipped. What works today:

  • review-harness repo <path> — Phase 0 intake + Phase 1 static scan
  • review-harness scrum <path> — same pipeline + full Scrum report bundle
  • review-harness model doctor — stub (real Ollama probe in Phase C)
  • 12 static analyzers covering hardcoded paths, shell exec, raw SQL, wildcard CORS, secret patterns, large files, TODO/FIXME, missing tests, committed .env, unsafe file I/O, exposed mutation endpoints, hardcoded private-network IPs

Phases C–E pending: real LLM review, validation cross-check, append-only memory, diff/rules subcommands.


Build

Single static binary, no cgo:

go build -o review-harness ./cmd/review-harness

Requires Go 1.22+.

Run

# Full repo review (Phase 0 + Phase 1 + Phase 4)
./review-harness repo /path/to/target/repo

# Same + Scrum bundle (scrum-test.md, risk-register.md, sprint-backlog.md, acceptance-gates.md)
./review-harness scrum /path/to/target/repo

# Model doctor stub
./review-harness model doctor

Reports land in <target>/reports/latest/ by default; override with --output-dir.

Optional config files:

./review-harness scrum /path --review-profile configs/review-profile.example.yaml \
                              --model-profile  configs/model-profile.example.yaml
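A sketch of the kind of bounds the review profile carries — `max_file_bytes` and `max_llm_chunk_chars` are the keys named by the harness; the values and comments here are illustrative, not the shipped example's contents:

```yaml
# Illustrative review-profile fragment; see configs/review-profile.example.yaml
# for the real one. Values below are assumptions, not defaults.
max_file_bytes: 262144        # skip files larger than 256 KiB
max_llm_chunk_chars: 16000    # upper bound on each LLM review chunk
```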

Self-review

The harness reviews itself as a sanity gate (PROMPT.md "Final Deliverable"):

./review-harness scrum .
cat reports/latest/scrum-test.md

The fixture-planted secrets in tests/fixtures/insecure-repo/ are intentional — they prove the secret-pattern analyzer fires. Operators reviewing the self-report should expect those critical-severity hits and dismiss them as fixture content.

Test fixtures

Three synthetic repos under tests/fixtures/:

Fixture         Purpose                    Expected outcome
clean-repo/     sterile reference          0 confirmed findings
insecure-repo/  every static check fires   ≥8 distinct check IDs
degraded-repo/  no git, no manifests       repo_intake phase marked degraded
Run them all to validate after a regex change:

for f in clean-repo insecure-repo degraded-repo; do
  ./review-harness scrum "tests/fixtures/$f" > /dev/null
  echo "$f: $(jq '.summary.total' "tests/fixtures/$f/reports/latest/static-findings.json") findings"
done

Exit codes

  • 0 — clean run, no degraded phases
  • 64 — usage error
  • 65 — runtime error (config parse fail, target path missing, etc.)
  • 66 — degraded mode (one or more phases skipped or stubbed; reports still produced)

66 is the expected exit code in MVP because the LLM phase is hardcoded degraded until Phase C lands.

Description
Local-first code review harness — evidence-driven, no cloud deps. Sibling to Lakehouse-Go.