Implements PROMPT.md / docs/REVIEW_PIPELINE.md Phase 2:
- internal/llm/ollama.go — real Ollama provider:
- HealthCheck probes /api/tags + a 1-token completion + a JSON-mode
probe ({"ok": true} round-trip), populating the model-doctor.json
schema documented in docs/LOCAL_MODEL_SETUP.md
- Complete + CompleteJSON via /api/chat with stream=false
- think=false set for ALL completions (qwen3.5:latest is reasoning-
capable but the inner-loop hot path wants direct answers, not
reasoning traces consuming the token budget — same finding as
the Lakehouse-Go chatd 2026-04-30 wave)
- internal/llm/review.go — Reviewer wrapper:
- 2-attempt flow: prompt → parse → repair-prompt → parse
- Strict JSON shape enforced; markdown fences stripped before parse
- Severity normalized to enum; out-of-range confidence clamped
- Per-file chunking (file-level for v0; function-level Phase D+)
- Bounded by review-profile max_file_bytes + max_llm_chunk_chars
- pipeline.go — Phase 2 wired between static scan + report gen:
- --enable-llm flag opts in (off by default — static-only is
cheaper and faster)
- Raw output ALWAYS saved to llm-findings.raw.json (forensics)
- Normalized findings → llm-findings.normalized.json
- LLM findings merged into the report findings list (sourced
"llm" so consumers can filter)
- Receipts honestly mark phase status: "ok" | "degraded" | "skipped"
- cli model doctor — real probes replace the Phase A stub.
Verified:
- model doctor: status="ok" with qwen3.5:latest + qwen3:latest both
loaded, basic_prompt_ok=true, json_mode_ok=true
- insecure-repo with --enable-llm: 9 LLM findings; qwen3.5 correctly
flagged SQLi, RCE, hardcoded credentials as critical with verbatim
evidence; 27s wall for 3 chunks
- clean-repo with --enable-llm: 0 LLM findings, 4 parsed chunks, 2.8s
- self-review with --enable-llm: 77 LLM findings + 83 static; 3 of
~30 chunks needed retry (PROMPT.md, REPORT_SCHEMA.md,
SCRUM_TEST_TEMPLATE.md — all eventually parsed); 5min wall
go vet + go test -short clean. Fixture stray.go now `package fixture`
so go-tooling doesn't choke on the orphan.
Phase D (validator cross-check) + Phase E (memory + diff/rules
subcommands) remain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
92 lines · 2.7 KiB · Go
// Local review harness — entry point.
//
// Subcommands per PROMPT.md:
//   repo /path   — full-repo review (Phase B MVP)
//   diff /path   — diff/PR-style review (Phase E)
//   scrum /path  — scrum-test report bundle (Phase B MVP)
//   rules /path  — rules audit (Phase E)
//   model doctor — Ollama probe + JSON shape (Phase A stub, Phase C real)
//
// PROMPT.md hard rules apply: no cloud deps, no auto-commit, no
// destructive changes to the target repo, no fake success.
package main

import (
	"flag"
	"fmt"
	"os"

	"local-review-harness/internal/cli"
)

func main() {
	if len(os.Args) < 2 {
		usage()
		os.Exit(2)
	}

	// Per-subcommand flag sets. Common flags (--review-profile,
	// --model-profile, --output-dir) live on each FlagSet rather than
	// a global pre-parser; a CLI library would be overkill for 5
	// subcommands.
	sub := os.Args[1]
	args := os.Args[2:]

	switch sub {
	case "repo":
		os.Exit(cli.Repo(args))
	case "diff":
		fmt.Fprintln(os.Stderr, "diff: not implemented in MVP (Phase E)")
		os.Exit(64)
	case "scrum":
		os.Exit(cli.Scrum(args))
	case "rules":
		fmt.Fprintln(os.Stderr, "rules: not implemented in MVP (Phase E)")
		os.Exit(64)
	case "model":
		// Two-token verb: "model doctor"
		if len(args) < 1 {
			fmt.Fprintln(os.Stderr, "model: missing verb (try: model doctor)")
			os.Exit(2)
		}
		switch args[0] {
		case "doctor":
			os.Exit(cli.ModelDoctor(args[1:]))
		default:
			fmt.Fprintf(os.Stderr, "model: unknown verb %q\n", args[0])
			os.Exit(2)
		}
	case "-h", "--help", "help":
		usage()
		os.Exit(0)
	case "version":
		fmt.Println("review-harness 0.1.0 (Phase A skeleton)")
		os.Exit(0)
	default:
		fmt.Fprintf(os.Stderr, "unknown subcommand: %q\n", sub)
		usage()
		os.Exit(2)
	}

	_ = flag.CommandLine // keep import stable across phases
}

func usage() {
	fmt.Fprint(os.Stderr, `review-harness — local-first code review

Usage:
  review-harness repo <path>    full-repo review (MVP)
  review-harness scrum <path>   scrum-test report bundle (MVP)
  review-harness model doctor   probe Ollama / configured models
  review-harness diff <path>    diff review (Phase E, not yet)
  review-harness rules <path>   rules audit (Phase E, not yet)
  review-harness version        print version
  review-harness help           this message

Common flags (per subcommand):
  --review-profile <path>   YAML; defaults applied if omitted
  --model-profile <path>    YAML; defaults applied if omitted
  --output-dir <path>       override review-profile output dir
  --enable-llm              also run local-Ollama LLM review (Phase C)
`)
}