Claude Code Prompt: Build Local AI Code Review Harness
Mission
Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and a validation-first workflow.
This is not a SaaS PR bot.
This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.
Core Principle
AI may suggest.
Code validates.
Reports must show evidence.
Nothing is trusted because a model said it.
Target Use Case
Given a repository path, the system should run a review pipeline that produces:
- architecture overview
- code health report
- security and trust-boundary report
- test coverage gap report
- refactor recommendations
- Scrum sprint backlog
- acceptance gates
- machine-readable JSON receipts
Inspired Features To Extract
From PR-Agent
Implement:
- PR and diff-style review mode
- summary of changed files
- risk-ranked findings
- suggested review comments
- checklist output
- confidence score per finding
Do not copy implementation. Recreate the concept locally.
From Gito
Implement:
- local model compatibility
- full-repo review mode
- model-provider abstraction
- ability to run without GitHub or SaaS
- config-driven review profiles
From OpenReview
Implement:
- webhook-ready design (later)
- clean separation between:
  - repo scanner
  - diff analyzer
  - LLM reviewer
  - report generator
  - validation layer
For now, local CLI first.
From Kodus
Implement:
- plain-language project rules
- repo-specific review policy file
- ability to enforce local conventions
- persistent team memory rules
Example file:
.review-rules.md
From Sourcery
Implement:
- low-level refactor suggestions
- duplicated logic detection
- complexity hotspots
- dead code suspicion
- long-file warnings
- unsafe error handling warnings
Architecture
Create a modular system with this shape:
```
local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/
```
Required Modes
1. Full Repo Review
Command:
review-harness repo /path/to/repo
Should inspect:
- file tree
- language mix
- build files
- test files
- scripts
- docs
- dependency manifests
- large files
- suspicious hardcoded paths
- TODO, FIXME, and security comments
2. Diff Review
Command:
review-harness diff /path/to/repo
Should inspect:
- unstaged changes
- staged changes
- branch diff against main or master
- changed functions where possible
- risk introduced by change
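A minimal sketch of how those three diff scopes could be gathered, assuming plain `git` on PATH. Function names are illustrative, and a real implementation would fall back from `main` to `master` when the base branch is absent:

```python
import subprocess

def git_diff(repo: str, *args: str) -> str:
    """Run `git diff` in the target repo and return the patch text."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def collect_diffs(repo: str, base: str = "main") -> dict[str, str]:
    return {
        "unstaged": git_diff(repo),                  # working tree vs index
        "staged": git_diff(repo, "--cached"),        # index vs HEAD
        "branch": git_diff(repo, f"{base}...HEAD"),  # merge-base vs HEAD
    }
```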
3. Scrum Test
Command:
review-harness scrum /path/to/repo
Should produce:
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json
4. Rules Audit
Command:
review-harness rules /path/to/repo
Reads:
.review-rules.md
.review-profile.yaml
Then checks whether the repository violates local project rules.
5. Local Model Probe
Command:
review-harness model doctor
Should test:
- Ollama availability
- configured model exists
- context limit estimate
- small prompt response
- JSON-mode reliability if available
- timeout behavior
- fallback model behavior
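A sketch of the first two probes, using only the Python standard library. The `/api/tags` and `/api/generate` endpoints are Ollama's documented API; the report structure, prompt text, and timeout values are illustrative:

```python
import json
import urllib.request

def probe_ollama(base_url: str, model: str, timeout: int = 10) -> dict:
    report = {"reachable": False, "model_present": False, "responds": False}
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as r:
            tags = json.load(r)
    except OSError:
        return report  # endpoint unreachable; later probes stay unmarked
    report["reachable"] = True
    names = [m.get("name", "") for m in tags.get("models", [])]
    report["model_present"] = any(n.startswith(model) for n in names)
    if report["model_present"]:
        body = json.dumps(
            {"model": model, "prompt": "Reply with OK", "stream": False}
        ).encode()
        req = urllib.request.Request(
            f"{base_url}/api/generate", data=body,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as r:
                report["responds"] = bool(json.load(r).get("response"))
        except OSError:
            pass
    return report
```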
Local Model Requirements
Support a model endpoint abstraction.
Initial provider:
```yaml
provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1
```
Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
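One possible shape for that seam, sketched in Python. `ReviewProvider`, `ModelConfig`, and `make_provider` are illustrative names, not a prescribed API:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ModelConfig:
    base_url: str
    model: str
    fallback_model: str
    timeout_seconds: int = 120
    temperature: float = 0.1

class ReviewProvider(Protocol):
    def complete(self, prompt: str, json_mode: bool = False) -> str:
        """Return raw model text for a single bounded prompt."""
        ...

class OllamaProvider:
    """Satisfies ReviewProvider structurally; the HTTP call is elided."""
    def __init__(self, cfg: ModelConfig) -> None:
        self.cfg = cfg

    def complete(self, prompt: str, json_mode: bool = False) -> str:
        raise NotImplementedError  # would POST to {base_url}/api/generate

def make_provider(name: str, cfg: ModelConfig) -> ReviewProvider:
    # A registry keeps call sites provider-agnostic; an OpenAI-compatible
    # backend becomes just another entry here.
    providers = {"ollama": OllamaProvider}
    if name not in providers:
        raise ValueError(f"unknown provider: {name}")
    return providers[name](cfg)
```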
Review Pipeline
Pipeline should run in phases.
Phase 0: Repo Intake
Collect:
- repo path
- git status
- current branch
- latest commit
- language breakdown
- file count
- largest files
- dependency manifests
- test manifests
Output:
repo_intake.json
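An illustrative intake collector; field names mirror the list above, but the exact schema of repo_intake.json is left to the implementation, and counting extensions is only a rough proxy for language breakdown:

```python
import subprocess
from collections import Counter
from pathlib import Path

def git_line(repo: str, *args: str) -> str:
    out = subprocess.run(["git", "-C", repo, *args],
                         capture_output=True, text=True)
    return out.stdout.strip() if out.returncode == 0 else ""

def repo_intake(repo: str) -> dict:
    root = Path(repo)
    files = [p for p in root.rglob("*")
             if p.is_file() and ".git" not in p.parts]
    by_ext = Counter(p.suffix or "<none>" for p in files)
    largest = sorted(files, key=lambda p: p.stat().st_size, reverse=True)
    return {
        "repo_path": str(root.resolve()),
        "branch": git_line(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "latest_commit": git_line(repo, "rev-parse", "--short", "HEAD"),
        "git_status": git_line(repo, "status", "--porcelain"),
        "file_count": len(files),
        "language_breakdown": dict(by_ext.most_common()),
        "largest_files": [str(p.relative_to(root)) for p in largest[:10]],
    }
```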
Phase 1: Static Scan
Detect:
- hardcoded absolute paths
- raw SQL interpolation
- shell command execution
- unsafe environment handling
- broad CORS
- exposed mutation endpoints
- suspicious secret patterns
- unchecked file reads and writes
- missing error handling
- excessive file size
- missing tests near critical code
Output:
static_findings.json
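A sketch of the analyzer loop and the finding record it emits. The three regexes are placeholder samples; the fuller MVP pattern set is listed under Suggested Static Checks below:

```python
import re
from pathlib import Path

CHECKS = {
    "hardcoded_paths": re.compile(r"/(?:home|root)/\w+"),
    "broad_cors": re.compile(r"Access-Control-Allow-Origin:\s*\*"),
    "todo_comments": re.compile(r"\b(?:TODO|FIXME|HACK)\b"),
}

def scan_file(path: Path) -> list[dict]:
    findings = []
    try:
        lines = path.read_text(errors="replace").splitlines()
    except OSError:
        return findings  # unreadable file: skip, never fake a result
    for lineno, line in enumerate(lines, start=1):
        for check_id, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append({
                    "check_id": check_id,
                    "file": str(path),
                    "line": lineno,
                    "evidence": line.strip()[:200],  # bounded snippet
                })
    return findings
```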
Phase 2: LLM Review
Send bounded chunks to the local model.
The model must return strict JSON:
```json
{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}
```
If model output is invalid JSON, retry once with a repair prompt.
If the output is still invalid, save raw output and mark the model phase degraded.
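A sketch of that parse-retry-degrade flow. `provider.complete` is the seam from the provider interface above, and the repair prompt wording is illustrative:

```python
import json

REPAIR_PROMPT = (
    "Your previous reply was not valid JSON. "
    "Resend ONLY the JSON object, with no prose before or after:\n\n{raw}"
)

def llm_review_chunk(provider, chunk_prompt: str) -> dict:
    raw = provider.complete(chunk_prompt, json_mode=True)
    for attempt in range(2):  # original try + one repair retry
        try:
            return {"status": "ok", "findings": json.loads(raw)["findings"]}
        except (json.JSONDecodeError, KeyError, TypeError):
            if attempt == 0:
                raw = provider.complete(REPAIR_PROMPT.format(raw=raw))
    # Still invalid: keep the evidence, mark the phase degraded.
    return {"status": "degraded", "raw_output": raw, "findings": []}
```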
Phase 3: Validation
Every LLM finding must be validated against actual files.
Reject findings that:
- point to missing files
- cite text that does not exist
- make unsupported claims
- recommend unrelated rewrites
- lack evidence
Output:
validated_findings.json
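A sketch of the evidence check covering the first two rejection rules (missing file, cited text that does not exist); the judgment-call rules need more machinery than shown here:

```python
from pathlib import Path

def validate_finding(repo: Path, finding: dict) -> tuple[bool, str]:
    target = repo / finding.get("file", "")
    if not target.is_file():
        return False, "rejected: file does not exist"
    evidence = finding.get("evidence", "").strip()
    if not evidence:
        return False, "rejected: no evidence snippet"
    text = target.read_text(errors="replace")
    if evidence not in text:
        return False, "rejected: evidence not found in file"
    return True, "confirmed"
```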
Phase 4: Report Generation
Generate Markdown reports:
- executive summary
- risk register
- sprint backlog
- acceptance gates
- test gaps
- architecture drift
- suggested next commands
Phase 5: Memory
Create local memory files:
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
Memory should be append-only by default.
Never silently overwrite prior memory. Version it.
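A sketch of a versioned write that never clobbers a prior generation; the `.vN` suffix layout is an assumption, not a prescribed format:

```python
import shutil
from pathlib import Path

def versioned_write(path: Path, content: str) -> None:
    """Write content to path, parking any existing copy as path.vN first."""
    if path.exists():
        n = 1
        while path.with_name(f"{path.name}.v{n}").exists():
            n += 1
        shutil.copy2(path, path.with_name(f"{path.name}.v{n}"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
```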
Validation Rules
Hard rules:
- No hallucinated files.
- No invented tests.
- No fake command success.
- No "appears to work" language without evidence.
- Every finding must include:
  - file path
  - evidence snippet
  - risk
  - suggested next action
- Reports must distinguish:
  - confirmed issue
  - suspected issue
  - missing evidence
  - blocked by unavailable dependency
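Those four report statuses are worth making an explicit type, so no finding can be emitted without one. A minimal sketch:

```python
from enum import Enum

class FindingStatus(Enum):
    CONFIRMED = "confirmed_issue"
    SUSPECTED = "suspected_issue"
    MISSING_EVIDENCE = "missing_evidence"
    BLOCKED = "blocked_by_unavailable_dependency"
```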
First Implementation Target
Do not build everything at once.
Implement MVP:
- Phase 0 repo intake
- Phase 1 static scan
- Phase 4 report generation
- basic Ollama model doctor
Then add LLM review after the static evidence pipeline is stable.
MVP Acceptance Criteria
The MVP passes when:
review-harness repo .
review-harness scrum .
review-harness model doctor
all produce usable output without crashing.
Required files:
reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json
Suggested Static Checks For MVP
Implement these first:
- hardcoded `/home/` paths
- hardcoded `/root/` paths
- hardcoded local IP addresses
- shell execution: `exec(`, `spawn(`, `Command::new`
- raw SQL patterns: `format!("SELECT`, string interpolation near SQL keywords
- template literals containing `SELECT`, `INSERT`, `UPDATE`, or `DELETE`
- `Access-Control-Allow-Origin: *`
- committed `.env` files
- private key patterns
- files over 800 lines
- TODO, FIXME, and HACK count
- missing test directory
- package or build files without corresponding test command
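One possible encoding of these checks as compiled patterns. The regexes are deliberately loose first-pass guesses that would need per-language tuning to keep false positives down:

```python
import re

MVP_PATTERNS = {
    "shell_execution": re.compile(r"\b(?:exec\(|spawn\(|Command::new)"),
    "raw_sql_interpolation": re.compile(
        r'format!\("\s*(?:SELECT|INSERT|UPDATE|DELETE)', re.I),
    "sql_in_template_literal": re.compile(
        r"`[^`]*\b(?:SELECT|INSERT|UPDATE|DELETE)\b[^`]*`", re.I),
    "secret_patterns": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_local_ip": re.compile(
        r"\b(?:192\.168\.\d{1,3}\.\d{1,3}|127\.0\.0\.1)\b"),
    # Matched against file names rather than file contents.
    "env_file_committed": re.compile(r"^\.env(?:\..+)?$"),
}
```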
Output Style
Reports should be blunt and operational.
No motivational filler.
Use sections:
Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands
Final Deliverable
After implementation, produce:
docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*
Then run the harness against this repository itself and include the self-review report.
Do Not
- Do not require GitHub.
- Do not require cloud LLMs.
- Do not pretend local model output is authoritative.
- Do not rewrite the target repository.
- Do not make destructive changes.
- Do not auto-commit.
- Do not hide degraded model failures.
Strategic Goal
This should become the local review node for a larger autonomous development system.
Eventually it should plug into:
- OpenClaw
- MCP tools
- local lakehouse memory
- playbook sealing
- CI verification
- observer review loop
But first: make the local review harness reliable, inspectable, and evidence-driven.