# Claude Code Prompt: Build Local AI Code Review Harness

## Mission

Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and a validation-first workflow.

This is not a SaaS PR bot.

This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.

## Core Principle

AI may suggest.

Code validates.

Reports must show evidence.

Nothing is trusted because a model said it.

## Target Use Case

Given a repository path, the system should run a review pipeline that produces:

- architecture overview
- code health report
- security and trust-boundary report
- test coverage gap report
- refactor recommendations
- Scrum sprint backlog
- acceptance gates
- machine-readable JSON receipts

## Inspired Features To Extract

### From PR-Agent

Implement:

- PR and diff-style review mode
- summary of changed files
- risk-ranked findings
- suggested review comments
- checklist output
- confidence score per finding

Do not copy the implementation. Recreate the concept locally.

### From Gito

Implement:

- local model compatibility
- full-repo review mode
- model-provider abstraction
- ability to run without GitHub or SaaS
- config-driven review profiles

### From OpenReview

Implement:

- webhook-ready design (later)
- clean separation between:
  - repo scanner
  - diff analyzer
  - LLM reviewer
  - report generator
  - validation layer

For now, local CLI first.

### From Kodus

Implement:

- plain-language project rules
- repo-specific review policy file
- ability to enforce local conventions
- persistent team memory rules

Example file:

```text
.review-rules.md
```

### From Sourcery

Implement:

- low-level refactor suggestions
- duplicated logic detection
- complexity hotspots
- dead code suspicion
- long-file warnings
- unsafe error handling warnings

## Architecture

Create a modular system with this shape:

```text
local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/
```

## Required Modes

### 1. Full Repo Review

Command:

```bash
review-harness repo /path/to/repo
```

Should inspect:

- file tree
- language mix
- build files
- test files
- scripts
- docs
- dependency manifests
- large files
- suspicious hardcoded paths
- TODO, FIXME, and security comments

### 2. Diff Review

Command:

```bash
review-harness diff /path/to/repo
```

Should inspect:

- unstaged changes
- staged changes
- branch diff against main or master
- changed functions where possible
- risk introduced by change

### 3. Scrum Test

Command:

```bash
review-harness scrum /path/to/repo
```

Should produce:

```text
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json
```

### 4. Rules Audit

Command:

```bash
review-harness rules /path/to/repo
```

Reads:

```text
.review-rules.md
.review-profile.yaml
```

Then checks whether the repository violates local project rules.

### 5. Local Model Probe

Command:

```bash
review-harness model doctor
```

Should test:

- Ollama availability
- configured model exists
- context limit estimate
- small prompt response
- JSON-mode reliability if available
- timeout behavior
- fallback model behavior
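
The first two checks reduce to one small request. A minimal sketch in Python, assuming Ollama's standard `/api/tags` model-listing endpoint (`probe_ollama` is an illustrative name, not a required API):

```python
import json
import urllib.error
import urllib.request

def probe_ollama(base_url: str = "http://localhost:11434") -> dict:
    """Check endpoint availability and list installed models.

    Never raises: an unreachable endpoint is reported as a failed
    check rather than a crash, per the harness's honesty rules.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.loads(resp.read())
        models = [m["name"] for m in tags.get("models", [])]
        return {"available": True, "models": models}
    except (urllib.error.URLError, OSError, json.JSONDecodeError) as exc:
        return {"available": False, "models": [], "error": str(exc)}
```

The remaining checks layer on the same pattern: issue a small request, record the outcome, never crash.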

## Local Model Requirements

Support a model endpoint abstraction.

Initial provider:

```yaml
provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1
```

Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
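
Such an interface can stay tiny. A hedged sketch in Python (the names `ModelProvider` and `complete` are illustrative; the request body follows Ollama's `/api/generate` API):

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Protocol

class ModelProvider(Protocol):
    """Anything that can turn a prompt into text. Swappable per config."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class OllamaProvider:
    base_url: str = "http://localhost:11434"
    model: str = "qwen2.5-coder"
    timeout_seconds: int = 120
    temperature: float = 0.1

    def request_body(self, prompt: str) -> dict:
        # Shape of Ollama's /api/generate request.
        return {
            "model": self.model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": self.temperature},
        }

    def complete(self, prompt: str) -> str:
        req = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=json.dumps(self.request_body(prompt)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=self.timeout_seconds) as resp:
            return json.loads(resp.read())["response"]
```

An OpenAI-compatible provider would implement the same `complete` signature against a different request shape; nothing upstream changes.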

## Review Pipeline

The pipeline should run in phases.

### Phase 0: Repo Intake

Collect:

- repo path
- git status
- current branch
- latest commit
- language breakdown
- file count
- largest files
- dependency manifests
- test manifests

Output:

```text
repo-intake.json
```
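
A sketch of the intake step (Python; `repo_intake` is an illustrative name): walk the tree, then make a best-effort `git` probe that records failure honestly instead of faking success.

```python
import subprocess
from collections import Counter
from pathlib import Path

def repo_intake(repo: Path) -> dict:
    files = [p for p in repo.rglob("*") if p.is_file() and ".git" not in p.parts]
    langs = Counter(p.suffix or "(no extension)" for p in files)
    largest = sorted(files, key=lambda p: p.stat().st_size, reverse=True)[:10]
    try:
        branch = subprocess.run(
            ["git", "-C", str(repo), "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout.strip()
    except (subprocess.SubprocessError, FileNotFoundError):
        branch = None  # not a git repo, or git missing: recorded, not hidden
    return {
        "path": str(repo),
        "file_count": len(files),
        "language_breakdown": dict(langs),
        "largest_files": [str(p.relative_to(repo)) for p in largest],
        "current_branch": branch,
    }
```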

### Phase 1: Static Scan

Detect:

- hardcoded absolute paths
- raw SQL interpolation
- shell command execution
- unsafe environment handling
- broad CORS
- exposed mutation endpoints
- suspicious secret patterns
- unchecked file reads and writes
- missing error handling
- excessive file size
- missing tests near critical code

Output:

```text
static-findings.json
```

### Phase 2: LLM Review

Send bounded chunks to the local model.

The model must return strict JSON:

```json
{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}
```

If the model output is invalid JSON, retry once with a repair prompt.

If the output is still invalid, save the raw output and mark the model phase degraded.

### Phase 3: Validation

Every LLM finding must be validated against actual files.

Reject findings that:

- point to missing files
- cite text that does not exist
- make unsupported claims
- recommend unrelated rewrites
- lack evidence

Output:

```text
validated-findings.json
```
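
The core of this phase is a literal-evidence check. A minimal sketch (Python; `validate_finding` is an illustrative name) covering the file-existence, cited-text, and missing-evidence rules — unsupported claims and unrelated rewrites need heavier heuristics:

```python
from pathlib import Path

def validate_finding(repo_root: Path, finding: dict) -> str:
    """Return 'confirmed' or a 'rejected: ...' reason string."""
    target = repo_root / finding.get("file", "")
    if not target.is_file():
        return "rejected: file does not exist"
    evidence = finding.get("evidence", "").strip()
    if not evidence:
        return "rejected: no evidence snippet"
    if evidence not in target.read_text(errors="replace"):
        return "rejected: cited text not found in file"
    return "confirmed"
```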

### Phase 4: Report Generation

Generate Markdown reports:

- executive summary
- risk register
- sprint backlog
- acceptance gates
- test gaps
- architecture drift
- suggested next commands

### Phase 5: Memory

Create local memory files:

```text
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
```

Memory should be append-only by default.

Never silently overwrite prior memory. Version it.
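
One way to honor append-only semantics, sketched in Python (file layout and names are illustrative): every write appends a timestamped record, and prior entries are never rewritten.

```python
import json
import time
from pathlib import Path

def append_memory(memory_dir: Path, name: str, entry: dict) -> int:
    """Append one record to a .memory JSON file; return the new count.

    Existing records are read back and re-emitted untouched, so history
    is versioned by position and timestamp rather than overwritten.
    """
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / name
    records = json.loads(path.read_text()) if path.exists() else []
    records.append({"recorded_at": time.time(), **entry})
    path.write_text(json.dumps(records, indent=2))
    return len(records)
```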

## Validation Rules

Hard rules:

1. No hallucinated files.
2. No invented tests.
3. No fake command success.
4. No "appears to work" language without evidence.
5. Every finding must include:
   - file path
   - evidence snippet
   - risk
   - suggested next action
6. Reports must distinguish:
   - confirmed issue
   - suspected issue
   - missing evidence
   - blocked by unavailable dependency

## First Implementation Target

Do not build everything at once.

Implement the MVP:

```text
Phase 0 repo intake
Phase 1 static scan
Phase 4 report generation
Basic Ollama model doctor
```

Then add LLM review after the static evidence pipeline is stable.

## MVP Acceptance Criteria

The MVP passes when:

```bash
review-harness repo .
review-harness scrum .
review-harness model doctor
```

all produce usable output without crashing.

Required files:

```text
reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json
```

## Suggested Static Checks For MVP

Implement these first:

- hardcoded `/home/`
- hardcoded `/root/`
- hardcoded local IP addresses
- `exec(`
- `spawn(`
- `Command::new`
- raw SQL patterns:
  - `format!("SELECT`
  - string interpolation near SQL keywords
  - template literals containing `SELECT`
  - template literals containing `INSERT`
  - template literals containing `UPDATE`
  - template literals containing `DELETE`
- `Access-Control-Allow-Origin: *`
- committed `.env` files
- private key patterns
- files over 800 lines
- TODO, FIXME, and HACK count
- missing test directory
- package or build files without corresponding test command
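
Most of the checks above reduce to line-level regex scans. A sketch of the shape (Python; the patterns are illustrative starting points, not exhaustive detectors):

```python
import re

# (check_id, pattern) pairs; deliberately simple and tunable per repo.
CHECKS = [
    ("hardcoded_paths", re.compile(r"/home/|/root/")),
    ("hardcoded_local_ip", re.compile(r"\b192\.168\.\d{1,3}\.\d{1,3}\b")),
    ("shell_execution", re.compile(r"\bexec\(|\bspawn\(|Command::new")),
    ("raw_sql_interpolation",
     re.compile(r'format!\("\s*(SELECT|INSERT|UPDATE|DELETE)')),
    ("broad_cors", re.compile(r"Access-Control-Allow-Origin:\s*\*")),
]

def scan_line(path: str, lineno: int, line: str) -> list[dict]:
    """Return one evidence-bearing finding per check the line trips."""
    return [
        {"check_id": cid, "file": path, "line": lineno, "evidence": line.strip()}
        for cid, pattern in CHECKS
        if pattern.search(line)
    ]
```

Each hit carries its own evidence line, which is what lets validation and the reports show receipts instead of assertions.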

## Output Style

Reports should be blunt and operational.

No motivational filler.

Use sections:

```text
Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands
```

## Final Deliverable

After implementation, produce:

```text
docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*
```

Then run the harness against this repository itself and include the self-review report.

## Do Not

- Do not require GitHub.
- Do not require cloud LLMs.
- Do not pretend local model output is authoritative.
- Do not rewrite the target repository.
- Do not make destructive changes.
- Do not auto-commit.
- Do not hide degraded model failures.

## Strategic Goal

This should become the local review node for a larger autonomous development system.

Eventually it should plug into:

- OpenClaw
- MCP tools
- local lakehouse memory
- playbook sealing
- CI verification
- observer review loop

But first: make the local review harness reliable, inspectable, and evidence-driven.