Claude (review-harness setup) f3ee4722a8 Phase A + B (MVP) — local review harness
Implements the MVP cutline from the planning artifact:
- Phase A: skeleton + CLI dispatch + provider interface + stub model doctor
- Phase B: scanner + git probe + 12 static analyzers + reporters + pipeline
- Phase B fixtures: clean-repo, insecure-repo, degraded-repo

12 static analyzers per PROMPT.md "Suggested Static Checks For MVP":
hardcoded_paths, shell_execution, raw_sql_interpolation, broad_cors,
secret_patterns, large_files, todo_comments, missing_tests,
env_file_committed, unsafe_file_io, exposed_mutation_endpoint,
hardcoded_local_ip.

Acceptance gates passing:
- B1 (intake produces accurate counts) ✓
- B2 (insecure fixture fires ≥8 distinct check_ids — actually 11/12) ✓
- B3 (clean fixture produces 0 confirmed findings — no false positives) ✓
- B4 (scrum mode produces all 6 required markdown + JSON reports) ✓
- B5 (receipts.json marks degraded phases honestly) ✓
- F  (self-review on this repo runs without crashing) ✓ — exit 66 (degraded
  because the Phase C LLM review is hardcoded as skipped)

Phases C (LLM review), D (validation cross-check), E (memory + diff +
rules subcommands) deferred per the cutline. The MVP delivers the
evidence-first path; LLM is purely additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:56:02 -05:00


# Claude Code Prompt: Build Local AI Code Review Harness
## Mission
Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and a validation-first workflow.
This is not a SaaS PR bot.
This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.
## Core Principle
AI may suggest.
Code validates.
Reports must show evidence.
Nothing is trusted because a model said it.
## Target Use Case
Given a repository path, the system should run a review pipeline that produces:
- architecture overview
- code health report
- security and trust-boundary report
- test coverage gap report
- refactor recommendations
- Scrum sprint backlog
- acceptance gates
- machine-readable JSON receipts
## Inspired Features To Extract
### From PR-Agent
Implement:
- PR and diff-style review mode
- summary of changed files
- risk-ranked findings
- suggested review comments
- checklist output
- confidence score per finding
Do not copy implementation. Recreate the concept locally.
### From Gito
Implement:
- local model compatibility
- full-repo review mode
- model-provider abstraction
- ability to run without GitHub or SaaS
- config-driven review profiles
### From OpenReview
Implement:
- webhook-ready design later
- clean separation between:
  - repo scanner
  - diff analyzer
  - LLM reviewer
  - report generator
  - validation layer
For now, local CLI first.
### From Kodus
Implement:
- plain-language project rules
- repo-specific review policy file
- ability to enforce local conventions
- persistent team memory rules
Example file:
```text
.review-rules.md
```
### From Sourcery
Implement:
- low-level refactor suggestions
- duplicated logic detection
- complexity hotspots
- dead code suspicion
- long-file warnings
- unsafe error handling warnings
## Architecture
Create a modular system with this shape:
```text
local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/
```
## Required Modes
### 1. Full Repo Review
Command:
```bash
review-harness repo /path/to/repo
```
Should inspect:
- file tree
- language mix
- build files
- test files
- scripts
- docs
- dependency manifests
- large files
- suspicious hardcoded paths
- TODO, FIXME, and security comments
### 2. Diff Review
Command:
```bash
review-harness diff /path/to/repo
```
Should inspect:
- unstaged changes
- staged changes
- branch diff against main or master
- changed functions where possible
- risk introduced by change
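Identifying changed files is the first step of diff review. A minimal sketch in Python (the prompt does not mandate an implementation language; the function name is hypothetical), assuming the `git diff` text has already been captured:

```python
import re

def changed_files(diff_text: str) -> list[str]:
    """Extract post-change file paths from unified diff output.

    Parses the `diff --git a/... b/...` headers that `git diff`
    emits. Note: `\\S+` does not handle paths containing spaces;
    a real implementation would parse quoted paths too.
    """
    paths = []
    for line in diff_text.splitlines():
        m = re.match(r"^diff --git a/(\S+) b/(\S+)$", line)
        if m:
            paths.append(m.group(2))
    return paths
```

Changed-function detection would then run per file on the hunk bodies; this sketch covers only the file-level pass.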
### 3. Scrum Test
Command:
```bash
review-harness scrum /path/to/repo
```
Should produce:
```text
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json
```
### 4. Rules Audit
Command:
```bash
review-harness rules /path/to/repo
```
Reads:
```text
.review-rules.md
.review-profile.yaml
```
Then checks whether the repository violates local project rules.
### 5. Local Model Probe
Command:
```bash
review-harness model doctor
```
Should test:
- Ollama availability
- configured model exists
- context limit estimate
- small prompt response
- JSON-mode reliability if available
- timeout behavior
- fallback model behavior
## Local Model Requirements
Support a model endpoint abstraction.
Initial provider:
```yaml
provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1
```
Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
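One way to keep Ollama out of the call sites is a registry-backed provider interface. A Python sketch under the same language assumption; class and function names here are illustrative, not prescribed by this prompt:

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Provider abstraction so Ollama is one backend among several."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

PROVIDERS: dict[str, type] = {}

def register(name: str):
    """Decorator that maps a config `provider:` value to a class."""
    def deco(cls):
        PROVIDERS[name] = cls
        return cls
    return deco

@register("ollama")
class OllamaProvider(ModelProvider):
    def __init__(self, base_url: str, model: str):
        self.base_url, self.model = base_url, model
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("HTTP call elided in this sketch")

def provider_from_config(cfg: dict) -> ModelProvider:
    """Build a provider from the YAML profile shown above."""
    cls = PROVIDERS[cfg["provider"]]
    return cls(base_url=cfg["base_url"], model=cfg["model"])
```

An OpenAI-compatible endpoint then becomes a second `@register("openai_compat")` class with no changes at the call sites.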
## Review Pipeline
Pipeline should run in phases.
### Phase 0: Repo Intake
Collect:
- repo path
- git status
- current branch
- latest commit
- language breakdown
- file count
- largest files
- dependency manifests
- test manifests
Output:
```text
repo_intake.json
```
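The filesystem half of intake (file count, language mix, largest files) can be sketched as a single walk; git fields (branch, latest commit, status) would come from `git` subprocess calls and are omitted here. Python sketch, names hypothetical:

```python
import os
from collections import Counter

def repo_intake(root: str, top_n: int = 5) -> dict:
    """Collect filesystem facts for repo_intake.json.

    Language breakdown is approximated by file extension; a real
    implementation might use a proper language detector.
    """
    sizes: dict[str, int] = {}
    exts: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
            exts[os.path.splitext(name)[1] or "(none)"] += 1
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {"file_count": len(sizes),
            "language_breakdown": dict(exts),
            "largest_files": [p for p, _ in largest]}
```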
### Phase 1: Static Scan
Detect:
- hardcoded absolute paths
- raw SQL interpolation
- shell command execution
- unsafe environment handling
- broad CORS
- exposed mutation endpoints
- suspicious secret patterns
- unchecked file reads and writes
- missing error handling
- excessive file size
- missing tests near critical code
Output:
```text
static_findings.json
```
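The static scan reduces to running a table of patterns line-by-line and attaching the evidence each finding must carry. A Python sketch with two illustrative checks (check ids follow the commit's analyzer names; the structure, not the pattern set, is the point):

```python
import re

# Two illustrative patterns; the full MVP set is listed later in this prompt.
CHECKS = {
    "hardcoded_paths": re.compile(r"/(?:home|root)/\w+"),
    "shell_execution": re.compile(r"\bexec\(|\bspawn\("),
}

def scan_file(path: str, text: str) -> list[dict]:
    """Run every static check per line, recording the evidence
    snippet and 1-based line number for each hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for check_id, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append({
                    "check_id": check_id,
                    "file": path,
                    "line": lineno,
                    "evidence": line.strip(),
                })
    return findings
```

Because each finding carries its own evidence snippet, the validation phase can later re-verify it against the file without re-running the scan.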
### Phase 2: LLM Review
Send bounded chunks to the local model.
The model must return strict JSON:
```json
{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}
```
If model output is invalid JSON, retry once with a repair prompt.
If the output is still invalid, save raw output and mark the model phase degraded.
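The retry-then-degrade rule can be sketched as a small wrapper. Python sketch; `repair` is a hypothetical callable that re-prompts the model with a JSON-repair instruction:

```python
import json

def parse_llm_json(raw: str, repair) -> dict:
    """Parse model output with exactly one repair retry.

    On a second failure the phase is marked degraded and the raw
    text is preserved, never discarded.
    """
    try:
        return {"ok": True, "data": json.loads(raw)}
    except json.JSONDecodeError:
        pass
    repaired = repair(raw)  # one retry with a repair prompt
    try:
        return {"ok": True, "data": json.loads(repaired)}
    except json.JSONDecodeError:
        # still invalid: keep the raw output and mark the phase degraded
        return {"ok": False, "degraded": True, "raw_output": raw}
```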
### Phase 3: Validation
Every LLM finding must be validated against actual files.
Reject findings that:
- point to missing files
- cite text that does not exist
- make unsupported claims
- recommend unrelated rewrites
- lack evidence
Output:
```text
validated_findings.json
```
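The first two rejection rules (missing files, nonexistent cited text) are mechanical checks. A Python sketch of that gate, assuming findings in the JSON shape above; the function name is hypothetical:

```python
import os

def validate_finding(finding: dict, repo_root: str) -> dict:
    """Phase 3 gate: a finding survives only if its file exists and
    its evidence snippet actually appears in that file."""
    path = os.path.join(repo_root, finding.get("file", ""))
    if not os.path.isfile(path):
        return {**finding, "status": "rejected", "reason": "file not found"}
    with open(path, encoding="utf-8", errors="replace") as fh:
        contents = fh.read()
    if finding.get("evidence") and finding["evidence"] in contents:
        return {**finding, "status": "confirmed"}
    return {**finding, "status": "rejected", "reason": "evidence not in file"}
```

The remaining rules (unsupported claims, unrelated rewrites) need semantic judgment and stay in the reviewer's court; this gate only guarantees the cited evidence is real.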
### Phase 4: Report Generation
Generate Markdown reports:
- executive summary
- risk register
- sprint backlog
- acceptance gates
- test gaps
- architecture drift
- suggested next commands
### Phase 5: Memory
Create local memory files:
```text
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
```
Memory should be append-only by default.
Never silently overwrite prior memory. Version it.
## Validation Rules
Hard rules:
1. No hallucinated files.
2. No invented tests.
3. No fake command success.
4. No "appears to work" language without evidence.
5. Every finding must include:
- file path
- evidence snippet
- risk
- suggested next action
6. Reports must distinguish:
- confirmed issue
- suspected issue
- missing evidence
- blocked by unavailable dependency
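Rules 5 and 6 can be encoded directly so nothing incomplete reaches a report. Python sketch; the enum values mirror the four categories above, and the field names are assumptions based on rule 5:

```python
from enum import Enum

class FindingStatus(Enum):
    """The four categories every reported finding must be binned into."""
    CONFIRMED = "confirmed"                  # evidence verified against the file
    SUSPECTED = "suspected"                  # plausible but unverified
    MISSING_EVIDENCE = "missing_evidence"    # claim made, nothing to back it
    BLOCKED = "blocked"                      # dependency unavailable, check not run

REQUIRED_FIELDS = ("file", "evidence", "risk", "suggested_next_action")

def is_reportable(finding: dict) -> bool:
    """Hard rule 5: a finding missing any required field never
    reaches a report."""
    return all(finding.get(k) for k in REQUIRED_FIELDS)
```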
## First Implementation Target
Do not build everything at once.
Implement MVP:
```text
Phase 0 repo intake
Phase 1 static scan
Phase 4 report generation
Basic Ollama model doctor
```
Then add LLM review after the static evidence pipeline is stable.
## MVP Acceptance Criteria
The MVP passes when:
```bash
review-harness repo .
review-harness scrum .
review-harness model doctor
```
produce usable output without crashing.
Required files:
```text
reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json
```
## Suggested Static Checks For MVP
Implement these first:
- hardcoded `/home/`
- hardcoded `/root/`
- hardcoded local IP addresses
- `exec(`
- `spawn(`
- `Command::new`
- raw SQL patterns:
  - `format!("SELECT`
  - string interpolation near SQL keywords
  - template literals containing `SELECT`
  - template literals containing `INSERT`
  - template literals containing `UPDATE`
  - template literals containing `DELETE`
- `Access-Control-Allow-Origin: *`
- committed `.env` files
- private key patterns
- files over 800 lines
- TODO, FIXME, and HACK count
- missing test directory
- package or build files without corresponding test command
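Most of the list above is pure pattern matching, so it can live in one table keyed by check id. A Python sketch; the regexes are rough first cuts (the local-IP pattern, for instance, covers only common private prefixes), and the checks needing filesystem logic (missing test directory, build-file/test-command pairing) are omitted:

```python
import re

# Check ids follow the analyzer names from the commit message.
MVP_PATTERNS = {
    "hardcoded_paths": re.compile(r"/(?:home|root)/"),
    "hardcoded_local_ip": re.compile(
        r"\b(?:192\.168|10\.0|127\.0)\.\d{1,3}\.\d{1,3}\b"),
    "shell_execution": re.compile(r"\bexec\(|\bspawn\(|Command::new"),
    "raw_sql_interpolation": re.compile(
        r'format!\("(?:SELECT|INSERT|UPDATE|DELETE)'
        r'|`[^`]*\b(?:SELECT|INSERT|UPDATE|DELETE)\b'),
    "broad_cors": re.compile(r"Access-Control-Allow-Origin:\s*\*"),
    "secret_patterns": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "todo_comments": re.compile(r"\b(?:TODO|FIXME|HACK)\b"),
}
```

Keeping the patterns as data rather than code makes each check independently testable and lets a `.review-profile.yaml` enable or disable them by id.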
## Output Style
Reports should be blunt and operational.
No motivational filler.
Use sections:
```text
Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands
```
## Final Deliverable
After implementation, produce:
```text
docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*
```
Then run the harness against this repository itself and include the self-review report.
## Do Not
- Do not require GitHub.
- Do not require cloud LLMs.
- Do not pretend local model output is authoritative.
- Do not rewrite the target repository.
- Do not make destructive changes.
- Do not auto-commit.
- Do not hide degraded model failures.
## Strategic Goal
This should become the local review node for a larger autonomous development system.
Eventually it should plug into:
- OpenClaw
- MCP tools
- local lakehouse memory
- playbook sealing
- CI verification
- observer review loop
But first: make the local review harness reliable, inspectable, and evidence-driven.