Claude (review-harness setup) f3ee4722a8 Phase A + B (MVP) — local review harness
Implements the MVP cutline from the planning artifact:
- Phase A: skeleton + CLI dispatch + provider interface + stub model doctor
- Phase B: scanner + git probe + 12 static analyzers + reporters + pipeline
- Phase B fixtures: clean-repo, insecure-repo, degraded-repo

12 static analyzers per PROMPT.md "Suggested Static Checks For MVP":
hardcoded_paths, shell_execution, raw_sql_interpolation, broad_cors,
secret_patterns, large_files, todo_comments, missing_tests,
env_file_committed, unsafe_file_io, exposed_mutation_endpoint,
hardcoded_local_ip.

Acceptance gates passing:
- B1 (intake produces accurate counts) ✓
- B2 (insecure fixture fires ≥8 distinct check_ids — actually 11/12) ✓
- B3 (clean fixture produces 0 confirmed findings — no false positives) ✓
- B4 (scrum mode produces all 6 required markdown + JSON reports) ✓
- B5 (receipts.json marks degraded phases honestly) ✓
- F  (self-review on this repo runs without crashing) ✓ — exit 66 (degraded
  because Phase C LLM review is hardcoded to skip)

Phases C (LLM review), D (validation cross-check), E (memory + diff +
rules subcommands) deferred per the cutline. The MVP delivers the
evidence-first path; LLM is purely additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:56:02 -05:00


Claude Code Prompt: Build Local AI Code Review Harness

Mission

Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and validation-first workflow.

This is not a SaaS PR bot.

This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.

Core Principle

AI may suggest.

Code validates.

Reports must show evidence.

Nothing is trusted because a model said it.

Target Use Case

Given a repository path, the system should run a review pipeline that produces:

  • architecture overview
  • code health report
  • security and trust-boundary report
  • test coverage gap report
  • refactor recommendations
  • Scrum sprint backlog
  • acceptance gates
  • machine-readable JSON receipts

Inspired Features To Extract

From PR-Agent

Implement:

  • PR and diff-style review mode
  • summary of changed files
  • risk-ranked findings
  • suggested review comments
  • checklist output
  • confidence score per finding

Do not copy implementation. Recreate the concept locally.

From Gito

Implement:

  • local model compatibility
  • full-repo review mode
  • model-provider abstraction
  • ability to run without GitHub or SaaS
  • config-driven review profiles

From OpenReview

Implement:

  • webhook-ready design (deferred until later)
  • clean separation between:
    • repo scanner
    • diff analyzer
    • LLM reviewer
    • report generator
    • validation layer

For now, local CLI first.

From Kodus

Implement:

  • plain-language project rules
  • repo-specific review policy file
  • ability to enforce local conventions
  • persistent team memory rules

Example file:

.review-rules.md
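
A hypothetical .review-rules.md (contents invented here purely for illustration; the spec only names the file) might look like:

```markdown
# Review Rules

- All SQL must go through the query builder; no string-built SQL.
- No hardcoded absolute paths outside configs/.
- Every new public function needs a test under tests/.
- Shell execution requires an inline justification comment.
```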

From Sourcery

Implement:

  • low-level refactor suggestions
  • duplicated logic detection
  • complexity hotspots
  • dead code suspicion
  • long-file warnings
  • unsafe error handling warnings

Architecture

Create a modular system with this shape:

local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/

Required Modes

1. Full Repo Review

Command:

review-harness repo /path/to/repo

Should inspect:

  • file tree
  • language mix
  • build files
  • test files
  • scripts
  • docs
  • dependency manifests
  • large files
  • suspicious hardcoded paths
  • TODO, FIXME, and security comments

2. Diff Review

Command:

review-harness diff /path/to/repo

Should inspect:

  • unstaged changes
  • staged changes
  • branch diff against main or master
  • changed functions where possible
  • risk introduced by change
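
The first three bullets reduce to three `git diff` invocations; a sketch of the probe, assuming `git` is on PATH (the function name is mine):

```python
import subprocess

def git_diff_names(repo, base="main"):
    """Changed-file lists: unstaged, staged, and branch diff against base."""
    def run(*args):
        # check=False: a missing base branch yields an empty list, not a crash
        out = subprocess.run(["git", "-C", repo, *args],
                             capture_output=True, text=True, check=False)
        return [line for line in out.stdout.splitlines() if line]
    return {
        "unstaged": run("diff", "--name-only"),
        "staged": run("diff", "--cached", "--name-only"),
        "branch": run("diff", "--name-only", f"{base}...HEAD"),
    }
```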

3. Scrum Test

Command:

review-harness scrum /path/to/repo

Should produce:

reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json

4. Rules Audit

Command:

review-harness rules /path/to/repo

Reads:

.review-rules.md
.review-profile.yaml

Then checks whether the repository violates local project rules.

5. Local Model Probe

Command:

review-harness model doctor

Should test:

  • Ollama availability
  • configured model exists
  • context limit estimate
  • small prompt response
  • JSON-mode reliability if available
  • timeout behavior
  • fallback model behavior
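
The first two checks above can be probed through Ollama's documented `/api/tags` route, which lists pulled models. A sketch, with the fetcher injectable so the probe is testable offline (`model_doctor` and `fetch` are my naming):

```python
import json
from urllib import request

def model_doctor(base_url, model, fetch=None):
    """Report endpoint reachability and whether `model` is pulled."""
    if fetch is None:
        def fetch(url):
            with request.urlopen(url, timeout=5) as resp:
                return json.load(resp)
    try:
        tags = fetch(f"{base_url}/api/tags")
    except Exception as exc:
        return {"reachable": False, "model_present": False, "error": str(exc)}
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Ollama model names may carry a tag suffix, e.g. "qwen2.5-coder:7b"
    present = any(n == model or n.startswith(model + ":") for n in names)
    return {"reachable": True, "model_present": present, "error": None}
```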

Local Model Requirements

Support a model endpoint abstraction.

Initial provider:

provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1

Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
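
One way to shape that provider interface, sketched as an abstract base class plus an offline test double (method and class names are illustrative, not mandated by the spec):

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Provider abstraction; Ollama is just the first concrete backend."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...
    @abstractmethod
    def health_check(self) -> bool: ...

class CannedProvider(ModelProvider):
    """Offline test double returning a fixed reply."""
    def __init__(self, reply: str):
        self.reply = reply
    def complete(self, prompt: str) -> str:
        return self.reply
    def health_check(self) -> bool:
        return True
```

An OpenAI-compatible endpoint later becomes just another subclass; the pipeline only ever sees `ModelProvider`.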

Review Pipeline

Pipeline should run in phases.

Phase 0: Repo Intake

Collect:

  • repo path
  • git status
  • current branch
  • latest commit
  • language breakdown
  • file count
  • largest files
  • dependency manifests
  • test manifests

Output:

repo_intake.json

Phase 1: Static Scan

Detect:

  • hardcoded absolute paths
  • raw SQL interpolation
  • shell command execution
  • unsafe environment handling
  • broad CORS
  • exposed mutation endpoints
  • suspicious secret patterns
  • unchecked file reads and writes
  • missing error handling
  • excessive file size
  • missing tests near critical code

Output:

static_findings.json

Phase 2: LLM Review

Send bounded chunks to the local model.

The model must return strict JSON:

{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}

If model output is invalid JSON, retry once with a repair prompt.

If the output is still invalid, save the raw output and mark the model phase as degraded.
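
The schema gate and the single repair retry can be sketched as follows; `ask` stands in for whatever provider call the harness uses (function names are mine):

```python
import json

REQUIRED = {"title", "severity", "file", "line_hint", "evidence",
            "reason", "suggested_fix", "confidence"}
SEVERITIES = {"low", "medium", "high", "critical"}

def parse_findings(raw):
    """Return the findings list if the payload matches the schema, else None."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        return None
    for f in findings:
        if not isinstance(f, dict) or not REQUIRED <= f.keys():
            return None
        if f["severity"] not in SEVERITIES:
            return None
    return findings

def review_once_with_repair(ask):
    """Call the model; on invalid JSON, retry exactly once with a repair prompt."""
    parsed = parse_findings(ask("review"))
    if parsed is None:
        parsed = parse_findings(ask("repair"))
    return parsed  # None => save raw output, mark the model phase degraded
```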

Phase 3: Validation

Every LLM finding must be validated against actual files.

Reject findings that:

  • point to missing files
  • cite text that does not exist
  • make unsupported claims
  • recommend unrelated rewrites
  • lack evidence

Output:

validated_findings.json
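
The first two rejection rules are mechanical: the cited file must exist and the evidence snippet must literally appear in it. A minimal sketch (the function name and return strings are my own):

```python
import os

def validate_finding(repo, finding):
    """Cross-check one LLM finding against the real file system."""
    path = os.path.join(repo, finding["file"])
    if not os.path.isfile(path):
        return "rejected: file does not exist"
    if not finding.get("evidence"):
        return "rejected: no evidence supplied"
    with open(path, encoding="utf-8", errors="replace") as fh:
        if finding["evidence"] not in fh.read():
            return "rejected: cited text not found in file"
    return "confirmed"
```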

Phase 4: Report Generation

Generate Markdown reports:

  • executive summary
  • risk register
  • sprint backlog
  • acceptance gates
  • test gaps
  • architecture drift
  • suggested next commands

Phase 5: Memory

Create local memory files:

.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json

Memory should be append-only by default.

Never silently overwrite prior memory. Version it.
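
One way to honor append-only-with-versioning for the JSON memory files: snapshot the prior file before every write. A sketch under my own naming and timestamp scheme:

```python
import json, os, time

def append_memory(memory_dir, name, entry):
    """Append an entry; snapshot the prior file instead of overwriting it."""
    os.makedirs(memory_dir, exist_ok=True)
    path = os.path.join(memory_dir, name)
    history = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            history = json.load(fh)
        stamp = time.strftime("%Y%m%dT%H%M%S")
        with open(f"{path}.{stamp}.bak", "w", encoding="utf-8") as fh:
            json.dump(history, fh)  # prior state preserved, never discarded
    history.append(entry)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(history, fh, indent=2)
    return history
```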

Validation Rules

Hard rules:

  1. No hallucinated files.
  2. No invented tests.
  3. No fake command success.
  4. No "appears to work" language without evidence.
  5. Every finding must include:
    • file path
    • evidence snippet
    • risk
    • suggested next action
  6. Reports must distinguish:
    • confirmed issue
    • suspected issue
    • missing evidence
    • blocked by unavailable dependency

First Implementation Target

Do not build everything at once.

Implement MVP:

Phase 0 repo intake
Phase 1 static scan
Phase 4 report generation
Basic Ollama model doctor

Then add LLM review after the static evidence pipeline is stable.

MVP Acceptance Criteria

The MVP passes when:

review-harness repo .
review-harness scrum .
review-harness model doctor

produce usable output without crashing.

Required files:

reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json

Suggested Static Checks For MVP

Implement these first:

  • hardcoded /home/
  • hardcoded /root/
  • hardcoded local IP addresses
  • exec(
  • spawn(
  • Command::new
  • raw SQL patterns:
    • format!("SELECT
    • string interpolation near SQL keywords
    • template literals containing SELECT
    • template literals containing INSERT
    • template literals containing UPDATE
    • template literals containing DELETE
  • Access-Control-Allow-Origin: *
  • committed .env files
  • private key patterns
  • files over 800 lines
  • TODO, FIXME, and HACK count
  • missing test directory
  • package or build files without a corresponding test command

Output Style

Reports should be blunt and operational.

No motivational filler.

Use sections:

Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands

Final Deliverable

After implementation, produce:

docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*

Then run the harness against this repository itself and include the self-review report.

Do Not

  • Do not require GitHub.
  • Do not require cloud LLMs.
  • Do not pretend local model output is authoritative.
  • Do not rewrite the target repository.
  • Do not make destructive changes.
  • Do not auto-commit.
  • Do not hide degraded model failures.

Strategic Goal

This should become the local review node for a larger autonomous development system.

Eventually it should plug into:

  • OpenClaw
  • MCP tools
  • local lakehouse memory
  • playbook sealing
  • CI verification
  • observer review loop

But first: make the local review harness reliable, inspectable, and evidence-driven.