Claude (review-harness setup) f3ee4722a8 Phase A + B (MVP) — local review harness
Implements the MVP cutline from the planning artifact:
- Phase A: skeleton + CLI dispatch + provider interface + stub model doctor
- Phase B: scanner + git probe + 12 static analyzers + reporters + pipeline
- Phase B fixtures: clean-repo, insecure-repo, degraded-repo

12 static analyzers per PROMPT.md "Suggested Static Checks For MVP":
hardcoded_paths, shell_execution, raw_sql_interpolation, broad_cors,
secret_patterns, large_files, todo_comments, missing_tests,
env_file_committed, unsafe_file_io, exposed_mutation_endpoint,
hardcoded_local_ip.

Acceptance gates passing:
- B1 (intake produces accurate counts) ✓
- B2 (insecure fixture fires ≥8 distinct check_ids — actually 11/12) ✓
- B3 (clean fixture produces 0 confirmed findings — no false positives) ✓
- B4 (scrum mode produces all 6 required markdown + JSON reports) ✓
- B5 (receipts.json marks degraded phases honestly) ✓
- F  (self-review on this repo runs without crashing) ✓ — exit 66 (degraded
  because Phase C LLM review is hardcoded to skip)

Phases C (LLM review), D (validation cross-check), E (memory + diff +
rules subcommands) deferred per the cutline. The MVP delivers the
evidence-first path; LLM is purely additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:56:02 -05:00


Claude Code Prompt: Build Local AI Code Review Harness

Mission

Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and validation-first workflow.

This is not a SaaS PR bot.

This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.

Core Principle

AI may suggest.

Code validates.

Reports must show evidence.

Nothing is trusted because a model said it.

Target Use Case

Given a repository path, the system should run a review pipeline that produces:

  • architecture overview
  • code health report
  • security and trust-boundary report
  • test coverage gap report
  • refactor recommendations
  • Scrum sprint backlog
  • acceptance gates
  • machine-readable JSON receipts

Inspired Features To Extract

From PR-Agent

Implement:

  • PR and diff-style review mode
  • summary of changed files
  • risk-ranked findings
  • suggested review comments
  • checklist output
  • confidence score per finding

Do not copy implementation. Recreate the concept locally.

From Gito

Implement:

  • local model compatibility
  • full-repo review mode
  • model-provider abstraction
  • ability to run without GitHub or SaaS
  • config-driven review profiles

From OpenReview

Implement:

  • webhook-ready design (deferred until later)
  • clean separation between:
    • repo scanner
    • diff analyzer
    • LLM reviewer
    • report generator
    • validation layer

For now, local CLI first.

From Kodus

Implement:

  • plain-language project rules
  • repo-specific review policy file
  • ability to enforce local conventions
  • persistent team memory rules

Example file:

.review-rules.md
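
A hypothetical .review-rules.md (contents invented here purely for illustration; the spec only names the file) might look like:

```markdown
# Review Rules

- All SQL must go through the query builder; no string-built SQL.
- No hardcoded absolute paths outside configs/.
- Every new public function needs a test under tests/.
- Shell execution requires an inline justification comment.
```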

From Sourcery

Implement:

  • low-level refactor suggestions
  • duplicated logic detection
  • complexity hotspots
  • dead code suspicion
  • long-file warnings
  • unsafe error handling warnings

Architecture

Create a modular system with this shape:

local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/

Required Modes

1. Full Repo Review

Command:

review-harness repo /path/to/repo

Should inspect:

  • file tree
  • language mix
  • build files
  • test files
  • scripts
  • docs
  • dependency manifests
  • large files
  • suspicious hardcoded paths
  • TODO, FIXME, and security comments

2. Diff Review

Command:

review-harness diff /path/to/repo

Should inspect:

  • unstaged changes
  • staged changes
  • branch diff against main or master
  • changed functions where possible
  • risk introduced by change
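
The first three bullets reduce to three `git diff` invocations; a sketch of the probe, assuming `git` is on PATH (the function name is mine):

```python
import subprocess

def git_diff_names(repo, base="main"):
    """Changed-file lists: unstaged, staged, and branch diff against base."""
    def run(*args):
        # check=False: a missing base branch yields an empty list, not a crash
        out = subprocess.run(["git", "-C", repo, *args],
                             capture_output=True, text=True, check=False)
        return [line for line in out.stdout.splitlines() if line]
    return {
        "unstaged": run("diff", "--name-only"),
        "staged": run("diff", "--cached", "--name-only"),
        "branch": run("diff", "--name-only", f"{base}...HEAD"),
    }
```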

3. Scrum Test

Command:

review-harness scrum /path/to/repo

Should produce:

reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json

4. Rules Audit

Command:

review-harness rules /path/to/repo

Reads:

.review-rules.md
.review-profile.yaml

Then checks whether the repository violates local project rules.

5. Local Model Probe

Command:

review-harness model doctor

Should test:

  • Ollama availability
  • configured model exists
  • context limit estimate
  • small prompt response
  • JSON-mode reliability if available
  • timeout behavior
  • fallback model behavior
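
The first two checks above can be probed through Ollama's documented `/api/tags` route, which lists pulled models. A sketch, with the fetcher injectable so the probe is testable offline (`model_doctor` and `fetch` are my naming):

```python
import json
from urllib import request

def model_doctor(base_url, model, fetch=None):
    """Report endpoint reachability and whether `model` is pulled."""
    if fetch is None:
        def fetch(url):
            with request.urlopen(url, timeout=5) as resp:
                return json.load(resp)
    try:
        tags = fetch(f"{base_url}/api/tags")
    except Exception as exc:
        return {"reachable": False, "model_present": False, "error": str(exc)}
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Ollama model names may carry a tag suffix, e.g. "qwen2.5-coder:7b"
    present = any(n == model or n.startswith(model + ":") for n in names)
    return {"reachable": True, "model_present": present, "error": None}
```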

Local Model Requirements

Support a model endpoint abstraction.

Initial provider:

provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1

Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
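
One way to shape that provider interface, sketched as an abstract base class plus an offline test double (method and class names are illustrative, not mandated by the spec):

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Provider abstraction; Ollama is just the first concrete backend."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...
    @abstractmethod
    def health_check(self) -> bool: ...

class CannedProvider(ModelProvider):
    """Offline test double returning a fixed reply."""
    def __init__(self, reply: str):
        self.reply = reply
    def complete(self, prompt: str) -> str:
        return self.reply
    def health_check(self) -> bool:
        return True
```

An OpenAI-compatible endpoint later becomes just another subclass; the pipeline only ever sees `ModelProvider`.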

Review Pipeline

Pipeline should run in phases.

Phase 0: Repo Intake

Collect:

  • repo path
  • git status
  • current branch
  • latest commit
  • language breakdown
  • file count
  • largest files
  • dependency manifests
  • test manifests

Output:

repo_intake.json

Phase 1: Static Scan

Detect:

  • hardcoded absolute paths
  • raw SQL interpolation
  • shell command execution
  • unsafe environment handling
  • broad CORS
  • exposed mutation endpoints
  • suspicious secret patterns
  • unchecked file reads and writes
  • missing error handling
  • excessive file size
  • missing tests near critical code

Output:

static_findings.json

Phase 2: LLM Review

Send bounded chunks to the local model.

The model must return strict JSON:

{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}

If model output is invalid JSON, retry once with a repair prompt.

If the output is still invalid, save the raw output and mark the model phase as degraded.
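
The schema gate and the single repair retry can be sketched as follows; `ask` stands in for whatever provider call the harness uses (function names are mine):

```python
import json

REQUIRED = {"title", "severity", "file", "line_hint", "evidence",
            "reason", "suggested_fix", "confidence"}
SEVERITIES = {"low", "medium", "high", "critical"}

def parse_findings(raw):
    """Return the findings list if the payload matches the schema, else None."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        return None
    for f in findings:
        if not isinstance(f, dict) or not REQUIRED <= f.keys():
            return None
        if f["severity"] not in SEVERITIES:
            return None
    return findings

def review_once_with_repair(ask):
    """Call the model; on invalid JSON, retry exactly once with a repair prompt."""
    parsed = parse_findings(ask("review"))
    if parsed is None:
        parsed = parse_findings(ask("repair"))
    return parsed  # None => save raw output, mark the model phase degraded
```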

Phase 3: Validation

Every LLM finding must be validated against actual files.

Reject findings that:

  • point to missing files
  • cite text that does not exist
  • make unsupported claims
  • recommend unrelated rewrites
  • lack evidence

Output:

validated_findings.json
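
The first two rejection rules are mechanical: the cited file must exist and the evidence snippet must literally appear in it. A minimal sketch (the function name and return strings are my own):

```python
import os

def validate_finding(repo, finding):
    """Cross-check one LLM finding against the real file system."""
    path = os.path.join(repo, finding["file"])
    if not os.path.isfile(path):
        return "rejected: file does not exist"
    if not finding.get("evidence"):
        return "rejected: no evidence supplied"
    with open(path, encoding="utf-8", errors="replace") as fh:
        if finding["evidence"] not in fh.read():
            return "rejected: cited text not found in file"
    return "confirmed"
```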

Phase 4: Report Generation

Generate Markdown reports:

  • executive summary
  • risk register
  • sprint backlog
  • acceptance gates
  • test gaps
  • architecture drift
  • suggested next commands

Phase 5: Memory

Create local memory files:

.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json

Memory should be append-only by default.

Never silently overwrite prior memory. Version it.
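
One way to honor append-only-with-versioning for the JSON memory files: snapshot the prior file before every write. A sketch under my own naming and timestamp scheme:

```python
import json, os, time

def append_memory(memory_dir, name, entry):
    """Append an entry; snapshot the prior file instead of overwriting it."""
    os.makedirs(memory_dir, exist_ok=True)
    path = os.path.join(memory_dir, name)
    history = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            history = json.load(fh)
        stamp = time.strftime("%Y%m%dT%H%M%S")
        with open(f"{path}.{stamp}.bak", "w", encoding="utf-8") as fh:
            json.dump(history, fh)  # prior state preserved, never discarded
    history.append(entry)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(history, fh, indent=2)
    return history
```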

Validation Rules

Hard rules:

  1. No hallucinated files.
  2. No invented tests.
  3. No fake command success.
  4. No "appears to work" language without evidence.
  5. Every finding must include:
    • file path
    • evidence snippet
    • risk
    • suggested next action
  6. Reports must distinguish:
    • confirmed issue
    • suspected issue
    • missing evidence
    • blocked by unavailable dependency

First Implementation Target

Do not build everything at once.

Implement MVP:

Phase 0 repo intake
Phase 1 static scan
Phase 4 report generation
Basic Ollama model doctor

Then add LLM review after the static evidence pipeline is stable.

MVP Acceptance Criteria

The MVP passes when:

review-harness repo .
review-harness scrum .
review-harness model doctor

produce usable output without crashing.

Required files:

reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json

Suggested Static Checks For MVP

Implement these first:

  • hardcoded /home/
  • hardcoded /root/
  • hardcoded local IP addresses
  • exec(
  • spawn(
  • Command::new
  • raw SQL patterns:
    • format!("SELECT
    • string interpolation near SQL keywords
    • template literals containing SELECT
    • template literals containing INSERT
    • template literals containing UPDATE
    • template literals containing DELETE
  • Access-Control-Allow-Origin: *
  • committed .env files
  • private key patterns
  • files over 800 lines
  • TODO, FIXME, and HACK count
  • missing test directory
  • package or build files without a corresponding test command

Output Style

Reports should be blunt and operational.

No motivational filler.

Use sections:

Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands

Final Deliverable

After implementation, produce:

docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*

Then run the harness against this repository itself and include the self-review report.

Do Not

  • Do not require GitHub.
  • Do not require cloud LLMs.
  • Do not pretend local model output is authoritative.
  • Do not rewrite the target repository.
  • Do not make destructive changes.
  • Do not auto-commit.
  • Do not hide degraded model failures.

Strategic Goal

This should become the local review node for a larger autonomous development system.

Eventually it should plug into:

  • OpenClaw
  • MCP tools
  • local lakehouse memory
  • playbook sealing
  • CI verification
  • observer review loop

But first: make the local review harness reliable, inspectable, and evidence-driven.