# Claude Code Prompt: Build Local AI Code Review Harness

## Mission

Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and a validation-first workflow.

This is not a SaaS PR bot. This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint.

## Core Principle

AI may suggest. Code validates. Reports must show evidence. Nothing is trusted because a model said it.

## Target Use Case

Given a repository path, the system should run a review pipeline that produces:

- architecture overview
- code health report
- security and trust-boundary report
- test coverage gap report
- refactor recommendations
- Scrum sprint backlog
- acceptance gates
- machine-readable JSON receipts

## Inspired Features To Extract

### From PR-Agent

Implement:

- PR and diff-style review mode
- summary of changed files
- risk-ranked findings
- suggested review comments
- checklist output
- confidence score per finding

Do not copy implementation. Recreate the concept locally.

### From Gito

Implement:

- local model compatibility
- full-repo review mode
- model-provider abstraction
- ability to run without GitHub or SaaS
- config-driven review profiles

### From OpenReview

Implement:

- webhook-ready design (later)
- clean separation between:
  - repo scanner
  - diff analyzer
  - LLM reviewer
  - report generator
  - validation layer

For now, local CLI first.

### From Kodus

Implement:

- plain-language project rules
- repo-specific review policy file
- ability to enforce local conventions
- persistent team memory rules

Example file:

```text
.review-rules.md
```

### From Sourcery

Implement:

- low-level refactor suggestions
- duplicated logic detection
- complexity hotspots
- dead code suspicion
- long-file warnings
- unsafe error handling warnings

## Architecture

Create a modular system with this shape:

```text
local-review-harness/
  configs/
    review-profile.example.yaml
    model-profile.example.yaml
  docs/
    REVIEW_PIPELINE.md
    LOCAL_MODEL_SETUP.md
    REPORT_SCHEMA.md
  src/
    cli/
    scanner/
    git/
    analyzers/
    llm/
    validators/
    reporters/
    memory/
  reports/
    latest/
  tests/
    fixtures/
```

## Required Modes

### 1. Full Repo Review

Command:

```bash
review-harness repo /path/to/repo
```

Should inspect:

- file tree
- language mix
- build files
- test files
- scripts
- docs
- dependency manifests
- large files
- suspicious hardcoded paths
- TODO, FIXME, and security comments

### 2. Diff Review

Command:

```bash
review-harness diff /path/to/repo
```

Should inspect:

- unstaged changes
- staged changes
- branch diff against main or master
- changed functions where possible
- risk introduced by the change

### 3. Scrum Test

Command:

```bash
review-harness scrum /path/to/repo
```

Should produce:

```text
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
reports/latest/receipts.json
```

### 4. Rules Audit

Command:

```bash
review-harness rules /path/to/repo
```

Reads:

```text
.review-rules.md
.review-profile.yaml
```

Then checks whether the repository violates local project rules.

### 5. Local Model Probe

Command:

```bash
review-harness model doctor
```

Should test:

- Ollama availability
- configured model exists
- context limit estimate
- small prompt response
- JSON-mode reliability if available
- timeout behavior
- fallback model behavior

## Local Model Requirements

Support a model endpoint abstraction. Initial provider:

```yaml
provider: ollama
base_url: http://localhost:11434
model: qwen2.5-coder
fallback_model: llama3.1
timeout_seconds: 120
temperature: 0.1
```

Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later.
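As a sketch of that seam (Python chosen purely for illustration; `ModelProvider`, `OllamaProvider`, and `complete` are assumed names, not part of this spec), the interface can stay small enough that a fallback or OpenAI-compatible provider is a drop-in:

```python
# Minimal provider seam, assuming Ollama's documented /api/generate endpoint.
# All class and method names here are illustrative, not prescribed by the spec.
import json
import urllib.request
from dataclasses import dataclass
from typing import Protocol


class ModelProvider(Protocol):
    def complete(self, prompt: str) -> str:
        """Return the raw model completion for a prompt."""
        ...


@dataclass
class OllamaProvider:
    base_url: str = "http://localhost:11434"
    model: str = "qwen2.5-coder"
    timeout_seconds: int = 120
    temperature: float = 0.1

    def complete(self, prompt: str) -> str:
        # stream=False makes Ollama return one JSON object with a "response" key.
        payload = json.dumps({
            "model": self.model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": self.temperature},
        }).encode()
        req = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=self.timeout_seconds) as resp:
            return json.loads(resp.read().decode())["response"]
```

Adding a fallback model or an OpenAI-compatible local endpoint then means writing another class that satisfies `ModelProvider`; the pipeline never imports Ollama directly.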
## Review Pipeline

The pipeline should run in phases.

### Phase 0: Repo Intake

Collect:

- repo path
- git status
- current branch
- latest commit
- language breakdown
- file count
- largest files
- dependency manifests
- test manifests

Output:

```text
repo-intake.json
```

### Phase 1: Static Scan

Detect:

- hardcoded absolute paths
- raw SQL interpolation
- shell command execution
- unsafe environment handling
- broad CORS
- exposed mutation endpoints
- suspicious secret patterns
- unchecked file reads and writes
- missing error handling
- excessive file size
- missing tests near critical code

Output:

```text
static-findings.json
```

### Phase 2: LLM Review

Send bounded chunks to the local model. The model must return strict JSON:

```json
{
  "findings": [
    {
      "title": "",
      "severity": "low|medium|high|critical",
      "file": "",
      "line_hint": "",
      "evidence": "",
      "reason": "",
      "suggested_fix": "",
      "confidence": 0.0
    }
  ]
}
```

If the model output is invalid JSON, retry once with a repair prompt. If the output is still invalid, save the raw output and mark the model phase degraded.

### Phase 3: Validation

Every LLM finding must be validated against actual files. Reject findings that:

- point to missing files
- cite text that does not exist
- make unsupported claims
- recommend unrelated rewrites
- lack evidence

Output:

```text
validated-findings.json
```

### Phase 4: Report Generation

Generate Markdown reports:

- executive summary
- risk register
- sprint backlog
- acceptance gates
- test gaps
- architecture drift
- suggested next commands

### Phase 5: Memory

Create local memory files:

```text
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
```

Memory should be append-only by default. Never silently overwrite prior memory. Version it.

## Validation Rules

Hard rules:

1. No hallucinated files.
2. No invented tests.
3. No fake command success.
4. No "appears to work" language without evidence.
5. Every finding must include:
   - file path
   - evidence snippet
   - risk
   - suggested next action
6. Reports must distinguish:
   - confirmed issue
   - suspected issue
   - missing evidence
   - blocked by unavailable dependency

## First Implementation Target

Do not build everything at once. Implement the MVP:

```text
Phase 0 repo intake
Phase 1 static scan
Phase 4 report generation
Basic Ollama model doctor
```

Then add LLM review after the static evidence pipeline is stable.

## MVP Acceptance Criteria

The MVP passes when:

```bash
review-harness repo .
review-harness scrum .
review-harness model doctor
```

produce usable output without crashing.

Required files:

```text
reports/latest/repo-intake.json
reports/latest/static-findings.json
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/sprint-backlog.md
reports/latest/receipts.json
```

## Suggested Static Checks For MVP

Implement these first (a sketch of one pattern check follows this list):

- hardcoded `/home/`
- hardcoded `/root/`
- hardcoded local IP addresses
- `exec(`
- `spawn(`
- `Command::new`
- raw SQL patterns:
  - `format!("SELECT`
  - string interpolation near SQL keywords
  - template literals containing `SELECT`
  - template literals containing `INSERT`
  - template literals containing `UPDATE`
  - template literals containing `DELETE`
- `Access-Control-Allow-Origin: *`
- committed `.env` files
- private key patterns
- files over 800 lines
- TODO, FIXME, and HACK count
- missing test directory
- package or build files without a corresponding test command
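The sketch below shows one way a pattern check could emit evidence-bearing findings into `static-findings.json`. It is a rough illustration under assumed names (`PATTERNS`, `scan_file`, `scan_repo` are not prescribed by this spec), covering only a few of the checks above:

```python
# Illustrative pattern-based static scan; real checks live in src/analyzers/.
import json
import re
from pathlib import Path

# pattern id -> (regex, severity); a small subset of the MVP checks above
PATTERNS = {
    "hardcoded-home-path": (re.compile(r"/home/\w+"), "medium"),
    "sql-in-format-macro": (re.compile(r'format!\("SELECT'), "high"),
    "wildcard-cors": (re.compile(r"Access-Control-Allow-Origin:\s*\*"), "high"),
}


def scan_file(path: Path) -> list[dict]:
    findings = []
    try:
        text = path.read_text(errors="replace")
    except OSError:
        return findings  # unreadable file; a real scanner would log this as blocked
    for lineno, line in enumerate(text.splitlines(), start=1):
        for check_id, (regex, severity) in PATTERNS.items():
            if regex.search(line):
                findings.append({
                    "check": check_id,
                    "severity": severity,
                    "file": str(path),
                    "line": lineno,
                    "evidence": line.strip()[:200],  # evidence snippet, per rule 5
                })
    return findings


def scan_repo(root: Path) -> None:
    findings = []
    for path in root.rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue  # a real scanner would also skip binaries and vendored code
        findings.extend(scan_file(path))
    out = Path("reports/latest/static-findings.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps({"findings": findings}, indent=2))
```

Each finding carries a file path and an evidence snippet so later phases can re-verify it against the actual file, in line with the validation rules above.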
## Output Style

Reports should be blunt and operational. No motivational filler. Use sections:

```text
Verdict
Evidence
Confirmed Risks
Suspected Risks
Blocked Checks
Sprint Backlog
Acceptance Gates
Next Commands
```

## Final Deliverable

After implementation, produce:

```text
docs/REVIEW_PIPELINE.md
docs/LOCAL_MODEL_SETUP.md
docs/REPORT_SCHEMA.md
reports/latest/*
```

Then run the harness against this repository itself and include the self-review report.

## Do Not

- Do not require GitHub.
- Do not require cloud LLMs.
- Do not pretend local model output is authoritative.
- Do not rewrite the target repository.
- Do not make destructive changes.
- Do not auto-commit.
- Do not hide degraded model failures.

## Strategic Goal

This should become the local review node for a larger autonomous development system. Eventually it should plug into:

- OpenClaw
- MCP tools
- local lakehouse memory
- playbook sealing
- CI verification
- observer review loop

But first: make the local review harness reliable, inspectable, and evidence-driven.