# Review Pipeline Specification

## Purpose

This document defines the local review harness pipeline.

The pipeline exists to inspect a repository, collect evidence, identify risks, validate model claims, and generate operational reports without relying on cloud services.

## Pipeline Overview

```text
Repo Intake -> Static Scan -> Optional LLM Review -> Validation -> Report Generation -> Memory Update
```

## Phase 0: Repo Intake

### Goal

Build a factual profile of the target repository.

### Inputs

- repository path
- git metadata
- filesystem metadata
- dependency manifests
- build files
- test files

### Required Output

```text
reports/latest/repo-intake.json
```

### Required Fields

```json
{
  "repo_path": "",
  "current_branch": "",
  "latest_commit": "",
  "git_status": "",
  "file_count": 0,
  "language_breakdown": {},
  "largest_files": [],
  "dependency_manifests": [],
  "test_manifests": [],
  "generated_at": ""
}
```

## Phase 1: Static Scan

### Goal

Find evidence-backed problems without using an LLM.

### Detection Targets

- hardcoded absolute paths
- unsafe shell execution
- raw SQL interpolation
- exposed mutation endpoints
- broad CORS
- unchecked file reads and writes
- suspicious secret patterns
- large files
- TODO, FIXME, HACK comments
- missing tests near critical modules

### Required Output

```text
reports/latest/static-findings.json
```

## Phase 2: LLM Review

### Goal

Use a local model to perform higher-level reasoning over bounded evidence chunks.

### Rules

- Do not send the entire repository blindly.
- Chunk inputs by file, function, or diff boundary.
- Require strict JSON output.
- Retry invalid JSON once.
- Save degraded output if parsing fails.
- Never trust model claims without validation.

### Required Output

```text
reports/latest/llm-findings.raw.json
reports/latest/llm-findings.normalized.json
```

## Phase 3: Validation

### Goal

Validate every LLM-generated finding against real repository evidence.

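Checking a finding against repository evidence can be largely mechanical for the simplest cases. The following is a minimal sketch, not the harness's actual implementation: the function name `validate_finding` and the finding fields `file`, `evidence`, and `line` are assumptions made for illustration, since this specification does not define a finding schema. It covers only the file, evidence, and line-hint checks; judging whether a claim is supported or a fix targets related code needs more than string matching.

```python
from pathlib import Path


def validate_finding(finding: dict, repo_root: Path) -> list[str]:
    """Return rejection reasons for one finding; an empty list means these checks passed.

    The keys "file", "evidence", and "line" are hypothetical field names.
    """
    root = repo_root.resolve()
    target = (root / str(finding.get("file", ""))).resolve()

    # Reject paths that escape the repository or do not point at a real file.
    if not target.is_relative_to(root) or not target.is_file():
        return ["file does not exist"]

    text = target.read_text(encoding="utf-8", errors="replace")
    line_count = len(text.splitlines())
    reasons = []

    # The cited evidence must appear verbatim in the file.
    evidence = str(finding.get("evidence", "")).strip()
    if not evidence or evidence not in text:
        reasons.append("cited evidence does not exist")

    # A line hint, when present, must fall inside the file.
    line_hint = finding.get("line")
    if line_hint is not None:
        if not isinstance(line_hint, int) or not 1 <= line_hint <= line_count:
            reasons.append("line hint is impossible")

    return reasons
```

Returning a list of reasons rather than a boolean keeps the rejection explainable, so the reasons could be written alongside the finding in `reports/latest/validated-findings.json`.
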
### Reject A Finding If

- the file does not exist
- the cited evidence does not exist
- the line hint is impossible
- the claim is unsupported
- the suggested fix targets unrelated code
- the model invents tests, commands, or files

### Required Output

```text
reports/latest/validated-findings.json
```

## Phase 4: Report Generation

### Goal

Produce human-readable and machine-readable reports.

### Required Markdown Reports

```text
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
```

### Required JSON Receipt

```text
reports/latest/receipts.json
```

## Phase 5: Memory

### Goal

Persist durable review knowledge for future runs.

### Required Memory Files

```text
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
```

### Memory Rules

- append-only by default
- version every update
- never silently overwrite
- record source run ID
- record evidence file
- record confidence level

## Degraded Mode

A phase is degraded when it cannot fully run but the pipeline can continue.

Examples:

- Ollama unavailable
- model returns invalid JSON
- repository has no git metadata
- dependency manager unavailable
- large dataset missing

Degraded mode must be explicit in reports.

No silent success.
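
One way to keep degraded mode explicit is for each phase to carry a status and a list of reasons that the report generator can print. The sketch below is only an illustration under stated assumptions: `PhaseResult`, its fields, the `degrade` method, and `parse_model_output` are hypothetical names, not part of this specification.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PhaseResult:
    """Outcome of one pipeline phase (hypothetical structure)."""
    phase: str
    status: str = "ok"  # either "ok" or "degraded"
    degraded_reasons: list = field(default_factory=list)

    def degrade(self, reason: str) -> None:
        # Record why the phase degraded so a report can state it explicitly.
        self.status = "degraded"
        self.degraded_reasons.append(reason)


def parse_model_output(raw: str, result: PhaseResult):
    """Parse model output as JSON; on failure mark the phase degraded and return None."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        result.degrade("model returned invalid JSON")
        return None


result = PhaseResult(phase="llm-review")
findings = parse_model_output("not json", result)

# The serialized result carries the status and reasons into any report or receipt.
print(json.dumps(asdict(result), indent=2))
```

Because the status is set only through `degrade`, a phase cannot end up degraded without at least one recorded reason, which is the property "no silent success" asks for.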