local-review-harness/docs/REVIEW_PIPELINE.md

# Review Pipeline Specification

## Purpose

This document defines the local review harness pipeline.

The pipeline exists to inspect a repository, collect evidence, identify risks, validate model claims, and generate operational reports without relying on cloud services.

## Pipeline Overview

```text
Repo Intake
  -> Static Scan
  -> Optional LLM Review
  -> Validation
  -> Report Generation
  -> Memory Update
```

## Phase 0: Repo Intake

### Goal

Build a factual profile of the target repository.

### Inputs

- repository path
- git metadata
- filesystem metadata
- dependency manifests
- build files
- test files

### Required Output

```text
reports/latest/repo-intake.json
```

### Required Fields

```json
{
  "repo_path": "",
  "current_branch": "",
  "latest_commit": "",
  "git_status": "",
  "file_count": 0,
  "language_breakdown": {},
  "largest_files": [],
  "dependency_manifests": [],
  "test_manifests": [],
  "generated_at": ""
}
```
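
The fields above could be gathered roughly as in the sketch below. It is an illustration under stated assumptions, not the harness's actual implementation: the function names `run_git` and `build_intake`, the extension-based language breakdown, and the hardcoded manifest and test-config file names are all hypothetical choices made for this example.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Assumption: these file names are illustrative, not an exhaustive or official list.
DEPENDENCY_MANIFESTS = {"package.json", "pyproject.toml", "requirements.txt", "Cargo.toml", "go.mod"}
TEST_MANIFESTS = {"pytest.ini", "tox.ini", "jest.config.js"}


def run_git(repo: Path, *args: str) -> str:
    """Return git output, or an empty string if git is missing or the command fails."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo), *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""


def build_intake(repo_path: str, top_n: int = 5) -> dict:
    """Build a factual profile of the repository using only local information."""
    repo = Path(repo_path).resolve()
    files = [
        p for p in repo.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(repo).parts
    ]
    # Assumption: language is approximated by file extension.
    languages = Counter(p.suffix or "(none)" for p in files)
    largest = sorted(files, key=lambda p: p.stat().st_size, reverse=True)[:top_n]

    def rel(p: Path) -> str:
        return p.relative_to(repo).as_posix()

    return {
        "repo_path": str(repo),
        "current_branch": run_git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "latest_commit": run_git(repo, "rev-parse", "HEAD"),
        "git_status": run_git(repo, "status", "--porcelain"),
        "file_count": len(files),
        "language_breakdown": dict(languages),
        "largest_files": [rel(p) for p in largest],
        "dependency_manifests": sorted(rel(p) for p in files if p.name in DEPENDENCY_MANIFESTS),
        "test_manifests": sorted(rel(p) for p in files if p.name in TEST_MANIFESTS),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
```

When git is unavailable, the git-derived fields stay empty instead of raising, which leaves room for the caller to mark the phase as degraded.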

## Phase 1: Static Scan

### Goal

Find evidence-backed problems without using an LLM.

### Detection Targets

- hardcoded absolute paths
- unsafe shell execution
- raw SQL interpolation
- exposed mutation endpoints
- broad CORS
- unchecked file reads and writes
- suspicious secret patterns
- large files
- TODO, FIXME, HACK comments
- missing tests near critical modules
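
A few of the targets above lend themselves to simple line-based pattern matching. The sketch below covers three of them. The rule names, the regular expressions, and the finding shape (`rule`, `file`, `line`, `evidence`) are assumptions made for illustration; real detectors would likely need language-aware parsing to keep false positives down.

```python
import re

# Hypothetical rule set: deliberately small and prone to false positives.
PATTERNS = {
    "todo_comment": re.compile(r"\b(TODO|FIXME|HACK)\b"),
    "hardcoded_absolute_path": re.compile(r"(/home/|/Users/|\b[A-Za-z]:\\)"),
    "unsafe_shell_execution": re.compile(r"\bos\.system\(|shell\s*=\s*True"),
}


def scan_text(path: str, text: str) -> list[dict]:
    """Return one finding per (line, rule) match, with the matching line as evidence."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append({
                    "rule": rule,
                    "file": path,
                    "line": line_no,
                    "evidence": line.strip(),
                })
    return findings
```

Each finding carries the exact line and its text, so a later phase can re-check it against the repository without trusting the scanner.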

### Required Output

```text
reports/latest/static-findings.json
```

## Phase 2: Optional LLM Review

### Goal

Use a local model to perform higher-level reasoning over bounded evidence chunks.

### Rules

- Do not send the entire repository blindly.
- Chunk inputs by file, function, or diff boundary.
- Require strict JSON output.
- Retry invalid JSON once.
- If the retry also fails to parse, save the raw output and mark the phase as degraded.
- Never trust model claims without validation.
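
The retry and degradation rules above can be sketched as follows. `call_model` is a hypothetical callable standing in for whatever local model client the harness uses; it is not a real library API. The assumption that valid output is a JSON array of findings is also made only for this example.

```python
import json
from typing import Callable


def review_chunk(chunk: str, call_model: Callable[[str], str], max_attempts: int = 2) -> dict:
    """Ask the model for JSON findings; retry invalid output once, then degrade."""
    raw_outputs = []
    for _ in range(max_attempts):
        raw = call_model(chunk)
        raw_outputs.append(raw)  # keep every raw reply for the raw findings file
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON: fall through to the retry
        if isinstance(parsed, list):
            return {"status": "ok", "findings": parsed, "raw": raw_outputs}
    # Both attempts failed: report degradation explicitly instead of raising.
    return {"status": "degraded", "findings": [], "raw": raw_outputs}
```

Returning the raw replies alongside the parsed findings keeps the unparsed output available for the raw findings file even when normalization fails.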

### Required Output

```text
reports/latest/llm-findings.raw.json
reports/latest/llm-findings.normalized.json
```

## Phase 3: Validation

### Goal

Validate every LLM-generated finding against real repository evidence.

### Reject A Finding If

- the file does not exist
- the cited evidence does not exist
- the line hint is impossible (for example, past the end of the file)
- the claim is not supported by the cited evidence
- the suggested fix targets unrelated code
- the model invents tests, commands, or files
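
The first three rejection rules can be checked mechanically. The sketch below assumes a finding is a dict with `file`, `line`, and `evidence` keys, which is a hypothetical shape chosen for this example. The remaining rules (unsupported claims, unrelated fixes, invented tests or commands) need more context and are not covered here.

```python
from pathlib import Path


def validate_finding(finding: dict, repo_root: Path) -> tuple[bool, str]:
    """Check a finding against the real repository; return (accepted, reason)."""
    root = repo_root.resolve()
    target = (root / finding.get("file", "")).resolve()
    # Reject paths that escape the repository as well as paths that do not exist.
    if not target.is_relative_to(root) or not target.is_file():
        return False, "file does not exist"

    text = target.read_text(encoding="utf-8", errors="replace")
    line_count = len(text.splitlines())
    line = finding.get("line")
    if line is not None and not (isinstance(line, int) and 1 <= line <= line_count):
        return False, "line hint is impossible"

    evidence = finding.get("evidence", "").strip()
    if not evidence or evidence not in text:
        return False, "cited evidence does not exist"
    return True, "accepted"
```

The reason string is returned with the verdict so that rejected findings can be listed in the reports rather than dropped silently. `Path.is_relative_to` requires Python 3.9 or later.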

### Required Output

```text
reports/latest/validated-findings.json
```

## Phase 4: Report Generation

### Goal

Produce human-readable and machine-readable reports.

### Required Markdown Reports

```text
reports/latest/scrum-test.md
reports/latest/risk-register.md
reports/latest/claim-coverage-table.md
reports/latest/sprint-backlog.md
reports/latest/acceptance-gates.md
```

### Required JSON Receipt

```text
reports/latest/receipts.json
```

## Phase 5: Memory Update

### Goal

Persist durable review knowledge for future runs.

### Required Memory Files

```text
.memory/review-rules.md
.memory/known-risks.json
.memory/fixed-patterns.json
.memory/project-profile.json
```

### Memory Rules

- append-only by default
- version every update
- never silently overwrite
- record source run ID
- record evidence file
- record confidence level
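
One way to satisfy these rules for the JSON memory files is to store each file as a growing list of versioned records. The sketch below assumes that layout; the function name `append_memory` and the record fields are hypothetical, and the spec does not prescribe this format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_memory(path: Path, entry: dict, run_id: str, evidence_file: str, confidence: str) -> dict:
    """Append a versioned record to a JSON memory file without altering earlier records."""
    history = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    record = {
        "version": len(history) + 1,  # monotonically increasing per file
        "run_id": run_id,
        "evidence_file": evidence_file,
        "confidence": confidence,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "entry": entry,
    }
    history.append(record)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a crash cannot leave a half-written memory file.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(history, indent=2), encoding="utf-8")
    tmp.replace(path)
    return record
```

Earlier records are rewritten byte-for-byte from the parsed history and never edited, so a correction to an old risk shows up as a new version rather than a silent overwrite.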

## Degraded Mode

A phase is degraded when it cannot fully run but the pipeline can continue.

Examples:

- Ollama unavailable
- model returns invalid JSON
- repository has no git metadata
- dependency manager unavailable
- large dataset missing

Degraded mode must be explicit in reports.

No silent success.
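
A minimal sketch of how a run might track and surface degraded phases is shown below. The `PhaseStatus` class, the `render_status_section` function, and the report section title are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseStatus:
    phase: str
    status: str = "ok"  # "ok" or "degraded"
    reasons: list[str] = field(default_factory=list)

    def degrade(self, reason: str) -> None:
        """Mark the phase as degraded and record why."""
        self.status = "degraded"
        self.reasons.append(reason)


def render_status_section(statuses: list[PhaseStatus]) -> str:
    """Render a Markdown section that states each phase's status explicitly."""
    lines = ["## Phase Status", ""]
    for s in statuses:
        detail = f" ({'; '.join(s.reasons)})" if s.reasons else ""
        lines.append(f"- {s.phase}: {s.status.upper()}{detail}")
    return "\n".join(lines)
```

A usage example under the same assumptions:

```python
llm = PhaseStatus("LLM Review")
llm.degrade("Ollama unavailable")
print(render_status_section([PhaseStatus("Static Scan"), llm]))
```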