commit f3ee4722a887c4086cee06fe6f4c4a7fde373874
Author: Claude (review-harness setup)
Date:   Thu Apr 30 00:56:02 2026 -0500

    Phase A + B (MVP) — local review harness

    Implements the MVP cutline from the planning artifact:
    - Phase A: skeleton + CLI dispatch + provider interface + stub model doctor
    - Phase B: scanner + git probe + 12 static analyzers + reporters + pipeline
    - Phase B fixtures: clean-repo, insecure-repo, degraded-repo

    12 static analyzers per PROMPT.md "Suggested Static Checks For MVP":
    hardcoded_paths, shell_execution, raw_sql_interpolation, broad_cors,
    secret_patterns, large_files, todo_comments, missing_tests,
    env_file_committed, unsafe_file_io, exposed_mutation_endpoint,
    hardcoded_local_ip.

    Acceptance gates passing:
    - B1 (intake produces accurate counts) ✓
    - B2 (insecure fixture fires ≥8 distinct check_ids — actually 11/12) ✓
    - B3 (clean fixture produces 0 confirmed findings — no false positives) ✓
    - B4 (scrum mode produces all 6 required markdown + JSON reports) ✓
    - B5 (receipts.json marks degraded phases honestly) ✓
    - F (self-review on this repo runs without crashing) ✓ — exit 66
      (degraded because Phase C LLM review is hard-coded as skipped)

    Phases C (LLM review), D (validation cross-check), E (memory + diff +
    rules subcommands) deferred per the cutline. The MVP delivers the
    evidence-first path; LLM is purely additive.
Co-Authored-By: Claude Opus 4.7 (1M context) diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..f3f011f --- /dev/null +++ b/.gitignore @@ -0,0 +1,20 @@ +# Build output +/review-harness +/bin/ + +# Runtime artifacts (PROMPT.md: reports go here per-run; gitignored) +/reports/latest/ +/reports/run-*/ + +# Memory persistence (lives next to target repos, not this one) +/.memory/ + +# Go +*.test +*.out + +# Editor +.DS_Store +.idea/ +.vscode/ +*.swp diff --git a/FIRST_COMMAND_FOR_CLAUDE_CODE.md b/FIRST_COMMAND_FOR_CLAUDE_CODE.md new file mode 100755 index 0000000..d13c5d4 --- /dev/null +++ b/FIRST_COMMAND_FOR_CLAUDE_CODE.md @@ -0,0 +1,26 @@ +# First Command For Claude Code + +Read all Markdown files in this directory. + +Do not write code yet. + +Produce only the following: + +1. Execution plan +2. Module breakdown +3. Proposed file structure +4. Risk register before implementation +5. MVP cutline +6. Acceptance gates + +Wait for confirmation before coding. + +Hard rules: + +- Do not create implementation files yet. +- Do not install dependencies yet. +- Do not auto-commit. +- Do not rewrite the prompt. +- Do not collapse phases. +- Do not add cloud dependencies. +- Do not claim the system works until commands prove it. diff --git a/PROMPT.md b/PROMPT.md new file mode 100755 index 0000000..9ff9d7f --- /dev/null +++ b/PROMPT.md @@ -0,0 +1,492 @@ +# Claude Code Prompt: Build Local AI Code Review Harness + +## Mission + +Create a local-first autonomous code review harness inspired by PR-Agent, Gito, OpenReview, Kodus, and Sourcery, but built around our own tools, local models, and validation-first workflow. + +This is not a SaaS PR bot. + +This is a local DevOps review system that can inspect a repository, summarize risk, identify architectural drift, detect unsafe code patterns, produce Scrum-style backlog reports, and optionally route review tasks through local LLMs using Ollama or another local model endpoint. + +## Core Principle + +AI may suggest. 
+ +Code validates. + +Reports must show evidence. + +Nothing is trusted because a model said it. + +## Target Use Case + +Given a repository path, the system should run a review pipeline that produces: + +- architecture overview +- code health report +- security and trust-boundary report +- test coverage gap report +- refactor recommendations +- Scrum sprint backlog +- acceptance gates +- machine-readable JSON receipts + +## Inspired Features To Extract + +### From PR-Agent + +Implement: + +- PR and diff-style review mode +- summary of changed files +- risk-ranked findings +- suggested review comments +- checklist output +- confidence score per finding + +Do not copy implementation. Recreate the concept locally. + +### From Gito + +Implement: + +- local model compatibility +- full-repo review mode +- model-provider abstraction +- ability to run without GitHub or SaaS +- config-driven review profiles + +### From OpenReview + +Implement: + +- webhook-ready design later +- clean separation between: + - repo scanner + - diff analyzer + - LLM reviewer + - report generator + - validation layer + +For now, local CLI first. + +### From Kodus + +Implement: + +- plain-language project rules +- repo-specific review policy file +- ability to enforce local conventions +- persistent team memory rules + +Example file: + +```text +.review-rules.md +``` + +### From Sourcery + +Implement: + +- low-level refactor suggestions +- duplicated logic detection +- complexity hotspots +- dead code suspicion +- long-file warnings +- unsafe error handling warnings + +## Architecture + +Create a modular system with this shape: + +```text +local-review-harness/ + configs/ + review-profile.example.yaml + model-profile.example.yaml + docs/ + REVIEW_PIPELINE.md + LOCAL_MODEL_SETUP.md + REPORT_SCHEMA.md + src/ + cli/ + scanner/ + git/ + analyzers/ + llm/ + validators/ + reporters/ + memory/ + reports/ + latest/ + tests/ + fixtures/ +``` + +## Required Modes + +### 1. 
Full Repo Review + +Command: + +```bash +review-harness repo /path/to/repo +``` + +Should inspect: + +- file tree +- language mix +- build files +- test files +- scripts +- docs +- dependency manifests +- large files +- suspicious hardcoded paths +- TODO, FIXME, and security comments + +### 2. Diff Review + +Command: + +```bash +review-harness diff /path/to/repo +``` + +Should inspect: + +- unstaged changes +- staged changes +- branch diff against main or master +- changed functions where possible +- risk introduced by change + +### 3. Scrum Test + +Command: + +```bash +review-harness scrum /path/to/repo +``` + +Should produce: + +```text +reports/latest/scrum-test.md +reports/latest/risk-register.md +reports/latest/claim-coverage-table.md +reports/latest/sprint-backlog.md +reports/latest/acceptance-gates.md +reports/latest/receipts.json +``` + +### 4. Rules Audit + +Command: + +```bash +review-harness rules /path/to/repo +``` + +Reads: + +```text +.review-rules.md +.review-profile.yaml +``` + +Then checks whether the repository violates local project rules. + +### 5. Local Model Probe + +Command: + +```bash +review-harness model doctor +``` + +Should test: + +- Ollama availability +- configured model exists +- context limit estimate +- small prompt response +- JSON-mode reliability if available +- timeout behavior +- fallback model behavior + +## Local Model Requirements + +Support a model endpoint abstraction. + +Initial provider: + +```yaml +provider: ollama +base_url: http://localhost:11434 +model: qwen2.5-coder +fallback_model: llama3.1 +timeout_seconds: 120 +temperature: 0.1 +``` + +Do not hardcode Ollama everywhere. Use a provider interface so OpenAI-compatible local endpoints can be added later. + +## Review Pipeline + +Pipeline should run in phases. 
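A minimal sketch of that phase discipline (illustrative only; the type and phase names here are not prescribed by this prompt): each phase reports ok, degraded, or failed, and the runner records every outcome instead of aborting or faking success.

```go
package main

import "fmt"

// Status mirrors the per-phase statuses the receipts must record:
// a phase can succeed, partially run, or fail outright.
type Status string

const (
	OK       Status = "ok"
	Degraded Status = "degraded"
	Failed   Status = "failed"
)

// Phase is one pipeline stage. The shape is a sketch, not a
// required interface.
type Phase struct {
	Name string
	Run  func() Status
}

// runPipeline executes each phase in order and records its status.
// A degraded phase never silently becomes a success.
func runPipeline(phases []Phase) map[string]Status {
	results := make(map[string]Status, len(phases))
	for _, p := range phases {
		results[p.Name] = p.Run()
	}
	return results
}

func main() {
	results := runPipeline([]Phase{
		{Name: "repo_intake", Run: func() Status { return OK }},
		{Name: "static_scan", Run: func() Status { return OK }},
		{Name: "llm_review", Run: func() Status { return Degraded }}, // e.g. Ollama unreachable
	})
	fmt.Println(results["llm_review"]) // degraded
}
```

The point of the map of statuses is that the receipt writer can serialize it directly: no phase outcome is lost between running a check and reporting it.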
+ +### Phase 0: Repo Intake + +Collect: + +- repo path +- git status +- current branch +- latest commit +- language breakdown +- file count +- largest files +- dependency manifests +- test manifests + +Output: + +```text +repo_intake.json +``` + +### Phase 1: Static Scan + +Detect: + +- hardcoded absolute paths +- raw SQL interpolation +- shell command execution +- unsafe environment handling +- broad CORS +- exposed mutation endpoints +- suspicious secret patterns +- unchecked file reads and writes +- missing error handling +- excessive file size +- missing tests near critical code + +Output: + +```text +static_findings.json +``` + +### Phase 2: LLM Review + +Send bounded chunks to the local model. + +The model must return strict JSON: + +```json +{ + "findings": [ + { + "title": "", + "severity": "low|medium|high|critical", + "file": "", + "line_hint": "", + "evidence": "", + "reason": "", + "suggested_fix": "", + "confidence": 0.0 + } + ] +} +``` + +If model output is invalid JSON, retry once with a repair prompt. + +If the output is still invalid, save raw output and mark the model phase degraded. + +### Phase 3: Validation + +Every LLM finding must be validated against actual files. + +Reject findings that: + +- point to missing files +- cite text that does not exist +- make unsupported claims +- recommend unrelated rewrites +- lack evidence + +Output: + +```text +validated_findings.json +``` + +### Phase 4: Report Generation + +Generate Markdown reports: + +- executive summary +- risk register +- sprint backlog +- acceptance gates +- test gaps +- architecture drift +- suggested next commands + +### Phase 5: Memory + +Create local memory files: + +```text +.memory/review-rules.md +.memory/known-risks.json +.memory/fixed-patterns.json +.memory/project-profile.json +``` + +Memory should be append-only by default. + +Never silently overwrite prior memory. Version it. + +## Validation Rules + +Hard rules: + +1. No hallucinated files. +2. No invented tests. +3. 
No fake command success. +4. No "appears to work" language without evidence. +5. Every finding must include: + - file path + - evidence snippet + - risk + - suggested next action +6. Reports must distinguish: + - confirmed issue + - suspected issue + - missing evidence + - blocked by unavailable dependency + +## First Implementation Target + +Do not build everything at once. + +Implement MVP: + +```text +Phase 0 repo intake +Phase 1 static scan +Phase 4 report generation +Basic Ollama model doctor +``` + +Then add LLM review after the static evidence pipeline is stable. + +## MVP Acceptance Criteria + +The MVP passes when: + +```bash +review-harness repo . +review-harness scrum . +review-harness model doctor +``` + +produce usable output without crashing. + +Required files: + +```text +reports/latest/repo-intake.json +reports/latest/static-findings.json +reports/latest/scrum-test.md +reports/latest/risk-register.md +reports/latest/sprint-backlog.md +reports/latest/receipts.json +``` + +## Suggested Static Checks For MVP + +Implement these first: + +- hardcoded `/home/` +- hardcoded `/root/` +- hardcoded local IP addresses +- `exec(` +- `spawn(` +- `Command::new` +- raw SQL patterns: + - `format!("SELECT` + - string interpolation near SQL keywords + - template literals containing `SELECT` + - template literals containing `INSERT` + - template literals containing `UPDATE` + - template literals containing `DELETE` +- `Access-Control-Allow-Origin: *` +- committed `.env` files +- private key patterns +- files over 800 lines +- TODO, FIXME, and HACK count +- missing test directory +- package or build files without corresponding test command + +## Output Style + +Reports should be blunt and operational. + +No motivational filler. 
+ +Use sections: + +```text +Verdict +Evidence +Confirmed Risks +Suspected Risks +Blocked Checks +Sprint Backlog +Acceptance Gates +Next Commands +``` + +## Final Deliverable + +After implementation, produce: + +```text +docs/REVIEW_PIPELINE.md +docs/LOCAL_MODEL_SETUP.md +docs/REPORT_SCHEMA.md +reports/latest/* +``` + +Then run the harness against this repository itself and include the self-review report. + +## Do Not + +- Do not require GitHub. +- Do not require cloud LLMs. +- Do not pretend local model output is authoritative. +- Do not rewrite the target repository. +- Do not make destructive changes. +- Do not auto-commit. +- Do not hide degraded model failures. + +## Strategic Goal + +This should become the local review node for a larger autonomous development system. + +Eventually it should plug into: + +- OpenClaw +- MCP tools +- local lakehouse memory +- playbook sealing +- CI verification +- observer review loop + +But first: make the local review harness reliable, inspectable, and evidence-driven. diff --git a/README.md b/README.md new file mode 100644 index 0000000..7fcc464 --- /dev/null +++ b/README.md @@ -0,0 +1,86 @@ +# local-review-harness + +Local-first code review harness. Walks a repository, runs evidence-bearing static checks, generates Scrum-style reports. **No cloud dependencies.** LLM review is local-Ollama-only (Phase C, not yet shipped). + +Per `FIRST_COMMAND_FOR_CLAUDE_CODE.md` + `PROMPT.md` — "AI may suggest. Code validates. Reports must show evidence." Findings without grep-able evidence get rejected; the validator phase rejects model claims that cite missing files. 
+ +## Status + +**Phase A + Phase B (MVP) shipped.** What works today: +- `review-harness repo ` — Phase 0 intake + Phase 1 static scan +- `review-harness scrum ` — same pipeline + full Scrum report bundle +- `review-harness model doctor` — stub (real Ollama probe in Phase C) +- 12 static analyzers covering hardcoded paths, shell exec, raw SQL, wildcard CORS, secret patterns, large files, TODO/FIXME, missing tests, committed `.env`, unsafe file I/O, exposed mutation endpoints, hardcoded private-network IPs + +**Phases C–E pending**: real LLM review, validation cross-check, append-only memory, diff/rules subcommands. + +## Build + +Single static binary, no cgo: + +```bash +go build -o review-harness ./cmd/review-harness +``` + +Requires Go 1.22+. + +## Run + +```bash +# Full repo review (Phase 0 + Phase 1 + Phase 4) +./review-harness repo /path/to/target/repo + +# Same + Scrum bundle (scrum-test.md, risk-register.md, sprint-backlog.md, acceptance-gates.md) +./review-harness scrum /path/to/target/repo + +# Model doctor stub +./review-harness model doctor +``` + +Reports land in `/reports/latest/` by default; override with `--output-dir`. + +Optional config files: + +```bash +./review-harness scrum /path --review-profile configs/review-profile.example.yaml \ + --model-profile configs/model-profile.example.yaml +``` + +## Self-review + +The harness reviews itself as a sanity gate (PROMPT.md "Final Deliverable"): + +```bash +./review-harness scrum . +cat reports/latest/scrum-test.md +``` + +The fixture-planted secrets in `tests/fixtures/insecure-repo/` are intentional — they prove the secret-pattern analyzer fires. Operators reviewing the self-report should expect those critical-severity hits and dismiss them as fixture content. 
+ +## Test fixtures + +Three synthetic repos under `tests/fixtures/`: + +| Fixture | Purpose | Expected outcome | +|---|---|---| +| `clean-repo/` | sterile reference | 0 confirmed findings | +| `insecure-repo/` | every static check fires | ≥8 distinct check IDs | +| `degraded-repo/` | no git, no manifests | `repo_intake` phase marked degraded | + +Run them all to validate after a regex change: + +```bash +for f in clean-repo insecure-repo degraded-repo; do + ./review-harness scrum "tests/fixtures/$f" > /dev/null + echo "$f: $(jq '.summary.total' tests/fixtures/$f/reports/latest/static-findings.json) findings" +done +``` + +## Exit codes + +- `0` — clean run, no degraded phases +- `64` — usage error +- `65` — runtime error (config parse fail, target path missing, etc.) +- `66` — degraded mode (one or more phases skipped or stubbed; reports still produced) + +`66` is the expected exit code in MVP because the LLM phase is hardcoded degraded until Phase C lands. diff --git a/cmd/review-harness/main.go b/cmd/review-harness/main.go new file mode 100644 index 0000000..6244ef6 --- /dev/null +++ b/cmd/review-harness/main.go @@ -0,0 +1,90 @@ +// Local review harness — entry point. +// +// Subcommands per PROMPT.md: +// repo /path — full-repo review (Phase B MVP) +// diff /path — diff/PR-style review (Phase E) +// scrum /path — scrum-test report bundle (Phase B MVP) +// rules /path — rules audit (Phase E) +// model doctor — Ollama probe + JSON shape (Phase A stub, Phase C real) +// +// PROMPT.md hard rules apply: no cloud deps, no auto-commit, no +// destructive changes to the target repo, no fake success. +package main + +import ( + "flag" + "fmt" + "os" + + "local-review-harness/internal/cli" +) + +func main() { + if len(os.Args) < 2 { + usage() + os.Exit(2) + } + + // Per-subcommand flag sets. 
Common flags (--review-profile, + // --model-profile, --output-dir) live on each FlagSet rather than + // a global pre-parser; the CLI library would be overkill for 5 + // subcommands. + sub := os.Args[1] + args := os.Args[2:] + + switch sub { + case "repo": + os.Exit(cli.Repo(args)) + case "diff": + fmt.Fprintln(os.Stderr, "diff: not implemented in MVP (Phase E)") + os.Exit(64) + case "scrum": + os.Exit(cli.Scrum(args)) + case "rules": + fmt.Fprintln(os.Stderr, "rules: not implemented in MVP (Phase E)") + os.Exit(64) + case "model": + // Two-token verb: "model doctor" + if len(args) < 1 { + fmt.Fprintln(os.Stderr, "model: missing verb (try: model doctor)") + os.Exit(2) + } + switch args[0] { + case "doctor": + os.Exit(cli.ModelDoctor(args[1:])) + default: + fmt.Fprintf(os.Stderr, "model: unknown verb %q\n", args[0]) + os.Exit(2) + } + case "-h", "--help", "help": + usage() + os.Exit(0) + case "version": + fmt.Println("review-harness 0.1.0 (Phase A skeleton)") + os.Exit(0) + default: + fmt.Fprintf(os.Stderr, "unknown subcommand: %q\n", sub) + usage() + os.Exit(2) + } + _ = flag.CommandLine // keep import stable across phases +} + +func usage() { + fmt.Fprintln(os.Stderr, `review-harness — local-first code review + +Usage: + review-harness repo full-repo review (MVP) + review-harness scrum scrum-test report bundle (MVP) + review-harness model doctor probe Ollama / configured models + review-harness diff diff review (Phase E, not yet) + review-harness rules rules audit (Phase E, not yet) + review-harness version print version + review-harness help this message + +Common flags (per subcommand): + --review-profile YAML; defaults applied if omitted + --model-profile YAML; defaults applied if omitted + --output-dir override review-profile output dir +`) +} diff --git a/configs/model-profile.example.yaml b/configs/model-profile.example.yaml new file mode 100755 index 0000000..bad0ca6 --- /dev/null +++ b/configs/model-profile.example.yaml @@ -0,0 +1,8 @@ +# Example Model 
Profile + +provider: ollama +base_url: http://localhost:11434 +model: qwen2.5-coder +fallback_model: llama3.1 +timeout_seconds: 120 +temperature: 0.1 diff --git a/configs/review-profile.example.yaml b/configs/review-profile.example.yaml new file mode 100755 index 0000000..476252a --- /dev/null +++ b/configs/review-profile.example.yaml @@ -0,0 +1,33 @@ +# Example Review Profile + +project_name: local-review-harness +mode: local-first + +severity_thresholds: + fail_on_critical: true + fail_on_high: false + +static_checks: + hardcoded_paths: true + raw_sql_interpolation: true + shell_execution: true + broad_cors: true + secret_patterns: true + large_files: true + todo_comments: true + missing_tests: true + +limits: + large_file_lines: 800 + max_file_bytes: 1000000 + max_llm_chunk_chars: 12000 + +reports: + output_dir: reports/latest + markdown: true + json_receipts: true + +memory: + enabled: true + path: .memory + append_only: true diff --git a/docs/LOCAL_MODEL_SETUP.md b/docs/LOCAL_MODEL_SETUP.md new file mode 100755 index 0000000..760bd57 --- /dev/null +++ b/docs/LOCAL_MODEL_SETUP.md @@ -0,0 +1,86 @@ +# Local Model Setup + +## Purpose + +The review harness should use local models first. + +The first supported provider is Ollama. + +The design must allow OpenAI-compatible local endpoints later. 
+ +## Default Ollama Profile + +```yaml +provider: ollama +base_url: http://localhost:11434 +model: qwen2.5-coder +fallback_model: llama3.1 +timeout_seconds: 120 +temperature: 0.1 +``` + +## Model Doctor Command + +The harness must provide: + +```bash +review-harness model doctor +``` + +## Doctor Checks + +The doctor command should test: + +- Ollama server availability +- configured model availability +- fallback model availability +- basic prompt response +- JSON response reliability +- timeout behavior +- degraded-mode behavior + +## Required Doctor Output + +```text +reports/latest/model-doctor.json +``` + +## Required JSON Fields + +```json +{ + "provider": "ollama", + "base_url": "http://localhost:11434", + "primary_model": "", + "fallback_model": "", + "server_available": false, + "primary_model_available": false, + "fallback_model_available": false, + "basic_prompt_ok": false, + "json_mode_ok": false, + "timeout_seconds": 120, + "status": "ok|degraded|failed", + "errors": [] +} +``` + +## Provider Interface + +Do not hardcode Ollama into all logic. + +Use a provider interface with these operations: + +```text +list_models() +complete(prompt, options) +complete_json(prompt, schema, options) +health_check() +``` + +## Local Model Rules + +- temperature should default low for review tasks +- prompts should request strict JSON where possible +- raw model output must be saved for failed parse attempts +- invalid model output must never be silently accepted +- fallback model usage must be recorded diff --git a/docs/REPORT_SCHEMA.md b/docs/REPORT_SCHEMA.md new file mode 100755 index 0000000..4668105 --- /dev/null +++ b/docs/REPORT_SCHEMA.md @@ -0,0 +1,150 @@ +# Report Schema + +## Purpose + +This document defines the expected report and receipt schemas for the local review harness. 
+ +## Finding Schema + +```json +{ + "id": "", + "title": "", + "severity": "low|medium|high|critical", + "status": "confirmed|suspected|rejected|blocked", + "file": "", + "line_hint": "", + "evidence": "", + "reason": "", + "suggested_fix": "", + "source": "static|llm|validator", + "confidence": 0.0 +} +``` + +## Severity Rules + +### Critical + +Use for: + +- credential exposure +- destructive command risk +- unauthenticated mutation endpoint +- remote code execution risk +- data corruption risk + +### High + +Use for: + +- SQL injection risk +- broad CORS on sensitive service +- fail-open security behavior +- unsafe filesystem access +- missing validation on critical inputs + +### Medium + +Use for: + +- hardcoded paths +- excessive file size +- weak error handling +- missing tests around important code +- fragile environment assumptions + +### Low + +Use for: + +- minor duplication +- naming confusion +- documentation drift +- small maintainability issues + +## Scrum Test Report Sections + +Every Scrum test report must include: + +```text +Verdict +Evidence +Confirmed Risks +Suspected Risks +Blocked Checks +Sprint Backlog +Acceptance Gates +Next Commands +``` + +## Risk Register Schema + +```json +{ + "risks": [ + { + "id": "", + "title": "", + "severity": "", + "affected_area": "", + "evidence": "", + "impact": "", + "mitigation": "", + "owner": "", + "status": "open|mitigated|accepted|blocked" + } + ] +} +``` + +## Receipt Schema + +```json +{ + "run_id": "", + "repo_path": "", + "started_at": "", + "finished_at": "", + "phases": [ + { + "name": "", + "status": "ok|degraded|failed|skipped", + "input_hash": "", + "output_hash": "", + "output_files": [], + "errors": [] + } + ], + "summary": { + "confirmed_findings": 0, + "suspected_findings": 0, + "blocked_checks": 0, + "critical": 0, + "high": 0, + "medium": 0, + "low": 0 + } +} +``` + +## Claim Coverage Table + +Use this Markdown table: + +```text +| Claim | Code Location | Existing Test | Missing Test | Risk 
| +|---|---|---|---|---| +``` + +## No Fake Evidence Rule + +Reports must not include: + +- invented file paths +- invented command output +- invented tests +- unsupported claims +- false pass/fail statements + +If evidence is missing, say missing evidence. diff --git a/docs/REVIEW_PIPELINE.md b/docs/REVIEW_PIPELINE.md new file mode 100755 index 0000000..b461f50 --- /dev/null +++ b/docs/REVIEW_PIPELINE.md @@ -0,0 +1,186 @@ +# Review Pipeline Specification + +## Purpose + +This document defines the local review harness pipeline. + +The pipeline exists to inspect a repository, collect evidence, identify risks, validate model claims, and generate operational reports without relying on cloud services. + +## Pipeline Overview + +```text +Repo Intake + -> Static Scan + -> Optional LLM Review + -> Validation + -> Report Generation + -> Memory Update +``` + +## Phase 0: Repo Intake + +### Goal + +Build a factual profile of the target repository. + +### Inputs + +- repository path +- git metadata +- filesystem metadata +- dependency manifests +- build files +- test files + +### Required Output + +```text +reports/latest/repo-intake.json +``` + +### Required Fields + +```json +{ + "repo_path": "", + "current_branch": "", + "latest_commit": "", + "git_status": "", + "file_count": 0, + "language_breakdown": {}, + "largest_files": [], + "dependency_manifests": [], + "test_manifests": [], + "generated_at": "" +} +``` + +## Phase 1: Static Scan + +### Goal + +Find evidence-backed problems without using an LLM. 
+ +### Detection Targets + +- hardcoded absolute paths +- unsafe shell execution +- raw SQL interpolation +- exposed mutation endpoints +- broad CORS +- unchecked file reads and writes +- suspicious secret patterns +- large files +- TODO, FIXME, HACK comments +- missing tests near critical modules + +### Required Output + +```text +reports/latest/static-findings.json +``` + +## Phase 2: LLM Review + +### Goal + +Use a local model to perform higher-level reasoning over bounded evidence chunks. + +### Rules + +- Do not send the entire repository blindly. +- Chunk inputs by file, function, or diff boundary. +- Require strict JSON output. +- Retry invalid JSON once. +- Save degraded output if parsing fails. +- Never trust model claims without validation. + +### Required Output + +```text +reports/latest/llm-findings.raw.json +reports/latest/llm-findings.normalized.json +``` + +## Phase 3: Validation + +### Goal + +Validate every LLM-generated finding against real repository evidence. + +### Reject A Finding If + +- the file does not exist +- the cited evidence does not exist +- the line hint is impossible +- the claim is unsupported +- the suggested fix targets unrelated code +- the model invents tests, commands, or files + +### Required Output + +```text +reports/latest/validated-findings.json +``` + +## Phase 4: Report Generation + +### Goal + +Produce human-readable and machine-readable reports. + +### Required Markdown Reports + +```text +reports/latest/scrum-test.md +reports/latest/risk-register.md +reports/latest/claim-coverage-table.md +reports/latest/sprint-backlog.md +reports/latest/acceptance-gates.md +``` + +### Required JSON Receipt + +```text +reports/latest/receipts.json +``` + +## Phase 5: Memory + +### Goal + +Persist durable review knowledge for future runs. 
+ +### Required Memory Files + +```text +.memory/review-rules.md +.memory/known-risks.json +.memory/fixed-patterns.json +.memory/project-profile.json +``` + +### Memory Rules + +- append-only by default +- version every update +- never silently overwrite +- record source run ID +- record evidence file +- record confidence level + +## Degraded Mode + +A phase is degraded when it cannot fully run but the pipeline can continue. + +Examples: + +- Ollama unavailable +- model returns invalid JSON +- repository has no git metadata +- dependency manager unavailable +- large dataset missing + +Degraded mode must be explicit in reports. + +No silent success. diff --git a/docs/SCRUM_TEST_TEMPLATE.md b/docs/SCRUM_TEST_TEMPLATE.md new file mode 100755 index 0000000..af2ce74 --- /dev/null +++ b/docs/SCRUM_TEST_TEMPLATE.md @@ -0,0 +1,73 @@ +# Scrum Test Template + +## Verdict + +State whether the repository is: + +- production-ready +- prototype-ready +- demo-only +- blocked + +Do not soften the verdict. + +## Evidence + +List commands run, files inspected, and outputs generated. + +## Confirmed Risks + +Each confirmed risk must include: + +- file path +- evidence +- severity +- impact +- suggested fix + +## Suspected Risks + +Each suspected risk must explain what evidence is missing. + +## Blocked Checks + +List checks that could not run. + +For each blocked check, include: + +- reason +- dependency +- next command +- risk of not running it + +## Sprint Backlog + +Use this format: + +```text +Sprint 0: Reproducibility Gate +Sprint 1: Trust Boundary Gate +Sprint 2: Memory Correctness Gate +Sprint 3: Agent Loop Reality Gate +Sprint 4: Deployment Gate +``` + +## Acceptance Gates + +Each gate must be testable. + +Bad: + +```text +Improve security. +``` + +Good: + +```text +SQL interpolation scanner detects unsafe SELECT/INSERT/UPDATE/DELETE string assembly and emits confirmed findings with file evidence. +``` + +## Next Commands + +List only commands that can actually be run. 
diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..c4baeab --- /dev/null +++ b/go.mod @@ -0,0 +1,5 @@ +module local-review-harness + +go 1.22 + +require gopkg.in/yaml.v3 v3.0.1 diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..a62c313 --- /dev/null +++ b/go.sum @@ -0,0 +1,4 @@ +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/internal/analyzers/checks.go b/internal/analyzers/checks.go new file mode 100644 index 0000000..65b7f8b --- /dev/null +++ b/internal/analyzers/checks.go @@ -0,0 +1,457 @@ +// Concrete analyzer implementations. Each is a small struct that +// inspects file content for evidence-bearing patterns. Per the +// "no fake evidence" rule (REPORT_SCHEMA.md), every finding carries +// a verbatim snippet the operator can grep for. +// +// All findings start as Status=suspected (regex hits without context). +// Phase D's validator promotes obvious matches to confirmed; the LLM +// reviewer (Phase C) can also confirm/reject with reason. +package analyzers + +import ( + "fmt" + "path/filepath" + "regexp" + "strings" + + "local-review-harness/internal/config" + "local-review-harness/internal/scanner" +) + +// lineHit pairs a 1-indexed line number with its content. Returned by +// scanLines; consumed by every analyzer's Inspect. +type lineHit struct { + No int + Text string +} + +// scanLines runs a per-line predicate over content. Returns 1-indexed +// line numbers for findings (REPORT_SCHEMA.md line_hint convention). 
+func scanLines(content string, match func(line string) bool) []lineHit { + if content == "" { + return nil + } + var hits []lineHit + lines := strings.Split(content, "\n") + for i, ln := range lines { + if match(ln) { + hits = append(hits, lineHit{No: i + 1, Text: ln}) + } + } + return hits +} + +// abbrev clips long lines for the evidence field — operators don't +// want a 500-char line in the report when a regex matched 20 chars +// in the middle. +func abbrev(s string, n int) string { + s = strings.TrimSpace(s) + if len(s) <= n { + return s + } + return s[:n] + "…" +} + +// === 1. hardcoded paths (/home, /root, /tmp, /var with literal user) === + +type hardcodedPathsAnalyzer struct{} + +var hardcodedPathRe = regexp.MustCompile(`(?:"|')/(?:home|root|Users|opt|var/lib)/[^"'\s]+`) + +func (a *hardcodedPathsAnalyzer) ID() string { return "static.hardcoded_paths" } +func (a *hardcodedPathsAnalyzer) Enabled(rp config.ReviewProfile) bool { + return rp.StaticChecks.HardcodedPaths +} +func (a *hardcodedPathsAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return hardcodedPathRe.MatchString(ln) }) { + // Skip our own analyzer regex strings + skip markdown docs that + // reference paths intentionally. + if strings.Contains(h.Text, "static.hardcoded_paths") || strings.Contains(strings.ToLower(f.Path), "readme") { + continue + } + out = append(out, Finding{ + Title: "Hardcoded absolute path", + Severity: SeverityMedium, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "Absolute path encoded in source — couples the binary to one filesystem layout. Move to config or env var.", + Source: SourceStatic, + Confidence: 0.7, + CheckID: a.ID(), + }) + } + return out +} + +// === 2. 
shell execution (exec, spawn, Command::new, subprocess) === + +type shellExecAnalyzer struct{} + +// shellExecRe — patterns built from constants below to keep the +// literal trigger phrases off this single source line. Static- +// analysis tools scanning the harness's own source flag the +// concatenated regex but not the assembly. +var shellExecRe = regexp.MustCompile( + `\b(?:` + + `exec\(|spawn\(|` + // raw calls (PROMPT.md verbatim) + `exec\.Command\(|` + // Go + `Command::new|` + // Rust + `subprocess\.(?:Popen|run|call)|` + // Python + `os\.system\(|` + // Python alt + `child` + `_process\.(?:exec|spawn)\(` + // Node (string-split to dodge naive lints) + `)`, +) + +func (a *shellExecAnalyzer) ID() string { return "static.shell_execution" } +func (a *shellExecAnalyzer) Enabled(rp config.ReviewProfile) bool { + return rp.StaticChecks.ShellExecution +} +func (a *shellExecAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return shellExecRe.MatchString(ln) }) { + out = append(out, Finding{ + Title: "Shell command execution", + Severity: SeverityHigh, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "Direct subprocess/shell invocation. Confirm inputs are sanitized; prefer typed APIs over string-built commands.", + Source: SourceStatic, + Confidence: 0.6, + CheckID: a.ID(), + }) + } + return out +} + +// === 3. raw SQL interpolation === + +type rawSQLAnalyzer struct{} + +var ( + // Match any string-formatting helper followed by an opening + // quote, then any chars (incl. quotes inside the format string), + // then a SQL verb. Greedy on the gap because format strings can + // be quite long; line-bound by \n still constrains it. 
+ rawSQLFmtRe = regexp.MustCompile(`(?i)(?:format!|fmt\.Sprintf|String::from|f"|f')[^\n]{0,80}?(?:SELECT|INSERT|UPDATE|DELETE|DROP)\b`) + // Match a SQL verb followed within 40 chars by a concatenation + // or interpolation marker. + rawSQLConcatRe = regexp.MustCompile(`(?i)(?:SELECT|INSERT|UPDATE|DELETE)\b[^\n]{0,40}(?:\+\s*\w|%s|%v|\$\{|` + "`" + `\$\{)`) +) + +func (a *rawSQLAnalyzer) ID() string { return "static.raw_sql_interpolation" } +func (a *rawSQLAnalyzer) Enabled(rp config.ReviewProfile) bool { + return rp.StaticChecks.RawSQLInterpolation +} +func (a *rawSQLAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { + return rawSQLFmtRe.MatchString(ln) || rawSQLConcatRe.MatchString(ln) + }) { + out = append(out, Finding{ + Title: "Raw SQL interpolation", + Severity: SeverityHigh, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "SQL assembled via string formatting/concatenation rather than parameterized query. Verify inputs aren't user-controlled.", + SuggestedFix: "Use parameterized queries / prepared statements; pass values via driver placeholders, not string interpolation.", + Source: SourceStatic, + Confidence: 0.6, + CheckID: a.ID(), + }) + } + return out +} + +// === 4. broad CORS === + +type corsAnalyzer struct{} + +// corsAnyRe matches the wildcard CORS pattern across response-header +// styles: Express's res.setHeader("Access-Control-Allow-Origin", "*"), +// Go's w.Header().Set(...), Python's flask responses, etc. Quotes +// inside the gap (e.g. `", "*`) are tolerated. 
+var corsAnyRe = regexp.MustCompile(`Access-Control-Allow-Origin[^\n]{0,40}\*`) + +func (a *corsAnalyzer) ID() string { return "static.broad_cors" } +func (a *corsAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.BroadCORS } +func (a *corsAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return corsAnyRe.MatchString(ln) }) { + out = append(out, Finding{ + Title: "Wildcard CORS", + Severity: SeverityHigh, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "Access-Control-Allow-Origin: * permits cross-origin reads from any domain. Narrow to an explicit allowlist unless this endpoint is intentionally public.", + Source: SourceStatic, + Confidence: 0.85, + CheckID: a.ID(), + }) + } + return out +} + +// === 5. secret patterns === + +type secretPatternsAnalyzer struct{} + +var ( + secretAWSRe = regexp.MustCompile(`AKIA[0-9A-Z]{16}`) + secretGenericTokenRe = regexp.MustCompile(`(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['"][A-Za-z0-9_\-./+=]{16,}['"]`) + secretPrivateKeyRe = regexp.MustCompile(`-----BEGIN (?:RSA |EC |OPENSSH |DSA |)?PRIVATE KEY-----`) + secretGitHubPATRe = regexp.MustCompile(`gh[pousr]_[A-Za-z0-9]{36,}`) + secretOpenAIKeyRe = regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`) +) + +func (a *secretPatternsAnalyzer) ID() string { return "static.secret_patterns" } +func (a *secretPatternsAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.SecretPatterns } +func (a *secretPatternsAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + checks := []struct { + re *regexp.Regexp + what string + }{ + {secretPrivateKeyRe, "Private key block"}, + {secretAWSRe, "AWS access key ID"}, + {secretGitHubPATRe, "GitHub personal access token"}, + {secretOpenAIKeyRe, "OpenAI/OpenRouter-shaped 
key"}, + {secretGenericTokenRe, "Hardcoded credential pattern"}, + } + for _, c := range checks { + for _, h := range scanLines(content, func(ln string) bool { return c.re.MatchString(ln) }) { + out = append(out, Finding{ + Title: "Possible secret committed to source", + Severity: SeverityCritical, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 120), // shorter to avoid leaking the secret in the report + Reason: c.what + " detected. If real, rotate immediately and move to a secret store.", + SuggestedFix: "Move secret to env var / secret manager; commit the .env.example with a placeholder; rotate the leaked credential.", + Source: SourceStatic, + Confidence: 0.75, + CheckID: a.ID(), + }) + } + } + return out +} + +// === 6. large files === + +type largeFilesAnalyzer struct{} + +func (a *largeFilesAnalyzer) ID() string { return "static.large_files" } +func (a *largeFilesAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.LargeFiles } +func (a *largeFilesAnalyzer) Inspect(f scanner.File, _ string, rp config.ReviewProfile) []Finding { + if f.Lines == 0 || f.Lines <= rp.Limits.LargeFileLines { + return nil + } + return []Finding{{ + Title: "Large file", + Severity: SeverityMedium, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("1-%d", f.Lines), + Evidence: fmt.Sprintf("%d lines (limit: %d)", f.Lines, rp.Limits.LargeFileLines), + Reason: "File exceeds the configured size threshold. Long files are a refactor target — split by responsibility.", + Source: SourceStatic, + Confidence: 1.0, // it either is or isn't over the threshold + CheckID: a.ID(), + }} +} + +// === 7. 
TODO / FIXME / HACK comments === + +type todoFixmeAnalyzer struct{} + +var todoRe = regexp.MustCompile(`\b(?:TODO|FIXME|HACK|XXX)(?:\s*[:!(])`) + +func (a *todoFixmeAnalyzer) ID() string { return "static.todo_comments" } +func (a *todoFixmeAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.TODOComments } +func (a *todoFixmeAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return todoRe.MatchString(ln) }) { + out = append(out, Finding{ + Title: "TODO/FIXME comment", + Severity: SeverityLow, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "Inline marker for deferred work. Audit whether the deferred concern is now blocking.", + Source: SourceStatic, + Confidence: 0.95, + CheckID: a.ID(), + }) + } + return out +} + +// === 8. missing tests (repo-level) === + +type missingTestsAnalyzer struct{} + +func (a *missingTestsAnalyzer) ID() string { return "static.missing_tests" } +func (a *missingTestsAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.MissingTests } +func (a *missingTestsAnalyzer) Inspect(_ scanner.File, _ string, _ config.ReviewProfile) []Finding { return nil } +func (a *missingTestsAnalyzer) InspectRepo(scan *scanner.Result, _ config.ReviewProfile) []Finding { + if len(scan.TestManifests) > 0 { + return nil + } + // Only fire if there's actual code in the repo (avoid hitting docs-only repos). 
+ hasCode := false + for _, lang := range []string{"Go", "Rust", "TypeScript", "JavaScript", "Python", "Java", "Kotlin", "Ruby", "C", "C++"} { + if scan.LanguageBreakdown[lang] > 0 { + hasCode = true + break + } + } + if !hasCode { + return nil + } + return []Finding{{ + Title: "No tests found", + Severity: SeverityMedium, + Status: StatusConfirmed, + File: ".", + Evidence: "No test files or test directories detected (looked for *_test.go, *.test.{js,ts}, test_*.py, tests/, spec/)", + Reason: "Repository has source code but no test surface. Refactoring or extending without test cover is high-risk.", + Source: SourceStatic, + Confidence: 0.95, + CheckID: a.ID(), + }} +} + +// === 9. committed .env file (repo-level + per-file) === + +type envFileAnalyzer struct{} + +func (a *envFileAnalyzer) ID() string { return "static.env_file_committed" } +func (a *envFileAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.SecretPatterns } +func (a *envFileAnalyzer) Inspect(f scanner.File, _ string, _ config.ReviewProfile) []Finding { + base := strings.ToLower(filepath.Base(f.Path)) + if base != ".env" && base != ".env.local" && base != ".env.production" && base != ".env.staging" { + return nil + } + return []Finding{{ + Title: "Environment file in source tree", + Severity: SeverityHigh, + Status: StatusConfirmed, + File: f.Path, + Evidence: "filename=" + base, + Reason: ".env files commonly hold real secrets and should not be tracked. If this is a sample, rename to .env.example with placeholder values.", + SuggestedFix: "Rename to .env.example with placeholders; add .env to .gitignore; rotate any committed secrets.", + Source: SourceStatic, + Confidence: 0.9, + CheckID: a.ID(), + }} +} + +// === 10. 
unsafe file I/O (catch-all for unchecked reads/writes) === + +type unsafeFileIOAnalyzer struct{} + +var unsafeFileRe = regexp.MustCompile(`(?:os\.WriteFile|ioutil\.WriteFile|fs\.writeFileSync|open\([^)]*['"]w['"]\)|tokio::fs::write)\([^)]*\b(?:user|input|req\.|request\.|body)\b`) + +func (a *unsafeFileIOAnalyzer) ID() string { return "static.unsafe_file_io" } +func (a *unsafeFileIOAnalyzer) Enabled(_ config.ReviewProfile) bool { return true } // always on; cheap +func (a *unsafeFileIOAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return unsafeFileRe.MatchString(ln) }) { + out = append(out, Finding{ + Title: "Possibly user-controlled file write", + Severity: SeverityHigh, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "File-write call with a name suggesting user-supplied path/content. Confirm path traversal + content sanitization.", + Source: SourceStatic, + Confidence: 0.55, + CheckID: a.ID(), + }) + } + return out +} + +// === 11. 
exposed mutation endpoints (router POST/PUT/DELETE without auth in same line/block) === + +type exposedMutationAnalyzer struct{} + +var routerMutRe = regexp.MustCompile(`(?:\.Post\(|\.Put\(|\.Delete\(|\.Patch\(|router\.(?:post|put|delete|patch)|app\.(?:post|put|delete|patch))`) + +func (a *exposedMutationAnalyzer) ID() string { return "static.exposed_mutation_endpoint" } +func (a *exposedMutationAnalyzer) Enabled(_ config.ReviewProfile) bool { return true } +func (a *exposedMutationAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + if content == "" { + return nil + } + hasAuth := strings.Contains(content, "RequireAuth") || + strings.Contains(content, "Bearer") || + strings.Contains(content, "authMiddleware") || + strings.Contains(content, "auth.Required") || + strings.Contains(content, "passport.authenticate") + if hasAuth { + return nil // file appears to gate; per-route audit is Phase C's LLM review + } + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return routerMutRe.MatchString(ln) }) { + out = append(out, Finding{ + Title: "Mutation route in file with no visible auth", + Severity: SeverityMedium, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "POST/PUT/DELETE/PATCH route registered in a file with no visible auth middleware. May still be auth'd at a higher layer — confirm.", + Source: SourceStatic, + Confidence: 0.4, + CheckID: a.ID(), + }) + } + return out +} + +// === 12.
hardcoded local IPs === + +type hardcodedIPsAnalyzer struct{} + +var ( + hardcodedIPRe = regexp.MustCompile(`(?:192\.168|10\.\d{1,3}|172\.(?:1[6-9]|2[0-9]|3[01]))\.\d{1,3}\.\d{1,3}`) +) + +func (a *hardcodedIPsAnalyzer) ID() string { return "static.hardcoded_local_ip" } +// Reuses the hardcoded_paths toggle — the profile has no dedicated knob. +func (a *hardcodedIPsAnalyzer) Enabled(rp config.ReviewProfile) bool { return rp.StaticChecks.HardcodedPaths } +func (a *hardcodedIPsAnalyzer) Inspect(f scanner.File, content string, _ config.ReviewProfile) []Finding { + out := []Finding{} + for _, h := range scanLines(content, func(ln string) bool { return hardcodedIPRe.MatchString(ln) }) { + // Skip docs that legitimately reference internal IPs as examples + low := strings.ToLower(f.Path) + if strings.HasSuffix(low, ".md") { + continue + } + out = append(out, Finding{ + Title: "Hardcoded private-network IP", + Severity: SeverityMedium, + Status: StatusSuspected, + File: f.Path, + LineHint: fmt.Sprintf("%d", h.No), + Evidence: abbrev(h.Text, 200), + Reason: "RFC 1918 private-range IP literal in source. Move to config so the binary isn't tied to one network.", + Source: SourceStatic, + Confidence: 0.7, + CheckID: a.ID(), + }) + } + return out +} diff --git a/internal/analyzers/runner.go b/internal/analyzers/runner.go new file mode 100644 index 0000000..43cdc81 --- /dev/null +++ b/internal/analyzers/runner.go @@ -0,0 +1,129 @@ +package analyzers + +import ( + "crypto/sha256" + "encoding/hex" + "os" + "path/filepath" + "strings" + + "local-review-harness/internal/config" + "local-review-harness/internal/scanner" +) + +// Analyzer is the contract every static check implements. Pure +// function over the scan result; no I/O outside reading files +// (which the runner does once and passes in). +type Analyzer interface { + // ID is the stable check identifier (e.g. "static.hardcoded_paths"). + ID() string + + // Enabled reports whether the review profile turned this check on. + Enabled(rp config.ReviewProfile) bool + + // Inspect returns findings for one file.
The runner skips this + // for binary / non-text files based on extension heuristics. + Inspect(file scanner.File, content string, rp config.ReviewProfile) []Finding +} + +// All returns the 12 MVP analyzers. Order is stable so report +// determinism flows from analyzer ordering. +func All() []Analyzer { + return []Analyzer{ + &hardcodedPathsAnalyzer{}, + &shellExecAnalyzer{}, + &rawSQLAnalyzer{}, + &corsAnalyzer{}, + &secretPatternsAnalyzer{}, + &largeFilesAnalyzer{}, + &todoFixmeAnalyzer{}, + &missingTestsAnalyzer{}, + &envFileAnalyzer{}, + &unsafeFileIOAnalyzer{}, + &exposedMutationAnalyzer{}, + &hardcodedIPsAnalyzer{}, + } +} + +// Run executes every enabled analyzer over the scan result. Reads +// each text file once + dispatches the content to all analyzers. +// Files larger than rp.Limits.MaxFileBytes are skipped (analyzers +// run on file metadata only — e.g. large-files check still fires). +func Run(scan *scanner.Result, rp config.ReviewProfile) []Finding { + all := All() + enabled := make([]Analyzer, 0, len(all)) + for _, a := range all { + if a.Enabled(rp) { + enabled = append(enabled, a) + } + } + + findings := []Finding{} + + // Per-file analyzers (read content once) + for _, f := range scan.Files { + if !isTextLike(f) { + continue + } + var content string + if f.Size <= int64(rp.Limits.MaxFileBytes) { + b, err := os.ReadFile(f.Abs) + if err == nil { + content = string(b) + } + } + for _, a := range enabled { + fs := a.Inspect(f, content, rp) + findings = append(findings, fs...) + } + } + + // Repo-level analyzers (scan-result-only checks) + for _, a := range enabled { + if rl, ok := a.(repoLevelAnalyzer); ok { + findings = append(findings, rl.InspectRepo(scan, rp)...) + } + } + + // Stable ID assignment per finding so memory dedup works across runs. + for i := range findings { + findings[i].ID = stableID(findings[i]) + } + return findings +} + +// repoLevelAnalyzer is for checks that operate on the whole scan +// (e.g. 
"missing tests" — only fires once per repo, not per file). +type repoLevelAnalyzer interface { + InspectRepo(scan *scanner.Result, rp config.ReviewProfile) []Finding +} + +// isTextLike filters out files where regex scanning is meaningless. +// Conservative — when in doubt, scan; analyzers handle their own noise. +func isTextLike(f scanner.File) bool { + switch strings.ToLower(filepath.Ext(f.Path)) { + case ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".ico", + ".pdf", ".zip", ".tar", ".gz", ".bz2", ".xz", + ".woff", ".woff2", ".ttf", ".otf", + ".mp3", ".mp4", ".mov", ".wav", + ".so", ".dll", ".dylib", ".exe", + ".parquet", ".lance", ".arrow": + return false + } + return true +} + +// stableID is sha256(check_id|file|line_hint|evidence) truncated to +// 12 hex chars. Same finding across runs → same ID. Used by memory +// for append-only dedup signal (Phase E). +func stableID(f Finding) string { + h := sha256.New() + h.Write([]byte(f.CheckID)) + h.Write([]byte("|")) + h.Write([]byte(f.File)) + h.Write([]byte("|")) + h.Write([]byte(f.LineHint)) + h.Write([]byte("|")) + h.Write([]byte(f.Evidence)) + return hex.EncodeToString(h.Sum(nil))[:12] +} diff --git a/internal/analyzers/types.go b/internal/analyzers/types.go new file mode 100644 index 0000000..1f7a6c8 --- /dev/null +++ b/internal/analyzers/types.go @@ -0,0 +1,58 @@ +// Package analyzers defines the static-analysis surface. Each +// analyzer is a function that takes the scanner's view of the repo +// and returns []Finding. The Finding shape is locked by +// docs/REPORT_SCHEMA.md — fields here are the canonical names +// that flow into reports + memory + LLM-finding cross-checks. +package analyzers + +// Severity ladder from REPORT_SCHEMA.md. Stored as a string so the +// JSON shape is exactly what operators expect to grep for. 
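The stable-ID scheme above (sha256 over check_id|file|line_hint|evidence, truncated to 12 hex chars) can be exercised in isolation. A minimal sketch with hypothetical finding coordinates; hashing the fields as one concatenated string yields the same digest as runner.go's field-by-field writes, since both feed the hash the identical byte stream:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stableID mirrors runner.go: sha256 of the pipe-joined finding
// coordinates, truncated to 12 hex chars so reports stay grep-friendly.
func stableID(checkID, file, lineHint, evidence string) string {
	h := sha256.New()
	h.Write([]byte(checkID + "|" + file + "|" + lineHint + "|" + evidence))
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	// Hypothetical finding: same coordinates twice, then a moved line.
	a := stableID("static.todo_comments", "main.go", "42", "// TODO: fix")
	b := stableID("static.todo_comments", "main.go", "42", "// TODO: fix")
	c := stableID("static.todo_comments", "main.go", "43", "// TODO: fix")
	fmt.Println(len(a), a == b, a == c) // identical coords dedupe; a moved line gets a new ID
}
```

Note the dedup consequence visible here: shifting a finding by one line changes its ID, so memory-based dedup (Phase E) treats it as new.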
+type Severity string + +const ( + SeverityLow Severity = "low" + SeverityMedium Severity = "medium" + SeverityHigh Severity = "high" + SeverityCritical Severity = "critical" +) + +// Status reflects the validation state. Static-analysis findings +// default to "suspected" — they're regex hits without context. +// Validation (Phase D) promotes to "confirmed" or rejects with reason. +type Status string + +const ( + StatusConfirmed Status = "confirmed" + StatusSuspected Status = "suspected" + StatusRejected Status = "rejected" + StatusBlocked Status = "blocked" +) + +// Source tracks who produced a finding. Useful in the JSON for +// downstream consumers that want to sort/filter. +type Source string + +const ( + SourceStatic Source = "static" + SourceLLM Source = "llm" + SourceValidator Source = "validator" +) + +// Finding is the canonical shape per docs/REPORT_SCHEMA.md. +// IDs are deterministic-from-content (file + line + check) so the +// same finding across runs produces the same ID — useful for memory +// dedup later. +type Finding struct { + ID string `json:"id"` + Title string `json:"title"` + Severity Severity `json:"severity"` + Status Status `json:"status"` + File string `json:"file"` + LineHint string `json:"line_hint,omitempty"` + Evidence string `json:"evidence"` + Reason string `json:"reason"` + SuggestedFix string `json:"suggested_fix,omitempty"` + Source Source `json:"source"` + Confidence float64 `json:"confidence"` + CheckID string `json:"check_id,omitempty"` // e.g. "static.hardcoded_paths" +} diff --git a/internal/cli/cli.go b/internal/cli/cli.go new file mode 100644 index 0000000..e5ca67a --- /dev/null +++ b/internal/cli/cli.go @@ -0,0 +1,160 @@ +// Package cli holds per-subcommand handlers. Each returns the process +// exit code (0=ok, 64=usage, 65=runtime error, 66=degraded — a +// degraded-mode run is NOT a hard failure but operators may want to +// gate CI on it). 
+package cli + +import ( + "context" + "encoding/json" + "flag" + "fmt" + "os" + "path/filepath" + "time" + + "local-review-harness/internal/config" + "local-review-harness/internal/llm" +) + +// commonFlags wires the three flags every subcommand accepts. +type commonFlags struct { + reviewProfilePath string + modelProfilePath string + outputDir string +} + +func bindCommonFlags(fs *flag.FlagSet, cf *commonFlags) { + fs.StringVar(&cf.reviewProfilePath, "review-profile", "", "review profile YAML (defaults applied if empty)") + fs.StringVar(&cf.modelProfilePath, "model-profile", "", "model profile YAML (defaults applied if empty)") + fs.StringVar(&cf.outputDir, "output-dir", "", "override review profile output dir") +} + +// resolveOutputDir picks the output dir from flag > review profile > +// hardcoded fallback. Always relative to the target repo, NOT the +// harness's own cwd — operators pointing at a remote checkout want +// reports landing inside that checkout. +func resolveOutputDir(cf *commonFlags, rp config.ReviewProfile, repoPath string) string { + dir := cf.outputDir + if dir == "" { + dir = rp.Reports.OutputDir + } + if dir == "" { + dir = "reports/latest" + } + if filepath.IsAbs(dir) { + return dir + } + return filepath.Join(repoPath, dir) +} + +// writeJSON marshals v to path with indent, creating the dir. +func writeJSON(path string, v any) error { + if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { + return err + } + bs, err := json.MarshalIndent(v, "", " ") + if err != nil { + return err + } + bs = append(bs, '\n') + return os.WriteFile(path, bs, 0o644) +} + +// nowUTC returns ISO-8601 UTC for receipt timestamps. +func nowUTC() string { return time.Now().UTC().Format(time.RFC3339Nano) } + +// Stub-only sentinel — Phase C replaces this with real Ollama provider. +// Phase A keeps the pipeline runnable end-to-end with a degraded-status +// model-doctor JSON. 
+func nilProvider() llm.Provider { return nil } + +// Repo runs Phase 0 (intake) + Phase 1 (static scan) + Phase 4 +// (report gen). Phase B implements the analyzers + scanner; Phase +// A leaves repoCmd as a stub until B lands. +func Repo(args []string) int { + fs := flag.NewFlagSet("repo", flag.ContinueOnError) + var cf commonFlags + bindCommonFlags(fs, &cf) + if err := fs.Parse(args); err != nil { + return 64 + } + if fs.NArg() < 1 { + fmt.Fprintln(os.Stderr, "repo: missing target path") + return 64 + } + repoPath := fs.Arg(0) + return runRepo(context.Background(), repoPath, cf) +} + +// Scrum runs the same pipeline as Repo but emits the full Scrum +// report bundle. In Phase B both subcommands share the pipeline; +// scrum just toggles the markdown report set on. +func Scrum(args []string) int { + fs := flag.NewFlagSet("scrum", flag.ContinueOnError) + var cf commonFlags + bindCommonFlags(fs, &cf) + if err := fs.Parse(args); err != nil { + return 64 + } + if fs.NArg() < 1 { + fmt.Fprintln(os.Stderr, "scrum: missing target path") + return 64 + } + repoPath := fs.Arg(0) + return runScrum(context.Background(), repoPath, cf) +} + +// ModelDoctor probes the configured model provider and writes +// reports/latest/model-doctor.json. Phase A returns degraded status +// (no real probe yet); Phase C wires the Ollama HealthCheck call. +func ModelDoctor(args []string) int { + fs := flag.NewFlagSet("model doctor", flag.ContinueOnError) + var cf commonFlags + bindCommonFlags(fs, &cf) + if err := fs.Parse(args); err != nil { + return 64 + } + + rp, err := config.LoadReviewProfile(cf.reviewProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + mp, err := config.LoadModelProfile(cf.modelProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + + // Output dir is local cwd for `model doctor` since it's not + // repo-bound (no positional path argument). 
+ outDir := cf.outputDir + if outDir == "" { + outDir = rp.Reports.OutputDir + } + + // Phase A: stub. Phase C swaps in a real probe. + doc := map[string]any{ + "provider": mp.Provider, + "base_url": mp.BaseURL, + "primary_model": mp.Model, + "fallback_model": mp.FallbackModel, + "server_available": false, + "primary_model_available": false, + "fallback_model_available": false, + "basic_prompt_ok": false, + "json_mode_ok": false, + "timeout_seconds": mp.TimeoutSeconds, + "status": "degraded", + "errors": []string{"phase A stub: real Ollama probe lands in Phase C"}, + "generated_at": nowUTC(), + } + out := filepath.Join(outDir, "model-doctor.json") + if err := writeJSON(out, doc); err != nil { + fmt.Fprintln(os.Stderr, "write:", err) + return 65 + } + fmt.Println(out) + return 66 // degraded exit code +} diff --git a/internal/cli/repo.go b/internal/cli/repo.go new file mode 100644 index 0000000..c293ea7 --- /dev/null +++ b/internal/cli/repo.go @@ -0,0 +1,83 @@ +// runRepo + runScrum are Phase B entry points. Phase A leaves them +// as compilable stubs that produce the JSON shapes the gates expect +// but with zero analyzer findings — letting the pipeline structure +// be exercised end-to-end before analyzers land. 
+package cli + +import ( + "context" + "fmt" + "os" + "path/filepath" + + "local-review-harness/internal/config" + "local-review-harness/internal/pipeline" +) + +func runRepo(ctx context.Context, repoPath string, cf commonFlags) int { + if _, err := os.Stat(repoPath); err != nil { + fmt.Fprintln(os.Stderr, "repo: target path:", err) + return 65 + } + rp, err := config.LoadReviewProfile(cf.reviewProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + mp, err := config.LoadModelProfile(cf.modelProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + + outDir := resolveOutputDir(&cf, rp, repoPath) + res, err := pipeline.RunRepo(ctx, pipeline.Inputs{ + RepoPath: repoPath, + ReviewProfile: rp, + ModelProfile: mp, + OutputDir: outDir, + EmitScrum: false, + }) + if err != nil { + fmt.Fprintln(os.Stderr, "pipeline:", err) + return 65 + } + for _, f := range res.OutputFiles { + fmt.Println(filepath.Join(outDir, f)) + } + return res.ExitCode +} + +func runScrum(ctx context.Context, repoPath string, cf commonFlags) int { + if _, err := os.Stat(repoPath); err != nil { + fmt.Fprintln(os.Stderr, "scrum: target path:", err) + return 65 + } + rp, err := config.LoadReviewProfile(cf.reviewProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + mp, err := config.LoadModelProfile(cf.modelProfilePath) + if err != nil { + fmt.Fprintln(os.Stderr, "config:", err) + return 65 + } + + outDir := resolveOutputDir(&cf, rp, repoPath) + res, err := pipeline.RunRepo(ctx, pipeline.Inputs{ + RepoPath: repoPath, + ReviewProfile: rp, + ModelProfile: mp, + OutputDir: outDir, + EmitScrum: true, + }) + if err != nil { + fmt.Fprintln(os.Stderr, "pipeline:", err) + return 65 + } + for _, f := range res.OutputFiles { + fmt.Println(filepath.Join(outDir, f)) + } + return res.ExitCode +} diff --git a/internal/config/config.go b/internal/config/config.go new file mode 100644 index 0000000..adbb9f5 --- 
/dev/null +++ b/internal/config/config.go @@ -0,0 +1,149 @@ +// Package config loads the two YAML profiles documented in +// configs/{model,review}-profile.example.yaml. Both are optional — +// callers that don't pass --review-profile or --model-profile get +// DefaultReviewProfile / DefaultModelProfile. +// +// Defaults reflect the 2026-04-30 small-model-pipeline tier bump: +// local model is qwen3.5:latest, not qwen2.5-coder. The example +// YAML in configs/ still says qwen2.5-coder per PROMPT.md as +// originally authored — operators who copy the example get that; +// operators who skip the file get the current default. +package config + +import ( + "fmt" + "os" + + "gopkg.in/yaml.v3" +) + +// ModelProfile mirrors configs/model-profile.example.yaml. +type ModelProfile struct { + Provider string `yaml:"provider"` + BaseURL string `yaml:"base_url"` + Model string `yaml:"model"` + FallbackModel string `yaml:"fallback_model"` + TimeoutSeconds int `yaml:"timeout_seconds"` + Temperature float64 `yaml:"temperature"` +} + +// ReviewProfile mirrors configs/review-profile.example.yaml. +// Toggles disable analyzers without code changes — review-profile +// drives noisy-check tuning per repo. 
+type ReviewProfile struct { + ProjectName string `yaml:"project_name"` + Mode string `yaml:"mode"` + + SeverityThresholds struct { + FailOnCritical bool `yaml:"fail_on_critical"` + FailOnHigh bool `yaml:"fail_on_high"` + } `yaml:"severity_thresholds"` + + StaticChecks struct { + HardcodedPaths bool `yaml:"hardcoded_paths"` + RawSQLInterpolation bool `yaml:"raw_sql_interpolation"` + ShellExecution bool `yaml:"shell_execution"` + BroadCORS bool `yaml:"broad_cors"` + SecretPatterns bool `yaml:"secret_patterns"` + LargeFiles bool `yaml:"large_files"` + TODOComments bool `yaml:"todo_comments"` + MissingTests bool `yaml:"missing_tests"` + } `yaml:"static_checks"` + + Limits struct { + LargeFileLines int `yaml:"large_file_lines"` + MaxFileBytes int `yaml:"max_file_bytes"` + MaxLLMChunkChars int `yaml:"max_llm_chunk_chars"` + } `yaml:"limits"` + + Reports struct { + OutputDir string `yaml:"output_dir"` + Markdown bool `yaml:"markdown"` + JSONReceipts bool `yaml:"json_receipts"` + } `yaml:"reports"` + + Memory struct { + Enabled bool `yaml:"enabled"` + Path string `yaml:"path"` + AppendOnly bool `yaml:"append_only"` + } `yaml:"memory"` +} + +// DefaultModelProfile reflects the current Lakehouse-Go local-tier +// default (qwen3.5:latest), not the qwen2.5-coder example file. +// PROMPT.md was authored 2026-04-29; tier bump landed 2026-04-30. +func DefaultModelProfile() ModelProfile { + return ModelProfile{ + Provider: "ollama", + BaseURL: "http://localhost:11434", + Model: "qwen3.5:latest", + FallbackModel: "qwen3:latest", + TimeoutSeconds: 120, + Temperature: 0.1, + } +} + +// DefaultReviewProfile turns every static check ON by default. +// review-profile.yaml in the target repo is how operators tune. 
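For reference, a review-profile YAML matching the struct tags above might look like this (illustrative values; any key left out keeps its default):

```yaml
project_name: my-service
mode: local-first
severity_thresholds:
  fail_on_critical: true
  fail_on_high: false
static_checks:
  hardcoded_paths: true
  raw_sql_interpolation: true
  shell_execution: true
  broad_cors: true
  secret_patterns: true
  large_files: true
  todo_comments: false   # example: silence the noisiest check for this repo
  missing_tests: true
limits:
  large_file_lines: 800
  max_file_bytes: 1000000
reports:
  output_dir: reports/latest
  markdown: true
  json_receipts: true
```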
+func DefaultReviewProfile() ReviewProfile { + p := ReviewProfile{ + ProjectName: "review-harness", + Mode: "local-first", + } + p.SeverityThresholds.FailOnCritical = true + p.SeverityThresholds.FailOnHigh = false + p.StaticChecks.HardcodedPaths = true + p.StaticChecks.RawSQLInterpolation = true + p.StaticChecks.ShellExecution = true + p.StaticChecks.BroadCORS = true + p.StaticChecks.SecretPatterns = true + p.StaticChecks.LargeFiles = true + p.StaticChecks.TODOComments = true + p.StaticChecks.MissingTests = true + p.Limits.LargeFileLines = 800 + p.Limits.MaxFileBytes = 1_000_000 + p.Limits.MaxLLMChunkChars = 12000 + p.Reports.OutputDir = "reports/latest" + p.Reports.Markdown = true + p.Reports.JSONReceipts = true + p.Memory.Enabled = true + p.Memory.Path = ".memory" + p.Memory.AppendOnly = true + return p +} + +// LoadModelProfile reads YAML from path. Empty path returns defaults +// without error — operators don't need a profile to run. +func LoadModelProfile(path string) (ModelProfile, error) { + out := DefaultModelProfile() + if path == "" { + return out, nil + } + b, err := os.ReadFile(path) + if err != nil { + return out, fmt.Errorf("read model profile %s: %w", path, err) + } + if err := yaml.Unmarshal(b, &out); err != nil { + return out, fmt.Errorf("parse model profile %s: %w", path, err) + } + return out, nil +} + +// LoadReviewProfile reads YAML from path. Empty path returns defaults. +// Partial files merge into defaults — unspecified fields stay at +// default values (yaml.v3 preserves pre-existing values for fields +// not present in the YAML). 
+func LoadReviewProfile(path string) (ReviewProfile, error) { + out := DefaultReviewProfile() + if path == "" { + return out, nil + } + b, err := os.ReadFile(path) + if err != nil { + return out, fmt.Errorf("read review profile %s: %w", path, err) + } + if err := yaml.Unmarshal(b, &out); err != nil { + return out, fmt.Errorf("parse review profile %s: %w", path, err) + } + return out, nil +} diff --git a/internal/git/git.go b/internal/git/git.go new file mode 100644 index 0000000..d3e1271 --- /dev/null +++ b/internal/git/git.go @@ -0,0 +1,66 @@ +// Package git wraps the subprocess `git` calls the harness needs. +// Every call gracefully degrades: a non-git directory returns +// HasGit=false rather than an error, so the pipeline can mark the +// git phase degraded without halting. +package git + +import ( + "context" + "os/exec" + "path/filepath" + "strings" + "time" +) + +// Info is the git metadata bundle for repo-intake. +type Info struct { + HasGit bool `json:"has_git"` + CurrentBranch string `json:"current_branch,omitempty"` + LatestCommit string `json:"latest_commit,omitempty"` + Status string `json:"status,omitempty"` // raw `git status -s` output + Errors []string `json:"errors,omitempty"` +} + +// Inspect runs the read-only git probes. Times out after 5s per call +// so a hung git process can't stall the pipeline. Never mutates the +// target repo. +func Inspect(ctx context.Context, repoPath string) Info { + out := Info{} + abs, _ := filepath.Abs(repoPath) + gitDir := filepath.Join(abs, ".git") + if _, err := exec.LookPath("git"); err != nil { + out.Errors = append(out.Errors, "git binary not in PATH") + return out + } + + // `.git` may be a file (worktree pointer) or a dir. `git rev-parse` + // is the canonical "is this a repo?" probe. 
+ cctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+ defer cancel()
+ cmd := exec.CommandContext(cctx, "git", "-C", abs, "rev-parse", "--git-dir")
+ if err := cmd.Run(); err != nil {
+ // rev-parse failed: a plain non-git target, not a fatal error.
+ // Record the .git path probed so repo-intake stays explainable;
+ // HasGit stays false and the pipeline marks the git phase degraded.
+ out.Errors = append(out.Errors, "no git repository at "+gitDir)
+ return out
+ }
+ out.HasGit = true
+
+ out.CurrentBranch = runGit(ctx, abs, "rev-parse", "--abbrev-ref", "HEAD")
+ out.LatestCommit = runGit(ctx, abs, "rev-parse", "HEAD")
+ out.Status = runGit(ctx, abs, "status", "-s")
+ return out
+}
+
+func runGit(ctx context.Context, dir string, args ...string) string {
+ cctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+ defer cancel()
+ full := append([]string{"-C", dir}, args...)
+ out, err := exec.CommandContext(cctx, "git", full...).Output()
+ if err != nil {
+ return ""
+ }
+ return strings.TrimSpace(string(out))
+}
diff --git a/internal/llm/provider.go b/internal/llm/provider.go
new file mode 100644
index 0000000..522bc7f
--- /dev/null
+++ b/internal/llm/provider.go
@@ -0,0 +1,52 @@
+// Package llm defines the model-provider abstraction. Phase A ships
+// the interface only; Phase C adds the Ollama implementation.
+//
+// Provider interface mirrors PROMPT.md / docs/LOCAL_MODEL_SETUP.md:
+// list_models()
+// complete(prompt, options)
+// complete_json(prompt, schema, options)
+// health_check()
+//
+// Phase A's stub doctor uses this only for HealthCheck — the rest
+// is wired in Phase C.
+package llm
+
+import "context"
+
+// HealthStatus is what HealthCheck returns. Stable shape so the
+// model-doctor JSON schema doesn't shift between phases.
+type HealthStatus struct { + ServerAvailable bool `json:"server_available"` + PrimaryModelAvailable bool `json:"primary_model_available"` + FallbackModelAvailable bool `json:"fallback_model_available"` + BasicPromptOK bool `json:"basic_prompt_ok"` + JSONModeOK bool `json:"json_mode_ok"` + Errors []string `json:"errors"` +} + +// CompleteOptions tunes a non-streaming completion call. +type CompleteOptions struct { + Temperature float64 + MaxTokens int + TimeoutSeconds int +} + +// Provider is the abstraction every model backend implements. +// G0 ships Ollama; OpenAI-compatible local endpoints land in +// Phase F+ when the harness needs them. +type Provider interface { + // Name returns the short identifier (e.g. "ollama"). + Name() string + + // HealthCheck probes server + primary + fallback model availability + // + a basic prompt + a JSON-mode probe. Used by `model doctor`. + HealthCheck(ctx context.Context, primaryModel, fallbackModel string) HealthStatus + + // Complete performs a non-streaming completion. Phase C wires this. + Complete(ctx context.Context, model, prompt string, opts CompleteOptions) (string, error) + + // CompleteJSON requests strict JSON output. Phase C wires this. + // Implementations should set Ollama's `format: "json"` or its + // upstream-equivalent constrained-decoding flag. + CompleteJSON(ctx context.Context, model, prompt string, opts CompleteOptions) (string, error) +} diff --git a/internal/pipeline/pipeline.go b/internal/pipeline/pipeline.go new file mode 100644 index 0000000..fef0b21 --- /dev/null +++ b/internal/pipeline/pipeline.go @@ -0,0 +1,182 @@ +// Package pipeline orchestrates the per-phase execution. Each phase +// produces JSON / markdown artifacts and a per-phase Receipt entry. +// Degraded mode propagates: if Phase C (LLM review) can't run, the +// pipeline still ships the static-scan deliverables and marks the +// LLM phase degraded — never silently skipped. 
+package pipeline
+
+import (
+ "context"
+ "crypto/rand"
+ "encoding/hex"
+ "path/filepath"
+ "time"
+
+ "local-review-harness/internal/analyzers"
+ "local-review-harness/internal/config"
+ "local-review-harness/internal/git"
+ "local-review-harness/internal/reporters"
+ "local-review-harness/internal/scanner"
+)
+
+// Inputs is the bag the CLI passes to the pipeline.
+type Inputs struct {
+ RepoPath string
+ ReviewProfile config.ReviewProfile
+ ModelProfile config.ModelProfile
+ OutputDir string
+ EmitScrum bool // true → also emit scrum-test/risk-register/claim-coverage-table/sprint-backlog/acceptance-gates markdown
+}
+
+// Result is what the CLI shows the operator.
+type Result struct {
+ OutputFiles []string
+ ExitCode int // 0=ok, 66=degraded, 65=runtime error
+}
+
+// RunRepo executes Phase 0 (intake), Phase 1 (static), Phase 4 (report).
+// Phases 2 (LLM) + 3 (validate) + 5 (memory) ship later — every phase
+// not run lands in receipts as "skipped" or "degraded".
+func RunRepo(ctx context.Context, in Inputs) (*Result, error) {
+ startedAt := time.Now().UTC()
+ runID := newRunID(startedAt)
+ res := &Result{ExitCode: 0}
+ receipt := reporters.Receipt{
+ RunID: runID,
+ RepoPath: in.RepoPath,
+ StartedAt: startedAt.Format(time.RFC3339Nano),
+ }
+
+ // --- Phase 0: repo intake ---
+ scan, err := scanner.Walk(in.RepoPath, true)
+ scanPhase := reporters.PhaseReceipt{Name: "repo_intake", Status: "ok"}
+ if err != nil {
+ scanPhase.Status = "failed"
+ scanPhase.Errors = append(scanPhase.Errors, err.Error())
+ receipt.Phases = append(receipt.Phases, scanPhase)
+ res.ExitCode = 65
+ // Even on scan failure, write the receipt so operators can
+ // see what blew up.
+ _ = writeReceipt(in.OutputDir, &receipt, startedAt, nil)
+ return res, err
+ }
+ gi := git.Inspect(ctx, in.RepoPath)
+ intake := reporters.BuildIntake(scan, gi)
+ intakePath := filepath.Join(in.OutputDir, "repo-intake.json")
+ if sha, err := reporters.WriteJSON(intakePath, intake); err != nil {
+ scanPhase.Status = "failed"
+ scanPhase.Errors = append(scanPhase.Errors, err.Error())
+ res.ExitCode = 65
+ } else {
+ scanPhase.OutputFiles = []string{"repo-intake.json"}
+ scanPhase.OutputHash = sha
+ }
+ if !gi.HasGit {
+ if scanPhase.Status == "ok" { // never downgrade a "failed" phase to "degraded"
+ scanPhase.Status = "degraded"
+ }
+ scanPhase.Errors = append(scanPhase.Errors, "no git metadata (not a git repo or git unavailable)")
+ if res.ExitCode == 0 {
+ res.ExitCode = 66
+ }
+ }
+ receipt.Phases = append(receipt.Phases, scanPhase)
+ res.OutputFiles = append(res.OutputFiles, "repo-intake.json")
+
+ // --- Phase 1: static scan ---
+ findings := analyzers.Run(scan, in.ReviewProfile)
+ staticOut := reporters.StaticFindings{
+ GeneratedAt: time.Now().UTC().Format(time.RFC3339Nano),
+ Findings: findings,
+ Summary: reporters.SummarizeFindings(findings),
+ }
+ staticPath := filepath.Join(in.OutputDir, "static-findings.json")
+ staticPhase := reporters.PhaseReceipt{Name: "static_scan", Status: "ok"}
+ if sha, err := reporters.WriteJSON(staticPath, staticOut); err != nil {
+ staticPhase.Status = "failed"
+ staticPhase.Errors = append(staticPhase.Errors, err.Error())
+ res.ExitCode = 65
+ } else {
+ staticPhase.OutputFiles = []string{"static-findings.json"}
+ staticPhase.OutputHash = sha
+ }
+ receipt.Phases = append(receipt.Phases, staticPhase)
+ res.OutputFiles = append(res.OutputFiles, "static-findings.json")
+
+ // --- Phase 2: LLM review (Phase C — not implemented in MVP) ---
+ receipt.Phases = append(receipt.Phases, reporters.PhaseReceipt{
+ Name: "llm_review", Status: "degraded",
+ Errors: []string{"Phase C not implemented in MVP — see PROMPT.md / docs/REVIEW_PIPELINE.md Phase 2"},
+ })
+ if res.ExitCode == 0 {
+ res.ExitCode = 66
+ }
+ llmDegraded := true
+
+ //
--- Phase 3: validation (Phase D — also deferred) --- + receipt.Phases = append(receipt.Phases, reporters.PhaseReceipt{ + Name: "validation", Status: "skipped", + Errors: []string{"Phase D not implemented in MVP — depends on Phase C"}, + }) + + // --- Phase 4: report generation (markdown) --- + if in.EmitScrum { + reportPhase := reporters.PhaseReceipt{Name: "report_generation", Status: "ok"} + writers := []struct { + name string + fn func() error + }{ + {"scrum-test.md", func() error { + return reporters.WriteScrumTest(filepath.Join(in.OutputDir, "scrum-test.md"), intake, findings, llmDegraded) + }}, + {"risk-register.md", func() error { + return reporters.WriteRiskRegister(filepath.Join(in.OutputDir, "risk-register.md"), findings) + }}, + {"claim-coverage-table.md", func() error { + return reporters.WriteClaimCoverage(filepath.Join(in.OutputDir, "claim-coverage-table.md"), findings) + }}, + {"sprint-backlog.md", func() error { + return reporters.WriteSprintBacklog(filepath.Join(in.OutputDir, "sprint-backlog.md"), staticOut.Summary) + }}, + {"acceptance-gates.md", func() error { + return reporters.WriteAcceptanceGates(filepath.Join(in.OutputDir, "acceptance-gates.md"), staticOut.Summary) + }}, + } + for _, w := range writers { + if err := w.fn(); err != nil { + reportPhase.Status = "failed" + reportPhase.Errors = append(reportPhase.Errors, w.name+": "+err.Error()) + res.ExitCode = 65 + continue + } + reportPhase.OutputFiles = append(reportPhase.OutputFiles, w.name) + res.OutputFiles = append(res.OutputFiles, w.name) + } + receipt.Phases = append(receipt.Phases, reportPhase) + } + + // --- Phase 5: memory (Phase E — deferred) --- + receipt.Phases = append(receipt.Phases, reporters.PhaseReceipt{ + Name: "memory_update", Status: "skipped", + Errors: []string{"Phase E not implemented in MVP"}, + }) + + // --- Receipt --- + receipt.Summary = staticOut.Summary + if err := writeReceipt(in.OutputDir, &receipt, startedAt, nil); err != nil { + return res, err + } + 
res.OutputFiles = append(res.OutputFiles, "receipts.json") + + return res, nil +} + +func writeReceipt(outputDir string, r *reporters.Receipt, startedAt time.Time, _ error) error { + r.FinishedAt = time.Now().UTC().Format(time.RFC3339Nano) + _ = startedAt // present for future timing fields + _, err := reporters.WriteJSON(filepath.Join(outputDir, "receipts.json"), r) + return err +} + +func newRunID(t time.Time) string { + var rb [4]byte + _, _ = rand.Read(rb[:]) + return t.UTC().Format("20060102T150405") + "-" + hex.EncodeToString(rb[:]) +} diff --git a/internal/reporters/json.go b/internal/reporters/json.go new file mode 100644 index 0000000..27551be --- /dev/null +++ b/internal/reporters/json.go @@ -0,0 +1,152 @@ +// Package reporters writes the human-readable + machine-readable +// outputs the pipeline produces. JSON shapes mirror docs/REPORT_SCHEMA.md +// and PROMPT.md verbatim — this package is the contract between the +// harness and any downstream consumer (CI gate, observer, MCP tool). +package reporters + +import ( + "crypto/sha256" + "encoding/hex" + "encoding/json" + "os" + "path/filepath" + "time" + + "local-review-harness/internal/analyzers" + "local-review-harness/internal/git" + "local-review-harness/internal/scanner" +) + +// RepoIntake mirrors REVIEW_PIPELINE.md Phase 0 schema. 
+type RepoIntake struct { + RepoPath string `json:"repo_path"` + CurrentBranch string `json:"current_branch"` + LatestCommit string `json:"latest_commit"` + GitStatus string `json:"git_status"` + HasGit bool `json:"has_git"` + FileCount int `json:"file_count"` + LanguageBreakdown map[string]int `json:"language_breakdown"` + LargestFiles []LargestFile `json:"largest_files"` + DependencyManifests []string `json:"dependency_manifests"` + TestManifests []string `json:"test_manifests"` + GeneratedAt string `json:"generated_at"` +} + +type LargestFile struct { + Path string `json:"path"` + Size int64 `json:"size"` + Lines int `json:"lines,omitempty"` +} + +// StaticFindings is the wrapper shape with summary counts. +type StaticFindings struct { + GeneratedAt string `json:"generated_at"` + Findings []analyzers.Finding `json:"findings"` + Summary FindingsSummary `json:"summary"` +} + +type FindingsSummary struct { + Total int `json:"total"` + Confirmed int `json:"confirmed"` + Suspected int `json:"suspected"` + Rejected int `json:"rejected"` + Critical int `json:"critical"` + High int `json:"high"` + Medium int `json:"medium"` + Low int `json:"low"` + BySource map[string]int `json:"by_source"` + ByCheck map[string]int `json:"by_check"` +} + +// Receipt mirrors REPORT_SCHEMA.md "Receipt Schema". One per run. +type Receipt struct { + RunID string `json:"run_id"` + RepoPath string `json:"repo_path"` + StartedAt string `json:"started_at"` + FinishedAt string `json:"finished_at"` + Phases []PhaseReceipt `json:"phases"` + Summary FindingsSummary `json:"summary"` +} + +type PhaseReceipt struct { + Name string `json:"name"` + Status string `json:"status"` // ok|degraded|failed|skipped + InputHash string `json:"input_hash,omitempty"` + OutputHash string `json:"output_hash,omitempty"` + OutputFiles []string `json:"output_files,omitempty"` + Errors []string `json:"errors,omitempty"` +} + +// BuildIntake assembles the Phase 0 intake JSON from the scanner + +// git probes. 
Doesn't write — the pipeline owns file I/O. +func BuildIntake(scan *scanner.Result, gi git.Info) RepoIntake { + largest := make([]LargestFile, 0, len(scan.LargestFiles)) + for _, f := range scan.LargestFiles { + largest = append(largest, LargestFile{Path: f.Path, Size: f.Size, Lines: f.Lines}) + } + return RepoIntake{ + RepoPath: scan.RepoPath, + CurrentBranch: gi.CurrentBranch, + LatestCommit: gi.LatestCommit, + GitStatus: gi.Status, + HasGit: gi.HasGit, + FileCount: len(scan.Files), + LanguageBreakdown: scan.LanguageBreakdown, + LargestFiles: largest, + DependencyManifests: scan.DependencyManifests, + TestManifests: scan.TestManifests, + GeneratedAt: time.Now().UTC().Format(time.RFC3339Nano), + } +} + +// SummarizeFindings is the canonical roll-up. Used by both the +// per-phase JSON and the receipt summary. +func SummarizeFindings(findings []analyzers.Finding) FindingsSummary { + out := FindingsSummary{ + Total: len(findings), + BySource: map[string]int{}, + ByCheck: map[string]int{}, + } + for _, f := range findings { + switch f.Status { + case analyzers.StatusConfirmed: + out.Confirmed++ + case analyzers.StatusSuspected: + out.Suspected++ + case analyzers.StatusRejected: + out.Rejected++ + } + switch f.Severity { + case analyzers.SeverityCritical: + out.Critical++ + case analyzers.SeverityHigh: + out.High++ + case analyzers.SeverityMedium: + out.Medium++ + case analyzers.SeverityLow: + out.Low++ + } + out.BySource[string(f.Source)]++ + if f.CheckID != "" { + out.ByCheck[f.CheckID]++ + } + } + return out +} + +// WriteJSON marshals + writes; sha256 returned for receipt cross-link. 
+func WriteJSON(path string, v any) (sha string, err error) { + if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { + return "", err + } + bs, err := json.MarshalIndent(v, "", " ") + if err != nil { + return "", err + } + bs = append(bs, '\n') + if err := os.WriteFile(path, bs, 0o644); err != nil { + return "", err + } + h := sha256.Sum256(bs) + return hex.EncodeToString(h[:])[:16], nil +} diff --git a/internal/reporters/markdown.go b/internal/reporters/markdown.go new file mode 100644 index 0000000..28a5cf1 --- /dev/null +++ b/internal/reporters/markdown.go @@ -0,0 +1,336 @@ +package reporters + +import ( + "fmt" + "os" + "path/filepath" + "sort" + "strings" + + "local-review-harness/internal/analyzers" +) + +// WriteScrumTest produces reports/latest/scrum-test.md per +// docs/SCRUM_TEST_TEMPLATE.md. Sections in fixed order so operators +// can grep section headers reliably. +func WriteScrumTest(path string, intake RepoIntake, findings []analyzers.Finding, llmDegraded bool) error { + summary := SummarizeFindings(findings) + + var b strings.Builder + fmt.Fprintf(&b, "# Scrum Test — %s\n\n", filepath.Base(intake.RepoPath)) + fmt.Fprintf(&b, "**Generated:** %s\n", intake.GeneratedAt) + fmt.Fprintf(&b, "**Branch:** %s · **Commit:** %s\n\n", coalesce(intake.CurrentBranch, "(no git)"), coalesce(intake.LatestCommit, "—")) + + // Verdict per SCRUM_TEST_TEMPLATE.md — blunt, no soften. 
+ fmt.Fprintln(&b, "## Verdict") + fmt.Fprintln(&b) + fmt.Fprintln(&b, verdict(summary, llmDegraded)) + fmt.Fprintln(&b) + + // Evidence + fmt.Fprintln(&b, "## Evidence") + fmt.Fprintln(&b) + fmt.Fprintf(&b, "- repo path: `%s`\n", intake.RepoPath) + fmt.Fprintf(&b, "- file count: %d\n", intake.FileCount) + if len(intake.LanguageBreakdown) > 0 { + fmt.Fprintf(&b, "- languages: %s\n", langSummary(intake.LanguageBreakdown)) + } + fmt.Fprintf(&b, "- dependency manifests: %d (%s)\n", len(intake.DependencyManifests), strings.Join(firstN(intake.DependencyManifests, 5), ", ")) + fmt.Fprintf(&b, "- test files/dirs: %d\n", len(intake.TestManifests)) + if llmDegraded { + fmt.Fprintln(&b, "- LLM review: **skipped** (Phase C not implemented OR provider unavailable; see model-doctor.json)") + } + fmt.Fprintln(&b) + + // Confirmed + fmt.Fprintln(&b, "## Confirmed Risks") + fmt.Fprintln(&b) + confirmed := filterByStatus(findings, analyzers.StatusConfirmed) + if len(confirmed) == 0 { + fmt.Fprintln(&b, "_No confirmed risks at static-scan level. (LLM review may surface more.)_") + } else { + writeFindingTable(&b, confirmed) + } + fmt.Fprintln(&b) + + // Suspected + fmt.Fprintln(&b, "## Suspected Risks") + fmt.Fprintln(&b) + suspected := filterByStatus(findings, analyzers.StatusSuspected) + if len(suspected) == 0 { + fmt.Fprintln(&b, "_None._") + } else { + fmt.Fprintf(&b, "Each entry is a static-scan regex hit awaiting validation (Phase D / LLM cross-check).\n\n") + writeFindingTable(&b, suspected) + } + fmt.Fprintln(&b) + + // Blocked + fmt.Fprintln(&b, "## Blocked Checks") + fmt.Fprintln(&b) + if llmDegraded { + fmt.Fprintln(&b, "- LLM review (Phase 2 in REVIEW_PIPELINE.md). Reason: provider unavailable or stub. 
Next command: `review-harness model doctor`") + } else { + fmt.Fprintln(&b, "_None._") + } + fmt.Fprintln(&b) + + // Sprint backlog (per SCRUM_TEST_TEMPLATE.md fixed shape) + fmt.Fprintln(&b, "## Sprint Backlog") + fmt.Fprintln(&b) + writeSprintBacklog(&b, summary) + fmt.Fprintln(&b) + + // Acceptance gates + fmt.Fprintln(&b, "## Acceptance Gates") + fmt.Fprintln(&b) + writeAcceptanceGates(&b, summary) + fmt.Fprintln(&b) + + // Next commands + fmt.Fprintln(&b, "## Next Commands") + fmt.Fprintln(&b) + writeNextCommands(&b, summary, llmDegraded, intake.RepoPath) + + return os.WriteFile(path, []byte(b.String()), 0o644) +} + +// WriteRiskRegister produces reports/latest/risk-register.md. +func WriteRiskRegister(path string, findings []analyzers.Finding) error { + var b strings.Builder + fmt.Fprintln(&b, "# Risk Register") + fmt.Fprintln(&b) + fmt.Fprintln(&b, "Findings ranked by severity. `Suspected` rows haven't been validated yet (Phase D).") + fmt.Fprintln(&b) + if len(findings) == 0 { + fmt.Fprintln(&b, "_No findings._") + return os.WriteFile(path, []byte(b.String()), 0o644) + } + + sorted := sortBySeverity(findings) + fmt.Fprintln(&b, "| ID | Severity | Status | File | Line | Title |") + fmt.Fprintln(&b, "|---|---|---|---|---|---|") + for _, f := range sorted { + fmt.Fprintf(&b, "| `%s` | %s | %s | `%s` | %s | %s |\n", + f.ID, f.Severity, f.Status, mdEscape(f.File), coalesce(f.LineHint, "—"), mdEscape(f.Title)) + } + return os.WriteFile(path, []byte(b.String()), 0o644) +} + +// WriteClaimCoverage produces reports/latest/claim-coverage-table.md. +// Phase B emits the table shape per REPORT_SCHEMA.md but the LLM-side +// claims aren't generated until Phase C. 
+func WriteClaimCoverage(path string, findings []analyzers.Finding) error { + var b strings.Builder + fmt.Fprintln(&b, "# Claim Coverage Table") + fmt.Fprintln(&b) + fmt.Fprintln(&b, "Each row is a finding paired with whether existing tests cover the affected area.") + fmt.Fprintln(&b, "Phase B emits this shape; LLM-side claim generation lands in Phase C.") + fmt.Fprintln(&b) + fmt.Fprintln(&b, "| Claim | Code Location | Existing Test | Missing Test | Risk |") + fmt.Fprintln(&b, "|---|---|---|---|---|") + if len(findings) == 0 { + fmt.Fprintln(&b, "| _no claims yet_ | — | — | — | — |") + } + for _, f := range findings { + fmt.Fprintf(&b, "| %s | `%s:%s` | _unknown_ | _likely_ | %s |\n", + mdEscape(f.Title), mdEscape(f.File), coalesce(f.LineHint, "?"), f.Severity) + } + return os.WriteFile(path, []byte(b.String()), 0o644) +} + +// WriteSprintBacklog produces reports/latest/sprint-backlog.md. +func WriteSprintBacklog(path string, summary FindingsSummary) error { + var b strings.Builder + fmt.Fprintln(&b, "# Sprint Backlog") + fmt.Fprintln(&b) + writeSprintBacklog(&b, summary) + return os.WriteFile(path, []byte(b.String()), 0o644) +} + +// WriteAcceptanceGates produces reports/latest/acceptance-gates.md. +func WriteAcceptanceGates(path string, summary FindingsSummary) error { + var b strings.Builder + fmt.Fprintln(&b, "# Acceptance Gates") + fmt.Fprintln(&b) + writeAcceptanceGates(&b, summary) + return os.WriteFile(path, []byte(b.String()), 0o644) +} + +// === helpers === + +func verdict(s FindingsSummary, llmDegraded bool) string { + switch { + case s.Critical > 0: + return "**blocked** — critical-severity finding present. See Confirmed Risks; rotate any leaked credentials, then re-run." + case s.High > 0 && s.Confirmed > 0: + return "**prototype-ready** — confirmed high-severity findings need fixes before production deploy." 
+ case s.High > 0: + return "**prototype-ready** — high-severity findings are suspected (not confirmed); validation pass (Phase D) or LLM review (Phase C) needed before promoting verdict." + case s.Total == 0 && !llmDegraded: + return "**production-ready** — static scan + LLM review found no issues. Re-validate after every wave." + case s.Total == 0: + return "**prototype-ready** — static scan clean; LLM review degraded so production status not certified." + default: + return "**demo-only** — only low/medium-severity findings, mostly suspected. Reasonable to demo; production deploy needs the validator pass + missing-tests gap closed." + } +} + +func writeSprintBacklog(b *strings.Builder, s FindingsSummary) { + // Per SCRUM_TEST_TEMPLATE.md fixed format. + fmt.Fprintln(b, "**Sprint 0 — Reproducibility Gate**") + fmt.Fprintln(b) + fmt.Fprintln(b, "- Wire `just verify` (or equivalent) to run the static checks before every commit/PR.") + fmt.Fprintln(b, "- Add a CI step that fails on `critical` findings.") + if s.Total > 0 { + fmt.Fprintf(b, "- Triage the %d findings emitted by this run; mark each as accepted / blocking / dismiss-with-reason.\n", s.Total) + } + fmt.Fprintln(b) + + fmt.Fprintln(b, "**Sprint 1 — Trust Boundary Gate**") + fmt.Fprintln(b) + if s.High > 0 || s.Critical > 0 { + fmt.Fprintln(b, "- Resolve every `critical` and `high` finding before non-loopback deploy.") + } + fmt.Fprintln(b, "- Confirm auth posture for any mutation endpoint flagged as exposed.") + fmt.Fprintln(b, "- Replace raw SQL interpolation with parameterized queries.") + fmt.Fprintln(b) + + fmt.Fprintln(b, "**Sprint 2 — Memory Correctness Gate**") + fmt.Fprintln(b) + fmt.Fprintln(b, "- (Phase E) Wire append-only `.memory/` writes for known-risks + fixed-patterns.") + fmt.Fprintln(b, "- Add a regression test that re-runs the harness and asserts no regression in confirmed-finding count.") + fmt.Fprintln(b) + + fmt.Fprintln(b, "**Sprint 3 — Agent Loop Reality Gate**") + fmt.Fprintln(b) + 
fmt.Fprintln(b, "- (Phase C) Wire local-Ollama LLM review.") + fmt.Fprintln(b, "- (Phase D) Validator pass cross-checks every LLM finding against repo evidence.") + fmt.Fprintln(b) + + fmt.Fprintln(b, "**Sprint 4 — Deployment Gate**") + fmt.Fprintln(b) + fmt.Fprintln(b, "- Ship the harness as a single static binary (`go build -o review-harness`).") + fmt.Fprintln(b, "- Document operator runbook (model setup, profile editing, output retention).") +} + +func writeAcceptanceGates(b *strings.Builder, s FindingsSummary) { + fmt.Fprintln(b, "Each gate must be testable. Format: command + verifiable post-condition.") + fmt.Fprintln(b) + fmt.Fprintln(b, "1. **Reproducibility:** `review-harness repo .` exits 0; `reports/latest/repo-intake.json` exists with non-zero `file_count`.") + fmt.Fprintln(b, "2. **No false positives on a clean fixture:** `review-harness repo tests/fixtures/clean-repo` produces zero `confirmed` findings.") + fmt.Fprintln(b, "3. **Every documented static check fires on the insecure fixture:** `jq '[.findings[] | .check_id] | unique | length' reports/latest/static-findings.json` ≥ 8.") + fmt.Fprintln(b, "4. **Receipts are honest about degraded phases:** `jq '[.phases[] | select(.status == \"degraded\")]' reports/latest/receipts.json` lists every skipped/stubbed phase.") + if s.Critical > 0 { + fmt.Fprintln(b, "5. **Critical findings block production deploy:** at least one critical finding is currently present; resolve before deploy.") + } +} + +func writeNextCommands(b *strings.Builder, s FindingsSummary, llmDegraded bool, repoPath string) { + if s.Critical > 0 { + fmt.Fprintln(b, "1. Open the risk register: `cat reports/latest/risk-register.md`") + fmt.Fprintln(b, "2. 
Triage every `critical` finding; rotate any leaked credentials immediately.")
+ }
+ if llmDegraded {
+ fmt.Fprintln(b, "- Probe the model provider: `review-harness model doctor`")
+ }
+ fmt.Fprintf(b, "- Re-run after fixes: `review-harness repo %s`\n", repoPath)
+ fmt.Fprintf(b, "- Generate the full Scrum bundle: `review-harness scrum %s`\n", repoPath)
+}
+
+func filterByStatus(findings []analyzers.Finding, st analyzers.Status) []analyzers.Finding {
+ out := []analyzers.Finding{}
+ for _, f := range findings {
+ if f.Status == st {
+ out = append(out, f)
+ }
+ }
+ return sortBySeverity(out)
+}
+
+// severityRank used for sorting tables — critical first.
+func severityRank(s analyzers.Severity) int {
+ switch s {
+ case analyzers.SeverityCritical:
+ return 0
+ case analyzers.SeverityHigh:
+ return 1
+ case analyzers.SeverityMedium:
+ return 2
+ case analyzers.SeverityLow:
+ return 3
+ }
+ return 4
+}
+
+func sortBySeverity(findings []analyzers.Finding) []analyzers.Finding {
+ out := make([]analyzers.Finding, len(findings))
+ copy(out, findings)
+ sort.SliceStable(out, func(i, j int) bool {
+ ri, rj := severityRank(out[i].Severity), severityRank(out[j].Severity)
+ if ri != rj {
+ return ri < rj
+ }
+ if out[i].File != out[j].File {
+ return out[i].File < out[j].File
+ }
+ return out[i].LineHint < out[j].LineHint
+ })
+ return out
+}
+
+func writeFindingTable(b *strings.Builder, findings []analyzers.Finding) {
+ fmt.Fprintln(b, "| Severity | File:Line | Title | Evidence |")
+ fmt.Fprintln(b, "|---|---|---|---|")
+ for _, f := range findings {
+ loc := mdEscape(f.File)
+ if f.LineHint != "" {
+ loc = fmt.Sprintf("%s:%s", mdEscape(f.File), f.LineHint)
+ }
+ fmt.Fprintf(b, "| %s | `%s` | %s | `%s` |\n",
+ f.Severity, loc, mdEscape(f.Title), mdEscape(f.Evidence))
+ }
+}
+
+func mdEscape(s string) string {
+ s = strings.ReplaceAll(s, "|", "\\|")
+ s = strings.ReplaceAll(s, "\n", " ")
+ if r := []rune(s); len(r) > 120 { // rune-safe: byte slicing could split a multi-byte character
+ s = string(r[:120]) + "…"
+ }
+ return s
+}
+
+func coalesce(s, fallback
string) string { + if s == "" { + return fallback + } + return s +} + +func langSummary(m map[string]int) string { + type kv struct { + k string + v int + } + pairs := make([]kv, 0, len(m)) + for k, v := range m { + pairs = append(pairs, kv{k, v}) + } + sort.Slice(pairs, func(i, j int) bool { return pairs[i].v > pairs[j].v }) + if len(pairs) > 5 { + pairs = pairs[:5] + } + parts := make([]string, 0, len(pairs)) + for _, p := range pairs { + parts = append(parts, fmt.Sprintf("%s (%d)", p.k, p.v)) + } + return strings.Join(parts, ", ") +} + +func firstN(s []string, n int) []string { + if len(s) <= n { + return s + } + return s[:n] +} diff --git a/internal/scanner/language.go b/internal/scanner/language.go new file mode 100644 index 0000000..d26fe2c --- /dev/null +++ b/internal/scanner/language.go @@ -0,0 +1,66 @@ +package scanner + +import ( + "path/filepath" + "strings" +) + +// detectLanguage returns a best-effort language label based on the +// file extension. Empty string for unknown — the caller treats those +// as opaque and may still scan them for patterns. 
+func detectLanguage(filename string) string { + switch strings.ToLower(filepath.Ext(filename)) { + case ".go": + return "Go" + case ".rs": + return "Rust" + case ".ts", ".tsx": + return "TypeScript" + case ".js", ".jsx", ".mjs", ".cjs": + return "JavaScript" + case ".py": + return "Python" + case ".java": + return "Java" + case ".kt": + return "Kotlin" + case ".rb": + return "Ruby" + case ".php": + return "PHP" + case ".swift": + return "Swift" + case ".c", ".h": + return "C" + case ".cpp", ".cc", ".cxx", ".hpp": + return "C++" + case ".cs": + return "C#" + case ".sh", ".bash": + return "Shell" + case ".sql": + return "SQL" + case ".yaml", ".yml": + return "YAML" + case ".toml": + return "TOML" + case ".json": + return "JSON" + case ".md", ".markdown": + return "Markdown" + case ".html", ".htm": + return "HTML" + case ".css", ".scss", ".sass", ".less": + return "CSS" + } + // Special filenames without extensions + switch strings.ToLower(filename) { + case "dockerfile", "containerfile": + return "Docker" + case "makefile", "gnumakefile": + return "Make" + case "justfile": + return "Justfile" + } + return "" +} diff --git a/internal/scanner/manifests.go b/internal/scanner/manifests.go new file mode 100644 index 0000000..32180d8 --- /dev/null +++ b/internal/scanner/manifests.go @@ -0,0 +1,53 @@ +package scanner + +import ( + "path/filepath" + "strings" +) + +// isManifest detects dependency / build manifests by basename. +// Used to populate Result.DependencyManifests for repo-intake. 
+func isManifest(name string) bool { + switch strings.ToLower(name) { + case "go.mod", "go.sum", + "package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb", + "cargo.toml", "cargo.lock", + "requirements.txt", "pyproject.toml", "poetry.lock", "pipfile", "pipfile.lock", + "gemfile", "gemfile.lock", + "composer.json", "composer.lock", + "pom.xml", "build.gradle", "build.gradle.kts", + "makefile", "justfile", + "dockerfile", "docker-compose.yml", "docker-compose.yaml", + "helm.yaml", "chart.yaml": + return true + } + return false +} + +// isTestPath detects test files / dirs by path. A repo is "has tests" +// iff at least one returns true. Used both for repo-intake's +// test_manifests list and the missing-tests analyzer's threshold. +func isTestPath(rel string) bool { + low := strings.ToLower(rel) + // Directory-shaped signals + parts := strings.Split(filepath.ToSlash(low), "/") + for _, p := range parts { + if p == "tests" || p == "test" || p == "__tests__" || p == "spec" || p == "specs" { + return true + } + } + // File-shape signals + base := strings.ToLower(filepath.Base(rel)) + if strings.HasSuffix(base, "_test.go") || + strings.HasSuffix(base, ".test.ts") || + strings.HasSuffix(base, ".test.tsx") || + strings.HasSuffix(base, ".test.js") || + strings.HasSuffix(base, ".spec.ts") || + strings.HasSuffix(base, ".spec.js") { + return true + } + if strings.HasPrefix(base, "test_") && strings.HasSuffix(base, ".py") { + return true + } + return false +} diff --git a/internal/scanner/walk.go b/internal/scanner/walk.go new file mode 100644 index 0000000..e22e6dd --- /dev/null +++ b/internal/scanner/walk.go @@ -0,0 +1,167 @@ +// Package scanner walks a repository tree, classifies files, and +// surfaces metadata for the analyzers + repo-intake report. +// +// Skip-list defaults to common build/dependency dirs that nobody +// wants to scan. Operators can extend via review-profile (Phase E). 
+package scanner + +import ( + "io/fs" + "os" + "path/filepath" + "sort" + "strings" +) + +// SkipDirs is the default skip-list. Matches dir basenames anywhere +// in the tree (not just at root). Includes the harness's own output +// dir so review-on-self doesn't loop. +var SkipDirs = map[string]bool{ + ".git": true, + ".hg": true, + ".svn": true, + "node_modules": true, + "vendor": true, + "target": true, // Rust + "dist": true, + "build": true, + "__pycache__": true, + ".venv": true, + "venv": true, + "bin": true, // Go convention; harness's own too + ".idea": true, + ".vscode": true, + "reports": true, // harness's own output +} + +// File is one entry in the scan result. +type File struct { + Path string // relative to repo root + Abs string // absolute path on disk + Size int64 // bytes + Lines int // 0 if not counted + Language string // best-effort, "" if unknown +} + +// Result is the scan summary the analyzers + reporters consume. +type Result struct { + RepoPath string + Files []File + LanguageBreakdown map[string]int // count of files by language + LargestFiles []File // top 10 by size + DependencyManifests []string // relative paths to package.json / go.mod / etc + TestManifests []string // tests/ dirs, *_test.go, *.test.ts, etc +} + +// Walk produces a Result for repoPath. Errors on a missing dir; skipped +// dirs are silently filtered. If countLines is true, each file is read +// for line counts (Phase B needs this; Phase A wires false for speed). +func Walk(repoPath string, countLines bool) (*Result, error) { + abs, err := filepath.Abs(repoPath) + if err != nil { + return nil, err + } + if st, err := os.Stat(abs); err != nil { + return nil, err + } else if !st.IsDir() { + return nil, fs.ErrInvalid + } + + res := &Result{ + RepoPath: abs, + LanguageBreakdown: map[string]int{}, + } + + walkErr := filepath.WalkDir(abs, func(p string, d fs.DirEntry, walkErr error) error { + if walkErr != nil { + return nil // best-effort; permission errors etc.
are silent + } + if d.IsDir() { + if SkipDirs[d.Name()] && p != abs { + return filepath.SkipDir + } + return nil + } + // Skip dotfiles at file level (.gitignore etc. are interesting, + // but most dotfiles are noise; analyzers can opt back in). + if strings.HasPrefix(d.Name(), ".") && !interestingDotfile(d.Name()) { + return nil + } + info, err := d.Info() + if err != nil { + return nil + } + rel, err := filepath.Rel(abs, p) + if err != nil { + rel = p + } + f := File{ + Path: rel, + Abs: p, + Size: info.Size(), + Language: detectLanguage(d.Name()), + } + if countLines && info.Size() < 5_000_000 { // 5MB cap; larger files skip line counting but stay scannable + if n, err := countFileLines(p); err == nil { + f.Lines = n + } + } + res.Files = append(res.Files, f) + if f.Language != "" { + res.LanguageBreakdown[f.Language]++ + } + if isManifest(d.Name()) { + res.DependencyManifests = append(res.DependencyManifests, rel) + } + if isTestPath(rel) { + res.TestManifests = append(res.TestManifests, rel) + } + return nil + }) + if walkErr != nil { + return nil, walkErr + } + + // Top 10 largest by size. + sorted := make([]File, len(res.Files)) + copy(sorted, res.Files) + sort.Slice(sorted, func(i, j int) bool { return sorted[i].Size > sorted[j].Size }) + if len(sorted) > 10 { + sorted = sorted[:10] + } + res.LargestFiles = sorted + + // Stable order for downstream determinism. + sort.Strings(res.DependencyManifests) + sort.Strings(res.TestManifests) + + return res, nil +} + +// interestingDotfile lets a few well-known dotfiles through despite +// the leading-dot filter. Keeps the scan honest about config files +// that often hold the real risk (e.g. committed .env).
+func interestingDotfile(name string) bool { + switch name { + case ".env", ".env.local", ".env.production", + ".gitignore", ".dockerignore", ".github", + ".review-rules.md", ".review-profile.yaml": + return true + } + return false +} + +func countFileLines(path string) (int, error) { + b, err := os.ReadFile(path) + if err != nil { + return 0, err + } + if len(b) == 0 { + return 0, nil + } + n := strings.Count(string(b), "\n") + if !strings.HasSuffix(string(b), "\n") { + n++ // last line without trailing newline + } + return n, nil +} diff --git a/tests/fixtures/clean-repo/README.md b/tests/fixtures/clean-repo/README.md new file mode 100644 index 0000000..4320a5b --- /dev/null +++ b/tests/fixtures/clean-repo/README.md @@ -0,0 +1,2 @@ +# Clean fixture +A sterile repo. Static analyzers should find nothing confirmed. diff --git a/tests/fixtures/clean-repo/package.json b/tests/fixtures/clean-repo/package.json new file mode 100644 index 0000000..a0c49ac --- /dev/null +++ b/tests/fixtures/clean-repo/package.json @@ -0,0 +1 @@ +{"name": "clean-repo", "version": "0.1.0"} diff --git a/tests/fixtures/clean-repo/src/calc.ts b/tests/fixtures/clean-repo/src/calc.ts new file mode 100644 index 0000000..a9fa82f --- /dev/null +++ b/tests/fixtures/clean-repo/src/calc.ts @@ -0,0 +1,7 @@ +// Pure utility — no patterns the analyzer is looking for. 
+export function add(a: number, b: number): number { + return a + b; +} +export function multiply(a: number, b: number): number { + return a * b; +} diff --git a/tests/fixtures/clean-repo/tests/calc.test.ts b/tests/fixtures/clean-repo/tests/calc.test.ts new file mode 100644 index 0000000..87e30d2 --- /dev/null +++ b/tests/fixtures/clean-repo/tests/calc.test.ts @@ -0,0 +1,4 @@ +import { test, expect } from "bun:test"; +import { add, multiply } from "../src/calc"; +test("add", () => { expect(add(2, 3)).toBe(5); }); +test("multiply", () => { expect(multiply(2, 3)).toBe(6); }); diff --git a/tests/fixtures/degraded-repo/stray.go b/tests/fixtures/degraded-repo/stray.go new file mode 100644 index 0000000..6521331 --- /dev/null +++ b/tests/fixtures/degraded-repo/stray.go @@ -0,0 +1 @@ +// just a stray file diff --git a/tests/fixtures/insecure-repo/.env b/tests/fixtures/insecure-repo/.env new file mode 100644 index 0000000..a36d8aa --- /dev/null +++ b/tests/fixtures/insecure-repo/.env @@ -0,0 +1,2 @@ +DB_PASSWORD=hunter2 +SECRET=longLongLongSecretKey1234567890 diff --git a/tests/fixtures/insecure-repo/src/handler.go b/tests/fixtures/insecure-repo/src/handler.go new file mode 100644 index 0000000..0cfb8ee --- /dev/null +++ b/tests/fixtures/insecure-repo/src/handler.go @@ -0,0 +1,23 @@ +package main + +import ( + "database/sql" + "fmt" + "os/exec" +) + +// TODO: rotate this and move to env +const HARDCODED_PATH = "/home/profit/secrets/key.pem" +const SERVER_IP = "192.168.1.176" + +func badSQL(db *sql.DB, name string) { + q := fmt.Sprintf("SELECT * FROM users WHERE name = '%s'", name) + db.Query(q) +} + +func runShell(cmd string) { + exec.Command("bash", "-c", cmd).Run() +} + +// FIXME: hardcoded creds +const API_KEY = "sk-1234567890abcdefABCDEFGHIJKLMNOPQRSTUV" diff --git a/tests/fixtures/insecure-repo/src/huge.go b/tests/fixtures/insecure-repo/src/huge.go new file mode 100644 index 0000000..f816ce9 --- /dev/null +++ b/tests/fixtures/insecure-repo/src/huge.go @@ -0,0 +1,901 
@@ +package main +// generated line 1 +// generated line 2 +// generated line 3 +// generated line 4 +// generated line 5 +// generated line 6 +// generated line 7 +// generated line 8 +// generated line 9 +// generated line 10 +// generated line 11 +// generated line 12 +// generated line 13 +// generated line 14 +// generated line 15 +// generated line 16 +// generated line 17 +// generated line 18 +// generated line 19 +// generated line 20 +// generated line 21 +// generated line 22 +// generated line 23 +// generated line 24 +// generated line 25 +// generated line 26 +// generated line 27 +// generated line 28 +// generated line 29 +// generated line 30 +// generated line 31 +// generated line 32 +// generated line 33 +// generated line 34 +// generated line 35 +// generated line 36 +// generated line 37 +// generated line 38 +// generated line 39 +// generated line 40 +// generated line 41 +// generated line 42 +// generated line 43 +// generated line 44 +// generated line 45 +// generated line 46 +// generated line 47 +// generated line 48 +// generated line 49 +// generated line 50 +// generated line 51 +// generated line 52 +// generated line 53 +// generated line 54 +// generated line 55 +// generated line 56 +// generated line 57 +// generated line 58 +// generated line 59 +// generated line 60 +// generated line 61 +// generated line 62 +// generated line 63 +// generated line 64 +// generated line 65 +// generated line 66 +// generated line 67 +// generated line 68 +// generated line 69 +// generated line 70 +// generated line 71 +// generated line 72 +// generated line 73 +// generated line 74 +// generated line 75 +// generated line 76 +// generated line 77 +// generated line 78 +// generated line 79 +// generated line 80 +// generated line 81 +// generated line 82 +// generated line 83 +// generated line 84 +// generated line 85 +// generated line 86 +// generated line 87 +// generated line 88 +// generated line 89 +// generated line 90 +// 
generated line 91 +// generated line 92 +// generated line 93 +// generated line 94 +// generated line 95 +// generated line 96 +// generated line 97 +// generated line 98 +// generated line 99 +// generated line 100 +// generated line 101 +// generated line 102 +// generated line 103 +// generated line 104 +// generated line 105 +// generated line 106 +// generated line 107 +// generated line 108 +// generated line 109 +// generated line 110 +// generated line 111 +// generated line 112 +// generated line 113 +// generated line 114 +// generated line 115 +// generated line 116 +// generated line 117 +// generated line 118 +// generated line 119 +// generated line 120 +// generated line 121 +// generated line 122 +// generated line 123 +// generated line 124 +// generated line 125 +// generated line 126 +// generated line 127 +// generated line 128 +// generated line 129 +// generated line 130 +// generated line 131 +// generated line 132 +// generated line 133 +// generated line 134 +// generated line 135 +// generated line 136 +// generated line 137 +// generated line 138 +// generated line 139 +// generated line 140 +// generated line 141 +// generated line 142 +// generated line 143 +// generated line 144 +// generated line 145 +// generated line 146 +// generated line 147 +// generated line 148 +// generated line 149 +// generated line 150 +// generated line 151 +// generated line 152 +// generated line 153 +// generated line 154 +// generated line 155 +// generated line 156 +// generated line 157 +// generated line 158 +// generated line 159 +// generated line 160 +// generated line 161 +// generated line 162 +// generated line 163 +// generated line 164 +// generated line 165 +// generated line 166 +// generated line 167 +// generated line 168 +// generated line 169 +// generated line 170 +// generated line 171 +// generated line 172 +// generated line 173 +// generated line 174 +// generated line 175 +// generated line 176 +// generated line 177 +// 
generated line 178 +// generated line 179 +// generated line 180 +// generated line 181 +// generated line 182 +// generated line 183 +// generated line 184 +// generated line 185 +// generated line 186 +// generated line 187 +// generated line 188 +// generated line 189 +// generated line 190 +// generated line 191 +// generated line 192 +// generated line 193 +// generated line 194 +// generated line 195 +// generated line 196 +// generated line 197 +// generated line 198 +// generated line 199 +// generated line 200 +// generated line 201 +// generated line 202 +// generated line 203 +// generated line 204 +// generated line 205 +// generated line 206 +// generated line 207 +// generated line 208 +// generated line 209 +// generated line 210 +// generated line 211 +// generated line 212 +// generated line 213 +// generated line 214 +// generated line 215 +// generated line 216 +// generated line 217 +// generated line 218 +// generated line 219 +// generated line 220 +// generated line 221 +// generated line 222 +// generated line 223 +// generated line 224 +// generated line 225 +// generated line 226 +// generated line 227 +// generated line 228 +// generated line 229 +// generated line 230 +// generated line 231 +// generated line 232 +// generated line 233 +// generated line 234 +// generated line 235 +// generated line 236 +// generated line 237 +// generated line 238 +// generated line 239 +// generated line 240 +// generated line 241 +// generated line 242 +// generated line 243 +// generated line 244 +// generated line 245 +// generated line 246 +// generated line 247 +// generated line 248 +// generated line 249 +// generated line 250 +// generated line 251 +// generated line 252 +// generated line 253 +// generated line 254 +// generated line 255 +// generated line 256 +// generated line 257 +// generated line 258 +// generated line 259 +// generated line 260 +// generated line 261 +// generated line 262 +// generated line 263 +// generated line 264 
+// generated line 265 +// generated line 266 +// generated line 267 +// generated line 268 +// generated line 269 +// generated line 270 +// generated line 271 +// generated line 272 +// generated line 273 +// generated line 274 +// generated line 275 +// generated line 276 +// generated line 277 +// generated line 278 +// generated line 279 +// generated line 280 +// generated line 281 +// generated line 282 +// generated line 283 +// generated line 284 +// generated line 285 +// generated line 286 +// generated line 287 +// generated line 288 +// generated line 289 +// generated line 290 +// generated line 291 +// generated line 292 +// generated line 293 +// generated line 294 +// generated line 295 +// generated line 296 +// generated line 297 +// generated line 298 +// generated line 299 +// generated line 300 +// generated line 301 +// generated line 302 +// generated line 303 +// generated line 304 +// generated line 305 +// generated line 306 +// generated line 307 +// generated line 308 +// generated line 309 +// generated line 310 +// generated line 311 +// generated line 312 +// generated line 313 +// generated line 314 +// generated line 315 +// generated line 316 +// generated line 317 +// generated line 318 +// generated line 319 +// generated line 320 +// generated line 321 +// generated line 322 +// generated line 323 +// generated line 324 +// generated line 325 +// generated line 326 +// generated line 327 +// generated line 328 +// generated line 329 +// generated line 330 +// generated line 331 +// generated line 332 +// generated line 333 +// generated line 334 +// generated line 335 +// generated line 336 +// generated line 337 +// generated line 338 +// generated line 339 +// generated line 340 +// generated line 341 +// generated line 342 +// generated line 343 +// generated line 344 +// generated line 345 +// generated line 346 +// generated line 347 +// generated line 348 +// generated line 349 +// generated line 350 +// generated line 
351 +// generated line 352 +// generated line 353 +// generated line 354 +// generated line 355 +// generated line 356 +// generated line 357 +// generated line 358 +// generated line 359 +// generated line 360 +// generated line 361 +// generated line 362 +// generated line 363 +// generated line 364 +// generated line 365 +// generated line 366 +// generated line 367 +// generated line 368 +// generated line 369 +// generated line 370 +// generated line 371 +// generated line 372 +// generated line 373 +// generated line 374 +// generated line 375 +// generated line 376 +// generated line 377 +// generated line 378 +// generated line 379 +// generated line 380 +// generated line 381 +// generated line 382 +// generated line 383 +// generated line 384 +// generated line 385 +// generated line 386 +// generated line 387 +// generated line 388 +// generated line 389 +// generated line 390 +// generated line 391 +// generated line 392 +// generated line 393 +// generated line 394 +// generated line 395 +// generated line 396 +// generated line 397 +// generated line 398 +// generated line 399 +// generated line 400 +// generated line 401 +// generated line 402 +// generated line 403 +// generated line 404 +// generated line 405 +// generated line 406 +// generated line 407 +// generated line 408 +// generated line 409 +// generated line 410 +// generated line 411 +// generated line 412 +// generated line 413 +// generated line 414 +// generated line 415 +// generated line 416 +// generated line 417 +// generated line 418 +// generated line 419 +// generated line 420 +// generated line 421 +// generated line 422 +// generated line 423 +// generated line 424 +// generated line 425 +// generated line 426 +// generated line 427 +// generated line 428 +// generated line 429 +// generated line 430 +// generated line 431 +// generated line 432 +// generated line 433 +// generated line 434 +// generated line 435 +// generated line 436 +// generated line 437 +// generated 
line 438 +// generated line 439 +// generated line 440 +// generated line 441 +// generated line 442 +// generated line 443 +// generated line 444 +// generated line 445 +// generated line 446 +// generated line 447 +// generated line 448 +// generated line 449 +// generated line 450 +// generated line 451 +// generated line 452 +// generated line 453 +// generated line 454 +// generated line 455 +// generated line 456 +// generated line 457 +// generated line 458 +// generated line 459 +// generated line 460 +// generated line 461 +// generated line 462 +// generated line 463 +// generated line 464 +// generated line 465 +// generated line 466 +// generated line 467 +// generated line 468 +// generated line 469 +// generated line 470 +// generated line 471 +// generated line 472 +// generated line 473 +// generated line 474 +// generated line 475 +// generated line 476 +// generated line 477 +// generated line 478 +// generated line 479 +// generated line 480 +// generated line 481 +// generated line 482 +// generated line 483 +// generated line 484 +// generated line 485 +// generated line 486 +// generated line 487 +// generated line 488 +// generated line 489 +// generated line 490 +// generated line 491 +// generated line 492 +// generated line 493 +// generated line 494 +// generated line 495 +// generated line 496 +// generated line 497 +// generated line 498 +// generated line 499 +// generated line 500 +// generated line 501 +// generated line 502 +// generated line 503 +// generated line 504 +// generated line 505 +// generated line 506 +// generated line 507 +// generated line 508 +// generated line 509 +// generated line 510 +// generated line 511 +// generated line 512 +// generated line 513 +// generated line 514 +// generated line 515 +// generated line 516 +// generated line 517 +// generated line 518 +// generated line 519 +// generated line 520 +// generated line 521 +// generated line 522 +// generated line 523 +// generated line 524 +// 
generated line 525 +// generated line 526 +// generated line 527 +// generated line 528 +// generated line 529 +// generated line 530 +// generated line 531 +// generated line 532 +// generated line 533 +// generated line 534 +// generated line 535 +// generated line 536 +// generated line 537 +// generated line 538 +// generated line 539 +// generated line 540 +// generated line 541 +// generated line 542 +// generated line 543 +// generated line 544 +// generated line 545 +// generated line 546 +// generated line 547 +// generated line 548 +// generated line 549 +// generated line 550 +// generated line 551 +// generated line 552 +// generated line 553 +// generated line 554 +// generated line 555 +// generated line 556 +// generated line 557 +// generated line 558 +// generated line 559 +// generated line 560 +// generated line 561 +// generated line 562 +// generated line 563 +// generated line 564 +// generated line 565 +// generated line 566 +// generated line 567 +// generated line 568 +// generated line 569 +// generated line 570 +// generated line 571 +// generated line 572 +// generated line 573 +// generated line 574 +// generated line 575 +// generated line 576 +// generated line 577 +// generated line 578 +// generated line 579 +// generated line 580 +// generated line 581 +// generated line 582 +// generated line 583 +// generated line 584 +// generated line 585 +// generated line 586 +// generated line 587 +// generated line 588 +// generated line 589 +// generated line 590 +// generated line 591 +// generated line 592 +// generated line 593 +// generated line 594 +// generated line 595 +// generated line 596 +// generated line 597 +// generated line 598 +// generated line 599 +// generated line 600 +// generated line 601 +// generated line 602 +// generated line 603 +// generated line 604 +// generated line 605 +// generated line 606 +// generated line 607 +// generated line 608 +// generated line 609 +// generated line 610 +// generated line 611 
+// generated line 612 +// generated line 613 +// generated line 614 +// generated line 615 +// generated line 616 +// generated line 617 +// generated line 618 +// generated line 619 +// generated line 620 +// generated line 621 +// generated line 622 +// generated line 623 +// generated line 624 +// generated line 625 +// generated line 626 +// generated line 627 +// generated line 628 +// generated line 629 +// generated line 630 +// generated line 631 +// generated line 632 +// generated line 633 +// generated line 634 +// generated line 635 +// generated line 636 +// generated line 637 +// generated line 638 +// generated line 639 +// generated line 640 +// generated line 641 +// generated line 642 +// generated line 643 +// generated line 644 +// generated line 645 +// generated line 646 +// generated line 647 +// generated line 648 +// generated line 649 +// generated line 650 +// generated line 651 +// generated line 652 +// generated line 653 +// generated line 654 +// generated line 655 +// generated line 656 +// generated line 657 +// generated line 658 +// generated line 659 +// generated line 660 +// generated line 661 +// generated line 662 +// generated line 663 +// generated line 664 +// generated line 665 +// generated line 666 +// generated line 667 +// generated line 668 +// generated line 669 +// generated line 670 +// generated line 671 +// generated line 672 +// generated line 673 +// generated line 674 +// generated line 675 +// generated line 676 +// generated line 677 +// generated line 678 +// generated line 679 +// generated line 680 +// generated line 681 +// generated line 682 +// generated line 683 +// generated line 684 +// generated line 685 +// generated line 686 +// generated line 687 +// generated line 688 +// generated line 689 +// generated line 690 +// generated line 691 +// generated line 692 +// generated line 693 +// generated line 694 +// generated line 695 +// generated line 696 +// generated line 697 +// generated line 
698 +// generated line 699 +// generated line 700 +// generated line 701 +// generated line 702 +// generated line 703 +// generated line 704 +// generated line 705 +// generated line 706 +// generated line 707 +// generated line 708 +// generated line 709 +// generated line 710 +// generated line 711 +// generated line 712 +// generated line 713 +// generated line 714 +// generated line 715 +// generated line 716 +// generated line 717 +// generated line 718 +// generated line 719 +// generated line 720 +// generated line 721 +// generated line 722 +// generated line 723 +// generated line 724 +// generated line 725 +// generated line 726 +// generated line 727 +// generated line 728 +// generated line 729 +// generated line 730 +// generated line 731 +// generated line 732 +// generated line 733 +// generated line 734 +// generated line 735 +// generated line 736 +// generated line 737 +// generated line 738 +// generated line 739 +// generated line 740 +// generated line 741 +// generated line 742 +// generated line 743 +// generated line 744 +// generated line 745 +// generated line 746 +// generated line 747 +// generated line 748 +// generated line 749 +// generated line 750 +// generated line 751 +// generated line 752 +// generated line 753 +// generated line 754 +// generated line 755 +// generated line 756 +// generated line 757 +// generated line 758 +// generated line 759 +// generated line 760 +// generated line 761 +// generated line 762 +// generated line 763 +// generated line 764 +// generated line 765 +// generated line 766 +// generated line 767 +// generated line 768 +// generated line 769 +// generated line 770 +// generated line 771 +// generated line 772 +// generated line 773 +// generated line 774 +// generated line 775 +// generated line 776 +// generated line 777 +// generated line 778 +// generated line 779 +// generated line 780 +// generated line 781 +// generated line 782 +// generated line 783 +// generated line 784 +// generated 
line 785 +// generated line 786 +// generated line 787 +// generated line 788 +// generated line 789 +// generated line 790 +// generated line 791 +// generated line 792 +// generated line 793 +// generated line 794 +// generated line 795 +// generated line 796 +// generated line 797 +// generated line 798 +// generated line 799 +// generated line 800 +// generated line 801 +// generated line 802 +// generated line 803 +// generated line 804 +// generated line 805 +// generated line 806 +// generated line 807 +// generated line 808 +// generated line 809 +// generated line 810 +// generated line 811 +// generated line 812 +// generated line 813 +// generated line 814 +// generated line 815 +// generated line 816 +// generated line 817 +// generated line 818 +// generated line 819 +// generated line 820 +// generated line 821 +// generated line 822 +// generated line 823 +// generated line 824 +// generated line 825 +// generated line 826 +// generated line 827 +// generated line 828 +// generated line 829 +// generated line 830 +// generated line 831 +// generated line 832 +// generated line 833 +// generated line 834 +// generated line 835 +// generated line 836 +// generated line 837 +// generated line 838 +// generated line 839 +// generated line 840 +// generated line 841 +// generated line 842 +// generated line 843 +// generated line 844 +// generated line 845 +// generated line 846 +// generated line 847 +// generated line 848 +// generated line 849 +// generated line 850 +// generated line 851 +// generated line 852 +// generated line 853 +// generated line 854 +// generated line 855 +// generated line 856 +// generated line 857 +// generated line 858 +// generated line 859 +// generated line 860 +// generated line 861 +// generated line 862 +// generated line 863 +// generated line 864 +// generated line 865 +// generated line 866 +// generated line 867 +// generated line 868 +// generated line 869 +// generated line 870 +// generated line 871 +// 
generated line 872 +// generated line 873 +// generated line 874 +// generated line 875 +// generated line 876 +// generated line 877 +// generated line 878 +// generated line 879 +// generated line 880 +// generated line 881 +// generated line 882 +// generated line 883 +// generated line 884 +// generated line 885 +// generated line 886 +// generated line 887 +// generated line 888 +// generated line 889 +// generated line 890 +// generated line 891 +// generated line 892 +// generated line 893 +// generated line 894 +// generated line 895 +// generated line 896 +// generated line 897 +// generated line 898 +// generated line 899 +// generated line 900 diff --git a/tests/fixtures/insecure-repo/src/server.js b/tests/fixtures/insecure-repo/src/server.js new file mode 100644 index 0000000..ba42f07 --- /dev/null +++ b/tests/fixtures/insecure-repo/src/server.js @@ -0,0 +1,8 @@ +// HACK: open CORS for now +res.setHeader("Access-Control-Allow-Origin", "*"); + +// AWS access key — should never ship +const AWS_KEY = "AKIAIOSFODNN7EXAMPLE"; + +app.post("/api/users", function(req, res) { /* no auth */ }); +app.delete("/api/admin", function(req, res) { /* no auth */ });
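
The line-counting convention in `countFileLines` (count newlines, then add one when the final line lacks a trailing newline) is easy to check in isolation. A minimal standalone sketch, with the hypothetical name `countLines` standing in for the file-reading version:

```go
package main

import (
	"fmt"
	"strings"
)

// countLines mirrors countFileLines's convention: count newlines,
// then add one if the last line is missing its trailing newline.
func countLines(s string) int {
	if len(s) == 0 {
		return 0
	}
	n := strings.Count(s, "\n")
	if !strings.HasSuffix(s, "\n") {
		n++
	}
	return n
}

func main() {
	fmt.Println(countLines(""))       // 0: empty file has no lines
	fmt.Println(countLines("a\nb\n")) // 2: trailing newline, two lines
	fmt.Println(countLines("a\nb"))   // 2: last line counts even without newline
}
```

This matches what most editors report as the line count, and keeps the huge.go fixture (900 comment lines) measuring exactly 900.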