Claude (review-harness setup) a75e14716b Phase E — append-only memory + diff subcommand (PROMPT.md complete)
Closes the harness's feature set per PROMPT.md mode 2 (Diff Review)
and Phase 5 (Memory). The rules subcommand is still pending (it needs
operator-authored .review-rules.md content first; documented as a
Phase E follow-up).

internal/memory/ — append-only writer:
- AppendKnownRisks: one JSONL line per confirmed finding per run.
  O_APPEND only; never O_TRUNC. Empty findings list is a no-op
  (doesn't even create the file — keeps clean runs from polluting
  .memory/).
- AppendRunHistory: one JSONL line per run. Run summary stats +
  receipts hash for cross-link.
- WriteProjectProfile: writes the ONLY non-versioned memory file;
  snapshot semantics, overwrites are explicit + documented.
- 4 unit tests, including TestAppendKnownRisks_NeverTruncates, which
  is the audit's "no silent overwrite" gate: write twice, assert
  both writes' content survives.

Pipeline phase 5 wires it in. Only confirmed findings are written:
suspected findings might still be wrong, and excluding them keeps
.memory/ authoritative. Disabled if review-profile.memory.enabled = false.
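
A sketch of the phase-5 selection rule. It assumes a hypothetical
Finding with a Status string ("confirmed"/"suspected") and a
hypothetical MemoryConfig standing in for review-profile.memory. The
real analyzers.Finding fields aren't shown in this commit.

```go
package main

import "fmt"

// Hypothetical stand-ins: the real analyzers.Finding and profile types
// aren't shown in this commit, so these names and fields are assumptions.
type Finding struct {
	ID     string
	Status string // assumed values: "confirmed" or "suspected"
}

type MemoryConfig struct {
	Enabled bool // stands in for review-profile.memory.enabled
}

// findingsToRemember returns what phase 5 would persist: nothing when
// memory is disabled, otherwise only confirmed findings.
func findingsToRemember(cfg MemoryConfig, all []Finding) []Finding {
	if !cfg.Enabled {
		return nil
	}
	var out []Finding
	for _, f := range all {
		if f.Status == "confirmed" {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	all := []Finding{{"F1", "confirmed"}, {"F2", "suspected"}, {"F3", "confirmed"}}
	fmt.Println(len(findingsToRemember(MemoryConfig{Enabled: true}, all)))  // prints 2
	fmt.Println(len(findingsToRemember(MemoryConfig{Enabled: false}, all))) // prints 0
}
```

The writer treats an empty slice as a no-op. So passing this result to
AppendKnownRisks would leave known-risks.jsonl untouched both when
memory is disabled and when a run confirms nothing.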

internal/git/git.go — ChangedFiles helper:
- Probes unstaged + staged + branch diff against main/master.
- Deduplicated, stable order. Empty result on a clean tree.
- Graceful failure: returns an error if the git binary is missing or
  the target isn't a git repo.

cli/repo.go — Diff subcommand:
- `review-harness diff <path>` runs the same pipeline as scrum but
  scoped to changed files only. Pipeline.Inputs gains a DiffOnlyFiles
  filter, applied after Walk.
- Empty diff (clean tree, no commits ahead of base) → exit 0 with a
  message; no empty reports are generated.
- LLM pass toggleable via --enable-llm, same as scrum.

scanner/walk.go: added .memory to SkipDirs (universal: .memory/ is the
harness's own audit trail, and scanning it would resurface
planted-secret evidence as new findings; same class of fix as the B5
self-skip).

.gitignore tightened: /.memory/ → **/.memory/, so test-fixture
.memory/ dirs don't leak into version control (same fix as the
reports/latest pattern).

Verified end-to-end:
- 4 memory unit tests PASS
- Append-only proven: insecure-repo run 1 → 16 known-risks lines;
  run 2 → 44 lines (16 + 28 from new run); run-history grew 1 → 2.
- Diff subcommand against this repo (5 uncommitted Phase E files
  staged) → exit 0, all reports produced, scoped to those 5 files
  only (0 findings on the diff-scoped scan vs 129 on full repo —
  changed files don't contain analyzer-flaggable patterns).
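
The append-only claim can be checked mechanically. Snapshot a JSONL
file before and after a run: the old bytes must be an exact prefix of
the new ones. appendedLines is a hypothetical helper. With the numbers
above, run 2 should report 28 added known-risks lines (16 + 28 = 44).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// appendedLines (hypothetical) enforces the append-only invariant between
// two snapshots of a JSONL file and returns how many lines were added.
func appendedLines(before, after []byte) (int, error) {
	if !bytes.HasPrefix(after, before) {
		return 0, errors.New("earlier content was modified or truncated")
	}
	return bytes.Count(after[len(before):], []byte("\n")), nil
}

func main() {
	run1 := []byte("{\"run_id\":\"r1\"}\n")
	run2 := append(append([]byte{}, run1...), "{\"run_id\":\"r2\"}\n{\"run_id\":\"r2\"}\n"...)
	n, err := appendedLines(run1, run2)
	fmt.Println(n, err) // 2 <nil>
	_, err = appendedLines(run2, run1)
	fmt.Println(err) // earlier content was modified or truncated
}
```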

Phases A through E shipped today. The rules subcommand and tests for
internal/{config,scanner,git,llm,reporters,pipeline} remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:19:12 -05:00

// Package memory implements PROMPT.md Phase 5: append-only `.memory/`
// state that lets the harness build up knowledge across runs.
//
// The append-only constraint is non-optional. Operators can grep
// .memory/ to see how findings drifted run-to-run, prove no silent
// data loss, and reconstruct intermediate states. Every write to the
// two JSONL logs goes through Append* helpers that open with O_APPEND
// only: no O_TRUNC, no os.Create. A regression test proves the
// constraint holds. The one deliberate exception is
// project-profile.json, a snapshot that WriteProjectProfile overwrites.
//
// Files written:
//
//	.memory/known-risks.jsonl     one line per confirmed finding per run;
//	                              the same finding ID across runs is
//	                              deduped by the reader, never silently
//	                              dropped from the log
//	.memory/run-history.jsonl     one line per run; summary stats +
//	                              receipts hash for cross-link
//	.memory/project-profile.json  overwritten; non-versioned snapshot
//	                              of static repo facts (language mix,
//	                              latest commit, etc.). Operator-readable.
package memory

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"

	"local-review-harness/internal/analyzers"
)

// KnownRiskEntry is one append per confirmed finding per run.
type KnownRiskEntry struct {
	RunID     string            `json:"run_id"`
	WrittenAt string            `json:"written_at"`
	Finding   analyzers.Finding `json:"finding"`
}

// RunHistoryEntry is one append per harness run.
type RunHistoryEntry struct {
	RunID         string `json:"run_id"`
	RepoPath      string `json:"repo_path"`
	StartedAt     string `json:"started_at"`
	FinishedAt    string `json:"finished_at"`
	TotalFindings int    `json:"total_findings"`
	Confirmed     int    `json:"confirmed"`
	Critical      int    `json:"critical"`
	High          int    `json:"high"`
	Medium        int    `json:"medium"`
	Low           int    `json:"low"`
	LLMEnabled    bool   `json:"llm_enabled"`
	ExitCode      int    `json:"exit_code"`
	ReceiptsHash  string `json:"receipts_hash,omitempty"` // cross-link
}

// ProjectProfile is the only non-versioned memory file. Overwrites are
// OK: it's a snapshot, not a log. The append-only contract applies to
// known-risks + run-history.
type ProjectProfile struct {
	RepoPath          string         `json:"repo_path"`
	LastSeenAt        string         `json:"last_seen_at"`
	LastSeenCommit    string         `json:"last_seen_commit,omitempty"`
	LanguageBreakdown map[string]int `json:"language_breakdown"`
	FileCount         int            `json:"file_count"`
}

// Writer is the append-only memory writer. It holds a base path so
// every method writes under the same .memory/ root. Stateless; safe
// for concurrent use (each Append opens its own fd).
type Writer struct {
	dir string
}

// NewWriter constructs a Writer rooted at <repoPath>/.memory/. The
// directory is created here if it doesn't already exist. Operators who
// want a different location will be able to override it via
// review-profile.memory.path (Phase E follow-up).
func NewWriter(repoPath string) (*Writer, error) {
	dir := filepath.Join(repoPath, ".memory")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, fmt.Errorf("create memory dir: %w", err)
	}
	return &Writer{dir: dir}, nil
}

// AppendKnownRisks appends one JSONL line per confirmed finding.
// Append-only: opens with O_APPEND|O_CREATE|O_WRONLY. NEVER opens
// with O_TRUNC. Truncation is the failure mode this package exists
// to prevent. An empty findings slice is a no-op and does not create
// the file.
func (w *Writer) AppendKnownRisks(runID string, findings []analyzers.Finding) (err error) {
	if len(findings) == 0 {
		return nil
	}
	f, err := os.OpenFile(filepath.Join(w.dir, "known-risks.jsonl"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return fmt.Errorf("open known-risks: %w", err)
	}
	// Report Close errors: on a written fd, a failed close can indicate
	// lost data, which would be exactly the silent loss this package
	// guards against.
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close known-risks: %w", cerr)
		}
	}()
	enc := json.NewEncoder(f)
	now := time.Now().UTC().Format(time.RFC3339Nano)
	for _, finding := range findings {
		entry := KnownRiskEntry{
			RunID:     runID,
			WrittenAt: now,
			Finding:   finding,
		}
		if err := enc.Encode(&entry); err != nil {
			return fmt.Errorf("encode known-risk: %w", err)
		}
	}
	return nil
}

// AppendRunHistory writes one JSONL line for the run. Same append-
// only constraint as known-risks.
func (w *Writer) AppendRunHistory(entry RunHistoryEntry) (err error) {
	f, err := os.OpenFile(filepath.Join(w.dir, "run-history.jsonl"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return fmt.Errorf("open run-history: %w", err)
	}
	// Same Close-error reporting as AppendKnownRisks.
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close run-history: %w", cerr)
		}
	}()
	if err := json.NewEncoder(f).Encode(&entry); err != nil {
		return fmt.Errorf("encode run-history: %w", err)
	}
	return nil
}

// WriteProjectProfile overwrites .memory/project-profile.json. This
// is the ONLY memory file that's not append-only: it's a snapshot
// of current state, not a log. Operators wanting historical profiles
// can grep run-history.jsonl, which carries the receipts hash.
func (w *Writer) WriteProjectProfile(p ProjectProfile) error {
	bs, err := json.MarshalIndent(p, "", " ")
	if err != nil {
		return err
	}
	bs = append(bs, '\n')
	return os.WriteFile(filepath.Join(w.dir, "project-profile.json"), bs, 0o644)
}

// MemoryDir returns the writer's .memory/ path (absolute only if the
// repoPath given to NewWriter was). Useful in receipts so operators
// can find the JSONL files.
func (w *Writer) MemoryDir() string { return w.dir }