diff --git a/docs/PRD.md b/docs/PRD.md index 6e16dcb..bbfe0c4 100644 --- a/docs/PRD.md +++ b/docs/PRD.md @@ -9,6 +9,40 @@ estimates, library choices, and acceptance gates. --- +## Product vision — what we're actually building + +**The Go refactor isn't the goal. The goal is a small-model-driven autonomous pipeline that gets better with each run, with frontier models in audit/oversight and humans triaged in only for the genuinely abstract cases.** + +The Rust Lakehouse already has most of the pieces: +- **Pathway memory** (`internal/pathway` in Go, 88 Rust traces preserved) — what we tried, what worked +- **Matrix indexer** (SPEC §3.4) — multi-corpus retrieve+merge that gives the small model the right knowledge slice for *this* task +- **Observer** — watches runs, refines configs, escalates +- **Distillation v1.0.0** (`e7636f2`) — turns successful runs into denser playbooks +- **Auditor cross-lineage fabric** — Kimi/Haiku/Opus oversight on small-model outputs + +What the Go refactor is FOR: a second-language pass surfaces architectural weaknesses that Rust hid. The pipeline has to pull together cleanly *as a pipeline* — not as 15 crates that happen to interact. + +### The five-loop substrate + +1. **Knowledge pathway loop** — pathway memory + matrix indexer give the small model context for the task. Pathway answers "what worked last time?"; matrix answers "what's relevant now?" +2. **Execution loop** — small model runs on focused context. Frontier API calls are reserved for audit/escalation, not the inner loop. Cost + rate limits stay sane. +3. **Observer loop** — watches each run, refines the configs (matrix corpus picks, downgrade gate, prompt mold) that got the model to a good pathway. Outputs new config, not new prompt. +4. **Rating + distillation loop** — successful outcomes get scored and folded back into the playbook substrate. The playbook gets denser; the next run starts smarter. +5. **Drift loop** — quantify when the distilled playbook stops matching reality (codebase changed, contracts shifted, profiles updated). Drift is a *measured* signal, not "hope nothing broke." + +### The gate + +**The playbook + matrix indexer must produce the results we're looking for.** That's the single load-bearing acceptance criterion. Throughput, scaling, code elegance — all secondary. If a deep-field reality test on the 500K corpus surfaces wrong answers, the loop isn't working and we fix that before adding anything else. + +### Triage / human-in-loop + +Most cases are abstract enough that small-model + pathway + matrix can complete them. Some can't — they need a human. The system's job is to **identify which is which** and only escalate the second class. Frontier models partially solve this internally with their thinking loops; we're externalizing it so: +- Small models are swappable (vendor independence) +- Drift is measurable (quantitative signal, not vibes) +- Each loop iteration is auditable (the pathway memory IS the audit trail) + +This is what the auditor cross-lineage fabric proves out in Rust — Opus auto-promote on diffs >100k chars is the same pattern: triage by signal, not by guesswork. + ## Direction pivot — why this PRD exists The Rust-first Lakehouse (15 crates, ~24 unmerged commits past PR #11,