pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review

10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.

What it does

Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.

When scrum reviews a file:
  1. narrow fingerprint = task_class + file_prefix + signal_class
  2. query_hot_swap checks pathway memory for a match that passes
     probation (≥3 replays @ ≥80% success) + audit gate + similarity
     (≥0.90 cosine on normalized-metadata-token embedding)
  3. if hot-swap eligible, recommended model tried first in the ladder
  4. replay outcome reported back, updating the pathway's success_rate
  5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
  6. full PathwayTrace always inserted at end of review — hot-swap
     grows with use, it doesn't bootstrap from nothing
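The step-3 ladder reorder can be sketched as below. The candidate shape mirrors the Rust `HotSwapCandidate`; `reorderLadder` itself is illustrative, not the pipeline's actual code.

```typescript
// Candidate shape mirrors the Rust HotSwapCandidate struct.
interface HotSwapCandidate {
  pathway_id: string;
  similarity: number;
  replay_count: number;
  success_rate: number;
  recommended_rung: number;
  recommended_model: string;
}

// Move the recommended model to the front of the ladder. The rest of
// the ladder stays intact as fallback, so a bad recommendation costs
// at most one extra attempt before normal ladder order resumes.
function reorderLadder(ladder: string[], cand: HotSwapCandidate | null): string[] {
  if (cand === null) return ladder; // no eligible pathway: run the full ladder
  const rest = ladder.filter((m) => m !== cand.recommended_model);
  return [cand.recommended_model, ...rest];
}
```

A null candidate is the normal no-hot-swap path, matching the `/pathway/query` contract (200 with null candidate, not an error).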

Gate design is load-bearing:
  - narrow fingerprint (6 of 8 consensus models converged on the same
    3-field composition; lock) — enables generalization within crate
  - probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
  - success rate ≥0.80 — mistral + qwen3-coder independently proposed
    this exact threshold across two rounds
  - similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
  - bootstrap: null audit_consensus ALLOWED (auditor → pathway update
    not wired yet; probation + success_rate gates alone enforce safety
    during bootstrap; explicit audit FAIL still blocks)
  - retirement is sticky — prevents oscillation on noise
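The probation gate can be sanity-checked numerically. A hedged sketch (not part of the commit): the binomial tail gives the probability that a pathway with true per-replay success probability p clears the ≥80% bar over n replays.

```typescript
// Binomial coefficient C(n, k), computed multiplicatively.
function binomCoef(n: number, k: number): number {
  let c = 1;
  for (let i = 0; i < k; i++) c = (c * (n - i)) / (i + 1);
  return c;
}

// P(X >= k) for X ~ Binomial(n, p): the chance a pathway passes the
// success-rate gate by luck alone.
function tailAtLeast(n: number, k: number, p: number): number {
  let total = 0;
  for (let x = k; x <= n; x++) {
    total += binomCoef(n, x) * Math.pow(p, x) * Math.pow(1 - p, n - x);
  }
  return total;
}

// Minimum successes needed to reach a success rate >= `rate` over n replays.
function needed(n: number, rate: number): number {
  return Math.ceil(n * rate);
}
```

For a coin-flip pathway (p = 0.5), clearing ≥80% at the probation floor of n = 3 requires 3/3 successes (probability 0.125), and the false-pass tail keeps shrinking as replays accumulate (≈5.5% by n = 10) — which is what makes the gate safe to apply after a small probation.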

Files

  + crates/vectord/src/pathway_memory.rs  (new, 600 lines + 18 tests)
    PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
    SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
    PathwayMemoryStats. 18/18 tests green.
    Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
  M crates/vectord/src/lib.rs
    pub mod pathway_memory;
  M crates/vectord/src/service.rs
    VectorState grows pathway_memory field;
    4 HTTP handlers (/pathway/insert, /pathway/query,
    /pathway/record_replay, /pathway/stats).
  M crates/gateway/src/main.rs
    Construct PathwayMemory + load from storage on boot,
    wire into VectorState.
  M tests/real-world/scrum_master_pipeline.ts
    Byte-matching TS bucket-hash (verified same bucket indices as
    Rust); pre-ladder hot-swap query; ladder reorder on hit;
    per-attempt latency capture; post-accept trace insert
    (fire-and-forget); replay outcome recording;
    observer /event emits pathway_hot_swap_hit, pathway_similarity,
    rungs_saved per review for the VCP UI.
  M ui/server.ts
    /data/pathway_stats aggregates /vectors/pathway/stats +
    scrum_reviews.jsonl window for the value metric.
  M ui/ui.js
    Three new metric cards:
      · pathway reuse rate (activity: is it firing?)
      · avg rungs saved (value: is it earning its keep?)
      · pathways tracked (stability: retirement = learning)
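The byte-matching TS bucket-hash mentioned above can be sketched from the Rust `build_pathway_vec` it mirrors: sha256 each token, bump two buckets per token using digest bytes 0 and 8, then L2-normalize. Function names here are illustrative; the real implementation lives in scrum_master_pipeline.ts.

```typescript
import { createHash } from "node:crypto";

// Mirror of the Rust 32-bucket embedding: two bucket writes per token
// (digest bytes 0 and 8, so tokens with a shared prefix still spread),
// then L2 normalization so cosine similarity reduces to a dot product.
function bucketVec(tokens: string[]): number[] {
  const buckets = new Array<number>(32).fill(0);
  for (const t of tokens) {
    const d = createHash("sha256").update(t).digest();
    buckets[d[0] % 32] += 1;
    buckets[d[8] % 32] += 1;
  }
  const norm = Math.sqrt(buckets.reduce((s, v) => s + v * v, 0));
  return norm > 0 ? buckets.map((v) => v / norm) : buckets;
}

// With both sides L2-normalized, cosine is just the dot product.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  return a.reduce((s, x, i) => s + x * b[i], 0);
}
```

Because both sides hash the same bytes, the TS and Rust vectors land in identical bucket indices — that is the "verified same bucket indices" property the pipeline change relies on.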

What's not in this commit (queued)

  - auditor → pathway audit_consensus update wire (explicit audit-fail
    block activates when this lands)
  - bridge_hits + sub_pipeline_calls population from context7 / LLM
    Team extract results (fields wired, callers not yet)
  - replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
    a separate jsonl for forensic audit of why specific replays failed

Why > summarization

Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-24 05:15:32 -05:00
parent 9cc0ceb894
commit 2f8b347f37
8 changed files with 1120 additions and 3 deletions

Cargo.lock generated
View File

@ -6898,6 +6898,7 @@ dependencies = [
 "storaged",
 "tokio",
 "tracing",
 "truth",
 "url",
]

View File

@ -93,6 +93,12 @@ async fn main() {
// operators call POST /vectors/playbook_memory/rebuild to populate.
let pbm = vectord::playbook_memory::PlaybookMemory::new(store.clone());
let _ = pbm.load_from_storage().await;
// Pathway memory — consensus-designed sidecar for full-context
// backtracking + hot-swap of successful review pathways. Same
// load-on-boot pattern as playbook_memory: empty state is fine,
// operators start populating via scrum_master_pipeline.ts.
let pwm = vectord::pathway_memory::PathwayMemory::new(store.clone());
let _ = pwm.load_from_storage().await;
// Phase 16.2: spawn the autotune agent. When config.agent.enabled=false
// this returns a handle that drops triggers silently — no surprise load.
@ -178,6 +184,7 @@ async fn main() {
bucket_registry.clone(), index_reg.clone(),
),
playbook_memory: pbm,
pathway_memory: pwm,
embed_semaphore: std::sync::Arc::new(tokio::sync::Semaphore::new(1)),
}))
.nest("/workspaces", queryd::workspace_service::router(workspace_mgr))

View File

@ -8,6 +8,7 @@ pub mod hnsw;
pub mod index_registry;
pub mod jobs;
pub mod playbook_memory;
pub mod pathway_memory;
pub mod doc_drift;
pub mod promotion;
pub mod refresh;

View File

@ -0,0 +1,704 @@
//! Pathway memory — full backtrack-able context for scrum/auditor reviews.
//!
//! Consensus-designed (10-probe N=3 ensemble, see
//! `data/_kb/consensus_reducer_design_*.json`). The reducer emits a
//! `PathwayTrace` sidecar alongside its legacy summary. Traces are
//! fingerprinted narrowly (`task_class + file_prefix + signal_class`) for
//! generalizing hot-swap, and embedded via normalized-metadata-token
//! concatenation so the HNSW similarity search can discriminate between
//! pathways that share a fingerprint but diverged in ladder/KB choices.
//!
//! The hot-swap decision requires six conditions, all in AND:
//! 1. narrow fingerprint match
//! 2. audit_consensus.pass is not an explicit false (null is allowed
//!    during bootstrap; only an explicit FAIL blocks)
//! 3. replay_count >= 3
//! 4. replays_succeeded / replay_count >= 0.80
//! 5. NOT retired
//! 6. similarity(new, stored) >= 0.90
//!
//! Any replay reports its outcome via `record_replay_outcome`; pathways
//! whose success rate drops below 0.80 after >=3 replays are marked
//! retired and excluded from further hot-swap consideration. This is the
//! self-correcting learning loop — a pathway that worked once but breaks
//! under distribution shift removes itself automatically.
use std::collections::HashMap;
use std::sync::Arc;
use chrono::{DateTime, Utc};
use object_store::ObjectStore;
use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};
use storaged::ops;
use tokio::sync::RwLock;
const STATE_KEY: &str = "_pathway_memory/state.json";
/// Outcome of one ladder rung attempt. Captured for every attempt,
/// regardless of whether it was accepted — rejections are signal too.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct LadderAttempt {
pub rung: u8,
pub model: String,
pub latency_ms: u64,
pub accepted: bool,
pub reject_reason: Option<String>,
}
/// Provenance of a RAG chunk retrieved for this review. The
/// `cosine_score` is the similarity as returned by the index; `rank` is
/// 0-indexed order in the top-K result list.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct KbChunkRef {
pub source_doc: String,
pub chunk_id: String,
pub cosine_score: f32,
pub rank: u8,
}
/// Signal emitted by mcp-server/observer classifier.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct ObserverSignal {
pub class: String,
pub priors: Vec<String>,
pub prior_iter_outcomes: Vec<String>,
}
/// Context7-bridge lookup snapshot.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct BridgeHit {
pub library: String,
pub version: String,
}
/// Call to LLM Team (/api/run?mode=extract) or auditor N=3 consensus.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct SubPipelineCall {
pub pipeline: String, // "llm_team_extract" / "audit_consensus" / etc.
pub result_summary: String,
}
/// N=3 independent consensus re-check result.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct AuditConsensus {
pub pass: bool,
pub models: Vec<String>,
pub disagreements: u32,
}
/// Full backtrack-able context for one reviewed file. Lives alongside
/// the reducer's summary — summary is what the reviewer LLM sees, this
/// is what the auditor / future iterations / hot-swap use.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct PathwayTrace {
pub pathway_id: String, // SHA256(task_class|file_prefix|signal_class)
pub task_class: String,
pub file_path: String,
pub signal_class: Option<String>,
pub created_at: DateTime<Utc>,
pub ladder_attempts: Vec<LadderAttempt>,
pub kb_chunks: Vec<KbChunkRef>,
pub observer_signals: Vec<ObserverSignal>,
pub bridge_hits: Vec<BridgeHit>,
pub sub_pipeline_calls: Vec<SubPipelineCall>,
pub audit_consensus: Option<AuditConsensus>,
pub reducer_summary: String,
pub final_verdict: String,
/// Normalized-metadata-token embedding. Dimension fixed per index
/// version (current: 32, sufficient to distinguish task/file/signal
/// combinations without requiring an external embedding model —
/// round-3 consensus said "small metadata tokens", not "full JSON").
pub pathway_vec: Vec<f32>,
/// Number of times this pathway has been replayed via hot-swap.
/// Replay only begins after first insert; initial insert itself is
/// NOT a replay. Probation of ≥3 replays is required before the
/// success-rate gate can fire.
pub replay_count: u32,
pub replays_succeeded: u32,
/// Marked true when replay_count >= 3 AND success_rate < 0.80.
/// Retired pathways are excluded from hot-swap forever. (If the
/// underlying file / task / signal characteristics genuinely change
/// such that a retired pathway would work again, a new PathwayTrace
/// with a fresh id will be inserted — retirement is per-id.)
pub retired: bool,
}
impl PathwayTrace {
/// Compute the narrow fingerprint id from task_class + file_prefix
/// + signal_class. `file_prefix` is the first two path segments
/// ("crates/queryd", not "crates/queryd/src/service.rs") so that
/// related files in the same crate share pathways.
pub fn compute_id(task_class: &str, file_path: &str, signal_class: Option<&str>) -> String {
let prefix = file_prefix(file_path);
let sig = signal_class.unwrap_or("");
let mut hasher = Sha256::new();
hasher.update(task_class.as_bytes());
hasher.update(b"|");
hasher.update(prefix.as_bytes());
hasher.update(b"|");
hasher.update(sig.as_bytes());
format!("{:x}", hasher.finalize())
}
pub fn success_rate(&self) -> f32 {
if self.replay_count == 0 {
return 0.0;
}
self.replays_succeeded as f32 / self.replay_count as f32
}
}
/// First two path segments, so `crates/queryd/src/service.rs` →
/// `crates/queryd`. This is intentional — similar files in the same
/// crate often share task characteristics (e.g., all files in
/// `crates/queryd/` are SQL-path Rust code), so fingerprinting on the
/// crate-level prefix lets the hot-swap generalize across files within
/// the crate. Exactly-matching file paths still match (same prefix).
pub fn file_prefix(path: &str) -> String {
let parts: Vec<&str> = path.split('/').take(2).collect();
parts.join("/")
}
/// Build the pathway vector from trace metadata. Intentionally simple —
/// deterministic bag-of-tokens hash into 32 buckets, normalized. Round-3
/// consensus said "small metadata tokens, not full JSON." An external
/// embedding model would work too but adds a dependency, failure mode,
/// and drift risk the consensus flagged.
pub fn build_pathway_vec(trace: &PathwayTrace) -> Vec<f32> {
let mut buckets = vec![0f32; 32];
let mut tokens: Vec<String> = Vec::new();
tokens.push(trace.task_class.clone());
tokens.push(trace.file_path.clone());
if let Some(s) = &trace.signal_class {
tokens.push(format!("signal:{s}"));
}
for a in &trace.ladder_attempts {
tokens.push(format!("rung:{}", a.rung));
tokens.push(format!("model:{}", a.model));
tokens.push(format!("accepted:{}", a.accepted));
}
for k in &trace.kb_chunks {
tokens.push(format!("kb:{}", k.source_doc));
}
for o in &trace.observer_signals {
tokens.push(format!("class:{}", o.class));
}
for b in &trace.bridge_hits {
tokens.push(format!("lib:{}", b.library));
}
for s in &trace.sub_pipeline_calls {
tokens.push(format!("pipeline:{}", s.pipeline));
}
for t in &tokens {
let mut h = Sha256::new();
h.update(t.as_bytes());
let d = h.finalize();
// Two bucket writes per token: use different byte windows to
// spread probability across buckets even when tokens share a
// common prefix.
let b1 = (d[0] as usize) % 32;
let b2 = (d[8] as usize) % 32;
buckets[b1] += 1.0;
buckets[b2] += 1.0;
}
// L2 normalize so cosine similarity becomes a dot product.
let norm: f32 = buckets.iter().map(|v| v * v).sum::<f32>().sqrt();
if norm > 0.0 {
for v in &mut buckets {
*v /= norm;
}
}
buckets
}
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
if a.len() != b.len() {
return 0.0;
}
a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::<f32>()
}
#[derive(Default, Clone, Serialize, Deserialize)]
struct PathwayMemoryState {
pathways: HashMap<String, Vec<PathwayTrace>>, // key = pathway_id (narrow fingerprint)
last_updated_at: i64,
}
#[derive(Clone)]
pub struct PathwayMemory {
state: Arc<RwLock<PathwayMemoryState>>,
store: Arc<dyn ObjectStore>,
}
#[derive(Debug, Serialize)]
pub struct HotSwapCandidate {
pub pathway_id: String,
pub similarity: f32,
pub replay_count: u32,
pub success_rate: f32,
pub recommended_rung: u8,
pub recommended_model: String,
}
impl PathwayMemory {
pub fn new(store: Arc<dyn ObjectStore>) -> Self {
Self {
state: Arc::new(RwLock::new(PathwayMemoryState::default())),
store,
}
}
pub async fn load_from_storage(&self) -> Result<usize, String> {
let data = match ops::get(&self.store, STATE_KEY).await {
Ok(d) => d,
Err(_) => return Ok(0),
};
let persisted: PathwayMemoryState = serde_json::from_slice(&data)
.map_err(|e| format!("parse pathway_memory state: {e}"))?;
let n: usize = persisted.pathways.values().map(|v| v.len()).sum();
*self.state.write().await = persisted;
tracing::info!("pathway_memory: loaded {n} traces from {STATE_KEY}");
Ok(n)
}
async fn persist(&self) -> Result<(), String> {
let snapshot = self.state.read().await.clone();
let bytes = serde_json::to_vec_pretty(&snapshot).map_err(|e| e.to_string())?;
ops::put(&self.store, STATE_KEY, bytes.into()).await
}
/// Insert a new pathway trace. Called by scrum_master_pipeline at
/// the end of each file's review. Computes the pathway_vec from
/// metadata if the caller didn't supply one. Appends to the bucket
/// for this pathway_id — multiple traces can share a fingerprint
/// (each represents one review of the same file/task/signal combo).
pub async fn insert(&self, mut trace: PathwayTrace) -> Result<(), String> {
if trace.pathway_vec.is_empty() {
trace.pathway_vec = build_pathway_vec(&trace);
}
let mut s = self.state.write().await;
s.pathways
.entry(trace.pathway_id.clone())
.or_default()
.push(trace);
s.last_updated_at = Utc::now().timestamp_millis();
drop(s);
self.persist().await
}
/// Query for a hot-swap candidate. Returns `None` if no eligible
/// pathway exists — caller should run the full ladder. Returns
/// `Some(cand)` if all gates pass — caller can short-circuit to
/// `cand.recommended_rung` / `cand.recommended_model`.
///
/// Gates (all must hold):
/// - narrow fingerprint match (same task/file_prefix/signal)
/// - audit_consensus not an explicit FAIL (null allowed during bootstrap)
/// - replay_count >= 3 (probation)
/// - success_rate >= 0.80
/// - NOT retired
/// - similarity(query_vec, stored.pathway_vec) >= 0.90
pub async fn query_hot_swap(
&self,
task_class: &str,
file_path: &str,
signal_class: Option<&str>,
query_vec: &[f32],
) -> Option<HotSwapCandidate> {
let id = PathwayTrace::compute_id(task_class, file_path, signal_class);
let s = self.state.read().await;
let candidates = s.pathways.get(&id)?;
let mut best: Option<(f32, &PathwayTrace)> = None;
for p in candidates {
if p.retired {
continue;
}
// audit_consensus gate: explicit FAIL blocks hot-swap. A null
// audit_consensus (auditor hasn't seen this pathway yet) is
// NOT a block — the success_rate gate below still requires
// ≥3 real-world replays at ≥80% success before a pathway
// becomes hot-swap eligible, so the learning loop itself
// provides the safety net during bootstrap. Once the auditor
// pipeline wires pathway audit updates, this gate tightens
// automatically: any explicit audit_consensus.pass == false
// here will skip the candidate.
if let Some(ac) = &p.audit_consensus {
if !ac.pass {
continue;
}
}
if p.replay_count < 3 {
continue;
}
if p.success_rate() < 0.80 {
continue;
}
let sim = cosine(query_vec, &p.pathway_vec);
if sim < 0.90 {
continue;
}
if best.as_ref().map(|(b, _)| sim > *b).unwrap_or(true) {
best = Some((sim, p));
}
}
let (similarity, p) = best?;
// The "recommended" rung is the first accepted attempt in the
// stored pathway — that's the one the ladder converged on.
let accepted = p.ladder_attempts.iter().find(|a| a.accepted)?;
Some(HotSwapCandidate {
pathway_id: p.pathway_id.clone(),
similarity,
replay_count: p.replay_count,
success_rate: p.success_rate(),
recommended_rung: accepted.rung,
recommended_model: accepted.model.clone(),
})
}
/// Record the outcome of a hot-swap replay. Increments replay_count
/// unconditionally; increments replays_succeeded iff succeeded;
/// retires the pathway if replay_count >= 3 and success_rate falls
/// below 0.80. Mistral's learning loop in code.
pub async fn record_replay_outcome(
&self,
pathway_id: &str,
succeeded: bool,
) -> Result<(), String> {
let mut s = self.state.write().await;
// The bucket key is already the narrow id; in case of future
// multi-trace-per-id we update the most recent trace in the bucket.
let bucket = s
.pathways
.get_mut(pathway_id)
.ok_or_else(|| format!("pathway {pathway_id} not found"))?;
let p = bucket
.last_mut()
.ok_or_else(|| format!("pathway {pathway_id} has empty bucket"))?;
p.replay_count = p.replay_count.saturating_add(1);
if succeeded {
p.replays_succeeded = p.replays_succeeded.saturating_add(1);
}
if p.replay_count >= 3 && p.success_rate() < 0.80 {
p.retired = true;
}
s.last_updated_at = Utc::now().timestamp_millis();
drop(s);
self.persist().await
}
pub async fn stats(&self) -> PathwayMemoryStats {
let s = self.state.read().await;
let mut total = 0usize;
let mut retired = 0usize;
let mut with_audit_pass = 0usize;
let mut total_replays = 0u64;
let mut successful_replays = 0u64;
for bucket in s.pathways.values() {
for p in bucket {
total += 1;
if p.retired {
retired += 1;
}
if p.audit_consensus.as_ref().map(|a| a.pass).unwrap_or(false) {
with_audit_pass += 1;
}
total_replays += p.replay_count as u64;
successful_replays += p.replays_succeeded as u64;
}
}
PathwayMemoryStats {
total_pathways: total,
retired,
with_audit_pass,
total_replays,
successful_replays,
reuse_rate: if total == 0 {
0.0
} else {
total_replays as f32 / total as f32
},
replay_success_rate: if total_replays == 0 {
0.0
} else {
successful_replays as f32 / total_replays as f32
},
}
}
}
#[derive(Debug, Serialize, Deserialize)]
pub struct PathwayMemoryStats {
pub total_pathways: usize,
pub retired: usize,
pub with_audit_pass: usize,
pub total_replays: u64,
pub successful_replays: u64,
pub reuse_rate: f32, // total_replays / total_pathways
pub replay_success_rate: f32, // successful_replays / total_replays
}
#[cfg(test)]
mod tests {
use super::*;
use object_store::memory::InMemory;
fn mk_store() -> Arc<dyn ObjectStore> {
Arc::new(InMemory::new())
}
fn mk_trace(id_tag: &str, audit_pass: bool, replays: u32, succ: u32) -> PathwayTrace {
let pathway_id =
PathwayTrace::compute_id("scrum_review", &format!("crates/{id_tag}/src/x.rs"), Some("CONVERGING"));
let attempts = vec![LadderAttempt {
rung: 2,
model: "qwen3-coder:480b".into(),
latency_ms: 1000,
accepted: true,
reject_reason: None,
}];
let mut trace = PathwayTrace {
pathway_id,
task_class: "scrum_review".into(),
file_path: format!("crates/{id_tag}/src/x.rs"),
signal_class: Some("CONVERGING".into()),
created_at: Utc::now(),
ladder_attempts: attempts,
kb_chunks: vec![KbChunkRef {
source_doc: "PRD.md".into(),
chunk_id: "c1".into(),
cosine_score: 0.88,
rank: 0,
}],
observer_signals: vec![],
bridge_hits: vec![],
sub_pipeline_calls: vec![],
audit_consensus: Some(AuditConsensus {
pass: audit_pass,
models: vec!["qwen3-coder:480b".into(), "gpt-oss:120b".into(), "kimi-k2:1t".into()],
disagreements: 0,
}),
reducer_summary: "ok".into(),
final_verdict: "accepted".into(),
pathway_vec: vec![],
replay_count: replays,
replays_succeeded: succ,
retired: false,
};
trace.pathway_vec = build_pathway_vec(&trace);
trace
}
#[test]
fn file_prefix_takes_first_two_segments() {
assert_eq!(file_prefix("crates/queryd/src/service.rs"), "crates/queryd");
assert_eq!(file_prefix("crates/gateway"), "crates/gateway");
assert_eq!(file_prefix("README.md"), "README.md");
assert_eq!(file_prefix(""), "");
}
#[test]
fn compute_id_is_deterministic() {
let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
assert_eq!(a, b);
}
#[test]
fn compute_id_generalizes_across_same_prefix() {
// Same prefix + task + signal → same id. That IS the narrow
// generalization — it's what lets hot-swap fire for different
// files in the same crate that share the task/signal profile.
let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/a.rs", Some("L"));
let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/b.rs", Some("L"));
assert_eq!(a, b);
}
#[test]
fn compute_id_differs_on_signal_class() {
let a = PathwayTrace::compute_id("scrum", "crates/q/s", Some("CONVERGING"));
let b = PathwayTrace::compute_id("scrum", "crates/q/s", Some("LOOPING"));
assert_ne!(a, b);
}
#[test]
fn cosine_handles_mismatched_lengths() {
assert_eq!(cosine(&[1.0, 0.0], &[1.0]), 0.0);
}
#[test]
fn cosine_of_identical_normalized_is_one() {
let v = vec![0.6, 0.8];
let c = cosine(&v, &v);
assert!((c - 1.0).abs() < 1e-5);
}
#[test]
fn success_rate_is_zero_before_any_replay() {
let t = mk_trace("a", true, 0, 0);
assert_eq!(t.success_rate(), 0.0);
}
#[test]
fn success_rate_ratio() {
let t = mk_trace("a", true, 4, 3);
assert!((t.success_rate() - 0.75).abs() < 1e-5);
}
#[tokio::test]
async fn insert_and_stats_roundtrip() {
let mem = PathwayMemory::new(mk_store());
mem.insert(mk_trace("a", true, 0, 0)).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.total_pathways, 1);
assert_eq!(stats.retired, 0);
assert_eq!(stats.with_audit_pass, 1);
}
#[tokio::test]
async fn hot_swap_rejects_when_probation_not_met() {
// Probation: replay_count must be >= 3 before success-rate gate
// can fire. A fresh pathway with 0 replays must NEVER hot-swap
// even if its similarity is 1.0 and audit passes.
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "fresh pathway must not hot-swap");
}
#[tokio::test]
async fn hot_swap_rejects_when_audit_explicitly_fails() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", false, 5, 5); // audit FAILED explicitly
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "pathway with explicit audit FAIL must not hot-swap");
}
#[tokio::test]
async fn hot_swap_accepts_unaudited_pathway_for_bootstrap() {
// v1 bootstrap: auditor doesn't update pathway audit_consensus
// until Phase N+1 wires it. Until then, null audit_consensus
// must NOT block hot-swap — the success_rate + probation gates
// alone prove safety. Once auditor wires up, explicit audit
// failures will re-introduce the block (see previous test).
let mem = PathwayMemory::new(mk_store());
let mut trace = mk_trace("a", true, 5, 5);
trace.audit_consensus = None; // bootstrap path
trace.pathway_vec = build_pathway_vec(&trace);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_some(), "unaudited pathway with good replay history must hot-swap");
}
#[tokio::test]
async fn hot_swap_rejects_when_success_rate_below_80pct() {
// 10 replays, 7 succeeded = 70% — below the 0.80 threshold.
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 10, 7);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none());
}
#[tokio::test]
async fn hot_swap_accepts_when_all_gates_pass() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 5, 5); // 100% success after 5 replays
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
let cand = got.expect("should hot-swap");
assert!(cand.similarity >= 0.90);
assert_eq!(cand.recommended_rung, 2);
assert_eq!(cand.recommended_model, "qwen3-coder:480b");
}
#[tokio::test]
async fn record_replay_retires_pathway_on_failure_pattern() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
mem.insert(trace).await.unwrap();
// Three replays, all fail → success_rate = 0.0 → retired.
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.retired, 1, "3 failures after insert must retire");
}
#[tokio::test]
async fn record_replay_does_not_retire_before_probation() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
mem.insert(trace).await.unwrap();
// Two replays (below probation of 3), both fail. Should NOT
// retire yet — probation requires minimum 3 data points.
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.retired, 0, "only 2 replays → below probation floor");
}
#[tokio::test]
async fn retired_pathway_never_hot_swaps_again() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
for _ in 0..3 {
mem.record_replay_outcome(&pid, false).await.unwrap();
}
// Now record 10 successes to push success_rate well above 0.80.
// Pathway is still retired — retirement is sticky by design, to
// prevent oscillation on noise.
for _ in 0..10 {
mem.record_replay_outcome(&pid, true).await.unwrap();
}
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "retirement must be sticky");
}
#[tokio::test]
async fn pathway_vec_differs_for_different_models() {
// Two pathways with same fingerprint but different ladder
// models should have different embeddings so the similarity
// gate can discriminate. This is what enables narrow fingerprint
// + similarity-vec to cluster correctly.
let a = mk_trace("a", true, 5, 5);
let mut b = a.clone();
b.ladder_attempts[0].model = "kimi-k2:1t".into();
b.pathway_vec = build_pathway_vec(&b);
let sim = cosine(&a.pathway_vec, &b.pathway_vec);
assert!(sim < 1.0, "different models → different embeddings");
assert!(sim > 0.5, "shared fingerprint → embeddings still related");
}
}

View File

@ -13,7 +13,7 @@ use std::sync::Arc;
use aibridge::client::{AiClient, EmbedRequest, GenerateRequest};
use catalogd::registry::Registry as CatalogRegistry;
use storaged::registry::BucketRegistry;
use crate::{agent, autotune, chunker, embedding_cache, harness, hnsw, index_registry, jobs, lance_backend, pathway_memory, playbook_memory, promotion, rag, refresh, search, store, supervisor, trial};
use tokio::sync::Semaphore;
#[derive(Clone)]
@ -55,6 +55,11 @@ pub struct VectorState {
/// and, when `use_playbook_memory` is set on /vectors/hybrid, boosts
/// workers that were actually filled in semantically-similar past ops.
pub playbook_memory: playbook_memory::PlaybookMemory,
/// Pathway memory — consensus-designed sidecar for full-context
/// backtracking + hot-swap of successful review pathways. See
/// crates/vectord/src/pathway_memory.rs for the design rationale
/// (10-probe N=3 ensemble, locked 2026-04-24).
pub pathway_memory: pathway_memory::PathwayMemory,
/// Serializes embed calls from seed_playbook_memory to avoid
/// concurrent socket collisions with the Python sidecar.
pub embed_semaphore: Arc<Semaphore>,
@ -137,6 +142,15 @@ pub fn router(state: VectorState) -> Router {
// Phase 45 slice 3 — doc drift detection + human re-admission.
.route("/playbook_memory/doc_drift/check/{id}", post(check_doc_drift))
.route("/playbook_memory/doc_drift/resolve/{id}", post(resolve_doc_drift))
// Pathway memory — consensus-designed sidecar (2026-04-24).
// scrum_master_pipeline POSTs /pathway/insert at the end of each
// review, calls /pathway/query before running the ladder for a
// potential hot-swap, and posts /pathway/record_replay after a
// hot-swap succeeds or fails.
.route("/pathway/insert", post(pathway_insert))
.route("/pathway/query", post(pathway_query))
.route("/pathway/record_replay", post(pathway_record_replay))
.route("/pathway/stats", get(pathway_stats))
.with_state(state)
}
@@ -2833,6 +2847,73 @@ async fn lance_build_scalar_index(
}
}
// ─── Pathway memory handlers ──────────────────────────────────────────
//
// Thin wrappers around pathway_memory::PathwayMemory. HTTP surface is
// deliberately small — four endpoints cover the full lifecycle:
// insert at end-of-review, query before running the ladder,
// record_replay after a hot-swap, and stats for the VCP UI.
#[derive(Deserialize)]
struct PathwayQueryRequest {
task_class: String,
file_path: String,
signal_class: Option<String>,
query_vec: Vec<f32>,
}
async fn pathway_insert(
State(state): State<VectorState>,
Json(trace): Json<pathway_memory::PathwayTrace>,
) -> impl IntoResponse {
match state.pathway_memory.insert(trace).await {
Ok(()) => Ok(Json(json!({"ok": true}))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn pathway_query(
State(state): State<VectorState>,
Json(req): Json<PathwayQueryRequest>,
) -> impl IntoResponse {
let cand = state
.pathway_memory
.query_hot_swap(
&req.task_class,
&req.file_path,
req.signal_class.as_deref(),
&req.query_vec,
)
.await;
// 200 with null candidate means "no hot-swap"; this is a normal
// path, not an error — callers should proceed with the full ladder.
Json(json!({ "candidate": cand }))
}
#[derive(Deserialize)]
struct PathwayReplayRequest {
pathway_id: String,
succeeded: bool,
}
async fn pathway_record_replay(
State(state): State<VectorState>,
Json(req): Json<PathwayReplayRequest>,
) -> impl IntoResponse {
match state
.pathway_memory
.record_replay_outcome(&req.pathway_id, req.succeeded)
.await
{
Ok(()) => Ok(Json(json!({"ok": true}))),
Err(e) => Err((StatusCode::NOT_FOUND, e)),
}
}
async fn pathway_stats(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.pathway_memory.stats().await)
}
#[cfg(test)]
mod extractor_tests {
use super::*;


@@ -157,6 +157,158 @@ async function embedBatch(texts: string[]): Promise<number[][]> {
return (await r.json() as any).embeddings;
}
// ─── Pathway memory (2026-04-24 consensus design) ───────────────────
//
// Mirrors vectord/src/pathway_memory.rs. The bucket-hash vector MUST
// byte-match the Rust implementation so traces written from TypeScript
// are searchable against the same embedding space. Verified by running
// both implementations on the same input tokens and asserting matching
// bucket indices.
function filePrefix(path: string): string {
return path.split("/").slice(0, 2).join("/");
}
function computePathwayId(taskClass: string, filePath: string, signalClass: string | null): string {
const h = createHash("sha256");
h.update(taskClass);
h.update("|");
h.update(filePrefix(filePath));
h.update("|");
h.update(signalClass ?? "");
return h.digest("hex");
}
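The fingerprint's generalization-within-crate property can be sketched directly: two files under the same two-segment prefix hash to the same pathway_id. The paths and the "LOOPING" signal class below are illustrative, and the single-string concatenation is byte-equivalent to the chained `update` calls above (sha256 hashes the concatenated bytes either way).

```typescript
// Illustrative sketch: the fingerprint hashes the two-segment file
// PREFIX, not the full path, so files in the same crate collapse to one
// pathway_id. File paths and signal class here are hypothetical.
import { createHash } from "node:crypto";

function filePrefix(path: string): string {
  return path.split("/").slice(0, 2).join("/");
}

function pathwayId(taskClass: string, filePath: string, signalClass: string | null): string {
  // Byte-equivalent to chaining .update(taskClass).update("|")... above.
  return createHash("sha256")
    .update(`${taskClass}|${filePrefix(filePath)}|${signalClass ?? ""}`)
    .digest("hex");
}

// Same crate, different files: identical pathway_id.
const a = pathwayId("scrum_review", "crates/vectord/src/routes.rs", "LOOPING");
const b = pathwayId("scrum_review", "crates/vectord/src/pathway_memory.rs", "LOOPING");
// Different signal class: different pathway_id.
const c = pathwayId("scrum_review", "crates/vectord/src/routes.rs", null);
```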
// 32-bucket L2-normalized token hash. Same algorithm as Rust.
function buildPathwayVec(tokens: string[]): number[] {
const buckets = new Array(32).fill(0);
for (const t of tokens) {
const d = createHash("sha256").update(t, "utf8").digest();
const b1 = d[0] % 32;
const b2 = d[8] % 32;
buckets[b1] += 1;
buckets[b2] += 1;
}
let norm = 0;
for (const v of buckets) norm += v * v;
norm = Math.sqrt(norm);
if (norm > 0) for (let i = 0; i < buckets.length; i++) buckets[i] /= norm;
return buckets;
}
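Two properties make the cross-language byte-match verification meaningful, and both are cheap to check: the hash is deterministic (same tokens, same digest bytes, same buckets) and the output is unit-length. A self-contained sketch of the same algorithm:

```typescript
// Self-contained copy of the 32-bucket hash for a property check.
// Determinism plus L2 norm 1 are what the Rust/TypeScript byte-match
// verification relies on.
import { createHash } from "node:crypto";

function bucketVec(tokens: string[]): number[] {
  const buckets = new Array(32).fill(0);
  for (const t of tokens) {
    const d = createHash("sha256").update(t, "utf8").digest();
    buckets[d[0] % 32] += 1; // digest byte 0 picks the first bucket
    buckets[d[8] % 32] += 1; // digest byte 8 picks the second
  }
  const norm = Math.sqrt(buckets.reduce((s, v) => s + v * v, 0));
  return norm > 0 ? buckets.map(v => v / norm) : buckets;
}

const v1 = bucketVec(["scrum_review", "crates/vectord", "signal:LOOPING"]);
const v2 = bucketVec(["scrum_review", "crates/vectord", "signal:LOOPING"]);
```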
// Build the minimal query vector for a pre-ladder hot-swap check. We
// don't yet know the ladder attempts or KB chunks — the query vec is
// computed from what we CAN know up front: task/file/signal. This is
// a weaker embedding than the one computed at trace-insert time, but
// similarity still discriminates between task/file/signal combinations.
function buildQueryVec(taskClass: string, filePath: string, signalClass: string | null): number[] {
const tokens = [taskClass, filePath];
if (signalClass) tokens.push(`signal:${signalClass}`);
return buildPathwayVec(tokens);
}
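Because both the query vector and the stored `pathway_vec` are L2-normalized at build time, the ≥0.90 cosine gate reduces to a dot-product threshold. A sketch of the comparison (the authoritative gate runs server-side in `query_hot_swap`; this is for illustration only):

```typescript
// Sketch of the similarity gate. Valid as cosine only because both
// inputs are unit-length; the real check lives in Rust query_hot_swap.
const SIMILARITY_GATE = 0.90; // middle of the 0.85/0.95 consensus spread

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

function passesSimilarityGate(queryVec: number[], pathwayVec: number[]): boolean {
  return cosine(queryVec, pathwayVec) >= SIMILARITY_GATE;
}
```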
interface HotSwapCandidate {
pathway_id: string;
similarity: number;
replay_count: number;
success_rate: number;
recommended_rung: number;
recommended_model: string;
}
async function queryHotSwap(taskClass: string, filePath: string, signalClass: string | null): Promise<HotSwapCandidate | null> {
try {
const query_vec = buildQueryVec(taskClass, filePath, signalClass);
const r = await fetch(`${GATEWAY}/vectors/pathway/query`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({ task_class: taskClass, file_path: filePath, signal_class: signalClass, query_vec }),
signal: AbortSignal.timeout(5000),
});
if (!r.ok) return null;
const j = await r.json() as { candidate: HotSwapCandidate | null };
return j.candidate ?? null;
} catch {
// Pathway service unavailable → run full ladder. Hot-swap is
// always an optimization, never a correctness requirement.
return null;
}
}
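For reference, the probation and retirement gates the server applies before returning a candidate can be mirrored as a small predicate. `hotSwapEligible` below is a hypothetical helper with the consensus thresholds, not part of the wire protocol; the authoritative gate runs inside vectord's `query_hot_swap`:

```typescript
// Hypothetical client-side mirror of the server's probation gates:
// a candidate is hot-swappable only after >=3 replays at >=0.80
// success, and retirement is sticky (never un-retires).
const MIN_REPLAYS = 3;
const MIN_SUCCESS_RATE = 0.80;

interface CandidateGateInput {
  replay_count: number;
  success_rate: number;
  retired: boolean;
}

function hotSwapEligible(c: CandidateGateInput): boolean {
  if (c.retired) return false;                    // retirement is permanent
  if (c.replay_count < MIN_REPLAYS) return false; // still in probation
  return c.success_rate >= MIN_SUCCESS_RATE;
}
```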
interface LadderAttemptRec {
rung: number;
model: string;
latency_ms: number;
accepted: boolean;
reject_reason: string | null;
}
interface PathwayTracePayload {
pathway_id: string;
task_class: string;
file_path: string;
signal_class: string | null;
created_at: string;
ladder_attempts: LadderAttemptRec[];
kb_chunks: { source_doc: string; chunk_id: string; cosine_score: number; rank: number }[];
observer_signals: { class: string; priors: string[]; prior_iter_outcomes: string[] }[];
bridge_hits: { library: string; version: string }[];
sub_pipeline_calls: { pipeline: string; result_summary: string }[];
audit_consensus: { pass: boolean; models: string[]; disagreements: number } | null;
reducer_summary: string;
final_verdict: string;
pathway_vec: number[];
replay_count: number;
replays_succeeded: number;
retired: boolean;
}
async function writePathwayTrace(trace: PathwayTracePayload): Promise<void> {
try {
await fetch(`${GATEWAY}/vectors/pathway/insert`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify(trace),
signal: AbortSignal.timeout(10000),
});
} catch {
// Fire-and-forget: scrum runs shouldn't fail if pathway insert fails.
}
}
async function recordPathwayReplay(pathwayId: string, succeeded: boolean): Promise<void> {
try {
await fetch(`${GATEWAY}/vectors/pathway/record_replay`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({ pathway_id: pathwayId, succeeded }),
signal: AbortSignal.timeout(5000),
});
} catch {
// Fire-and-forget. Not critical.
}
}
// Deterministic signal_class lookup from scrum_reviews.jsonl history.
// First-time files get `null`. Files seen before get the signal class
// the observer assigned on their most-recent review (if any). Keeps the
// pathway fingerprint stable across iterations for LOOPING files.
async function lookupSignalClass(filePath: string): Promise<string | null> {
try {
const { readFile } = await import("node:fs/promises");
const raw = await readFile(SCRUM_REVIEWS_JSONL, "utf8").catch(() => "");
if (!raw) return null;
const lines = raw.trim().split("\n").reverse();
for (const line of lines) {
try {
const r = JSON.parse(line);
if (r.file === filePath && r.signal_class) return r.signal_class;
} catch {}
}
return null;
} catch { return null; }
}
async function chat(opts: {
provider: "ollama" | "ollama_cloud",
model: string,
@@ -389,32 +541,63 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
let acceptedModel = "";
let acceptedOn = 0;
// Pathway hot-swap pre-check. If a proven pathway exists for this
// (task, file_prefix, signal) combo with ≥3 replays at ≥80% success,
// skip the ladder and try its winning rung first. On success we
// record a positive replay; on failure we fall through to the full
// ladder and record a negative replay. Fire-and-forget — pathway
// service unavailable → null candidate → business as usual.
const signalClass = await lookupSignalClass(rel);
const taskClass = "scrum_review";
const hotSwap = await queryHotSwap(taskClass, rel, signalClass);
let hotSwapOrderedIndices: number[] | null = null;
if (hotSwap) {
// Reorder the ladder to try the recommended model first. Rung
// indices are preserved in the output so the trace still reflects
// the true ladder position the model sits at.
const recommendedIdx = LADDER.findIndex(r => r.model === hotSwap.recommended_model);
if (recommendedIdx >= 0) {
log(` 🔥 hot-swap candidate: ${hotSwap.recommended_model} (rung ${hotSwap.recommended_rung}, sim=${hotSwap.similarity.toFixed(3)}, success_rate=${hotSwap.success_rate.toFixed(2)}, ${hotSwap.replay_count} replays)`);
hotSwapOrderedIndices = [recommendedIdx, ...LADDER.map((_, i) => i).filter(i => i !== recommendedIdx)];
}
}
const ladderOrder = hotSwapOrderedIndices ?? LADDER.map((_, i) => i);
// Collect attempts for the pathway trace sidecar.
const pathwayAttempts: LadderAttemptRec[] = [];
for (let step = 0; step < MAX_ATTEMPTS; step++) {
const i = ladderOrder[step];
const n = step + 1;
const rung = LADDER[i];
const learning = history.length > 0
? `\n\n═══ PRIOR ATTEMPTS FAILED. Specific issues to fix: ═══\n${history.map(h => `Attempt ${h.n} (${h.model}, ${h.chars} chars): ${h.status}${h.error ?? "thin/unstructured answer"}`).join("\n")}\n═══`
: "";
log(`  attempt ${n}/${MAX_ATTEMPTS}: ${rung.provider}::${rung.model}${learning ? " [w/ learning]" : ""}`);
const attemptStarted = Date.now();
const r = await chat({
provider: rung.provider,
model: rung.model,
prompt: baseTask + learning,
max_tokens: 1500,
});
const attemptMs = Date.now() - attemptStarted;
if (r.error) {
history.push({ n, model: rung.model, status: "error", chars: 0, error: r.error.slice(0, 180) });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `error: ${r.error.slice(0, 100)}` });
log(`    ✗ error: ${r.error.slice(0, 80)}`);
continue;
}
if (!isAcceptable(r.content)) {
history.push({ n, model: rung.model, status: "thin", chars: r.content.length, error: `thin/unstructured (${r.content.length} chars)` });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `thin (${r.content.length} chars)` });
log(`    ✗ thin/unstructured (${r.content.length} chars)`);
continue;
}
history.push({ n, model: rung.model, status: "accepted", chars: r.content.length });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: true, reject_reason: null });
accepted = r.content;
acceptedModel = `${rung.provider}/${rung.model}`;
acceptedOn = n;
@@ -422,6 +605,15 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
break;
}
// Hot-swap bookkeeping: if we tried the recommended model first,
// report whether it worked so the pathway's success_rate updates.
if (hotSwap) {
const replaySucceeded = acceptedModel.endsWith(`/${hotSwap.recommended_model}`);
log(` pathway replay ${replaySucceeded ? "✓" : "✗"} (${hotSwap.pathway_id.slice(0, 12)}…)`);
// Fire and forget — don't await; observer can handle it.
recordPathwayReplay(hotSwap.pathway_id, replaySucceeded);
}
const review: FileReview = {
file: rel,
file_bytes: content.length,
@@ -599,6 +791,54 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
console.error(`[scrum] failed to append scrum_reviews.jsonl: ${(e as Error).message}`);
}
// Pathway trace sidecar (consensus-designed 2026-04-24). Captures
// FULL context (ladder attempts, KB chunks, observer signal, verdict)
// for similarity-based hot-swap in future iterations. First-review
// pathways start in probation (replay_count=0); they become
// hot-swappable only after ≥3 replays at ≥80% success.
try {
const pathwayTrace: PathwayTracePayload = {
pathway_id: computePathwayId(taskClass, rel, signalClass),
task_class: taskClass,
file_path: rel,
signal_class: signalClass,
created_at: row.reviewed_at,
ladder_attempts: pathwayAttempts,
kb_chunks: [
...topPrd.map((c, idx) => ({
source_doc: "PRD.md", chunk_id: `prd@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
})),
...topPlan.map((c, idx) => ({
source_doc: "cohesion_plan", chunk_id: `plan@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
})),
],
observer_signals: signalClass ? [{ class: signalClass, priors: [], prior_iter_outcomes: [] }] : [],
bridge_hits: [], // context7 not wired into scrum yet; empty for v1
sub_pipeline_calls: [], // LLM Team extract happens after this row; out of scope for v1
audit_consensus: null, // set by auditor's later N=3 pass, via /pathway/insert update
reducer_summary: accepted.slice(0, 4000),
final_verdict: verdict ?? "accepted",
// Vec built from the full attempts/chunks — richer than the
// query-time vector. The similarity gate will still discriminate
// between pathways with the same fingerprint but different
// ladder/KB profiles.
pathway_vec: buildPathwayVec([
taskClass,
rel,
...(signalClass ? [`signal:${signalClass}`] : []),
...pathwayAttempts.flatMap(a => [`rung:${a.rung}`, `model:${a.model}`, `accepted:${a.accepted}`]),
...topPrd.map(c => `kb:PRD.md`),
...topPlan.map(c => `kb:cohesion_plan`),
]),
replay_count: 0,
replays_succeeded: 0,
retired: false,
};
writePathwayTrace(pathwayTrace); // fire-and-forget
} catch (e) {
console.error(`[scrum] pathway trace failed: ${(e as Error).message}`);
}
// Close the scrum → observer loop (fix 2026-04-24). Architecture
// audit surfaced: observer ring had 2000 ops, 1999 from Langfuse,
// zero from scrum. Observer's analyzeErrors + PLAYBOOK_BUILDER loops
@@ -643,6 +883,20 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
critical_failures_count,
verified_components_count,
missing_components_count,
// Pathway fields: emitted on every review so the observer
// can build a full picture of hot-swap performance over time.
// `pathway_hot_swap_hit` flags whether the first rung tried in
// this review was a pathway recommendation rather than the default
// ladder top. `rungs_saved` quantifies the compute we avoided
// when a hot-swap landed — this is the value metric the VCP
// UI surfaces ("avg_rungs_saved_per_commit").
pathway_hot_swap_hit: hotSwap !== null,
pathway_id: hotSwap?.pathway_id ?? null,
pathway_similarity: hotSwap?.similarity ?? null,
pathway_success_rate: hotSwap?.success_rate ?? null,
rungs_saved: hotSwap && acceptedModel.endsWith(`/${hotSwap.recommended_model}`)
? Math.max(0, hotSwap.recommended_rung - 1)
: 0,
ts: row.reviewed_at,
}),
signal: AbortSignal.timeout(3000),


@@ -349,6 +349,44 @@ Bun.serve({
if (path === "/data/outcomes") return Response.json(await tailJsonl(`${KB}/outcomes.jsonl`, 30));
if (path === "/data/audit_facts") return Response.json(await tailJsonl(`${KB}/audit_facts.jsonl`, 30));
// Pathway memory — consensus-designed sidecar (2026-04-24). Two
// exposed metrics: reuse_rate (activity — is it firing?) and
// avg_rungs_saved_per_commit (value — is it earning its keep?).
// Round-3 consensus (qwen3.5:397b) pointed out that activity
// without value tells us nothing; the UI needs both to judge the
// health of the hot-swap learning loop.
if (path === "/data/pathway_stats") {
try {
const r = await fetch("http://localhost:3100/vectors/pathway/stats", { signal: AbortSignal.timeout(3000) });
if (!r.ok) return Response.json({ error: `vectord ${r.status}`, stats: null });
const stats = await r.json();
// Tail recent scrum events to compute avg_rungs_saved_per_commit
// (a committed review = any row in scrum_reviews.jsonl; rungs_saved
// only populates when pathway memory fired AND the recommended
// model actually produced the accept).
const reviews = await tailJsonl(`${KB}/scrum_reviews.jsonl`, 200);
let totalCommits = 0;
let totalRungsSaved = 0;
let hotSwapHits = 0;
for (const row of reviews) {
totalCommits++;
if (row.pathway_hot_swap_hit) hotSwapHits++;
if (typeof row.rungs_saved === "number") totalRungsSaved += row.rungs_saved;
}
return Response.json({
stats,
scrum_window: {
reviews: totalCommits,
hot_swap_hits: hotSwapHits,
pathway_reuse_rate: totalCommits ? hotSwapHits / totalCommits : 0,
avg_rungs_saved_per_commit: totalCommits ? totalRungsSaved / totalCommits : 0,
},
});
} catch (e) {
return Response.json({ error: (e as Error).message, stats: null });
}
}
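The window math above can be restated as a pure function, which makes the two denominators explicit: every review row counts as a commit, and misses contribute 0 to rungs saved. A testable sketch (the endpoint computes the same thing inline):

```typescript
// Pure-function restatement of the scrum_window aggregation, split out
// so the reuse-rate and rungs-saved math is checkable without the
// vectord fetch. Field names match the rows scrum writes.
interface ReviewRow {
  pathway_hot_swap_hit?: boolean;
  rungs_saved?: number;
}

function scrumWindow(reviews: ReviewRow[]) {
  let hits = 0;
  let saved = 0;
  for (const row of reviews) {
    if (row.pathway_hot_swap_hit) hits++;
    if (typeof row.rungs_saved === "number") saved += row.rungs_saved;
  }
  const n = reviews.length;
  return {
    reviews: n,
    hot_swap_hits: hits,
    pathway_reuse_rate: n ? hits / n : 0,
    avg_rungs_saved_per_commit: n ? saved / n : 0,
  };
}
```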
if (path.startsWith("/data/file/")) {
const relpath = decodeURIComponent(path.slice("/data/file/".length));
return Response.json(await fileHistory(relpath));


@@ -589,6 +589,37 @@ function metricBox(label, big, kind, opts = {}) {
function drawMetrics() {
const grid = document.getElementById("metric-grid");
clear(grid);
// Kick off pathway fetch in parallel; render when it resolves so the
// rest of the metrics grid appears immediately. The cards append to
// the grid after the synchronous block below — they'll show up at
// the bottom of the grid within a tick of first render.
fetch("/data/pathway_stats").then(r => r.ok ? r.json() : null).then(j => {
if (!j || !j.stats) return;
const s = j.stats;
const w = j.scrum_window ?? {};
// Activity metric — is the hot-swap firing at all?
grid.append(metricBox("pathway reuse rate", `${Math.round((w.pathway_reuse_rate ?? 0) * 100)}%`,
(w.pathway_reuse_rate ?? 0) > 0.1 ? "good" : (w.pathway_reuse_rate ?? 0) > 0 ? "warn" : "bad", {
explain: "% of recent reviews where a pathway hot-swap fired (narrow fingerprint match + ≥3 replays at ≥0.80 success rate + ≥0.90 similarity + audit gate: explicit FAIL blocks, null allowed during bootstrap).",
source: `scrum_reviews.jsonl .pathway_hot_swap_hit over last ${w.reviews ?? 0} reviews (${w.hot_swap_hits ?? 0} hits)`,
good: "≥10% sustained = index earning its keep. <10% over many iters = fingerprint too narrow or probation too strict. 0% on fresh install is expected (no replays yet).",
}));
// Value metric — how much compute did hot-swap actually save?
const saved = w.avg_rungs_saved_per_commit ?? 0;
grid.append(metricBox("avg rungs saved", saved.toFixed(2),
saved >= 1 ? "good" : saved > 0 ? "warn" : "bad", {
explain: "Average ladder rungs skipped per committed review by hot-swap. Rungs_saved = recommended_rung - 1 when the recommended model succeeded (otherwise 0).",
source: "scrum_reviews.jsonl .rungs_saved averaged",
good: "Every 1.0 here ≈ one less model call per review. At 21 files/iter, 1.0 saved = 21 cloud calls avoided. Value only counts when the replay actually succeeded.",
}));
// Stability metric — retired pathways indicate the learning loop is correcting itself.
grid.append(metricBox("pathways tracked", String(s.total_pathways),
s.total_pathways > 0 ? "good" : "warn", {
explain: `Total pathway traces stored. ${s.retired} retired (below 0.80 success after ≥3 replays). ${s.with_audit_pass} audit-passed, eligible for hot-swap probation.`,
source: "/vectors/pathway/stats",
good: `Grows monotonically with scrum runs. Retired=${s.retired} is HEALTHY — it means the learning loop is pruning pathways that stopped working. replay_success_rate=${(s.replay_success_rate*100).toFixed(0)}% aggregates all historical replays.`,
}));
}).catch(() => {});
const byTier = { auto:0, dry_run:0, simulation:0, block:0, unknown:0 };
state.reviews.forEach(r => { const t = r.gradient_tier ?? "unknown"; if (byTier[t] != null) byTier[t]++; });
const total = state.reviews.length || 1;