pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review

10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.

What it does

Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.

When scrum reviews a file:
  1. narrow fingerprint = task_class + file_prefix + signal_class
  2. query_hot_swap checks pathway memory for a match that passes
     probation (≥3 replays @ ≥80% success) + audit gate + similarity
     (≥0.90 cosine on normalized-metadata-token embedding)
  3. if hot-swap eligible, recommended model tried first in the ladder
  4. replay outcome reported back, updating the pathway's success_rate
  5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
  6. full PathwayTrace always inserted at end of review — hot-swap
     grows with use, it doesn't bootstrap from nothing
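The step-3 ladder reorder can be sketched as below. The candidate shape mirrors the Rust `HotSwapCandidate`; `reorderLadder` itself is illustrative, not the pipeline's actual code.

```typescript
// Candidate shape mirrors the Rust HotSwapCandidate struct.
interface HotSwapCandidate {
  pathway_id: string;
  similarity: number;
  replay_count: number;
  success_rate: number;
  recommended_rung: number;
  recommended_model: string;
}

// Move the recommended model to the front of the ladder. The rest of
// the ladder stays intact as fallback, so a bad recommendation costs
// at most one extra attempt before normal ladder order resumes.
function reorderLadder(ladder: string[], cand: HotSwapCandidate | null): string[] {
  if (cand === null) return ladder; // no eligible pathway: run the full ladder
  const rest = ladder.filter((m) => m !== cand.recommended_model);
  return [cand.recommended_model, ...rest];
}
```

A null candidate is the normal no-hot-swap path, matching the `/pathway/query` contract (200 with null candidate, not an error).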

Gate design is load-bearing:
  - narrow fingerprint (6 of 8 consensus models converged on the same
    3-field composition; lock) — enables generalization within crate
  - probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
  - success rate ≥0.80 — mistral + qwen3-coder independently proposed
    this exact threshold across two rounds
  - similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
  - bootstrap: null audit_consensus ALLOWED (auditor → pathway update
    not wired yet; probation + success_rate gates alone enforce safety
    during bootstrap; explicit audit FAIL still blocks)
  - retirement is sticky — prevents oscillation on noise
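The probation gate can be sanity-checked numerically. A hedged sketch (not part of the commit): the binomial tail gives the probability that a pathway with true per-replay success probability p clears the ≥80% bar over n replays.

```typescript
// Binomial coefficient C(n, k), computed multiplicatively.
function binomCoef(n: number, k: number): number {
  let c = 1;
  for (let i = 0; i < k; i++) c = (c * (n - i)) / (i + 1);
  return c;
}

// P(X >= k) for X ~ Binomial(n, p): the chance a pathway passes the
// success-rate gate by luck alone.
function tailAtLeast(n: number, k: number, p: number): number {
  let total = 0;
  for (let x = k; x <= n; x++) {
    total += binomCoef(n, x) * Math.pow(p, x) * Math.pow(1 - p, n - x);
  }
  return total;
}

// Minimum successes needed to reach a success rate >= `rate` over n replays.
function needed(n: number, rate: number): number {
  return Math.ceil(n * rate);
}
```

For a coin-flip pathway (p = 0.5), clearing ≥80% at the probation floor of n = 3 requires 3/3 successes (probability 0.125), and the false-pass tail keeps shrinking as replays accumulate (≈5.5% by n = 10) — which is what makes the gate safe to apply after a small probation.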

Files

  + crates/vectord/src/pathway_memory.rs  (new, 600 lines + 18 tests)
    PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
    SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
    PathwayMemoryStats. 18/18 tests green.
    Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
  M crates/vectord/src/lib.rs
    pub mod pathway_memory;
  M crates/vectord/src/service.rs
    VectorState grows pathway_memory field;
    4 HTTP handlers (/pathway/insert, /pathway/query,
    /pathway/record_replay, /pathway/stats).
  M crates/gateway/src/main.rs
    Construct PathwayMemory + load from storage on boot,
    wire into VectorState.
  M tests/real-world/scrum_master_pipeline.ts
    Byte-matching TS bucket-hash (verified same bucket indices as
    Rust); pre-ladder hot-swap query; ladder reorder on hit;
    per-attempt latency capture; post-accept trace insert
    (fire-and-forget); replay outcome recording;
    observer /event emits pathway_hot_swap_hit, pathway_similarity,
    rungs_saved per review for the VCP UI.
  M ui/server.ts
    /data/pathway_stats aggregates /vectors/pathway/stats +
    scrum_reviews.jsonl window for the value metric.
  M ui/ui.js
    Three new metric cards:
      · pathway reuse rate (activity: is it firing?)
      · avg rungs saved (value: is it earning its keep?)
      · pathways tracked (stability: retirement = learning)
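The byte-matching TS bucket-hash mentioned above can be sketched from the Rust `build_pathway_vec` it mirrors: sha256 each token, bump two buckets per token using digest bytes 0 and 8, then L2-normalize. Function names here are illustrative; the real implementation lives in scrum_master_pipeline.ts.

```typescript
import { createHash } from "node:crypto";

// Mirror of the Rust 32-bucket embedding: two bucket writes per token
// (digest bytes 0 and 8, so tokens with a shared prefix still spread),
// then L2 normalization so cosine similarity reduces to a dot product.
function bucketVec(tokens: string[]): number[] {
  const buckets = new Array<number>(32).fill(0);
  for (const t of tokens) {
    const d = createHash("sha256").update(t).digest();
    buckets[d[0] % 32] += 1;
    buckets[d[8] % 32] += 1;
  }
  const norm = Math.sqrt(buckets.reduce((s, v) => s + v * v, 0));
  return norm > 0 ? buckets.map((v) => v / norm) : buckets;
}

// With both sides L2-normalized, cosine is just the dot product.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  return a.reduce((s, x, i) => s + x * b[i], 0);
}
```

Because both sides hash the same bytes, the TS and Rust vectors land in identical bucket indices — that is the "verified same bucket indices" property the pipeline change relies on.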

What's not in this commit (queued)

  - auditor → pathway audit_consensus update wire (explicit audit-fail
    block activates when this lands)
  - bridge_hits + sub_pipeline_calls population from context7 / LLM
    Team extract results (fields wired, callers not yet)
  - replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
    a separate jsonl for forensic audit of why specific replays failed

Why > summarization

Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-24 05:15:32 -05:00
parent 9cc0ceb894
commit 2f8b347f37
8 changed files with 1120 additions and 3 deletions

Cargo.lock generated
View File

@ -6898,6 +6898,7 @@ dependencies = [
 "storaged",
 "tokio",
 "tracing",
 "truth",
 "url",
]

View File

@ -93,6 +93,12 @@ async fn main() {
// operators call POST /vectors/playbook_memory/rebuild to populate.
let pbm = vectord::playbook_memory::PlaybookMemory::new(store.clone());
let _ = pbm.load_from_storage().await;
// Pathway memory — consensus-designed sidecar for full-context
// backtracking + hot-swap of successful review pathways. Same
// load-on-boot pattern as playbook_memory: empty state is fine,
// operators start populating via scrum_master_pipeline.ts.
let pwm = vectord::pathway_memory::PathwayMemory::new(store.clone());
let _ = pwm.load_from_storage().await;
// Phase 16.2: spawn the autotune agent. When config.agent.enabled=false
// this returns a handle that drops triggers silently — no surprise load.
@ -178,6 +184,7 @@ async fn main() {
bucket_registry.clone(), index_reg.clone(),
),
playbook_memory: pbm,
pathway_memory: pwm,
embed_semaphore: std::sync::Arc::new(tokio::sync::Semaphore::new(1)),
}))
.nest("/workspaces", queryd::workspace_service::router(workspace_mgr))

View File

@ -8,6 +8,7 @@ pub mod hnsw;
pub mod index_registry;
pub mod jobs;
pub mod playbook_memory;
pub mod pathway_memory;
pub mod doc_drift;
pub mod promotion;
pub mod refresh;

View File

@ -0,0 +1,704 @@
//! Pathway memory — full backtrack-able context for scrum/auditor reviews.
//!
//! Consensus-designed (10-probe N=3 ensemble, see
//! `data/_kb/consensus_reducer_design_*.json`). The reducer emits a
//! `PathwayTrace` sidecar alongside its legacy summary. Traces are
//! fingerprinted narrowly (`task_class + file_prefix + signal_class`) for
//! generalizing hot-swap, and embedded via normalized-metadata-token
//! concatenation so the HNSW similarity search can discriminate between
//! pathways that share a fingerprint but diverged in ladder/KB choices.
//!
//! The hot-swap decision requires six conditions, all in AND:
//! 1. narrow fingerprint match
//! 2. audit_consensus.pass is not an explicit false (null is allowed
//!    during bootstrap; only an explicit FAIL blocks)
//! 3. replay_count >= 3
//! 4. replays_succeeded / replay_count >= 0.80
//! 5. NOT retired
//! 6. similarity(new, stored) >= 0.90
//!
//! Any replay reports its outcome via `record_replay_outcome`; pathways
//! whose success rate drops below 0.80 after >=3 replays are marked
//! retired and excluded from further hot-swap consideration. This is the
//! self-correcting learning loop — a pathway that worked once but breaks
//! under distribution shift removes itself automatically.
use std::collections::HashMap;
use std::sync::Arc;
use chrono::{DateTime, Utc};
use object_store::ObjectStore;
use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};
use storaged::ops;
use tokio::sync::RwLock;
const STATE_KEY: &str = "_pathway_memory/state.json";
/// Outcome of one ladder rung attempt. Captured for every attempt,
/// regardless of whether it was accepted — rejections are signal too.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct LadderAttempt {
pub rung: u8,
pub model: String,
pub latency_ms: u64,
pub accepted: bool,
pub reject_reason: Option<String>,
}
/// Provenance of a RAG chunk retrieved for this review. The
/// `cosine_score` is the similarity as returned by the index; `rank` is
/// 0-indexed order in the top-K result list.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct KbChunkRef {
pub source_doc: String,
pub chunk_id: String,
pub cosine_score: f32,
pub rank: u8,
}
/// Signal emitted by mcp-server/observer classifier.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct ObserverSignal {
pub class: String,
pub priors: Vec<String>,
pub prior_iter_outcomes: Vec<String>,
}
/// Context7-bridge lookup snapshot.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct BridgeHit {
pub library: String,
pub version: String,
}
/// Call to LLM Team (/api/run?mode=extract) or auditor N=3 consensus.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct SubPipelineCall {
pub pipeline: String, // "llm_team_extract" / "audit_consensus" / etc.
pub result_summary: String,
}
/// N=3 independent consensus re-check result.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct AuditConsensus {
pub pass: bool,
pub models: Vec<String>,
pub disagreements: u32,
}
/// Full backtrack-able context for one reviewed file. Lives alongside
/// the reducer's summary — summary is what the reviewer LLM sees, this
/// is what the auditor / future iterations / hot-swap use.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct PathwayTrace {
pub pathway_id: String, // SHA256(task_class|file_prefix|signal_class)
pub task_class: String,
pub file_path: String,
pub signal_class: Option<String>,
pub created_at: DateTime<Utc>,
pub ladder_attempts: Vec<LadderAttempt>,
pub kb_chunks: Vec<KbChunkRef>,
pub observer_signals: Vec<ObserverSignal>,
pub bridge_hits: Vec<BridgeHit>,
pub sub_pipeline_calls: Vec<SubPipelineCall>,
pub audit_consensus: Option<AuditConsensus>,
pub reducer_summary: String,
pub final_verdict: String,
/// Normalized-metadata-token embedding. Dimension fixed per index
/// version (current: 32, sufficient to distinguish task/file/signal
/// combinations without requiring an external embedding model —
/// round-3 consensus said "small metadata tokens", not "full JSON").
pub pathway_vec: Vec<f32>,
/// Number of times this pathway has been replayed via hot-swap.
/// Replay only begins after first insert; initial insert itself is
/// NOT a replay. Probation of ≥3 replays is required before the
/// success-rate gate can fire.
pub replay_count: u32,
pub replays_succeeded: u32,
/// Marked true when replay_count >= 3 AND success_rate < 0.80.
/// Retired pathways are excluded from hot-swap forever. (If the
/// underlying file / task / signal characteristics genuinely change
/// such that a retired pathway would work again, a new PathwayTrace
/// with a fresh id will be inserted — retirement is per-id.)
pub retired: bool,
}
impl PathwayTrace {
/// Compute the narrow fingerprint id from task_class + file_prefix
/// + signal_class. `file_prefix` is the first two path segments
/// ("crates/queryd", not "crates/queryd/src/service.rs") so that
/// related files in the same crate share pathways.
pub fn compute_id(task_class: &str, file_path: &str, signal_class: Option<&str>) -> String {
let prefix = file_prefix(file_path);
let sig = signal_class.unwrap_or("");
let mut hasher = Sha256::new();
hasher.update(task_class.as_bytes());
hasher.update(b"|");
hasher.update(prefix.as_bytes());
hasher.update(b"|");
hasher.update(sig.as_bytes());
format!("{:x}", hasher.finalize())
}
pub fn success_rate(&self) -> f32 {
if self.replay_count == 0 {
return 0.0;
}
self.replays_succeeded as f32 / self.replay_count as f32
}
}
/// First two path segments, so `crates/queryd/src/service.rs` →
/// `crates/queryd`. This is intentional — similar files in the same
/// crate often share task characteristics (e.g., all files in
/// `crates/queryd/` are SQL-path Rust code), so fingerprinting on the
/// crate-level prefix lets the hot-swap generalize across files within
/// the crate. Exactly-matching file paths still match (same prefix).
pub fn file_prefix(path: &str) -> String {
let parts: Vec<&str> = path.split('/').take(2).collect();
parts.join("/")
}
/// Build the pathway vector from trace metadata. Intentionally simple —
/// deterministic bag-of-tokens hash into 32 buckets, normalized. Round-3
/// consensus said "small metadata tokens, not full JSON." An external
/// embedding model would work too but adds a dependency, failure mode,
/// and drift risk the consensus flagged.
pub fn build_pathway_vec(trace: &PathwayTrace) -> Vec<f32> {
let mut buckets = vec![0f32; 32];
let mut tokens: Vec<String> = Vec::new();
tokens.push(trace.task_class.clone());
tokens.push(trace.file_path.clone());
if let Some(s) = &trace.signal_class {
tokens.push(format!("signal:{s}"));
}
for a in &trace.ladder_attempts {
tokens.push(format!("rung:{}", a.rung));
tokens.push(format!("model:{}", a.model));
tokens.push(format!("accepted:{}", a.accepted));
}
for k in &trace.kb_chunks {
tokens.push(format!("kb:{}", k.source_doc));
}
for o in &trace.observer_signals {
tokens.push(format!("class:{}", o.class));
}
for b in &trace.bridge_hits {
tokens.push(format!("lib:{}", b.library));
}
for s in &trace.sub_pipeline_calls {
tokens.push(format!("pipeline:{}", s.pipeline));
}
for t in &tokens {
let mut h = Sha256::new();
h.update(t.as_bytes());
let d = h.finalize();
// Two bucket writes per token: use different byte windows to
// spread probability across buckets even when tokens share a
// common prefix.
let b1 = (d[0] as usize) % 32;
let b2 = (d[8] as usize) % 32;
buckets[b1] += 1.0;
buckets[b2] += 1.0;
}
// L2 normalize so cosine similarity becomes a dot product.
let norm: f32 = buckets.iter().map(|v| v * v).sum::<f32>().sqrt();
if norm > 0.0 {
for v in &mut buckets {
*v /= norm;
}
}
buckets
}
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
if a.len() != b.len() {
return 0.0;
}
a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::<f32>()
}
#[derive(Default, Clone, Serialize, Deserialize)]
struct PathwayMemoryState {
pathways: HashMap<String, Vec<PathwayTrace>>, // key = pathway_id (narrow fingerprint)
last_updated_at: i64,
}
#[derive(Clone)]
pub struct PathwayMemory {
state: Arc<RwLock<PathwayMemoryState>>,
store: Arc<dyn ObjectStore>,
}
#[derive(Debug, Serialize)]
pub struct HotSwapCandidate {
pub pathway_id: String,
pub similarity: f32,
pub replay_count: u32,
pub success_rate: f32,
pub recommended_rung: u8,
pub recommended_model: String,
}
impl PathwayMemory {
pub fn new(store: Arc<dyn ObjectStore>) -> Self {
Self {
state: Arc::new(RwLock::new(PathwayMemoryState::default())),
store,
}
}
pub async fn load_from_storage(&self) -> Result<usize, String> {
let data = match ops::get(&self.store, STATE_KEY).await {
Ok(d) => d,
Err(_) => return Ok(0),
};
let persisted: PathwayMemoryState = serde_json::from_slice(&data)
.map_err(|e| format!("parse pathway_memory state: {e}"))?;
let n: usize = persisted.pathways.values().map(|v| v.len()).sum();
*self.state.write().await = persisted;
tracing::info!("pathway_memory: loaded {n} traces from {STATE_KEY}");
Ok(n)
}
async fn persist(&self) -> Result<(), String> {
let snapshot = self.state.read().await.clone();
let bytes = serde_json::to_vec_pretty(&snapshot).map_err(|e| e.to_string())?;
ops::put(&self.store, STATE_KEY, bytes.into()).await
}
/// Insert a new pathway trace. Called by scrum_master_pipeline at
/// the end of each file's review. Computes the pathway_vec from
/// metadata if the caller didn't supply one. Appends to the bucket
/// for this pathway_id — multiple traces can share a fingerprint
/// (each represents one review of the same file/task/signal combo).
pub async fn insert(&self, mut trace: PathwayTrace) -> Result<(), String> {
if trace.pathway_vec.is_empty() {
trace.pathway_vec = build_pathway_vec(&trace);
}
let mut s = self.state.write().await;
s.pathways
.entry(trace.pathway_id.clone())
.or_default()
.push(trace);
s.last_updated_at = Utc::now().timestamp_millis();
drop(s);
self.persist().await
}
/// Query for a hot-swap candidate. Returns `None` if no eligible
/// pathway exists — caller should run the full ladder. Returns
/// `Some(cand)` if all gates pass — caller can short-circuit to
/// `cand.recommended_rung` / `cand.recommended_model`.
///
/// Gates (all must hold):
/// - narrow fingerprint match (same task/file_prefix/signal)
/// - audit_consensus not an explicit FAIL (null allowed during bootstrap)
/// - replay_count >= 3 (probation)
/// - success_rate >= 0.80
/// - NOT retired
/// - similarity(query_vec, stored.pathway_vec) >= 0.90
pub async fn query_hot_swap(
&self,
task_class: &str,
file_path: &str,
signal_class: Option<&str>,
query_vec: &[f32],
) -> Option<HotSwapCandidate> {
let id = PathwayTrace::compute_id(task_class, file_path, signal_class);
let s = self.state.read().await;
let candidates = s.pathways.get(&id)?;
let mut best: Option<(f32, &PathwayTrace)> = None;
for p in candidates {
if p.retired {
continue;
}
// audit_consensus gate: explicit FAIL blocks hot-swap. A null
// audit_consensus (auditor hasn't seen this pathway yet) is
// NOT a block — the success_rate gate below still requires
// ≥3 real-world replays at ≥80% success before a pathway
// becomes hot-swap eligible, so the learning loop itself
// provides the safety net during bootstrap. Once the auditor
// pipeline wires pathway audit updates, this gate tightens
// automatically: any explicit audit_consensus.pass == false
// here will skip the candidate.
if let Some(ac) = &p.audit_consensus {
if !ac.pass {
continue;
}
}
if p.replay_count < 3 {
continue;
}
if p.success_rate() < 0.80 {
continue;
}
let sim = cosine(query_vec, &p.pathway_vec);
if sim < 0.90 {
continue;
}
if best.as_ref().map(|(b, _)| sim > *b).unwrap_or(true) {
best = Some((sim, p));
}
}
let (similarity, p) = best?;
// The "recommended" rung is the first accepted attempt in the
// stored pathway — that's the one the ladder converged on.
let accepted = p.ladder_attempts.iter().find(|a| a.accepted)?;
Some(HotSwapCandidate {
pathway_id: p.pathway_id.clone(),
similarity,
replay_count: p.replay_count,
success_rate: p.success_rate(),
recommended_rung: accepted.rung,
recommended_model: accepted.model.clone(),
})
}
/// Record the outcome of a hot-swap replay. Increments replay_count
/// unconditionally; increments replays_succeeded iff succeeded;
/// retires the pathway if replay_count >= 3 and success_rate falls
/// below 0.80. Mistral's learning loop in code.
pub async fn record_replay_outcome(
&self,
pathway_id: &str,
succeeded: bool,
) -> Result<(), String> {
let mut s = self.state.write().await;
// The bucket key is already the narrow id; in case of future
// multi-trace-per-id we update the most recent trace in the bucket.
let bucket = s
.pathways
.get_mut(pathway_id)
.ok_or_else(|| format!("pathway {pathway_id} not found"))?;
let p = bucket
.last_mut()
.ok_or_else(|| format!("pathway {pathway_id} has empty bucket"))?;
p.replay_count = p.replay_count.saturating_add(1);
if succeeded {
p.replays_succeeded = p.replays_succeeded.saturating_add(1);
}
if p.replay_count >= 3 && p.success_rate() < 0.80 {
p.retired = true;
}
s.last_updated_at = Utc::now().timestamp_millis();
drop(s);
self.persist().await
}
pub async fn stats(&self) -> PathwayMemoryStats {
let s = self.state.read().await;
let mut total = 0usize;
let mut retired = 0usize;
let mut with_audit_pass = 0usize;
let mut total_replays = 0u64;
let mut successful_replays = 0u64;
for bucket in s.pathways.values() {
for p in bucket {
total += 1;
if p.retired {
retired += 1;
}
if p.audit_consensus.as_ref().map(|a| a.pass).unwrap_or(false) {
with_audit_pass += 1;
}
total_replays += p.replay_count as u64;
successful_replays += p.replays_succeeded as u64;
}
}
PathwayMemoryStats {
total_pathways: total,
retired,
with_audit_pass,
total_replays,
successful_replays,
reuse_rate: if total == 0 {
0.0
} else {
total_replays as f32 / total as f32
},
replay_success_rate: if total_replays == 0 {
0.0
} else {
successful_replays as f32 / total_replays as f32
},
}
}
}
#[derive(Debug, Serialize, Deserialize)]
pub struct PathwayMemoryStats {
pub total_pathways: usize,
pub retired: usize,
pub with_audit_pass: usize,
pub total_replays: u64,
pub successful_replays: u64,
pub reuse_rate: f32, // total_replays / total_pathways
pub replay_success_rate: f32, // successful_replays / total_replays
}
#[cfg(test)]
mod tests {
use super::*;
use object_store::memory::InMemory;
fn mk_store() -> Arc<dyn ObjectStore> {
Arc::new(InMemory::new())
}
fn mk_trace(id_tag: &str, audit_pass: bool, replays: u32, succ: u32) -> PathwayTrace {
let pathway_id =
PathwayTrace::compute_id("scrum_review", &format!("crates/{id_tag}/src/x.rs"), Some("CONVERGING"));
let attempts = vec![LadderAttempt {
rung: 2,
model: "qwen3-coder:480b".into(),
latency_ms: 1000,
accepted: true,
reject_reason: None,
}];
let mut trace = PathwayTrace {
pathway_id,
task_class: "scrum_review".into(),
file_path: format!("crates/{id_tag}/src/x.rs"),
signal_class: Some("CONVERGING".into()),
created_at: Utc::now(),
ladder_attempts: attempts,
kb_chunks: vec![KbChunkRef {
source_doc: "PRD.md".into(),
chunk_id: "c1".into(),
cosine_score: 0.88,
rank: 0,
}],
observer_signals: vec![],
bridge_hits: vec![],
sub_pipeline_calls: vec![],
audit_consensus: Some(AuditConsensus {
pass: audit_pass,
models: vec!["qwen3-coder:480b".into(), "gpt-oss:120b".into(), "kimi-k2:1t".into()],
disagreements: 0,
}),
reducer_summary: "ok".into(),
final_verdict: "accepted".into(),
pathway_vec: vec![],
replay_count: replays,
replays_succeeded: succ,
retired: false,
};
trace.pathway_vec = build_pathway_vec(&trace);
trace
}
#[test]
fn file_prefix_takes_first_two_segments() {
assert_eq!(file_prefix("crates/queryd/src/service.rs"), "crates/queryd");
assert_eq!(file_prefix("crates/gateway"), "crates/gateway");
assert_eq!(file_prefix("README.md"), "README.md");
assert_eq!(file_prefix(""), "");
}
#[test]
fn compute_id_is_deterministic() {
let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
assert_eq!(a, b);
}
#[test]
fn compute_id_generalizes_across_same_prefix() {
// Same prefix + task + signal → same id. That IS the narrow
// generalization — it's what lets hot-swap fire for different
// files in the same crate that share the task/signal profile.
let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/a.rs", Some("L"));
let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/b.rs", Some("L"));
assert_eq!(a, b);
}
#[test]
fn compute_id_differs_on_signal_class() {
let a = PathwayTrace::compute_id("scrum", "crates/q/s", Some("CONVERGING"));
let b = PathwayTrace::compute_id("scrum", "crates/q/s", Some("LOOPING"));
assert_ne!(a, b);
}
#[test]
fn cosine_handles_mismatched_lengths() {
assert_eq!(cosine(&[1.0, 0.0], &[1.0]), 0.0);
}
#[test]
fn cosine_of_identical_normalized_is_one() {
let v = vec![0.6, 0.8];
let c = cosine(&v, &v);
assert!((c - 1.0).abs() < 1e-5);
}
#[test]
fn success_rate_is_zero_before_any_replay() {
let t = mk_trace("a", true, 0, 0);
assert_eq!(t.success_rate(), 0.0);
}
#[test]
fn success_rate_ratio() {
let t = mk_trace("a", true, 4, 3);
assert!((t.success_rate() - 0.75).abs() < 1e-5);
}
#[tokio::test]
async fn insert_and_stats_roundtrip() {
let mem = PathwayMemory::new(mk_store());
mem.insert(mk_trace("a", true, 0, 0)).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.total_pathways, 1);
assert_eq!(stats.retired, 0);
assert_eq!(stats.with_audit_pass, 1);
}
#[tokio::test]
async fn hot_swap_rejects_when_probation_not_met() {
// Probation: replay_count must be >= 3 before success-rate gate
// can fire. A fresh pathway with 0 replays must NEVER hot-swap
// even if its similarity is 1.0 and audit passes.
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "fresh pathway must not hot-swap");
}
#[tokio::test]
async fn hot_swap_rejects_when_audit_explicitly_fails() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", false, 5, 5); // audit FAILED explicitly
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "pathway with explicit audit FAIL must not hot-swap");
}
#[tokio::test]
async fn hot_swap_accepts_unaudited_pathway_for_bootstrap() {
// v1 bootstrap: auditor doesn't update pathway audit_consensus
// until Phase N+1 wires it. Until then, null audit_consensus
// must NOT block hot-swap — the success_rate + probation gates
// alone prove safety. Once auditor wires up, explicit audit
// failures will re-introduce the block (see previous test).
let mem = PathwayMemory::new(mk_store());
let mut trace = mk_trace("a", true, 5, 5);
trace.audit_consensus = None; // bootstrap path
trace.pathway_vec = build_pathway_vec(&trace);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_some(), "unaudited pathway with good replay history must hot-swap");
}
#[tokio::test]
async fn hot_swap_rejects_when_success_rate_below_80pct() {
// 10 replays, 7 succeeded = 70% — below the 0.80 threshold.
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 10, 7);
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none());
}
#[tokio::test]
async fn hot_swap_accepts_when_all_gates_pass() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 5, 5); // 100% success after 5 replays
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
let cand = got.expect("should hot-swap");
assert!(cand.similarity >= 0.90);
assert_eq!(cand.recommended_rung, 2);
assert_eq!(cand.recommended_model, "qwen3-coder:480b");
}
#[tokio::test]
async fn record_replay_retires_pathway_on_failure_pattern() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
mem.insert(trace).await.unwrap();
// Three replays, all fail → success_rate = 0.0 → retired.
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.retired, 1, "3 failures after insert must retire");
}
#[tokio::test]
async fn record_replay_does_not_retire_before_probation() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
mem.insert(trace).await.unwrap();
// Two replays (below probation of 3), both fail. Should NOT
// retire yet — probation requires minimum 3 data points.
mem.record_replay_outcome(&pid, false).await.unwrap();
mem.record_replay_outcome(&pid, false).await.unwrap();
let stats = mem.stats().await;
assert_eq!(stats.retired, 0, "only 2 replays → below probation floor");
}
#[tokio::test]
async fn retired_pathway_never_hot_swaps_again() {
let mem = PathwayMemory::new(mk_store());
let trace = mk_trace("a", true, 0, 0);
let pid = trace.pathway_id.clone();
let qvec = trace.pathway_vec.clone();
mem.insert(trace).await.unwrap();
for _ in 0..3 {
mem.record_replay_outcome(&pid, false).await.unwrap();
}
// Now record 10 successes to push success_rate well above 0.80.
// Pathway is still retired — retirement is sticky by design, to
// prevent oscillation on noise.
for _ in 0..10 {
mem.record_replay_outcome(&pid, true).await.unwrap();
}
let got = mem
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
.await;
assert!(got.is_none(), "retirement must be sticky");
}
#[tokio::test]
async fn pathway_vec_differs_for_different_models() {
// Two pathways with same fingerprint but different ladder
// models should have different embeddings so the similarity
// gate can discriminate. This is what enables narrow fingerprint
// + similarity-vec to cluster correctly.
let a = mk_trace("a", true, 5, 5);
let mut b = a.clone();
b.ladder_attempts[0].model = "kimi-k2:1t".into();
b.pathway_vec = build_pathway_vec(&b);
let sim = cosine(&a.pathway_vec, &b.pathway_vec);
assert!(sim < 1.0, "different models → different embeddings");
assert!(sim > 0.5, "shared fingerprint → embeddings still related");
}
}

View File

@ -13,7 +13,7 @@ use std::sync::Arc;
use aibridge::client::{AiClient, EmbedRequest, GenerateRequest};
use catalogd::registry::Registry as CatalogRegistry;
use storaged::registry::BucketRegistry;
use crate::{agent, autotune, chunker, embedding_cache, harness, hnsw, index_registry, jobs, lance_backend, pathway_memory, playbook_memory, promotion, rag, refresh, search, store, supervisor, trial};
use tokio::sync::Semaphore;
#[derive(Clone)]
@ -55,6 +55,11 @@ pub struct VectorState {
/// and, when `use_playbook_memory` is set on /vectors/hybrid, boosts
/// workers that were actually filled in semantically-similar past ops.
pub playbook_memory: playbook_memory::PlaybookMemory,
/// Pathway memory — consensus-designed sidecar for full-context
/// backtracking + hot-swap of successful review pathways. See
/// crates/vectord/src/pathway_memory.rs for the design rationale
/// (10-probe N=3 ensemble, locked 2026-04-24).
pub pathway_memory: pathway_memory::PathwayMemory,
/// Serializes embed calls from seed_playbook_memory to avoid
/// concurrent socket collisions with the Python sidecar.
pub embed_semaphore: Arc<Semaphore>,
@ -137,6 +142,15 @@ pub fn router(state: VectorState) -> Router {
// Phase 45 slice 3 — doc drift detection + human re-admission.
.route("/playbook_memory/doc_drift/check/{id}", post(check_doc_drift))
.route("/playbook_memory/doc_drift/resolve/{id}", post(resolve_doc_drift))
// Pathway memory — consensus-designed sidecar (2026-04-24).
// scrum_master_pipeline POSTs /pathway/insert at the end of each
// review, calls /pathway/query before running the ladder for a
// potential hot-swap, and posts /pathway/record_replay after a
// hot-swap succeeds or fails.
.route("/pathway/insert", post(pathway_insert))
.route("/pathway/query", post(pathway_query))
.route("/pathway/record_replay", post(pathway_record_replay))
.route("/pathway/stats", get(pathway_stats))
.with_state(state)
}
@@ -2833,6 +2847,73 @@ async fn lance_build_scalar_index(
}
}
// ─── Pathway memory handlers ──────────────────────────────────────────
//
// Thin wrappers around pathway_memory::PathwayMemory. HTTP surface is
// deliberately small — four endpoints cover the full lifecycle:
// insert at end-of-review, query before running the ladder,
// record_replay after a hot-swap, and stats for the VCP UI.
#[derive(Deserialize)]
struct PathwayQueryRequest {
task_class: String,
file_path: String,
signal_class: Option<String>,
query_vec: Vec<f32>,
}
async fn pathway_insert(
State(state): State<VectorState>,
Json(trace): Json<pathway_memory::PathwayTrace>,
) -> impl IntoResponse {
match state.pathway_memory.insert(trace).await {
Ok(()) => Ok(Json(json!({"ok": true}))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn pathway_query(
State(state): State<VectorState>,
Json(req): Json<PathwayQueryRequest>,
) -> impl IntoResponse {
let cand = state
.pathway_memory
.query_hot_swap(
&req.task_class,
&req.file_path,
req.signal_class.as_deref(),
&req.query_vec,
)
.await;
// 200 with null candidate means "no hot-swap"; this is a normal
// path, not an error — callers should proceed with the full ladder.
Json(json!({ "candidate": cand }))
}
#[derive(Deserialize)]
struct PathwayReplayRequest {
pathway_id: String,
succeeded: bool,
}
async fn pathway_record_replay(
State(state): State<VectorState>,
Json(req): Json<PathwayReplayRequest>,
) -> impl IntoResponse {
match state
.pathway_memory
.record_replay_outcome(&req.pathway_id, req.succeeded)
.await
{
Ok(()) => Ok(Json(json!({"ok": true}))),
Err(e) => Err((StatusCode::NOT_FOUND, e)),
}
}
async fn pathway_stats(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.pathway_memory.stats().await)
}
#[cfg(test)]
mod extractor_tests {
use super::*;


@@ -157,6 +157,158 @@ async function embedBatch(texts: string[]): Promise<number[][]> {
return (await r.json() as any).embeddings;
}
// ─── Pathway memory (2026-04-24 consensus design) ───────────────────
//
// Mirrors vectord/src/pathway_memory.rs. The bucket-hash vector MUST
// byte-match the Rust implementation so traces written from TypeScript
// are searchable against the same embedding space. Verified by running
// both implementations on the same input tokens and asserting matching
// bucket indices.
function filePrefix(path: string): string {
return path.split("/").slice(0, 2).join("/");
}
function computePathwayId(taskClass: string, filePath: string, signalClass: string | null): string {
const h = createHash("sha256");
h.update(taskClass);
h.update("|");
h.update(filePrefix(filePath));
h.update("|");
h.update(signalClass ?? "");
return h.digest("hex");
}
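The fingerprint's generalization-within-crate property can be sketched directly: two files under the same two-segment prefix hash to the same pathway_id. The paths and the "LOOPING" signal class below are illustrative, and the single-string concatenation is byte-equivalent to the chained `update` calls above (sha256 hashes the concatenated bytes either way).

```typescript
// Illustrative sketch: the fingerprint hashes the two-segment file
// PREFIX, not the full path, so files in the same crate collapse to one
// pathway_id. File paths and signal class here are hypothetical.
import { createHash } from "node:crypto";

function filePrefix(path: string): string {
  return path.split("/").slice(0, 2).join("/");
}

function pathwayId(taskClass: string, filePath: string, signalClass: string | null): string {
  // Byte-equivalent to chaining .update(taskClass).update("|")... above.
  return createHash("sha256")
    .update(`${taskClass}|${filePrefix(filePath)}|${signalClass ?? ""}`)
    .digest("hex");
}

// Same crate, different files: identical pathway_id.
const a = pathwayId("scrum_review", "crates/vectord/src/routes.rs", "LOOPING");
const b = pathwayId("scrum_review", "crates/vectord/src/pathway_memory.rs", "LOOPING");
// Different signal class: different pathway_id.
const c = pathwayId("scrum_review", "crates/vectord/src/routes.rs", null);
```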
// 32-bucket L2-normalized token hash. Same algorithm as Rust.
function buildPathwayVec(tokens: string[]): number[] {
const buckets = new Array(32).fill(0);
for (const t of tokens) {
const d = createHash("sha256").update(t, "utf8").digest();
const b1 = d[0] % 32;
const b2 = d[8] % 32;
buckets[b1] += 1;
buckets[b2] += 1;
}
let norm = 0;
for (const v of buckets) norm += v * v;
norm = Math.sqrt(norm);
if (norm > 0) for (let i = 0; i < buckets.length; i++) buckets[i] /= norm;
return buckets;
}
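Two properties make the cross-language byte-match verification meaningful, and both are cheap to check: the hash is deterministic (same tokens, same digest bytes, same buckets) and the output is unit-length. A self-contained sketch of the same algorithm:

```typescript
// Self-contained copy of the 32-bucket hash for a property check.
// Determinism plus L2 norm 1 are what the Rust/TypeScript byte-match
// verification relies on.
import { createHash } from "node:crypto";

function bucketVec(tokens: string[]): number[] {
  const buckets = new Array(32).fill(0);
  for (const t of tokens) {
    const d = createHash("sha256").update(t, "utf8").digest();
    buckets[d[0] % 32] += 1; // digest byte 0 picks the first bucket
    buckets[d[8] % 32] += 1; // digest byte 8 picks the second
  }
  const norm = Math.sqrt(buckets.reduce((s, v) => s + v * v, 0));
  return norm > 0 ? buckets.map(v => v / norm) : buckets;
}

const v1 = bucketVec(["scrum_review", "crates/vectord", "signal:LOOPING"]);
const v2 = bucketVec(["scrum_review", "crates/vectord", "signal:LOOPING"]);
```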
// Build the minimal query vector for a pre-ladder hot-swap check. We
// don't yet know the ladder attempts or KB chunks — the query vec is
// computed from what we CAN know up front: task/file/signal. This is
// a weaker embedding than the one computed at trace-insert time, but
// similarity still discriminates between task/file/signal combinations.
function buildQueryVec(taskClass: string, filePath: string, signalClass: string | null): number[] {
const tokens = [taskClass, filePath];
if (signalClass) tokens.push(`signal:${signalClass}`);
return buildPathwayVec(tokens);
}
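Because both the query vector and the stored `pathway_vec` are L2-normalized at build time, the ≥0.90 cosine gate reduces to a dot-product threshold. A sketch of the comparison (the authoritative gate runs server-side in `query_hot_swap`; this is for illustration only):

```typescript
// Sketch of the similarity gate. Valid as cosine only because both
// inputs are unit-length; the real check lives in Rust query_hot_swap.
const SIMILARITY_GATE = 0.90; // middle of the 0.85/0.95 consensus spread

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

function passesSimilarityGate(queryVec: number[], pathwayVec: number[]): boolean {
  return cosine(queryVec, pathwayVec) >= SIMILARITY_GATE;
}
```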
interface HotSwapCandidate {
pathway_id: string;
similarity: number;
replay_count: number;
success_rate: number;
recommended_rung: number;
recommended_model: string;
}
async function queryHotSwap(taskClass: string, filePath: string, signalClass: string | null): Promise<HotSwapCandidate | null> {
try {
const query_vec = buildQueryVec(taskClass, filePath, signalClass);
const r = await fetch(`${GATEWAY}/vectors/pathway/query`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({ task_class: taskClass, file_path: filePath, signal_class: signalClass, query_vec }),
signal: AbortSignal.timeout(5000),
});
if (!r.ok) return null;
const j = await r.json() as { candidate: HotSwapCandidate | null };
return j.candidate ?? null;
} catch {
// Pathway service unavailable → run full ladder. Hot-swap is
// always an optimization, never a correctness requirement.
return null;
}
}
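For reference, the probation and retirement gates the server applies before returning a candidate can be mirrored as a small predicate. `hotSwapEligible` below is a hypothetical helper with the consensus thresholds, not part of the wire protocol; the authoritative gate runs inside vectord's `query_hot_swap`:

```typescript
// Hypothetical client-side mirror of the server's probation gates:
// a candidate is hot-swappable only after >=3 replays at >=0.80
// success, and retirement is sticky (never un-retires).
const MIN_REPLAYS = 3;
const MIN_SUCCESS_RATE = 0.80;

interface CandidateGateInput {
  replay_count: number;
  success_rate: number;
  retired: boolean;
}

function hotSwapEligible(c: CandidateGateInput): boolean {
  if (c.retired) return false;                    // retirement is permanent
  if (c.replay_count < MIN_REPLAYS) return false; // still in probation
  return c.success_rate >= MIN_SUCCESS_RATE;
}
```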
interface LadderAttemptRec {
rung: number;
model: string;
latency_ms: number;
accepted: boolean;
reject_reason: string | null;
}
interface PathwayTracePayload {
pathway_id: string;
task_class: string;
file_path: string;
signal_class: string | null;
created_at: string;
ladder_attempts: LadderAttemptRec[];
kb_chunks: { source_doc: string; chunk_id: string; cosine_score: number; rank: number }[];
observer_signals: { class: string; priors: string[]; prior_iter_outcomes: string[] }[];
bridge_hits: { library: string; version: string }[];
sub_pipeline_calls: { pipeline: string; result_summary: string }[];
audit_consensus: { pass: boolean; models: string[]; disagreements: number } | null;
reducer_summary: string;
final_verdict: string;
pathway_vec: number[];
replay_count: number;
replays_succeeded: number;
retired: boolean;
}
async function writePathwayTrace(trace: PathwayTracePayload): Promise<void> {
try {
await fetch(`${GATEWAY}/vectors/pathway/insert`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify(trace),
signal: AbortSignal.timeout(10000),
});
} catch {
// Fire-and-forget: scrum runs shouldn't fail if pathway insert fails.
}
}
async function recordPathwayReplay(pathwayId: string, succeeded: boolean): Promise<void> {
try {
await fetch(`${GATEWAY}/vectors/pathway/record_replay`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({ pathway_id: pathwayId, succeeded }),
signal: AbortSignal.timeout(5000),
});
} catch {
// Fire-and-forget. Not critical.
}
}
// Deterministic signal_class lookup from scrum_reviews.jsonl history.
// First-time files get `null`. Files seen before get the signal class
// the observer assigned on their most-recent review (if any). Keeps the
// pathway fingerprint stable across iterations for LOOPING files.
async function lookupSignalClass(filePath: string): Promise<string | null> {
try {
const { readFile } = await import("node:fs/promises");
const raw = await readFile(SCRUM_REVIEWS_JSONL, "utf8").catch(() => "");
if (!raw) return null;
const lines = raw.trim().split("\n").reverse();
for (const line of lines) {
try {
const r = JSON.parse(line);
if (r.file === filePath && r.signal_class) return r.signal_class;
} catch {}
}
return null;
} catch { return null; }
}
async function chat(opts: {
provider: "ollama" | "ollama_cloud",
model: string,
@@ -389,32 +541,63 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
let acceptedModel = "";
let acceptedOn = 0;
// Pathway hot-swap pre-check. If a proven pathway exists for this
// (task, file_prefix, signal) combo with ≥3 replays at ≥80% success,
// skip the ladder and try its winning rung first. On success we
// record a positive replay; on failure we fall through to the full
// ladder and record a negative replay. Fire-and-forget — pathway
// service unavailable → null candidate → business as usual.
const signalClass = await lookupSignalClass(rel);
const taskClass = "scrum_review";
const hotSwap = await queryHotSwap(taskClass, rel, signalClass);
let hotSwapOrderedIndices: number[] | null = null;
if (hotSwap) {
// Reorder the ladder to try the recommended model first. Rung
// indices are preserved in the output so the trace still reflects
// the true ladder position the model sits at.
const recommendedIdx = LADDER.findIndex(r => r.model === hotSwap.recommended_model);
if (recommendedIdx >= 0) {
log(` 🔥 hot-swap candidate: ${hotSwap.recommended_model} (rung ${hotSwap.recommended_rung}, sim=${hotSwap.similarity.toFixed(3)}, success_rate=${hotSwap.success_rate.toFixed(2)}, ${hotSwap.replay_count} replays)`);
hotSwapOrderedIndices = [recommendedIdx, ...LADDER.map((_, i) => i).filter(i => i !== recommendedIdx)];
}
}
const ladderOrder = hotSwapOrderedIndices ?? LADDER.map((_, i) => i);
// Collect attempts for the pathway trace sidecar.
const pathwayAttempts: LadderAttemptRec[] = [];
for (let step = 0; step < MAX_ATTEMPTS; step++) {
const i = ladderOrder[step];
const n = step + 1;
const rung = LADDER[i];
const learning = history.length > 0
? `\n\n═══ PRIOR ATTEMPTS FAILED. Specific issues to fix: ═══\n${history.map(h => `Attempt ${h.n} (${h.model}, ${h.chars} chars): ${h.status}${h.error ?? "thin/unstructured answer"}`).join("\n")}\n═══`
: "";
log(`  attempt ${n}/${MAX_ATTEMPTS}: ${rung.provider}::${rung.model}${learning ? " [w/ learning]" : ""}`);
const attemptStarted = Date.now();
const r = await chat({
provider: rung.provider,
model: rung.model,
prompt: baseTask + learning,
max_tokens: 1500,
});
const attemptMs = Date.now() - attemptStarted;
if (r.error) {
history.push({ n, model: rung.model, status: "error", chars: 0, error: r.error.slice(0, 180) });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `error: ${r.error.slice(0, 100)}` });
log(`    ✗ error: ${r.error.slice(0, 80)}`);
continue;
}
if (!isAcceptable(r.content)) {
history.push({ n, model: rung.model, status: "thin", chars: r.content.length, error: `thin/unstructured (${r.content.length} chars)` });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `thin (${r.content.length} chars)` });
log(`    ✗ thin/unstructured (${r.content.length} chars)`);
continue;
}
history.push({ n, model: rung.model, status: "accepted", chars: r.content.length });
pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: true, reject_reason: null });
accepted = r.content;
acceptedModel = `${rung.provider}/${rung.model}`;
acceptedOn = n;
@@ -422,6 +605,15 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
break;
}
// Hot-swap bookkeeping: if we tried the recommended model first,
// report whether it worked so the pathway's success_rate updates.
if (hotSwap) {
const replaySucceeded = acceptedModel.endsWith(`/${hotSwap.recommended_model}`);
log(` pathway replay ${replaySucceeded ? "✓" : "✗"} (${hotSwap.pathway_id.slice(0, 12)}…)`);
// Fire and forget — don't await; observer can handle it.
recordPathwayReplay(hotSwap.pathway_id, replaySucceeded);
}
const review: FileReview = {
file: rel,
file_bytes: content.length,
@@ -599,6 +791,54 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
console.error(`[scrum] failed to append scrum_reviews.jsonl: ${(e as Error).message}`);
}
// Pathway trace sidecar (consensus-designed 2026-04-24). Captures
// FULL context (ladder attempts, KB chunks, observer signal, verdict)
// for similarity-based hot-swap in future iterations. First-review
// pathways start in probation (replay_count=0); they become
// hot-swappable only after ≥3 replays at ≥80% success.
try {
const pathwayTrace: PathwayTracePayload = {
pathway_id: computePathwayId(taskClass, rel, signalClass),
task_class: taskClass,
file_path: rel,
signal_class: signalClass,
created_at: row.reviewed_at,
ladder_attempts: pathwayAttempts,
kb_chunks: [
...topPrd.map((c, idx) => ({
source_doc: "PRD.md", chunk_id: `prd@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
})),
...topPlan.map((c, idx) => ({
source_doc: "cohesion_plan", chunk_id: `plan@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
})),
],
observer_signals: signalClass ? [{ class: signalClass, priors: [], prior_iter_outcomes: [] }] : [],
bridge_hits: [], // context7 not wired into scrum yet; empty for v1
sub_pipeline_calls: [], // LLM Team extract happens after this row; out of scope for v1
audit_consensus: null, // set by auditor's later N=3 pass, via /pathway/insert update
reducer_summary: accepted.slice(0, 4000),
final_verdict: verdict ?? "accepted",
// Vec built from the full attempts/chunks — richer than the
// query-time vector. The similarity gate will still discriminate
// between pathways with the same fingerprint but different
// ladder/KB profiles.
pathway_vec: buildPathwayVec([
taskClass,
rel,
...(signalClass ? [`signal:${signalClass}`] : []),
...pathwayAttempts.flatMap(a => [`rung:${a.rung}`, `model:${a.model}`, `accepted:${a.accepted}`]),
...topPrd.map(c => `kb:PRD.md`),
...topPlan.map(c => `kb:cohesion_plan`),
]),
replay_count: 0,
replays_succeeded: 0,
retired: false,
};
writePathwayTrace(pathwayTrace); // fire-and-forget
} catch (e) {
console.error(`[scrum] pathway trace failed: ${(e as Error).message}`);
}
// Close the scrum → observer loop (fix 2026-04-24). Architecture
// audit surfaced: observer ring had 2000 ops, 1999 from Langfuse,
// zero from scrum. Observer's analyzeErrors + PLAYBOOK_BUILDER loops
@@ -643,6 +883,20 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
critical_failures_count,
verified_components_count,
missing_components_count,
// Pathway fields: emitted on every review so the observer
// can build a full picture of hot-swap performance over time.
// `pathway_hot_swap_hit` flags whether the first rung tried in
// this review was a pathway recommendation rather than the default
// ladder top. `rungs_saved` quantifies the compute we avoided
// when a hot-swap landed — this is the value metric the VCP
// UI surfaces ("avg_rungs_saved_per_commit").
pathway_hot_swap_hit: hotSwap !== null,
pathway_id: hotSwap?.pathway_id ?? null,
pathway_similarity: hotSwap?.similarity ?? null,
pathway_success_rate: hotSwap?.success_rate ?? null,
rungs_saved: hotSwap && acceptedModel.endsWith(`/${hotSwap.recommended_model}`)
? Math.max(0, hotSwap.recommended_rung - 1)
: 0,
ts: row.reviewed_at,
}),
signal: AbortSignal.timeout(3000),


@@ -349,6 +349,44 @@ Bun.serve({
if (path === "/data/outcomes") return Response.json(await tailJsonl(`${KB}/outcomes.jsonl`, 30));
if (path === "/data/audit_facts") return Response.json(await tailJsonl(`${KB}/audit_facts.jsonl`, 30));
// Pathway memory — consensus-designed sidecar (2026-04-24). Two
// exposed metrics: reuse_rate (activity — is it firing?) and
// avg_rungs_saved_per_commit (value — is it earning its keep?).
// Round-3 consensus (qwen3.5:397b) pointed out that activity
// without value tells us nothing; the UI needs both to judge the
// health of the hot-swap learning loop.
if (path === "/data/pathway_stats") {
try {
const r = await fetch("http://localhost:3100/vectors/pathway/stats", { signal: AbortSignal.timeout(3000) });
if (!r.ok) return Response.json({ error: `vectord ${r.status}`, stats: null });
const stats = await r.json();
// Tail recent scrum events to compute avg_rungs_saved_per_commit
// (a committed review = any row in scrum_reviews.jsonl; rungs_saved
// only populates when pathway memory fired AND the recommended
// model actually produced the accept).
const reviews = await tailJsonl(`${KB}/scrum_reviews.jsonl`, 200);
let totalCommits = 0;
let totalRungsSaved = 0;
let hotSwapHits = 0;
for (const row of reviews) {
totalCommits++;
if (row.pathway_hot_swap_hit) hotSwapHits++;
if (typeof row.rungs_saved === "number") totalRungsSaved += row.rungs_saved;
}
return Response.json({
stats,
scrum_window: {
reviews: totalCommits,
hot_swap_hits: hotSwapHits,
pathway_reuse_rate: totalCommits ? hotSwapHits / totalCommits : 0,
avg_rungs_saved_per_commit: totalCommits ? totalRungsSaved / totalCommits : 0,
},
});
} catch (e) {
return Response.json({ error: (e as Error).message, stats: null });
}
}
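The window math above can be restated as a pure function, which makes the two denominators explicit: every review row counts as a commit, and misses contribute 0 to rungs saved. A testable sketch (the endpoint computes the same thing inline):

```typescript
// Pure-function restatement of the scrum_window aggregation, split out
// so the reuse-rate and rungs-saved math is checkable without the
// vectord fetch. Field names match the rows scrum writes.
interface ReviewRow {
  pathway_hot_swap_hit?: boolean;
  rungs_saved?: number;
}

function scrumWindow(reviews: ReviewRow[]) {
  let hits = 0;
  let saved = 0;
  for (const row of reviews) {
    if (row.pathway_hot_swap_hit) hits++;
    if (typeof row.rungs_saved === "number") saved += row.rungs_saved;
  }
  const n = reviews.length;
  return {
    reviews: n,
    hot_swap_hits: hits,
    pathway_reuse_rate: n ? hits / n : 0,
    avg_rungs_saved_per_commit: n ? saved / n : 0,
  };
}
```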
if (path.startsWith("/data/file/")) {
const relpath = decodeURIComponent(path.slice("/data/file/".length));
return Response.json(await fileHistory(relpath));


@@ -589,6 +589,37 @@ function metricBox(label, big, kind, opts = {}) {
function drawMetrics() {
const grid = document.getElementById("metric-grid");
clear(grid);
// Kick off pathway fetch in parallel; render when it resolves so the
// rest of the metrics grid appears immediately. The cards append to
// the grid after the synchronous block below — they'll show up at
// the bottom of the grid within a tick of first render.
fetch("/data/pathway_stats").then(r => r.ok ? r.json() : null).then(j => {
if (!j || !j.stats) return;
const s = j.stats;
const w = j.scrum_window ?? {};
// Activity metric — is the hot-swap firing at all?
grid.append(metricBox("pathway reuse rate", `${Math.round((w.pathway_reuse_rate ?? 0) * 100)}%`,
(w.pathway_reuse_rate ?? 0) > 0.1 ? "good" : (w.pathway_reuse_rate ?? 0) > 0 ? "warn" : "bad", {
explain: "% of recent reviews where a pathway hot-swap fired (narrow fingerprint match + ≥3 replays at ≥0.80 success rate + ≥0.90 similarity + audit gate: explicit FAIL blocks, null allowed during bootstrap).",
source: `scrum_reviews.jsonl .pathway_hot_swap_hit over last ${w.reviews ?? 0} reviews (${w.hot_swap_hits ?? 0} hits)`,
good: "≥10% sustained = index earning its keep. <10% over many iters = fingerprint too narrow or probation too strict. 0% on fresh install is expected (no replays yet).",
}));
// Value metric — how much compute did hot-swap actually save?
const saved = w.avg_rungs_saved_per_commit ?? 0;
grid.append(metricBox("avg rungs saved", saved.toFixed(2),
saved >= 1 ? "good" : saved > 0 ? "warn" : "bad", {
explain: "Average ladder rungs skipped per committed review by hot-swap. Rungs_saved = recommended_rung - 1 when the recommended model succeeded (otherwise 0).",
source: "scrum_reviews.jsonl .rungs_saved averaged",
good: "Every 1.0 here ≈ one less model call per review. At 21 files/iter, 1.0 saved = 21 cloud calls avoided. Value only counts when the replay actually succeeded.",
}));
// Stability metric — retired pathways indicate the learning loop is correcting itself.
grid.append(metricBox("pathways tracked", String(s.total_pathways),
s.total_pathways > 0 ? "good" : "warn", {
explain: `Total pathway traces stored. ${s.retired} retired (below 0.80 success after ≥3 replays). ${s.with_audit_pass} audit-passed, eligible for hot-swap probation.`,
source: "/vectors/pathway/stats",
good: `Grows monotonically with scrum runs. Retired=${s.retired} is HEALTHY — it means the learning loop is pruning pathways that stopped working. replay_success_rate=${(s.replay_success_rate*100).toFixed(0)}% aggregates all historical replays.`,
}));
}).catch(() => {});
const byTier = { auto:0, dry_run:0, simulation:0, block:0, unknown:0 };
state.reviews.forEach(r => { const t = r.gradient_tier ?? "unknown"; if (byTier[t] != null) byTier[t]++; });
const total = state.reviews.length || 1;