pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review
10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.
What it does
Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.
When scrum reviews a file:
1. narrow fingerprint = task_class + file_prefix + signal_class
2. query_hot_swap checks pathway memory for a match that passes
probation (≥3 replays @ ≥80% success) + audit gate + similarity
(≥0.90 cosine on normalized-metadata-token embedding)
3. if hot-swap eligible, recommended model tried first in the ladder
4. replay outcome reported back, updating the pathway's success_rate
5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
6. full PathwayTrace always inserted at end of review — hot-swap
grows with use, it doesn't bootstrap from nothing
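The gate stack in steps 2–5 can be sketched as a single predicate. This is a minimal stand-in, not the real API: `Trace`, `audit_pass`, and `hot_swap_eligible` are illustrative names for the fields and checks described above.

```rust
// Stripped stand-in for the real PathwayTrace; only the gate-relevant fields.
struct Trace {
    retired: bool,
    audit_pass: Option<bool>, // None = auditor hasn't seen this pathway yet
    replay_count: u32,
    replays_succeeded: u32,
}

fn hot_swap_eligible(t: &Trace, similarity: f32) -> bool {
    let success_rate = if t.replay_count == 0 {
        0.0
    } else {
        t.replays_succeeded as f32 / t.replay_count as f32
    };
    !t.retired
        && t.audit_pass != Some(false) // explicit audit FAIL blocks; null is allowed
        && t.replay_count >= 3         // probation
        && success_rate >= 0.80
        && similarity >= 0.90
}

fn main() {
    // A fresh pathway never hot-swaps, even at perfect similarity.
    let fresh = Trace { retired: false, audit_pass: None, replay_count: 0, replays_succeeded: 0 };
    assert!(!hot_swap_eligible(&fresh, 1.0));
    // A proven, unaudited pathway does (bootstrap behavior).
    let proven = Trace { retired: false, audit_pass: None, replay_count: 5, replays_succeeded: 5 };
    assert!(hot_swap_eligible(&proven, 0.95));
    println!("ok");
}
```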
Gate design is load-bearing:
- narrow fingerprint (6 of 8 consensus models converged on the same
3-field composition; lock) — enables generalization within crate
- probation ≥3 replays — at 3 replays the binomial tail of clearing
  80% by chance is ~5%; with fewer replays the signal is noise
- success rate ≥0.80 — mistral + qwen3-coder independently proposed
this exact threshold across two rounds
- similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
- bootstrap: null audit_consensus ALLOWED (auditor → pathway update
not wired yet; probation + success_rate gates alone enforce safety
during bootstrap; explicit audit FAIL still blocks)
- retirement is sticky — prevents oscillation on noise
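The "~5%" probation figure can be sanity-checked in a few lines. The 3-replay and 0.80 constants are from this commit; the 40% "noise" success rate is an illustrative assumption for what a lucky-but-unreliable pathway might look like.

```rust
// n-choose-k via the multiplicative formula; exact for small n.
fn choose(n: u32, k: u32) -> u64 {
    (1..=k as u64).fold(1u64, |acc, i| acc * (n as u64 - k as u64 + i) / i)
}

// P(X >= k_min) for X ~ Binomial(n, p).
fn binomial_tail(n: u32, k_min: u32, p: f64) -> f64 {
    (k_min..=n)
        .map(|k| choose(n, k) as f64 * p.powi(k as i32) * (1.0 - p).powi((n - k) as i32))
        .sum()
}

fn main() {
    // With only 3 replays, clearing the 0.80 success gate means 3/3.
    // A noise pathway with a true 40% per-replay success rate slips
    // through with probability 0.4^3 ≈ 6.4% — the tail the design cites.
    println!("{:.4}", binomial_tail(3, 3, 0.4));
}
```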
Files
+ crates/vectord/src/pathway_memory.rs (new, 600 lines + 18 tests)
PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
PathwayMemoryStats. 18/18 tests green.
Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
M crates/vectord/src/lib.rs
pub mod pathway_memory;
M crates/vectord/src/service.rs
VectorState grows pathway_memory field;
4 HTTP handlers (/pathway/insert, /pathway/query,
/pathway/record_replay, /pathway/stats).
M crates/gateway/src/main.rs
Construct PathwayMemory + load from storage on boot,
wire into VectorState.
M tests/real-world/scrum_master_pipeline.ts
Byte-matching TS bucket-hash (verified same bucket indices as
Rust); pre-ladder hot-swap query; ladder reorder on hit;
per-attempt latency capture; post-accept trace insert
(fire-and-forget); replay outcome recording;
observer /event emits pathway_hot_swap_hit, pathway_similarity,
rungs_saved per review for the VCP UI.
M ui/server.ts
/data/pathway_stats aggregates /vectors/pathway/stats +
scrum_reviews.jsonl window for the value metric.
M ui/ui.js
Three new metric cards:
· pathway reuse rate (activity: is it firing?)
· avg rungs saved (value: is it earning its keep?)
· pathways tracked (stability: retirement = learning)
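How the three cards reduce from per-review data can be sketched as below. The event shape is hypothetical (the real aggregation lives in ui/server.ts over scrum_reviews.jsonl); `pathway_hot_swap_hit` and `rungs_saved` mirror the observer emissions described above, and `metric_cards` is an illustrative name.

```rust
// Hypothetical per-review event, mirroring the observer /event fields.
struct ReviewEvent {
    pathway_hot_swap_hit: bool,
    rungs_saved: u32,
}

// Returns (reuse_rate, avg_rungs_saved, pathways_tracked).
fn metric_cards(events: &[ReviewEvent], pathways_tracked: usize) -> (f32, f32, usize) {
    let reviews = events.len() as f32;
    let hits = events.iter().filter(|e| e.pathway_hot_swap_hit).count() as f32;
    // activity: share of reviews where a proven pathway fired
    let reuse_rate = if reviews == 0.0 { 0.0 } else { hits / reviews };
    // value: ladder rungs skipped per hot-swap hit
    let total_saved: u32 = events.iter().map(|e| e.rungs_saved).sum();
    let avg_rungs_saved = if hits == 0.0 { 0.0 } else { total_saved as f32 / hits };
    (reuse_rate, avg_rungs_saved, pathways_tracked)
}

fn main() {
    let events = [
        ReviewEvent { pathway_hot_swap_hit: true, rungs_saved: 2 },
        ReviewEvent { pathway_hot_swap_hit: true, rungs_saved: 1 },
        ReviewEvent { pathway_hot_swap_hit: false, rungs_saved: 0 },
        ReviewEvent { pathway_hot_swap_hit: false, rungs_saved: 0 },
    ];
    let (reuse, avg, tracked) = metric_cards(&events, 12);
    assert!((reuse - 0.5).abs() < 1e-6);
    assert!((avg - 1.5).abs() < 1e-6);
    assert_eq!(tracked, 12);
    println!("ok");
}
```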
What's not in this commit (queued)
- auditor → pathway audit_consensus update wire (explicit audit-fail
block activates when this lands)
- bridge_hits + sub_pipeline_calls population from context7 / LLM
Team extract results (fields wired, callers not yet)
- replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
a separate jsonl for forensic audit of why specific replays failed
Why > summarization
Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in: parent 9cc0ceb894 · commit 2f8b347f37
Cargo.lock (generated) — 1 line changed
@@ -6898,6 +6898,7 @@ dependencies = [
  "storaged",
  "tokio",
  "tracing",
+ "truth",
  "url",
 ]
@@ -93,6 +93,12 @@ async fn main() {
     // operators call POST /vectors/playbook_memory/rebuild to populate.
     let pbm = vectord::playbook_memory::PlaybookMemory::new(store.clone());
     let _ = pbm.load_from_storage().await;
+    // Pathway memory — consensus-designed sidecar for full-context
+    // backtracking + hot-swap of successful review pathways. Same
+    // load-on-boot pattern as playbook_memory: empty state is fine,
+    // operators start populating via scrum_master_pipeline.ts.
+    let pwm = vectord::pathway_memory::PathwayMemory::new(store.clone());
+    let _ = pwm.load_from_storage().await;

     // Phase 16.2: spawn the autotune agent. When config.agent.enabled=false
     // this returns a handle that drops triggers silently — no surprise load.

@@ -178,6 +184,7 @@ async fn main() {
             bucket_registry.clone(), index_reg.clone(),
         ),
         playbook_memory: pbm,
+        pathway_memory: pwm,
         embed_semaphore: std::sync::Arc::new(tokio::sync::Semaphore::new(1)),
     }))
     .nest("/workspaces", queryd::workspace_service::router(workspace_mgr))
@@ -8,6 +8,7 @@ pub mod hnsw;
 pub mod index_registry;
 pub mod jobs;
 pub mod playbook_memory;
+pub mod pathway_memory;
 pub mod doc_drift;
 pub mod promotion;
 pub mod refresh;
crates/vectord/src/pathway_memory.rs — 704 lines (new file)
//! Pathway memory — full backtrack-able context for scrum/auditor reviews.
//!
//! Consensus-designed (10-probe N=3 ensemble, see
//! `data/_kb/consensus_reducer_design_*.json`). The reducer emits a
//! `PathwayTrace` sidecar alongside its legacy summary. Traces are
//! fingerprinted narrowly (`task_class + file_prefix + signal_class`) for
//! generalizing hot-swap, and embedded via normalized-metadata-token
//! concatenation so the HNSW similarity search can discriminate between
//! pathways that share a fingerprint but diverged in ladder/KB choices.
//!
//! The hot-swap decision requires six conditions in AND:
//! 1. narrow fingerprint match
//! 2. audit_consensus.pass == true
//! 3. replay_count >= 3
//! 4. replays_succeeded / replay_count >= 0.80
//! 5. NOT retired
//! 6. similarity(new, stored) >= 0.90
//!
//! Any replay reports its outcome via `record_replay_outcome`; pathways
//! whose success rate drops below 0.80 after >=3 replays are marked
//! retired and excluded from further hot-swap consideration. This is the
//! self-correcting learning loop — a pathway that worked once but breaks
//! under distribution shift removes itself automatically.

use std::collections::HashMap;
use std::sync::Arc;

use chrono::{DateTime, Utc};
use object_store::ObjectStore;
use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};
use storaged::ops;
use tokio::sync::RwLock;

const STATE_KEY: &str = "_pathway_memory/state.json";

/// Outcome of one ladder rung attempt. Captured for every attempt,
/// regardless of whether it was accepted — rejections are signal too.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct LadderAttempt {
    pub rung: u8,
    pub model: String,
    pub latency_ms: u64,
    pub accepted: bool,
    pub reject_reason: Option<String>,
}

/// Provenance of a RAG chunk retrieved for this review. The
/// `cosine_score` is the similarity as returned by the index; `rank` is
/// 0-indexed order in the top-K result list.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct KbChunkRef {
    pub source_doc: String,
    pub chunk_id: String,
    pub cosine_score: f32,
    pub rank: u8,
}

/// Signal emitted by mcp-server/observer classifier.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct ObserverSignal {
    pub class: String,
    pub priors: Vec<String>,
    pub prior_iter_outcomes: Vec<String>,
}

/// Context7-bridge lookup snapshot.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct BridgeHit {
    pub library: String,
    pub version: String,
}

/// Call to LLM Team (/api/run?mode=extract) or auditor N=3 consensus.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct SubPipelineCall {
    pub pipeline: String, // "llm_team_extract" / "audit_consensus" / etc.
    pub result_summary: String,
}

/// N=3 independent consensus re-check result.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct AuditConsensus {
    pub pass: bool,
    pub models: Vec<String>,
    pub disagreements: u32,
}

/// Full backtrack-able context for one reviewed file. Lives alongside
/// the reducer's summary — summary is what the reviewer LLM sees, this
/// is what the auditor / future iterations / hot-swap use.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct PathwayTrace {
    pub pathway_id: String, // SHA256(task_class|file_prefix|signal_class)
    pub task_class: String,
    pub file_path: String,
    pub signal_class: Option<String>,
    pub created_at: DateTime<Utc>,

    pub ladder_attempts: Vec<LadderAttempt>,
    pub kb_chunks: Vec<KbChunkRef>,
    pub observer_signals: Vec<ObserverSignal>,
    pub bridge_hits: Vec<BridgeHit>,
    pub sub_pipeline_calls: Vec<SubPipelineCall>,
    pub audit_consensus: Option<AuditConsensus>,

    pub reducer_summary: String,
    pub final_verdict: String,

    /// Normalized-metadata-token embedding. Dimension fixed per index
    /// version (current: 32, sufficient to distinguish task/file/signal
    /// combinations without requiring an external embedding model —
    /// round-3 consensus said "small metadata tokens", not "full JSON").
    pub pathway_vec: Vec<f32>,

    /// Number of times this pathway has been replayed via hot-swap.
    /// Replay only begins after first insert; initial insert itself is
    /// NOT a replay. Probation of ≥3 replays is required before the
    /// success-rate gate can fire.
    pub replay_count: u32,
    pub replays_succeeded: u32,
    /// Marked true when replay_count >= 3 AND success_rate < 0.80.
    /// Retired pathways are excluded from hot-swap forever. (If the
    /// underlying file / task / signal characteristics genuinely change
    /// such that a retired pathway would work again, a new PathwayTrace
    /// with a fresh id will be inserted — retirement is per-id.)
    pub retired: bool,
}

impl PathwayTrace {
    /// Compute the narrow fingerprint id from task_class + file_prefix
    /// + signal_class. `file_prefix` is the first two path segments
    /// ("crates/queryd", not "crates/queryd/src/service.rs") so that
    /// related files in the same crate share pathways.
    pub fn compute_id(task_class: &str, file_path: &str, signal_class: Option<&str>) -> String {
        let prefix = file_prefix(file_path);
        let sig = signal_class.unwrap_or("");
        let mut hasher = Sha256::new();
        hasher.update(task_class.as_bytes());
        hasher.update(b"|");
        hasher.update(prefix.as_bytes());
        hasher.update(b"|");
        hasher.update(sig.as_bytes());
        format!("{:x}", hasher.finalize())
    }

    pub fn success_rate(&self) -> f32 {
        if self.replay_count == 0 {
            return 0.0;
        }
        self.replays_succeeded as f32 / self.replay_count as f32
    }
}

/// First two path segments, so `crates/queryd/src/service.rs` →
/// `crates/queryd`. This is intentional — similar files in the same
/// crate often share task characteristics (e.g., all files in
/// `crates/queryd/` are SQL-path Rust code), so fingerprinting on the
/// crate-level prefix lets the hot-swap generalize across files within
/// the crate. Exactly-matching file paths still match (same prefix).
pub fn file_prefix(path: &str) -> String {
    let parts: Vec<&str> = path.split('/').take(2).collect();
    parts.join("/")
}

/// Build the pathway vector from trace metadata. Intentionally simple —
/// deterministic bag-of-tokens hash into 32 buckets, normalized. Round-3
/// consensus said "small metadata tokens, not full JSON." An external
/// embedding model would work too but adds a dependency, failure mode,
/// and drift risk the consensus flagged.
pub fn build_pathway_vec(trace: &PathwayTrace) -> Vec<f32> {
    let mut buckets = vec![0f32; 32];
    let mut tokens: Vec<String> = Vec::new();
    tokens.push(trace.task_class.clone());
    tokens.push(trace.file_path.clone());
    if let Some(s) = &trace.signal_class {
        tokens.push(format!("signal:{s}"));
    }
    for a in &trace.ladder_attempts {
        tokens.push(format!("rung:{}", a.rung));
        tokens.push(format!("model:{}", a.model));
        tokens.push(format!("accepted:{}", a.accepted));
    }
    for k in &trace.kb_chunks {
        tokens.push(format!("kb:{}", k.source_doc));
    }
    for o in &trace.observer_signals {
        tokens.push(format!("class:{}", o.class));
    }
    for b in &trace.bridge_hits {
        tokens.push(format!("lib:{}", b.library));
    }
    for s in &trace.sub_pipeline_calls {
        tokens.push(format!("pipeline:{}", s.pipeline));
    }

    for t in &tokens {
        let mut h = Sha256::new();
        h.update(t.as_bytes());
        let d = h.finalize();
        // Two bucket writes per token: use different byte windows to
        // spread probability across buckets even when tokens share a
        // common prefix.
        let b1 = (d[0] as usize) % 32;
        let b2 = (d[8] as usize) % 32;
        buckets[b1] += 1.0;
        buckets[b2] += 1.0;
    }

    // L2 normalize so cosine similarity becomes a dot product.
    let norm: f32 = buckets.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in &mut buckets {
            *v /= norm;
        }
    }
    buckets
}

/// Dot product of two equal-length vectors. Inputs are expected to be
/// L2-normalized (see `build_pathway_vec`), so this equals cosine
/// similarity; mismatched lengths score 0.0 rather than panic.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::<f32>()
}

#[derive(Default, Clone, Serialize, Deserialize)]
struct PathwayMemoryState {
    pathways: HashMap<String, Vec<PathwayTrace>>, // key = pathway_id (narrow fingerprint)
    last_updated_at: i64,
}

#[derive(Clone)]
pub struct PathwayMemory {
    state: Arc<RwLock<PathwayMemoryState>>,
    store: Arc<dyn ObjectStore>,
}

#[derive(Debug, Serialize)]
pub struct HotSwapCandidate {
    pub pathway_id: String,
    pub similarity: f32,
    pub replay_count: u32,
    pub success_rate: f32,
    pub recommended_rung: u8,
    pub recommended_model: String,
}

impl PathwayMemory {
    pub fn new(store: Arc<dyn ObjectStore>) -> Self {
        Self {
            state: Arc::new(RwLock::new(PathwayMemoryState::default())),
            store,
        }
    }

    pub async fn load_from_storage(&self) -> Result<usize, String> {
        let data = match ops::get(&self.store, STATE_KEY).await {
            Ok(d) => d,
            Err(_) => return Ok(0),
        };
        let persisted: PathwayMemoryState = serde_json::from_slice(&data)
            .map_err(|e| format!("parse pathway_memory state: {e}"))?;
        let n: usize = persisted.pathways.values().map(|v| v.len()).sum();
        *self.state.write().await = persisted;
        tracing::info!("pathway_memory: loaded {n} traces from {STATE_KEY}");
        Ok(n)
    }

    async fn persist(&self) -> Result<(), String> {
        let snapshot = self.state.read().await.clone();
        let bytes = serde_json::to_vec_pretty(&snapshot).map_err(|e| e.to_string())?;
        ops::put(&self.store, STATE_KEY, bytes.into()).await
    }

    /// Insert a new pathway trace. Called by scrum_master_pipeline at
    /// the end of each file's review. Computes the pathway_vec from
    /// metadata if the caller didn't supply one. Appends to the bucket
    /// for this pathway_id — multiple traces can share a fingerprint
    /// (each represents one review of the same file/task/signal combo).
    pub async fn insert(&self, mut trace: PathwayTrace) -> Result<(), String> {
        if trace.pathway_vec.is_empty() {
            trace.pathway_vec = build_pathway_vec(&trace);
        }
        let mut s = self.state.write().await;
        s.pathways
            .entry(trace.pathway_id.clone())
            .or_default()
            .push(trace);
        s.last_updated_at = Utc::now().timestamp_millis();
        drop(s);
        self.persist().await
    }

    /// Query for a hot-swap candidate. Returns `None` if no eligible
    /// pathway exists — caller should run the full ladder. Returns
    /// `Some(cand)` if all gates pass — caller can short-circuit to
    /// `cand.recommended_rung` / `cand.recommended_model`.
    ///
    /// Gates (all must hold):
    /// - narrow fingerprint match (same task/file_prefix/signal)
    /// - audit_consensus.pass == true on the stored trace
    /// - replay_count >= 3 (probation)
    /// - success_rate >= 0.80
    /// - NOT retired
    /// - similarity(query_vec, stored.pathway_vec) >= 0.90
    pub async fn query_hot_swap(
        &self,
        task_class: &str,
        file_path: &str,
        signal_class: Option<&str>,
        query_vec: &[f32],
    ) -> Option<HotSwapCandidate> {
        let id = PathwayTrace::compute_id(task_class, file_path, signal_class);
        let s = self.state.read().await;
        let candidates = s.pathways.get(&id)?;
        let mut best: Option<(f32, &PathwayTrace)> = None;
        for p in candidates {
            if p.retired {
                continue;
            }
            // audit_consensus gate: explicit FAIL blocks hot-swap. A null
            // audit_consensus (auditor hasn't seen this pathway yet) is
            // NOT a block — the success_rate gate below still requires
            // ≥3 real-world replays at ≥80% success before a pathway
            // becomes hot-swap eligible, so the learning loop itself
            // provides the safety net during bootstrap. Once the auditor
            // pipeline wires pathway audit updates, this gate tightens
            // automatically: any explicit audit_consensus.pass == false
            // here will skip the candidate.
            if let Some(ac) = &p.audit_consensus {
                if !ac.pass {
                    continue;
                }
            }
            if p.replay_count < 3 {
                continue;
            }
            if p.success_rate() < 0.80 {
                continue;
            }
            let sim = cosine(query_vec, &p.pathway_vec);
            if sim < 0.90 {
                continue;
            }
            if best.as_ref().map(|(b, _)| sim > *b).unwrap_or(true) {
                best = Some((sim, p));
            }
        }
        let (similarity, p) = best?;
        // The "recommended" rung is the first accepted attempt in the
        // stored pathway — that's the one the ladder converged on.
        let accepted = p.ladder_attempts.iter().find(|a| a.accepted)?;
        Some(HotSwapCandidate {
            pathway_id: p.pathway_id.clone(),
            similarity,
            replay_count: p.replay_count,
            success_rate: p.success_rate(),
            recommended_rung: accepted.rung,
            recommended_model: accepted.model.clone(),
        })
    }

    /// Record the outcome of a hot-swap replay. Increments replay_count
    /// unconditionally; increments replays_succeeded iff succeeded;
    /// retires the pathway if replay_count >= 3 and success_rate falls
    /// below 0.80. Mistral's learning loop in code.
    pub async fn record_replay_outcome(
        &self,
        pathway_id: &str,
        succeeded: bool,
    ) -> Result<(), String> {
        let mut s = self.state.write().await;
        // Find the specific pathway across the bucket that matches by
        // full id (the bucket key is already the narrow id, but in case
        // of future multi-trace-per-id we take the most recent).
        let bucket = s
            .pathways
            .iter_mut()
            .find(|(k, _)| k.as_str() == pathway_id)
            .map(|(_, v)| v)
            .ok_or_else(|| format!("pathway {pathway_id} not found"))?;
        let p = bucket
            .last_mut()
            .ok_or_else(|| format!("pathway {pathway_id} has empty bucket"))?;
        p.replay_count = p.replay_count.saturating_add(1);
        if succeeded {
            p.replays_succeeded = p.replays_succeeded.saturating_add(1);
        }
        if p.replay_count >= 3 && p.success_rate() < 0.80 {
            p.retired = true;
        }
        s.last_updated_at = Utc::now().timestamp_millis();
        drop(s);
        self.persist().await
    }

    pub async fn stats(&self) -> PathwayMemoryStats {
        let s = self.state.read().await;
        let mut total = 0usize;
        let mut retired = 0usize;
        let mut with_audit_pass = 0usize;
        let mut total_replays = 0u64;
        let mut successful_replays = 0u64;
        for bucket in s.pathways.values() {
            for p in bucket {
                total += 1;
                if p.retired {
                    retired += 1;
                }
                if p.audit_consensus.as_ref().map(|a| a.pass).unwrap_or(false) {
                    with_audit_pass += 1;
                }
                total_replays += p.replay_count as u64;
                successful_replays += p.replays_succeeded as u64;
            }
        }
        PathwayMemoryStats {
            total_pathways: total,
            retired,
            with_audit_pass,
            total_replays,
            successful_replays,
            reuse_rate: if total == 0 {
                0.0
            } else {
                total_replays as f32 / total as f32
            },
            replay_success_rate: if total_replays == 0 {
                0.0
            } else {
                successful_replays as f32 / total_replays as f32
            },
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub struct PathwayMemoryStats {
    pub total_pathways: usize,
    pub retired: usize,
    pub with_audit_pass: usize,
    pub total_replays: u64,
    pub successful_replays: u64,
    pub reuse_rate: f32,          // total_replays / total_pathways
    pub replay_success_rate: f32, // successful_replays / total_replays
}

#[cfg(test)]
mod tests {
    use super::*;
    use object_store::memory::InMemory;

    fn mk_store() -> Arc<dyn ObjectStore> {
        Arc::new(InMemory::new())
    }

    fn mk_trace(id_tag: &str, audit_pass: bool, replays: u32, succ: u32) -> PathwayTrace {
        let pathway_id =
            PathwayTrace::compute_id("scrum_review", &format!("crates/{id_tag}/src/x.rs"), Some("CONVERGING"));
        let attempts = vec![LadderAttempt {
            rung: 2,
            model: "qwen3-coder:480b".into(),
            latency_ms: 1000,
            accepted: true,
            reject_reason: None,
        }];
        let mut trace = PathwayTrace {
            pathway_id,
            task_class: "scrum_review".into(),
            file_path: format!("crates/{id_tag}/src/x.rs"),
            signal_class: Some("CONVERGING".into()),
            created_at: Utc::now(),
            ladder_attempts: attempts,
            kb_chunks: vec![KbChunkRef {
                source_doc: "PRD.md".into(),
                chunk_id: "c1".into(),
                cosine_score: 0.88,
                rank: 0,
            }],
            observer_signals: vec![],
            bridge_hits: vec![],
            sub_pipeline_calls: vec![],
            audit_consensus: Some(AuditConsensus {
                pass: audit_pass,
                models: vec!["qwen3-coder:480b".into(), "gpt-oss:120b".into(), "kimi-k2:1t".into()],
                disagreements: 0,
            }),
            reducer_summary: "ok".into(),
            final_verdict: "accepted".into(),
            pathway_vec: vec![],
            replay_count: replays,
            replays_succeeded: succ,
            retired: false,
        };
        trace.pathway_vec = build_pathway_vec(&trace);
        trace
    }

    #[test]
    fn file_prefix_takes_first_two_segments() {
        assert_eq!(file_prefix("crates/queryd/src/service.rs"), "crates/queryd");
        assert_eq!(file_prefix("crates/gateway"), "crates/gateway");
        assert_eq!(file_prefix("README.md"), "README.md");
        assert_eq!(file_prefix(""), "");
    }

    #[test]
    fn compute_id_is_deterministic() {
        let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
        let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/x.rs", Some("LOOPING"));
        assert_eq!(a, b);
    }

    #[test]
    fn compute_id_generalizes_across_same_prefix() {
        // Same prefix + task + signal → same id. That IS the narrow
        // generalization — it's what lets hot-swap fire for different
        // files in the same crate that share the task/signal profile.
        let a = PathwayTrace::compute_id("scrum", "crates/queryd/src/a.rs", Some("L"));
        let b = PathwayTrace::compute_id("scrum", "crates/queryd/src/b.rs", Some("L"));
        assert_eq!(a, b);
    }

    #[test]
    fn compute_id_differs_on_signal_class() {
        let a = PathwayTrace::compute_id("scrum", "crates/q/s", Some("CONVERGING"));
        let b = PathwayTrace::compute_id("scrum", "crates/q/s", Some("LOOPING"));
        assert_ne!(a, b);
    }

    #[test]
    fn cosine_handles_mismatched_lengths() {
        assert_eq!(cosine(&[1.0, 0.0], &[1.0]), 0.0);
    }

    #[test]
    fn cosine_of_identical_normalized_is_one() {
        let v = vec![0.6, 0.8];
        let c = cosine(&v, &v);
        assert!((c - 1.0).abs() < 1e-5);
    }

    #[test]
    fn success_rate_is_zero_before_any_replay() {
        let t = mk_trace("a", true, 0, 0);
        assert_eq!(t.success_rate(), 0.0);
    }

    #[test]
    fn success_rate_ratio() {
        let t = mk_trace("a", true, 4, 3);
        assert!((t.success_rate() - 0.75).abs() < 1e-5);
    }

    #[tokio::test]
    async fn insert_and_stats_roundtrip() {
        let mem = PathwayMemory::new(mk_store());
        mem.insert(mk_trace("a", true, 0, 0)).await.unwrap();
        let stats = mem.stats().await;
        assert_eq!(stats.total_pathways, 1);
        assert_eq!(stats.retired, 0);
        assert_eq!(stats.with_audit_pass, 1);
    }

    #[tokio::test]
    async fn hot_swap_rejects_when_probation_not_met() {
        // Probation: replay_count must be >= 3 before success-rate gate
        // can fire. A fresh pathway with 0 replays must NEVER hot-swap
        // even if its similarity is 1.0 and audit passes.
        let mem = PathwayMemory::new(mk_store());
        let trace = mk_trace("a", true, 0, 0);
        let qvec = trace.pathway_vec.clone();
        mem.insert(trace).await.unwrap();
        let got = mem
            .query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
            .await;
        assert!(got.is_none(), "fresh pathway must not hot-swap");
    }

    #[tokio::test]
    async fn hot_swap_rejects_when_audit_explicitly_fails() {
        let mem = PathwayMemory::new(mk_store());
        let trace = mk_trace("a", false, 5, 5); // audit FAILED explicitly
        let qvec = trace.pathway_vec.clone();
        mem.insert(trace).await.unwrap();
        let got = mem
            .query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
            .await;
        assert!(got.is_none(), "pathway with explicit audit FAIL must not hot-swap");
    }

    #[tokio::test]
    async fn hot_swap_accepts_unaudited_pathway_for_bootstrap() {
        // v1 bootstrap: auditor doesn't update pathway audit_consensus
        // until Phase N+1 wires it. Until then, null audit_consensus
        // must NOT block hot-swap — the success_rate + probation gates
        // alone prove safety. Once auditor wires up, explicit audit
        // failures will re-introduce the block (see previous test).
        let mem = PathwayMemory::new(mk_store());
        let mut trace = mk_trace("a", true, 5, 5);
        trace.audit_consensus = None; // bootstrap path
        trace.pathway_vec = build_pathway_vec(&trace);
        let qvec = trace.pathway_vec.clone();
        mem.insert(trace).await.unwrap();
        let got = mem
            .query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
            .await;
        assert!(got.is_some(), "unaudited pathway with good replay history must hot-swap");
    }

    #[tokio::test]
    async fn hot_swap_rejects_when_success_rate_below_80pct() {
        // 10 replays, 7 succeeded = 70% — below the 0.80 threshold.
        let mem = PathwayMemory::new(mk_store());
        let trace = mk_trace("a", true, 10, 7);
        let qvec = trace.pathway_vec.clone();
        mem.insert(trace).await.unwrap();
        let got = mem
            .query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
            .await;
        assert!(got.is_none());
    }

    #[tokio::test]
    async fn hot_swap_accepts_when_all_gates_pass() {
        let mem = PathwayMemory::new(mk_store());
        let trace = mk_trace("a", true, 5, 5); // 100% success after 5 replays
        let qvec = trace.pathway_vec.clone();
        mem.insert(trace).await.unwrap();
        let got = mem
            .query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
            .await;
        let cand = got.expect("should hot-swap");
        assert!(cand.similarity >= 0.90);
        assert_eq!(cand.recommended_rung, 2);
        assert_eq!(cand.recommended_model, "qwen3-coder:480b");
    }

    #[tokio::test]
|
async fn record_replay_retires_pathway_on_failure_pattern() {
|
||||||
|
let mem = PathwayMemory::new(mk_store());
|
||||||
|
let trace = mk_trace("a", true, 0, 0);
|
||||||
|
let pid = trace.pathway_id.clone();
|
||||||
|
mem.insert(trace).await.unwrap();
|
||||||
|
// Three replays, all fail → success_rate = 0.0 → retired.
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
let stats = mem.stats().await;
|
||||||
|
assert_eq!(stats.retired, 1, "3 failures after insert must retire");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn record_replay_does_not_retire_before_probation() {
|
||||||
|
let mem = PathwayMemory::new(mk_store());
|
||||||
|
let trace = mk_trace("a", true, 0, 0);
|
||||||
|
let pid = trace.pathway_id.clone();
|
||||||
|
mem.insert(trace).await.unwrap();
|
||||||
|
// Two replays (below probation of 3), both fail. Should NOT
|
||||||
|
// retire yet — probation requires minimum 3 data points.
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
let stats = mem.stats().await;
|
||||||
|
assert_eq!(stats.retired, 0, "only 2 replays → below probation floor");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn retired_pathway_never_hot_swaps_again() {
|
||||||
|
let mem = PathwayMemory::new(mk_store());
|
||||||
|
let trace = mk_trace("a", true, 0, 0);
|
||||||
|
let pid = trace.pathway_id.clone();
|
||||||
|
let qvec = trace.pathway_vec.clone();
|
||||||
|
mem.insert(trace).await.unwrap();
|
||||||
|
for _ in 0..3 {
|
||||||
|
mem.record_replay_outcome(&pid, false).await.unwrap();
|
||||||
|
}
|
||||||
|
// Now record 10 successes to push success_rate well above 0.80.
|
||||||
|
// Pathway is still retired — retirement is sticky by design, to
|
||||||
|
// prevent oscillation on noise.
|
||||||
|
for _ in 0..10 {
|
||||||
|
mem.record_replay_outcome(&pid, true).await.unwrap();
|
||||||
|
}
|
||||||
|
let got = mem
|
||||||
|
.query_hot_swap("scrum_review", "crates/a/src/x.rs", Some("CONVERGING"), &qvec)
|
||||||
|
.await;
|
||||||
|
assert!(got.is_none(), "retirement must be sticky");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn pathway_vec_differs_for_different_models() {
|
||||||
|
// Two pathways with same fingerprint but different ladder
|
||||||
|
// models should have different embeddings so the similarity
|
||||||
|
// gate can discriminate. This is what enables narrow fingerprint
|
||||||
|
// + similarity-vec to cluster correctly.
|
||||||
|
let a = mk_trace("a", true, 5, 5);
|
||||||
|
let mut b = a.clone();
|
||||||
|
b.ladder_attempts[0].model = "kimi-k2:1t".into();
|
||||||
|
b.pathway_vec = build_pathway_vec(&b);
|
||||||
|
let sim = cosine(&a.pathway_vec, &b.pathway_vec);
|
||||||
|
assert!(sim < 1.0, "different models → different embeddings");
|
||||||
|
assert!(sim > 0.5, "shared fingerprint → embeddings still related");
|
||||||
|
}
|
||||||
|
}
|
||||||
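The gate order these tests assert is easy to lose track of across the Rust and TypeScript halves. A minimal standalone sketch of the eligibility predicate, in TypeScript for brevity — the thresholds are the consensus-locked values, but the record shape and function name are illustrative, not the crate's actual API:

```typescript
// Illustrative only — mirrors the gate order the tests above exercise,
// not the real PathwayMemory implementation.
interface PathwayGateInput {
  retired: boolean;
  replay_count: number;
  replays_succeeded: number;
  audit_pass: boolean | null; // null = unaudited bootstrap pathway
}

function hotSwapEligible(p: PathwayGateInput, similarity: number): boolean {
  if (p.retired) return false;              // retirement is sticky
  if (p.audit_pass === false) return false; // explicit audit FAIL blocks
  if (p.replay_count < 3) return false;     // probation floor
  const rate = p.replays_succeeded / p.replay_count;
  if (rate < 0.80) return false;            // success-rate gate
  return similarity >= 0.90;                // similarity gate
}
```

Note the bootstrap asymmetry: `audit_pass === null` falls through to the remaining gates; only an explicit `false` blocks.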
@@ -13,7 +13,7 @@ use std::sync::Arc;
 use aibridge::client::{AiClient, EmbedRequest, GenerateRequest};
 use catalogd::registry::Registry as CatalogRegistry;
 use storaged::registry::BucketRegistry;
-use crate::{agent, autotune, chunker, embedding_cache, harness, hnsw, index_registry, jobs, lance_backend, playbook_memory, promotion, rag, refresh, search, store, supervisor, trial};
+use crate::{agent, autotune, chunker, embedding_cache, harness, hnsw, index_registry, jobs, lance_backend, pathway_memory, playbook_memory, promotion, rag, refresh, search, store, supervisor, trial};
 use tokio::sync::Semaphore;

 #[derive(Clone)]
@@ -55,6 +55,11 @@ pub struct VectorState {
     /// and, when `use_playbook_memory` is set on /vectors/hybrid, boosts
     /// workers that were actually filled in semantically-similar past ops.
     pub playbook_memory: playbook_memory::PlaybookMemory,
+    /// Pathway memory — consensus-designed sidecar for full-context
+    /// backtracking + hot-swap of successful review pathways. See
+    /// crates/vectord/src/pathway_memory.rs for the design rationale
+    /// (10-probe N=3 ensemble, locked 2026-04-24).
+    pub pathway_memory: pathway_memory::PathwayMemory,
    /// Serializes embed calls from seed_playbook_memory to avoid
    /// concurrent socket collisions with the Python sidecar.
    pub embed_semaphore: Arc<Semaphore>,
@@ -137,6 +142,15 @@ pub fn router(state: VectorState) -> Router {
         // Phase 45 slice 3 — doc drift detection + human re-admission.
         .route("/playbook_memory/doc_drift/check/{id}", post(check_doc_drift))
         .route("/playbook_memory/doc_drift/resolve/{id}", post(resolve_doc_drift))
+        // Pathway memory — consensus-designed sidecar (2026-04-24).
+        // scrum_master_pipeline POSTs /pathway/insert at the end of each
+        // review, calls /pathway/query before running the ladder for a
+        // potential hot-swap, and posts /pathway/record_replay after a
+        // hot-swap succeeds or fails.
+        .route("/pathway/insert", post(pathway_insert))
+        .route("/pathway/query", post(pathway_query))
+        .route("/pathway/record_replay", post(pathway_record_replay))
+        .route("/pathway/stats", get(pathway_stats))
         .with_state(state)
 }
@@ -2833,6 +2847,73 @@ async fn lance_build_scalar_index(
     }
 }
+
+// ─── Pathway memory handlers ──────────────────────────────────────────
+//
+// Thin wrappers around pathway_memory::PathwayMemory. HTTP surface is
+// deliberately small — four endpoints cover the full lifecycle:
+// insert at end-of-review, query before running the ladder,
+// record_replay after a hot-swap, and stats for the VCP UI.
+
+#[derive(Deserialize)]
+struct PathwayQueryRequest {
+    task_class: String,
+    file_path: String,
+    signal_class: Option<String>,
+    query_vec: Vec<f32>,
+}
+
+async fn pathway_insert(
+    State(state): State<VectorState>,
+    Json(trace): Json<pathway_memory::PathwayTrace>,
+) -> impl IntoResponse {
+    match state.pathway_memory.insert(trace).await {
+        Ok(()) => Ok(Json(json!({"ok": true}))),
+        Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
+    }
+}
+
+async fn pathway_query(
+    State(state): State<VectorState>,
+    Json(req): Json<PathwayQueryRequest>,
+) -> impl IntoResponse {
+    let cand = state
+        .pathway_memory
+        .query_hot_swap(
+            &req.task_class,
+            &req.file_path,
+            req.signal_class.as_deref(),
+            &req.query_vec,
+        )
+        .await;
+    // 200 with null candidate means "no hot-swap"; this is a normal
+    // path, not an error — callers should proceed with the full ladder.
+    Json(json!({ "candidate": cand }))
+}
+
+#[derive(Deserialize)]
+struct PathwayReplayRequest {
+    pathway_id: String,
+    succeeded: bool,
+}
+
+async fn pathway_record_replay(
+    State(state): State<VectorState>,
+    Json(req): Json<PathwayReplayRequest>,
+) -> impl IntoResponse {
+    match state
+        .pathway_memory
+        .record_replay_outcome(&req.pathway_id, req.succeeded)
+        .await
+    {
+        Ok(()) => Ok(Json(json!({"ok": true}))),
+        Err(e) => Err((StatusCode::NOT_FOUND, e)),
+    }
+}
+
+async fn pathway_stats(State(state): State<VectorState>) -> impl IntoResponse {
+    Json(state.pathway_memory.stats().await)
+}
+
 #[cfg(test)]
 mod extractor_tests {
     use super::*;
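A note on the wire contract those handlers define: a query miss is an HTTP 200 with a null candidate, never an error status. Both shapes, sketched as TypeScript (field values are made-up examples, not real pathway data):

```typescript
// Miss: 200 with a null candidate — callers fall through to the full
// ladder. Hit: the candidate carries everything the pipeline needs to
// reorder the ladder. Values below are illustrative only.
type HotSwapCandidateWire = {
  pathway_id: string;
  similarity: number;
  replay_count: number;
  success_rate: number;
  recommended_rung: number;
  recommended_model: string;
};
type PathwayQueryResponse = { candidate: HotSwapCandidateWire | null };

const miss: PathwayQueryResponse = JSON.parse('{"candidate":null}');
const hit: PathwayQueryResponse = JSON.parse(
  '{"candidate":{"pathway_id":"abc123","similarity":0.93,"replay_count":5,' +
  '"success_rate":1.0,"recommended_rung":2,"recommended_model":"qwen3-coder:480b"}}'
);
```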
@@ -157,6 +157,158 @@ async function embedBatch(texts: string[]): Promise<number[][]> {
   return (await r.json() as any).embeddings;
 }
+
+// ─── Pathway memory (2026-04-24 consensus design) ───────────────────
+//
+// Mirrors vectord/src/pathway_memory.rs. The bucket-hash vector MUST
+// byte-match the Rust implementation so traces written from TypeScript
+// are searchable against the same embedding space. Verified by running
+// both implementations on the same input tokens and asserting matching
+// bucket indices.
+
+function filePrefix(path: string): string {
+  return path.split("/").slice(0, 2).join("/");
+}
+
+function computePathwayId(taskClass: string, filePath: string, signalClass: string | null): string {
+  const h = createHash("sha256");
+  h.update(taskClass);
+  h.update("|");
+  h.update(filePrefix(filePath));
+  h.update("|");
+  h.update(signalClass ?? "");
+  return h.digest("hex");
+}
+
+// 32-bucket L2-normalized token hash. Same algorithm as Rust.
+function buildPathwayVec(tokens: string[]): number[] {
+  const buckets = new Array(32).fill(0);
+  for (const t of tokens) {
+    const d = createHash("sha256").update(t, "utf8").digest();
+    const b1 = d[0] % 32;
+    const b2 = d[8] % 32;
+    buckets[b1] += 1;
+    buckets[b2] += 1;
+  }
+  let norm = 0;
+  for (const v of buckets) norm += v * v;
+  norm = Math.sqrt(norm);
+  if (norm > 0) for (let i = 0; i < buckets.length; i++) buckets[i] /= norm;
+  return buckets;
+}
+
+// Build the minimal query vector for a pre-ladder hot-swap check. We
+// don't yet know the ladder attempts or KB chunks — the query vec is
+// computed from what we CAN know up front: task/file/signal. This is
+// a weaker embedding than the one computed at trace-insert time, but
+// similarity still discriminates between task/file/signal combinations.
+function buildQueryVec(taskClass: string, filePath: string, signalClass: string | null): number[] {
+  const tokens = [taskClass, filePath];
+  if (signalClass) tokens.push(`signal:${signalClass}`);
+  return buildPathwayVec(tokens);
+}
+
+interface HotSwapCandidate {
+  pathway_id: string;
+  similarity: number;
+  replay_count: number;
+  success_rate: number;
+  recommended_rung: number;
+  recommended_model: string;
+}
+
+async function queryHotSwap(taskClass: string, filePath: string, signalClass: string | null): Promise<HotSwapCandidate | null> {
+  try {
+    const query_vec = buildQueryVec(taskClass, filePath, signalClass);
+    const r = await fetch(`${GATEWAY}/vectors/pathway/query`, {
+      method: "POST",
+      headers: { "content-type": "application/json" },
+      body: JSON.stringify({ task_class: taskClass, file_path: filePath, signal_class: signalClass, query_vec }),
+      signal: AbortSignal.timeout(5000),
+    });
+    if (!r.ok) return null;
+    const j = await r.json() as { candidate: HotSwapCandidate | null };
+    return j.candidate ?? null;
+  } catch {
+    // Pathway service unavailable → run full ladder. Hot-swap is
+    // always an optimization, never a correctness requirement.
+    return null;
+  }
+}
+
+interface LadderAttemptRec {
+  rung: number;
+  model: string;
+  latency_ms: number;
+  accepted: boolean;
+  reject_reason: string | null;
+}
+
+interface PathwayTracePayload {
+  pathway_id: string;
+  task_class: string;
+  file_path: string;
+  signal_class: string | null;
+  created_at: string;
+  ladder_attempts: LadderAttemptRec[];
+  kb_chunks: { source_doc: string; chunk_id: string; cosine_score: number; rank: number }[];
+  observer_signals: { class: string; priors: string[]; prior_iter_outcomes: string[] }[];
+  bridge_hits: { library: string; version: string }[];
+  sub_pipeline_calls: { pipeline: string; result_summary: string }[];
+  audit_consensus: { pass: boolean; models: string[]; disagreements: number } | null;
+  reducer_summary: string;
+  final_verdict: string;
+  pathway_vec: number[];
+  replay_count: number;
+  replays_succeeded: number;
+  retired: boolean;
+}
+
+async function writePathwayTrace(trace: PathwayTracePayload): Promise<void> {
+  try {
+    await fetch(`${GATEWAY}/vectors/pathway/insert`, {
+      method: "POST",
+      headers: { "content-type": "application/json" },
+      body: JSON.stringify(trace),
+      signal: AbortSignal.timeout(10000),
+    });
+  } catch {
+    // Fire-and-forget: scrum runs shouldn't fail if pathway insert fails.
+  }
+}
+
+async function recordPathwayReplay(pathwayId: string, succeeded: boolean): Promise<void> {
+  try {
+    await fetch(`${GATEWAY}/vectors/pathway/record_replay`, {
+      method: "POST",
+      headers: { "content-type": "application/json" },
+      body: JSON.stringify({ pathway_id: pathwayId, succeeded }),
+      signal: AbortSignal.timeout(5000),
+    });
+  } catch {
+    // Fire-and-forget. Not critical.
+  }
+}
+
+// Deterministic signal_class lookup from scrum_reviews.jsonl history.
+// First-time files get `null`. Files seen before get the signal class
+// the observer assigned on their most-recent review (if any). Keeps the
+// pathway fingerprint stable across iterations for LOOPING files.
+async function lookupSignalClass(filePath: string): Promise<string | null> {
+  try {
+    const { readFile } = await import("node:fs/promises");
+    const raw = await readFile(SCRUM_REVIEWS_JSONL, "utf8").catch(() => "");
+    if (!raw) return null;
+    const lines = raw.trim().split("\n").reverse();
+    for (const line of lines) {
+      try {
+        const r = JSON.parse(line);
+        if (r.file === filePath && r.signal_class) return r.signal_class;
+      } catch {}
+    }
+    return null;
+  } catch { return null; }
+}
+
 async function chat(opts: {
   provider: "ollama" | "ollama_cloud",
   model: string,
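To make the similarity behavior above concrete, here is a standalone copy of the bucket hash paired with a plain cosine. This duplicates the logic purely for illustration — the pipeline's `buildPathwayVec` above stays the source of truth:

```typescript
import { createHash } from "node:crypto";

// Same 32-bucket L2-normalized token hash as buildPathwayVec above,
// restated compactly for the demo.
function bucketVec(tokens: string[]): number[] {
  const buckets = new Array(32).fill(0);
  for (const t of tokens) {
    const d = createHash("sha256").update(t, "utf8").digest();
    buckets[d[0] % 32] += 1;
    buckets[d[8] % 32] += 1;
  }
  const norm = Math.sqrt(buckets.reduce((s, v) => s + v * v, 0));
  return norm > 0 ? buckets.map(v => v / norm) : buckets;
}

// Dot product suffices as cosine because bucketVec output is unit-norm.
function cosine(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

// Shared fingerprint tokens keep two vectors correlated; one differing
// token (e.g. the ladder model) moves them apart without destroying the
// relationship — this is what the 0.90 similarity gate leans on.
const base = ["scrum_review", "crates/a/src/x.rs", "signal:CONVERGING"];
const va = bucketVec([...base, "model:qwen3-coder:480b"]);
const vb = bucketVec([...base, "model:kimi-k2:1t"]);
```

Since every bucket count is non-negative and both vectors share the mass contributed by the fingerprint tokens, their cosine is always strictly positive.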
@@ -389,32 +541,63 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
 let acceptedModel = "";
 let acceptedOn = 0;

-for (let i = 0; i < MAX_ATTEMPTS; i++) {
-  const n = i + 1;
+// Pathway hot-swap pre-check. If a proven pathway exists for this
+// (task, file_prefix, signal) combo with ≥3 replays at ≥80% success,
+// skip the ladder and try its winning rung first. On success we
+// record a positive replay; on failure we fall through to the full
+// ladder and record a negative replay. Fire-and-forget — pathway
+// service unavailable → null candidate → business as usual.
+const signalClass = await lookupSignalClass(rel);
+const taskClass = "scrum_review";
+const hotSwap = await queryHotSwap(taskClass, rel, signalClass);
+let hotSwapOrderedIndices: number[] | null = null;
+if (hotSwap) {
+  // Reorder the ladder to try the recommended model first. Rung
+  // indices are preserved in the output so the trace still reflects
+  // the true ladder position the model sits at.
+  const recommendedIdx = LADDER.findIndex(r => r.model === hotSwap.recommended_model);
+  if (recommendedIdx >= 0) {
+    log(`  🔥 hot-swap candidate: ${hotSwap.recommended_model} (rung ${hotSwap.recommended_rung}, sim=${hotSwap.similarity.toFixed(3)}, success_rate=${hotSwap.success_rate.toFixed(2)}, ${hotSwap.replay_count} replays)`);
+    hotSwapOrderedIndices = [recommendedIdx, ...LADDER.map((_, i) => i).filter(i => i !== recommendedIdx)];
+  }
+}
+const ladderOrder = hotSwapOrderedIndices ?? LADDER.map((_, i) => i);
+
+// Collect attempts for the pathway trace sidecar.
+const pathwayAttempts: LadderAttemptRec[] = [];
+
+for (let step = 0; step < MAX_ATTEMPTS; step++) {
+  const i = ladderOrder[step];
+  const n = step + 1;
   const rung = LADDER[i];
   const learning = history.length > 0
     ? `\n\n═══ PRIOR ATTEMPTS FAILED. Specific issues to fix: ═══\n${history.map(h => `Attempt ${h.n} (${h.model}, ${h.chars} chars): ${h.status} — ${h.error ?? "thin/unstructured answer"}`).join("\n")}\n═══`
     : "";

   log(`  attempt ${n}/${MAX_ATTEMPTS}: ${rung.provider}::${rung.model}${learning ? " [w/ learning]" : ""}`);
+  const attemptStarted = Date.now();
   const r = await chat({
     provider: rung.provider,
     model: rung.model,
     prompt: baseTask + learning,
     max_tokens: 1500,
   });
+  const attemptMs = Date.now() - attemptStarted;

   if (r.error) {
     history.push({ n, model: rung.model, status: "error", chars: 0, error: r.error.slice(0, 180) });
+    pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `error: ${r.error.slice(0, 100)}` });
     log(`  ✗ error: ${r.error.slice(0, 80)}`);
     continue;
   }
   if (!isAcceptable(r.content)) {
     history.push({ n, model: rung.model, status: "thin", chars: r.content.length, error: `thin/unstructured (${r.content.length} chars)` });
+    pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: false, reject_reason: `thin (${r.content.length} chars)` });
     log(`  ✗ thin/unstructured (${r.content.length} chars)`);
     continue;
   }
   history.push({ n, model: rung.model, status: "accepted", chars: r.content.length });
+  pathwayAttempts.push({ rung: i + 1, model: rung.model, latency_ms: attemptMs, accepted: true, reject_reason: null });
   accepted = r.content;
   acceptedModel = `${rung.provider}/${rung.model}`;
   acceptedOn = n;
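The reorder in that hunk is just "move the recommended rung's index to the front, keep the rest in ladder order". A tiny standalone restatement (the ladder length is a stand-in; `LADDER` itself is not reproduced here):

```typescript
// Illustrative re-statement of the hotSwapOrderedIndices construction
// above: recommended index first, remaining rungs in original order.
function reorderLadder(len: number, recommendedIdx: number): number[] {
  const all = Array.from({ length: len }, (_, i) => i);
  if (recommendedIdx < 0 || recommendedIdx >= len) return all; // no valid recommendation
  return [recommendedIdx, ...all.filter(i => i !== recommendedIdx)];
}
```

Rung identity is preserved: only the attempt order changes, so `pathwayAttempts` still records each model's true ladder position.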
@@ -422,6 +605,15 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
   break;
 }
+
+// Hot-swap bookkeeping: if we tried the recommended model first,
+// report whether it worked so the pathway's success_rate updates.
+if (hotSwap) {
+  const replaySucceeded = acceptedModel.endsWith(`/${hotSwap.recommended_model}`);
+  log(`  pathway replay ${replaySucceeded ? "✓" : "✗"} (${hotSwap.pathway_id.slice(0, 12)}…)`);
+  // Fire and forget — don't await; observer can handle it.
+  recordPathwayReplay(hotSwap.pathway_id, replaySucceeded);
+}

 const review: FileReview = {
   file: rel,
   file_bytes: content.length,
@@ -599,6 +791,54 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
   console.error(`[scrum] failed to append scrum_reviews.jsonl: ${(e as Error).message}`);
 }
+
+// Pathway trace sidecar (consensus-designed 2026-04-24). Captures
+// FULL context (ladder attempts, KB chunks, observer signal, verdict)
+// for similarity-based hot-swap in future iterations. First-review
+// pathways start in probation (replay_count=0); they become
+// hot-swappable only after ≥3 replays at ≥80% success.
+try {
+  const pathwayTrace: PathwayTracePayload = {
+    pathway_id: computePathwayId(taskClass, rel, signalClass),
+    task_class: taskClass,
+    file_path: rel,
+    signal_class: signalClass,
+    created_at: row.reviewed_at,
+    ladder_attempts: pathwayAttempts,
+    kb_chunks: [
+      ...topPrd.map((c, idx) => ({
+        source_doc: "PRD.md", chunk_id: `prd@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
+      })),
+      ...topPlan.map((c, idx) => ({
+        source_doc: "cohesion_plan", chunk_id: `plan@${c.offset}`, cosine_score: (c as any)._score ?? 0, rank: idx,
+      })),
+    ],
+    observer_signals: signalClass ? [{ class: signalClass, priors: [], prior_iter_outcomes: [] }] : [],
+    bridge_hits: [], // context7 not wired into scrum yet; empty for v1
+    sub_pipeline_calls: [], // LLM Team extract happens after this row; out of scope for v1
+    audit_consensus: null, // set by auditor's later N=3 pass, via /pathway/insert update
+    reducer_summary: accepted.slice(0, 4000),
+    final_verdict: verdict ?? "accepted",
+    // Vec built from the full attempts/chunks — richer than the
+    // query-time vector. The similarity gate will still discriminate
+    // between pathways with the same fingerprint but different
+    // ladder/KB profiles.
+    pathway_vec: buildPathwayVec([
+      taskClass,
+      rel,
+      ...(signalClass ? [`signal:${signalClass}`] : []),
+      ...pathwayAttempts.flatMap(a => [`rung:${a.rung}`, `model:${a.model}`, `accepted:${a.accepted}`]),
+      ...topPrd.map(c => `kb:PRD.md`),
+      ...topPlan.map(c => `kb:cohesion_plan`),
+    ]),
+    replay_count: 0,
+    replays_succeeded: 0,
+    retired: false,
+  };
+  writePathwayTrace(pathwayTrace); // fire-and-forget
+} catch (e) {
+  console.error(`[scrum] pathway trace failed: ${(e as Error).message}`);
+}

 // Close the scrum → observer loop (fix 2026-04-24). Architecture
 // audit surfaced: observer ring had 2000 ops, 1999 from Langfuse,
 // zero from scrum. Observer's analyzeErrors + PLAYBOOK_BUILDER loops
@@ -643,6 +883,20 @@ Respond with markdown. Be specific, not generic. Cite file-region + PRD-chunk-of
   critical_failures_count,
   verified_components_count,
   missing_components_count,
+  // Pathway fields: emitted on every review so the observer
+  // can build a full picture of hot-swap performance over time.
+  // `pathway_hot_swap_hit` flags whether the first rung tried on
+  // this review was a pathway recommendation vs the default
+  // ladder top. `rungs_saved` quantifies the compute we avoided
+  // when a hot-swap landed — this is the value metric the VCP
+  // UI surfaces ("avg_rungs_saved_per_commit").
+  pathway_hot_swap_hit: hotSwap !== null,
+  pathway_id: hotSwap?.pathway_id ?? null,
+  pathway_similarity: hotSwap?.similarity ?? null,
+  pathway_success_rate: hotSwap?.success_rate ?? null,
+  rungs_saved: hotSwap && acceptedModel.endsWith(`/${hotSwap.recommended_model}`)
+    ? Math.max(0, hotSwap.recommended_rung - 1)
+    : 0,
   ts: row.reviewed_at,
 }),
 signal: AbortSignal.timeout(3000),
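The `rungs_saved` arithmetic emitted above, pulled out as a standalone function so the accounting rule is explicit (the function name is illustrative):

```typescript
// Value only accrues when the hot-swap recommendation actually produced
// the accepted answer; a miss scores 0 even though a query fired.
// acceptedModel is "provider/model", so we match on the "/model" suffix.
function rungsSaved(recommendedRung: number, recommendedModel: string, acceptedModel: string): number {
  return acceptedModel.endsWith(`/${recommendedModel}`)
    ? Math.max(0, recommendedRung - 1)
    : 0;
}
```

A rung-1 recommendation saves nothing even on a hit, which is what the `Math.max(0, …)` clamp encodes: the default ladder would have started there anyway.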
ui/server.ts
@@ -349,6 +349,44 @@ Bun.serve({
   if (path === "/data/outcomes") return Response.json(await tailJsonl(`${KB}/outcomes.jsonl`, 30));
   if (path === "/data/audit_facts") return Response.json(await tailJsonl(`${KB}/audit_facts.jsonl`, 30));
+
+  // Pathway memory — consensus-designed sidecar (2026-04-24). Two
+  // exposed metrics: reuse_rate (activity — is it firing?) and
+  // avg_rungs_saved_per_commit (value — is it earning its keep?).
+  // Round-3 consensus (qwen3.5:397b) pointed out that activity
+  // without value tells us nothing; the UI needs both to judge the
+  // health of the hot-swap learning loop.
+  if (path === "/data/pathway_stats") {
+    try {
+      const r = await fetch("http://localhost:3100/vectors/pathway/stats", { signal: AbortSignal.timeout(3000) });
+      if (!r.ok) return Response.json({ error: `vectord ${r.status}`, stats: null });
+      const stats = await r.json();
+      // Tail recent scrum events to compute avg_rungs_saved_per_commit
+      // (a committed review = any row in scrum_reviews.jsonl; rungs_saved
+      // only populates when pathway memory fired AND the recommended
+      // model actually produced the accept).
+      const reviews = await tailJsonl(`${KB}/scrum_reviews.jsonl`, 200);
+      let totalCommits = 0;
+      let totalRungsSaved = 0;
+      let hotSwapHits = 0;
+      for (const r of reviews) {
+        totalCommits++;
+        if (r.pathway_hot_swap_hit) hotSwapHits++;
+        if (typeof r.rungs_saved === "number") totalRungsSaved += r.rungs_saved;
+      }
+      return Response.json({
+        stats,
+        scrum_window: {
+          reviews: totalCommits,
+          hot_swap_hits: hotSwapHits,
+          pathway_reuse_rate: totalCommits ? hotSwapHits / totalCommits : 0,
+          avg_rungs_saved_per_commit: totalCommits ? totalRungsSaved / totalCommits : 0,
+        },
+      });
+    } catch (e) {
+      return Response.json({ error: (e as Error).message, stats: null });
+    }
+  }

   if (path.startsWith("/data/file/")) {
     const relpath = decodeURIComponent(path.slice("/data/file/".length));
     return Response.json(await fileHistory(relpath));
31
ui/ui.js
31
ui/ui.js
@ -589,6 +589,37 @@ function metricBox(label, big, kind, opts = {}) {
|
|||||||
function drawMetrics() {
|
function drawMetrics() {
|
||||||
const grid = document.getElementById("metric-grid");
|
const grid = document.getElementById("metric-grid");
|
||||||
clear(grid);
|
clear(grid);
|
||||||
|
// Kick off pathway fetch in parallel; render when it resolves so the
|
||||||
|
// rest of the metrics grid appears immediately. The cards append to
|
||||||
|
// the grid after the synchronous block below — they'll show up at
|
||||||
|
// the bottom of the grid within a tick of first render.
|
||||||
|
fetch("/data/pathway_stats").then(r => r.ok ? r.json() : null).then(j => {
|
||||||
|
if (!j || !j.stats) return;
|
||||||
|
const s = j.stats;
|
||||||
|
const w = j.scrum_window ?? {};
|
||||||
|
// Activity metric — is the hot-swap firing at all?
|
||||||
|
grid.append(metricBox("pathway reuse rate", `${Math.round((w.pathway_reuse_rate ?? 0) * 100)}%`,
|
||||||
|
(w.pathway_reuse_rate ?? 0) > 0.1 ? "good" : (w.pathway_reuse_rate ?? 0) > 0 ? "warn" : "bad", {
|
||||||
|
explain: "% of recent reviews where a pathway hot-swap fired (narrow fingerprint match + 0.80 success rate + ≥3 replays + audit_consensus pass + 0.90 similarity).",
|
||||||
|
source: `scrum_reviews.jsonl .pathway_hot_swap_hit over last ${w.reviews ?? 0} reviews (${w.hot_swap_hits ?? 0} hits)`,
|
||||||
|
good: "≥10% sustained = index earning its keep. <10% over many iters = fingerprint too narrow or probation too strict. 0% on fresh install is expected (no replays yet).",
|
||||||
|
}));
|
||||||
|
// Value metric — how much compute did hot-swap actually save?
|
||||||
|
const saved = w.avg_rungs_saved_per_commit ?? 0;
|
||||||
|
grid.append(metricBox("avg rungs saved", saved.toFixed(2),
|
||||||
|
saved >= 1 ? "good" : saved > 0 ? "warn" : "bad", {
|
||||||
|
explain: "Average ladder rungs skipped per committed review by hot-swap. Rungs_saved = recommended_rung - 1 when the recommended model succeeded (otherwise 0).",
|
||||||
|
source: "scrum_reviews.jsonl .rungs_saved averaged",
|
||||||
|
good: "Every 1.0 here ≈ one less model call per review. At 21 files/iter, 1.0 saved = 21 cloud calls avoided. Value only counts when the replay actually succeeded.",
|
||||||
|
}));
|
||||||
|
// Stability metric — retired pathways indicate the learning loop is correcting itself.
|
||||||
|
grid.append(metricBox("pathways tracked", String(s.total_pathways),
|
||||||
|
s.total_pathways > 0 ? "good" : "warn", {
|
||||||
|
explain: `Total pathway traces stored. ${s.retired} retired (below 0.80 success after ≥3 replays). ${s.with_audit_pass} audit-passed, eligible for hot-swap probation.`,
|
||||||
|
source: "/vectors/pathway/stats",
|
||||||
|
good: `Grows monotonically with scrum runs. Retired=${s.retired} is HEALTHY — it means the learning loop is pruning pathways that stopped working. replay_success_rate=${(s.replay_success_rate*100).toFixed(0)}% aggregates all historical replays.`,
|
||||||
|
}));
|
||||||
|
}).catch(() => {});
|
||||||
const byTier = { auto:0, dry_run:0, simulation:0, block:0, unknown:0 };
|
const byTier = { auto:0, dry_run:0, simulation:0, block:0, unknown:0 };
|
||||||
state.reviews.forEach(r => { const t = r.gradient_tier ?? "unknown"; if (byTier[t] != null) byTier[t]++; });
|
state.reviews.forEach(r => { const t = r.gradient_tier ?? "unknown"; if (byTier[t] != null) byTier[t]++; });
|
||||||
const total = state.reviews.length || 1;
|
const total = state.reviews.length || 1;
|
||||||
|
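The "pathway reuse rate" card's explain string names four gates (probation replays, success rate, similarity, audit consensus), and they can be sketched as a single predicate. This is a sketch under assumptions: the `Pathway` type, its field names, and the `hotSwapEligible` helper are hypothetical; the thresholds (≥3 replays, ≥0.80 success, ≥0.90 cosine similarity, null audit allowed during bootstrap, explicit FAIL always blocking) come from the design notes above.

```typescript
// Hypothetical pathway record seen by the hot-swap gate.
interface Pathway {
  replays: number;
  success_rate: number;
  // null = auditor-to-pathway wiring not yet live (bootstrap mode);
  // an explicit "fail" always blocks hot-swap.
  audit_consensus: "pass" | "fail" | null;
}

// Probation + success + similarity + audit gates, in order.
function hotSwapEligible(p: Pathway, similarity: number): boolean {
  if (p.replays < 3) return false;                // probation: ≥3 replays
  if (p.success_rate < 0.80) return false;        // success-rate gate
  if (similarity < 0.90) return false;            // cosine-similarity gate
  if (p.audit_consensus === "fail") return false; // explicit audit FAIL blocks
  return true;                                    // null audit passes (bootstrap)
}
```

Ordering the cheap numeric gates before the audit check keeps the common rejection path (still on probation) free of any auditor lookup.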