Phase 21 Rust port + Phase 27 playbook versioning + doc-sync

Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: an empty response from a thinking
                     model (geometric 2x budget backoff up to budget_cap)
                     and a truncated non-empty response (continuation
                     with the partial as scratchpad).
                     is_structurally_complete balance-checks braces, then
                     JSON-parses. Guards the degenerate case "all retries
                     empty; don't loop on an empty partial".
  tree_split.rs    — generate_tree_split map->reduce with a running
                     scratchpad. Every shard prompt and the reduce prompt
                     go through assert_context_budget first; overflow
                     fails loud rather than silently truncating.
                     Oldest-digest-first scratchpad truncation at
                     scratchpad_budget (default 6000 tokens).
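
The chars/4 estimator and headroom check above reduce to a few lines of
arithmetic. A standalone sketch (a toy mirror of the crate's math, not
the shipped code — `remaining_budget` is an illustrative name):

```rust
// Toy mirror of the Phase 21 budget arithmetic: chars/4 ceiling
// estimator plus the remaining-headroom check. Standalone illustration,
// not the shipped crate code.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// remaining = window - (prompt + system + max_tokens) - safety.
/// Negative remaining means the call would overflow the window.
fn remaining_budget(window: usize, prompt: &str, system: &str,
                    max_tokens: usize, safety: usize) -> i64 {
    let est = estimate_tokens(prompt) + estimate_tokens(system) + max_tokens;
    window as i64 - est as i64 - safety as i64
}

fn main() {
    // ~1000-token prompt against qwen3's 40_960 window leaves plenty.
    let prompt = "x".repeat(4_000);
    assert!(remaining_budget(40_960, &prompt, "", 800, 2_000) > 30_000);
    // A ~50K-token prompt overflows the same window.
    let huge = "x".repeat(200_000);
    assert!(remaining_budget(40_960, &huge, "", 800, 2_000) < 0);
    println!("ok");
}
```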

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
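
The empty-response retry path boils down to a budget schedule: double
on each retry, clamp at the cap. A sketch of that policy in isolation
(`backoff_schedule` is an illustrative name, not a crate function):

```rust
// Toy sketch of the empty-response backoff schedule: each retry doubles
// the token budget, clamped at budget_cap. Illustrates the policy only;
// the crate interleaves this with the actual generate calls.
fn backoff_schedule(initial: u32, cap: u32, retries: usize) -> Vec<u32> {
    let mut budget = initial;
    let mut out = Vec::with_capacity(retries);
    for _ in 0..retries {
        out.push(budget);
        budget = budget.saturating_mul(2).min(cap);
    }
    out
}

fn main() {
    // Three retries at 800 initial: 800, 1600, 3200.
    assert_eq!(backoff_schedule(800, 8_000, 3), vec![800, 1_600, 3_200]);
    // A 5000 start hits a 7000 cap on the second attempt.
    assert_eq!(backoff_schedule(5_000, 7_000, 3), vec![5_000, 7_000, 7_000]);
    println!("ok");
}
```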

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).
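
The serde attribute on the new field means `think` is absent from the
wire body unless set, so older sidecars never see it. A hand-rolled
sketch of that shape (toy string builder, not the crate's serde derive):

```rust
// Hand-rolled sketch of the wire shape: `think` is only emitted when
// set, mirroring #[serde(skip_serializing_if = "Option::is_none")].
// Toy builder for illustration; the real crate uses serde.
fn request_body(prompt: &str, think: Option<bool>) -> String {
    let mut body = format!(r#"{{"prompt":"{prompt}""#);
    if let Some(t) = think {
        body.push_str(&format!(r#","think":{t}"#));
    }
    body.push('}');
    body
}

fn main() {
    assert_eq!(request_body("hi", None), r#"{"prompt":"hi"}"#);
    assert_eq!(request_body("hi", Some(false)), r#"{"prompt":"hi","think":false}"#);
    println!("ok");
}
```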

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when a newer version replaces it
  superseded_by     Option<String>, the playbook_id that replaced it

New methods:

  revise_entry(parent_id, new_entry) — appends the new version, stamps
    superseded_at+superseded_by on the parent, and sets parent_id and
    version = parent.version + 1 on the new entry. Rejects revising a
    retired or already-superseded parent (the tip of the chain is the
    only valid revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.
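
The two-direction walk behind history() can be sketched against a toy
in-memory store (field names mirror the commit; the HashMap store and
`history` signature here are illustrative, not the vectord code):

```rust
use std::collections::{HashMap, HashSet};

// Toy sketch of the history() chain walk: follow parent_id back to the
// root, then superseded_by forward to the tip. Visited sets make both
// walks cycle-safe.
#[derive(Clone)]
struct Entry {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

fn history(store: &HashMap<String, Entry>, id: &str) -> Vec<String> {
    // Walk back to the root.
    let mut seen = HashSet::new();
    let mut root = id.to_string();
    while let Some(p) = store.get(&root).and_then(|e| e.parent_id.clone()) {
        if !seen.insert(root.clone()) { break; } // cycle guard
        root = p;
    }
    // Walk forward to the tip, collecting the chain root->tip.
    seen.clear();
    let mut chain = vec![root.clone()];
    let mut cur = root;
    while let Some(n) = store.get(&cur).and_then(|e| e.superseded_by.clone()) {
        if !seen.insert(cur.clone()) { break; } // cycle guard
        chain.push(n.clone());
        cur = n;
    }
    chain
}

fn main() {
    let mut s = HashMap::new();
    s.insert("v1".into(), Entry { parent_id: None, superseded_by: Some("v2".into()) });
    s.insert("v2".into(), Entry { parent_id: Some("v1".into()), superseded_by: Some("v3".into()) });
    s.insert("v3".into(), Entry { parent_id: Some("v2".into()), superseded_by: None });
    // Same full chain from any node: root, middle, or tip.
    assert_eq!(history(&s, "v2"), ["v1", "v2", "v3"]);
    assert_eq!(history(&s, "v3"), ["v1", "v2", "v3"]);
    println!("ok");
}
```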

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
profit 2026-04-21 17:40:49 -05:00
parent 640db8c63c
commit a6f12e2609
10 changed files with 1506 additions and 29 deletions


@@ -25,7 +25,7 @@ pub struct EmbedResponse {
pub dimensions: usize,
}
#[derive(Serialize, Deserialize)]
#[derive(Clone, Serialize, Deserialize)]
pub struct GenerateRequest {
pub prompt: String,
#[serde(skip_serializing_if = "Option::is_none")]
@@ -36,6 +36,14 @@ pub struct GenerateRequest {
pub temperature: Option<f64>,
#[serde(skip_serializing_if = "Option::is_none")]
pub max_tokens: Option<u32>,
/// Phase 21 — per-call opt-out of hidden reasoning. Thinking models
/// (qwen3.5, gpt-oss, etc) burn tokens on reasoning before the
/// visible response starts; setting this to `false` on hot-path
/// JSON emitters avoids empty returns when the budget is tight.
/// Sidecar forwards this to Ollama's `think` parameter; if the
/// sidecar drops an unknown field the request still succeeds.
#[serde(skip_serializing_if = "Option::is_none")]
pub think: Option<bool>,
}
#[derive(Deserialize, Serialize, Clone)]


@@ -0,0 +1,194 @@
//! Phase 21 — context-budget accounting for model calls.
//!
//! Ports `assertContextBudget` + `estimateTokens` + `CONTEXT_WINDOWS`
//! from `tests/multi-agent/agent.ts` so Rust-side callers (gateway
//! tool surfaces, future Rust agents) get the same loud-fail behavior
//! on window overflow instead of silent truncation.
//!
//! The token estimator is deliberately the same chars/4 heuristic as
//! the TS side. It's biased ~15% safe — pessimistic on English, correct
//! within a factor of 2 on code. Swap to a provider tokenizer only when
//! the estimator drives a decision (we're nowhere near that yet).
use std::collections::HashMap;
use std::sync::OnceLock;
/// Rough token count. `chars / 4` ceiling. See module docs for why
/// this heuristic is sufficient.
pub fn estimate_tokens(text: &str) -> usize {
(text.chars().count() + 3) / 4
}
/// Phase 21 — per-model context windows, mirroring the TS table in
/// `tests/multi-agent/agent.ts`. Anchored on each model's documented
/// max; unknown models fall back to `DEFAULT_CONTEXT_WINDOW`.
pub const DEFAULT_CONTEXT_WINDOW: usize = 32_768;
pub const DEFAULT_SAFETY_MARGIN: usize = 2_000;
pub const DEFAULT_MAX_TOKENS: usize = 800;
fn known_windows() -> &'static HashMap<&'static str, usize> {
static TABLE: OnceLock<HashMap<&'static str, usize>> = OnceLock::new();
TABLE.get_or_init(|| {
let mut m = HashMap::new();
m.insert("mistral:latest", 32_768);
m.insert("qwen2.5:latest", 32_768);
m.insert("qwen3:latest", 40_960);
m.insert("qwen3.5:latest", 262_144);
m.insert("qwen3-embedding", 32_768);
m.insert("nomic-embed-text-v2-moe", 2_048);
m.insert("gpt-oss:20b", 131_072);
m.insert("gpt-oss:120b", 131_072);
m.insert("qwen3.5:397b", 131_072);
m.insert("kimi-k2-thinking", 200_000);
m.insert("kimi-k2.6", 200_000);
m.insert("kimi-k2:1t", 1_048_576);
m.insert("deepseek-v3.1:671b", 131_072);
m.insert("glm-4.7", 131_072);
m
})
}
pub fn context_window_for(model: &str) -> usize {
known_windows().get(model).copied().unwrap_or(DEFAULT_CONTEXT_WINDOW)
}
/// Result of a budget check — exposes the numbers so callers can log
/// how much headroom remains without re-running the estimator.
#[derive(Debug, Clone, Copy)]
pub struct BudgetCheck {
pub estimated: usize,
pub window: usize,
pub remaining: i64,
}
/// Inputs to `assert_context_budget`. `bypass` exists for call sites
/// that handle their own overflow (continuation's second pass already
/// counted the partial; T5 gatekeeper prompts have a separate policy).
#[derive(Debug, Clone, Default)]
pub struct BudgetOpts<'a> {
pub system: Option<&'a str>,
pub max_tokens: Option<usize>,
pub safety_margin: Option<usize>,
pub bypass: bool,
}
/// Phase 21's loud-fail primitive. Returns a `BudgetCheck` on success
/// and the same struct plus over-by count on failure. The whole point
/// is to stop silent truncation — callers that expect overflow should
/// chunk BEFORE calling or set `bypass: true`.
pub fn assert_context_budget(
model: &str,
prompt: &str,
opts: BudgetOpts,
) -> Result<BudgetCheck, (BudgetCheck, usize)> {
let window = context_window_for(model);
let safety = opts.safety_margin.unwrap_or(DEFAULT_SAFETY_MARGIN);
let max_tokens = opts.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS);
let sys_tokens = opts.system.map(estimate_tokens).unwrap_or(0);
let estimated = estimate_tokens(prompt) + sys_tokens + max_tokens;
let remaining = window as i64 - estimated as i64 - safety as i64;
let check = BudgetCheck { estimated, window, remaining };
if remaining < 0 && !opts.bypass {
return Err((check, (-remaining) as usize));
}
Ok(check)
}
/// Convenience — format an overflow error the same way the TS side
/// does. Exposed so downstream crates render consistent messages.
pub fn overflow_message(model: &str, check: &BudgetCheck, over_by: usize, safety: usize) -> String {
format!(
"context overflow: model={} est={}t window={}t safety={}t over={}t. \
Chunk the prompt (see config/models.json overflow_policies) or set \
bypass:true if you know the risk.",
model, check.estimated, check.window, safety, over_by,
)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn estimate_tokens_ceiling_divides_by_four() {
assert_eq!(estimate_tokens(""), 0);
assert_eq!(estimate_tokens("abc"), 1); // 3 → ceil(3/4) = 1
assert_eq!(estimate_tokens("abcd"), 1); // 4 → ceil(4/4) = 1
assert_eq!(estimate_tokens("abcde"), 2); // 5 → ceil(5/4) = 2
assert_eq!(estimate_tokens(&"x".repeat(400)), 100);
}
#[test]
fn context_window_known_and_fallback() {
assert_eq!(context_window_for("qwen3.5:latest"), 262_144);
assert_eq!(context_window_for("kimi-k2:1t"), 1_048_576);
assert_eq!(context_window_for("some-unreleased-model"), DEFAULT_CONTEXT_WINDOW);
}
#[test]
fn budget_passes_well_under_window() {
let check = assert_context_budget(
"qwen3:latest",
&"x".repeat(4_000), // ~1000 tokens
BudgetOpts { max_tokens: Some(500), ..Default::default() },
).expect("well under 40K window");
assert!(check.remaining > 30_000);
}
#[test]
fn budget_fails_when_prompt_overflows_window() {
let huge = "x".repeat(200_000); // ~50K tokens, over qwen3's 40K
let err = assert_context_budget(
"qwen3:latest",
&huge,
BudgetOpts::default(),
).expect_err("should overflow qwen3's 40K window");
assert!(err.1 > 0, "over_by must be positive");
}
#[test]
fn budget_bypass_returns_ok_even_over() {
let huge = "x".repeat(200_000);
let check = assert_context_budget(
"qwen3:latest",
&huge,
BudgetOpts { bypass: true, ..Default::default() },
).expect("bypass must suppress the error");
assert!(check.remaining < 0, "check still reports negative remaining");
}
#[test]
fn budget_counts_system_prompt() {
// 10K-char system prompt → ~2500 tokens. With a big max_tokens
// this should push us closer to the window.
let sys = "s".repeat(10_000);
let prompt = "p".repeat(4_000);
let with_sys = assert_context_budget(
"qwen3:latest",
&prompt,
BudgetOpts {
system: Some(&sys),
max_tokens: Some(500),
..Default::default()
},
).unwrap();
let without_sys = assert_context_budget(
"qwen3:latest",
&prompt,
BudgetOpts { max_tokens: Some(500), ..Default::default() },
).unwrap();
assert!(with_sys.estimated > without_sys.estimated,
"system prompt should raise estimate");
assert_eq!(with_sys.estimated - without_sys.estimated, estimate_tokens(&sys));
}
#[test]
fn overflow_message_includes_numbers() {
let check = BudgetCheck { estimated: 42_000, window: 40_960, remaining: -1_040 };
let msg = overflow_message("qwen3:latest", &check, 3_040, 2_000);
assert!(msg.contains("qwen3:latest"));
assert!(msg.contains("42000t"));
assert!(msg.contains("40960t"));
assert!(msg.contains("3040t"));
}
}


@@ -0,0 +1,438 @@
//! Phase 21 — OUTPUT-overflow handler. Ports `generateContinuable`
//! from `tests/multi-agent/agent.ts`.
//!
//! Two failure modes to repair:
//!
//! * EMPTY response — thinking model ate the entire budget on hidden
//! reasoning before emitting a token. Fix: retry the original prompt
//! with 2× the budget, geometric up to `BUDGET_CAP`.
//!
//! * TRUNCATED non-empty — model got most of the way but hit
//! max_tokens before closing the structure. Fix: continue with the
//! partial response in the prompt as scratchpad, so the model knows
//! where to pick up without restarting.
//!
//! `TextGenerator` abstracts the sidecar so tests can inject canned
//! responses without a live Ollama.
use std::future::Future;
use crate::client::{AiClient, GenerateRequest, GenerateResponse};
/// Shape classifier for `is_structurally_complete`. JSON responses
/// must parse; text responses just need to be non-empty.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResponseShape {
Json,
Text,
}
/// Trait that `generate_continuable` + `generate_tree_split` call. The
/// real implementation forwards to `AiClient::generate`; tests supply a
/// mock with a scripted sequence of responses.
pub trait TextGenerator: Send + Sync {
fn generate_text(
&self,
req: GenerateRequest,
) -> impl Future<Output = Result<GenerateResponse, String>> + Send;
}
impl TextGenerator for AiClient {
fn generate_text(
&self,
req: GenerateRequest,
) -> impl Future<Output = Result<GenerateResponse, String>> + Send {
self.generate(req)
}
}
/// Strip a surrounding ```json``` fence if present. Leaves the inner
/// content alone otherwise. Returns a slice of `s`.
fn strip_json_fence(s: &str) -> &str {
let t = s.trim();
if let Some(rest) = t.strip_prefix("```json") {
rest.trim_start_matches('\n').strip_suffix("```").unwrap_or(rest).trim()
} else if let Some(rest) = t.strip_prefix("```") {
rest.trim_start_matches('\n').strip_suffix("```").unwrap_or(rest).trim()
} else {
t
}
}
/// Port of the TS brace-balance + JSON.parse check. Returns true when
/// the outermost `{...}` block is balanced and parses. Text shape is
/// satisfied by any non-empty, non-whitespace payload.
pub fn is_structurally_complete(text: &str, shape: ResponseShape) -> bool {
if text.trim().is_empty() { return false; }
if shape == ResponseShape::Text { return true; }
let s = strip_json_fence(text);
let Some(start) = s.find('{') else { return false };
let Some(end) = s.rfind('}') else { return false };
if end <= start { return false; }
let slice = &s[start..=end];
// Balance check — cheaper than parse, catches truncated nests.
// String state tracked because `{` inside a string doesn't count.
let mut depth: i32 = 0;
let mut in_str = false;
let mut esc = false;
for c in slice.chars() {
if esc { esc = false; continue; }
if c == '\\' { esc = true; continue; }
if c == '"' { in_str = !in_str; continue; }
if in_str { continue; }
if c == '{' { depth += 1; }
else if c == '}' {
depth -= 1;
if depth < 0 { return false; }
}
}
if depth != 0 { return false; }
// Parse check is the tie-breaker — balanced but invalid JSON (e.g.
// trailing comma before `}`) shouldn't count as complete.
serde_json::from_str::<serde_json::Value>(slice).is_ok()
}
/// Knobs for `generate_continuable`. All optional with sensible
/// defaults that match the TS version.
#[derive(Debug, Clone)]
pub struct ContinuableOpts {
pub model: String,
pub max_tokens: Option<u32>,
pub temperature: Option<f64>,
pub system: Option<String>,
pub shape: ResponseShape,
pub max_continuations: usize,
pub think: Option<bool>,
/// Geometric-backoff ceiling for the empty-response retry path.
/// Matches TS's `budgetCap = 8000`.
pub budget_cap: u32,
/// Maximum empty-response retries before giving up. Matches TS's
/// hardcoded `retry < 3`.
pub max_empty_retries: usize,
}
impl ContinuableOpts {
pub fn new(model: impl Into<String>) -> Self {
Self {
model: model.into(),
max_tokens: None,
temperature: None,
system: None,
shape: ResponseShape::Json,
max_continuations: 3,
think: None,
budget_cap: 8_000,
max_empty_retries: 3,
}
}
}
/// Outcome of a `generate_continuable` call. Carries the combined
/// text plus diagnostic counters so observability downstream can
/// report "how many continuations did that query cost".
#[derive(Debug, Clone)]
pub struct ContinuableOutcome {
pub text: String,
pub empty_retries: usize,
pub continuations: usize,
pub final_complete: bool,
}
fn make_request(opts: &ContinuableOpts, prompt: String, current_max: u32) -> GenerateRequest {
GenerateRequest {
prompt,
model: Some(opts.model.clone()),
system: opts.system.clone(),
temperature: opts.temperature,
max_tokens: Some(current_max),
think: opts.think,
}
}
fn continuation_prompt(original: &str, partial: &str) -> String {
format!(
"{original}\n\n\
PARTIAL RESPONSE SO FAR (continue from here do NOT restart, \
do NOT repeat what's already there, emit ONLY the remaining \
tokens to close the structure):\n{partial}"
)
}
/// Phase 21 — output-overflow safe generate. See module docs for the
/// two failure modes repaired. On final-still-incomplete, returns the
/// combined text with `final_complete: false` so the caller's parser
/// can throw with the raw text for forensics rather than silently
/// truncating.
pub async fn generate_continuable<G: TextGenerator>(
generator: &G,
prompt: &str,
opts: &ContinuableOpts,
) -> Result<ContinuableOutcome, String> {
let initial_max = opts.max_tokens.unwrap_or(800);
let mut current_max = initial_max;
let mut combined = String::new();
let mut empty_retries = 0usize;
let mut continuations = 0usize;
// Phase 21(a) — empty-response backoff loop.
for retry in 0..opts.max_empty_retries {
let req = make_request(opts, prompt.to_string(), current_max);
let resp = generator.generate_text(req).await?;
if !resp.text.trim().is_empty() {
combined = resp.text;
break;
}
empty_retries = retry + 1;
current_max = (current_max.saturating_mul(2)).min(opts.budget_cap);
}
// Phase 21(b) — structural-completion continuation loop. Runs on
// the truncated-non-empty case; empty + exhausted retries falls
// through with empty combined and final_complete=false.
for _ in 0..opts.max_continuations {
if is_structurally_complete(&combined, opts.shape) {
return Ok(ContinuableOutcome {
text: combined,
empty_retries,
continuations,
final_complete: true,
});
}
if combined.trim().is_empty() {
// Nothing to continue from — continuing "" is identical to
// the initial call and would loop. Bail so the caller sees
// the failure rather than burning N extra calls.
break;
}
let cont_prompt = continuation_prompt(prompt, &combined);
let req = make_request(opts, cont_prompt, current_max.min(opts.budget_cap));
let resp = generator.generate_text(req).await?;
combined.push_str(&resp.text);
continuations += 1;
}
let final_complete = is_structurally_complete(&combined, opts.shape);
Ok(ContinuableOutcome {
text: combined,
empty_retries,
continuations,
final_complete,
})
}
/// Scripted generator for unit tests. Returns responses from `script`
/// in order; extra calls reuse the last entry so tests don't have to
/// count past what they actually assert on.
#[cfg(test)]
pub struct ScriptedGenerator {
script: Vec<Result<String, String>>,
calls: std::sync::Arc<std::sync::Mutex<Vec<GenerateRequest>>>,
}
#[cfg(test)]
impl ScriptedGenerator {
pub fn new<I>(script: I) -> Self
where
I: IntoIterator<Item = Result<String, String>>,
{
Self {
script: script.into_iter().collect(),
calls: std::sync::Arc::new(std::sync::Mutex::new(Vec::new())),
}
}
pub fn calls(&self) -> Vec<GenerateRequest> {
self.calls.lock().unwrap().clone()
}
pub fn call_count(&self) -> usize {
self.calls.lock().unwrap().len()
}
}
#[cfg(test)]
impl TextGenerator for ScriptedGenerator {
fn generate_text(
&self,
req: GenerateRequest,
) -> impl Future<Output = Result<GenerateResponse, String>> + Send {
let i = {
let mut calls = self.calls.lock().unwrap();
let i = calls.len();
calls.push(req.clone());
i
};
let model = req.model.clone().unwrap_or_default();
let entry = self.script.get(i)
.cloned()
.unwrap_or_else(|| self.script.last().cloned().unwrap_or(Ok(String::new())));
async move {
entry.map(|text| GenerateResponse {
text,
model,
tokens_evaluated: None,
tokens_generated: None,
})
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn structural_complete_rejects_empty_and_text_mismatch() {
assert!(!is_structurally_complete("", ResponseShape::Text));
assert!(!is_structurally_complete(" ", ResponseShape::Text));
assert!(is_structurally_complete("any content", ResponseShape::Text));
}
#[test]
fn structural_complete_handles_balanced_json() {
assert!(is_structurally_complete(r#"{"a": 1}"#, ResponseShape::Json));
assert!(is_structurally_complete(
r#"```json
{"a": 1, "b": [1, 2, 3]}
```"#,
ResponseShape::Json,
));
}
#[test]
fn structural_complete_rejects_truncated_json() {
assert!(!is_structurally_complete(r#"{"a": 1"#, ResponseShape::Json));
assert!(!is_structurally_complete(r#"{"a": {"b": 1"#, ResponseShape::Json));
// Trailing comma — balanced but unparseable
assert!(!is_structurally_complete(r#"{"a": 1,}"#, ResponseShape::Json));
}
#[test]
fn structural_complete_ignores_braces_inside_strings() {
assert!(is_structurally_complete(r#"{"s": "has } inside"}"#, ResponseShape::Json));
assert!(is_structurally_complete(r#"{"s": "escaped \" quote"}"#, ResponseShape::Json));
}
#[tokio::test]
async fn continuable_returns_first_response_when_complete() {
let generator = ScriptedGenerator::new(vec![Ok(r#"{"ok": true}"#.to_string())]);
let opts = ContinuableOpts::new("qwen3:latest");
let out = generate_continuable(&generator, "test", &opts).await.unwrap();
assert!(out.final_complete);
assert_eq!(out.empty_retries, 0);
assert_eq!(out.continuations, 0);
assert_eq!(generator.call_count(), 1);
}
#[tokio::test]
async fn continuable_retries_on_empty_with_doubled_budget() {
// Two empties, then a good response. Third call should see 4×
// the initial budget (2× twice).
let generator = ScriptedGenerator::new(vec![
Ok("".to_string()),
Ok("".to_string()),
Ok(r#"{"ok": true}"#.to_string()),
]);
let mut opts = ContinuableOpts::new("qwen3:latest");
opts.max_tokens = Some(100);
let out = generate_continuable(&generator, "test", &opts).await.unwrap();
assert!(out.final_complete);
assert_eq!(out.empty_retries, 2);
let calls = generator.calls();
assert_eq!(calls.len(), 3);
assert_eq!(calls[0].max_tokens, Some(100));
assert_eq!(calls[1].max_tokens, Some(200));
assert_eq!(calls[2].max_tokens, Some(400));
}
#[tokio::test]
async fn continuable_caps_budget_at_budget_cap() {
let generator = ScriptedGenerator::new(vec![
Ok("".to_string()),
Ok("".to_string()),
Ok(r#"{"ok":1}"#.to_string()),
]);
let mut opts = ContinuableOpts::new("qwen3:latest");
opts.max_tokens = Some(5_000);
opts.budget_cap = 7_000;
let out = generate_continuable(&generator, "test", &opts).await.unwrap();
assert!(out.final_complete);
let calls = generator.calls();
// 5000 → doubled would be 10000; cap pulls it to 7000.
assert_eq!(calls[1].max_tokens, Some(7_000));
assert_eq!(calls[2].max_tokens, Some(7_000));
}
#[tokio::test]
async fn continuable_glues_truncated_response() {
// First call returns balanced-open `{...`; continuation closes
// it with `...}`. Combined must parse.
let generator = ScriptedGenerator::new(vec![
Ok(r#"{"fills": [{"candidate_id": "C-001""#.to_string()),
Ok(r#", "name": "Alice"}]}"#.to_string()),
]);
let opts = ContinuableOpts::new("qwen3:latest");
let out = generate_continuable(&generator, "ORIGINAL", &opts).await.unwrap();
assert!(out.final_complete, "combined must parse: {}", out.text);
assert_eq!(out.continuations, 1);
let calls = generator.calls();
assert_eq!(calls.len(), 2);
// Continuation prompt must contain the partial — that's the
// scratchpad primitive J called out.
let cont_prompt = &calls[1].prompt;
assert!(cont_prompt.contains("ORIGINAL"),
"continuation must include original prompt");
assert!(cont_prompt.contains(r#"{"fills": [{"candidate_id": "C-001""#),
"continuation must include partial");
}
#[tokio::test]
async fn continuable_does_not_loop_on_persistent_empty() {
// All three retries return empty; we must NOT then enter the
// continuation loop with an empty partial (would burn 3 more
// calls continuing from "").
let generator = ScriptedGenerator::new(vec![
Ok("".to_string()),
Ok("".to_string()),
Ok("".to_string()),
]);
let opts = ContinuableOpts::new("qwen3:latest");
let out = generate_continuable(&generator, "test", &opts).await.unwrap();
assert!(!out.final_complete);
assert_eq!(out.empty_retries, 3);
assert_eq!(out.continuations, 0, "must not continue from empty");
assert_eq!(generator.call_count(), 3);
}
#[tokio::test]
async fn continuable_returns_raw_on_exhausted_continuations() {
// Three continuations that never complete — caller's parser
// will throw. We must return the combined text so forensics
// has the raw content, not a lossy truncation.
let generator = ScriptedGenerator::new(vec![
Ok(r#"{"a": ["#.to_string()),
Ok("1,".to_string()),
Ok("2,".to_string()),
Ok("3,".to_string()),
]);
let mut opts = ContinuableOpts::new("qwen3:latest");
opts.max_continuations = 3;
let out = generate_continuable(&generator, "test", &opts).await.unwrap();
assert!(!out.final_complete);
assert_eq!(out.continuations, 3);
assert!(out.text.contains(r#"{"a": ["#));
assert!(out.text.contains("3,"));
}
#[tokio::test]
async fn continuable_propagates_generator_errors() {
let generator = ScriptedGenerator::new(vec![Err("sidecar 503".to_string())]);
let opts = ContinuableOpts::new("qwen3:latest");
let err = generate_continuable(&generator, "test", &opts).await.unwrap_err();
assert!(err.contains("503"));
}
}


@@ -1,2 +1,5 @@
pub mod client;
pub mod context;
pub mod continuation;
pub mod service;
pub mod tree_split;


@@ -0,0 +1,326 @@
//! Phase 21 — INPUT-overflow handler. Ports `generateTreeSplit` from
//! `tests/multi-agent/agent.ts`.
//!
//! When the input corpus exceeds the model's window (200 playbooks
//! pasted into a T4 strategic prompt, a long retrospective digest, a
//! cross-corpus summarization), raising `max_tokens` doesn't help —
//! the prompt itself is the problem. The answer is map-reduce:
//!
//! 1. Caller shards the input at semantic boundaries (records,
//! paragraphs, playbook entries).
//! 2. For each shard, build a map prompt that includes the running
//! scratchpad and run it through `generate_continuable`.
//! 3. Append the map output to the scratchpad (oldest-first
//! truncation when it outgrows `scratchpad_budget`).
//! 4. Build a reduce prompt from the final scratchpad and run it.
//!
//! Every shard prompt and the reduce prompt go through
//! `assert_context_budget` first — if a single shard still overflows
//! we bubble the error up rather than silently truncating. That's the
//! whole point of Phase 21.
use crate::context::{assert_context_budget, BudgetOpts, estimate_tokens, overflow_message,
DEFAULT_MAX_TOKENS, DEFAULT_SAFETY_MARGIN};
use crate::continuation::{generate_continuable, ContinuableOpts, ResponseShape, TextGenerator};
/// Callback signatures — caller supplies closures that stitch the
/// scratchpad into each shard's prompt and build the final reduce
/// prompt. Kept as `Fn` (not `FnMut`) so the map loop can call them
/// by reference.
pub type MapPromptFn<'a> = dyn Fn(&str, &str) -> String + Send + Sync + 'a;
pub type ReducePromptFn<'a> = dyn Fn(&str) -> String + Send + Sync + 'a;
/// Knobs for `generate_tree_split`.
#[derive(Debug, Clone)]
pub struct TreeSplitOpts {
pub model: String,
pub system: Option<String>,
pub temperature: Option<f64>,
/// max_tokens for map AND reduce (reduce defaults are usually
/// higher; caller overrides for just reduce by calling through
/// continuable directly if needed).
pub max_tokens: Option<u32>,
pub reduce_max_tokens: Option<u32>,
pub think: Option<bool>,
/// Soft ceiling on scratchpad size (estimated tokens). When it
/// grows past this, the oldest shard digest gets dropped. Default
/// 6000, matching the TS implementation.
pub scratchpad_budget: usize,
pub safety_margin: Option<usize>,
}
impl TreeSplitOpts {
pub fn new(model: impl Into<String>) -> Self {
Self {
model: model.into(),
system: None,
temperature: None,
max_tokens: None,
reduce_max_tokens: None,
think: None,
scratchpad_budget: 6_000,
safety_margin: None,
}
}
}
/// Result — final reduce response plus the accumulated scratchpad so
/// the caller can inspect what was kept vs truncated.
#[derive(Debug, Clone)]
pub struct TreeSplitResult {
pub response: String,
pub scratchpad: String,
pub shards_processed: usize,
pub scratchpad_truncations: usize,
pub total_continuations: usize,
}
/// Drop shard-digest blocks from the head of `scratchpad` until its
/// estimated-token count fits the budget. Digest blocks are delimited
/// by `\n— shard N/M digest —\n` so we can find the first one and
/// chop everything before its successor.
fn truncate_scratchpad(scratchpad: &mut String, budget_tokens: usize) -> bool {
if estimate_tokens(scratchpad) <= budget_tokens { return false; }
// Find the second delimiter — everything before it gets dropped.
const DELIM_PREFIX: &str = "\n— shard ";
let mut cursor = 0;
let mut truncated = false;
while estimate_tokens(&scratchpad[cursor..]) > budget_tokens {
// Skip past a leading delimiter (if we're sitting on one from
// a previous iteration), then find the next.
let search_from = cursor + if scratchpad[cursor..].starts_with(DELIM_PREFIX) {
DELIM_PREFIX.len()
} else { 0 };
let Some(rel_next) = scratchpad[search_from..].find(DELIM_PREFIX) else { break };
cursor = search_from + rel_next;
truncated = true;
}
if cursor > 0 {
scratchpad.drain(..cursor);
}
truncated
}
/// Phase 21 — map-reduce over shards with a running scratchpad. See
/// module docs.
pub async fn generate_tree_split<G: TextGenerator>(
generator: &G,
shards: &[String],
map_prompt: &MapPromptFn<'_>,
reduce_prompt: &ReducePromptFn<'_>,
opts: &TreeSplitOpts,
) -> Result<TreeSplitResult, String> {
let mut scratchpad = String::new();
let safety = opts.safety_margin.unwrap_or(DEFAULT_SAFETY_MARGIN);
let map_max = opts.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS as u32);
let reduce_max = opts.reduce_max_tokens.unwrap_or(1_500);
let mut truncations = 0usize;
let mut total_continuations = 0usize;
for (i, shard) in shards.iter().enumerate() {
let shard_prompt = map_prompt(shard, &scratchpad);
// Loud-fail on per-shard overflow — caller sharded too
// coarsely. Silent truncation is exactly the mode J rejected.
let budget = BudgetOpts {
system: opts.system.as_deref(),
max_tokens: Some(map_max as usize),
safety_margin: Some(safety),
bypass: false,
};
let check = assert_context_budget(&opts.model, &shard_prompt, budget)
.map_err(|(c, over)| overflow_message(&opts.model, &c, over, safety))?;
let _ = check;
let mut cont_opts = ContinuableOpts::new(&opts.model);
cont_opts.max_tokens = Some(map_max);
cont_opts.temperature = opts.temperature;
cont_opts.system = opts.system.clone();
cont_opts.shape = ResponseShape::Text;
cont_opts.think = opts.think;
let outcome = generate_continuable(generator, &shard_prompt, &cont_opts).await?;
total_continuations += outcome.continuations;
// Append this shard's digest and, if needed, drop oldest.
scratchpad.push_str(&format!(
"\n— shard {}/{} digest —\n{}",
i + 1, shards.len(), outcome.text.trim(),
));
if truncate_scratchpad(&mut scratchpad, opts.scratchpad_budget) {
truncations += 1;
}
}
// Reduce pass. Budget check first — if the scratchpad is still too
// big for the reduce prompt we fail loud with numbers.
let reduce_p = reduce_prompt(&scratchpad);
let budget = BudgetOpts {
system: opts.system.as_deref(),
max_tokens: Some(reduce_max as usize),
safety_margin: Some(safety),
bypass: false,
};
assert_context_budget(&opts.model, &reduce_p, budget)
.map_err(|(c, over)| overflow_message(&opts.model, &c, over, safety))?;
let mut cont_opts = ContinuableOpts::new(&opts.model);
cont_opts.max_tokens = Some(reduce_max);
cont_opts.temperature = opts.temperature;
cont_opts.system = opts.system.clone();
cont_opts.shape = ResponseShape::Text;
cont_opts.think = opts.think;
let outcome = generate_continuable(generator, &reduce_p, &cont_opts).await?;
total_continuations += outcome.continuations;
Ok(TreeSplitResult {
response: outcome.text,
scratchpad,
shards_processed: shards.len(),
scratchpad_truncations: truncations,
total_continuations,
})
}
#[cfg(test)]
mod tests {
use super::*;
use crate::continuation::ScriptedGenerator;
fn simple_map(shard: &str, scratchpad: &str) -> String {
format!("SCRATCHPAD:\n{scratchpad}\n---\nSHARD:\n{shard}\n---\nDIGEST:")
}
fn simple_reduce(scratchpad: &str) -> String {
format!("SCRATCHPAD:\n{scratchpad}\n---\nFINAL:")
}
#[tokio::test]
async fn tree_split_runs_map_then_reduce() {
// 3 shards → 3 map calls → 1 reduce call = 4 responses scripted.
let generator = ScriptedGenerator::new(vec![
Ok("digest-1".to_string()),
Ok("digest-2".to_string()),
Ok("digest-3".to_string()),
Ok("FINAL ANSWER".to_string()),
]);
let shards: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
let opts = TreeSplitOpts::new("qwen3:latest");
let map_fn: &MapPromptFn = &simple_map;
let reduce_fn: &ReducePromptFn = &simple_reduce;
let result = generate_tree_split(&generator, &shards, map_fn, reduce_fn, &opts)
.await
.unwrap();
assert_eq!(result.shards_processed, 3);
assert_eq!(result.response, "FINAL ANSWER");
assert_eq!(generator.call_count(), 4);
// Scratchpad must carry all three digests in order.
assert!(result.scratchpad.contains("digest-1"));
assert!(result.scratchpad.contains("digest-2"));
assert!(result.scratchpad.contains("digest-3"));
}
#[tokio::test]
async fn tree_split_reduce_prompt_sees_full_scratchpad() {
let generator = ScriptedGenerator::new(vec![
Ok("summary-A".to_string()),
Ok("summary-B".to_string()),
Ok("REDUCED".to_string()),
]);
let shards = vec!["input-one".to_string(), "input-two".to_string()];
let opts = TreeSplitOpts::new("qwen3:latest");
let _ = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
.await
.unwrap();
// Third call = reduce. Its prompt must include both digests.
let calls = generator.calls();
let reduce_prompt = &calls[2].prompt;
assert!(reduce_prompt.contains("summary-A"),
"reduce prompt must see first shard digest");
assert!(reduce_prompt.contains("summary-B"),
"reduce prompt must see second shard digest");
}
#[tokio::test]
async fn tree_split_loud_fails_on_shard_overflow() {
let generator = ScriptedGenerator::new(vec![Ok("digest".to_string())]);
// One gigantic shard — well over qwen3's 40K window even as a
// prompt. The budget check must reject before any generate call.
let shards = vec!["x".repeat(200_000)];
let opts = TreeSplitOpts::new("qwen3:latest");
let err = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
.await
.expect_err("shard-sized overflow must be rejected");
assert!(err.contains("overflow"), "error should mention overflow: {err}");
assert_eq!(generator.call_count(), 0, "generate must not be called on overflow");
}
#[tokio::test]
async fn tree_split_truncates_scratchpad_when_over_budget() {
// Tight budget so each shard trips truncation. qwen3's 40K
// window is fine; the budget we care about is the scratchpad
// cap, not the model window.
let generator = ScriptedGenerator::new(vec![
Ok("A".repeat(2_000)),
Ok("B".repeat(2_000)),
Ok("C".repeat(2_000)),
Ok("D".repeat(2_000)),
Ok("FINAL".to_string()),
]);
let shards: Vec<String> = (0..4).map(|i| format!("shard{i}")).collect();
let mut opts = TreeSplitOpts::new("qwen3:latest");
opts.scratchpad_budget = 1_000; // ~4000 chars — one 2000-char digest fits, a second tips it over
let result = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
.await
.unwrap();
assert!(result.scratchpad_truncations > 0,
"tight budget must trigger truncation");
// Scratchpad should still fit roughly within the budget
// (post-truncation); the estimator uses chars/4 so the bound
// is ~budget*4 chars. Give some slack for the delimiter.
let scratchpad_tokens = estimate_tokens(&result.scratchpad);
assert!(scratchpad_tokens <= opts.scratchpad_budget * 2,
"scratchpad {} tokens vs budget {}", scratchpad_tokens, opts.scratchpad_budget);
}
#[tokio::test]
async fn tree_split_text_shape_completes_without_continuations() {
// One shard: a clean map response, then a clean reduce response.
// Neither is truncated, so no continuation calls should fire.
let generator = ScriptedGenerator::new(vec![
Ok("partial".to_string()), // map shape=text, non-empty → complete on first pass
Ok("reduce-out".to_string()),
]);
let shards = vec!["only".to_string()];
let opts = TreeSplitOpts::new("qwen3:latest");
let result = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
.await
.unwrap();
// Text shape treats non-empty as complete → 0 continuations.
assert_eq!(result.total_continuations, 0);
assert_eq!(result.shards_processed, 1);
}
#[test]
fn truncate_scratchpad_noop_when_under_budget() {
let mut s = "\n— shard 1/1 digest —\nshort".to_string();
let truncated = truncate_scratchpad(&mut s, 1_000);
assert!(!truncated);
assert!(s.contains("short"));
}
#[test]
fn truncate_scratchpad_drops_oldest_first() {
let mut s = format!(
"\n— shard 1/3 digest —\n{}\n— shard 2/3 digest —\n{}\n— shard 3/3 digest —\nshort",
"x".repeat(4_000), // ~1000 tokens
"y".repeat(4_000), // ~1000 tokens
);
let truncated = truncate_scratchpad(&mut s, 500); // ~2000 chars
assert!(truncated);
assert!(!s.contains(&"x".repeat(4_000)),
"oldest digest should be dropped");
assert!(s.contains("short"),
"newest digest should survive");
}
}
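The `truncate_scratchpad` body sits above this excerpt; only its call sites and tests are shown. A minimal dependency-free sketch of the oldest-digest-first policy, assuming only the chars/4-ceil estimator and the `\n— shard ` digest delimiter used above:

```rust
// Standalone sketch, not the crate's tree_split.rs implementation.
// Assumes the chars/4-ceil token estimate and the "\n— shard " digest
// delimiter used by the scratchpad above.

fn estimate_tokens(s: &str) -> usize {
    (s.chars().count() + 3) / 4 // chars/4, rounded up
}

/// Drop whole digests, oldest first, until the scratchpad fits the
/// token budget. Returns true if anything was dropped; never drops
/// the newest digest.
fn truncate_scratchpad(scratchpad: &mut String, budget_tokens: usize) -> bool {
    const DELIM: &str = "\n— shard ";
    let mut dropped = false;
    while estimate_tokens(scratchpad) > budget_tokens {
        let Some(first) = scratchpad.find(DELIM) else { break };
        // The oldest digest spans from its delimiter to the next one.
        match scratchpad[first + DELIM.len()..].find(DELIM) {
            Some(rel) => {
                let second = first + DELIM.len() + rel;
                scratchpad.replace_range(first..second, "");
                dropped = true;
            }
            None => break, // single digest left; keep it even if over
        }
    }
    dropped
}
```

Byte indices from `find` land on char boundaries, so `replace_range` is safe even with the multi-byte `—` in the delimiter.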

View File

@ -113,8 +113,31 @@ pub struct PlaybookEntry {
/// "manual: operator requested via POST /retire"
#[serde(default)]
pub retirement_reason: Option<String>,
/// Phase 27 — monotonic version counter within a playbook chain.
/// First version is 1; `revise_entry` sets the new entry's version
/// to parent.version + 1. Entries persisted before Phase 27 get
/// version=1 via serde default and are treated as roots.
#[serde(default = "default_version")]
pub version: u32,
/// Phase 27 — playbook_id of the prior version in this chain. None
/// for root entries (first version).
#[serde(default)]
pub parent_id: Option<String>,
/// Phase 27 — timestamp set when a newer version replaced this
/// entry via `revise_entry`. Superseded entries are excluded from
/// boost calculations (same rule as `retired_at`) but remain
/// queryable via `history` for audit.
#[serde(default)]
pub superseded_at: Option<String>,
/// Phase 27 — playbook_id of the entry that replaced this one.
/// Walking `superseded_by` from the root forward reconstructs the
/// full version chain.
#[serde(default)]
pub superseded_by: Option<String>,
}
fn default_version() -> u32 { 1 }
/// A recorded failure — worker who didn't deliver on a contract.
/// Tracked per (city, state, name) so a single worker's failures on
/// Toledo Welder contracts don't penalize the same name in Chicago.
@ -168,6 +191,17 @@ pub enum UpsertOutcome {
Noop(String),
}
/// Phase 27 — shape returned from `revise_entry`. Reports both ends of
/// the supersession so callers can link citations or audit chains.
#[derive(Debug, Clone, Serialize)]
pub struct ReviseOutcome {
pub parent_id: String,
pub parent_version: u32,
pub new_playbook_id: String,
pub new_version: u32,
pub superseded_at: String,
}
/// Return YYYY-MM-DD from an RFC3339 timestamp. Falls back to the
/// first 10 chars if parse fails — tolerant for legacy entries that
/// stored a bare date.
@ -204,13 +238,15 @@ impl PlaybookMemory {
/// Rebuild the geo index from scratch. Called by every mutation
/// helper after persist succeeds. O(n) scan of entries; at current
/// scale ~40µs. Skips retired and superseded entries — they never
/// participate in boost filtering, so indexing them would just
/// waste lookups.
async fn rebuild_geo_index(&self) {
let state = self.state.read().await;
let mut idx: HashMap<(String, String), Vec<usize>> = HashMap::new();
for (i, e) in state.entries.iter().enumerate() {
if e.retired_at.is_some() { continue; }
if e.superseded_at.is_some() { continue; }
let (Some(city), Some(st)) = (&e.city, &e.state) else { continue; };
let key = (city.to_ascii_lowercase(), st.to_ascii_uppercase());
idx.entry(key).or_default().push(i);
@ -312,13 +348,129 @@ impl PlaybookMemory {
Ok(count)
}
/// Phase 27 — append a new version of an existing playbook. The
/// parent is stamped with `superseded_at` + `superseded_by`; the
/// new entry inherits `parent_id` and gets `version = parent + 1`.
/// Errors when the parent is retired (terminal state) or already
/// superseded (must revise the tip of the chain, not a middle
/// node). Caller supplies the new entry with its own fresh
/// `playbook_id`; chain-metadata fields on the input are
/// overwritten so callers can't fabricate a mismatched history.
pub async fn revise_entry(
&self,
parent_id: &str,
mut new_entry: PlaybookEntry,
) -> Result<ReviseOutcome, String> {
let now = chrono::Utc::now().to_rfc3339();
let mut state = self.state.write().await;
let Some(i) = state.entries.iter().position(|e| e.playbook_id == parent_id) else {
return Err(format!("parent playbook_id '{parent_id}' not found"));
};
{
let parent = &state.entries[i];
if parent.retired_at.is_some() {
return Err(format!(
"cannot revise retired playbook '{parent_id}' — retirement is terminal"
));
}
if let Some(succ) = &parent.superseded_by {
return Err(format!(
"playbook '{parent_id}' already superseded by '{succ}'; \
revise the latest version in the chain instead"
));
}
}
let parent_version = state.entries[i].version;
let new_version = parent_version.saturating_add(1);
let parent_pid = state.entries[i].playbook_id.clone();
let new_pid = new_entry.playbook_id.clone();
if new_pid.is_empty() {
return Err("new playbook_id must not be empty".into());
}
if new_pid == parent_pid {
return Err("new playbook_id must differ from parent".into());
}
// Enforce chain-metadata integrity — caller doesn't get to
// fabricate these.
new_entry.version = new_version;
new_entry.parent_id = Some(parent_pid.clone());
new_entry.superseded_at = None;
new_entry.superseded_by = None;
let parent_mut = &mut state.entries[i];
parent_mut.superseded_at = Some(now.clone());
parent_mut.superseded_by = Some(new_pid.clone());
state.entries.push(new_entry);
drop(state);
self.persist().await?;
self.rebuild_geo_index().await;
Ok(ReviseOutcome {
parent_id: parent_pid,
parent_version,
new_playbook_id: new_pid,
new_version,
superseded_at: now,
})
}
/// Phase 27 — return the full version chain that contains this
/// playbook_id, ordered from root (v1) to tip. Walks `parent_id`
/// backward to find the root, then `superseded_by` forward to the
/// tip. Returns empty if the id isn't present. Cycle-safe via a
/// visited set; unreachable in normal operation but the guard is
/// cheap.
pub async fn history(&self, playbook_id: &str) -> Vec<PlaybookEntry> {
let state = self.state.read().await;
let by_id: HashMap<&str, &PlaybookEntry> = state.entries
.iter()
.map(|e| (e.playbook_id.as_str(), e))
.collect();
let Some(seed) = by_id.get(playbook_id).copied() else {
return vec![];
};
// Walk backward to root.
let mut cursor = seed;
let mut seen: std::collections::HashSet<String> = std::collections::HashSet::new();
seen.insert(cursor.playbook_id.clone());
while let Some(pid) = &cursor.parent_id {
let Some(&next) = by_id.get(pid.as_str()) else { break };
if !seen.insert(next.playbook_id.clone()) { break; }
cursor = next;
}
let root = cursor;
// Walk forward to tip.
let mut chain = vec![root.clone()];
let mut cursor = root;
let mut seen_fwd: std::collections::HashSet<String> = std::collections::HashSet::new();
seen_fwd.insert(cursor.playbook_id.clone());
while let Some(nid) = &cursor.superseded_by {
let Some(&next) = by_id.get(nid.as_str()) else { break };
if !seen_fwd.insert(next.playbook_id.clone()) { break; }
cursor = next;
chain.push(cursor.clone());
}
chain
}
/// Stats accessor for the /status endpoint and tests. Returns
/// (total, retired, superseded, failures). Phase 27 added
/// superseded as a distinct counter: a superseded entry is
/// replaced-by-newer-version, which is a different lifecycle event
/// than retired-stop-using.
pub async fn status_counts(&self) -> (usize, usize, usize, usize) {
let state = self.state.read().await;
let total = state.entries.len();
let retired = state.entries.iter().filter(|e| e.retired_at.is_some()).count();
let superseded = state.entries.iter().filter(|e| e.superseded_at.is_some()).count();
let failures = state.failures.len();
(total, retired, superseded, failures)
}
/// Phase 26 — Mem0-style upsert. Decides ADD / UPDATE / NOOP based
@ -353,6 +505,7 @@ impl PlaybookMemory {
let mut existing_idx: Option<usize> = None;
for (i, e) in state.entries.iter().enumerate() {
if e.retired_at.is_some() { continue; }
if e.superseded_at.is_some() { continue; }
if e.operation != new_entry.operation { continue; }
if day_key(&e.timestamp) != new_day { continue; }
if e.city != new_entry.city || e.state != new_entry.state { continue; }
@ -514,6 +667,7 @@ impl PlaybookMemory {
.iter()
.filter(|e| {
if e.retired_at.is_some() { return false; }
if e.superseded_at.is_some() { return false; }
if let Some(vu) = &e.valid_until {
if let Ok(parsed) = chrono::DateTime::parse_from_rfc3339(vu) {
if now > parsed.with_timezone(&chrono::Utc) { return false; }
@ -543,6 +697,7 @@ impl PlaybookMemory {
.filter_map(|i| state.entries.get(i))
.filter(|e| {
if e.retired_at.is_some() { return false; }
if e.superseded_at.is_some() { return false; }
if let Some(vu) = &e.valid_until {
if let Ok(parsed) = chrono::DateTime::parse_from_rfc3339(vu) {
if now > parsed.with_timezone(&chrono::Utc) { return false; }
@ -1055,6 +1210,10 @@ pub async fn rebuild(
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
}
})
.collect();
@ -1237,6 +1396,10 @@ mod tests {
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
})
.collect();
tokio::runtime::Runtime::new().unwrap().block_on(async {
@ -1270,6 +1433,10 @@ mod validity_window_tests {
valid_until,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
}
}
@ -1279,7 +1446,7 @@ mod validity_window_tests {
pm.set_entries(vec![mkentry("pb-1", "Nashville", "TN", None, None)]).await.unwrap();
let touched = pm.retire_one("pb-1", "manual test").await.unwrap();
assert!(touched);
let (total, retired, _, _) = pm.status_counts().await;
assert_eq!(total, 1);
assert_eq!(retired, 1);
// Second retirement is a no-op
@ -1335,7 +1502,7 @@ mod validity_window_tests {
// Only pb-old-schema should be retired — pb-new-schema matches,
// pb-no-fp has no fingerprint so it's legacy-safe.
assert_eq!(retired, 1);
let (_, total_retired, _, _) = pm.status_counts().await;
assert_eq!(total_retired, 1);
}
@ -1348,7 +1515,7 @@ mod validity_window_tests {
// Nashville migration shouldn't touch Chicago
let retired = pm.retire_on_schema_drift("Nashville", "TN", "fp-v2", "test").await.unwrap();
assert_eq!(retired, 1);
let (_, r, _, _) = pm.status_counts().await;
assert_eq!(r, 1);
}
}
@ -1373,6 +1540,10 @@ mod upsert_tests {
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
}
}
@ -1444,3 +1615,171 @@ mod upsert_tests {
assert_eq!(pm.entry_count().await, 2);
}
}
#[cfg(test)]
mod version_tests {
use super::*;
use object_store::memory::InMemory;
fn mk(id: &str, city: &str, state: &str) -> PlaybookEntry {
PlaybookEntry {
playbook_id: id.into(),
operation: format!("fill: Welder x1 in {city}, {state}"),
approach: "hybrid".into(),
context: "test".into(),
timestamp: chrono::Utc::now().to_rfc3339(),
endorsed_names: vec!["Alice Smith".into()],
city: Some(city.into()),
state: Some(state.into()),
embedding: Some(vec![1.0, 0.0, 0.0]),
schema_fingerprint: None,
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
}
}
#[tokio::test]
async fn revise_stamps_chain_metadata_on_both_ends() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
let outcome = pm.revise_entry("pb-v1", mk("pb-v2", "Nashville", "TN"))
.await
.expect("revise should succeed against active root");
assert_eq!(outcome.parent_id, "pb-v1");
assert_eq!(outcome.parent_version, 1);
assert_eq!(outcome.new_playbook_id, "pb-v2");
assert_eq!(outcome.new_version, 2);
assert!(!outcome.superseded_at.is_empty());
let snap = pm.snapshot().await;
let v1 = snap.iter().find(|e| e.playbook_id == "pb-v1").unwrap();
let v2 = snap.iter().find(|e| e.playbook_id == "pb-v2").unwrap();
assert_eq!(v1.superseded_by.as_deref(), Some("pb-v2"));
assert!(v1.superseded_at.is_some());
assert_eq!(v2.parent_id.as_deref(), Some("pb-v1"));
assert_eq!(v2.version, 2);
assert!(v2.superseded_at.is_none());
}
#[tokio::test]
async fn revise_rejects_retired_parent() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
let mut e1 = mk("pb-v1", "Nashville", "TN");
e1.retired_at = Some(chrono::Utc::now().to_rfc3339());
pm.set_entries(vec![e1]).await.unwrap();
let err = pm.revise_entry("pb-v1", mk("pb-v2", "Nashville", "TN")).await
.expect_err("revise on retired parent must error");
assert!(err.contains("retired"), "error should mention retirement: {err}");
}
#[tokio::test]
async fn revise_rejects_already_superseded_parent() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
pm.revise_entry("pb-v1", mk("pb-v2", "Nashville", "TN")).await.unwrap();
// pb-v1 is now superseded; revising it again must fail — caller
// should revise pb-v2 (the tip) instead.
let err = pm.revise_entry("pb-v1", mk("pb-v3-fake", "Nashville", "TN")).await
.expect_err("revise on superseded parent must error");
assert!(err.contains("superseded"), "error should mention supersession: {err}");
}
#[tokio::test]
async fn superseded_entries_excluded_from_boost() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
let mut v2 = mk("pb-v2", "Nashville", "TN");
v2.endorsed_names = vec!["Carol Davis".into()];
pm.revise_entry("pb-v1", v2).await.unwrap();
let boosts = pm.compute_boost_for_filtered_with_role(
&[1.0, 0.0, 0.0], 100, 0.5,
Some(("Nashville", "TN")), Some("Welder"),
).await;
// v1's endorsement (Alice Smith) should be absent — it was
// superseded. v2's endorsement (Carol Davis) should be present.
assert!(
!boosts.contains_key(&("Nashville".into(), "TN".into(), "Alice Smith".into())),
"superseded entry's endorsement must not boost"
);
let carol = boosts.get(&("Nashville".into(), "TN".into(), "Carol Davis".into()));
assert!(carol.is_some(), "tip version's endorsement must still boost");
assert!(carol.unwrap().citations.contains(&"pb-v2".to_string()));
}
#[tokio::test]
async fn history_walks_root_to_tip_from_any_node() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
pm.revise_entry("pb-v1", mk("pb-v2", "Nashville", "TN")).await.unwrap();
pm.revise_entry("pb-v2", mk("pb-v3", "Nashville", "TN")).await.unwrap();
// Starting from the root — same chain.
let chain_from_root = pm.history("pb-v1").await;
assert_eq!(chain_from_root.len(), 3);
assert_eq!(chain_from_root[0].playbook_id, "pb-v1");
assert_eq!(chain_from_root[1].playbook_id, "pb-v2");
assert_eq!(chain_from_root[2].playbook_id, "pb-v3");
// Starting from the tip — same chain, same order.
let chain_from_tip = pm.history("pb-v3").await;
assert_eq!(chain_from_tip.len(), 3);
assert_eq!(chain_from_tip[0].playbook_id, "pb-v1");
assert_eq!(chain_from_tip[2].playbook_id, "pb-v3");
// Starting from the middle — same chain.
let chain_from_mid = pm.history("pb-v2").await;
assert_eq!(chain_from_mid.len(), 3);
assert_eq!(chain_from_mid[0].playbook_id, "pb-v1");
assert_eq!(chain_from_mid[2].playbook_id, "pb-v3");
}
#[tokio::test]
async fn history_empty_for_unknown_id() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
assert!(pm.history("pb-nonexistent").await.is_empty());
}
#[tokio::test]
async fn status_counts_reports_superseded_separately() {
let pm = PlaybookMemory::new(Arc::new(InMemory::new()));
pm.set_entries(vec![mk("pb-v1", "Nashville", "TN")]).await.unwrap();
pm.revise_entry("pb-v1", mk("pb-v2", "Nashville", "TN")).await.unwrap();
let (total, retired, superseded, _) = pm.status_counts().await;
assert_eq!(total, 2);
assert_eq!(retired, 0);
assert_eq!(superseded, 1);
}
#[tokio::test]
async fn legacy_entries_without_version_default_to_v1() {
// Simulate state persisted before Phase 27 — no version field.
// Serde default kicks in; entries should be treated as roots.
let json = r#"{
"entries": [{
"playbook_id": "pb-legacy",
"operation": "fill: Welder x1 in Nashville, TN",
"approach": "hybrid",
"context": "",
"timestamp": "2026-04-21T00:00:00Z",
"endorsed_names": ["Alice"],
"city": "Nashville",
"state": "TN"
}],
"last_rebuilt_at": 0,
"failures": []
}"#;
let state: PlaybookMemoryState = serde_json::from_str(json).unwrap();
let legacy = &state.entries[0];
assert_eq!(legacy.version, 1);
assert!(legacy.parent_id.is_none());
assert!(legacy.superseded_at.is_none());
assert!(legacy.superseded_by.is_none());
}
}
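The two-pass walk in `history` can be restated as a dependency-free sketch over plain structs; `Node` and `chain` here are illustrative stand-ins for the crate types, carrying only the `parent_id`/`superseded_by` links:

```rust
use std::collections::{HashMap, HashSet};

// Minimal stand-in for PlaybookEntry's chain fields.
struct Node {
    id: String,
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

/// Root-to-tip chain containing `start`: walk `parent_id` backward to
/// the root, then `superseded_by` forward to the tip. A visited set
/// per direction keeps a corrupted (cyclic) chain from looping.
fn chain<'a>(nodes: &'a [Node], start: &str) -> Vec<&'a str> {
    let by_id: HashMap<&str, &Node> =
        nodes.iter().map(|n| (n.id.as_str(), n)).collect();
    let Some(mut cursor) = by_id.get(start).copied() else { return vec![] };
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(&cursor.id);
    while let Some(pid) = cursor.parent_id.as_deref() {
        let Some(&next) = by_id.get(pid) else { break };
        if !seen.insert(&next.id) { break; } // cycle guard
        cursor = next;
    }
    let mut ids = vec![cursor.id.as_str()];
    let mut seen_fwd: HashSet<&str> = HashSet::new();
    seen_fwd.insert(&cursor.id);
    while let Some(nid) = cursor.superseded_by.as_deref() {
        let Some(&next) = by_id.get(nid) else { break };
        if !seen_fwd.insert(&next.id) { break; }
        cursor = next;
        ids.push(cursor.id.as_str());
    }
    ids
}
```

A dangling `parent_id` stops the backward walk at the best-known root rather than erroring, mirroring the tolerant `break` in `history`.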

View File

@ -42,6 +42,7 @@ async fn rerank(
system: None,
temperature: Some(0.0),
max_tokens: Some(50),
think: None,
}).await;
match resp {
@ -156,6 +157,7 @@ pub async fn query(
system: None,
temperature: Some(0.2),
max_tokens: Some(512),
think: None,
}).await?;
Ok(RagResponse {

View File

@ -126,6 +126,8 @@ pub fn router(state: VectorState) -> Router {
.route("/playbook_memory/patterns", post(discover_playbook_patterns))
.route("/playbook_memory/mark_failed", post(mark_playbook_failed))
.route("/playbook_memory/retire", post(retire_playbook_memory))
.route("/playbook_memory/revise", post(revise_playbook_memory))
.route("/playbook_memory/history/{id}", get(playbook_memory_history))
.route("/playbook_memory/status", get(playbook_memory_status))
.with_state(state)
}
@ -891,6 +893,7 @@ async fn hybrid_search(
system: None,
temperature: Some(0.2),
max_tokens: Some(512),
think: None,
}).await;
gen_resp.ok().map(|r| r.text.trim().to_string())
@ -2228,6 +2231,10 @@ async fn seed_playbook_memory(
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
let text = format!(
"{} | {} | {} | fills: {}",
@ -2287,6 +2294,10 @@ async fn seed_playbook_memory(
valid_until: req.valid_until.clone(),
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
// Phase 26 — when append=true (default), route through upsert so
@ -2481,16 +2492,140 @@ async fn retire_playbook_memory(
"supply either {playbook_id, reason} or {city, state, current_schema_fingerprint, reason}".into()))
}
/// Phase 27 — request body for `POST /playbook_memory/revise`. Same
/// shape as a seed request minus `append` (revise is always
/// append-semantics for a specific parent) plus `parent_id`. The new
/// version's `playbook_id` is derived by hashing the revise timestamp,
/// parent id, and operation, so ids are unique within a chain without
/// the caller having to supply one.
#[derive(Deserialize)]
struct RevisePlaybookRequest {
parent_id: String,
operation: String,
approach: String,
context: String,
endorsed_names: Vec<String>,
#[serde(default)]
schema_fingerprint: Option<String>,
#[serde(default)]
valid_until: Option<String>,
}
/// Phase 27 — create a new version of an existing playbook. The parent
/// is marked superseded; the new entry inherits the chain via
/// `parent_id` and carries `version = parent.version + 1`. Errors with
/// 400 on a retired or already-superseded parent (must revise the tip
/// of the chain). Embeds the new text through the same shape as
/// `/seed` so cosine similarity stays comparable across rebuild + seed
/// + revise entries.
async fn revise_playbook_memory(
State(state): State<VectorState>,
Json(req): Json<RevisePlaybookRequest>,
) -> Result<Json<serde_json::Value>, (StatusCode, String)> {
let text = format!(
"{} | {} | {} | fills: {}",
req.operation, req.approach, req.context,
req.endorsed_names.join(", "),
);
let resp = state.ai_client.embed(EmbedRequest { texts: vec![text], model: None })
.await
.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed revise: {e}")))?;
if resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "embed returned nothing".into()));
}
let emb: Vec<f32> = resp.embeddings[0].iter().map(|&x| x as f32).collect();
let (city, state_) = {
let after_in = req.operation.split(" in ").nth(1).unwrap_or("");
let mut parts = after_in.splitn(2, ',');
let city = parts.next().map(|s| s.trim().to_string()).filter(|s| !s.is_empty());
let state = parts.next()
.map(|s| s.trim().chars().take_while(|c| c.is_ascii_alphabetic()).collect::<String>())
.filter(|s| !s.is_empty());
(city, state)
};
if city.is_none() || state_.is_none() {
return Err((StatusCode::BAD_REQUEST,
"operation must match 'fill: Role xN in City, ST' shape".into()));
}
let ts = chrono::Utc::now().to_rfc3339();
use sha2::{Digest, Sha256};
let mut h = Sha256::new();
h.update(ts.as_bytes());
h.update(b"|");
h.update(req.parent_id.as_bytes());
h.update(b"|");
h.update(req.operation.as_bytes());
let bytes = h.finalize();
let pid = format!("pb-rev-{}", bytes.iter().take(8).map(|b| format!("{b:02x}")).collect::<String>());
let new_entry = playbook_memory::PlaybookEntry {
playbook_id: pid.clone(),
operation: req.operation,
approach: req.approach,
context: req.context,
timestamp: ts,
endorsed_names: req.endorsed_names,
city, state: state_,
embedding: Some(emb),
schema_fingerprint: req.schema_fingerprint,
valid_until: req.valid_until,
retired_at: None,
retirement_reason: None,
// revise_entry overwrites these from the parent — values here
// are just placeholders so the struct is well-formed.
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
let outcome = state.playbook_memory.revise_entry(&req.parent_id, new_entry)
.await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
Ok(Json(serde_json::json!({
"outcome": outcome,
"entries_after": state.playbook_memory.entry_count().await,
})))
}
/// Phase 27 — return the full version chain containing `playbook_id`,
/// ordered root → tip. 404 if the id isn't present. The walker is
/// cycle-safe by construction (visited set per direction).
async fn playbook_memory_history(
State(state): State<VectorState>,
Path(playbook_id): Path<String>,
) -> Result<Json<serde_json::Value>, (StatusCode, String)> {
let chain = state.playbook_memory.history(&playbook_id).await;
if chain.is_empty() {
return Err((StatusCode::NOT_FOUND, format!("no playbook with id '{playbook_id}'")));
}
Ok(Json(serde_json::json!({
"playbook_id": playbook_id,
"versions": chain.len(),
"chain": chain,
})))
}
/// Phase 25 status endpoint — reports retirement counts so dashboards
/// can show "N playbooks retired (12 from 2026-05 schema migration)".
/// Phase 27 added `superseded` as a distinct counter.
async fn playbook_memory_status(
State(state): State<VectorState>,
) -> impl IntoResponse {
let (total, retired, superseded, failures) = state.playbook_memory.status_counts().await;
// `active` = entries eligible for boost. Retired and superseded are
// distinct exclusion reasons; subtract both. An entry can in principle
// be both retired AND superseded (e.g. revised then retired) so
// saturating_sub guards against underflow if that pathological case
// ever lands.
let inactive = retired + superseded;
Json(serde_json::json!({
"total": total,
"retired": retired,
"active": total.saturating_sub(retired),
"superseded": superseded,
"active": total.saturating_sub(inactive),
"failures": failures,
}))
}

View File

@ -188,7 +188,7 @@
- Endpoint: `POST /catalog/datasets/by-name/{name}/tombstone` accepting `{row_key_column, row_key_values[], actor, reason}`; companion `GET` lists active tombstones
- `queryd::context::build_context` wraps tombstoned tables: raw goes to `__raw__{name}`, public name becomes a DataFusion view with `WHERE CAST(col AS VARCHAR) NOT IN (...)` filter
- End-to-end on candidates: tombstone 3 IDs, COUNT drops 100,000 → 99,997, specific WHERE returns empty, AiView candidates_safe transitively excludes them too, restart preserves all tombstones
- Limits / not in MVP: journal integration (tombstones don't yet emit Phase 9 mutation events — covered by audit fields on the tombstone itself). Physical compaction integration was closed by Phase E.2 below.
- [x] Phase D: AI-safe views — 2026-04-16
- `shared::types::AiView` — name, base_dataset, columns whitelist, optional row_filter, column_redactions
- `shared::types::Redaction` — Null | Hash | Mask { keep_prefix, keep_suffix }
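A hedged sketch of the `Mask` redaction named above; the `keep_prefix`/`keep_suffix` semantics follow the changelog line, and the body is illustrative rather than the actual `shared::types` code:

```rust
// Illustrative Mask { keep_prefix, keep_suffix } redaction: the kept
// prefix/suffix characters survive, the middle is starred out. Values
// too short to safely partially reveal are fully masked.
fn mask(value: &str, keep_prefix: usize, keep_suffix: usize) -> String {
    let chars: Vec<char> = value.chars().collect();
    if chars.len() <= keep_prefix + keep_suffix {
        return "*".repeat(chars.len()); // nothing safe to reveal
    }
    let prefix: String = chars[..keep_prefix].iter().collect();
    let suffix: String = chars[chars.len() - keep_suffix..].iter().collect();
    let stars = "*".repeat(chars.len() - keep_prefix - keep_suffix);
    format!("{prefix}{stars}{suffix}")
}
```

Working on `char` indices rather than bytes keeps the mask correct for non-ASCII values.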
@ -273,7 +273,7 @@
- [x] 19.3 — Endorsed names parsed out of `result` column, keyed by `(city, state, name)` tuple so shared names across cities don't cross-pollinate. Parsing via `parse_names` + `parse_city_state` helpers (7 unit tests)
- [x] 19.4 — `/vectors/hybrid?use_playbook_memory=true`: fetches `top_k * 5` candidates so endorsed workers outside the vanilla top-K can still climb. Boost is additive on vector score, each hit carries `playbook_boost` + `playbook_citations` in the response for explainability
- [x] 19.5 — Multi-agent orchestrator (`tests/multi-agent/orchestrator.ts`) auto-seeds `POST /vectors/playbook_memory/seed` on consensus_done, so the next query sees the new endorsement without a full `/rebuild`. Closes the feedback loop: two agents reach consensus → playbook sealed → next query re-ranks
- [x] 19.6 — `MAX_BOOST_PER_WORKER = 0.25` enforced in `compute_boost_for`; verified with unit test (100 identical playbooks → boost capped at 0.25) and live test (5 identical seeds → exactly 0.25). Time decay also wired: `BOOST_HALF_LIFE_DAYS = 30.0` — 30-day-old playbooks contribute half, 60-day a quarter, via half-life decay (`0.5^(age_days / 30)`) in the boost loop
- Real finding surfaced during build: the 32 bootstrap rows in `successful_playbooks` reference phantom worker names — 80 of 82 don't correspond to actual rows in `workers_500k`. `/seed` endpoint bypasses `successful_playbooks` so operators can prime memory with real fixtures; production path is the orchestrator write-through
- [x] **Phase 19 refinement — geo + role prefilter on boost** (2026-04-21)
- Added `compute_boost_for_filtered` and `compute_boost_for_filtered_with_role` to `playbook_memory.rs`. SQL filter's `(city, state, role)` parsed in `service.rs`; exact role-matches in target geo skip cosine and earn similarity=1.0. Restored the feedback loop: matched=0 → matched=11 per query on the same Nashville test. Citation density on Riverfront Steel: 2 → 28 per run (14×).
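The capped, half-life-decayed boost from 19.6 can be sketched as follows; the constant names come from the changelog, but the loop body is illustrative, not `compute_boost_for` itself:

```rust
// Illustrative sketch of decayed, capped boost accumulation.
const MAX_BOOST_PER_WORKER: f32 = 0.25;
const BOOST_HALF_LIFE_DAYS: f32 = 30.0;

/// Sum per-playbook contributions with half-life decay, then cap:
/// a 30-day-old playbook contributes half, a 60-day-old a quarter.
fn decayed_boost(per_hit: f32, ages_days: &[f32]) -> f32 {
    let total: f32 = ages_days
        .iter()
        .map(|age| per_hit * 0.5f32.powf(age / BOOST_HALF_LIFE_DAYS))
        .sum();
    total.min(MAX_BOOST_PER_WORKER)
}
```

The cap is applied after summation, so many stale endorsements can never outweigh a few fresh ones by more than 0.25.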
@ -285,7 +285,13 @@
- T3 cloud: gpt-oss:120b via Ollama Cloud — verified 4-8s latency, strict JSON-shape output for remediation.
- [x] **Phase 21: Scratchpad + Tree-Split Continuation** (2026-04-21)
- `tests/multi-agent/agent.ts`: `estimateTokens()`, `assertContextBudget()`, `generateContinuable()`, `generateTreeSplit()`. `think` flag plumbed through sidecar's `/generate`. Empty-response backoff + truncation-continuation, no max_tokens tourniquet.
- Rust port queued: `crates/aibridge/src/continuation.rs`, `tree_split.rs`.
- Rust port shipped (2026-04-21, companion to Phase 27):
- `crates/aibridge/src/context.rs` — `estimate_tokens` (chars/4 ceil, matches TS), `context_window_for`, `assert_context_budget` returning `Result<BudgetCheck, (BudgetCheck, usize /* over_by */)>` so callers get the numbers back on both success and overflow. Windows table mirrors `config/models.json`.
- `crates/aibridge/src/continuation.rs` — `generate_continuable<G: TextGenerator>` handles the two failure modes from TS: (a) empty thinking-model response → geometric-backoff retry with 2× budget up to `budget_cap`; (b) truncated non-empty → continuation with partial-as-scratchpad. `is_structurally_complete` balances braces then JSON-parse-checks for the JSON shape; the text shape is "non-empty". Guards the degenerate case "all retries empty → bail, don't loop on an empty partial" — the TS impl handles this implicitly, Rust makes it explicit.
- `crates/aibridge/src/tree_split.rs` — `generate_tree_split` map→reduce with a running scratchpad. Per-shard + reduce prompts are budget-checked through `assert_context_budget`; loud-fails with the overflow message rather than silently truncating. Scratchpad truncates oldest-digest-first once it exceeds `scratchpad_budget` (default 6000 tokens, matches TS).
- `TextGenerator` trait (native async-fn-in-trait, edition 2024) so `ScriptedGenerator` test double can inject canned sequences without a live Ollama. `AiClient` implements `TextGenerator`.
- `GenerateRequest` gained `think: Option<bool>` field — forwards to sidecar for per-call hidden-reasoning opt-out on hot-path JSON emitters.
- 25 aibridge unit tests (8 context + 10 continuation + 7 tree_split) — all green, no network required.
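A minimal sketch of the `context.rs` budget primitives. `estimate_tokens`, `assert_context_budget`, and `BudgetCheck` are the names from the port above; the struct fields and exact signatures are assumptions:

```rust
// Sketch of the context.rs budget check; field names are illustrative.
#[derive(Debug, Clone, Copy)]
struct BudgetCheck {
    estimated_tokens: usize,
    context_window: usize,
}

/// chars/4, rounded up — mirrors the TS estimator.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Ok carries the numbers back; Err carries them plus how far over budget.
fn assert_context_budget(
    prompt: &str,
    context_window: usize,
) -> Result<BudgetCheck, (BudgetCheck, usize)> {
    let check = BudgetCheck {
        estimated_tokens: estimate_tokens(prompt),
        context_window,
    };
    if check.estimated_tokens <= context_window {
        Ok(check)
    } else {
        Err((check, check.estimated_tokens - context_window))
    }
}
```

Callers in `tree_split.rs` run every shard and reduce prompt through this before generation, so overflow fails loudly with numbers instead of silently truncating.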
- [x] **Phase 22: Internal Knowledge Library** (2026-04-21)
- `data/_kb/` — signatures.jsonl, outcomes.jsonl, pathway_recommendations.jsonl, error_corrections.jsonl, config_snapshots.jsonl. Event-driven cycle: indexRun → recommendFor → loadRecommendation.
- Item B cloud rescue: failed event → cloud remediation JSON → retry with pivot. Verified 1/3 rescues succeeded on stress_01 (Gary IN → South Bend IN pivot).
- `data/_kb/staffers.jsonl` — competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cite + 0.15·rescue. Recomputed per run.
- `findNeighbors` now returns `weighted_score = cosine × max_staffer_competence`. `scripts/kb_staffer_report.py` — leaderboard + cross-staffer worker overlap (Rachel D. Lewis 12× across 4 staffers → auto-discovered high-value label).
- `gen_staffer_demo.ts` + `run_staffer_demo.sh` — 4 personas × 3 contracts = 12 runs.
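The competence weighting above reduces to two pure functions. Only the weights and `weighted_score = cosine × max_staffer_competence` come from the log; the struct and function names are assumed:

```rust
// Sketch of the staffer competence score and findNeighbors re-rank.
struct StafferStats {
    fill: f64,     // fill rate, 0..1
    turn_eff: f64, // turn efficiency, 0..1
    cite: f64,     // citation rate, 0..1
    rescue: f64,   // rescue success rate, 0..1
}

/// competence = 0.45·fill + 0.20·turn_eff + 0.20·cite + 0.15·rescue
fn competence_score(s: &StafferStats) -> f64 {
    0.45 * s.fill + 0.20 * s.turn_eff + 0.20 * s.cite + 0.15 * s.rescue
}

/// Neighbor re-rank: cosine similarity scaled by the best competence among
/// the staffers whose runs produced this neighbor.
fn weighted_score(cosine: f64, max_staffer_competence: f64) -> f64 {
    cosine * max_staffer_competence
}
```

The weights sum to 1.0, so a perfect staffer leaves cosine scores untouched and weaker staffers only ever attenuate.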
- [x] **Phase 27: Playbook versioning** (2026-04-21)
- `PlaybookEntry` gained `version: u32` (default 1), `parent_id`, `superseded_at`, `superseded_by` fields. All `#[serde(default)]` so entries persisted before Phase 27 load as roots with version=1.
- `PlaybookMemory::revise_entry(parent_id, new_entry)` appends a new version, stamps `superseded_at`+`superseded_by` on the parent, inherits `parent_id` and sets `version = parent + 1` on the new entry. Rejects revising a retired or already-superseded parent with a clear error — the tip of the chain is the only valid revise target.
- `PlaybookMemory::history(playbook_id)` returns the full chain root→tip, walking `parent_id` backward then `superseded_by` forward. Cycle-safe. Works from any node in the chain.
- Superseded entries excluded from boost (same rule as retired): `compute_boost_for_filtered_with_role`, the active-entries prefilter, the geo-index rebuild, and the upsert existing-entry search all skip `superseded_at.is_some()`.
- Endpoints: `POST /vectors/playbook_memory/revise` + `GET /vectors/playbook_memory/history/{id}`.
- `status_counts` now returns `(total, retired, superseded, failures)`. `/status` JSON reports `superseded` as a distinct counter; `active = total - retired - superseded`.
- 8 unit tests under `mod version_tests` covering: chain-metadata stamping, retired-parent rejection, already-superseded-parent rejection, superseded endorsement exclusion from boost, history traversal from root/tip/middle, empty-for-unknown-id, superseded-status-count, legacy-entry-default-version round-trip. 26/26 playbook_memory tests pass.
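The chain-metadata stamping and revise guards can be sketched in isolation. Field names match `PlaybookEntry`; the free functions `can_revise` and `link_revision` are illustrative stand-ins for logic that lives inside `revise_entry`:

```rust
// Sketch of the Phase 27 version-chain rules; only the field names are
// taken from the log, everything else is illustrative.
struct PlaybookEntry {
    playbook_id: String,
    version: u32,                  // #[serde(default)] -> legacy rows load as roots
    parent_id: Option<String>,
    superseded_at: Option<String>, // set when a newer version replaces this one
    superseded_by: Option<String>,
    retired_at: Option<String>,
}

/// Only the live tip of a chain is a valid revise target.
fn can_revise(parent: &PlaybookEntry) -> Result<(), &'static str> {
    if parent.retired_at.is_some() {
        return Err("cannot revise a retired playbook");
    }
    if parent.superseded_at.is_some() {
        return Err("cannot revise a superseded playbook; revise the chain tip");
    }
    Ok(())
}

/// Stamp the parent superseded and link the child into the chain.
fn link_revision(parent: &mut PlaybookEntry, child: &mut PlaybookEntry, now: &str) {
    child.parent_id = Some(parent.playbook_id.clone());
    child.version = parent.version + 1;
    parent.superseded_at = Some(now.to_string());
    parent.superseded_by = Some(child.playbook_id.clone());
}
```

After `link_revision`, `can_revise` on the old parent fails — which is exactly the already-superseded-parent rejection the tests cover.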
- [x] **Phase 24: Observer / Autotune integration** (2026-04-20, commit b95dd86)
- Closed the gap where `lakehouse-observer.service` wrapped MCP :3700 while `tests/multi-agent/scenario.ts` hit gateway :3100 directly — observer sat idle at 0 ops across 3600+ cycles.
- `observer.ts` gained a Bun HTTP listener on `OBSERVER_PORT` (default 3800) with `GET /health`, `GET /stats` (totals + by_source + recent scenario digest), and `POST /event` for scenario outcomes. Body shapes into `ObservedOp` with `source="scenario"` + `staffer_id` + `sig_hash` + `event_kind` + geo + rescue flags.
- `recordExternalOp()` shared ring-buffer insert — ERROR_ANALYZER and PLAYBOOK_BUILDER loops now see both MCP-wrapped and scenario-posted ops through the same path.
- `persistOp()` swap: old path wrote via `/ingest/file?name=observed_operations` which has REPLACE semantics (wiped prior ops); now uses append-friendly Parquet write-through.
- [x] **Phase 25: Validity windows + playbook retirement** (2026-04-21, commit e0a843d)
- `PlaybookEntry` gained four optional fields (`#[serde(default)]` so legacy entries load as never-expiring): `schema_fingerprint` (SHA-256 over target dataset columns at seed time), `valid_until` (RFC3339 hard expiry), `retired_at` (set by retire calls), `retirement_reason` (human string).
- `compute_boost_for_filtered_with_role` now skips retired + expired entries before geo/cosine — no silent boosting from stale playbooks. Unit-tested on expired `valid_until` + retired + schema-drift retirement.
- Two retirement paths: `retire_one(playbook_id, reason)` for manual, `retire_on_schema_drift(city, state, current_fingerprint, reason)` for batch schema-migration sweep. Legacy entries without a fingerprint skip drift retirement (safe).
- Endpoint: `POST /vectors/playbook_memory/retire` — accepts either `{playbook_id, reason}` or `{city, state, current_schema_fingerprint, reason}`.
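The skip rules reduce to a single predicate run before geo/cosine ranking. A simplified sketch: epoch-day expiry stands in for the RFC3339 `valid_until` parse, and the Phase 27 `superseded_at` exclusion (same rule as retired) is folded in; names other than the entry fields are assumed:

```rust
// Sketch of the active-entry predicate applied before boosting.
struct Entry {
    retired_at: Option<String>,
    superseded_at: Option<String>,
    valid_until_day: Option<u32>, // simplified stand-in for the RFC3339 expiry
}

/// Legacy entries (all None) load as never-expiring and stay active.
fn is_active(e: &Entry, today: u32) -> bool {
    e.retired_at.is_none()
        && e.superseded_at.is_none()
        && e.valid_until_day.map_or(true, |d| d >= today)
}
```

Running this filter first is what guarantees "no silent boosting from stale playbooks".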
- [x] **Phase 26: Mem0 upsert + Letta geo hot cache** (2026-04-21, commit 640db8c)
- Mem0-style upsert: `/seed` with `append=true` (default) routes through `upsert_entry`, which decides ADD / UPDATE / NOOP on (operation, day, city, state). Same-day re-seed merges names (union, stable order) instead of duplicating the row. Identical re-seed is a no-op. Different-day same-operation is a fresh ADD. The `playbook_id` stays stable on UPDATE so prior citations remain valid.
- Letta-style hot cache: `PlaybookMemory` now holds a `geo_index: HashMap<(city_lower, state_upper), Vec<entry_idx>>` rebuilt on every mutation. Geo-filtered boost queries skip the full scan and hit the O(1) key lookup. At 1.9K entries the full scan was sub-ms; the index scales the same path to 100K+ without code changes.
- `UpsertOutcome` enum reported back to callers — `{mode: added|updated|noop, playbook_id, merged_names?}` + `entries_after`.
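The ADD / UPDATE / NOOP decision can be sketched as a pure function over the stored name set. `decide_upsert` and this `UpsertMode` shape are illustrative; the real `upsert_entry` keys on `(operation, day, city, state)` and merges the name union in place:

```rust
// Sketch of the Mem0-style upsert decision on an existing key's name set.
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum UpsertMode { Added, Updated, Noop }

/// `existing` is the name set already stored under this
/// (operation, day, city, state) key, if the key exists.
fn decide_upsert(existing: Option<&BTreeSet<String>>, incoming: &BTreeSet<String>) -> UpsertMode {
    match existing {
        // New key (e.g. different day, same operation) -> fresh ADD.
        None => UpsertMode::Added,
        // Nothing new to merge -> identical re-seed is a no-op.
        Some(have) if incoming.is_subset(have) => UpsertMode::Noop,
        // Same-day re-seed with new names -> merge the union, keep playbook_id.
        Some(_) => UpsertMode::Updated,
    }
}
```

A `BTreeSet` gives the deterministic ordering implied by "union, stable order".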
- [ ] Fine-tuned domain models (Phase 28+)
- [ ] Multi-node query distribution (only if ceilings bite)
---
**145 unit tests | 13 crates | 21 ADRs | 2.47M rows | 100K vectors | Hybrid Parquet+HNSW ⊕ Lance | Phases 0-27 shipped**
**Latest: 2026-04-21 — Phase 27 (playbook versioning: `version` + `parent_id` + `superseded_at` + `superseded_by` on `PlaybookEntry`, `/revise` + `/history` endpoints, 8 new tests). Doc-sync pass: Phase 24 observer + Phase 25 validity windows + Phase 26 Mem0/Letta now reflected in phase tracker. Phase 19.6 time decay noted as wired (was misdocumented as deferred). Phase E.2 tombstone-at-compaction noted as closed in Phase 8 MVP limits.**


Answers "who handled this" as a first-class dimension of the matrix index.
- `scripts/run_staffer_demo.sh` — sequential batch with cloud T3.
- `scripts/kb_staffer_report.py` — leaderboard + top/bottom differential + cross-staffer overlap.
### Phase 24: Observer / Autotune integration (SHIPPED 2026-04-20, commit b95dd86)
J flagged the gap on 2026-04-21: the `lakehouse-observer.service` systemd unit had run 3600+ cycles showing `total_ops=0 successes=0 failures=0`, because `tests/multi-agent/scenario.ts` hit the Rust gateway directly on :3100, bypassing the Bun MCP layer on :3700 that the observer wraps. Scenarios were invisible to ERROR_ANALYZER and PLAYBOOK_BUILDER, and autotune's HNSW parameter learning ran blind to scenario outcomes.
**What shipped:**
- `observer.ts` Bun HTTP listener on `OBSERVER_PORT` (default 3800): `GET /health`, `GET /stats` (totals, by_source, recent scenario digest), `POST /event` for scenario outcomes.
- `ObservedOp` carries provenance — `source="scenario" | "mcp"` + `staffer_id` + `sig_hash` + `event_kind` + geo + rescue flags.
- `recordExternalOp()` — shared ring-buffer insert; main analyzer + playbook builder no longer care where the op came from.
- `persistOp()` fix: old path POSTed to `/ingest/file?name=observed_operations` which has REPLACE semantics (wiped prior ops); now uses append-friendly write-through.
**Still open:** the autotune agent subscribing to a metric stream the observer writes — the scenario→observer ingest and analyzer-consumption halves shipped above.
### Phase 25: Validity windows + playbook retirement (SHIPPED 2026-04-21, commit e0a843d)
Zep 2026-era finding: temporal validity is the single highest-value memory-hygiene primitive. `PlaybookEntry` gained `schema_fingerprint` / `valid_until` / `retired_at` / `retirement_reason`. `compute_boost_for_filtered_with_role` skips retired + expired before geo/cosine ranking. Two retirement paths: `retire_one(id, reason)` for manual, `retire_on_schema_drift(city, state, fp, reason)` for batch migration sweep. Endpoint: `POST /vectors/playbook_memory/retire`.
### Phase 26: Mem0 upsert + Letta geo hot cache (SHIPPED 2026-04-21, commit 640db8c)
Same-day re-seed no longer duplicates rows. `/seed` with `append=true` routes through `upsert_entry`, which decides ADD / UPDATE / NOOP on `(operation, day, city, state)`. The `playbook_id` stays stable on UPDATE so existing citations remain valid. `PlaybookMemory.geo_index: HashMap<(city, state), Vec<idx>>` is rebuilt on every mutation; geo-filtered boost queries skip the full scan and hit the O(1) lookup — sub-ms at current scale, and the same code path scales to 100K+ entries.
### Phase 27: Playbook versioning (SHIPPED 2026-04-21)
`PlaybookEntry` gained `version: u32` (default 1), `parent_id`, `superseded_at`, `superseded_by` — all `#[serde(default)]` so pre-Phase-27 state loads as roots. `revise_entry(parent_id, new_entry)` appends a new version, stamps the parent superseded, rejects revising a retired or already-superseded parent. `history(id)` returns the root→tip chain from any node. Superseded entries excluded from boost (same rule as retired). Endpoints: `POST /vectors/playbook_memory/revise`, `GET /vectors/playbook_memory/history/{id}`. `/status` reports `superseded` as a distinct counter. 8 new tests; 51/51 vectord lib tests green.
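The `history(id)` traversal — walk `parent_id` backward to the root, then `superseded_by` forward to the tip — can be sketched as follows; the map-based store and `Node` shape are assumptions:

```rust
// Sketch of the root->tip chain walk; cycle-safe in both directions,
// empty for an unknown id.
use std::collections::{HashMap, HashSet};

struct Node {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

fn history(store: &HashMap<String, Node>, start: &str) -> Vec<String> {
    if !store.contains_key(start) {
        return Vec::new(); // unknown id -> empty chain
    }
    // Walk parent_id backward to the root.
    let mut seen = HashSet::new();
    let mut root = start.to_string();
    while let Some(p) = store.get(&root).and_then(|n| n.parent_id.clone()) {
        if !store.contains_key(&p) || !seen.insert(p.clone()) {
            break; // dangling parent or cycle
        }
        root = p;
    }
    // Walk superseded_by forward, root -> tip.
    let mut chain = Vec::new();
    let mut seen_fwd = HashSet::new();
    let mut cur = Some(root);
    while let Some(id) = cur {
        if !seen_fwd.insert(id.clone()) {
            break; // cycle guard
        }
        cur = store.get(&id).and_then(|n| n.superseded_by.clone());
        chain.push(id);
    }
    chain
}
```

Because the walk first rewinds to the root, starting from the root, the middle, or the tip yields the same chain.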
### Phase 28+: Further horizon
- Specialized fine-tuned models per domain (staffing matcher, resume parser)
- Video/audio transcript ingest + multimodal embeddings