lakehouse/crates/aibridge/src/tree_split.rs
commit a6f12e2609 Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: empty-response from thinking
                     models (geometric 2x budget backoff up to budget_cap)
                     and truncated-non-empty (continuation with partial
                     as scratchpad). is_structurally_complete balances
                     braces then JSON.parse-checks. Guards the degen case
                     "all retries empty, don't loop on empty partial".
  tree_split.rs    — generate_tree_split map->reduce with running
                     scratchpad. Per-shard + reduce-prompt go through
                     assert_context_budget first; loud-fails rather than
                     silently truncating. Oldest-digest-first scratchpad
                     truncation at scratchpad_budget (default 6000 t).
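
A minimal standalone sketch of the chars/4 estimator named above (the ceiling rule as documented; the real context.rs implementation may differ in detail, e.g. byte vs char counting):

```rust
// Hedged sketch of estimate_tokens: chars/4, rounded up. Illustrative
// only — not the actual function from crates/aibridge/src/context.rs.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(estimate_tokens(""), 0);
    assert_eq!(estimate_tokens("abcd"), 1); // exactly one 4-char group
    assert_eq!(estimate_tokens("abcde"), 2); // 5 chars -> ceil(5/4) = 2
    println!("ok");
}
```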

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
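
The trait-plus-test-double pattern can be sketched as below. Names and signatures are assumptions from this message, not the crate's real API; the manual poll assumes a Rust 1.85+ toolchain (which edition 2024 requires) for `Waker::noop`:

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Hedged sketch: native async-fn-in-trait (stable since Rust 1.75).
trait TextGenerator {
    async fn generate(&self, prompt: &str) -> Result<String, String>;
}

// Scripted double: pops canned responses in order, no live backend.
struct ScriptedGenerator {
    responses: RefCell<Vec<Result<String, String>>>,
}

impl TextGenerator for ScriptedGenerator {
    async fn generate(&self, _prompt: &str) -> Result<String, String> {
        self.responses.borrow_mut().remove(0)
    }
}

fn main() {
    let gen = ScriptedGenerator {
        responses: RefCell::new(vec![Ok("canned".to_string())]),
    };
    // The double's future is immediately ready, so one poll with a
    // no-op waker resolves it without pulling in an async runtime.
    let mut fut = pin!(gen.generate("prompt"));
    let mut cx = Context::from_waker(Waker::noop());
    let out = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(r) => r,
        Poll::Pending => unreachable!("scripted future is always ready"),
    };
    assert_eq!(out, Ok("canned".to_string()));
    println!("ok");
}
```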

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when newer version replaces
  superseded_by     Option<String>, the playbook_id that replaced
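
A standalone sketch of those defaults (struct name and grouping are illustrative; in the real crate the fields sit directly on PlaybookEntry with #[serde(default)]):

```rust
// Illustrative stand-in for the four Phase 27 fields. The Default impl
// mirrors the documented #[serde(default)] behavior: a pre-Phase-27
// entry carrying none of these fields loads as a version-1 root.
#[derive(Debug, Clone, PartialEq)]
struct VersionMeta {
    version: u32,
    parent_id: Option<String>,
    superseded_at: Option<String>,
    superseded_by: Option<String>,
}

impl Default for VersionMeta {
    fn default() -> Self {
        Self {
            version: 1, // legacy entries load as roots
            parent_id: None,
            superseded_at: None,
            superseded_by: None,
        }
    }
}

fn main() {
    let legacy = VersionMeta::default();
    assert_eq!(legacy.version, 1);
    assert!(legacy.parent_id.is_none() && legacy.superseded_by.is_none());
    println!("ok");
}
```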

New methods:

  revise_entry(parent_id, new_entry) — appends new version, stamps
    superseded_at+superseded_by on parent, inherits parent_id and sets
    version = parent + 1 on the new entry. Rejects revising a retired
    or already-superseded parent (tip-of-chain is the only valid
    revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.
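
The history() walk can be sketched against a toy in-memory store (entry shape and names are assumptions for illustration; the real method operates on PlaybookEntry records):

```rust
use std::collections::{HashMap, HashSet};

// Illustrative entry: only the two chain-link fields matter here.
struct Entry {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

// Sketch of the documented walk: follow parent_id back to the root,
// then superseded_by forward to the tip. Visited-sets guard cycles.
fn history(entries: &HashMap<String, Entry>, start: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut cur = start.to_string();
    while let Some(e) = entries.get(&cur) {
        if !seen.insert(cur.clone()) {
            break; // cycle guard on the backward walk
        }
        match &e.parent_id {
            Some(p) if entries.contains_key(p) => cur = p.clone(),
            _ => break,
        }
    }
    let mut chain = Vec::new();
    let mut seen_fwd = HashSet::new();
    let mut node = Some(cur);
    while let Some(id) = node {
        if !seen_fwd.insert(id.clone()) {
            break; // cycle guard on the forward walk
        }
        node = entries.get(&id).and_then(|e| e.superseded_by.clone());
        chain.push(id);
    }
    chain
}

fn main() {
    let mut m = HashMap::new();
    m.insert("a".to_string(), Entry { parent_id: None, superseded_by: Some("b".to_string()) });
    m.insert("b".to_string(), Entry { parent_id: Some("a".to_string()), superseded_by: Some("c".to_string()) });
    m.insert("c".to_string(), Entry { parent_id: Some("b".to_string()), superseded_by: None });
    // Same root->tip chain regardless of which node we start from.
    assert_eq!(history(&m, "b"), vec!["a", "b", "c"]);
    assert_eq!(history(&m, "c"), vec!["a", "b", "c"]);
    println!("ok");
}
```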

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.
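
The Phase 19.6 half-life boost decay mentioned above follows the standard exponential form — a sketch under that assumption (the exact formula in playbook_memory.rs may differ in detail):

```rust
// Standard half-life decay: weight halves every BOOST_HALF_LIFE_DAYS.
// Illustrative only — assumed shape, not the code from playbook_memory.rs.
const BOOST_HALF_LIFE_DAYS: f64 = 30.0;

fn decay(age_days: f64) -> f64 {
    0.5_f64.powf(age_days / BOOST_HALF_LIFE_DAYS)
}

fn main() {
    assert!((decay(0.0) - 1.0).abs() < 1e-12); // fresh entry, full weight
    assert!((decay(30.0) - 0.5).abs() < 1e-12); // one half-life
    assert!((decay(60.0) - 0.25).abs() < 1e-12); // two half-lives
    println!("ok");
}
```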

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:40:49 -05:00

327 lines
13 KiB
Rust

//! Phase 21 — INPUT-overflow handler. Ports `generateTreeSplit` from
//! `tests/multi-agent/agent.ts`.
//!
//! When the input corpus exceeds the model's window (200 playbooks
//! pasted into a T4 strategic prompt, a long retrospective digest, a
//! cross-corpus summarization), raising `max_tokens` doesn't help —
//! the prompt itself is the problem. The answer is map-reduce:
//!
//! 1. Caller shards the input at semantic boundaries (records,
//!    paragraphs, playbook entries).
//! 2. For each shard, build a map prompt that includes the running
//!    scratchpad and run it through `generate_continuable`.
//! 3. Append the map output to the scratchpad (oldest-first
//!    truncation when it outgrows `scratchpad_budget`).
//! 4. Build a reduce prompt from the final scratchpad and run it.
//!
//! Every shard prompt and the reduce prompt go through
//! `assert_context_budget` first — if a single shard still overflows
//! we bubble the error up rather than silently truncating. That's the
//! whole point of Phase 21.
use crate::context::{
    assert_context_budget, estimate_tokens, overflow_message, BudgetOpts, DEFAULT_MAX_TOKENS,
    DEFAULT_SAFETY_MARGIN,
};
use crate::continuation::{generate_continuable, ContinuableOpts, ResponseShape, TextGenerator};
/// Callback signatures — caller supplies closures that stitch the
/// scratchpad into each shard's prompt and build the final reduce
/// prompt. Kept as `Fn` (not `FnMut`) so the map loop can call them
/// by reference.
pub type MapPromptFn<'a> = dyn Fn(&str, &str) -> String + Send + Sync + 'a;
pub type ReducePromptFn<'a> = dyn Fn(&str) -> String + Send + Sync + 'a;
/// Knobs for `generate_tree_split`.
#[derive(Debug, Clone)]
pub struct TreeSplitOpts {
    pub model: String,
    pub system: Option<String>,
    pub temperature: Option<f64>,
    /// max_tokens for the map passes. The reduce pass uses
    /// `reduce_max_tokens` (default 1500) instead.
    pub max_tokens: Option<u32>,
    pub reduce_max_tokens: Option<u32>,
    pub think: Option<bool>,
    /// Soft ceiling on scratchpad size (estimated tokens). When it
    /// grows past this, the oldest shard digest gets dropped. Default
    /// 6000, matching the TS implementation.
    pub scratchpad_budget: usize,
    pub safety_margin: Option<usize>,
}
impl TreeSplitOpts {
    pub fn new(model: impl Into<String>) -> Self {
        Self {
            model: model.into(),
            system: None,
            temperature: None,
            max_tokens: None,
            reduce_max_tokens: None,
            think: None,
            scratchpad_budget: 6_000,
            safety_margin: None,
        }
    }
}
/// Result — final reduce response plus the accumulated scratchpad so
/// the caller can inspect what was kept vs truncated.
#[derive(Debug, Clone)]
pub struct TreeSplitResult {
    pub response: String,
    pub scratchpad: String,
    pub shards_processed: usize,
    pub scratchpad_truncations: usize,
    pub total_continuations: usize,
}
/// Drop shard-digest blocks from the head of `scratchpad` until its
/// estimated-token count fits the budget. Digest blocks are delimited
/// by `\n— shard N/M digest —\n` so we can find the first one and
/// chop everything before its successor.
fn truncate_scratchpad(scratchpad: &mut String, budget_tokens: usize) -> bool {
    if estimate_tokens(scratchpad) <= budget_tokens {
        return false;
    }
    // Find the second delimiter — everything before it gets dropped.
    const DELIM_PREFIX: &str = "\n— shard ";
    let mut cursor = 0;
    let mut truncated = false;
    while estimate_tokens(&scratchpad[cursor..]) > budget_tokens {
        // Skip past a leading delimiter (if we're sitting on one from
        // a previous iteration), then find the next.
        let search_from = cursor
            + if scratchpad[cursor..].starts_with(DELIM_PREFIX) {
                DELIM_PREFIX.len()
            } else {
                0
            };
        let Some(rel_next) = scratchpad[search_from..].find(DELIM_PREFIX) else {
            break;
        };
        cursor = search_from + rel_next;
        truncated = true;
    }
    if cursor > 0 {
        scratchpad.drain(..cursor);
    }
    truncated
}
/// Phase 21 — map-reduce over shards with a running scratchpad. See
/// module docs.
pub async fn generate_tree_split<G: TextGenerator>(
    generator: &G,
    shards: &[String],
    map_prompt: &MapPromptFn<'_>,
    reduce_prompt: &ReducePromptFn<'_>,
    opts: &TreeSplitOpts,
) -> Result<TreeSplitResult, String> {
    let mut scratchpad = String::new();
    let safety = opts.safety_margin.unwrap_or(DEFAULT_SAFETY_MARGIN);
    let map_max = opts.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS as u32);
    let reduce_max = opts.reduce_max_tokens.unwrap_or(1_500);
    let mut truncations = 0usize;
    let mut total_continuations = 0usize;
    for (i, shard) in shards.iter().enumerate() {
        let shard_prompt = map_prompt(shard, &scratchpad);
        // Loud-fail on per-shard overflow — caller sharded too
        // coarsely. Silent truncation is exactly the mode J rejected.
        let budget = BudgetOpts {
            system: opts.system.as_deref(),
            max_tokens: Some(map_max as usize),
            safety_margin: Some(safety),
            bypass: false,
        };
        assert_context_budget(&opts.model, &shard_prompt, budget)
            .map_err(|(c, over)| overflow_message(&opts.model, &c, over, safety))?;
        let mut cont_opts = ContinuableOpts::new(&opts.model);
        cont_opts.max_tokens = Some(map_max);
        cont_opts.temperature = opts.temperature;
        cont_opts.system = opts.system.clone();
        cont_opts.shape = ResponseShape::Text;
        cont_opts.think = opts.think;
        let outcome = generate_continuable(generator, &shard_prompt, &cont_opts).await?;
        total_continuations += outcome.continuations;
        // Append this shard's digest and, if needed, drop oldest.
        scratchpad.push_str(&format!(
            "\n— shard {}/{} digest —\n{}",
            i + 1,
            shards.len(),
            outcome.text.trim(),
        ));
        if truncate_scratchpad(&mut scratchpad, opts.scratchpad_budget) {
            truncations += 1;
        }
    }
    // Reduce pass. Budget check first — if the scratchpad is still too
    // big for the reduce prompt we fail loud with numbers.
    let reduce_p = reduce_prompt(&scratchpad);
    let budget = BudgetOpts {
        system: opts.system.as_deref(),
        max_tokens: Some(reduce_max as usize),
        safety_margin: Some(safety),
        bypass: false,
    };
    assert_context_budget(&opts.model, &reduce_p, budget)
        .map_err(|(c, over)| overflow_message(&opts.model, &c, over, safety))?;
    let mut cont_opts = ContinuableOpts::new(&opts.model);
    cont_opts.max_tokens = Some(reduce_max);
    cont_opts.temperature = opts.temperature;
    cont_opts.system = opts.system.clone();
    cont_opts.shape = ResponseShape::Text;
    cont_opts.think = opts.think;
    let outcome = generate_continuable(generator, &reduce_p, &cont_opts).await?;
    total_continuations += outcome.continuations;
    Ok(TreeSplitResult {
        response: outcome.text,
        scratchpad,
        shards_processed: shards.len(),
        scratchpad_truncations: truncations,
        total_continuations,
    })
}
#[cfg(test)]
mod tests {
    use super::*;
    use crate::continuation::ScriptedGenerator;

    fn simple_map(shard: &str, scratchpad: &str) -> String {
        format!("SCRATCHPAD:\n{scratchpad}\n---\nSHARD:\n{shard}\n---\nDIGEST:")
    }

    fn simple_reduce(scratchpad: &str) -> String {
        format!("SCRATCHPAD:\n{scratchpad}\n---\nFINAL:")
    }
    #[tokio::test]
    async fn tree_split_runs_map_then_reduce() {
        // 3 shards → 3 map calls → 1 reduce call = 4 responses scripted.
        let generator = ScriptedGenerator::new(vec![
            Ok("digest-1".to_string()),
            Ok("digest-2".to_string()),
            Ok("digest-3".to_string()),
            Ok("FINAL ANSWER".to_string()),
        ]);
        let shards: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
        let opts = TreeSplitOpts::new("qwen3:latest");
        let map_fn: &MapPromptFn = &simple_map;
        let reduce_fn: &ReducePromptFn = &simple_reduce;
        let result = generate_tree_split(&generator, &shards, map_fn, reduce_fn, &opts)
            .await
            .unwrap();
        assert_eq!(result.shards_processed, 3);
        assert_eq!(result.response, "FINAL ANSWER");
        assert_eq!(generator.call_count(), 4);
        // Scratchpad must carry all three digests in order.
        assert!(result.scratchpad.contains("digest-1"));
        assert!(result.scratchpad.contains("digest-2"));
        assert!(result.scratchpad.contains("digest-3"));
    }
    #[tokio::test]
    async fn tree_split_reduce_prompt_sees_full_scratchpad() {
        let generator = ScriptedGenerator::new(vec![
            Ok("summary-A".to_string()),
            Ok("summary-B".to_string()),
            Ok("REDUCED".to_string()),
        ]);
        let shards = vec!["input-one".to_string(), "input-two".to_string()];
        let opts = TreeSplitOpts::new("qwen3:latest");
        let _ = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
            .await
            .unwrap();
        // Third call = reduce. Its prompt must include both digests.
        let calls = generator.calls();
        let reduce_prompt = &calls[2].prompt;
        assert!(
            reduce_prompt.contains("summary-A"),
            "reduce prompt must see first shard digest"
        );
        assert!(
            reduce_prompt.contains("summary-B"),
            "reduce prompt must see second shard digest"
        );
    }
    #[tokio::test]
    async fn tree_split_loud_fails_on_shard_overflow() {
        let generator = ScriptedGenerator::new(vec![Ok("digest".to_string())]);
        // One gigantic shard — well over qwen3's 40K window even as a
        // prompt. The budget check must reject before any generate call.
        let shards = vec!["x".repeat(200_000)];
        let opts = TreeSplitOpts::new("qwen3:latest");
        let err = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
            .await
            .expect_err("shard-sized overflow must be rejected");
        assert!(err.contains("overflow"), "error should mention overflow: {err}");
        assert_eq!(generator.call_count(), 0, "generate must not be called on overflow");
    }
    #[tokio::test]
    async fn tree_split_truncates_scratchpad_when_over_budget() {
        // Tight budget so each shard trips truncation. qwen3's 40K
        // window is fine; the budget we care about is the scratchpad
        // cap, not the model window.
        let generator = ScriptedGenerator::new(vec![
            Ok("A".repeat(2_000)),
            Ok("B".repeat(2_000)),
            Ok("C".repeat(2_000)),
            Ok("D".repeat(2_000)),
            Ok("FINAL".to_string()),
        ]);
        let shards: Vec<String> = (0..4).map(|i| format!("shard{i}")).collect();
        let mut opts = TreeSplitOpts::new("qwen3:latest");
        opts.scratchpad_budget = 1_000; // ~4000 chars — one digest barely fits
        let result = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
            .await
            .unwrap();
        assert!(
            result.scratchpad_truncations > 0,
            "tight budget must trigger truncation"
        );
        // Scratchpad should still fit roughly within the budget
        // (post-truncation); the estimator uses chars/4 so the bound
        // is ~budget*4 chars. Give some slack for the delimiter.
        let scratchpad_tokens = estimate_tokens(&result.scratchpad);
        assert!(
            scratchpad_tokens <= opts.scratchpad_budget * 2,
            "scratchpad {} tokens vs budget {}",
            scratchpad_tokens,
            opts.scratchpad_budget
        );
    }
    #[tokio::test]
    async fn tree_split_reports_continuations_from_map_and_reduce() {
        // Both scripted responses are non-empty text, so neither the map
        // call nor the reduce call needs a continuation: 1 map + 1 reduce
        // = 2 calls, 0 continuations.
        let generator = ScriptedGenerator::new(vec![
            Ok("partial".to_string()), // map: shape=text, non-empty → complete on first pass
            Ok("reduce-out".to_string()),
        ]);
        let shards = vec!["only".to_string()];
        let opts = TreeSplitOpts::new("qwen3:latest");
        let result = generate_tree_split(&generator, &shards, &simple_map, &simple_reduce, &opts)
            .await
            .unwrap();
        // Text shape treats non-empty as complete → 0 continuations.
        assert_eq!(result.total_continuations, 0);
        assert_eq!(result.shards_processed, 1);
    }
    #[test]
    fn truncate_scratchpad_noop_when_under_budget() {
        let mut s = "\n— shard 1/1 digest —\nshort".to_string();
        let truncated = truncate_scratchpad(&mut s, 1_000);
        assert!(!truncated);
        assert!(s.contains("short"));
    }

    #[test]
    fn truncate_scratchpad_drops_oldest_first() {
        let mut s = format!(
            "\n— shard 1/3 digest —\n{}\n— shard 2/3 digest —\n{}\n— shard 3/3 digest —\nshort",
            "x".repeat(4_000), // ~1000 tokens
            "y".repeat(4_000), // ~1000 tokens
        );
        let truncated = truncate_scratchpad(&mut s, 500); // ~2000 chars
        assert!(truncated);
        assert!(
            !s.contains(&"x".repeat(4_000)),
            "oldest digest should be dropped"
        );
        assert!(s.contains("short"), "newest digest should survive");
    }
}