Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:
  context.rs: estimate_tokens (chars/4, rounded up), context_window_for,
    and assert_context_budget, which returns a BudgetCheck with numeric
    diagnostics on both success and overflow. The windows table mirrors
    config/models.json.
  continuation.rs: generate_continuable<G: TextGenerator>. Handles the
    two failure modes: an empty response from thinking models (geometric
    2x budget backoff up to budget_cap) and a truncated non-empty
    response (continuation with the partial output as scratchpad).
    is_structurally_complete balances braces, then checks that the text
    parses as JSON. Guards the degenerate case where every retry comes
    back empty, so it never loops on an empty partial.
  tree_split.rs: generate_tree_split, a map->reduce with a running
    scratchpad. Per-shard and reduce prompts go through
    assert_context_budget first and fail loudly rather than silently
    truncating. The scratchpad is truncated oldest-digest-first at
    scratchpad_budget (default 6000 tokens).
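
The oldest-digest-first truncation described for tree_split.rs can be sketched roughly as follows. This is a minimal illustration, not the shipped code: truncate_oldest_first and its VecDeque-based scratchpad are hypothetical names, and keeping the newest digest even when it alone exceeds the budget is an assumption. Only the chars/4 estimator comes from the commit itself.

```rust
use std::collections::VecDeque;

/// Same chars/4 ceiling heuristic the commit describes for estimate_tokens.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical helper: drop digests from the front (oldest first) until
/// the summed per-digest estimate fits within `budget_tokens`.
/// Assumption: the newest digest is always kept, even if it is over budget.
/// Returns how many digests were dropped.
fn truncate_oldest_first(digests: &mut VecDeque<String>, budget_tokens: usize) -> usize {
    let mut total: usize = digests.iter().map(|d| estimate_tokens(d)).sum();
    let mut dropped = 0;
    while total > budget_tokens && digests.len() > 1 {
        if let Some(oldest) = digests.pop_front() {
            total -= estimate_tokens(&oldest);
            dropped += 1;
        }
    }
    dropped
}
```

With three 40-character digests (10 estimated tokens each) and a budget of 20, one digest is dropped and the two newest remain.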
TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
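
To make the test-double idea concrete, here is a deliberately simplified sketch. It is synchronous, whereas the real TextGenerator is async, and the generate signature, the budgets_seen field, and generate_with_backoff are all hypothetical names chosen for illustration. It shows a scripted generator replaying canned responses while a caller applies a 2x budget backoff on empty output and gives up once every attempt is empty.

```rust
use std::collections::VecDeque;

/// Simplified, synchronous stand-in for the commit's async TextGenerator.
trait TextGenerator {
    fn generate(&mut self, prompt: &str, max_tokens: usize) -> Result<String, String>;
}

/// Test double that replays canned responses in order and records the
/// token budget it was called with each time.
struct ScriptedGenerator {
    script: VecDeque<String>,
    budgets_seen: Vec<usize>,
}

impl ScriptedGenerator {
    fn new(responses: &[&str]) -> Self {
        Self {
            script: responses.iter().map(|r| r.to_string()).collect(),
            budgets_seen: Vec::new(),
        }
    }
}

impl TextGenerator for ScriptedGenerator {
    fn generate(&mut self, _prompt: &str, max_tokens: usize) -> Result<String, String> {
        self.budgets_seen.push(max_tokens);
        self.script
            .pop_front()
            .ok_or_else(|| "script exhausted".to_string())
    }
}

/// Hypothetical caller: on an empty response, double the budget (capped at
/// `budget_cap`) and retry; stop with an error once all attempts are empty.
fn generate_with_backoff<G: TextGenerator>(
    generator: &mut G,
    prompt: &str,
    initial_budget: usize,
    budget_cap: usize,
    max_attempts: usize,
) -> Result<String, String> {
    let mut budget = initial_budget.min(budget_cap);
    for _ in 0..max_attempts {
        let out = generator.generate(prompt, budget)?;
        if !out.trim().is_empty() {
            return Ok(out);
        }
        budget = budget.saturating_mul(2).min(budget_cap);
    }
    Err(format!("all {max_attempts} attempts returned empty"))
}
```

A script of two empty strings followed by a real answer exercises the backoff path without any network call, which is the point of injecting the generator through a trait.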
GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).
Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):
  version        u32, default 1
  parent_id      Option<String>, the previous version's playbook_id
  superseded_at  Option<String>, set when a newer version replaces this entry
  superseded_by  Option<String>, the playbook_id of the replacing entry
New methods:
  revise_entry(parent_id, new_entry): appends a new version, stamps
    superseded_at and superseded_by on the parent, and on the new entry
    sets parent_id to the parent's playbook_id and version to the
    parent's version + 1. Rejects a retired or already-superseded
    parent (the tip of the chain is the only valid revise target).
  history(playbook_id): returns the full chain root->tip from any node.
    Walks parent_id back to the root, then superseded_by forward to the
    tip. Cycle-safe.
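
The chain mechanics can be illustrated with a small in-memory sketch. The Store and Entry types, insert_root, and the reduced field set are assumptions made for the example; the real PlaybookEntry carries more fields (including superseded_at) and lives in vectord. The sketch shows only the two rules stated above: revising is allowed at the tip only, and history walks back to the root and then forward, guarding against cycles.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down entry with only the chain-related fields.
#[derive(Debug, Clone)]
struct Entry {
    playbook_id: String,
    version: u32,
    parent_id: Option<String>,
    superseded_by: Option<String>,
    retired: bool,
}

#[derive(Default)]
struct Store {
    entries: HashMap<String, Entry>,
}

impl Store {
    fn insert_root(&mut self, id: &str) {
        let entry = Entry {
            playbook_id: id.to_string(),
            version: 1,
            parent_id: None,
            superseded_by: None,
            retired: false,
        };
        self.entries.insert(id.to_string(), entry);
    }

    /// Append a new version; only the tip of a chain may be revised.
    fn revise(&mut self, parent_id: &str, new_id: &str) -> Result<u32, String> {
        if self.entries.contains_key(new_id) {
            return Err("new id already exists".to_string());
        }
        let parent = self.entries.get_mut(parent_id).ok_or("unknown parent")?;
        if parent.retired || parent.superseded_by.is_some() {
            return Err("parent is not the tip of its chain".to_string());
        }
        parent.superseded_by = Some(new_id.to_string());
        let version = parent.version + 1;
        let entry = Entry {
            playbook_id: new_id.to_string(),
            version,
            parent_id: Some(parent_id.to_string()),
            superseded_by: None,
            retired: false,
        };
        self.entries.insert(new_id.to_string(), entry);
        Ok(version)
    }

    /// Full chain root->tip from any node; visited sets stop on cycles.
    fn history(&self, id: &str) -> Vec<String> {
        let mut cur = match self.entries.get(id) {
            Some(entry) => entry,
            None => return Vec::new(),
        };
        // Walk parent_id back to the root.
        let mut seen: HashSet<String> = HashSet::new();
        seen.insert(cur.playbook_id.clone());
        while let Some(p) = cur.parent_id.as_ref().and_then(|p| self.entries.get(p)) {
            if !seen.insert(p.playbook_id.clone()) {
                break;
            }
            cur = p;
        }
        // Walk superseded_by forward to the tip, collecting ids.
        let mut chain = vec![cur.playbook_id.clone()];
        let mut visited: HashSet<String> = chain.iter().cloned().collect();
        while let Some(n) = cur.superseded_by.as_ref().and_then(|n| self.entries.get(n)) {
            if !visited.insert(n.playbook_id.clone()) {
                break;
            }
            chain.push(n.playbook_id.clone());
            cur = n;
        }
        chain
    }
}
```

Starting from a root "a", revising to "b" and then "c" yields the same three-element history whether it is requested from the root, the middle, or the tip, and a second attempt to revise "a" is rejected.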
Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.
Endpoints:
POST /vectors/playbook_memory/revise
GET /vectors/playbook_memory/history/{id}
Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:
- Phase 24 marked shipped (commit b95dd86) with detail of observer
HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
rewritten to reflect shipped state.
- Phase 25 (validity windows, commit e0a843d) added to PHASES +
PRD.
- Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
- Phase 27 entry added to both docs.
- Phase 19.6 time decay corrected: was documented as "deferred",
actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
- Phase E/Phase 8 tombstone-at-compaction limit note updated —
Phase E.2 closed it.
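
For readers unfamiliar with half-life decay, the sketch below shows what a 30-day half-life usually means numerically. Only the constant name and value come from the note above; decay_weight is a hypothetical function, and the exponential form is the conventional reading of "half-life", not a confirmed copy of what playbook_memory.rs does.

```rust
/// Value taken from the commit message; the formula below is an assumption.
const BOOST_HALF_LIFE_DAYS: f64 = 30.0;

/// Hypothetical decay weight: 1.0 for a fresh entry, halving every
/// BOOST_HALF_LIFE_DAYS. Negative ages are clamped to zero.
fn decay_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / BOOST_HALF_LIFE_DAYS)
}
```

Under this assumed form, an entry that is 30 days old contributes half its original boost and one that is 60 days old contributes a quarter.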
Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
195 lines
7.3 KiB
Rust
//! Phase 21 — context-budget accounting for model calls.
//!
//! Ports `assertContextBudget` + `estimateTokens` + `CONTEXT_WINDOWS`
//! from `tests/multi-agent/agent.ts` so Rust-side callers (gateway
//! tool surfaces, future Rust agents) get the same loud-fail behavior
//! on window overflow instead of silent truncation.
//!
//! The token estimator is deliberately the same chars/4 heuristic as
//! the TS side. It's biased ~15% safe — pessimistic on English, correct
//! within a factor of 2 on code. Swap to a provider tokenizer only when
//! the estimator drives a decision (we're nowhere near that yet).

use std::collections::HashMap;
use std::sync::OnceLock;

/// Rough token count. `chars / 4` ceiling. See module docs for why
/// this heuristic is sufficient.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Phase 21 — per-model context windows, mirroring the TS table in
/// `tests/multi-agent/agent.ts`. Anchored on each model's documented
/// max; unknown models fall back to `DEFAULT_CONTEXT_WINDOW`.
pub const DEFAULT_CONTEXT_WINDOW: usize = 32_768;
pub const DEFAULT_SAFETY_MARGIN: usize = 2_000;
pub const DEFAULT_MAX_TOKENS: usize = 800;

fn known_windows() -> &'static HashMap<&'static str, usize> {
    static TABLE: OnceLock<HashMap<&'static str, usize>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("mistral:latest", 32_768);
        m.insert("qwen2.5:latest", 32_768);
        m.insert("qwen3:latest", 40_960);
        m.insert("qwen3.5:latest", 262_144);
        m.insert("qwen3-embedding", 32_768);
        m.insert("nomic-embed-text-v2-moe", 2_048);
        m.insert("gpt-oss:20b", 131_072);
        m.insert("gpt-oss:120b", 131_072);
        m.insert("qwen3.5:397b", 131_072);
        m.insert("kimi-k2-thinking", 200_000);
        m.insert("kimi-k2.6", 200_000);
        m.insert("kimi-k2:1t", 1_048_576);
        m.insert("deepseek-v3.1:671b", 131_072);
        m.insert("glm-4.7", 131_072);
        m
    })
}

pub fn context_window_for(model: &str) -> usize {
    known_windows().get(model).copied().unwrap_or(DEFAULT_CONTEXT_WINDOW)
}

/// Result of a budget check — exposes the numbers so callers can log
/// how much headroom remains without re-running the estimator.
#[derive(Debug, Clone, Copy)]
pub struct BudgetCheck {
    pub estimated: usize,
    pub window: usize,
    pub remaining: i64,
}

/// Inputs to `assert_context_budget`. `bypass` exists for call sites
/// that handle their own overflow (continuation's second pass already
/// counted the partial; T5 gatekeeper prompts have a separate policy).
#[derive(Debug, Clone, Default)]
pub struct BudgetOpts<'a> {
    pub system: Option<&'a str>,
    pub max_tokens: Option<usize>,
    pub safety_margin: Option<usize>,
    pub bypass: bool,
}

/// Phase 21's loud-fail primitive. Returns a `BudgetCheck` on success
/// and the same struct plus over-by count on failure. The whole point
/// is to stop silent truncation — callers that expect overflow should
/// chunk BEFORE calling or set `bypass: true`.
pub fn assert_context_budget(
    model: &str,
    prompt: &str,
    opts: BudgetOpts,
) -> Result<BudgetCheck, (BudgetCheck, usize)> {
    let window = context_window_for(model);
    let safety = opts.safety_margin.unwrap_or(DEFAULT_SAFETY_MARGIN);
    let max_tokens = opts.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS);
    let sys_tokens = opts.system.map(estimate_tokens).unwrap_or(0);
    let estimated = estimate_tokens(prompt) + sys_tokens + max_tokens;
    let remaining = window as i64 - estimated as i64 - safety as i64;
    let check = BudgetCheck { estimated, window, remaining };
    if remaining < 0 && !opts.bypass {
        return Err((check, (-remaining) as usize));
    }
    Ok(check)
}

/// Convenience — format an overflow error the same way the TS side
/// does. Exposed so downstream crates render consistent messages.
pub fn overflow_message(model: &str, check: &BudgetCheck, over_by: usize, safety: usize) -> String {
    format!(
        "context overflow: model={} est={}t window={}t safety={}t over={}t. \
         Chunk the prompt (see config/models.json overflow_policies) or set \
         bypass:true if you know the risk.",
        model, check.estimated, check.window, safety, over_by,
    )
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn estimate_tokens_ceiling_divides_by_four() {
        assert_eq!(estimate_tokens(""), 0);
        assert_eq!(estimate_tokens("abc"), 1); // 3 → ceil(3/4) = 1
        assert_eq!(estimate_tokens("abcd"), 1); // 4 → ceil(4/4) = 1
        assert_eq!(estimate_tokens("abcde"), 2); // 5 → ceil(5/4) = 2
        assert_eq!(estimate_tokens(&"x".repeat(400)), 100);
    }

    #[test]
    fn context_window_known_and_fallback() {
        assert_eq!(context_window_for("qwen3.5:latest"), 262_144);
        assert_eq!(context_window_for("kimi-k2:1t"), 1_048_576);
        assert_eq!(context_window_for("some-unreleased-model"), DEFAULT_CONTEXT_WINDOW);
    }

    #[test]
    fn budget_passes_well_under_window() {
        let check = assert_context_budget(
            "qwen3:latest",
            &"x".repeat(4_000), // ~1000 tokens
            BudgetOpts { max_tokens: Some(500), ..Default::default() },
        ).expect("well under 40K window");
        assert!(check.remaining > 30_000);
    }

    #[test]
    fn budget_fails_when_prompt_overflows_window() {
        let huge = "x".repeat(200_000); // ~50K tokens, over qwen3's 40K
        let err = assert_context_budget(
            "qwen3:latest",
            &huge,
            BudgetOpts::default(),
        ).expect_err("should overflow qwen3's 40K window");
        assert!(err.1 > 0, "over_by must be positive");
    }

    #[test]
    fn budget_bypass_returns_ok_even_over() {
        let huge = "x".repeat(200_000);
        let check = assert_context_budget(
            "qwen3:latest",
            &huge,
            BudgetOpts { bypass: true, ..Default::default() },
        ).expect("bypass must suppress the error");
        assert!(check.remaining < 0, "check still reports negative remaining");
    }

    #[test]
    fn budget_counts_system_prompt() {
        // 10K-char system prompt → ~2500 tokens. It must be added to the
        // estimate on top of the prompt and max_tokens.
        let sys = "s".repeat(10_000);
        let prompt = "p".repeat(4_000);
        let with_sys = assert_context_budget(
            "qwen3:latest",
            &prompt,
            BudgetOpts {
                system: Some(&sys),
                max_tokens: Some(500),
                ..Default::default()
            },
        ).unwrap();
        let without_sys = assert_context_budget(
            "qwen3:latest",
            &prompt,
            BudgetOpts { max_tokens: Some(500), ..Default::default() },
        ).unwrap();
        assert!(with_sys.estimated > without_sys.estimated,
            "system prompt should raise estimate");
        assert_eq!(with_sys.estimated - without_sys.estimated, estimate_tokens(&sys));
    }

    #[test]
    fn overflow_message_includes_numbers() {
        // remaining = window - estimated - safety = 40_960 - 42_000 - 2_000,
        // so the fixture stays consistent with over_by = 3_040.
        let check = BudgetCheck { estimated: 42_000, window: 40_960, remaining: -3_040 };
        let msg = overflow_message("qwen3:latest", &check, 3_040, 2_000);
        assert!(msg.contains("qwen3:latest"));
        assert!(msg.contains("42000t"));
        assert!(msg.contains("40960t"));
        assert!(msg.contains("3040t"));
    }
}