profit a6f12e2609 Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: empty-response from thinking
                     models (geometric 2x budget backoff up to budget_cap)
                     and truncated-non-empty (continuation with partial
                     as scratchpad). is_structurally_complete balances
                     braces then JSON.parse-checks. Guards the degen case
                     "all retries empty, don't loop on empty partial".
  tree_split.rs    — generate_tree_split map->reduce with running
                     scratchpad. Per-shard + reduce-prompt go through
                     assert_context_budget first; loud-fails rather than
                     silently truncating. Oldest-digest-first scratchpad
                     truncation at scratchpad_budget (default 6000 t).
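The commit describes is_structurally_complete only in prose. Below is a minimal sketch of its brace-balancing half. It assumes that braces inside JSON string literals must be ignored. braces_balanced is a hypothetical name, and the JSON-parse step is left out because it would need a JSON crate such as serde_json.

```rust
/// Hypothetical brace-balance pre-check (not the crate's is_structurally_complete,
/// which additionally JSON-parses the text). Returns true when every `{` / `[`
/// opened outside a string literal is closed in the right order.
pub fn braces_balanced(text: &str) -> bool {
    let mut stack: Vec<char> = Vec::new();
    let mut in_string = false;
    let mut escaped = false;
    for c in text.chars() {
        if in_string {
            // Inside a string literal only an unescaped quote matters.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => stack.push(c),
            '}' => {
                if stack.pop() != Some('{') {
                    return false;
                }
            }
            ']' => {
                if stack.pop() != Some('[') {
                    return false;
                }
            }
            _ => {}
        }
    }
    // Truncated output usually ends mid-string or with brackets still open.
    !in_string && stack.is_empty()
}
```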

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
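Here is a sketch of how the trait, a scripted test double, and continuation.rs's empty-response backoff could fit together. Everything in it is an assumption: the generate signature, the ScriptedGenerator fields, and generate_with_backoff are illustrative rather than the crate's API. The truncated-non-empty continuation path is also omitted. A tiny block_on stands in for an async runtime so the block runs without tokio. Native async fn in traits needs Rust 1.75 or later.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical trait shape; the real TextGenerator signature may differ.
trait TextGenerator {
    async fn generate(&mut self, prompt: &str, max_tokens: usize) -> String;
}

// Test double: replays canned replies and records every budget it receives.
struct ScriptedGenerator {
    replies: VecDeque<String>,
    budgets_seen: Vec<usize>,
}

impl TextGenerator for ScriptedGenerator {
    async fn generate(&mut self, _prompt: &str, max_tokens: usize) -> String {
        self.budgets_seen.push(max_tokens);
        // An exhausted script behaves like a thinking model that returns nothing.
        self.replies.pop_front().unwrap_or_default()
    }
}

// Empty-response path only: double the budget after each empty reply, up to the cap.
async fn generate_with_backoff<G: TextGenerator>(
    generator: &mut G,
    prompt: &str,
    mut budget: usize,
    budget_cap: usize,
) -> Option<String> {
    loop {
        let out = generator.generate(prompt, budget).await;
        if !out.trim().is_empty() {
            return Some(out);
        }
        if budget >= budget_cap {
            // Every retry came back empty: give up instead of looping.
            return None;
        }
        budget = budget.saturating_mul(2).clamp(1, budget_cap);
    }
}

// No-op waker so the futures above can be polled without a runtime.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}

fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function ignores its data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Recording the budgets makes the 2x backoff directly assertable. With two empty replies followed by a real one, a starting budget of 400, and a cap of 1600, the scripted generator sees 400, 800 and then 1600.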

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when newer version replaces
  superseded_by     Option<String>, the playbook_id that replaced

New methods:

  revise_entry(parent_id, new_entry) — appends new version, stamps
    superseded_at + superseded_by on the parent, and sets the new
    entry's parent_id to the parent's playbook_id and its version to
    parent.version + 1. Rejects revising a retired or already-superseded
    parent (the tip of the chain is the only valid revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.
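Below is a minimal in-memory sketch of the revise/history semantics described above. Entry is a hypothetical, trimmed-down stand-in for PlaybookEntry. In it, retired is modelled as a plain bool and timestamps are passed in as strings. The real methods operate on the persisted playbook state.

```rust
use std::collections::HashSet;

// Hypothetical, trimmed-down stand-in for PlaybookEntry (Phase 27 fields only).
#[derive(Debug, Clone)]
struct Entry {
    playbook_id: String,
    version: u32,
    parent_id: Option<String>,
    superseded_at: Option<String>,
    superseded_by: Option<String>,
    retired: bool,
}

impl Entry {
    // Mirrors the legacy defaults: version 1 and no chain links.
    fn root(id: &str) -> Self {
        Entry {
            playbook_id: id.to_string(),
            version: 1,
            parent_id: None,
            superseded_at: None,
            superseded_by: None,
            retired: false,
        }
    }
}

fn find<'a>(store: &'a [Entry], id: &str) -> Option<&'a Entry> {
    store.iter().find(|e| e.playbook_id == id)
}

// Append `new_entry` as the next version of `parent_id`; only the tip may be revised.
fn revise_entry(store: &mut Vec<Entry>, parent_id: &str, mut new_entry: Entry, now: &str) -> Result<(), String> {
    let idx = store
        .iter()
        .position(|e| e.playbook_id == parent_id)
        .ok_or_else(|| format!("unknown playbook_id {parent_id}"))?;
    if store[idx].retired || store[idx].superseded_by.is_some() {
        return Err(format!("{parent_id} is retired or already superseded"));
    }
    new_entry.parent_id = Some(parent_id.to_string());
    new_entry.version = store[idx].version + 1;
    store[idx].superseded_at = Some(now.to_string());
    store[idx].superseded_by = Some(new_entry.playbook_id.clone());
    store.push(new_entry);
    Ok(())
}

// Full chain root -> tip from any node; the seen-sets make both walks cycle-safe.
fn history(store: &[Entry], playbook_id: &str) -> Vec<Entry> {
    let Some(mut root) = find(store, playbook_id) else {
        return Vec::new();
    };
    let mut seen = HashSet::from([root.playbook_id.as_str()]);
    // Walk parent_id links back to the root.
    while let Some(p) = root.parent_id.as_deref().and_then(|id| find(store, id)) {
        if !seen.insert(p.playbook_id.as_str()) {
            break;
        }
        root = p;
    }
    // Walk superseded_by links forward to the tip.
    let mut chain = vec![root.clone()];
    let mut seen = HashSet::from([root.playbook_id.as_str()]);
    let mut cur = root;
    while let Some(next) = cur.superseded_by.as_deref().and_then(|id| find(store, id)) {
        if !seen.insert(next.playbook_id.as_str()) {
            break;
        }
        chain.push(next.clone());
        cur = next;
    }
    chain
}
```

Walking back to the root before walking forward is what lets history return the same chain from any node. The seen-sets stop a corrupted parent_id/superseded_by loop from spinning forever.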

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.
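The exclusion rule and the /status arithmetic can be sketched as below. EntryState is hypothetical. The sketch assumes retired and superseded are disjoint, so an entry that is both counts as retired only and active = total - retired - superseded never double-subtracts. The real status_counts also returns a failures count, which is omitted here.

```rust
// Hypothetical minimal view of an entry's lifecycle state.
#[derive(Debug, Default)]
struct EntryState {
    retired: bool,
    superseded_by: Option<String>,
}

// One rule shared by boost, index rebuild and upsert lookup: skip retired and superseded.
fn boost_eligible(e: &EntryState) -> bool {
    !e.retired && e.superseded_by.is_none()
}

// Returns (total, retired, superseded, active), with active = total - retired - superseded.
fn status_counts(entries: &[EntryState]) -> (usize, usize, usize, usize) {
    let total = entries.len();
    let retired = entries.iter().filter(|e| e.retired).count();
    // Assumption: an entry that is both retired and superseded counts as retired only.
    let superseded = entries
        .iter()
        .filter(|e| !e.retired && e.superseded_by.is_some())
        .count();
    (total, retired, superseded, total - retired - superseded)
}
```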

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.
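For readers of the corrected Phase 19.6 note: a half-life constant usually implies exponential decay, in which a boost loses half its weight every BOOST_HALF_LIFE_DAYS. Below is a sketch under that assumption. The actual formula in playbook_memory.rs is not shown in this commit, and decay_factor is a hypothetical name.

```rust
const BOOST_HALF_LIFE_DAYS: f64 = 30.0;

// Assumed exponential decay: the weight halves every BOOST_HALF_LIFE_DAYS.
// Negative ages (e.g. clock skew) are clamped to 0 so the factor never exceeds 1.
fn decay_factor(age_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / BOOST_HALF_LIFE_DAYS)
}
```

Under this model, a 30-day-old boost keeps half its weight and a 60-day-old boost keeps a quarter.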

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:40:49 -05:00


//! Phase 21: context-budget accounting for model calls.
//!
//! Ports `assertContextBudget` + `estimateTokens` + `CONTEXT_WINDOWS`
//! from `tests/multi-agent/agent.ts` so Rust-side callers (gateway
//! tool surfaces, future Rust agents) get the same loud-fail behavior
//! on window overflow instead of silent truncation.
//!
//! The token estimator is deliberately the same chars/4 heuristic as
//! the TS side. It errs roughly 15% on the safe side: pessimistic
//! (over-counting) on English, and correct within a factor of 2 on code.
//! Swap to a provider tokenizer only when the estimator drives a
//! decision (we're nowhere near that yet).

use std::collections::HashMap;
use std::sync::OnceLock;

/// Rough token count: `chars / 4`, rounded up. See the module docs for
/// why this heuristic is sufficient.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Fallback window for models missing from `known_windows`.
pub const DEFAULT_CONTEXT_WINDOW: usize = 32_768;
/// Tokens held back as headroom on top of the prompt and completion estimate.
pub const DEFAULT_SAFETY_MARGIN: usize = 2_000;
/// Completion budget assumed when the caller does not pass `max_tokens`.
pub const DEFAULT_MAX_TOKENS: usize = 800;

/// Phase 21: per-model context windows, mirroring the TS table in
/// `tests/multi-agent/agent.ts`. Anchored on each model's documented
/// max; unknown models fall back to `DEFAULT_CONTEXT_WINDOW`.
fn known_windows() -> &'static HashMap<&'static str, usize> {
    static TABLE: OnceLock<HashMap<&'static str, usize>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("mistral:latest", 32_768);
        m.insert("qwen2.5:latest", 32_768);
        m.insert("qwen3:latest", 40_960);
        m.insert("qwen3.5:latest", 262_144);
        m.insert("qwen3-embedding", 32_768);
        m.insert("nomic-embed-text-v2-moe", 2_048);
        m.insert("gpt-oss:20b", 131_072);
        m.insert("gpt-oss:120b", 131_072);
        m.insert("qwen3.5:397b", 131_072);
        m.insert("kimi-k2-thinking", 200_000);
        m.insert("kimi-k2.6", 200_000);
        m.insert("kimi-k2:1t", 1_048_576);
        m.insert("deepseek-v3.1:671b", 131_072);
        m.insert("glm-4.7", 131_072);
        m
    })
}

/// Context window for `model`, or `DEFAULT_CONTEXT_WINDOW` if the name is unknown.
pub fn context_window_for(model: &str) -> usize {
    known_windows().get(model).copied().unwrap_or(DEFAULT_CONTEXT_WINDOW)
}
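
/// Result of a budget check. Exposes the numbers so callers can log
/// how much headroom remains without re-running the estimator.
#[derive(Debug, Clone, Copy)]
pub struct BudgetCheck {
    /// Prompt + system prompt + reserved completion tokens.
    pub estimated: usize,
    /// The model's context window, in tokens.
    pub window: usize,
    /// `window - estimated - safety_margin`; negative means overflow.
    pub remaining: i64,
}

/// Inputs to `assert_context_budget`. `bypass` exists for call sites
/// that handle their own overflow (continuation's second pass already
/// counted the partial; T5 gatekeeper prompts have a separate policy).
#[derive(Debug, Clone, Default)]
pub struct BudgetOpts<'a> {
    /// Optional system prompt, counted with the same estimator.
    pub system: Option<&'a str>,
    /// Completion budget; `None` means `DEFAULT_MAX_TOKENS`.
    pub max_tokens: Option<usize>,
    /// Headroom; `None` means `DEFAULT_SAFETY_MARGIN`.
    pub safety_margin: Option<usize>,
    /// Return `Ok` even when the estimate exceeds the window.
    pub bypass: bool,
}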

/// Phase 21's loud-fail primitive. Returns a `BudgetCheck` on success,
/// and the same struct plus the over-by token count on failure. The
/// whole point is to stop silent truncation: callers that expect
/// overflow should chunk BEFORE calling or set `bypass: true`.
pub fn assert_context_budget(
    model: &str,
    prompt: &str,
    opts: BudgetOpts<'_>,
) -> Result<BudgetCheck, (BudgetCheck, usize)> {
    let window = context_window_for(model);
    let safety = opts.safety_margin.unwrap_or(DEFAULT_SAFETY_MARGIN);
    let max_tokens = opts.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS);
    let sys_tokens = opts.system.map(estimate_tokens).unwrap_or(0);
    // The completion budget is reserved up front, so it counts against the window.
    let estimated = estimate_tokens(prompt) + sys_tokens + max_tokens;
    let remaining = window as i64 - estimated as i64 - safety as i64;
    let check = BudgetCheck { estimated, window, remaining };
    if remaining < 0 && !opts.bypass {
        // Over by `-remaining` tokens, safety margin included.
        return Err((check, (-remaining) as usize));
    }
    Ok(check)
}

/// Convenience: format an overflow error the same way the TS side
/// does. Exposed so downstream crates render consistent messages.
pub fn overflow_message(model: &str, check: &BudgetCheck, over_by: usize, safety: usize) -> String {
    format!(
        "context overflow: model={} est={}t window={}t safety={}t over={}t. \
         Chunk the prompt (see config/models.json overflow_policies) or set \
         bypass:true if you know the risk.",
        model, check.estimated, check.window, safety, over_by,
    )
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn estimate_tokens_ceiling_divides_by_four() {
        assert_eq!(estimate_tokens(""), 0);
        assert_eq!(estimate_tokens("abc"), 1); // 3 → ceil(3/4) = 1
        assert_eq!(estimate_tokens("abcd"), 1); // 4 → ceil(4/4) = 1
        assert_eq!(estimate_tokens("abcde"), 2); // 5 → ceil(5/4) = 2
        assert_eq!(estimate_tokens(&"x".repeat(400)), 100);
    }

    #[test]
    fn context_window_known_and_fallback() {
        assert_eq!(context_window_for("qwen3.5:latest"), 262_144);
        assert_eq!(context_window_for("kimi-k2:1t"), 1_048_576);
        assert_eq!(context_window_for("some-unreleased-model"), DEFAULT_CONTEXT_WINDOW);
    }

    #[test]
    fn budget_passes_well_under_window() {
        let check = assert_context_budget(
            "qwen3:latest",
            &"x".repeat(4_000), // ~1000 tokens
            BudgetOpts { max_tokens: Some(500), ..Default::default() },
        )
        .expect("well under 40K window");
        assert!(check.remaining > 30_000);
    }

    #[test]
    fn budget_fails_when_prompt_overflows_window() {
        let huge = "x".repeat(200_000); // ~50K tokens, over qwen3's 40K
        let err = assert_context_budget(
            "qwen3:latest",
            &huge,
            BudgetOpts::default(),
        )
        .expect_err("should overflow qwen3's 40K window");
        assert!(err.1 > 0, "over_by must be positive");
    }

    #[test]
    fn budget_bypass_returns_ok_even_over() {
        let huge = "x".repeat(200_000);
        let check = assert_context_budget(
            "qwen3:latest",
            &huge,
            BudgetOpts { bypass: true, ..Default::default() },
        )
        .expect("bypass must suppress the error");
        assert!(check.remaining < 0, "check still reports negative remaining");
    }

    #[test]
    fn budget_counts_system_prompt() {
        // 10K-char system prompt → ~2500 tokens. Adding it should raise the
        // estimate by exactly estimate_tokens(&sys) and change nothing else.
        let sys = "s".repeat(10_000);
        let prompt = "p".repeat(4_000);
        let with_sys = assert_context_budget(
            "qwen3:latest",
            &prompt,
            BudgetOpts {
                system: Some(&sys),
                max_tokens: Some(500),
                ..Default::default()
            },
        )
        .unwrap();
        let without_sys = assert_context_budget(
            "qwen3:latest",
            &prompt,
            BudgetOpts { max_tokens: Some(500), ..Default::default() },
        )
        .unwrap();
        assert!(
            with_sys.estimated > without_sys.estimated,
            "system prompt should raise estimate"
        );
        assert_eq!(with_sys.estimated - without_sys.estimated, estimate_tokens(&sys));
    }

    #[test]
    fn overflow_message_includes_numbers() {
        // remaining = window - estimated - safety = 40_960 - 42_000 - 2_000.
        let check = BudgetCheck { estimated: 42_000, window: 40_960, remaining: -3_040 };
        let msg = overflow_message("qwen3:latest", &check, 3_040, 2_000);
        assert!(msg.contains("qwen3:latest"));
        assert!(msg.contains("42000t"));
        assert!(msg.contains("40960t"));
        assert!(msg.contains("3040t"));
    }
}
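tree_split.rs, described in the commit message, budgets its running scratchpad with this same estimator and truncates oldest digests first at scratchpad_budget. Below is a minimal sketch of that trimming step. It assumes digests are joined with newlines and stored oldest-first. trim_scratchpad is a hypothetical helper, and the estimator is copied so the block stands alone.

```rust
// Same chars/4 ceiling estimator as `estimate_tokens` above.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

// Hypothetical helper: drop the oldest digests (front of the Vec) until the
// newline-joined scratchpad fits within `scratchpad_budget` tokens.
fn trim_scratchpad(digests: &mut Vec<String>, scratchpad_budget: usize) -> String {
    loop {
        let joined = digests.join("\n");
        // An empty scratchpad estimates to 0 tokens, so the loop always terminates.
        if estimate_tokens(&joined) <= scratchpad_budget {
            return joined;
        }
        digests.remove(0); // oldest digest first
    }
}
```

As a worked example, three 40-char digests joined with newlines come to 122 chars (31 tokens). Against a 25-token budget the oldest digest is dropped, which leaves 81 chars (21 tokens).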