Cloud kimi-k2.5 executor for weak tiers + multi-strategy playbook retrieval

Two coupled changes from the 2026 agent-memory research + tool
asymmetry findings.

SCENARIO (weak-tier cloud substitute):
qwen2.5 collapsed to 0/14 across the basic/minimal tool_levels.
Replace with cloud kimi-k2.5 on Ollama Cloud — same family as k2.6
(pro-tier locked today, on J's upgrade path). Plumb cloud flag
through ACTIVE_EXECUTOR_CLOUD / ACTIVE_REVIEWER_CLOUD into
generateContinuable so executor/reviewer can route to cloud when
tool_level requires. think:false supported by Kimi family.

Tool level mapping (revised):
  full     — qwen3.5 local + qwen3 local + cloud gpt-oss:120b T3 + rescue
  local    — qwen3.5 local + qwen3 local + local gpt-oss:20b T3 + rescue
  basic    — kimi-k2.5 cloud + qwen3 local + local T3, no rescue
  minimal  — kimi-k2.5 cloud + qwen3 local, no T3, no rescue.
             Playbook inheritance alone on the decision path.

This is the honest version of J's "minimal tools still works via
inheritance" hypothesis — with the executor no longer broken at the
tokenizer level, we can actually measure whether playbook retrieval
substitutes for missing overseers.

PLAYBOOK_MEMORY (multi-strategy retrieval):
Zep / Mem0 research shows multi-strategy rerank (semantic + keyword +
graph + temporal) outperforms single-strategy cosine. Lakehouse now
has a two-tier retrieval path:

  1. Exact (role, city, state) match: skip cosine, assign similarity=1.0,
     take up to top_k/2+1 slots. These are identity-class neighbors —
     the strongest possible signal.
  2. Cosine fallback within the same (city, state) but different role:
     fills remaining slots.

Exposed as compute_boost_for_filtered_with_role(target_geo, target_role).
Backwards-compatible: compute_boost_for_filtered forwards with role=None
so existing callers keep their current behavior.
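
A minimal sketch of the two-tier slot allocation described above. It
assumes a hypothetical Entry type carrying a precomputed cosine score
and a hypothetical allocate helper; neither name is in the commit, and
the geo pre-filter is assumed to have already run.

```rust
/// Hypothetical stand-in for a playbook entry (not the commit's PlaybookEntry).
struct Entry {
    operation: String,
    cosine: f32, // assumed: similarity to the query, already computed
}

/// Returns (score, operation) pairs: exact role matches first at 1.0,
/// then the best cosine neighbors fill whatever slots remain.
fn allocate(entries: &[Entry], role: Option<&str>, top_k: usize) -> Vec<(f32, String)> {
    // Operations are shaped "fill: Welder x3 in Toledo, OH".
    let needle = role.map(|r| format!("fill: {} ", r).to_ascii_lowercase());
    let mut exact: Vec<&Entry> = Vec::new();
    let mut pool: Vec<&Entry> = Vec::new();
    for e in entries {
        let is_exact = needle
            .as_ref()
            .map(|n| e.operation.to_ascii_lowercase().contains(n.as_str()))
            .unwrap_or(false);
        if is_exact {
            exact.push(e);
        } else {
            pool.push(e);
        }
    }
    // Highest cosine first; NaN comparisons fall back to Equal.
    pool.sort_by(|a, b| b.cosine.partial_cmp(&a.cosine).unwrap_or(std::cmp::Ordering::Equal));

    // Exact matches take at most top_k/2 + 1 slots; cosine gets the rest.
    let exact_take = exact.len().min(top_k.max(1) / 2 + 1);
    let cosine_take = top_k.saturating_sub(exact_take);

    let mut scored: Vec<(f32, String)> = exact
        .into_iter()
        .take(exact_take)
        .map(|e| (1.0_f32, e.operation.clone()))
        .collect();
    scored.extend(
        pool.into_iter()
            .take(cosine_take)
            .map(|e| (e.cosine, e.operation.clone())),
    );
    scored
}
```

With top_k = 4 and four Welder playbooks in the pool, three slots go to
exact matches at 1.0 and one slot to the best cosine neighbor. With
role = None the needle is absent, so every entry goes through the cosine
path, which is the backwards-compatible behavior.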

Service.rs wires both: extract_target_geo and extract_target_role pull
from the executor's SQL filter. grab_eq_value is factored out of
extract_target_geo so both lookups share one parser. Diagnostic log
now prints target_role alongside target_geo for every hybrid_search:

  playbook_boost: boosts=88 sources=39 parsed=39 matched=5
    target_geo=Some(("Nashville", "TN")) target_role=Some("Welder")

Verified: Nashville Welder query returns 5/10 boosted workers in
top_k with clean role+geo provenance.
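
A condensed sketch of the shared-parser idea, for illustration only. The
names eq_value and target_geo are hypothetical, and the sketch differs
from the commit's grab_eq_value in detail: it trims any whitespace (not
just spaces) and returns None for an unterminated quote.

```rust
/// Extracts the single-quoted value from `col = 'value'`, matching the
/// column name case-insensitively. Illustrative, not the commit's code.
fn eq_value(src: &str, col: &str) -> Option<String> {
    let lower = src.to_ascii_lowercase();
    let col = col.to_ascii_lowercase();
    let bytes = lower.as_bytes();
    let mut from = 0;
    while let Some(off) = lower[from..].find(&col) {
        let pos = from + off;
        from = pos + col.len();
        // Require a word boundary before the column name so that
        // "city" inside "capacity" is not treated as the column.
        let boundary = pos == 0
            || !(bytes[pos - 1].is_ascii_alphanumeric() || bytes[pos - 1] == b'_');
        if !boundary {
            continue;
        }
        // Expect optional whitespace, '=', optional whitespace, then a quote.
        let Some(rest) = src[from..].trim_start().strip_prefix('=') else { continue };
        let Some(rest) = rest.trim_start().strip_prefix('\'') else { continue };
        if let Some(end) = rest.find('\'') {
            if end > 0 {
                return Some(rest[..end].to_string());
            }
        }
    }
    None
}

/// Both halves must be present; otherwise no geo narrowing is applied.
fn target_geo(filter: &str) -> Option<(String, String)> {
    Some((eq_value(filter, "city")?, eq_value(filter, "state")?))
}
```

Routing role, city, and state through one helper means a fix to the
boundary or quoting rules applies to all three lookups at once.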

Research sources: atlan.com Agent Memory Frameworks 2026, Mem0 paper
(arxiv 2504.19413), Zep/Graphiti LongMemEval comparison, ossinsight
Agent Memory Race 2026.

kimi-k2.6 on current key returns 403 — pro-tier upgrade required.
kimi-k2.5 is the substitute today; swap to k2.6 by changing the
ACTIVE_EXECUTOR model string in the basic and minimal cases of
applyToolLevel once the subscription lands.
root 2026-04-20 23:20:07 -05:00
parent 5e89407939
commit ad0edbe29c
3 changed files with 133 additions and 58 deletions


@@ -235,6 +235,26 @@ impl PlaybookMemory {
         top_k_playbooks: usize,
         base_weight: f32,
         target_geo: Option<(&str, &str)>,
+    ) -> HashMap<(String, String, String), BoostEntry> {
+        self.compute_boost_for_filtered_with_role(query_embedding, top_k_playbooks, base_weight, target_geo, None).await
+    }
+
+    /// Variant that also accepts a target role for pre-filtering.
+    /// Multi-strategy retrieval: exact (role, city, state) matches skip
+    /// cosine entirely and earn the maximum boost, since identity on
+    /// those three fields is the strongest possible similarity signal.
+    /// Remaining entries (within the same city+state but different
+    /// role, or unknown role) go through the normal cosine path as a
+    /// fallback. This addresses the 2026 agent-memory finding that
+    /// multi-strategy parallel retrieval with rerank outperforms
+    /// single-strategy semantic search.
+    pub async fn compute_boost_for_filtered_with_role(
+        &self,
+        query_embedding: &[f32],
+        top_k_playbooks: usize,
+        base_weight: f32,
+        target_geo: Option<(&str, &str)>,
+        target_role: Option<&str>,
     ) -> HashMap<(String, String, String), BoostEntry> {
         let state = self.state.read().await;
         let entries = state.entries.clone();
@@ -246,11 +266,11 @@ impl PlaybookMemory {
         }
         drop(state);
 
-        // Brute-force cosine. Empty / missing embeddings just skip.
-        // When target_geo is set, pre-filter to matching playbooks BEFORE
-        // cosine sort — that way top-k is within the city, not across
-        // all cities.
-        let mut scored: Vec<(f32, &PlaybookEntry)> = entries
+        // Pre-filter by target_geo (city, state) before cosine. When
+        // target_geo is set, only playbooks from that city go into the
+        // ranking pool — prevents globally-popular semantic neighbors
+        // from drowning out the city's local successful playbooks.
+        let geo_filtered: Vec<&PlaybookEntry> = entries
             .iter()
             .filter(|e| match (target_geo, &e.city, &e.state) {
                 (None, _, _) => true,
@@ -259,10 +279,45 @@ impl PlaybookMemory {
                 }
                 _ => false,
             })
-            .filter_map(|e| e.embedding.as_ref().map(|v| (cosine(query_embedding, v), e)))
             .collect();
-        scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
-        scored.truncate(top_k_playbooks.max(1));
+
+        // Multi-strategy: split the geo-filtered pool into (exact role
+        // match) vs (other). Exact matches skip cosine — they're already
+        // the strongest signal possible. Operations are shaped
+        // "fill: Welder x3 in Toledo, OH" so we match role by checking
+        // whether `fill: {role} ` appears in the operation string,
+        // case-insensitive.
+        let mut exact_matches: Vec<&PlaybookEntry> = Vec::new();
+        let mut cosine_pool: Vec<(f32, &PlaybookEntry)> = Vec::new();
+        let role_needle = target_role
+            .map(|r| format!("fill: {} ", r).to_ascii_lowercase());
+        for e in geo_filtered {
+            let is_exact = role_needle.as_ref()
+                .map(|needle| e.operation.to_ascii_lowercase().contains(needle))
+                .unwrap_or(false);
+            if is_exact {
+                exact_matches.push(e);
+            } else if let Some(v) = &e.embedding {
+                cosine_pool.push((cosine(query_embedding, v), e));
+            }
+        }
+        cosine_pool.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
+
+        // Allocate top_k across the two pools: exact matches get first
+        // priority (up to top_k/2 + 1 slots), then cosine fills the
+        // rest. This is rerank with hard preference for identity
+        // matches.
+        let exact_take = exact_matches.len().min(top_k_playbooks.max(1) / 2 + 1);
+        let cosine_take = top_k_playbooks.saturating_sub(exact_take);
+
+        // Score exact matches with max similarity (1.0) so downstream
+        // weighting treats them as the strongest possible signal.
+        let mut scored: Vec<(f32, &PlaybookEntry)> = exact_matches
+            .into_iter()
+            .take(exact_take)
+            .map(|e| (1.0_f32, e))
+            .collect();
+        scored.extend(cosine_pool.into_iter().take(cosine_take));
 
         let now = chrono::Utc::now();
         let mut boosts: HashMap<(String, String, String), BoostEntry> = HashMap::new();


@@ -803,20 +803,23 @@ async fn hybrid_search(
     // set. Additive boost on the existing vector score, then re-sort.
     if req.use_playbook_memory {
         let boost_k = req.playbook_memory_k.unwrap_or(playbook_memory::DEFAULT_TOP_K_PLAYBOOKS);
-        // Extract target (city, state) from the SQL filter so
-        // compute_boost_for can skip playbooks from other cities that
-        // would never intersect the candidate pool. The executor's
-        // filter shape is stable: `... city = 'Toledo' AND state = 'OH' ...`.
-        // Case-insensitive match, tolerant of single quotes and spaces.
+        // Extract target (city, state, role) from the SQL filter so
+        // compute_boost_for can skip playbooks from other cities AND
+        // prioritize exact role matches via the multi-strategy path.
+        // The executor's filter shape is stable:
+        // `... role = 'Welder' AND city = 'Toledo' AND state = 'OH' ...`.
+        // Case-insensitive match, tolerant of single quotes.
         let target_geo = req.sql_filter.as_deref().and_then(extract_target_geo);
+        let target_role = req.sql_filter.as_deref().and_then(extract_target_role);
         // We embedded the question as `qv` above — reuse it for the
         // playbook similarity lookup so we don't double-pay Ollama.
         let boosts = state.playbook_memory
-            .compute_boost_for_filtered(
+            .compute_boost_for_filtered_with_role(
                 &qv,
                 boost_k,
                 0.5,
                 target_geo.as_ref().map(|(c, s)| (c.as_str(), s.as_str())),
+                target_role.as_deref(),
             )
             .await;
 
@@ -850,12 +853,13 @@ async fn hybrid_search(
             }
         }
         tracing::info!(
-            "playbook_boost: boosts={} sources={} parsed={} matched={} target_geo={:?} (query='{}')",
+            "playbook_boost: boosts={} sources={} parsed={} matched={} target_geo={:?} target_role={:?} (query='{}')",
             boosts.len(),
             sources.len(),
             parsed_count,
             matched_count,
             target_geo,
+            target_role,
             req.question.chars().take(60).collect::<String>(),
         );
         // Re-rank: boosted scores can flip ordering.
@@ -2098,51 +2102,53 @@ struct LanceRecallQuery {
 /// "{Name} — {Role} in {City}, {State}. Skills: …".
 /// Returns None if the chunk doesn't match the shape; callers simply
 /// skip the boost for that hit.
+/// Extract role from an SQL filter matching `role = 'Welder'` style.
+/// Case-insensitive on the column name. Quoted value; quotes not
+/// included in returned string.
+fn extract_target_role(sql_filter: &str) -> Option<String> {
+    grab_eq_value(sql_filter, "role")
+}
+
+/// Shared equality-value extractor for (city, state, role) lookups.
+fn grab_eq_value(src: &str, col: &str) -> Option<String> {
+    let lower = src.to_ascii_lowercase();
+    let col_lower = col.to_ascii_lowercase();
+    let mut search_from = 0usize;
+    while let Some(off) = lower[search_from..].find(&col_lower) {
+        let pos = search_from + off;
+        let prior_ok = pos == 0
+            || !lower.as_bytes()[pos - 1].is_ascii_alphanumeric()
+                && lower.as_bytes()[pos - 1] != b'_';
+        let after = pos + col_lower.len();
+        if !prior_ok || after >= src.len() {
+            search_from = pos + col_lower.len();
+            continue;
+        }
+        let mut i = after;
+        while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
+        if i >= src.len() || src.as_bytes()[i] != b'=' { search_from = pos + col_lower.len(); continue; }
+        i += 1;
+        while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
+        if i >= src.len() || src.as_bytes()[i] != b'\'' { search_from = pos + col_lower.len(); continue; }
+        i += 1;
+        let start = i;
+        while i < src.len() && src.as_bytes()[i] != b'\'' { i += 1; }
+        if i > start {
+            return Some(src[start..i].to_string());
+        }
+        search_from = pos + col_lower.len();
+    }
+    None
+}
+
 /// Pull (city, state) out of a SQL filter that uses
 /// `city = 'Toledo' AND state = 'OH'` style equality. Returns None if
 /// either is missing — the caller keeps the original global boost map
 /// behavior (no geo narrowing). Case-insensitive on the column name
 /// so `CITY=` or `City =` also work.
 fn extract_target_geo(sql_filter: &str) -> Option<(String, String)> {
-    fn grab_eq(src: &str, col: &str) -> Option<String> {
-        // Very small parser, resilient enough for the executor's
-        // filter shapes. Matches `col = 'value'` or `col='value'` with
-        // case-insensitive column name.
-        let lower = src.to_ascii_lowercase();
-        let col_lower = col.to_ascii_lowercase();
-        let mut search_from = 0usize;
-        while let Some(off) = lower[search_from..].find(&col_lower) {
-            let pos = search_from + off;
-            // Require word boundary before the column name so "city"
-            // inside "civilian_rank" doesn't false-match.
-            let prior_ok = pos == 0
-                || !lower.as_bytes()[pos - 1].is_ascii_alphanumeric()
-                    && lower.as_bytes()[pos - 1] != b'_';
-            let after = pos + col_lower.len();
-            if !prior_ok || after >= src.len() {
-                search_from = pos + col_lower.len();
-                continue;
-            }
-            // Walk past whitespace, require '='.
-            let mut i = after;
-            while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
-            if i >= src.len() || src.as_bytes()[i] != b'=' { search_from = pos + col_lower.len(); continue; }
-            i += 1;
-            while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
-            // Value is single-quoted literal; extract until the next '.
-            if i >= src.len() || src.as_bytes()[i] != b'\'' { search_from = pos + col_lower.len(); continue; }
-            i += 1;
-            let start = i;
-            while i < src.len() && src.as_bytes()[i] != b'\'' { i += 1; }
-            if i > start {
-                return Some(src[start..i].to_string());
-            }
-            search_from = pos + col_lower.len();
-        }
-        None
-    }
-    let city = grab_eq(sql_filter, "city")?;
-    let state = grab_eq(sql_filter, "state")?;
+    let city = grab_eq_value(sql_filter, "city")?;
+    let state = grab_eq_value(sql_filter, "state")?;
     Some((city, state))
 }
 


@@ -92,6 +92,8 @@ const RETRY_ON_FAIL = process.env.LH_RETRY_ON_FAIL !== "0";
 // based on staffer.tool_level before calling anything else.
 let ACTIVE_EXECUTOR = EXECUTOR_MODEL;
 let ACTIVE_REVIEWER = REVIEWER_MODEL;
+let ACTIVE_EXECUTOR_CLOUD = false;
+let ACTIVE_REVIEWER_CLOUD = false;
 let ACTIVE_T3_DISABLED = T3_DISABLED;
 let ACTIVE_OVERVIEW_CLOUD = OVERVIEW_CLOUD;
 let ACTIVE_RETRY_ON_FAIL = RETRY_ON_FAIL;
@@ -101,6 +103,8 @@ function applyToolLevel(level: Staffer["tool_level"] | undefined): void {
   // don't leak.
   ACTIVE_EXECUTOR = EXECUTOR_MODEL;
   ACTIVE_REVIEWER = REVIEWER_MODEL;
+  ACTIVE_EXECUTOR_CLOUD = false;
+  ACTIVE_REVIEWER_CLOUD = false;
   ACTIVE_T3_DISABLED = T3_DISABLED;
   ACTIVE_OVERVIEW_CLOUD = OVERVIEW_CLOUD;
   ACTIVE_RETRY_ON_FAIL = RETRY_ON_FAIL;
@@ -113,14 +117,22 @@ function applyToolLevel(level: Staffer["tool_level"] | undefined): void {
       ACTIVE_OVERVIEW_CLOUD = false;
       break;
     case "basic":
-      ACTIVE_EXECUTOR = "qwen2.5:latest";
-      ACTIVE_REVIEWER = "qwen2.5:latest";
+      // qwen2.5 collapsed on this workload (0/14 fill). Replace with
+      // cloud kimi-k2.5 — same family as k2.6 (which requires a paid
+      // tier), strong at tool calling. kimi-k2.6 is targeted when the
+      // subscription upgrades.
+      ACTIVE_EXECUTOR = "kimi-k2.5";
+      ACTIVE_EXECUTOR_CLOUD = true;
+      ACTIVE_REVIEWER = "qwen3:latest"; // local reviewer stays cheap
       ACTIVE_OVERVIEW_CLOUD = false;
       ACTIVE_RETRY_ON_FAIL = false;
       break;
     case "minimal":
-      ACTIVE_EXECUTOR = "qwen2.5:latest";
-      ACTIVE_REVIEWER = "qwen2.5:latest";
+      // Same executor as basic but strip the overseer + rescue.
+      // Proves whether playbook inheritance alone carries the load.
+      ACTIVE_EXECUTOR = "kimi-k2.5";
+      ACTIVE_EXECUTOR_CLOUD = true;
+      ACTIVE_REVIEWER = "qwen3:latest";
       ACTIVE_T3_DISABLED = true;
       ACTIVE_OVERVIEW_CLOUD = false;
       ACTIVE_RETRY_ON_FAIL = false;
@@ -514,6 +526,7 @@ async function runAgentFill(
       shape: "json",
       max_continuations: 3,
       think: false,
+      cloud: ACTIVE_EXECUTOR_CLOUD,
       on_continuation: (n, len) =>
         append({ turn, role: "executor", model: ACTIVE_EXECUTOR, kind: "note",
           content: { continuation: n, combined_chars: len } }),
@@ -577,6 +590,7 @@ async function runAgentFill(
       shape: "json",
       max_continuations: 3,
       think: false,
+      cloud: ACTIVE_REVIEWER_CLOUD,
       on_continuation: (n, len) =>
         append({ turn, role: "reviewer", model: ACTIVE_REVIEWER, kind: "note",
           content: { continuation: n, combined_chars: len } }),