7 Commits

Author SHA1 Message Date
profit
a6f12e2609 Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: empty-response from thinking
                     models (geometric 2x budget backoff up to budget_cap)
                     and truncated-non-empty (continuation with partial
                     as scratchpad). is_structurally_complete balances
                     braces, then attempts a full JSON parse (mirroring
                     the TS JSON.parse check). Guards the degenerate case
                     "all retries empty, don't loop on empty partial".
  tree_split.rs    — generate_tree_split map->reduce with running
                     scratchpad. Per-shard + reduce-prompt go through
                     assert_context_budget first; loud-fails rather than
                     silently truncating. Oldest-digest-first scratchpad
                     truncation at scratchpad_budget (default 6000 t).
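
The chars/4 ceiling rule and the diagnostics-on-both-paths shape can be sketched in a few lines. The field and function names below are assumptions for illustration, not the actual crate API:

```rust
// Sketch of the context.rs primitives described above. The chars/4
// ceiling is from the commit; the exact BudgetCheck fields are assumed.
struct BudgetCheck {
    estimated_tokens: usize,
    window: usize,
    fits: bool,
}

// estimate_tokens: chars / 4, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

// Returns numeric diagnostics on both success and overflow, not a bare bool.
fn assert_context_budget(prompt: &str, window: usize) -> BudgetCheck {
    let estimated_tokens = estimate_tokens(prompt);
    BudgetCheck { estimated_tokens, window, fits: estimated_tokens <= window }
}

fn main() {
    let check = assert_context_budget(&"x".repeat(40), 8);
    assert_eq!(check.estimated_tokens, 10); // ceil(40 / 4)
    assert!(!check.fits);
    println!("needs {} of {} tokens", check.estimated_tokens, check.window);
}
```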

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
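
The trait + scripted-double pattern can be sketched as below. The generate signature and the hand-rolled single-threaded executor are assumptions (the real crate presumably runs under an async runtime); the point is that native async-fn-in-trait needs no macro crate and the double replays canned responses:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Native async-fn-in-trait (stable since Rust 1.75). Signature assumed.
trait TextGenerator {
    async fn generate(&mut self, prompt: &str) -> Result<String, String>;
}

// Test double: replays a canned sequence without a live Ollama.
struct ScriptedGenerator {
    responses: VecDeque<Result<String, String>>,
}

impl TextGenerator for ScriptedGenerator {
    async fn generate(&mut self, _prompt: &str) -> Result<String, String> {
        self.responses.pop_front().unwrap_or_else(|| Err("script exhausted".into()))
    }
}

// Tiny executor: scripted futures never yield, so a no-op waker suffices.
fn noop_raw_waker() -> RawWaker {
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| noop_raw_waker(), |_| {}, |_| {}, |_| {});
    RawWaker::new(std::ptr::null(), &VTABLE)
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut g = ScriptedGenerator {
        // First call: a thinking model's empty response; second: valid JSON.
        responses: VecDeque::from([Ok(String::new()), Ok("{\"ok\":true}".to_string())]),
    };
    assert_eq!(block_on(g.generate("prompt")).unwrap(), "");
    assert_eq!(block_on(g.generate("prompt")).unwrap(), "{\"ok\":true}");
}
```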

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when newer version replaces
  superseded_by     Option<String>, the playbook_id that replaced

New methods:

  revise_entry(parent_id, new_entry) — appends new version, stamps
    superseded_at+superseded_by on parent, inherits parent_id and sets
    version = parent + 1 on the new entry. Rejects revising a retired
    or already-superseded parent (tip-of-chain is the only valid
    revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.
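
The chain walk can be sketched against a toy entry map. Only the chain fields are modeled; revise_entry's stamping and the tip-of-chain guard are elided, and struct names are assumptions:

```rust
use std::collections::{HashMap, HashSet};

// Minimal entry with only the Phase 27 chain fields.
struct Entry {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

// history(): walk parent_id back to the root, then superseded_by forward
// to the tip. The `seen` set makes a corrupt cyclic chain terminate
// instead of looping forever.
fn history(entries: &HashMap<String, Entry>, start: &str) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut root = start.to_string();
    while seen.insert(root.clone()) {
        match entries.get(&root).and_then(|e| e.parent_id.clone()) {
            Some(p) => root = p,
            None => break,
        }
    }
    seen.clear();
    let mut chain = Vec::new();
    let mut cur = root;
    while seen.insert(cur.clone()) {
        chain.push(cur.clone());
        match entries.get(&cur).and_then(|e| e.superseded_by.clone()) {
            Some(next) => cur = next,
            None => break,
        }
    }
    chain
}

// Three-version chain used below: v1 -> v2 -> v3.
fn demo_chain() -> HashMap<String, Entry> {
    HashMap::from([
        ("v1".into(), Entry { parent_id: None, superseded_by: Some("v2".into()) }),
        ("v2".into(), Entry { parent_id: Some("v1".into()), superseded_by: Some("v3".into()) }),
        ("v3".into(), Entry { parent_id: Some("v2".into()), superseded_by: None }),
    ])
}

fn main() {
    // Same root->tip chain from any node, including the middle.
    assert_eq!(history(&demo_chain(), "v2"), vec!["v1", "v2", "v3"]);
    assert_eq!(history(&demo_chain(), "v3"), vec!["v1", "v2", "v3"]);
}
```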

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md drifted from git after Phases 24-26
shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:40:49 -05:00
root
640db8c63c Phase 26 — Mem0 upsert + Letta geo hot cache
Closes the two remaining 2026-era memory findings. Both are
optimizations per J's framing — not load-bearing, but good data
hygiene + future-proofing at scale.

MEM0 UPSERT (data hygiene):
Before: /seed always appended. A scenario re-running the same
operation on the same day wrote duplicate entries, inflating the
playbook corpus with near-identical rows.

Now: upsert_entry(new) inspects existing non-retired entries and
decides ADD / UPDATE / NOOP:
  ADD     → no matching (operation, day, city, state) tuple, append
  UPDATE  → match exists with different names → merge (union, stable
            order), refresh timestamp, keep original playbook_id so
            citations stay valid
  NOOP    → match exists with identical names → skip, return id

Day-granularity keying on timestamp YYYY-MM-DD means intraday
re-seeds dedup but tomorrow's same-operation is a fresh ADD. Retired
entries don't block new seeds — they're out of scope anyway.
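
The decision rule under the day-granularity key can be sketched as below; struct shape and function names are assumptions, not the real upsert_entry signature:

```rust
#[derive(Debug, PartialEq)]
enum Outcome { Add, Update, Noop }

// Only the fields the dedup key needs.
#[derive(Clone)]
struct Seed {
    operation: String,
    timestamp: String, // RFC3339
    city: String,
    state: String,
    names: Vec<String>,
}

// Day-granularity key: the YYYY-MM-DD prefix of the timestamp.
fn day(ts: &str) -> &str {
    ts.get(..10).unwrap_or(ts)
}

// ADD / UPDATE / NOOP over the existing non-retired entries.
fn decide(existing: &[Seed], new: &Seed) -> Outcome {
    for e in existing {
        let same_key = e.operation == new.operation
            && day(&e.timestamp) == day(&new.timestamp)
            && e.city == new.city
            && e.state == new.state;
        if same_key {
            return if e.names == new.names { Outcome::Noop } else { Outcome::Update };
        }
    }
    Outcome::Add
}

// UPDATE merges names as a stable-order union; the original playbook_id
// (not modeled here) is kept so citations stay valid.
fn merge_names(old: &[String], new: &[String]) -> Vec<String> {
    let mut merged = old.to_vec();
    for n in new {
        if !merged.contains(n) {
            merged.push(n.clone());
        }
    }
    merged
}

fn main() {
    let existing = vec![Seed {
        operation: "forklift_fill".into(),
        timestamp: "2026-04-21T00:10:00Z".into(),
        city: "Toledo".into(),
        state: "OH".into(),
        names: vec!["Alice".into()],
    }];
    let mut new = existing[0].clone();
    new.timestamp = "2026-04-21T09:00:00Z".into(); // intraday reseed
    assert_eq!(decide(&existing, &new), Outcome::Noop);
    new.names = vec!["Alice".into(), "Bob".into()];
    assert_eq!(decide(&existing, &new), Outcome::Update);
    assert_eq!(merge_names(&existing[0].names, &new.names), vec!["Alice", "Bob"]);
    new.timestamp = "2026-04-22T09:00:00Z".into(); // tomorrow: fresh ADD
    assert_eq!(decide(&existing, &new), Outcome::Add);
}
```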

Seed endpoint returns {outcome: {mode, playbook_id, merged_names?},
entries_after}. Append=false retains old replace-all semantics.

5 unit tests pass: first_seed_is_add, identical_reseed_is_noop,
same_day_different_names_updates_and_merges, different_day_same_op_is_add,
retired_entry_doesnt_block_new_seed.

Live verified: three successive seeds with (Alice), (Alice),
(Alice, Bob) left entry count unchanged at 1936 with merged names
{Alejandro, Lauren, Alice, Bob}. Previously would have been 3
appends.

LETTA GEO HOT CACHE (scale primitive):
Added geo_index: HashMap<(city_lower, state_upper), Vec<usize>>
alongside PlaybookMemoryState. Rebuilt on every mutation: set_entries,
retire_one, retire_on_schema_drift, upsert_entry, load_from_storage.

compute_boost_for_filtered_with_role now uses the index for O(1) geo
lookup instead of scanning all entries. At current scale (1.9K) the
scan was sub-ms; at 100K+ the scan becomes the dominant cost. The
hot cache future-proofs without adding an LRU abstraction.
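
The rebuild itself is small enough to sketch; entry fields and key normalization below follow the commit, everything else is assumed:

```rust
use std::collections::HashMap;

struct Entry {
    city: String,
    state: String,
    retired: bool,
}

// Key normalization per the commit: lowercase city, uppercase state.
fn geo_key(city: &str, state: &str) -> (String, String) {
    (city.to_lowercase(), state.to_uppercase())
}

// Rebuilt on every mutation; retired entries are excluded up front.
fn rebuild_geo_index(entries: &[Entry]) -> HashMap<(String, String), Vec<usize>> {
    let mut index: HashMap<(String, String), Vec<usize>> = HashMap::new();
    for (i, e) in entries.iter().enumerate() {
        if !e.retired {
            index.entry(geo_key(&e.city, &e.state)).or_default().push(i);
        }
    }
    index
}

fn main() {
    let entries = vec![
        Entry { city: "Nashville".into(), state: "tn".into(), retired: false },
        Entry { city: "Chicago".into(), state: "IL".into(), retired: false },
        Entry { city: "NASHVILLE".into(), state: "TN".into(), retired: true },
    ];
    let index = rebuild_geo_index(&entries);
    // Constant-time lookup instead of a full scan; the retired Nashville
    // row never made it into the index.
    assert_eq!(index.get(&geo_key("Nashville", "TN")), Some(&vec![0]));
}
```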

Retired entries excluded from index; valid_until still checked on the
hot path since it can elapse between rebuilds.

The geo_filtered vector owns cloned PlaybookEntries, so the state
read-lock is released before cosine scoring; this avoids lock
contention on the scoring path.

Memory-findings progress: 5 of 5 shipped.
  ✓ Multi-strategy parallel retrieval (Phase 19 refinement)
  ✓ Input normalization + unified /memory/query (Phase 24 TS)
  ✓ Zep validity windows (Phase 25)
  ✓ Mem0 UPSERT (Phase 26 today)
  ✓ Letta geo hot cache (Phase 26 today)

All 18 playbook_memory tests pass.
2026-04-21 00:24:05 -05:00
root
e0a843d1a5 Phase 25 — validity windows + playbook retirement
Addresses the load-bearing memory gap J flagged: playbook entries
had timestamps but no retirement semantic. When a schema migration
changed a column or a seasonal contract ended, stale playbooks kept
boosting candidates silently. Zep 2026-era finding — temporal
validity is the single highest-value memory-hygiene primitive.

SCHEMA (PlaybookEntry gains four optional fields, serde default):
  schema_fingerprint  — SHA-256 over dataset (column, type) tuples at
                        seed time. Missing = legacy entry, never
                        auto-retired on drift.
  valid_until         — RFC3339 hard expiry. compute_boost skips
                        entries past this moment.
  retired_at          — Set by retire_one or retire_on_schema_drift.
                        Retired entries excluded from all boost
                        calculations but kept in journal.
  retirement_reason   — Human-readable: "schema_drift: ...",
                        "expired: ...", "manual: ..."

RETRIEVAL PATH (compute_boost_for_filtered_with_role):
  Before geo+cosine, active_entries filter removes anything retired
  OR past valid_until. Uses chrono::Utc::now() once per call, no per-
  entry clock queries.
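
A sketch of the active-entries prefilter. It substitutes lexicographic comparison of RFC3339 UTC strings for the chrono types the real code presumably uses (valid for same-offset timestamps), and the exact expiry boundary is an assumption:

```rust
struct Entry {
    retired_at: Option<String>,
    valid_until: Option<String>, // RFC3339, UTC
}

// Drop anything retired OR past valid_until. `now` is captured once per
// call, mirroring the single Utc::now() in the commit. Missing
// valid_until means a legacy entry, which stays active.
fn active_indices(entries: &[Entry], now: &str) -> Vec<usize> {
    entries
        .iter()
        .enumerate()
        .filter(|(_, e)| e.retired_at.is_none())
        .filter(|(_, e)| e.valid_until.as_deref().map_or(true, |v| v > now))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let now = "2026-04-21T00:00:00Z";
    let entries = vec![
        Entry { retired_at: None, valid_until: None }, // legacy: active
        Entry { retired_at: None, valid_until: Some("2026-01-01T00:00:00Z".into()) }, // expired
        Entry { retired_at: Some("2026-03-01T00:00:00Z".into()), valid_until: None }, // retired
        Entry { retired_at: None, valid_until: Some("2027-01-01T00:00:00Z".into()) }, // active
    ];
    assert_eq!(active_indices(&entries, now), vec![0, 3]);
}
```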

NEW METHODS on PlaybookMemory:
  retire_one(playbook_id, reason)
  retire_on_schema_drift(city, state, current_fp, reason) — idempotent,
    scopes by (city, state) so a Nashville migration doesn't touch
    Chicago. Skips legacy entries with no fingerprint.
  status_counts() -> (total, retired, failures)

HTTP ENDPOINTS:
  POST /vectors/playbook_memory/retire
    {playbook_id, reason}                        → retire by id
    {city, state, current_schema_fingerprint, reason} → schema drift
  GET  /vectors/playbook_memory/status
    {total, active, retired, failures}

SEED REQUEST extended with optional schema_fingerprint + valid_until
so the orchestrator (scenario.ts) can pass the current schema hash
when seeding, without a round trip through catalogd.

UNIT TESTS (5/5 pass): retire_one_marks_entry_and_persists,
retired_entries_do_not_boost, expired_valid_until_is_skipped,
schema_drift_retires_mismatched_fingerprints_only,
schema_drift_skips_other_cities.

LIVE VERIFIED: /status on current state = 1936 entries, 43 failures.
POST /retire with a sample playbook_id → "retired":1, /status now
reports active=1935, retired=1.

Memory-findings progress: 3 of 5 shipped.
  ✓ Multi-strategy parallel retrieval (Phase 19 refinement)
  ✓ Input normalization + unified /memory/query (Phase 24 TS)
  ✓ Zep-style validity windows (Phase 25, tonight)
   Mem0 UPDATE / DELETE / NOOP ops (dedup same-(op,date) seeds)
   Letta working-memory hot cache (not biting at 1.5K entries)
2026-04-21 00:11:02 -05:00
root
ad0edbe29c Cloud kimi-k2.5 executor for weak tiers + multi-strategy playbook retrieval
Two coupled changes from the 2026 agent-memory research + tool
asymmetry findings.

SCENARIO (weak-tier cloud substitute):
qwen2.5 collapsed to 0/14 across the basic/minimal tool_levels.
Replace with cloud kimi-k2.5 on Ollama Cloud — same family as k2.6
(pro-tier locked today, on J's upgrade path). Plumb cloud flag
through ACTIVE_EXECUTOR_CLOUD / ACTIVE_REVIEWER_CLOUD into
generateContinuable so executor/reviewer can route to cloud when
tool_level requires. think:false supported by Kimi family.

Tool level mapping (revised):
  full     — qwen3.5 local + qwen3 local + cloud gpt-oss:120b T3 + rescue
  local    — qwen3.5 local + qwen3 local + local gpt-oss:20b T3 + rescue
  basic    — kimi-k2.5 cloud + qwen3 local + local T3, no rescue
  minimal  — kimi-k2.5 cloud + qwen3 local, no T3, no rescue.
             Playbook inheritance alone on the decision path.

This is the honest version of J's "minimal tools still works via
inheritance" hypothesis — with the executor no longer broken at the
tokenizer level, we can actually measure whether playbook retrieval
substitutes for missing overseers.

PLAYBOOK_MEMORY (multi-strategy retrieval):
Zep / Mem0 research shows multi-strategy rerank (semantic + keyword +
graph + temporal) outperforms single-strategy cosine. Lakehouse now
has a two-tier:

  1. Exact (role, city, state) match: skip cosine, assign similarity=1.0,
     take up to top_k/2+1 slots. These are identity-class neighbors —
     the strongest possible signal.
  2. Cosine fallback within the same (city, state) but different role:
     fills remaining slots.

Exposed as compute_boost_for_filtered_with_role(target_geo, target_role).
Backwards-compatible: compute_boost_for_filtered forwards with role=None
so existing callers keep their current behavior.
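
The two-tier fill can be sketched as below. The top_k/2+1 cap and the exact-match-as-1.0 rule come from the commit; embedding shape, tie-breaking, and struct names are assumptions:

```rust
struct Playbook {
    role: String,
    city: String,
    state: String,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Tier 1: exact (role, city, state) matches skip cosine and take
// similarity 1.0, capped at top_k/2 + 1 slots. Tier 2: cosine over the
// remaining same-geo playbooks fills what is left.
fn two_tier_rank(
    playbooks: &[Playbook],
    query: &[f32],
    role: &str,
    city: &str,
    state: &str,
    top_k: usize,
) -> Vec<(usize, f32)> {
    let mut ranked: Vec<(usize, f32)> = Vec::new();
    for (i, p) in playbooks.iter().enumerate() {
        if ranked.len() >= top_k / 2 + 1 {
            break;
        }
        if p.role == role && p.city == city && p.state == state {
            ranked.push((i, 1.0));
        }
    }
    let mut fallback: Vec<(usize, f32)> = playbooks
        .iter()
        .enumerate()
        .filter(|&(i, p)| {
            p.city == city && p.state == state && !ranked.iter().any(|&(j, _)| j == i)
        })
        .map(|(i, p)| (i, cosine(&p.embedding, query)))
        .collect();
    fallback.sort_by(|a, b| b.1.total_cmp(&a.1));
    let remaining = top_k.saturating_sub(ranked.len());
    ranked.extend(fallback.into_iter().take(remaining));
    ranked
}

fn main() {
    let pbs = vec![
        Playbook { role: "Electrician".into(), city: "Nashville".into(), state: "TN".into(), embedding: vec![0.9, 0.1] },
        Playbook { role: "Welder".into(), city: "Nashville".into(), state: "TN".into(), embedding: vec![0.2, 0.8] },
        Playbook { role: "Welder".into(), city: "Chicago".into(), state: "IL".into(), embedding: vec![1.0, 0.0] },
    ];
    let ranked = two_tier_rank(&pbs, &[1.0, 0.0], "Welder", "Nashville", "TN", 4);
    // Exact role+geo match leads at 1.0; the same-geo Electrician follows
    // by cosine; the Chicago playbook is excluded by geo entirely.
    assert_eq!(ranked[0], (1, 1.0));
    assert_eq!(ranked.len(), 2);
    assert_eq!(ranked[1].0, 0);
}
```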

Service.rs wires both: extract_target_geo and extract_target_role pull
from the executor's SQL filter. grab_eq_value is factored out of
extract_target_geo so both lookups share one parser. Diagnostic log
now prints target_role alongside target_geo for every hybrid_search:

  playbook_boost: boosts=88 sources=39 parsed=39 matched=5
    target_geo=Some(("Nashville", "TN")) target_role=Some("Welder")

Verified: Nashville Welder query returns 5/10 boosted workers in
top_k with clean role+geo provenance.

Research sources: atlan.com Agent Memory Frameworks 2026, Mem0 paper
(arxiv 2504.19413), Zep/Graphiti LongMemEval comparison, ossinsight
Agent Memory Race 2026.

kimi-k2.6 on current key returns 403 — pro-tier upgrade required.
kimi-k2.5 is the substitute today; swap to k2.6 by renaming one line
in applyToolLevel once the subscription lands.
2026-04-20 23:20:07 -05:00
root
a663698571 Item 3 — geo-filtered playbook boost; diagnostic logging
ROOT CAUSE (found via instrumentation, not a hunch):
After a 20-scenario corpus batch, only 6/40 successful (role, city)
combos ever triggered playbook_memory citations on subsequent runs.
Added `playbook_boost:` tracing::info! line in vectord::service to log
boost map size vs candidate pool vs match count. One query revealed:

  boosts=170 sources=50 parsed=50 matched=0

170 endorsed workers came back from compute_boost_for — but zero were
in the 50-candidate Toledo pool. The boost map was pulling globally-
ranked semantic neighbors (top-100 playbooks across ALL cities),
dominated by Kansas City / Chicago / Detroit forklift playbooks the
Toledo SQL filter would never admit. The mechanism was correct at the
per-playbook level; the problem was pool intersection.

FIX (surgical, not cap-tuning):
- playbook_memory::compute_boost_for_filtered(): accepts optional
  (city, state) filter. When set, skips playbooks from other geos
  BEFORE cosine-ranking, so top-k is within the target city.
- Backwards-compatible: compute_boost_for() calls the filtered variant
  with None — existing callers unchanged.
- service::hybrid_search(): extracts target (city, state) from the
  executor's SQL filter via a small parser (extract_target_geo),
  passes to compute_boost_for_filtered.
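
A hypothetical re-implementation of the shared parser. It assumes ASCII filters and the exact `col = 'value'` spacing the executor emits; the real grab_eq_value may tolerate more SQL shapes:

```rust
// Find `col = 'value'` in a SQL filter, case-insensitive on the column
// name, and return the quoted value.
fn grab_eq_value(sql: &str, col: &str) -> Option<String> {
    let lower = sql.to_lowercase(); // ASCII assumed: byte offsets line up
    let needle = format!("{} = '", col.to_lowercase());
    let start = lower.find(&needle)? + needle.len();
    let end = start + sql[start..].find('\'')?;
    Some(sql[start..end].to_string())
}

// Both geo lookups go through the one parser.
fn extract_target_geo(sql: &str) -> Option<(String, String)> {
    Some((grab_eq_value(sql, "city")?, grab_eq_value(sql, "state")?))
}

fn main() {
    let sql = "role = 'Welder' AND city = 'Nashville' AND state = 'TN'";
    assert_eq!(
        extract_target_geo(sql),
        Some(("Nashville".to_string(), "TN".to_string()))
    );
    assert_eq!(grab_eq_value(sql, "zip"), None);
}
```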

VERIFIED:
  Before fix: boosts=170 sources=50 parsed=50 matched=0   (0% hit)
  After fix:  boosts=36  sources=50 parsed=50 matched=11  (22% hit)
Top-k=10 now has 7/10 boosted workers with 2-3 citations each.
Boost values 0.075-0.113 on cosine scores 0.67-0.74 — meaningful
reorder without saturation.

scripts/kb_measure.py:
Aggregator that reads data/_kb/*.jsonl and playbooks/*/results.json,
reports fill rate, citation density, recommender confidence trend,
and zero-citation-ok combos (item 3 target signal). Used to measure
before/after on bigger batches.

Diagnostic logging stays — the class of "boosts computed but not
matched" bug can recur if the SQL filter format ever drifts, and
without the counter it's invisible. Every hybrid_search with
use_playbook_memory=true now logs its boost stats.
2026-04-20 21:35:04 -05:00
root
95c26f04f8 Path 1 negative signal + Path 2 pattern discovery + name validation
New:
- /vectors/playbook_memory/patterns: meta-index pattern discovery.
  Given a query, finds top-K similar playbooks, pulls each endorsed
  worker's full workers_500k profile, aggregates shared traits (cert
  frequencies, skill frequencies, modal archetype, reliability
  distribution), returns a human-readable discovered_pattern. Surfaces
  signals operators didn't explicitly query — the original PRD's
  "identify things we didn't know" dimension.
- /vectors/playbook_memory/mark_failed: records worker failures per
  (city, state, name). compute_boost_for applies 0.5^n penalty per
  recorded failure, so 3 failures cut a worker's positive boost to an
  eighth and 5 effectively zero it. Path 1 negative signal — recruiter trust
  depends on the system NOT recommending people who no-showed.
- Bun /log_failure: validates failed_names against workers_500k
  (same ghost-guard as /log), forwards to /mark_failed.
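
The stated 0.5^n factor in isolation (how it composes with the per-playbook positive boost is not spelled out here, and the (city, state, name) keying is elided; the function name is assumed):

```rust
// Multiply a worker's positive boost by 0.5 for each recorded failure.
fn penalized_boost(positive_boost: f64, failures: u32) -> f64 {
    positive_boost * 0.5f64.powi(failures as i32)
}

fn main() {
    let base = 0.2;
    assert_eq!(penalized_boost(base, 0), 0.2);
    // 3 failures: 0.5^3 = 1/8 of the original boost.
    assert!((penalized_boost(base, 3) - 0.025).abs() < 1e-12);
    // 5 failures: ~3% of the original — effectively zero.
    assert!(penalized_boost(base, 5) < 0.007);
}
```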

Improved:
- /log now validates endorsed_names against workers_500k for the
  contract's city+state before seeding. Ghost names (names that don't
  correspond to real workers) are rejected in the response and excluded
  from the seed, preventing silent boost failures.
- Bun /search auto-appends `CAST(availability AS DOUBLE) > 0.5` to
  sql_filter when the caller didn't constrain availability. Opt out
  with `include_unavailable: true`. Recruiter trust bug: surfacing
  already-placed workers breaks the first call.
- DEFAULT_TOP_K_PLAYBOOKS 25 → 100. Direct cosine measurement showed
  similarities cluster 0.55-0.67 across all playbooks regardless of
  geo, so k=25 missed relevant geo-matched playbooks. Brute-force is
  still sub-ms at this size.

Verified end-to-end on live data:
- Ghost names rejected on /log + /log_failure
- Availability filter drops unavailable workers from candidate pool
- Pattern discovery on unseen Cleveland OH Welder query returned
  recurring skills (first aid 43%, grinder 43%, blueprint 43%) and
  modal archetype (specialist) across 20 semantically similar past
  playbooks in 0.24s
- Negative signal: Helen Sanchez boost dropped +0.250 → +0.163 after
  3 failures recorded via /log_failure (34% reduction)
2026-04-20 14:55:46 -05:00
root
25b7e6c3a7 Phase 19 wiring + Path 1/2 work + chain integrity fixes
Backend:
- crates/vectord/src/playbook_memory.rs (new): Phase 19 in-memory boost
  store with seed/rebuild/snapshot, plus temporal decay (e^-age/30 per
  playbook), persist_to_sql endpoint backing successful_playbooks_live,
  and discover_patterns endpoint for meta-index pattern aggregation
  (recurring certs/skills/archetype/reliability across similar past fills).
- DEFAULT_TOP_K_PLAYBOOKS bumped 5 → 25; old default silently missed
  most boosts when memory had > 25 entries.
- service.rs: new routes /vectors/playbook_memory/{seed,rebuild,stats,
  persist_sql,patterns}.
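
The per-playbook decay term in isolation (function name assumed; the e^-age/30 form is as stated above):

```rust
// Temporal decay: weight = e^(-age/30), age in days. A 30-day-old
// playbook contributes ~37% of a fresh one's boost.
fn decay_weight(age_days: f64) -> f64 {
    (-age_days / 30.0).exp()
}

fn main() {
    assert_eq!(decay_weight(0.0), 1.0);
    assert!((decay_weight(30.0) - 0.3678794).abs() < 1e-6);
    // Applied per playbook before boosts are combined.
    let boosts = [(0.1, 1.0), (0.1, 60.0)]; // (raw_boost, age_days)
    let total: f64 = boosts.iter().map(|&(b, age)| b * decay_weight(age)).sum();
    assert!(total < 0.2);
}
```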

Bun staffing co-pilot (mcp-server/):
- /search, /match, /verify, /proof, /simulation/run, MCP tools all
  forward use_playbook_memory:true and playbook_memory_k:25 to the
  hybrid endpoint. Boost was previously dark across the entire app.
- /log no longer POSTs to /ingest/file — that endpoint REPLACES the
  dataset's object list, so single-row CSV writes were wiping all prior
  rows in successful_playbooks (sp_rows went 33→1 in one /log call).
  /log now seeds playbook_memory with canonical short text and calls
  /persist_sql to keep successful_playbooks_live in sync.
- /simulation/run cumulative end-of-week CSV write removed for the same
  reason. Per-day per-contract /seed (added in this session) is the
  accumulating feedback path now.
- search.html addWorkerInsight renders a green "Endorsed · N playbooks"
  chip with playbook citations when boost > 0.

Internal Dioxus UI (crates/ui/):
- Dashboard phase list rewritten through Phase 19 (was stuck at "Phase
  16: File Watcher" / "Phase 17: DB Connector" — both wrong).
- Removed fabricated "27ms" stat label.
- Ask tab examples + SQL default replaced with real staffing prompts
  against candidates/clients/job_orders (was referencing nonexistent
  employees/products/events).
- New Playbook tab exposes /vectors/playbook_memory/{stats,rebuild} and
  side-by-side hybrid search (boost OFF vs ON) with citations.

Tests (tests/multi-agent/):
- run_e2e_rated.ts: parallel two-agent (mistral + qwen2.5) build phase
  + verifier rating (geo, auth, persist, boost, speed → /10).
- network_proving.ts: continuous build → verify → repeat with
  staffing-recruiter profile hot-swap; geo-discrimination check.
- chain_of_custody.ts: single recruiter operation traced through every
  layer (Bun /search, direct /vectors/hybrid parity, /log, SQL,
  playbook_memory growth, profile activation, post-op boost lift).
2026-04-20 06:21:13 -05:00