Hot path (T1/T2) stays mistral + qwen2.5. The new T3 tier runs a
thinking model SPARINGLY — after every misplacement, every N-th event
(default N=3), and once post-scenario for the cross-day lesson.
- agent.ts: generateCloud() for Ollama Cloud (gpt-oss:120b etc). Uses
the same /api/generate shape; thinking field is discarded.
- scenario.ts: runOverviewCheckpoint + runCrossDayLesson. Outputs land
in checkpoints.jsonl and lesson.md. Lesson also seeds playbook_memory
under operation "cross-day-lesson-{date}" — future runs pick it up
through the existing similarity boost.
- Env knobs: LH_OVERVIEW_CLOUD=1 routes T3 to cloud, LH_OVERVIEW_MODEL
overrides (default gpt-oss:20b local, gpt-oss:120b cloud),
LH_T3_CHECKPOINT_EVERY controls cadence, LH_T3_DISABLE=1 turns it off.
Why this shape: prior feedback_phase19_seed_text.md warned that verbose
seeds dilute the embedding and silently kill the boost. T3's rich prose
goes to lesson.md; the embedded "approach" + "context" stay terse.
Verified end-to-end: local 20b checkpoint 10.9s, lesson 4.0s; cloud
120b lesson 3.7s. Cloud output is both faster AND more specific than
local (sequenced, tactical, logging advice included).
parseAction now strips stray `)` before `}` and trailing commas —
qwen2.5 emits those regularly on tool_call outputs; soft-fix beats
retry-loops. hybrid_search no longer hard-requires `question`; defaults
to "qualified available workers" when the model drops it (mistral's
most common failure mode on complex events).
Kept original TOOL_CATALOG shape (args examples only, not full
action envelopes). The verbose few-shot version from the prior
iteration confused mistral into wrapping propose_done as tool_call.
Scenario V7 result: expansion (5 Forklift Ops) and emergency
(4 Loaders) — previously-failing complex events — now seal reliably.
Pool sizes: 687 and 380 from 500K corpus. Patterns endpoint produces
real operator-actionable signals:
expansion: "recurring certifications: Forklift (40%), OSHA-10 (40%)
· recurring skills: mill (40%) · archetype mostly: leader
· reliability median 0.83"
Baseline + recurring are now flaky (inverted trade-off, pure
model-reliability variance).
Backend:
- crates/vectord/src/playbook_memory.rs (new): Phase 19 in-memory boost
store with seed/rebuild/snapshot, plus temporal decay (e^-age/30 per
playbook), persist_to_sql endpoint backing successful_playbooks_live,
and discover_patterns endpoint for meta-index pattern aggregation
(recurring certs/skills/archetype/reliability across similar past fills).
- DEFAULT_TOP_K_PLAYBOOKS bumped 5 → 25; old default silently missed
most boosts when memory had > 25 entries.
- service.rs: new routes /vectors/playbook_memory/{seed,rebuild,stats,
persist_sql,patterns}.
Bun staffing co-pilot (mcp-server/):
- /search, /match, /verify, /proof, /simulation/run, MCP tools all
forward use_playbook_memory:true and playbook_memory_k:25 to the
hybrid endpoint. Boost was previously dark across the entire app.
- /log no longer POSTs to /ingest/file — that endpoint REPLACES the
dataset's object list, so single-row CSV writes were wiping all prior
rows in successful_playbooks (sp_rows went 33→1 in one /log call).
/log now seeds playbook_memory with canonical short text and calls
/persist_sql to keep successful_playbooks_live in sync.
- /simulation/run cumulative end-of-week CSV write removed for the same
reason. Per-day per-contract /seed (added in this session) is the
accumulating feedback path now.
- search.html addWorkerInsight renders a green "Endorsed · N playbooks"
chip with playbook citations when boost > 0.
Internal Dioxus UI (crates/ui/):
- Dashboard phase list rewritten through Phase 19 (was stuck at "Phase
16: File Watcher" / "Phase 17: DB Connector" — both wrong).
- Removed fabricated "27ms" stat label.
- Ask tab examples + SQL default replaced with real staffing prompts
against candidates/clients/job_orders (was referencing nonexistent
employees/products/events).
- New Playbook tab exposes /vectors/playbook_memory/{stats,rebuild} and
side-by-side hybrid search (boost OFF vs ON) with citations.
Tests (tests/multi-agent/):
- run_e2e_rated.ts: parallel two-agent (mistral + qwen2.5) build phase
+ verifier rating (geo, auth, persist, boost, speed → /10).
- network_proving.ts: continuous build → verify → repeat with
staffing-recruiter profile hot-swap; geo-discrimination check.
- chain_of_custody.ts: single recruiter operation traced through every
layer (Bun /search, direct /vectors/hybrid parity, /log, SQL,
playbook_memory growth, profile activation, post-op boost lift).