diff --git a/mcp-server/spec.html b/mcp-server/spec.html
index 1b38c86..9b81834 100644
--- a/mcp-server/spec.html
+++ b/mcp-server/spec.html
@@ -251,19 +251,59 @@ table.plain tr:hover td{background:#0d1117}
Every sealed fill is seeded to playbook_memory via /vectors/playbook_memory/seed; the boost fires inside /vectors/hybrid when use_playbook_memory: true, so the next hybrid query for a semantically similar role+geo surfaces past endorsed workers with a boost. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:
per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers
boost[(city, state, name)] = min(Σ per_worker, 0.25)
Caps, decay, and negative signal mean one popular worker can't dominate, old playbooks fade, and no-shows stop boosting. Verified live: 3 identical seeds → +0.250 boost capped, 3 citations.
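The arithmetic can be sketched directly. Everything below is an illustrative stand-in, not the server's actual code; only the constants (0.5 weight, 30-day decay, 0.5 per failure, 0.25 cap) come from the formula above:

```typescript
// Hypothetical sketch of the per-worker boost math described above.
interface Playbook {
  workers: { name: string; city: string; state: string }[];
  ageDays: number;  // days since the playbook was sealed
  failures: number; // recorded no-shows against this playbook
}

function perWorkerBoost(cos: number, pb: Playbook): number {
  // cosine × 0.5 × e^(-age/30) × 0.5^failures / n_workers
  return (cos * 0.5 * Math.exp(-pb.ageDays / 30) * Math.pow(0.5, pb.failures)) / pb.workers.length;
}

// Sum contributions per (city, state, name) key, capped at 0.25.
function boosts(cos: number, playbooks: Playbook[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const pb of playbooks) {
    const per = perWorkerBoost(cos, pb);
    for (const w of pb.workers) {
      const key = `${w.city}|${w.state}|${w.name}`;
      out.set(key, Math.min((out.get(key) ?? 0) + per, 0.25));
    }
  }
  return out;
}

// Three identical fresh seeds for one worker: each contributes 1.0 × 0.5 = 0.5,
// so the sum hits the 0.25 cap immediately, matching the live verification.
const fresh: Playbook = {
  workers: [{ name: "R. Lewis", city: "Nashville", state: "TN" }],
  ageDays: 0,
  failures: 0,
};
const b = boosts(1.0, [fresh, fresh, fresh]);
```

Note how the cap is applied at accumulation time, so no single worker's key can exceed 0.25 regardless of seed count.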
Multi-strategy retrieval (new): before the cosine pass, compute_boost_for_filtered_with_role(target_geo, target_role) prefilters to same-city playbooks, pins exact (role, city, state) matches to similarity = 1.0, and lets them fill up to half the top-k; cosine ranking fills the rest. This mirrors 2026 Mem0/Zep guidance on parallel-strategy reranking.
Measured lift: before the geo-filter, a Nashville Welder query returned boosts=170 matched=0 (zero intersection with the candidate pool). After: boosts=36 matched=11. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? runs on every hybrid call, so this class of silent-miss bug stays visible.
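The fill policy can be sketched as follows. Names and the in-memory pool are hypothetical; the real compute_boost_for_filtered_with_role operates against the vector index, not an array:

```typescript
interface PB { role: string; city: string; state: string; emb: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Geo prefilter, then exact (role, city, state) matches pinned to 1.0
// fill up to half the top-k; cosine ranking fills the remaining slots.
function retrieve(queryEmb: number[], role: string, city: string, state: string,
                  pool: PB[], topK: number): { pb: PB; sim: number }[] {
  const sameCity = pool.filter(p => p.city === city && p.state === state);
  const exact = sameCity
    .filter(p => p.role === role)
    .slice(0, Math.floor(topK / 2))
    .map(pb => ({ pb, sim: 1.0 }));
  const rest = sameCity
    .filter(p => !exact.some(e => e.pb === p))
    .map(pb => ({ pb, sim: cosine(queryEmb, pb.emb) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, topK - exact.length);
  return [...exact, ...rest];
}
```

The geo prefilter is what closes the silent-miss gap: out-of-city playbooks can no longer consume top-k slots that the SQL-filtered candidate pool will never intersect.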
/vectors/playbook_memory/patterns goes beyond "who was endorsed" to answer "what did past similar fills have in common?" It aggregates recurring certifications, skills, archetypes, and the reliability distribution across the top-K semantically similar playbooks, surfacing signal the operator didn't explicitly query for.
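A minimal sketch of that aggregation, assuming hypothetical record and field names (the real endpoint's schema may differ):

```typescript
interface FillRecord {
  certifications: string[];
  skills: string[];
  archetype: string;
  reliability: number; // 0..1
}

// Count occurrences, keep anything seen at least twice, and summarize
// archetype / reliability across the top-K similar playbooks.
function patterns(topK: FillRecord[]) {
  const count = (xs: string[]) =>
    xs.reduce((m, x) => m.set(x, (m.get(x) ?? 0) + 1), new Map<string, number>());
  const certs = count(topK.flatMap(r => r.certifications));
  const skills = count(topK.flatMap(r => r.skills));
  const archetypes = count(topK.map(r => r.archetype));
  const rel = topK.map(r => r.reliability).sort((a, b) => a - b);
  return {
    recurringCerts: [...certs].filter(([, n]) => n >= 2).map(([c]) => c),
    recurringSkills: [...skills].filter(([, n]) => n >= 2).map(([s]) => s),
    dominantArchetype: [...archetypes].sort((a, b) => b[1] - a[1])[0]?.[0],
    reliabilityMedian: rel[Math.floor(rel.length / 2)],
  };
}
```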
The vectord::agent background task runs continuously. Watches the HNSW trial journal, proposes configs, executes trials, promotes Pareto winners — without human intervention. Operator sees "the index got faster overnight" and doesn't know why. The journal knows why.
Meta-layer over playbook_memory. Files under data/_kb/:
signatures.jsonl — sig_hash + embedding of every run's event shape
outcomes.jsonl — per-run summary (models, fill rate, turns, citations, rescue stats, staffer, elapsed)
pathway_recommendations.jsonl — AI-synthesized advice for the next run of a similar sig
error_corrections.jsonl — fail→succeed deltas on the same sig
config_snapshots.jsonl — hash of active models + tool_level at each run

Cycle: scenario ends → kb.indexRun() appends the outcome → kb.recommendFor(nextSpec) finds k-NN signatures, feeds their outcome history to an overview model, and writes structured JSON advice → the next scenario reads it via kb.loadRecommendation(spec) and injects pathway_notes into the executor's context alongside prior T3 lessons.
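The core of the cycle can be sketched in-memory. KB here is a hypothetical stand-in: the real store is the JSONL journals under data/_kb/, and the real recommender also routes outcome history through an overview model before writing advice:

```typescript
interface Outcome { sigHash: string; sigEmb: number[]; fillRate: number; turns: number }

class KB {
  private outcomes: Outcome[] = [];

  // Append-only, like outcomes.jsonl: every run adds a line, nothing is rewritten.
  indexRun(o: Outcome) { this.outcomes.push(o); }

  // k-NN over signature embeddings: the nearest past runs seed the advice
  // for the next scenario with a similar event shape.
  recommendFor(sigEmb: number[], k = 3): Outcome[] {
    const dist = (a: number[], b: number[]) =>
      Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
    return [...this.outcomes]
      .sort((x, y) => dist(sigEmb, x.sigEmb) - dist(sigEmb, y.sigEmb))
      .slice(0, k);
  }
}
```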
When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, requestCloudRemediation() feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to gpt-oss:120b on Ollama Cloud. Cloud returns structured {retry, new_city, new_state, new_role, new_count, rationale}. Event retries once with the pivot. Verified on stress_01: Gary IN (zero workers indexed) misplacement → cloud proposed South Bend IN → retry filled 1/1.
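The retry flow reduces to a small function. Everything here is an illustrative sketch: remediate stands in for the gpt-oss:120b call on Ollama Cloud, and the type names are assumptions, not the server's actual signatures:

```typescript
interface FailureTrace {
  sqlFilters: string;      // filters attempted before the failure
  rowCount: number;        // e.g. 0 for pool exhaustion
  driftNotes: string[];    // reviewer drift observations
  gapSignals: string[];    // detected gap signals
}

interface Pivot {
  retry: boolean;
  new_city?: string;
  new_state?: string;
  new_role?: string;
  new_count?: number;
  rationale: string;
}

// Feed the failure trace to the remediation model; if it proposes a pivot,
// retry the event exactly once with the new parameters.
function retryWithPivot(
  trace: FailureTrace,
  runEvent: (city: string, state: string) => boolean,
  remediate: (t: FailureTrace) => Pivot,
): boolean {
  const pivot = remediate(trace);
  if (!pivot.retry || !pivot.new_city || !pivot.new_state) return false;
  return runEvent(pivot.new_city, pivot.new_state);
}
```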
Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).
findNeighbors returns weighted_score = cosine × max_staffer_competence — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.
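The scoring reduces to two one-liners. Field names are hypothetical, but the weights and the cosine × max-competence product are taken from the text:

```typescript
// Each component is assumed normalized to [0, 1] before weighting.
interface StafferStats { fill: number; turnEff: number; cites: number; rescue: number }

// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
const competence = (s: StafferStats) =>
  0.45 * s.fill + 0.20 * s.turnEff + 0.20 * s.cites + 0.15 * s.rescue;

// findNeighbors-style ranking: cosine similarity scaled by the best
// competence among staffers associated with the playbook.
const weightedScore = (cosineSim: number, stafferCompetences: number[]) =>
  cosineSim * Math.max(...stafferCompetences);
```

Because the weights sum to 1.0, a perfect staffer leaves cosine untouched, while a weaker one strictly discounts it, which is what pushes juniors' playbooks below top performers' on similar scenarios.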
Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.
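The append-only persistence can be sketched with an in-memory stand-in; the real journal is the file data/_observer/ops.jsonl, and events arrive via POST /event on :3800 rather than a direct method call:

```typescript
// Per-event outcome with full provenance, as POSTed to the observer.
interface EventOutcome {
  staffer_id: string;
  sig_hash: string;
  event_kind: string;
  role: string;
  city: string;
  state: string;
  rescued: boolean;
}

// Append-only journal: unlike the old /ingest/file REPLACE path, entries are
// only ever appended, so the full trace survives restarts by replaying lines.
class OpsJournal {
  private lines: string[] = [];
  append(op: EventOutcome) { this.lines.push(JSON.stringify(op)); }
  replay(): EventOutcome[] { return this.lines.map(l => JSON.parse(l) as EventOutcome); }
}
```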
Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:
normalizeInput(raw) — accepts structured JSON, natural language, or mixed input. Three tiers: structured fast-path → regex (handles "need 3 welders in Nashville, TN" in 0ms, without an LLM call) → qwen3 LLM fallback for low-signal inputs.
POST /memory/query — one endpoint that returns every memory surface in a single bundle: playbook_workers (with boost + citations), pathway_recommendation (KB), neighbor_signatures (competence-weighted), prior_lessons (T3 overseer history), top_staffers (leaderboard), discovered_patterns (auto-surfaced reliable workers for this role+city), and latency_ms (per-source). Natural-language query end-to-end: 319ms.

Three of the five 2026-era memory findings remain unwired. They are flagged for near-term implementation, not hidden:
No valid_until or schema_fingerprint on playbooks. Load-bearing: when a schema migration changes a column, stale playbooks silently keep boosting. Biggest-value remaining fix.
/seed only ADDs. The same (operation, date) pair appends a duplicate instead of refining the existing entry, so the playbook file grows faster than necessary.

Validity windows are next — they preserve the trust signal (boost only fires on playbooks that are still true given the current schema) rather than the latency signal, which the current scale doesn't need yet.
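The first two tiers of normalizeInput can be sketched as follows; the regex and return shape are illustrative assumptions, and the qwen3 LLM fallback is represented by returning null rather than called:

```typescript
interface Parsed { count: number; role: string; city: string; state: string }

// Tier 1: structured fast-path. Tier 2: regex over "need N <role>s in City, ST".
// Tier 3 (low-signal → LLM) is elided; null means "fall through to the model".
function normalizeInput(raw: string | Parsed): Parsed | null {
  if (typeof raw !== "string") return raw; // structured fast-path
  const m = raw.match(/(\d+)\s+([a-z ]+?)s?\s+in\s+([A-Za-z .]+),\s*([A-Z]{2})/i);
  if (m) {
    return {
      count: Number(m[1]),
      role: m[2].trim(),
      city: m[3].trim(),
      state: m[4].toUpperCase(),
    };
  }
  return null; // low-signal input: would go to the qwen3 tier
}
```

The ordering is the point: the cheap tiers answer the common shapes in microseconds, and the LLM only sees inputs the regex cannot resolve.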