diff --git a/mcp-server/spec.html b/mcp-server/spec.html
index 1b38c86..9b81834 100644
--- a/mcp-server/spec.html
+++ b/mcp-server/spec.html
@@ -251,19 +251,59 @@ table.plain tr:hover td{background:#0d1117}
Chapter 6

How it gets better over time

-
Compounding learning in three paths — all three happen automatically, no operator intervention required.
+
Compounding learning across seven paths. The first three are automatic background loops. Paths 4–7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven run without operator intervention.
-

Path 1 — Playbook boost (Phase 19)

-

Every sealed fill is seeded to playbook_memory via /vectors/playbook_memory/seed. The next hybrid query for a semantically similar role+geo surfaces the past endorsed workers with a boost. Math:

+

Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)

+

Every sealed fill is seeded to playbook_memory. The boost fires inside /vectors/hybrid when use_playbook_memory: true. Math, tightened 2026-04-21 after a diagnostic pass found that globally ranked playbooks missed the SQL-filtered candidate pool entirely:

per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers
 boost[(city, state, name)] = min(Σ per_worker, 0.25)
-

Caps, decay, and negative signal mean one popular worker can't dominate, old playbooks fade, and no-shows stop boosting. Verified live: 3 identical seeds → +0.250 boost capped, 3 citations.
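A minimal sketch of that per-worker math (illustrative TypeScript; field names are assumptions, and the real implementation lives in Rust in playbook_memory.rs):

```typescript
// Illustrative sketch of the Path 1 boost math; names and shapes are
// assumptions, not the actual playbook_memory.rs implementation.
interface PlaybookWorker {
  cosine: number;   // cosine(query_emb, playbook_emb), assumed precomputed
  ageDays: number;  // playbook age in days (30-day decay constant)
  failures: number; // recorded no-shows for this worker
}

const BOOST_CAP = 0.25;

function playbookBoost(workers: PlaybookWorker[]): number {
  const n = workers.length;
  if (n === 0) return 0;
  const sum = workers.reduce((acc, w) => {
    // per_worker = cosine × 0.5 × e^(-age/30) × 0.5^failures / n_workers
    const perWorker =
      w.cosine * 0.5 * Math.exp(-w.ageDays / 30) *
      Math.pow(0.5, w.failures) / n;
    return acc + perWorker;
  }, 0);
  return Math.min(sum, BOOST_CAP); // cap: one popular worker can't dominate
}
```

With three identical fresh seeds (cosine 1, age 0, no failures) this caps at 0.25, matching the live verification above.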

+

Multi-strategy retrieval (new): before cosine ranking, compute_boost_for_filtered_with_role(target_geo, target_role) prefilters to same-city playbooks, assigns exact (role, city, state) matches similarity=1.0, and lets them fill up to half the top-k. Cosine fills the rest. Mirrors 2026 Mem0/Zep guidance on parallel-strategy rerank.
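The two-strategy fill can be sketched like this (hypothetical TypeScript over an in-memory candidate list; the real code is Rust, and these shapes are assumptions):

```typescript
// Sketch of the multi-strategy top-k fill; field names are assumed.
interface Playbook {
  role: string; city: string; state: string;
  cosine: number; // similarity to the query embedding, assumed precomputed
}

function multiStrategyTopK(cands: Playbook[], role: string, city: string,
                           state: string, k: number): Playbook[] {
  // Strategy 1: exact (role, city, state) matches count as similarity 1.0
  // and may fill up to half of top-k.
  const exact = cands
    .filter(p => p.role === role && p.city === city && p.state === state)
    .map(p => ({ ...p, cosine: 1.0 }))
    .slice(0, Math.floor(k / 2));
  // Strategy 2: cosine ranking over the remaining same-city playbooks
  // fills whatever slots are left.
  const rest = cands
    .filter(p => p.city === city && !(p.role === role && p.state === state))
    .sort((a, b) => b.cosine - a.cosine)
    .slice(0, k - exact.length);
  return [...exact, ...rest];
}
```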

+

Measured lift: before the geo filter, a Nashville Welder query returned boosts=170 matched=0 (zero intersection with the candidate pool). After: boosts=36 matched=11. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? runs on every hybrid call so this class of silent-miss bug stays visible.

Path 2 — Pattern discovery (meta-index)

/vectors/playbook_memory/patterns goes beyond "who was endorsed" to answer "what did past similar fills have in common?" Aggregates recurring certifications, skills, archetype, reliability distribution across the top-K semantically similar playbooks. Surfaces signal the operator didn't explicitly query for.
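A sketch of what that aggregation might look like (illustrative TypeScript; the endpoint's actual field names aren't shown in this spec):

```typescript
// Sketch of the /vectors/playbook_memory/patterns meta-index; the real
// endpoint's schema is not shown here, so these fields are assumptions.
interface PlaybookDoc {
  certifications: string[];
  skills: string[];
  archetype: string;
}

// Count how often each certification, skill, and archetype recurs across
// the top-K semantically similar playbooks.
function aggregatePatterns(topK: PlaybookDoc[]) {
  const count = (items: string[]) => {
    const m = new Map<string, number>();
    for (const it of items) m.set(it, (m.get(it) ?? 0) + 1);
    return m;
  };
  return {
    certifications: count(topK.flatMap(p => p.certifications)),
    skills: count(topK.flatMap(p => p.skills)),
    archetypes: count(topK.map(p => p.archetype)),
  };
}
```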

Path 3 — Autotune agent

The vectord::agent background task runs continuously: it watches the HNSW trial journal, proposes configs, executes trials, and promotes Pareto winners — all without human intervention. The operator sees "the index got faster overnight" and doesn't know why. The journal knows why.
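The promotion step amounts to a Pareto filter over trial results — a sketch, assuming a two-axis journal (recall up, latency down); the real trial schema isn't shown in this spec:

```typescript
// Sketch of the autotune agent's Pareto promotion; trial fields are assumed.
interface Trial {
  config: string;
  recallAt10: number;   // higher is better
  p99LatencyMs: number; // lower is better
}

// A trial is Pareto-dominated if some other trial is at least as good on
// both axes and strictly better on at least one.
function paretoWinners(trials: Trial[]): Trial[] {
  return trials.filter(t =>
    !trials.some(o =>
      o !== t &&
      o.recallAt10 >= t.recallAt10 &&
      o.p99LatencyMs <= t.p99LatencyMs &&
      (o.recallAt10 > t.recallAt10 || o.p99LatencyMs < t.p99LatencyMs)));
}
```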

+
+

Path 4 — Knowledge Base + pathway recommender (Phase 22)

+

Meta-layer over playbook_memory. Files under data/_kb/:

+
+

Cycle: scenario ends → kb.indexRun() appends outcome → kb.recommendFor(nextSpec) finds k-NN signatures, feeds outcome history to an overview model, writes structured JSON advice → next scenario reads it via kb.loadRecommendation(spec) and injects pathway_notes into the executor's context alongside prior T3 lessons.
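The cycle can be sketched as follows (hedged TypeScript; the kb.* signatures and spec shape here are modeled on the names above, not the real kb.ts API, and the k-NN + overview-model step is reduced to a raw fill-history summary):

```typescript
// Hypothetical sketch of the Path 4 loop: index outcome -> recommend ->
// load recommendation for the next scenario. Not the real kb.ts.
interface RunOutcome { sigHash: string; filled: number; requested: number; }
interface Recommendation { pathway_notes: string; }

class KnowledgeBase {
  private outcomes: RunOutcome[] = [];
  private recs = new Map<string, Recommendation>();

  // scenario ends -> append the outcome (append-only journal)
  indexRun(outcome: RunOutcome): void {
    this.outcomes.push(outcome);
  }

  // Real kb.ts does k-NN over signatures and feeds history to an overview
  // model; here we just summarize fills for the matching signature.
  recommendFor(spec: { sigHash: string }): Recommendation {
    const hist = this.outcomes.filter(o => o.sigHash === spec.sigHash);
    const filled = hist.reduce((a, o) => a + o.filled, 0);
    const requested = hist.reduce((a, o) => a + o.requested, 0);
    const rec = { pathway_notes: `prior fills: ${filled}/${requested}` };
    this.recs.set(spec.sigHash, rec);
    return rec;
  }

  // next scenario reads the structured advice back
  loadRecommendation(spec: { sigHash: string }): Recommendation | undefined {
    return this.recs.get(spec.sigHash);
  }
}
```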

+
+

Path 5 — Cloud rescue on failure (Phase 22 item B)

+

When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, requestCloudRemediation() feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to gpt-oss:120b on Ollama Cloud. Cloud returns structured {retry, new_city, new_state, new_role, new_count, rationale}. The event retries once with the pivot. Verified on stress_01: a misplacement to Gary IN (zero workers indexed) → cloud proposed South Bend IN → retry filled 1/1.
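A sketch of the retry-once wiring (the Remediation shape mirrors the struct quoted above; the executor plumbing and the "filled 0 means failure" test are simplified assumptions):

```typescript
// Sketch of Path 5's retry-once pivot; executor wiring is hypothetical.
interface Remediation {
  retry: boolean;
  new_city?: string;
  new_state?: string;
  new_role?: string;
  new_count?: number;
  rationale: string;
}

interface EventSpec { city: string; state: string; role: string; count: number; }
type RunFn = (spec: EventSpec) => { filled: number };

// Run once; on failure, ask for a remediation and retry exactly once
// with any pivoted fields applied over the original spec.
function runWithCloudRescue(spec: EventSpec, run: RunFn,
                            remediate: (spec: EventSpec) => Remediation) {
  const result = run(spec);
  if (result.filled > 0) return { result, rescued: false };
  const r = remediate(spec);
  if (!r.retry) return { result, rescued: false };
  const pivoted: EventSpec = {
    city: r.new_city ?? spec.city,
    state: r.new_state ?? spec.state,
    role: r.new_role ?? spec.role,
    count: r.new_count ?? spec.count,
  };
  return { result: run(pivoted), rescued: true };
}
```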

+
+

Path 6 — Staffer competence-weighted retrieval (Phase 23)

+

Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).
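The score itself is just the stated weighted sum (normalizing each stat to the 0..1 range is an assumption about recomputeStafferStats):

```typescript
// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
// Each stat assumed normalized to 0..1 so the score also lands in 0..1.
interface StafferStats {
  fillRate: number;
  turnEff: number;
  citeDensity: number;
  rescueRate: number;
}

function competenceScore(s: StafferStats): number {
  return 0.45 * s.fillRate + 0.20 * s.turnEff +
         0.20 * s.citeDensity + 0.15 * s.rescueRate;
}
```

The weights sum to 1.0, so a staffer who is perfect on every stat scores exactly 1.0.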

+

findNeighbors returns weighted_score = cosine × max_staffer_competence — playbooks from top performers rank above those from juniors on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers and Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.
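A minimal rerank sketch under the same assumed shapes (not the real findNeighbors):

```typescript
// Sketch of competence-weighted reranking; neighbor shape is assumed.
interface Neighbor {
  playbookId: string;
  cosine: number;
  stafferCompetences: number[]; // competence_score of each staffer on it
}

function rerank(neighbors: Neighbor[]) {
  return neighbors
    .map(n => ({
      ...n,
      // weighted_score = cosine × max competence among attached staffers
      weightedScore: n.cosine * Math.max(...n.stafferCompetences),
    }))
    .sort((a, b) => b.weightedScore - a.weightedScore);
}
```

Note the effect: a slightly lower-cosine playbook from a high-competence staffer can outrank a closer match handled by a junior.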

+
+

Path 7 — Observer outcome ingest (Phase 24)

+

Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.
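A sketch of the outcome POST and the append-only journal line (the wire schema is an assumption built from the provenance fields listed above; port and path are taken from the text):

```typescript
// Sketch of a Path 7 outcome event; exact wire schema is assumed.
interface OutcomeEvent {
  staffer_id: string;
  sig_hash: string;
  event_kind: string;
  role: string;
  city: string;
  state: string;
  rescued: boolean;
}

// One JSON object per line: the append-only ops.jsonl journal format.
function toJsonlLine(ev: OutcomeEvent): string {
  return JSON.stringify(ev) + "\n";
}

// POST to the observer's HTTP listener (":3800", "/event" per the text).
async function postOutcome(ev: OutcomeEvent,
                           base = "http://127.0.0.1:3800"): Promise<void> {
  await fetch(`${base}/event`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(ev),
  });
}
```

Append-only matters here: a REPLACE-style persist loses the trace on restart, while one-object-per-line journaling lets the observer replay everything it has ever ingested.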

+
+

Input normalizer + unified memory query

+

Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:

+
+
+

Honest gaps — what we can still implement

+

Three of the five 2026-era memory findings remain unwired. Flagged for near-term implementation, not hidden:

+
+

Validity windows are next — they preserve the trust signal (the boost only fires on playbooks that are still true given the current schema) rather than the latency signal (which the current scale doesn't need yet).

+
+
Code: crates/vectord/src/{playbook_memory.rs, service.rs} · tests/multi-agent/{kb.ts, memory_query.ts, normalize.ts, scenario.ts} · mcp-server/{observer.ts, index.ts} · data/_kb/ · data/_observer/