From 3fb3a60da41dbc59e7b19260462291711d0d1174 Mon Sep 17 00:00:00 2001
From: root
Date: Tue, 21 Apr 2026 00:03:06 -0500
Subject: [PATCH] =?UTF-8?q?Spec=20ch6=20rewrite=20=E2=80=94=203=20learning?=
 =?UTF-8?q?=20paths=20=E2=86=92=207=20+=20honest=20gap=20list?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

J flagged the spec out of alignment with what's built. Ch6 now reflects
the full current architecture:

- Path 1 (playbook boost) — formula kept; geo+role prefilter refinement
  called out with measured 14× citation lift
- Path 2 (pattern discovery) — unchanged
- Path 3 (autotune agent) — unchanged
- Path 4 (KB + pathway recommender) — Phase 22, file layout documented
- Path 5 (cloud rescue on failure) — Phase 22 item B, verified stress_01
  example cited
- Path 6 (staffer competence-weighted retrieval) — Phase 23,
  competence_score formula included, cross-staffer auto-discovered
  worker labels (Rachel D. Lewis 18× endorsements)
- Path 7 (observer outcome ingest) — Phase 24, :3800 HTTP listener +
  ops.jsonl append journal

Input normalizer + unified /memory/query surface documented as the
"seamless with whatever input" answer, with the 319ms natural-language
latency number.

Honest gaps kept visible in the spec itself, not hidden:

- Zep validity windows (most load-bearing remaining)
- Mem0 UPDATE/DELETE/NOOP ops
- Letta working-memory hot cache

Live at https://devop.live/lakehouse/spec#ch6 after service restart.
Verified post-deploy: geo+role prefilter, 14× delta, validity windows
gap all present in served HTML.
---
 mcp-server/spec.html | 48 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/mcp-server/spec.html b/mcp-server/spec.html
index 1b38c86..9b81834 100644
--- a/mcp-server/spec.html
+++ b/mcp-server/spec.html
@@ -251,19 +251,59 @@ table.plain tr:hover td{background:#0d1117}
Chapter 6

How it gets better over time

-
Compounding learning in three paths — all three happen automatically, no operator intervention required.
+
Compounding learning across seven paths. The first three are automatic background loops. Paths 4-7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven happen without operator intervention.
-

Path 1 — Playbook boost (Phase 19)

-

Every sealed fill is seeded to playbook_memory via /vectors/playbook_memory/seed. The next hybrid query for a semantically similar role+geo surfaces the past endorsed workers with a boost. Math:

+

Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)

+

Every sealed fill is seeded to playbook_memory. The boost fires inside /vectors/hybrid when use_playbook_memory: true. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:

per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers
 boost[(city, state, name)] = min(Σ per_worker, 0.25)
-

Caps, decay, and negative signal mean one popular worker can't dominate, old playbooks fade, and no-shows stop boosting. Verified live: 3 identical seeds → +0.250 boost capped, 3 citations.
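The math above can be sketched in TypeScript (the production code is Rust in crates/vectord/src/playbook_memory.rs; the type and function names here are illustrative, not the actual API):

```typescript
type PlaybookMatch = {
  cosine: number;    // cosine(query_emb, playbook_emb)
  ageDays: number;   // playbook age in days
  failures: number;  // recorded no-shows for this worker
  nWorkers: number;  // workers on the playbook
};

// per_worker = cosine × 0.5 × e^(-age/30) × 0.5^failures / n_workers
function perWorker(m: PlaybookMatch): number {
  return (m.cosine * 0.5 * Math.exp(-m.ageDays / 30) * 0.5 ** m.failures) / m.nWorkers;
}

// boost[(city, state, name)] = min(Σ per_worker, 0.25)
function workerBoost(matches: PlaybookMatch[]): number {
  const sum = matches.reduce((acc, m) => acc + perWorker(m), 0);
  return Math.min(sum, 0.25);
}
```

Three identical fresh seeds (cosine 1.0, age 0, no failures, one worker each) sum to 1.5 and hit the cap, reproducing the +0.250 verified above; a 30-day-old playbook with one failure contributes under 0.1 on its own.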

+

Multi-strategy retrieval (new): before cosine, compute_boost_for_filtered_with_role(target_geo, target_role) prefilters to same-city playbooks, then gives exact (role, city, state) matches similarity=1.0 and fills up to half the top-k. Cosine fills the rest. Mirrors 2026 Mem0/Zep guidance on parallel-strategy rerank.
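A hedged sketch of the prefilter-plus-fill strategy, assuming a flat playbook list. compute_boost_for_filtered_with_role lives on the Rust side, so the types and the exact fill policy below are assumptions, not the shipped logic:

```typescript
type Playbook = { role: string; city: string; state: string; emb: number[] };
type Scored = { pb: Playbook; sim: number };

function selectPlaybooks(
  all: Playbook[],
  target: { role: string; city: string; state: string },
  queryEmb: number[],
  k: number,
  cosine: (a: number[], b: number[]) => number,
): Scored[] {
  // Strategy 1: prefilter to same-city playbooks so the candidate pool
  // always intersects the SQL-filtered workers.
  const sameCity = all.filter(p => p.city === target.city && p.state === target.state);
  // Strategy 2: exact (role, city, state) matches get similarity 1.0 and
  // may occupy at most half of the top-k slots.
  const exact: Scored[] = sameCity
    .filter(p => p.role === target.role)
    .slice(0, Math.floor(k / 2))
    .map(pb => ({ pb, sim: 1.0 }));
  // Cosine ranking over the remaining same-city playbooks fills the rest.
  const rest: Scored[] = sameCity
    .filter(p => p.role !== target.role)
    .map(pb => ({ pb, sim: cosine(queryEmb, pb.emb) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, k - exact.length);
  return [...exact, ...rest];
}
```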

+

Measured lift: before geo-filter, Nashville Welder query returned boosts=170 matched=0 (zero intersection with candidate pool). After: boosts=36 matched=11. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? runs on every hybrid call so the class of silent-miss bug stays visible.

Path 2 — Pattern discovery (meta-index)

/vectors/playbook_memory/patterns goes beyond "who was endorsed" to answer "what did past similar fills have in common?" Aggregates recurring certifications, skills, archetype, reliability distribution across the top-K semantically similar playbooks. Surfaces signal the operator didn't explicitly query for.

Path 3 — Autotune agent

The vectord::agent background task runs continuously. Watches the HNSW trial journal, proposes configs, executes trials, promotes Pareto winners — without human intervention. Operator sees "the index got faster overnight" and doesn't know why. The journal knows why.

+ +

Path 4 — Knowledge Base + pathway recommender (Phase 22)

+

Meta-layer over playbook_memory. Files under data/_kb/:

+ +

Cycle: scenario ends → kb.indexRun() appends outcome → kb.recommendFor(nextSpec) finds k-NN signatures, feeds outcome history to an overview model, writes structured JSON advice → next scenario reads it via kb.loadRecommendation(spec) and injects pathway_notes into the executor's context alongside prior T3 lessons.
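The cycle above could be sketched as follows. kb.ts itself is not shown in this patch, so the JSONL layout, the Euclidean distance, and the function signatures are assumptions standing in for indexRun/recommendFor (and the overview-model pass is omitted):

```typescript
import * as fs from "node:fs";

type RunOutcome = {
  sig: number[];                  // scenario signature vector
  spec: Record<string, string>;   // role/geo spec of the run
  filled: number;
  requested: number;
};

// Append each finished run to an append-only JSONL index.
function indexRun(path: string, outcome: RunOutcome): void {
  fs.appendFileSync(path, JSON.stringify(outcome) + "\n");
}

// Recommend for a new spec by k-NN over stored signatures.
function recommendFor(path: string, sig: number[], k: number): RunOutcome[] {
  if (!fs.existsSync(path)) return [];
  const runs: RunOutcome[] = fs.readFileSync(path, "utf8")
    .split("\n").filter(Boolean).map(l => JSON.parse(l));
  const dist = (a: number[], b: number[]) => Math.hypot(...a.map((x, i) => x - b[i]));
  return runs.sort((a, b) => dist(a.sig, sig) - dist(b.sig, sig)).slice(0, k);
}
```

The outcome history returned here is what would be fed to the overview model to produce the structured advice the next scenario reads.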

+ +

Path 5 — Cloud rescue on failure (Phase 22 item B)

+

When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, requestCloudRemediation() feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to gpt-oss:120b on Ollama Cloud. Cloud returns structured {retry, new_city, new_state, new_role, new_count, rationale}. Event retries once with the pivot. Verified on stress_01: Gary IN (zero workers indexed) misplacement → cloud proposed South Bend IN → retry filled 1/1.
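A minimal sketch of the retry-once pivot, with run and requestCloudRemediation stubbed out. The structured remediation shape is taken from the text above; the control flow and names around it are assumed:

```typescript
type Remediation = {
  retry: boolean;
  new_city?: string;
  new_state?: string;
  new_role?: string;
  new_count?: number;
  rationale: string;
};
type EventSpec = { city: string; state: string; role: string; count: number };

// On failure, ask the cloud model for a structured remediation and re-run
// the event exactly once with the proposed pivot applied.
async function runWithCloudRescue(
  spec: EventSpec,
  run: (s: EventSpec) => Promise<{ ok: boolean }>,
  requestCloudRemediation: (s: EventSpec) => Promise<Remediation>,
): Promise<{ ok: boolean; rescued: boolean }> {
  const first = await run(spec);
  if (first.ok) return { ok: true, rescued: false };
  const fix = await requestCloudRemediation(spec);
  if (!fix.retry) return { ok: false, rescued: false };
  const pivoted: EventSpec = {
    city: fix.new_city ?? spec.city,
    state: fix.new_state ?? spec.state,
    role: fix.new_role ?? spec.role,
    count: fix.new_count ?? spec.count,
  };
  const second = await run(pivoted);
  return { ok: second.ok, rescued: second.ok };
}
```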

+ +

Path 6 — Staffer competence-weighted retrieval (Phase 23)

+

Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).

+

findNeighbors returns weighted_score = cosine × max_staffer_competence — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.
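The scoring can be written straight from the weights above. The two helpers below are illustrative only, not the recomputeStafferStats/findNeighbors internals:

```typescript
type StafferStats = {
  fillRate: number;     // filled / requested
  turnEff: number;      // turn efficiency, normalized 0..1
  citeDensity: number;  // playbook citations per run, normalized 0..1
  rescueRate: number;   // successful rescues / failures, 0..1
};

// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
function competenceScore(s: StafferStats): number {
  return 0.45 * s.fillRate + 0.20 * s.turnEff + 0.20 * s.citeDensity + 0.15 * s.rescueRate;
}

// weighted_score = cosine × max competence among staffers on the playbook
function weightedScore(cosine: number, stafferScores: number[]): number {
  return cosine * Math.max(...stafferScores);
}
```

Because the weights sum to 1.0, competence_score stays in [0, 1] when each input is in [0, 1], so it scales cosine without ever inflating it.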

+ +

Path 7 — Observer outcome ingest (Phase 24)

+

Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.
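An append-only journal sketch under assumed shapes (the real observer.ts handler is not shown in this patch; helper names and the 204/400 responses are assumptions). The point it illustrates is the persistence choice: append, never REPLACE, so the trace survives restarts:

```typescript
import * as fs from "node:fs";
import * as http from "node:http";

// Validate the POSTed outcome, then append it to the JSONL journal.
function appendOutcome(journal: string, body: string): void {
  JSON.parse(body); // reject malformed payloads before touching the journal
  fs.appendFileSync(journal, body.trim() + "\n");
}

function startListener(port: number, journal: string): http.Server {
  return http
    .createServer((req, res) => {
      if (req.method === "POST" && req.url === "/event") {
        let body = "";
        req.on("data", chunk => (body += chunk));
        req.on("end", () => {
          try {
            appendOutcome(journal, body);
            res.writeHead(204);
          } catch {
            res.writeHead(400);
          }
          res.end();
        });
      } else {
        res.writeHead(404);
        res.end();
      }
    })
    .listen(port);
}
```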

+ +

Input normalizer + unified memory query

+

Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:

+
Input normalizer — accepts whatever input shape the caller sends and coerces it into the canonical query spec: the "seamless with whatever input" answer.
Unified /memory/query — one query surface over the whole memory stack; natural-language queries answered in 319ms measured.
+
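A normalizer sketch, assuming the kinds of input shapes a caller might send (bare string, partial object, legacy {q, top_k} fields). Field names and fallbacks are hypothetical; the real logic lives in tests/multi-agent/normalize.ts:

```typescript
type MemoryQuery = { text: string; role?: string; city?: string; state?: string; k: number };

// Coerce any caller-supplied shape into one canonical query so every
// memory surface downstream sees the same spec.
function normalizeQuery(input: unknown): MemoryQuery {
  if (typeof input === "string") return { text: input, k: 10 };
  const o = (input ?? {}) as Record<string, unknown>;
  return {
    text: String(o.text ?? o.q ?? ""),
    role: typeof o.role === "string" ? o.role : undefined,
    city: typeof o.city === "string" ? o.city : undefined,
    state: typeof o.state === "string" ? o.state : undefined,
    k: Number(o.k ?? o.top_k ?? 10),
  };
}
```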

Honest gaps — what remains to be implemented

+

Three of the five 2026-era memory findings remain unwired. Flagged for near-term implementation, not hidden:

+
Zep validity windows — most load-bearing remaining gap.
Mem0 UPDATE/DELETE/NOOP memory ops.
Letta working-memory hot cache.
+

Validity windows are next — they preserve the trust signal (the boost only fires on playbooks that are still true given the current schema) rather than the latency signal (which the current scale doesn't need yet).

+ +
Code: crates/vectord/src/{playbook_memory.rs, service.rs} · tests/multi-agent/{kb.ts, memory_query.ts, normalize.ts, scenario.ts} · mcp-server/{observer.ts, index.ts} · data/_kb/ · data/_observer/