demo: spec — refresh repo layout + model fleet + per-staffer + paths 8-9
J: "how about devop.live/lakehouse/spec." Spec was anchored on
2026-04-21 state (v2 footer): mistral mentioned in the model matrix,
13 crates not 15, missing validator/truth/auditor crates, no mention
of OpenCode 40-model fleet, no pathway memory, no per-staffer
hot-swap, no Construction Activity Signal Engine, ADR count was 20.
Footer claimed Phases 19-25.
Edits, in order:
Ch1 Repository layout
+ crates/truth/ (ADR-021 rule store)
+ crates/validator/ (Phase 43 — schema/completeness/policy gates)
+ auditor/ (cross-lineage Kimi↔Haiku/Opus auto-promote)
+ scripts/distillation/ (frozen substrate v1.0.0 at e7636f2)
Updated aibridge to mention ProviderAdapter dispatch
Updated gateway to mention OpenAI-compat /v1/* drop-in middleware
Updated mcp-server route list to include /staffers + profiler/contractor pages
Updated config/ to mention modes.toml + providers.toml + routing.toml
Updated docs/ ADR count from 20 → 21
Updated data/ to mention _pathway_memory + _auditor/kimi_verdicts
Ch3 Measurement & indexing
REPLACED stale "Model matrix (Phase 20)" T1-T5 table that
mentioned mistral with the current 5-provider fleet:
ollama / ollama_cloud / openrouter / opencode (40 models, one
sk-* key reaches Claude Opus 4.7, GPT-5.5-pro, Gemini 3.1-pro,
Kimi K2.6, GLM, DeepSeek, Qwen, MiniMax, free) / kimi
ADDED 9-rung cloud-first ladder pseudocode
ADDED N=3 consensus + cross-architecture tie-breaker math
ADDED auditor cross-lineage Kimi K2.6 ↔ Haiku 4.5 + Opus auto-promote
ADDED distillation v1.0.0 freeze paragraph (145 tests, 22/22, 16/16)
Updated Continuation primitive to mention Phase 44 Rust port
Ch5 What a CRM can't do
Extended the table with 6 new capabilities:
- Per-staffer relevance gradient
- Triage in one shot (late-worker → backfills + draft SMS)
- Permit → fill plan derivation
- Public-issuer attribution across contractor graph
- Cross-lineage AI audit on every PR
- Pathway memory (system-level hot-swap, ADR-021)
Ch6 How it gets better over time
Lede updated from 7 paths → 10 paths
NEW Path 7 — Pathway memory (ADR-021)
NEW Path 8 — Per-staffer hot-swap index
NEW Path 9 — Construction Activity Signal Engine
Original Path 7 (observer ingest) renumbered to Path 10
Ch9 Per-staffer context
Lede now anchors Path 8 from Ch6
NEW lead section: Per-staffer hot-swap index — Maria/Devon/Aisha,
same query reshapes per coordinator (167 IL / 89 IN / 16 WI),
MARIA'S MEMORY pill, /staffers endpoint, metro-agnostic by
construction. The original Phase 17 profile / Phase 23 competence
sections retained beneath as the deeper architecture detail.
Ch10 A day in the life
Updated 14:00 emergency event to use the late-worker triage
handler — coordinator types "Dave running late site 4422", gets
profile + draft SMS + 5 backfills + Copy SMS button in 250ms.
The old Click No-show button → /log_failure flow remains valid
(penalty still records); the user-facing surface is the new
triage card.
Ch11 Known limits
REPLACED the Mem0/Letta/Phase-26 era list with current honest
limits: BAI persistence + backtesting, NYC DOB adapter, 12
awaiting public-data sources for contractor profile, rate/margin
awareness, Mem0-style UPDATE/DELETE, Letta hot cache (now 5K
not 1.9K), confidence calibration, SEC fuzzy precision, tighter
pathway+scrum integration.
Footer
v2 2026-04-21 → v3 2026-04-27
Phases 19-25 → 19-45
Lists today's phases: distillation v1.0.0 substrate, gateway as
OpenAI-compat drop-in, mode runner, validator + iterate, ADR-021
pathway memory, per-staffer hot-swap, Construction Activity Signal
Engine.
Nav
+ Profiler link
Date pill v1 · 2026-04-20 → v3 · 2026-04-27
Verified end-to-end on devop.live/lakehouse/spec — 11 chapter h2s
render in order, 67KB page (was 50KB-ish), all internal links resolve.
This commit is contained in: parent 631b0329b1 · commit d571d62e9b

@@ -78,13 +78,14 @@ table.plain tr:hover td{background:#0d1117}
<nav>
<a href=".">Dashboard</a>
<a href="console">Walkthrough</a>
<a href="profiler">Profiler</a>
<a href="proof">Architecture</a>
<a href="spec" class="active">Spec</a>
<a href="onboard">Onboard</a>
<a href="alerts">Alerts</a>
<a href="workspaces">Workspaces</a>
</nav>
<div class="rt">v1 · 2026-04-20</div>
<div class="rt">v3 · 2026-04-27</div>
</div>

<div class="layout">

@@ -120,14 +121,18 @@ table.plain tr:hover td{background:#0d1117}
<tr><td class="mono">crates/vectord/</td><td>The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here.</td></tr>
<tr><td class="mono">crates/vectord-lance/</td><td>Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019).</td></tr>
<tr><td class="mono">crates/journald/</td><td>Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit.</td></tr>
<tr><td class="mono">crates/aibridge/</td><td>Rust ↔ Python sidecar. HTTP client over FastAPI wrapper around Ollama. VRAM introspection via nvidia-smi. All LLM calls (embed, generate, rerank) flow through here.</td></tr>
<tr><td class="mono">crates/gateway/</td><td>Axum HTTP (:3100) + gRPC (:3101). Auth middleware, tools registry (Phase 12 — governed actions), CORS. Every external request enters here.</td></tr>
<tr><td class="mono">crates/truth/</td><td>File-backed rule store. <code>evaluate(task_class, ctx) → Vec<RuleOutcome></code> (ADR-021 — semantic-correctness matrix layer). Loaded from <code>truth/*.toml</code> at gateway boot.</td></tr>
<tr><td class="mono">crates/aibridge/</td><td>Rust ↔ Python sidecar + provider adapter trait. HTTP client over FastAPI wrapper around Ollama for local; <code>ProviderAdapter</code> dispatch for cloud (ollama_cloud, openrouter, opencode, kimi). VRAM introspection via nvidia-smi. All LLM calls flow through here.</td></tr>
<tr><td class="mono">crates/gateway/</td><td>Axum HTTP (:3100) + gRPC (:3101). OpenAI-compat <code>/v1/*</code> (drop-in middleware), mode runner (<code>/v1/mode/execute</code>), validator (<code>/v1/validate</code>), iterate loop (<code>/v1/iterate</code>), tools registry, cost telemetry, Langfuse + observer fan-out on every chat. Every external request enters here.</td></tr>
<tr><td class="mono">crates/validator/</td><td>Phase 43 production validator. Schema / completeness / consistency / policy gates over LLM outputs. <code>FillValidator</code>, <code>EmailValidator</code>, <code>ParquetWorkerLookup</code> (loads workers_500k.parquet at boot). Fail-closed when roster absent.</td></tr>
<tr><td class="mono">crates/ui/</td><td>Dioxus WASM developer UI. Internal tool. Not exposed externally.</td></tr>
<tr><td class="mono">mcp-server/</td><td>Bun/TypeScript recruiter-facing app. Serves <code>devop.live/lakehouse</code>. Routes: <code>/search /match /log /log_failure /clients/:c/blacklist /intelligence/* /memory/query /models/matrix /system/summary</code>. Observer sibling at <code>observer.ts</code> with HTTP listener on :3800 for scenario event ingest. Proxies to the Rust gateway for heavy work.</td></tr>
<tr><td class="mono">tests/multi-agent/</td><td>Dual-agent scenario harness + memory stack. <code>agent.ts</code> (prompts, continuation + tree-split primitives, cloud routing), <code>orchestrator.ts</code>, <code>scenario.ts</code> (contracts + staffer + tool_level), <code>kb.ts</code> (KB indexing, competence scoring, neighbor retrieval), <code>normalize.ts</code> (input normalizer — structured / regex / LLM), <code>memory_query.ts</code> (unified /memory/query), <code>gen_scenarios.ts</code> + <code>gen_staffer_demo.ts</code> (corpus generators), <code>run_e2e_rated.ts</code>, <code>chain_of_custody.ts</code>. Unit tests colocated (<code>kb.test.ts</code>, <code>normalize.test.ts</code>).</td></tr>
|
||||
<tr><td class="mono">config/</td><td><code>models.json</code> — authoritative 5-tier model matrix (T1 hot local / T2 review local / T3 overview cloud / T4 strategic / T5 gatekeeper). Per-tier context_window + context_budget + overflow_policy. Read at runtime by scenario.ts; hot-swap friendly.</td></tr>
|
||||
<tr><td class="mono">docs/</td><td><code>PRD.md</code>, <code>PHASES.md</code>, <code>DECISIONS.md</code> (20 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.</td></tr>
|
||||
<tr><td class="mono">data/</td><td>Default local object store. Parquet files per dataset, append-log batches, HNSW trial journals, promotion registries, <code>_playbook_memory/state.json</code> (now with retirement fields — Phase 25), catalog manifests. Plus four learning-loop directories: <code>_kb/</code> (signatures, outcomes, recommendations, error_corrections, config_snapshots, staffers), <code>_playbook_lessons/</code> (T3 cross-day lessons archived per run), <code>_observer/ops.jsonl</code> (append journal, durable scenario outcome stream), <code>_chunk_cache/</code> (spec'd for Phase 21 Rust port). Rebuildable from repo + this dir alone.</td></tr>
|
||||
<tr><td class="mono">mcp-server/</td><td>Bun/TypeScript public-facing app + MCP tool surface. Serves <code>devop.live/lakehouse</code>. Pages: dashboard / console / profiler / contractor / proof / spec / onboard / alerts / workspaces. Routes: <code>/search /match /log /log_failure /clients/:c/blacklist /intelligence/* /staffers /memory/query /models/matrix /system/summary</code>. Observer sibling at <code>observer.ts</code> on :3800 for event ingest.</td></tr>
|
||||
<tr><td class="mono">auditor/</td><td>External claim-vs-diff verifier on PRs. Polls Gitea for open PRs, builds adversarial prompt from PRD invariants + staffing matrix, alternates Kimi K2.6 ↔ Haiku 4.5 by SHA, auto-promotes Claude Opus 4.7 on diffs >100k chars. Per-PR cap=3 with auto-reset on each new head SHA. Verdicts at <code>data/_auditor/kimi_verdicts/</code>.</td></tr>
|
||||
<tr><td class="mono">tests/multi-agent/</td><td>Multi-agent scenario harness + memory stack. <code>agent.ts</code>, <code>scenario.ts</code> (contracts + staffer + tool_level), <code>kb.ts</code> (KB indexing, competence scoring), <code>normalize.ts</code>, <code>memory_query.ts</code>, <code>run_e2e_rated.ts</code>. Unit tests colocated.</td></tr>
|
||||
<tr><td class="mono">scripts/distillation/</td><td>Distillation substrate v1.0.0 (frozen at tag <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>). 145 unit tests, 22/22 acceptance, 16/16 audit-full, bit-identical reproducibility. Multi-layer contamination firewall on SFT exports.</td></tr>
|
||||
<tr><td class="mono">config/</td><td><code>modes.toml</code> — task_class → mode/model router (<code>scrum_review</code>, <code>contract_analysis</code>, <code>staffing_inference</code>, <code>pr_audit</code>, <code>doc_drift_check</code>, <code>fact_extract</code>). <code>providers.toml</code> — 5 active providers (ollama, ollama_cloud, openrouter, opencode 40-model, kimi direct). <code>routing.toml</code> — cost gates per task class.</td></tr>
|
||||
<tr><td class="mono">docs/</td><td><code>PRD.md</code>, <code>PHASES.md</code>, <code>DECISIONS.md</code> (21 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.</td></tr>
|
||||
<tr><td class="mono">data/</td><td>Default local object store. Parquet datasets, append-log batches, HNSW trial journals, promotion registries, <code>_playbook_memory/state.json</code>, <code>_pathway_memory/state.json</code> (88 traces, 11/11 successful replays, ADR-021), catalog manifests. Plus learning-loop directories: <code>_kb/</code>, <code>_playbook_lessons/</code>, <code>_observer/ops.jsonl</code>, <code>_auditor/kimi_verdicts/</code>. Rebuildable from repo + this dir alone.</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||

@@ -199,20 +204,42 @@ table.plain tr:hover td{background:#0d1117}
<li>Ollama swaps to the profile's model via <code>keep_alive=0</code>; only one model in VRAM at a time</li>
</ul>

<h3>Model matrix (Phase 20)</h3>
<p>Five tiers declared in <code>config/models.json</code>. Each call site picks the tier appropriate to its purpose — hot-path JSON emitters get fast local, overview/strategic/gatekeeper decisions get thinking models on cloud. Every tier carries <code>context_window</code>, <code>context_budget</code>, and <code>overflow_policy</code>.</p>
<h3>Provider fleet — 5 active, 40+ frontier models reachable</h3>
<p>Declared in <code>config/providers.toml</code> + <code>config/modes.toml</code>. Gateway is an OpenAI-compatible drop-in middleware: any consumer that speaks <code>POST /v1/chat/completions</code> gets routing, audit, cost telemetry, and the full memory substrate behind every call.</p>
<table class="plain">
<thead><tr><th>Tier</th><th>Purpose</th><th>Primary model</th><th>Frequency</th></tr></thead>
<thead><tr><th>Provider</th><th>Reach</th><th>Use case</th></tr></thead>
<tbody>
<tr><td>T1 hot</td><td>Per tool call — SQL gen, hybrid_search, propose_done</td><td><code>qwen3.5:latest</code> local, <code>think:false</code></td><td>50-200/scenario</td></tr>
<tr><td>T2 review</td><td>Per-step consensus, drift flagging</td><td><code>qwen3:latest</code> local, <code>think:false</code></td><td>5-14/event</td></tr>
<tr><td>T3 overview</td><td>Mid-day checkpoints + cross-day lesson distill</td><td><code>gpt-oss:120b</code> Ollama Cloud, thinking on</td><td>1-3/scenario</td></tr>
<tr><td>T4 strategic</td><td>Pattern re-ranking, weekly gap audit</td><td><code>qwen3.5:397b</code> cloud</td><td>1-10/day</td></tr>
<tr><td>T5 gatekeeper</td><td>Schema migrations, autotune config changes</td><td><code>kimi-k2-thinking</code> cloud, audit-logged</td><td>1-5/day</td></tr>
<tr><td><code>ollama</code></td><td>localhost:3200 — local sidecar over Ollama</td><td>Hot-path JSON emitters, embeddings, last-resort rescue</td></tr>
<tr><td><code>ollama_cloud</code></td><td>ollama.com bearer key — gpt-oss:120b, qwen3-coder:480b, deepseek-v3.1:671b, kimi-k2:1t, mistral-large-3:675b, qwen3.5:397b</td><td>Strong-model reviewer rungs, T3+ overview, scrum master pipeline</td></tr>
<tr><td><code>openrouter</code></td><td>openrouter.ai/api/v1 — 343 models incl. Anthropic/Google/OpenAI/MiniMax/Qwen, paid + free tiers</td><td>Paid ladder for observer escalations, free-tier rescue</td></tr>
<tr><td><code>opencode</code></td><td>opencode.ai/zen/v1 — <strong>40 frontier models reachable through ONE sk-* key</strong>: Claude Opus 4.7 / Sonnet / Haiku, GPT-5.5-pro / 5.4 / codex variants, Gemini 3.1-pro, Kimi K2.6, GLM 5.1, DeepSeek, Qwen 3.6+, MiniMax, plus 4 free-tier</td><td>Cross-architecture tie-breakers, auditor cross-lineage (Haiku 4.5 + Opus 4.7), high-context reasoning (Opus on diffs >100k chars)</td></tr>
<tr><td><code>kimi</code></td><td>api.kimi.com/coding/v1 — direct Kimi For Coding</td><td>kimi_architect when ollama_cloud rate-limits; TOS-clean primary path</td></tr>
</tbody>
</table>
<p><strong>Key mechanical finding (2026-04-21):</strong> qwen3.5 and qwen3 are <em>thinking</em> models — they burn ~650 tokens of hidden reasoning before emitting the visible response. For hot-path JSON emitters this meant 400-token budgets returned empty strings. Fix: <code>think: false</code> plumbed through sidecar's <code>/generate</code> endpoint; hot path disables thinking (structure matters more than reasoning depth), overseer tiers keep it on. Mistral was dropped entirely after a 0/14 fill rate on complex scenarios (decoder-level malformed-JSON bug, not a prompt issue).</p>
<p><strong>Continuation primitive (Phase 21):</strong> <code>generateContinuable()</code> handles output-overflow without <code>max_tokens</code> tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. <code>generateTreeSplit()</code> handles input-overflow via map-reduce with running scratchpad. Both respect <code>assertContextBudget()</code> so silent truncation can't happen.</p>

<h3>The 9-rung cloud-first ladder</h3>
<p>Defined in <code>tests/real-world/scrum_master_pipeline.ts</code> as <code>const LADDER</code>. Each attempt is evaluated by <code>isAcceptable()</code> = chars ≥ 3800 ∧ not malformed JSON-only. On reject, the next rung sees a learning preamble carrying the prior rejection reason.</p>
<pre>1 ollama_cloud / kimi-k2:1t 1T params · flagship
2 ollama_cloud / qwen3-coder:480b coding specialist
3 ollama_cloud / deepseek-v3.1:671b reasoning
4 ollama_cloud / mistral-large-3:675b deep analysis
5 ollama_cloud / gpt-oss:120b reliable workhorse
6 ollama_cloud / qwen3.5:397b dense final thinker
7 openrouter / openai/gpt-oss-120b:free rescue tier
8 openrouter / google/gemma-3-27b-it:free fastest rescue
9 ollama / qwen3.5:latest last-resort local</pre>
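<p><em>Editor's sketch</em> — a minimal TypeScript rendering of the rung walk under the rules above; <code>generate(rung, prompt)</code> is an assumed dispatch signature, not the shipped <code>scrum_master_pipeline.ts</code> API:</p>
<pre>type Rung = { provider: string; model: string };

const LADDER: Rung[] = [
  { provider: "ollama_cloud", model: "kimi-k2:1t" },
  { provider: "ollama_cloud", model: "qwen3-coder:480b" },
  { provider: "ollama_cloud", model: "deepseek-v3.1:671b" },
  { provider: "ollama_cloud", model: "mistral-large-3:675b" },
  { provider: "ollama_cloud", model: "gpt-oss:120b" },
  { provider: "ollama_cloud", model: "qwen3.5:397b" },
  { provider: "openrouter",   model: "openai/gpt-oss-120b:free" },
  { provider: "openrouter",   model: "google/gemma-3-27b-it:free" },
  { provider: "ollama",       model: "qwen3.5:latest" },
];

// isAcceptable = chars >= 3800 AND not a malformed JSON-only blob
function isAcceptable(out: string): boolean {
  if (out.length < 3800) return false;
  const t = out.trim();
  if (t.startsWith("{") || t.startsWith("[")) {
    try { JSON.parse(t); } catch { return false; }
  }
  return true;
}

async function runLadder(
  prompt: string,
  generate: (rung: Rung, prompt: string) => Promise<string>,
): Promise<string> {
  let preamble = ""; // carries the prior rejection reason to the next rung
  for (const rung of LADDER) {
    const out = await generate(rung, preamble + prompt);
    if (isAcceptable(out)) return out;
    preamble = `Previous attempt rejected (${out.length} chars; ` +
      `need >= 3800 and well-formed JSON). Improve on it.\n\n`;
  }
  throw new Error("all 9 rungs rejected");
}</pre>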

<h3>N=3 consensus + cross-architecture tie-breaker</h3>
<p>Every audit and every consensus-required call fires the primary reviewer N=3 times in parallel (Promise.all — wall-clock = single call). Aggregate votes per claim_idx, majority wins. On a 1-1-1 split, a tie-breaker model with <em>different architecture</em> (qwen3-coder:480b vs primary gpt-oss/kimi) is invoked. Every disagreement, even when majority resolves, writes to <code>data/_kb/audit_discrepancies.jsonl</code>. Closes the cloud-non-determinism gap: <code>temp=0</code> isn't actually deterministic in practice across hours; consensus + cross-architecture tie-break stabilizes verdicts.</p>
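<p><em>Editor's sketch</em> — the vote in TypeScript, assuming each reviewer emits one label per <code>claim_idx</code>; <code>review</code> and <code>tieBreak</code> are stand-ins for the real call sites:</p>
<pre>type Verdicts = Map<number, string>; // claim_idx -> verdict label

async function consensus(
  review: () => Promise<Verdicts>,
  tieBreak: (claimIdx: number) => Promise<string>, // different-architecture model
): Promise<Verdicts> {
  // N=3 fired in parallel — wall-clock cost of a single call
  const runs = await Promise.all([review(), review(), review()]);
  const final: Verdicts = new Map();
  for (const idx of new Set(runs.flatMap((r) => [...r.keys()]))) {
    const tally = new Map<string, number>();
    for (const r of runs) {
      const v = r.get(idx);
      if (v !== undefined) tally.set(v, (tally.get(v) ?? 0) + 1);
    }
    const [top, n] = [...tally.entries()].sort((a, b) => b[1] - a[1])[0];
    // majority wins; a 1-1-1 split escalates to the cross-architecture
    // tie-breaker; every disagreement also appends to
    // data/_kb/audit_discrepancies.jsonl (omitted here)
    final.set(idx, n >= 2 ? top : await tieBreak(idx));
  }
  return final;
}</pre>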

<h3>Auditor cross-lineage (Kimi ↔ Haiku ↔ Opus)</h3>
<p>Every push to PR #11 triggers <code>auditor/audit.ts</code> within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot lineage) and Haiku 4.5 (Anthropic lineage) by head SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7 (Anthropic frontier). Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. <strong>Latest verdict on c3c9c21:</strong> Haiku 4.5, 24.6s, 100% grounding-verified across 10 findings.</p>

<h3>Distillation v1.0.0 — the frozen substrate</h3>
<p>The substrate the auditor and mode runner sit on is tagged at <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>. <strong>145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified.</strong> The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall (<code>SFT_NEVER</code> constant + scorer category mapping + acceptance fixtures); the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.</p>

<h3>Continuation primitive (Phase 21)</h3>
<p><code>generateContinuable()</code> handles output-overflow without <code>max_tokens</code> tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. <code>generateTreeSplit()</code> handles input-overflow via map-reduce with running scratchpad. Both respect <code>assertContextBudget()</code> so silent truncation can't happen. Now Rust-native in <code>crates/aibridge/src/continuation.rs</code> (Phase 44).</p>
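<p><em>Editor's sketch</em> — the control flow of <code>generateContinuable()</code> under those rules; <code>call</code> stands in for the sidecar generate endpoint and the names are illustrative, not the Phase 44 Rust port:</p>
<pre>const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// heuristic: non-empty output that doesn't parse is treated as truncated JSON
function looksTruncated(s: string): boolean {
  try { JSON.parse(s); return false; } catch { return s.trim().length > 0; }
}

async function generateContinuable(
  prompt: string,
  call: (p: string) => Promise<string>,
  maxRounds = 4,
): Promise<string> {
  let acc = "";
  let backoff = 500;
  for (let round = 0; round < maxRounds; round++) {
    const out = await call(
      acc ? `${prompt}\n\nPartial output so far (continue, do not repeat):\n${acc}`
          : prompt,
    );
    if (out.trim() === "") {   // empty response -> geometric backoff retry
      await sleep(backoff);
      backoff *= 2;
      continue;
    }
    acc += out;                // truncated JSON -> continue with partial as scratchpad
    if (!looksTruncated(acc)) return acc;
  }
  throw new Error("continuation exhausted without valid JSON");
}</pre>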

<h3>Per-staffer tool_level (Phase 23)</h3>
<p>Scenarios can be scoped to a specific coordinator (<code>staffer: {id, name, tenure_months, role, tool_level}</code>). <code>tool_level</code> controls which tiers are available:</p>

@@ -265,6 +292,12 @@ table.plain tr:hover td{background:#0d1117}
<tr><td>Boost workers based on past success</td><td>No</td><td>Yes (Phase 19 playbook_memory)</td></tr>
<tr><td>Penalize workers based on past failure</td><td>No</td><td>Yes (<code>/log_failure</code> + <code>0.5<sup>n</sup></code> penalty)</td></tr>
<tr><td>Surface traits across past fills</td><td>No</td><td>Yes (<code>/vectors/playbook_memory/patterns</code>)</td></tr>
<tr><td>Per-staffer relevance gradient</td><td>No</td><td>Yes — same query reshapes per coordinator (<code>staffer_id</code> on <code>/intelligence/chat</code>); MARIA'S MEMORY pill labels the playbook context with the active coordinator</td></tr>
<tr><td>Triage in one shot — late-worker → backfills + draft SMS</td><td>No</td><td>Yes (<code>/intelligence/chat</code> Route 6 — pulls profile + 5 same-role same-geo backfills sorted by responsiveness + drafts client SMS in ~250ms)</td></tr>
<tr><td>Permit → fill plan derivation (forward demand)</td><td>No</td><td>Yes (<code>/intelligence/permit_contracts</code> — Chicago Socrata permit → role / headcount / deadline / fill probability / gross revenue per card)</td></tr>
<tr><td>Public-issuer attribution across contractor graph</td><td>No</td><td>Yes (<code>/intelligence/profiler_index</code> — direct + parent + co-permit associated tickers; live Stooq prices)</td></tr>
<tr><td>Cross-lineage AI audit on every PR</td><td>No</td><td>Yes (auditor crate — Kimi K2.6 ↔ Haiku 4.5 alternation + Opus 4.7 auto-promote on big diffs)</td></tr>
<tr><td>Pathway memory — system-level hot-swap by task fingerprint</td><td>No</td><td>Yes (88 traces, 11/11 successful replays, 100% reuse rate, ADR-021)</td></tr>
<tr><td>Predict staffing demand from external data</td><td>No</td><td>Yes (Chicago permit feed + 30-day rolling forecast)</td></tr>
<tr><td>Count down to staffing deadline per contract</td><td>No</td><td>Yes (permit issue_date + heuristic timeline)</td></tr>
<tr><td>Explain why each candidate ranked</td><td>No</td><td>Yes (boost chip + narrative citations + memory pattern)</td></tr>

@@ -278,7 +311,7 @@ table.plain tr:hover td{background:#0d1117}
<div class="chapter" id="ch6">
<div class="num">Chapter 6</div>
<h2>How it gets better over time</h2>
<div class="lede">Compounding learning across seven paths. The first three are automatic background loops. Paths 4-7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven happen without operator intervention.</div>
<div class="lede">Compounding learning across ten paths. The first three are automatic background loops. Paths 4-6 (Phases 22-24) added the reinforcement layer: outcomes → KB → recommendations → cloud rescue → competence-weighted retrieval. Paths 7-9 (Phases 25-43, 2026-04-26→27) added the system-level memory layers: pathway memory by task fingerprint (ADR-021), per-staffer hot-swap, and the Construction Activity Signal Engine. Path 10 — observer analysis — closes the loop. All ten happen without operator intervention.</div>

<h3>Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)</h3>
<p>Every sealed fill is seeded to <code>playbook_memory</code>. The boost fires inside <code>/vectors/hybrid</code> when <code>use_playbook_memory: true</code>. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:</p>

@@ -311,7 +344,19 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
<p>Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries <code>staffer: {id, name, tenure_months, role, tool_level}</code>. After every run, <code>recomputeStafferStats(staffer_id)</code> aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single <code>competence_score</code> (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).</p>
<p><code>findNeighbors</code> returns <code>weighted_score = cosine × max_staffer_competence</code> — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.</p>
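<p><em>Editor's sketch</em> — the two formulas above as code; the field names are assumptions mirroring the prose, not the real <code>kb.ts</code> types:</p>
<pre>interface StafferStats {
  fill_rate: number;        // 0..1
  turn_efficiency: number;
  citation_density: number;
  rescue_rate: number;
}

// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
const competence = (s: StafferStats) =>
  0.45 * s.fill_rate + 0.20 * s.turn_efficiency +
  0.20 * s.citation_density + 0.15 * s.rescue_rate;

// findNeighbors ranks by weighted_score = cosine × max_staffer_competence,
// so a top performer's playbook outranks a junior's on a similar scenario
function weightedScore(cosine: number, staffers: StafferStats[]): number {
  return cosine * Math.max(...staffers.map(competence));
}</pre>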

<h3>Path 7 — Observer outcome ingest (Phase 24)</h3>
<h3>Path 7 — Pathway memory (ADR-021 — semantic-correctness matrix layer)</h3>
<p>Memory at the system layer, not the worker layer. Every accepted scrum review writes a <code>PathwayTrace</code> with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …), bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. Five-factor hot-swap gate: narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.</p>
<p><strong>Live state (verified on this load):</strong> 88 traces · 11 / 11 successful replays · 100% reuse rate · probation gate crossed. Endpoints: <code>/vectors/pathway/insert</code> · <code>/query</code> · <code>/record_replay</code> · <code>/stats</code> · <code>/bug_fingerprints</code>. Spec: <code>docs/DECISIONS.md</code> ADR-021.</p>
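<p><em>Editor's sketch</em> — the hot-swap gate as a predicate; the trace shape is inferred from the prose, not the ADR-021 schema itself:</p>
<pre>interface PathwayTrace {
  fingerprint: string;
  audit_consensus_pass: boolean;
  replay_count: number;
  success_rate: number;
  retired: boolean;
}

function canHotSwap(
  trace: PathwayTrace,
  queryFingerprint: string,
  cosineToQuery: number,
): boolean {
  return (
    trace.fingerprint === queryFingerprint && // narrow match selects the trace
    trace.audit_consensus_pass &&             // gate 1 — audit consensus
    trace.replay_count >= 3 &&                // gate 2 — probation
    trace.success_rate >= 0.8 &&              // gate 3
    !trace.retired &&                         // gate 4
    cosineToQuery >= 0.9                      // gate 5 — vector cosine
  );
}
// pass -> replay the stored result; fail -> fall through to the 9-rung ladder</pre>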

<h3>Path 8 — Per-staffer hot-swap index</h3>
<p>Memory scoped to whoever's acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to staffer territory, scopes playbook-pattern geo to staffer's primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → MARIA'S MEMORY. Same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha. The corpus stays intact; the relevance gradient is per coordinator; each accumulates fills independently.</p>
<p><strong>Roster:</strong> <code>/staffers</code> endpoint reads from <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Three personas today (Maria/Devon/Aisha); architecture generalizes — every new metro adds territories, not code paths.</p>
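<p><em>Editor's sketch</em> — the scoping logic, with illustrative roster entries (Aisha's primary city is an assumption) rather than the real <code>STAFFERS</code> table in <code>mcp-server/index.ts</code>:</p>
<pre>interface Staffer { id: string; name: string; state: string; city: string }

const STAFFERS: Staffer[] = [
  { id: "maria", name: "Maria", state: "IL", city: "Chicago" },
  { id: "devon", name: "Devon", state: "IN", city: "Indianapolis" },
  { id: "aisha", name: "Aisha", state: "WI", city: "Milwaukee" },
];

function scopeQuery(q: { state?: string; staffer_id?: string }) {
  const s = STAFFERS.find((x) => x.id === q.staffer_id);
  if (!s) return { ...q, memoryLabel: "MEMORY" };
  return {
    ...q,
    state: q.state ?? s.state,                       // default filter to territory
    playbookGeo: { city: s.city, state: s.state },   // scope pattern retrieval
    memoryLabel: `${s.name.toUpperCase()}'S MEMORY`, // UI pill relabel
  };
}</pre>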

<h3>Path 9 — Construction Activity Signal Engine</h3>
<p>Memory at the network layer. Every contractor in the corpus is also a forward indicator on the public equities they touch via three attribution flavors: <code>direct</code> (contractor IS the public issuer — SEC tickers index match), <code>parent</code> (subsidiary of a public parent — curated KNOWN_PARENT_MAP, e.g. Turner → HOC.DE via Hochtief AG), <code>associated</code> (co-permit network — Bob's Electric appears with TARGET CORPORATION 3+ times → inherits TGT). The associated path is the moat: a staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph.</p>
<p><strong>BAI (Building Activity Index)</strong> = attribution-weighted average day-change across surfaced issuers. <strong>Indexed build value</strong> = total $ of permits attributable to ANY public issuer in scope. <strong>Network depth</strong> = issuers / total attribution edges. Cross-metro replication explicit in the architecture — Chicago is Phase 1; NYC DOB / LA County / Houston BCD / Boston ISD / DC DCRA are all Socrata-shaped, ship as config-only adapters.</p>
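<p><em>Editor's sketch</em> — the three metrics as code; the edge shape and the attribution weights are assumptions consistent with the prose (direct > parent > associated), not published constants:</p>
<pre>type Flavor = "direct" | "parent" | "associated";
interface Edge {
  ticker: string;
  flavor: Flavor;
  dayChangePct: number;
  permitValueUsd: number;
}

const W: Record<Flavor, number> = { direct: 1.0, parent: 0.6, associated: 0.3 };

// BAI = attribution-weighted average day-change across surfaced issuers
function bai(edges: Edge[]): number {
  const wsum = edges.reduce((a, e) => a + W[e.flavor], 0);
  return wsum === 0
    ? 0
    : edges.reduce((a, e) => a + W[e.flavor] * e.dayChangePct, 0) / wsum;
}

// indexed build value = total $ of permits attributable to any public issuer
const indexedBuildValue = (edges: Edge[]) =>
  edges.reduce((a, e) => a + e.permitValueUsd, 0);

// network depth = distinct issuers / total attribution edges
const networkDepth = (edges: Edge[]) =>
  edges.length === 0 ? 0 : new Set(edges.map((e) => e.ticker)).size / edges.length;</pre>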

<h3>Path 10 — Observer outcome ingest (Phase 24)</h3>
<p>Observer runs as <code>lakehouse-observer.service</code>, now with an HTTP listener on <code>:3800</code>. Scenarios POST per-event outcomes to <code>/event</code> with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old <code>/ingest/file</code> REPLACE path to an append-only <code>data/_observer/ops.jsonl</code> journal so the trace survives across restarts.</p>

<h3>Input normalizer + unified memory query</h3>

@@ -399,7 +444,11 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
<div class="chapter" id="ch9">
|
||||
<div class="num">Chapter 9</div>
|
||||
<h2>Per-staffer context</h2>
|
||||
<div class="lede">Twenty staffers don't see the same UI state. Each one's session is shaped by their active profile, their workspaces, their assigned contracts, and their client's blacklists.</div>
|
||||
<div class="lede">Twenty staffers don't see the same UI state. Each one's session is shaped by their identity (the per-staffer hot-swap index — Path 8 in Ch6), their active profile, their workspaces, their assigned contracts, and their client's blacklists.</div>
|
||||
|
||||
<h3>Per-staffer hot-swap index (the recent layer)</h3>
|
||||
<p>Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but search results, recurring-skill patterns, and playbook context all reshape to whoever is acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to the staffer's territory, scopes playbook-pattern geo to their primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → <em>MARIA'S MEMORY</em>.</p>
|
||||
<p><strong>Verified end-to-end:</strong> same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha (live numbers; refresh the profiler page to recompute). The corpus stays intact; the relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. <strong>Roster:</strong> <code>/staffers</code> endpoint, declared in <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Adding a staffer is one append; the architecture is metro-agnostic by construction.</p>
|
||||
|
||||
<h3>Active profile (Phase 17)</h3>
|
||||
<p>Scopes every search. A <code>staffing-recruiter</code> profile bound to <code>workers_500k</code> sees only that dataset. A <code>security-analyst</code> profile bound to <code>threat_intel</code> cannot see worker data. <code>GET /vectors/profile/<id>/audit</code> records every tool invocation by model identity.</p>
|
||||
@ -446,7 +495,7 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
|
||||
|
||||
<div class="step"><div class="n">12:30</div><div class="body"><strong>Client pushes 20 new contracts + 1M ATS delta.</strong> Ch7 scale flow fires. Ingest in seconds; embedding refresh kicks off as a background job. Searches continue against old embeddings.</div></div>

<div class="step"><div class="n">14:00</div><div class="body"><strong>Emergency: worker Dave no-showed.</strong> Sarah clicks No-show button on Dave's card → <code>/log_failure</code> → <code>mark_failed</code> records a penalty. Next similar query dampens Dave's boost by 0.5. Sarah continues the refill — the refill excludes Dave and the 2 others already booked for this shift.</div></div>
<div class="step"><div class="n">14:00</div><div class="body"><strong>Emergency: worker Dave no-showed.</strong> Sarah types "Dave running late site 4422" into the search box. ~250ms later: triage card with Dave's profile + reliability + responsiveness, draft SMS to client ("dispatching X from local bench, 96% reliability, will confirm arrival"), and 5 same-role same-geo backfills sorted by responsiveness rendered as a green list below. Sarah clicks Copy SMS, pastes to client, clicks Call on the top backfill. <code>/log_failure</code> on Dave records the penalty for the next similar query.</div></div>

<div class="step"><div class="n">15:00</div><div class="body"><strong>New embeddings live.</strong> Hot-swap promotion. Searches now see all 1M new profiles. Sarah's noon query re-run would produce different top-5.</div></div>

@@ -468,14 +517,15 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>

<h4>Deferred — real architectural work, just not shipped yet</h4>
<ul>
<li><strong>BAI persistence + backtesting.</strong> Building Activity Index is computed live per page load. To validate the thesis (permit activity precedes equity moves) we need the daily series saved over months. Architectural support exists (<code>data/_kb/audit_baselines.jsonl</code> append pattern); just hasn't run long enough.</li>
<li><strong>NYC DOB adapter.</strong> Architecture is metro-agnostic — Chicago is Phase 1. NYC DOB ships next as a config-only Socrata adapter; LA County, Houston BCD, Boston ISD, DC DCRA queue behind it. Each new metro multiplies network edges without multiplying the codebase.</li>
<li><strong>12 awaiting public-data sources for contractor profile.</strong> DOL Wage & Hour, EPA ECHO, MSHA, BBB, PACER civil suits, UCC liens, D&B credit, State licensure, Surety bonds, DOT/FMCSA, State UI claims, DOL RAPIDS apprenticeships. Listed by name on every contractor profile with a one-line "would show:" sample. Each ships as a Socrata-style adapter; engineering scope is concrete.</li>
<li><strong>Rate / margin awareness.</strong> Worker pay expectations vs contract bill rate not modeled. Requires adding <code>pay_rate</code> to workers, <code>bill_rate</code> to contracts, and a filter + warning path. Partially addressed via <code>ContractTerms.budget_per_hour_max</code> passed to T3/rescue prompts, but the match-time filter isn't wired yet.</li>
<li><strong>Mem0-style UPDATE / DELETE / NOOP operations on playbooks.</strong> Today <code>/seed</code> only ADDs. Same <code>(operation, date)</code> pair appends a duplicate instead of refining an existing entry. Phase 26 item — cheap to add, moderate payoff.</li>
<li><strong>Letta working-memory hot cache.</strong> Every boost query scans all active playbook entries from in-memory state. 1.9K today; cheap. Will bite somewhere north of 100K. LRU for the last-N playbooks or current-sig neighborhood deferred until that ceiling approaches.</li>
<li><strong>Chunking cache (Phase 21 Rust port).</strong> TS primitives <code>generateContinuable</code> + <code>generateTreeSplit</code> are wired, but <code>crates/aibridge/src/{continuation.rs, tree_split.rs}</code> + <code>crates/storaged/src/chunk_cache.rs</code> remain queued. Gateway-side callers currently don't have the same protection against silent truncation that the TS test harness does.</li>
<li><strong>Mem0-style UPDATE / DELETE / NOOP operations on playbooks.</strong> Today <code>/seed</code> only ADDs. Same <code>(operation, date)</code> pair appends a duplicate instead of refining an existing entry. Cheap to add, moderate payoff.</li>
<li><strong>Letta working-memory hot cache.</strong> Every boost query scans all active playbook entries from in-memory state. ~5K today; cheap. Will bite somewhere north of 100K. Deferred until the ceiling approaches.</li>
<li><strong>Confidence calibration.</strong> Top-K is a rank, not a probability. No calibrated "85% likely to accept" score. Requires outcome-labeled training data.</li>
<li><strong>Neural re-ranker.</strong> Phase 19 is statistical + semantic (now with geo + role prefilter, Phase 25 retirement). A (query, candidate, outcome)-trained re-ranker is warranted only if the statistical floor plateaus below usable recall — current 14× citation lift on identical inputs suggests it hasn't.</li>
<li><strong>Observer → autotune feedback wire.</strong> Phase 24 streams scenario outcomes into <code>data/_observer/ops.jsonl</code>; autotune agent still runs on its own HNSW-trial schedule and hasn't subscribed to the outcome metric stream yet. Phase 26+ item — connects the last loop.</li>
<li><strong>call_log cross-reference.</strong> Infrastructure present; current synthetic candidates table is too small to cross-ref. Fixes when real ATS lands.</li>
<li><strong>SEC name-to-ticker fuzzy precision.</strong> Current matcher requires ≥2 non-stopword overlap; rare false positives still surface (saw FLG attach to a PNC-adjacent contractor once). Tightenable to require an explicit allow-list for production trading use.</li>
<li><strong>Tighter integration of pathway memory + scrum loop.</strong> ADR-021 substrate is shipped (88 traces, 11/11 replays). The hot-swap gate fires correctly; what's deferred is automatic mode-runner short-circuit when a high-confidence pathway match is available before any cloud call burns.</li>
</ul>

<h4>Non-goals — explicitly out of scope</h4>

@@ -496,6 +546,6 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
</div>
</div>

<div class="footer">Lakehouse spec · v2 2026-04-21 · Phases 19-25 shipped (playbook boost, model matrix, continuation, KB, staffer competence, observer ingest, validity windows) · maintained from <code>docs/DECISIONS.md</code> · <a href="proof">architecture live-tested</a> · <a href="console">walkthrough</a></div>
<div class="footer">Lakehouse spec · v3 2026-04-27 · Phases 19-45 shipped (playbook boost, KB, staffer competence, observer ingest, validity windows, distillation v1.0.0 substrate frozen at e7636f2, gateway as OpenAI-compat drop-in, mode runner, validator + iterate, pathway memory ADR-021, per-staffer hot-swap, Construction Activity Signal Engine) · maintained from <code>docs/DECISIONS.md</code> · <a href="proof">architecture live-tested</a> · <a href="console">walkthrough</a> · <a href="profiler">profiler</a></div>

</body></html>