root d571d62e9b demo: spec — refresh repo layout + model fleet + per-staffer + paths 8-9
J: "how about devop.live/lakehouse/spec." Spec was anchored on
2026-04-21 state (v2 footer): mistral mentioned in the model matrix,
13 crates not 15, missing validator/truth/auditor crates, no mention
of OpenCode 40-model fleet, no pathway memory, no per-staffer
hot-swap, no Construction Activity Signal Engine, ADR count was 20.
Footer claimed Phases 19-25.

Edits, in order:

  Ch1 Repository layout
    + crates/truth/ (ADR-021 rule store)
    + crates/validator/ (Phase 43 — schema/completeness/policy gates)
    + auditor/ (cross-lineage Kimi↔Haiku/Opus auto-promote)
    + scripts/distillation/ (frozen substrate v1.0.0 at e7636f2)
    Updated aibridge to mention ProviderAdapter dispatch
    Updated gateway to mention OpenAI-compat /v1/* drop-in middleware
    Updated mcp-server route list to include /staffers + profiler/contractor pages
    Updated config/ to mention modes.toml + providers.toml + routing.toml
    Updated docs/ ADR count from 20 → 21
    Updated data/ to mention _pathway_memory + _auditor/kimi_verdicts

  Ch3 Measurement & indexing
    REPLACED stale "Model matrix (Phase 20)" T1-T5 table that
    mentioned mistral with the current 5-provider fleet:
      ollama / ollama_cloud / openrouter / opencode (40 models, one
      sk-* key reaches Claude Opus 4.7, GPT-5.5-pro, Gemini 3.1-pro,
      Kimi K2.6, GLM, DeepSeek, Qwen, MiniMax, free) / kimi
    ADDED 9-rung cloud-first ladder pseudocode
    ADDED N=3 consensus + cross-architecture tie-breaker math
    ADDED auditor cross-lineage Kimi K2.6 ↔ Haiku 4.5 + Opus auto-promote
    ADDED distillation v1.0.0 freeze paragraph (145 tests, 22/22, 16/16)
    Updated Continuation primitive to mention Phase 44 Rust port

  Ch5 What a CRM can't do
    Extended the table with 6 new capabilities:
      - Per-staffer relevance gradient
      - Triage in one shot (late-worker → backfills + draft SMS)
      - Permit → fill plan derivation
      - Public-issuer attribution across contractor graph
      - Cross-lineage AI audit on every PR
      - Pathway memory (system-level hot-swap, ADR-021)

  Ch6 How it gets better over time
    Lede updated from 7 paths → 10 paths
    NEW Path 7 — Pathway memory (ADR-021)
    NEW Path 8 — Per-staffer hot-swap index
    NEW Path 9 — Construction Activity Signal Engine
    Original Path 7 (observer ingest) renumbered to Path 10

  Ch9 Per-staffer context
    Lede now anchors Path 8 from Ch6
    NEW lead section: Per-staffer hot-swap index — Maria/Devon/Aisha,
    same query reshapes per coordinator (167 IL / 89 IN / 16 WI),
    MARIA'S MEMORY pill, /staffers endpoint, metro-agnostic by
    construction. The original Phase 17 profile / Phase 23 competence
    sections retained beneath as the deeper architecture detail.

  Ch10 A day in the life
    Updated 14:00 emergency event to use the late-worker triage
    handler — coordinator types "Dave running late site 4422", gets
    profile + draft SMS + 5 backfills + Copy SMS button in 250ms.
    The old Click No-show button → /log_failure flow remains valid
    (penalty still records); the user-facing surface is the new
    triage card.

  Ch11 Known limits
    REPLACED the Mem0/Letta/Phase-26 era list with current honest
    limits: BAI persistence + backtesting, NYC DOB adapter, 12
    awaiting public-data sources for contractor profile, rate/margin
    awareness, Mem0-style UPDATE/DELETE, Letta hot cache (now 5K
    not 1.9K), confidence calibration, SEC fuzzy precision, tighter
    pathway+scrum integration.

  Footer
    v2 2026-04-21 → v3 2026-04-27
    Phases 19-25 → 19-45
    Lists today's phases: distillation v1.0.0 substrate, gateway as
    OpenAI-compat drop-in, mode runner, validator + iterate, ADR-021
    pathway memory, per-staffer hot-swap, Construction Activity Signal
    Engine.

  Nav
    + Profiler link
    Date pill v1 · 2026-04-20 → v3 · 2026-04-27

Verified end-to-end on devop.live/lakehouse/spec — 11 chapter h2s
render in order, 67KB page (was 50KB-ish), all internal links resolve.
2026-04-28 06:01:04 -05:00


<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1">
<title>Lakehouse — Technical Specification</title>
<style>
*{margin:0;padding:0;box-sizing:border-box}
body{font-family:'Inter',-apple-system,system-ui,sans-serif;background:#090c10;color:#b0b8c4;font-size:14px;line-height:1.6;-webkit-font-smoothing:antialiased}
a{color:#58a6ff;text-decoration:none}
a:hover{color:#79c0ff}
.bar{background:#0d1117;padding:0 24px;height:56px;border-bottom:1px solid #171d27;display:flex;justify-content:space-between;align-items:center;position:sticky;top:0;z-index:10}
.bar h1{font-size:14px;font-weight:600;color:#e6edf3;letter-spacing:-0.2px}
.bar nav{display:flex;gap:2px}
.bar nav a{font-size:12px;color:#545d68;padding:6px 14px;border-radius:6px;transition:all 0.15s}
.bar nav a:hover{color:#e6edf3;background:#161b22}
.bar nav a.active{color:#e6edf3;background:#1c2333}
.bar .rt{font-size:11px;color:#545d68}
.layout{display:grid;grid-template-columns:220px 1fr;gap:0;max-width:1200px;margin:0 auto}
.toc{position:sticky;top:72px;align-self:start;padding:24px 12px;border-right:1px solid #171d27;height:calc(100vh - 72px);overflow-y:auto}
.toc .hdr{color:#545d68;font-size:10px;text-transform:uppercase;letter-spacing:1.4px;font-weight:600;margin-bottom:10px;padding-left:8px}
.toc a{display:block;color:#8b949e;font-size:12px;padding:6px 10px;border-radius:6px;margin-bottom:2px;line-height:1.4}
.toc a:hover{color:#e6edf3;background:#161b22}
.wrap{padding:28px 24px 60px;min-width:0}
.chapter{margin-bottom:48px;scroll-margin-top:72px}
.chapter .num{color:#545d68;font-size:11px;font-weight:600;letter-spacing:1.6px;text-transform:uppercase;margin-bottom:6px}
.chapter h2{color:#e6edf3;font-size:24px;font-weight:700;letter-spacing:-0.4px;margin-bottom:8px;line-height:1.2}
.chapter h3{color:#e6edf3;font-size:16px;font-weight:600;margin:22px 0 8px}
.chapter h4{color:#c9d1d9;font-size:14px;font-weight:600;margin:16px 0 6px}
.chapter .lede{color:#8b949e;font-size:14px;margin-bottom:18px;max-width:760px;line-height:1.7}
.chapter p{color:#b0b8c4;margin-bottom:12px;max-width:800px}
.chapter ul{color:#b0b8c4;margin:8px 0 16px 24px}
.chapter li{margin-bottom:6px;line-height:1.7}
.chapter strong{color:#e6edf3;font-weight:600}
.card{background:#0d1117;border:1px solid #171d27;border-radius:12px;padding:20px;margin:12px 0}
.accent-l{border-left:3px solid #2ea043}
.accent-b{border-left:3px solid #1f6feb}
.accent-a{border-left:3px solid #bc8cff}
.accent-w{border-left:3px solid #d29922}
.accent-r{border-left:3px solid #f85149}
code{background:#161b22;color:#e6edf3;padding:2px 6px;border-radius:4px;font-family:ui-monospace,Menlo,monospace;font-size:12px}
pre{background:#161b22;border:1px solid #171d27;border-radius:8px;padding:14px 16px;overflow-x:auto;font-family:ui-monospace,Menlo,monospace;font-size:12px;color:#c9d1d9;line-height:1.5;margin:8px 0}
table.plain{width:100%;border-collapse:collapse;font-size:12px;margin:10px 0}
table.plain th{text-align:left;padding:8px 12px;color:#545d68;font-weight:600;text-transform:uppercase;font-size:10px;letter-spacing:0.8px;border-bottom:1px solid #171d27;background:#0d1117}
table.plain td{padding:8px 12px;border-bottom:1px solid #171d27;color:#c9d1d9;vertical-align:top}
table.plain td.mono{font-family:ui-monospace,Menlo,monospace}
table.plain tr:hover td{background:#0d1117}
.narr{color:#8b949e;font-size:13px;line-height:1.7;margin:10px 0;padding:10px 14px;border-left:2px solid #21262d}
.narr strong{color:#c9d1d9}
.ref{color:#545d68;font-size:11px;margin-top:6px;font-family:ui-monospace,Menlo,monospace}
.ref strong{color:#79c0ff;font-weight:600}
.step{display:flex;gap:14px;margin-bottom:14px;padding:12px 16px;background:#0d1117;border:1px solid #171d27;border-radius:8px}
.step .n{color:#58a6ff;font-weight:700;font-size:18px;flex-shrink:0;min-width:30px}
.step .body{color:#b0b8c4;font-size:13px;line-height:1.7}
.step .body strong{color:#e6edf3}
.footer{border-top:1px solid #171d27;padding:20px;text-align:center;color:#3d444d;font-size:11px}
@media(max-width:900px){
.layout{grid-template-columns:1fr}
.toc{display:none}
.wrap{padding:20px 14px 40px}
.bar nav{display:none}
}
</style></head>
<body>
<div class="bar">
<h1>Lakehouse — Technical Specification</h1>
<nav>
<a href=".">Dashboard</a>
<a href="console">Walkthrough</a>
<a href="profiler">Profiler</a>
<a href="proof">Architecture</a>
<a href="spec" class="active">Spec</a>
<a href="onboard">Onboard</a>
<a href="alerts">Alerts</a>
<a href="workspaces">Workspaces</a>
</nav>
<div class="rt">v3 · 2026-04-27</div>
</div>
<div class="layout">
<aside class="toc">
<div class="hdr">Contents</div>
<a href="#ch1">1. Repository layout</a>
<a href="#ch2">2. Data ingest pipeline</a>
<a href="#ch3">3. Measurement &amp; indexing</a>
<a href="#ch4">4. Contract inference</a>
<a href="#ch5">5. What a CRM can't do</a>
<a href="#ch6">6. How it gets better over time</a>
<a href="#ch7">7. Scale story — 20 staffers, 300 contracts, a surge</a>
<a href="#ch8">8. Error surfaces &amp; recovery</a>
<a href="#ch9">9. Per-staffer context</a>
<a href="#ch10">10. A day in the life</a>
<a href="#ch11">11. Known limits &amp; non-goals</a>
</aside>
<div class="wrap">
<!-- ═══ 1. REPO LAYOUT ═══ -->
<div class="chapter" id="ch1">
<div class="num">Chapter 1</div>
<h2>Repository layout</h2>
<div class="lede">What lives where. Every folder below has a single, bounded responsibility. A maintainer reading this should know — in under ten minutes — which crate owns a failing behavior.</div>
<table class="plain">
<thead><tr><th>Path</th><th>Owns</th></tr></thead>
<tbody>
<tr><td class="mono">crates/shared/</td><td>Types, errors, Arrow helpers, schema fingerprints, PII detection, secrets provider. Every other crate depends on this.</td></tr>
<tr><td class="mono">crates/storaged/</td><td>Raw object I/O. <code>BucketRegistry</code> (multi-bucket, rescue-aware), <code>AppendLog</code> (write-once batched append), <code>ErrorJournal</code> (bucket op failures). ADR-017 (federation), ADR-018 (append pattern).</td></tr>
<tr><td class="mono">crates/catalogd/</td><td>Metadata authority. Dataset manifests, schema fingerprints (ADR-020), tombstones (soft delete), AI-safe views, model profiles (Phase 17). In-memory index persisted as Parquet on storage.</td></tr>
<tr><td class="mono">crates/queryd/</td><td>SQL engine. DataFusion over Parquet + MemTable cache + delta merge-on-read + compaction. Registers every bucket as an object_store so SQL can join across them.</td></tr>
<tr><td class="mono">crates/ingestd/</td><td>Data on-ramp. CSV / JSON / PDF (+OCR via Tesseract) / Postgres streaming / MySQL streaming / inbox watcher / cron schedules. Every ingest path auto-tags PII (emails, phones, SSNs, addresses), records lineage, and marks embeddings stale.</td></tr>
<tr><td class="mono">crates/vectord/</td><td>The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here.</td></tr>
<tr><td class="mono">crates/vectord-lance/</td><td>Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019).</td></tr>
<tr><td class="mono">crates/journald/</td><td>Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit.</td></tr>
<tr><td class="mono">crates/truth/</td><td>File-backed rule store. <code>evaluate(task_class, ctx) → Vec&lt;RuleOutcome&gt;</code> (ADR-021 — semantic-correctness matrix layer). Loaded from <code>truth/*.toml</code> at gateway boot.</td></tr>
<tr><td class="mono">crates/aibridge/</td><td>Rust ↔ Python sidecar + provider adapter trait. HTTP client over FastAPI wrapper around Ollama for local; <code>ProviderAdapter</code> dispatch for cloud (ollama_cloud, openrouter, opencode, kimi). VRAM introspection via nvidia-smi. All LLM calls flow through here.</td></tr>
<tr><td class="mono">crates/gateway/</td><td>Axum HTTP (:3100) + gRPC (:3101). OpenAI-compat <code>/v1/*</code> (drop-in middleware), mode runner (<code>/v1/mode/execute</code>), validator (<code>/v1/validate</code>), iterate loop (<code>/v1/iterate</code>), tools registry, cost telemetry, Langfuse + observer fan-out on every chat. Every external request enters here.</td></tr>
<tr><td class="mono">crates/validator/</td><td>Phase 43 production validator. Schema / completeness / consistency / policy gates over LLM outputs. <code>FillValidator</code>, <code>EmailValidator</code>, <code>ParquetWorkerLookup</code> (loads workers_500k.parquet at boot). Fail-closed when roster absent.</td></tr>
<tr><td class="mono">crates/ui/</td><td>Dioxus WASM developer UI. Internal tool. Not exposed externally.</td></tr>
<tr><td class="mono">mcp-server/</td><td>Bun/TypeScript public-facing app + MCP tool surface. Serves <code>devop.live/lakehouse</code>. Pages: dashboard / console / profiler / contractor / proof / spec / onboard / alerts / workspaces. Routes: <code>/search /match /log /log_failure /clients/:c/blacklist /intelligence/* /staffers /memory/query /models/matrix /system/summary</code>. Observer sibling at <code>observer.ts</code> on :3800 for event ingest.</td></tr>
<tr><td class="mono">auditor/</td><td>External claim-vs-diff verifier on PRs. Polls Gitea for open PRs, builds adversarial prompt from PRD invariants + staffing matrix, alternates Kimi K2.6 ↔ Haiku 4.5 by SHA, auto-promotes Claude Opus 4.7 on diffs &gt;100k chars. Per-PR cap=3 with auto-reset on each new head SHA. Verdicts at <code>data/_auditor/kimi_verdicts/</code>.</td></tr>
<tr><td class="mono">tests/multi-agent/</td><td>Multi-agent scenario harness + memory stack. <code>agent.ts</code>, <code>scenario.ts</code> (contracts + staffer + tool_level), <code>kb.ts</code> (KB indexing, competence scoring), <code>normalize.ts</code>, <code>memory_query.ts</code>, <code>run_e2e_rated.ts</code>. Unit tests colocated.</td></tr>
<tr><td class="mono">scripts/distillation/</td><td>Distillation substrate v1.0.0 (frozen at tag <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>). 145 unit tests, 22/22 acceptance, 16/16 audit-full, bit-identical reproducibility. Multi-layer contamination firewall on SFT exports.</td></tr>
<tr><td class="mono">config/</td><td><code>modes.toml</code> — task_class → mode/model router (<code>scrum_review</code>, <code>contract_analysis</code>, <code>staffing_inference</code>, <code>pr_audit</code>, <code>doc_drift_check</code>, <code>fact_extract</code>). <code>providers.toml</code> — 5 active providers (ollama, ollama_cloud, openrouter, opencode 40-model, kimi direct). <code>routing.toml</code> — cost gates per task class.</td></tr>
<tr><td class="mono">docs/</td><td><code>PRD.md</code>, <code>PHASES.md</code>, <code>DECISIONS.md</code> (21 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.</td></tr>
<tr><td class="mono">data/</td><td>Default local object store. Parquet datasets, append-log batches, HNSW trial journals, promotion registries, <code>_playbook_memory/state.json</code>, <code>_pathway_memory/state.json</code> (88 traces, 11/11 successful replays, ADR-021), catalog manifests. Plus learning-loop directories: <code>_kb/</code>, <code>_playbook_lessons/</code>, <code>_observer/ops.jsonl</code>, <code>_auditor/kimi_verdicts/</code>. Rebuildable from repo + this dir alone.</td></tr>
</tbody>
</table>
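<p>The <code>crates/truth/</code> contract above can be sketched in a few lines. This is an illustrative TypeScript rendering of the Rust signature <code>evaluate(task_class, ctx) → Vec&lt;RuleOutcome&gt;</code>; the field names and check shape are assumptions, not the crate's actual types.</p>

```typescript
// Hypothetical sketch of the ADR-021 rule store: rules load once at boot
// (from truth/*.toml in the real crate) and evaluate per task_class.
type RuleOutcome = { rule: string; pass: boolean };
type Rule = {
  id: string;
  taskClass: string;
  check: (ctx: Record<string, unknown>) => boolean;
};

class TruthStore {
  constructor(private rules: Rule[]) {}

  // All outcomes for rules bound to the given task class.
  evaluate(taskClass: string, ctx: Record<string, unknown>): RuleOutcome[] {
    return this.rules
      .filter(r => r.taskClass === taskClass)
      .map(r => ({ rule: r.id, pass: r.check(ctx) }));
  }
}
```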
</div>
<!-- ═══ 2. INGEST PIPELINE ═══ -->
<div class="chapter" id="ch2">
<div class="num">Chapter 2</div>
<h2>Data ingest pipeline</h2>
<div class="lede">How staffing data gets into the system — whether from a CSV drop, an ATS export, a Postgres replica, or a PDF resume. Every path ends at the same place: a registered dataset with known schema, known lineage, known sensitivity.</div>
<div class="step"><div class="n">1</div><div class="body"><strong>Source arrives.</strong> Four shapes: (a) file upload via <code>POST /ingest/file</code>, (b) inbox watcher (drops in <code>./inbox/</code> → auto-ingested in under 15s), (c) Postgres or MySQL streaming connector (<code>POST /ingest/db</code> with DSN), (d) scheduled ingest via <code>ingestd::schedule</code> with cron.</div></div>
<div class="step"><div class="n">2</div><div class="body"><strong>Parse + normalize.</strong> CSV parser infers types per column; defaults to <code>String</code> on ambiguity (ADR-010 — better to ingest everything than reject on type mismatch). JSON parser flattens nested objects. PDF extractor uses <code>lopdf</code> first; falls back to Tesseract OCR for scanned/image PDFs. Output is always an Arrow <code>RecordBatch</code>.</div></div>
<div class="step"><div class="n">3</div><div class="body"><strong>Auto-detect PII.</strong> <code>shared::pii</code> scans column values and names. Identifies emails, phone numbers, SSNs, salaries, street addresses, medical terms. Tags columns with <code>sensitivity: PII | PHI | Financial | Internal | Public</code> (Phase 10 catalog v2).</div></div>
<div class="step"><div class="n">4</div><div class="body"><strong>Deduplicate by content hash.</strong> Every uploaded file's SHA-256 is checked against the catalog's seen-hash log. Re-ingesting the same file is a no-op (ADR invariant #5).</div></div>
<div class="step"><div class="n">5</div><div class="body"><strong>Write Parquet to object storage.</strong> <code>arrow_helpers::record_batch_to_parquet</code> → <code>storaged::ops::put</code> → file lands under <code>data/datasets/&lt;name&gt;.parquet</code> (or bucket-scoped via <code>BucketRegistry</code>). Schema fingerprint computed.</div></div>
<div class="step"><div class="n">6</div><div class="body"><strong>Register in catalog.</strong> <code>catalogd::Registry::register(name, fingerprint, objects)</code> — idempotent on (name, fingerprint). Same name + same fingerprint = reuse manifest, bump updated_at. Same name + different fingerprint = <code>409 Conflict</code> (ADR-020 — prevents silent schema drift). New name = create new manifest with owner, lineage, freshness SLA, column metadata, PII tags.</div></div>
<div class="step"><div class="n">7</div><div class="body"><strong>Mark embeddings stale.</strong> If the dataset already has a vector index, the new rows mean that index is now behind. <code>Registry::mark_embeddings_stale</code> flips a flag; <code>POST /vectors/refresh/&lt;dataset&gt;</code> runs an incremental re-embed (only new rows, not the whole corpus).</div></div>
<div class="step"><div class="n">8</div><div class="body"><strong>Queryable immediately.</strong> <code>queryd::context</code> picks up the new manifest on next query. Hot-cache warms on first hit. Delta merge-on-read means updates land without rewriting the base Parquet.</div></div>
<div class="ref"><strong>Code:</strong> crates/ingestd/src/{service.rs, csv.rs, json.rs, pdf.rs, pg_stream.rs, my_stream.rs, schedule.rs}</div>
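<p>Step 6's idempotent register semantics reduce to a three-way decision. A minimal TypeScript sketch, assuming an in-memory catalog map (the real logic lives in <code>crates/catalogd</code> and surfaces HTTP statuses through the gateway):</p>

```typescript
// ADR-020 register decision: same (name, fingerprint) is a no-op reuse,
// a changed fingerprint is refused, a new name creates a manifest.
type Manifest = { name: string; fingerprint: string; updatedAt: number };

function register(
  catalog: Map<string, Manifest>,
  name: string,
  fingerprint: string,
  now: number = Date.now(),
): { status: 200 | 201 | 409; manifest?: Manifest } {
  const existing = catalog.get(name);
  if (!existing) {
    const manifest = { name, fingerprint, updatedAt: now };
    catalog.set(name, manifest);
    return { status: 201, manifest };        // new name: create
  }
  if (existing.fingerprint === fingerprint) {
    existing.updatedAt = now;                // same schema: reuse, bump updated_at
    return { status: 200, manifest: existing };
  }
  return { status: 409 };                    // schema drift: refuse
}
```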
</div>
<!-- ═══ 3. MEASUREMENT & INDEXING ═══ -->
<div class="chapter" id="ch3">
<div class="num">Chapter 3</div>
<h2>Measurement &amp; indexing</h2>
<div class="lede">Once data is in, the system describes it rigorously and builds fast-access indexes over the parts that will be queried. Every measurement is deterministic, versioned, and visible via HTTP.</div>
<h3>What gets measured per dataset</h3>
<ul>
<li><strong>Row count</strong> (from parquet footer, not a SELECT COUNT). O(1).</li>
<li><strong>Schema fingerprint</strong> — SHA-256 over (column_name, type, nullability, sort) tuples. Drives ADR-020 idempotent register.</li>
<li><strong>Owner / sensitivity / freshness SLA</strong> — catalog v2 metadata. PII auto-detected; owner assigned on ingest.</li>
<li><strong>Lineage</strong> — source_system → ingest_job → dataset. Who put this here, when, from what.</li>
<li><strong>Last embedded at</strong> — when the vector index covering this dataset was last refreshed. Drives stale-detection.</li>
</ul>
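<p>The fingerprint itself is cheap to picture: hash the ordered column tuples. A sketch (the real tuple also folds in sort order; the exact canonical encoding used by <code>crates/shared</code> is an assumption here):</p>

```typescript
import { createHash } from "node:crypto";

// SHA-256 over ordered (column_name, type, nullability) tuples; any schema
// change, including a column reorder, yields a different fingerprint.
type Column = { name: string; type: string; nullable: boolean };

function schemaFingerprint(columns: Column[]): string {
  const canonical = columns
    .map(c => `${c.name}|${c.type}|${c.nullable ? 1 : 0}`)
    .join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}
```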
<h3>How vector indexes are built</h3>
<p>Two backends, chosen per profile (ADR-019):</p>
<table class="plain">
<thead><tr><th></th><th>HNSW over Parquet (primary)</th><th>Lance (secondary)</th></tr></thead>
<tbody>
<tr><td>Storage</td><td>Embeddings as Parquet columns (<code>doc_id, chunk_text, vector</code>)</td><td>Lance native dataset</td></tr>
<tr><td>Index</td><td>HNSW in RAM, serialized sidecar</td><td>IVF_PQ on disk</td></tr>
<tr><td>Build time (100K × 768d)</td><td>~230s</td><td>~16s (14× faster)</td></tr>
<tr><td>Search p50 (100K)</td><td>~873μs</td><td>~7.4ms at recall 1.0</td></tr>
<tr><td>Append</td><td>Rewrite required</td><td>Structural (0.08s for 100 rows)</td></tr>
<tr><td>Random fetch by doc_id</td><td>Full scan</td><td>~311μs (112× faster)</td></tr>
<tr><td>RAM ceiling</td><td>~5M vectors</td><td>Scales past RAM — disk-resident</td></tr>
</tbody>
</table>
<h3>Autotune</h3>
<p>The <code>vectord::agent</code> background task runs continuously. Per index, it proposes HNSW configurations (<code>ef_construction × ef_search</code>), executes a trial against a stored eval set, journals the result as JSONL, and — if recall beats the min_recall gate (0.9) and latency wins the Pareto test — promotes the new config atomically via <code>promotion_registry</code>. No downtime. Rollback in milliseconds.</p>
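<p>The promotion gate reduces to a small predicate. A sketch, assuming the trial record carries measured recall and p50 latency (field names illustrative):</p>

```typescript
// Promote only if the candidate clears the min_recall gate and Pareto-beats
// the current champion: no worse on recall, strictly faster.
type Trial = { efConstruction: number; efSearch: number; recall: number; p50Us: number };

function shouldPromote(champion: Trial, candidate: Trial, minRecall = 0.9): boolean {
  if (candidate.recall < minRecall) return false;     // recall gate
  return candidate.recall >= champion.recall          // not worse on recall
      && candidate.p50Us < champion.p50Us;            // strictly better latency
}
```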
<h3>Per-profile / per-staffer indexing</h3>
<p>Model profiles (Phase 17) are not routing strings — they are named scopes. Each profile has <code>bound_datasets[]</code>, <code>hnsw_config</code>, <code>vector_backend</code>, and <code>bucket</code>. When a staffer activates a profile:</p>
<ul>
<li>EmbeddingCache warms for bound indexes only</li>
<li>HNSW is rebuilt with the profile's config (if different from current)</li>
<li>Search via <code>POST /vectors/profile/&lt;id&gt;/search</code> rejects out-of-scope queries with 403 + list of allowed bindings</li>
<li>Ollama swaps to the profile's model via <code>keep_alive=0</code>; only one model in VRAM at a time</li>
</ul>
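<p>The third bullet's scope gate is worth a sketch: a rejected search tells the caller exactly which bindings it may use. Shapes below are illustrative, not the actual handler:</p>

```typescript
// Per-profile scope check for /vectors/profile/<id>/search: a dataset
// outside bound_datasets[] gets a 403 plus the allowed list.
type Profile = { id: string; boundDatasets: string[] };

function scopeCheck(
  profile: Profile,
  dataset: string,
): { status: 200 } | { status: 403; allowed: string[] } {
  return profile.boundDatasets.includes(dataset)
    ? { status: 200 }
    : { status: 403, allowed: [...profile.boundDatasets] };
}
```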
<h3>Provider fleet — 5 active, 40+ frontier models reachable</h3>
<p>Declared in <code>config/providers.toml</code> + <code>config/modes.toml</code>. Gateway is an OpenAI-compatible drop-in middleware: any consumer that speaks <code>POST /v1/chat/completions</code> gets routing, audit, cost telemetry, and the full memory substrate behind every call.</p>
<table class="plain">
<thead><tr><th>Provider</th><th>Reach</th><th>Use case</th></tr></thead>
<tbody>
<tr><td><code>ollama</code></td><td>localhost:3200 — local sidecar over Ollama</td><td>Hot-path JSON emitters, embeddings, last-resort rescue</td></tr>
<tr><td><code>ollama_cloud</code></td><td>ollama.com bearer key — gpt-oss:120b, qwen3-coder:480b, deepseek-v3.1:671b, kimi-k2:1t, mistral-large-3:675b, qwen3.5:397b</td><td>Strong-model reviewer rungs, T3+ overview, scrum master pipeline</td></tr>
<tr><td><code>openrouter</code></td><td>openrouter.ai/api/v1 — 343 models incl. Anthropic/Google/OpenAI/MiniMax/Qwen, paid + free tiers</td><td>Paid ladder for observer escalations, free-tier rescue</td></tr>
<tr><td><code>opencode</code></td><td>opencode.ai/zen/v1 — <strong>40 frontier models reachable through ONE sk-* key</strong>: Claude Opus 4.7 / Sonnet / Haiku, GPT-5.5-pro / 5.4 / codex variants, Gemini 3.1-pro, Kimi K2.6, GLM 5.1, DeepSeek, Qwen 3.6+, MiniMax, plus 4 free-tier</td><td>Cross-architecture tie-breakers, auditor cross-lineage (Haiku 4.5 + Opus 4.7), high-context reasoning (Opus on diffs &gt;100k chars)</td></tr>
<tr><td><code>kimi</code></td><td>api.kimi.com/coding/v1 — direct Kimi For Coding</td><td>kimi_architect when ollama_cloud rate-limits; TOS-clean primary path</td></tr>
</tbody>
</table>
<h3>The 9-rung cloud-first ladder</h3>
<p>Defined in <code>tests/real-world/scrum_master_pipeline.ts</code> as <code>const LADDER</code>. Each attempt is evaluated by <code>isAcceptable()</code> = chars ≥ 3800 ∧ not malformed JSON-only. On reject, the next rung sees a learning preamble carrying the prior rejection reason.</p>
<pre>1 ollama_cloud / kimi-k2:1t 1T params · flagship
2 ollama_cloud / qwen3-coder:480b coding specialist
3 ollama_cloud / deepseek-v3.1:671b reasoning
4 ollama_cloud / mistral-large-3:675b deep analysis
5 ollama_cloud / gpt-oss:120b reliable workhorse
6 ollama_cloud / qwen3.5:397b dense final thinker
7 openrouter / openai/gpt-oss-120b:free rescue tier
8 openrouter / google/gemma-3-27b-it:free fastest rescue
9 ollama / qwen3.5:latest last-resort local</pre>
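<p>The walk over those rungs, with the learning preamble, can be sketched like this (the <code>generate</code> callback stands in for the real provider call; the rejection-reason wording is illustrative):</p>

```typescript
// Try each rung in order; a rejected attempt feeds its rejection reason
// into the next rung's prompt as a learning preamble.
type Rung = { provider: string; model: string };

// Accept iff long enough and not a bare JSON-object-only response.
function isAcceptable(output: string): boolean {
  return output.length >= 3800 && !/^\s*\{[\s\S]*\}\s*$/.test(output);
}

async function runLadder(
  ladder: Rung[],
  generate: (rung: Rung, preamble: string) => Promise<string>,
): Promise<{ rung: Rung; output: string } | null> {
  let preamble = "";
  for (const rung of ladder) {
    const output = await generate(rung, preamble);
    if (isAcceptable(output)) return { rung, output };
    preamble =
      `Previous attempt (${rung.model}) was rejected: ` +
      (output.length < 3800 ? "too short" : "JSON-only output") + "\n";
  }
  return null; // every rung exhausted
}
```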
<h3>N=3 consensus + cross-architecture tie-breaker</h3>
<p>Every audit and every consensus-required call fires the primary reviewer N=3 times in parallel (Promise.all — wall-clock = single call). Aggregate votes per claim_idx, majority wins. On a 1-1-1 split, a tie-breaker model with <em>different architecture</em> (qwen3-coder:480b vs primary gpt-oss/kimi) is invoked. Every disagreement, even when majority resolves, writes to <code>data/_kb/audit_discrepancies.jsonl</code>. Closes the cloud-non-determinism gap: <code>temp=0</code> isn't actually deterministic in practice across hours; consensus + cross-architecture tie-break stabilizes verdicts.</p>
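<p>The vote aggregation is simple enough to show inline. A sketch per claim (the <code>tieBreak</code> callback stands in for the cross-architecture model call):</p>

```typescript
// N=3 consensus per claim: strict majority wins; a 1-1-1 split defers to
// a tie-breaker model with a different architecture.
type Verdict = "pass" | "fail" | "unclear";

function consensus(votes: Verdict[], tieBreak: () => Verdict): Verdict {
  const tally = new Map<Verdict, number>();
  for (const v of votes) tally.set(v, (tally.get(v) ?? 0) + 1);
  for (const [verdict, n] of tally) {
    if (n * 2 > votes.length) return verdict; // strict majority
  }
  return tieBreak();                          // no majority: cross-arch decides
}
```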
<h3>Auditor cross-lineage (Kimi ↔ Haiku ↔ Opus)</h3>
<p>Every push to PR #11 triggers <code>auditor/audit.ts</code> within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot lineage) and Haiku 4.5 (Anthropic lineage) by head SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7 (Anthropic frontier). Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. <strong>Latest verdict on c3c9c21:</strong> Haiku 4.5, 24.6s, 100% grounding-verified across 10 findings.</p>
<h3>Distillation v1.0.0 — the frozen substrate</h3>
<p>The substrate the auditor and mode runner sit on is tagged at <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>. <strong>145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified.</strong> The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall (<code>SFT_NEVER</code> constant + scorer category mapping + acceptance fixtures); the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.</p>
<h3>Continuation primitive (Phase 21)</h3>
<p><code>generateContinuable()</code> handles output-overflow without <code>max_tokens</code> tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. <code>generateTreeSplit()</code> handles input-overflow via map-reduce with running scratchpad. Both respect <code>assertContextBudget()</code> so silent truncation can't happen. Now Rust-native in <code>crates/aibridge/src/continuation.rs</code> (Phase 44).</p>
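<p>The two failure branches of the continuation loop can be sketched as follows; the retry ceiling and base delay are illustrative, and <code>generate</code> stands in for the model call:</p>

```typescript
// Empty response: retry with geometric backoff. Truncated JSON: keep the
// partial as a scratchpad and ask the model to continue from it.
function isTruncatedJson(s: string): boolean {
  if (!s.trim().startsWith("{")) return false;
  try { JSON.parse(s); return false; } catch { return true; }
}

async function generateContinuable(
  generate: (scratchpad: string) => Promise<string>,
  maxAttempts = 4,
): Promise<string> {
  let scratchpad = "";
  let delayMs = 250;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const out = await generate(scratchpad);
    if (out === "") {                              // empty: back off, retry
      await new Promise(resolve => setTimeout(resolve, delayMs));
      delayMs *= 2;                                // geometric backoff
      continue;
    }
    if (isTruncatedJson(out)) {                    // truncated: continue from partial
      scratchpad = out;
      continue;
    }
    return scratchpad + out;
  }
  throw new Error("continuation budget exhausted");
}
```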
<h3>Per-staffer tool_level (Phase 23)</h3>
<p>Scenarios can be scoped to a specific coordinator (<code>staffer: {id, name, tenure_months, role, tool_level}</code>). <code>tool_level</code> controls which tiers are available:</p>
<ul>
<li><code>full</code> — T1/T2 local, T3 cloud, cloud rescue on failure</li>
<li><code>local</code> — T1/T2/T3 all local (gpt-oss:20b as overseer)</li>
<li><code>basic</code> — kimi-k2.5 cloud executor + local reviewer + local T3, no rescue</li>
<li><code>minimal</code> — kimi-k2.5 cloud executor, no T3, no rescue. Playbook inheritance is the only signal.</li>
</ul>
<p>Measured 2026-04-21 on a 36-run demo (4 staffers × 3 contracts × 3 rounds): James Park (mid, local tools) ranked first at 92.9% fill and 36.8 cites/run, Maria Chen (senior, full tools) second at 81.0%. Cloud T3 adds latency without measurable benefit on this workload. Alex Rivera (trainee, minimal) still hit 59.5% fill purely from playbook inheritance — proof the memory carries knowledge when the model is capable.</p>
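<p>The tier gating above can be modeled as a small lookup. A sketch only: the fields simplify the real pipeline (e.g. <code>basic</code> actually pairs a cloud executor with a local reviewer), so treat the shape as illustrative:</p>

```typescript
// tool_level -> which tiers a scenario may use; rescue is the cloud
// fallback on failure.
type ToolLevel = "full" | "local" | "basic" | "minimal";
type Tiers = { executor: "local" | "cloud"; t3: "local" | "cloud" | null; rescue: boolean };

function tiersFor(level: ToolLevel): Tiers {
  switch (level) {
    case "full":    return { executor: "local", t3: "cloud", rescue: true };
    case "local":   return { executor: "local", t3: "local", rescue: false };
    case "basic":   return { executor: "cloud", t3: "local", rescue: false };
    case "minimal": return { executor: "cloud", t3: null,    rescue: false };
  }
}
```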
<div class="ref"><strong>Code:</strong> crates/vectord/src/{hnsw.rs, autotune.rs, agent.rs, promotion.rs} · tests/multi-agent/{agent.ts, scenario.ts} · config/models.json · ADR-019</div>
</div>
<!-- ═══ 4. CONTRACT INFERENCE ═══ -->
<div class="chapter" id="ch4">
<div class="num">Chapter 4</div>
<h2>Contract inference from external signal</h2>
<div class="lede">Most CRMs wait for a contract to land. This system watches upstream demand and pre-builds the ranking before the contract lands.</div>
<p>The concrete example running on <code>devop.live/lakehouse</code> is Chicago Department of Buildings permit data (public Socrata API). Every permit is a signal that construction — and therefore staffing — is coming.</p>
<h3>Flow</h3>
<div class="step"><div class="n">1</div><div class="body"><strong>Fetch.</strong> <code>/intelligence/market</code> and <code>/intelligence/permit_contracts</code> hit <code>data.cityofchicago.org/resource/ydr8-5enu.json</code> live. No caching of permit data — every page load is fresh.</div></div>
<div class="step"><div class="n">2</div><div class="body"><strong>Map work_type → role.</strong> Industry dictionary: "Electrical Work" → "Electrician", "Masonry Work" → "Production Worker", "Mechanical Work" → "Maintenance Tech", etc.</div></div>
<div class="step"><div class="n">3</div><div class="body"><strong>Derive worker count.</strong> Heuristic: ~1 worker per $150K of permit cost, capped 2-8 per contract for staffing realism. Operator-configurable when real client history is available.</div></div>
<div class="step"><div class="n">4</div><div class="body"><strong>Derive timeline.</strong> Permit issued → construction starts ~45 days later → staffing window opens ~14 days before construction. Classifies each permit as <code>overdue</code>, <code>urgent</code>, <code>soon</code>, <code>scheduled</code>.</div></div>
<div class="step"><div class="n">5</div><div class="body"><strong>Run hybrid search against the bench.</strong> For each derived contract, <code>POST /vectors/hybrid</code> with <code>sql_filter</code> on role+state+city+availability, <code>use_playbook_memory: true</code>, <code>playbook_memory_k: 200</code>. Returns top-5 candidates with boost + citations.</div></div>
<div class="step"><div class="n">6</div><div class="body"><strong>Query the meta-index.</strong> <code>POST /vectors/playbook_memory/patterns</code> aggregates traits across similar past playbooks — recurring certs, skills, archetype, reliability distribution. Surfaces signal the operator didn't query for.</div></div>
<div class="step"><div class="n">7</div><div class="body"><strong>Render on the dashboard.</strong> Each card shows permit + derived contract + top 3 candidates with memory chips + discovered pattern + urgency. All of this pre-computed before any staffer opens the UI.</div></div>
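<p>Steps 3 and 4 are plain arithmetic. A sketch of both heuristics; the "soon" lookahead below is an assumption (the spec fixes the 45-day start and the 14-day staffing lead, not the soon threshold):</p>

```typescript
// ~1 worker per $150K of permit cost, clamped to 2-8; urgency derives from
// an assumed construction start 45 days after issuance and a staffing
// window opening 14 days before that.
const DAY_MS = 86_400_000;

function workerCount(permitCostUsd: number): number {
  return Math.min(8, Math.max(2, Math.round(permitCostUsd / 150_000)));
}

function urgency(
  issuedAtMs: number,
  nowMs: number,
): "overdue" | "urgent" | "soon" | "scheduled" {
  const constructionStarts = issuedAtMs + 45 * DAY_MS;
  const windowOpens = constructionStarts - 14 * DAY_MS;
  if (nowMs >= constructionStarts) return "overdue";
  if (nowMs >= windowOpens) return "urgent";
  if (nowMs >= windowOpens - 14 * DAY_MS) return "soon"; // assumed lookahead
  return "scheduled";
}
```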
<h3>Coverage forecast</h3>
<p><code>/intelligence/staffing_forecast</code> aggregates the last 30 days of permits into predicted role-level demand, joins against the IL bench supply, computes coverage %, and classifies each role as <code>critical</code> / <code>tight</code> / <code>watch</code> / <code>ok</code>. The dashboard's top panel renders this — staffers see supply gaps before they query.</p>
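<p>The role classification is a threshold over coverage %. A sketch with illustrative cutoffs (the shipped thresholds are not stated here, so the 0.5 / 0.8 / 1.0 bands are assumptions):</p>

```typescript
// coverage = supply / predicted demand, banded into the four statuses.
function coverageBand(
  supply: number,
  demand: number,
): "critical" | "tight" | "watch" | "ok" {
  if (demand <= 0) return "ok";          // no predicted demand: nothing to cover
  const coverage = supply / demand;
  if (coverage < 0.5) return "critical";
  if (coverage < 0.8) return "tight";
  if (coverage < 1.0) return "watch";
  return "ok";
}
```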
</div>
<!-- ═══ 5. WHAT A CRM CAN'T DO ═══ -->
<div class="chapter" id="ch5">
<div class="num">Chapter 5</div>
<h2>What a CRM can't do (and why)</h2>
<div class="lede">A CRM stores. This system infers, predicts, re-ranks, and compounds. The six capabilities below are load-bearing — missing any of them is the gap between "software that logs calls" and "software that makes the next call better."</div>
<div class="card accent-b">
<table class="plain">
<thead><tr><th>Capability</th><th>CRM</th><th>This system</th></tr></thead>
<tbody>
<tr><td>Store candidate records</td><td>Yes</td><td>Yes (workers_500k, candidates)</td></tr>
<tr><td>Search by structured field</td><td>Yes</td><td>Yes (DataFusion SQL, sub-100ms on 3M rows)</td></tr>
<tr><td>Search by semantic meaning</td><td>No</td><td>Yes (HNSW + nomic-embed-text)</td></tr>
<tr><td>Combine SQL filter + semantic rank</td><td>No</td><td>Yes (<code>/vectors/hybrid</code>)</td></tr>
<tr><td>Boost workers based on past success</td><td>No</td><td>Yes (Phase 19 playbook_memory)</td></tr>
<tr><td>Penalize workers based on past failure</td><td>No</td><td>Yes (<code>/log_failure</code> + <code>0.5<sup>n</sup></code> penalty)</td></tr>
<tr><td>Surface traits across past fills</td><td>No</td><td>Yes (<code>/vectors/playbook_memory/patterns</code>)</td></tr>
<tr><td>Per-staffer relevance gradient</td><td>No</td><td>Yes — same query reshapes per coordinator (<code>staffer_id</code> on <code>/intelligence/chat</code>); MARIA'S MEMORY pill labels the playbook context with the active coordinator</td></tr>
<tr><td>Triage in one shot — late-worker → backfills + draft SMS</td><td>No</td><td>Yes (<code>/intelligence/chat</code> Route 6 — pulls profile + 5 same-role same-geo backfills sorted by responsiveness + drafts client SMS in ~250ms)</td></tr>
<tr><td>Permit → fill plan derivation (forward demand)</td><td>No</td><td>Yes (<code>/intelligence/permit_contracts</code> — Chicago Socrata permit → role / headcount / deadline / fill probability / gross revenue per card)</td></tr>
<tr><td>Public-issuer attribution across contractor graph</td><td>No</td><td>Yes (<code>/intelligence/profiler_index</code> — direct + parent + co-permit associated tickers; live Stooq prices)</td></tr>
<tr><td>Cross-lineage AI audit on every PR</td><td>No</td><td>Yes (auditor crate — Kimi K2.6 ↔ Haiku 4.5 alternation + Opus 4.7 auto-promote on big diffs)</td></tr>
<tr><td>Pathway memory — system-level hot-swap by task fingerprint</td><td>No</td><td>Yes (88 traces, 11/11 successful replays, 100% reuse rate, ADR-021)</td></tr>
<tr><td>Predict staffing demand from external data</td><td>No</td><td>Yes (Chicago permit feed + 30-day rolling forecast)</td></tr>
<tr><td>Count down to staffing deadline per contract</td><td>No</td><td>Yes (permit issue_date + heuristic timeline)</td></tr>
<tr><td>Explain why each candidate ranked</td><td>No</td><td>Yes (boost chip + narrative citations + memory pattern)</td></tr>
<tr><td>Improve ranking from operator actions</td><td>No</td><td>Yes (every Call/SMS/No-show click → re-rank signal)</td></tr>
</tbody>
</table>
</div>
</div>
<!-- ═══ 6. HOW IT GETS BETTER ═══ -->
<div class="chapter" id="ch6">
<div class="num">Chapter 6</div>
<h2>How it gets better over time</h2>
<div class="lede">Compounding learning across ten paths. The first three are automatic background loops. Paths 4-6 and 10 (Phases 22-24) added the reinforcement layer: outcomes → KB → recommendations → cloud rescue → competence-weighted retrieval → observer analysis. Paths 7-9 (Phases 25-43, 2026-04-26→27) added the system-level memory layers: pathway memory by task fingerprint (ADR-021), per-staffer hot-swap, and the Construction Activity Signal Engine. All ten happen without operator intervention.</div>
<h3>Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)</h3>
<p>Every sealed fill is seeded to <code>playbook_memory</code>. The boost fires inside <code>/vectors/hybrid</code> when <code>use_playbook_memory: true</code>. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:</p>
<pre>per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers
boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
<p><strong>Multi-strategy retrieval (new):</strong> before cosine, <code>compute_boost_for_filtered_with_role(target_geo, target_role)</code> prefilters to same-city playbooks, then gives exact (role, city, state) matches similarity=1.0 and fills up to half the top-k. Cosine fills the rest. Mirrors 2026 Mem0/Zep guidance on parallel-strategy rerank.</p>
<p><strong>Measured lift:</strong> before geo-filter, Nashville Welder query returned <code>boosts=170 matched=0</code> (zero intersection with candidate pool). After: <code>boosts=36 matched=11</code>. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log <code>playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=?</code> runs on every hybrid call so the class of silent-miss bug stays visible.</p>
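<p>The boost math above transcribes directly; this sketch uses the quoted constants (30-day decay window, 0.5 per-failure penalty base, 0.25 per-geo cap) with illustrative inputs.</p>

```python
import math

# Direct transcription of the per-worker boost formula quoted above.
def per_worker_boost(cos_sim: float, age_days: float,
                     failures: int, n_workers: int) -> float:
    return cos_sim * 0.5 * math.exp(-age_days / 30) * 0.5**failures / n_workers

def geo_boost(playbooks, cap: float = 0.25) -> float:
    """playbooks: (cos_sim, age_days, failures, n_workers) tuples for one
    (city, state, name) key; the summed boost is capped at 0.25."""
    return min(sum(per_worker_boost(*p) for p in playbooks), cap)

# A fresh, failure-free endorsement from a 2-worker playbook:
print(round(per_worker_boost(0.9, 0, 0, 2), 3))  # 0.225
```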
<h3>Path 2 — Pattern discovery (meta-index)</h3>
<p><code>/vectors/playbook_memory/patterns</code> goes beyond "who was endorsed" to answer "what did past similar fills have in common?" Aggregates recurring certifications, skills, archetype, reliability distribution across the top-K semantically similar playbooks. Surfaces signal the operator didn't explicitly query for.</p>
<h3>Path 3 — Autotune agent</h3>
<p>The <code>vectord::agent</code> background task runs continuously. Watches the HNSW trial journal, proposes configs, executes trials, promotes Pareto winners — without human intervention. Operator sees "the index got faster overnight" and doesn't know why. The journal knows why.</p>
<h3>Path 4 — Knowledge Base + pathway recommender (Phase 22)</h3>
<p>Meta-layer over playbook_memory. Files under <code>data/_kb/</code>:</p>
<ul>
<li><code>signatures.jsonl</code> — sig_hash + embedding of every run's event shape</li>
<li><code>outcomes.jsonl</code> — per-run summary (models, fill rate, turns, citations, rescue stats, staffer, elapsed)</li>
<li><code>pathway_recommendations.jsonl</code> — AI-synthesized advice for the next run of a similar sig</li>
<li><code>error_corrections.jsonl</code> — fail→succeed deltas on the same sig</li>
<li><code>config_snapshots.jsonl</code> — hash of active models + tool_level at each run</li>
</ul>
<p>Cycle: scenario ends → <code>kb.indexRun()</code> appends outcome → <code>kb.recommendFor(nextSpec)</code> finds k-NN signatures, feeds outcome history to an overview model, writes structured JSON advice → next scenario reads it via <code>kb.loadRecommendation(spec)</code> and injects <code>pathway_notes</code> into the executor's context alongside prior T3 lessons.</p>
<h3>Path 5 — Cloud rescue on failure (Phase 22 item B)</h3>
<p>When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, <code>requestCloudRemediation()</code> feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to <code>gpt-oss:120b</code> on Ollama Cloud. Cloud returns structured <code>{retry, new_city, new_state, new_role, new_count, rationale}</code>. Event retries once with the pivot. Verified on stress_01: Gary IN (zero workers indexed) misplacement → cloud proposed South Bend IN → retry filled 1/1.</p>
<h3>Path 6 — Staffer competence-weighted retrieval (Phase 23)</h3>
<p>Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries <code>staffer: {id, name, tenure_months, role, tool_level}</code>. After every run, <code>recomputeStafferStats(staffer_id)</code> aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single <code>competence_score</code> (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).</p>
<p><code>findNeighbors</code> returns <code>weighted_score = cosine × max_staffer_competence</code> — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.</p>
<h3>Path 7 — Pathway memory (ADR-021 — semantic-correctness matrix layer)</h3>
<p>Memory at the system layer, not the worker layer. Every accepted scrum review writes a <code>PathwayTrace</code> with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …), bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. Six-condition hot-swap gate: narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.</p>
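<p>The gate reduces to a single predicate. Field names here are illustrative assumptions, not the crate's actual struct layout.</p>

```python
from dataclasses import dataclass

@dataclass
class PathwayTrace:
    # Illustrative fields mirroring the gate conditions in the text.
    fingerprint_match: bool   # narrow fingerprint matched the query
    audit_consensus: bool     # cross-lineage audit consensus passed
    replay_count: int         # successful replays recorded so far
    success_rate: float       # replay success rate
    retired: bool             # trace explicitly retired
    cosine: float             # vector similarity to the query

def hot_swap_allowed(t: PathwayTrace) -> bool:
    return (t.fingerprint_match
            and t.audit_consensus
            and t.replay_count >= 3       # probation gate
            and t.success_rate >= 0.80
            and not t.retired
            and t.cosine >= 0.90)

print(hot_swap_allowed(PathwayTrace(True, True, 11, 1.0, False, 0.96)))  # True
```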
<p><strong>Live state (verified on this load):</strong> 88 traces · 11 / 11 successful replays · 100% reuse rate · probation gate crossed. Endpoints: <code>/vectors/pathway/insert</code> · <code>/query</code> · <code>/record_replay</code> · <code>/stats</code> · <code>/bug_fingerprints</code>. Spec: <code>docs/DECISIONS.md</code> ADR-021.</p>
<h3>Path 8 — Per-staffer hot-swap index</h3>
<p>Memory scoped to whoever's acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to staffer territory, scopes playbook-pattern geo to staffer's primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → MARIA'S MEMORY. Same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha. The corpus stays intact; the relevance gradient is per coordinator; each accumulates fills independently.</p>
<p><strong>Roster:</strong> <code>/staffers</code> endpoint reads from <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Three personas today (Maria/Devon/Aisha); architecture generalizes — every new metro adds territories, not code paths.</p>
<h3>Path 9 — Construction Activity Signal Engine</h3>
<p>Memory at the network layer. Every contractor in the corpus is also a forward indicator on the public equities they touch via three attribution flavors: <code>direct</code> (contractor IS the public issuer — SEC tickers index match), <code>parent</code> (subsidiary of a public parent — curated KNOWN_PARENT_MAP, e.g. Turner → HOC.DE via Hochtief AG), <code>associated</code> (co-permit network — Bob's Electric appears with TARGET CORPORATION 3+ times → inherits TGT). The associated path is the moat: a staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph.</p>
<p><strong>BAI (Building Activity Index)</strong> = attribution-weighted average day-change across surfaced issuers. <strong>Indexed build value</strong> = total $ of permits attributable to ANY public issuer in scope. <strong>Network depth</strong> = issuers / total attribution edges. Cross-metro replication explicit in the architecture — Chicago is Phase 1; NYC DOB / LA County / Houston BCD / Boston ISD / DC DCRA are all Socrata-shaped, ship as config-only adapters.</p>
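<p>BAI as a weighted average, sketched under assumed per-flavor weights: the text says only "attribution-weighted", so the 1.0 / 0.6 / 0.3 split below is illustrative, not the shipped weighting.</p>

```python
# Illustrative per-flavor weights (the spec doesn't quote the real ones).
ATTR_WEIGHT = {"direct": 1.0, "parent": 0.6, "associated": 0.3}

def bai(edges) -> float:
    """edges: (flavor, issuer day-change %) attribution edges in scope.
    Returns the attribution-weighted average day-change."""
    num = sum(ATTR_WEIGHT[flavor] * change for flavor, change in edges)
    den = sum(ATTR_WEIGHT[flavor] for flavor, _ in edges)
    return num / den if den else 0.0

edges = [("direct", 1.2), ("parent", -0.4), ("associated", 0.8)]
print(round(bai(edges), 3))  # 0.632
```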
<h3>Path 10 — Observer outcome ingest (Phase 24)</h3>
<p>Observer runs as <code>lakehouse-observer.service</code>, now with an HTTP listener on <code>:3800</code>. Scenarios POST per-event outcomes to <code>/event</code> with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old <code>/ingest/file</code> REPLACE path to an append-only <code>data/_observer/ops.jsonl</code> journal so the trace survives across restarts.</p>
<h3>Input normalizer + unified memory query</h3>
<p>Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:</p>
<ul>
<li><code>normalizeInput(raw)</code> — accepts structured JSON, natural language, or mixed. Three-tier: structured fast-path → regex (handles "need 3 welders in Nashville, TN" in 0ms without an LLM call) → qwen3 LLM fallback for low-signal inputs.</li>
<li><code>POST /memory/query</code> — one endpoint, returns every memory surface in a single bundle: <code>playbook_workers</code> (with boost + citations), <code>pathway_recommendation</code> (KB), <code>neighbor_signatures</code> (competence-weighted), <code>prior_lessons</code> (T3 overseer history), <code>top_staffers</code> (leaderboard), <code>discovered_patterns</code> (auto-surfaced reliable workers for this role+city), <code>latency_ms</code> (per-source). Natural-language query end-to-end: 319ms.</li>
</ul>
<h3>Honest gaps — what we can still implement</h3>
<p>Three of the five 2026-era memory findings remain unwired. Flagged for near-term implementation, not hidden:</p>
<ul>
<li><strong>Zep-style validity windows</strong> — Phase 25 added the <code>valid_until</code> + <code>schema_fingerprint</code> fields and the <code>/retire</code> endpoint (Ch 7), but expiry is not yet enforced inside the boost calculation itself: when a schema migration changes a column, un-retired stale playbooks silently keep boosting. Biggest-value remaining fix.</li>
<li><strong>Mem0-style UPDATE / DELETE / NOOP ops</strong> — <code>/seed</code> only ADDs. Same (operation, date) pair appends a duplicate instead of refining an existing entry. Playbook file grows faster than necessary.</li>
<li><strong>Letta working-memory hot cache</strong> — every query scans all 1560 playbook entries from disk. Cheap today at 1.5K, not at 100K. Solution: LRU in-process cache for the last N playbooks or the current sig's neighborhood.</li>
</ul>
<p>Validity-window enforcement is next — it preserves the trust signal (boost only fires on playbooks that are still true given the current schema) rather than the latency signal (which the current scale doesn't need yet).</p>
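<p>What enforcement would look like at boost time, assuming each entry carried the two fields — a sketch of the intended check, not shipped code.</p>

```python
from datetime import date

# Hypothetical pre-boost gate: skip entries whose validity window closed
# or whose schema fingerprint no longer matches the live dataset.
def boost_eligible(entry: dict, today: date, live_schema_fp: str) -> bool:
    vu = entry.get("valid_until")
    if vu is not None and date.fromisoformat(vu) < today:
        return False  # window expired: boost must not fire
    fp = entry.get("schema_fingerprint")
    if fp is not None and fp != live_schema_fp:
        return False  # schema drifted since this playbook was sealed
    return True

entry = {"valid_until": "2026-05-01", "schema_fingerprint": "abc123"}
print(boost_eligible(entry, date(2026, 4, 21), "abc123"))  # True
print(boost_eligible(entry, date(2026, 6, 1), "abc123"))   # False
```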
<div class="ref"><strong>Code:</strong> crates/vectord/src/{playbook_memory.rs, service.rs} · tests/multi-agent/{kb.ts, memory_query.ts, normalize.ts, scenario.ts} · mcp-server/{observer.ts, index.ts} · data/_kb/ · data/_observer/</div>
</div>
<!-- ═══ 7. SCALE STORY ═══ -->
<div class="chapter" id="ch7">
<div class="num">Chapter 7</div>
<h2>Scale story — 20 staffers, 300 contracts, a surge</h2>
<div class="lede">What happens when the demo-level load becomes the production-level load, and midday a client pushes 20 more contracts plus a 1M-row ATS delta. Honest: some of this is architectural headroom, not measured scale. The designed behaviors are below.</div>
<h3>20 concurrent staffers</h3>
<p><strong>Axum is async.</strong> The gateway handles concurrent requests on Tokio with work-stealing. No per-request thread. Tested at 10 parallel queries in 82ms total on this hardware.</p>
<p><strong>Per-staffer profile isolation.</strong> Each staffer activates their own profile (Phase 17) or workspace (Phase 8.5). Profile scopes their search to bound datasets. Workspace carries their in-progress contracts across sessions.</p>
<p><strong>Per-client blacklists.</strong> Auto-applied when the caller passes <code>client: "X"</code> on <code>/search</code>. Staffer A filling for Acme never sees Acme's flagged workers. Staffer B filling for MidState sees them normally.</p>
<h3>300 active contracts</h3>
<p><strong>SQL on <code>job_orders</code> is cheap.</strong> 300 rows is nothing — a scan is microseconds.</p>
<p><strong>Workspace per contract.</strong> Each contract gets its own workspace with saved searches, shortlists, activity log. Zero-copy handoff between staffers (pointer swap, not data copy).</p>
<p><strong>Forecast remains coherent.</strong> <code>/intelligence/staffing_forecast</code> aggregates 30-day permit data regardless of contract count. The bench supply query (<code>GROUP BY role</code> over workers_500k) is a single sub-second SQL.</p>
<h3>Midday surge: +20 contracts, +1M profiles</h3>
<p>The delta arrives at 12:30. Here's what happens in the following minutes:</p>
<div class="step"><div class="n">1</div><div class="body"><strong>+20 contracts via /ingest/db or /ingest/file.</strong> Parsed, schema-checked, Parquet-written, catalog-registered. No queries blocked — register holds a write lock across the manifest write only.</div></div>
<div class="step"><div class="n">2</div><div class="body"><strong>+1M worker profiles arrives as delta to workers_500k.</strong> Append-log pattern (ADR-018) means the new rows write to a fresh batch file — base Parquet is NOT rewritten. Queries against workers_500k immediately merge-on-read the new batches.</div></div>
<div class="step"><div class="n">3</div><div class="body"><strong>Embeddings marked stale.</strong> The vector index for workers_500k_v1 now has 1M rows it hasn't seen. <code>mark_embeddings_stale</code> flips the flag.</div></div>
<div class="step"><div class="n">4</div><div class="body"><strong>Incremental refresh fires.</strong> <code>POST /vectors/refresh/workers_500k</code> reads only the new rows (diff against existing embeddings), embeds them in batches of 64 via Ollama, writes delta embedding Parquet. Measured on threat_intel: 34 new rows in 970ms (6× faster than full re-embed).</div></div>
<div class="step"><div class="n">5</div><div class="body"><strong>Search degrades gracefully.</strong> During the refresh, searches against workers_500k_v1 still work — they serve from the old embeddings. Brute-force cosine over new-rows-without-embeddings is allowed but costs more. HNSW rebuild happens after all embeds complete.</div></div>
<div class="step"><div class="n">6</div><div class="body"><strong>Hot-swap promotion.</strong> When the new index is ready, <code>promotion_registry</code> atomically flips the active pointer. Next search hits the new config. Rollback stays available.</div></div>
<div class="step"><div class="n">7</div><div class="body"><strong>Autotune re-enters the loop.</strong> The agent queue picks up a <code>DatasetAppended</code> trigger and schedules a fresh HNSW trial cycle against the expanded index.</div></div>
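<p>Step 4's delta plan in miniature: embed only rows whose ids are missing from the existing embedding set, in the batch size quoted above. A sketch of the idea, not the refresh endpoint's code.</p>

```python
# Plan the incremental refresh: diff new row ids against already-embedded
# ids, then chunk the stale set into Ollama-sized batches of 64.
def plan_refresh(all_row_ids, embedded_ids, batch: int = 64):
    done = set(embedded_ids)
    stale = [r for r in all_row_ids if r not in done]
    return [stale[i:i + batch] for i in range(0, len(stale), batch)]

batches = plan_refresh(range(1000), range(900))  # 100 newly appended rows
print(len(batches), len(batches[0]), len(batches[-1]))  # 2 64 36
```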
<h3>Known pain points at this scale</h3>
<ul>
<li><strong>Ollama inference is serial.</strong> Embedding 1M rows at ~50 chunks/sec through nomic-embed-text = ~6 hours. Acceptable for overnight refresh, not for "immediate." Mitigated by incremental refresh (only deltas).</li>
<li><strong>RAM ceiling on HNSW.</strong> Around 5M vectors × 768d, HNSW stops fitting in 128GB comfortably. Mitigation: per-profile <code>vector_backend: lance</code> flip — disk-resident IVF_PQ scales past the RAM line (ADR-019).</li>
<li><strong>VRAM ceiling for model variety.</strong> A4000 16GB holds 1-2 loaded models. Multi-model recruiter surfaces are a sequential swap, not parallel (Ollama <code>keep_alive=0</code>). Phase 17 profile activation unloads the prior model on swap.</li>
<li><strong>playbook_memory growth.</strong> 1936 entries today. Phase 25 (2026-04-21) added retirement via <code>valid_until</code> + <code>schema_fingerprint</code> fields + <code>POST /vectors/playbook_memory/retire</code> endpoint (manual or schema-drift triggered). Active vs retired split surfaced on <code>GET /vectors/playbook_memory/status</code>. Brute-force cosine still sub-ms at current size; Letta-style working-memory hot cache deferred until entry count crosses ~100K.</li>
</ul>
</div>
<!-- ═══ 8. ERROR SURFACES ═══ -->
<div class="chapter" id="ch8">
<div class="num">Chapter 8</div>
<h2>Error surfaces &amp; recovery</h2>
<div class="lede">Every failure mode has a named surface, a structured response, and a recovery path. No silent failures.</div>
<table class="plain">
<thead><tr><th>Failure mode</th><th>Surface / response</th><th>Recovery</th></tr></thead>
<tbody>
<tr><td>Ingest receives file with schema mismatch vs existing dataset</td><td><code>409 Conflict</code> with both fingerprints named (ADR-020)</td><td>Re-ingest under a new name, or migrate the existing via Phase 14 schema evolution</td></tr>
<tr><td>Bucket unreachable on write</td><td>Hard 503, error journaled to <code>primary://_errors/bucket_errors/</code></td><td><code>GET /storage/errors</code> lists failures; <code>GET /storage/bucket-health</code> shows per-bucket status</td></tr>
<tr><td>Bucket unreachable on read</td><td>Rescue bucket fallback, <code>X-Lakehouse-Rescue-Used: true</code> header on response</td><td>Response still succeeds; operator sees rescue flag</td></tr>
<tr><td>/log receives name that doesn't exist in workers_500k</td><td>Seed is SKIPPED; response includes <code>rejected_ghost_names: [...]</code> and a note</td><td>Operator sees exactly which names were rejected and why</td></tr>
<tr><td>Dual-agent executor malforms tool call</td><td>Result appended to log with <code>error</code> field; counter increments</td><td>After 3 consecutive: abort with full log dump at <code>tests/multi-agent/playbooks/&lt;id&gt;-FAILED.json</code></td></tr>
<tr><td>Dual-agent drifts from target</td><td>Reviewer verdict = <code>drift</code>, counter increments</td><td>After 3 consecutive drifts: abort with full log</td></tr>
<tr><td>Hybrid search finds zero candidates</td><td>Returns empty <code>sources[]</code> + <code>sql_matches: 0</code></td><td>Gap signal captured by scenario runner; operator prompted to broaden filter</td></tr>
<tr><td>Ollama sidecar down</td><td>502 Bad Gateway from aibridge; <code>embed</code> calls fail fast</td><td>Restart: <code>systemctl restart lakehouse-sidecar</code>; vector search falls back to pre-computed embeddings</td></tr>
<tr><td>Gateway restart mid-operation</td><td>In-memory state (playbook_memory, HNSW) reloaded from persisted <code>state.json</code> / trial journals</td><td>Zero data loss; catalog, storage, journals are all source-of-truth</td></tr>
<tr><td>Schema fingerprint diverges across manifests</td><td><code>catalog::dedupe</code> reports <code>DedupeReport</code> with winner selection (non-null row_count first, then newest updated_at)</td><td><code>POST /catalog/dedupe</code> collapses duplicates idempotently</td></tr>
<tr><td>Scenario event fails on zero-supply city</td><td>Cloud rescue (Phase 22 item B) fires — <code>gpt-oss:120b</code> sees SQL filters attempted, row counts, reviewer drift notes, contract terms; returns structured <code>{retry, new_city, new_state, new_role, new_count, rationale}</code></td><td>Retry with pivot runs same executor loop with new geography; verified Gary IN → South Bend IN filled 1/1 after original drift-abort</td></tr>
<tr><td>LLM response truncated mid-JSON (thinking model ate token budget)</td><td>Phase 21 <code>generateContinuable()</code> detects via brace-balance + JSON.parse; no silent truncation</td><td>Auto-continue with partial as scratchpad, or geometric backoff if initial call returned empty. Bounded by <code>max_continuations</code>.</td></tr>
<tr><td>Schema migration invalidates existing playbooks</td><td>Phase 25 — <code>POST /vectors/playbook_memory/retire</code> with current fingerprint retires all mismatched entries in scope; diagnostic log shows counts</td><td>Retired entries stay in journal for forensics but are skipped by all boost calculations. Scoped by (city, state) so unrelated geos aren't touched.</td></tr>
<tr><td>Observer fails to reach scenario outcome stream</td><td>Scenario <code>postObserverEvent()</code> uses 2s <code>AbortSignal.timeout</code>; silent skip if :3800 is down</td><td>Scenario log is still the source of truth; observer re-ingest on next run restores the stream. <code>data/_observer/ops.jsonl</code> is append-only so prior events survive.</td></tr>
</tbody>
</table>
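<p>The brace-balance detection in the truncated-JSON row can be sketched as follows. This is an approximation of the idea behind <code>generateContinuable()</code>, not the Phase 21 implementation.</p>

```python
import json

# A response that neither parses nor re-balances its braces/brackets was
# almost certainly cut off mid-JSON; that triggers auto-continue.
def looks_truncated(text: str) -> bool:
    try:
        json.loads(text)
        return False  # parses cleanly: definitely complete
    except json.JSONDecodeError:
        pass
    depth = 0
    in_str = escape = False
    for ch in text:
        if in_str:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
    return depth > 0 or in_str  # still open: truncation suspected

print(looks_truncated('{"workers": [{"name": "Carmen"'))  # True
print(looks_truncated('{"workers": []}'))                 # False
```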
</div>
<!-- ═══ 9. PER-STAFFER CONTEXT ═══ -->
<div class="chapter" id="ch9">
<div class="num">Chapter 9</div>
<h2>Per-staffer context</h2>
<div class="lede">Twenty staffers don't see the same UI state. Each one's session is shaped by their identity (the per-staffer hot-swap index — Path 8 in Ch6), their active profile, their workspaces, their assigned contracts, and their client's blacklists.</div>
<h3>Per-staffer hot-swap index (the recent layer)</h3>
<p>Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but search results, recurring-skill patterns, and playbook context all reshape to whoever is acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to the staffer's territory, scopes playbook-pattern geo to their primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → <em>MARIA'S MEMORY</em>.</p>
<p><strong>Verified end-to-end:</strong> same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha (live numbers; refresh the profiler page to recompute). The corpus stays intact; the relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. <strong>Roster:</strong> <code>/staffers</code> endpoint, declared in <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Adding a staffer is one append; the architecture is metro-agnostic by construction.</p>
<h3>Active profile (Phase 17)</h3>
<p>Scopes every search. A <code>staffing-recruiter</code> profile bound to <code>workers_500k</code> sees only that dataset. A <code>security-analyst</code> profile bound to <code>threat_intel</code> cannot see worker data. <code>GET /vectors/profile/&lt;id&gt;/audit</code> records every tool invocation by model identity.</p>
<h3>Workspace (Phase 8.5)</h3>
<p>Per-contract state. Each workspace has daily/weekly/monthly tiers, saved searches, shortlists, activity logs. Survives across sessions. Instant zero-copy handoff between staffers — pointer swap, not data copy. Persisted to object storage, rebuilt on startup.</p>
<h3>Client blacklist</h3>
<p>Per-client worker exclusion. Populated via <code>POST /clients/:client/blacklist</code>. Auto-applied when the caller passes <code>client: "X"</code> on <code>/search</code>. JSON-backed; would move to catalog table under real client load.</p>
<h3>Audit trail</h3>
<p>Phase 12 tool registry logs every governed-action invocation (who called what, with what args, when, outcome). <code>GET /tools/audit</code> queryable. Phase 13 access control layers on top — role-based field masking, query audit log.</p>
<h3>Daily summary per staffer</h3>
<p>Workspace activity log + per-staffer filter on the event journal gives <strong>"what did Sarah do today"</strong> as a direct query. The foundation for shift-handoff reports.</p>
<h3>Staffer identity + competence-weighted retrieval (Phase 23)</h3>
<p>Each scenario run carries an explicit <code>staffer: {id, name, tenure_months, role, tool_level}</code>. The KB aggregates per-staffer stats (<code>data/_kb/staffers.jsonl</code>) that roll up into a single <code>competence_score</code>:</p>
<pre>competence_score = 0.45·fill_rate + 0.20·turn_efficiency + 0.20·citation_density + 0.15·rescue_rate</pre>
<p>When any query runs <code>kb.findNeighbors(spec, k)</code>, the ranking isn't just cosine similarity — it's <code>weighted_score = cosine × max_staffer_competence</code> over the best coordinator who ran that signature. Senior staffers' playbooks surface above juniors' on similar scenarios, even when the juniors' scenario was marginally closer in embedding space.</p>
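<p>Both formulas in miniature. Only the fill rates echo the 36-run demo numbers below; the other score components are illustrative inputs.</p>

```python
# Direct transcription of the competence and weighted-score formulas above.
def competence_score(fill_rate, turn_eff, cite_density, rescue_rate):
    return (0.45 * fill_rate + 0.20 * turn_eff
            + 0.20 * cite_density + 0.15 * rescue_rate)

def weighted_score(cosine, staffer_scores):
    # rank by cosine x the best coordinator who ran that signature
    return cosine * max(staffer_scores)

james = competence_score(0.93, 0.8, 0.7, 0.5)  # fill rates from the demo
alex = competence_score(0.60, 0.5, 0.3, 0.2)
# A marginally-closer junior playbook still loses to the senior one:
print(weighted_score(0.90, [james]) > weighted_score(0.93, [alex]))  # True
```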
<p>The <code>tool_level</code> knob (full / local / basic / minimal) controls which tiers are available to a given staffer's runs. See Ch 3 for the mapping. Variance is real and measurable: 36-run demo produced a 46pt fill-rate delta between James (local tools, 93%) and Alex (minimal tools, 60%) on identical contracts.</p>
<h3>Auto-discovered reliable-performer labels</h3>
<p>A second-order output of the competence path: when multiple staffers independently endorse the same worker on similar-signature playbooks, that worker accumulates cross-staffer endorsements. <code>scripts/kb_staffer_report.py</code> surfaces them — after 36 runs, Rachel D. Lewis (Welder Nashville) had 18 endorsements across 4 staffers, Angela U. Ward (Machine Op Indianapolis) 19. These are high-confidence "reliable" labels the system produced without human tagging. The UI could badge these workers on future queries; today they're visible via <code>/memory/query</code>'s <code>discovered_patterns</code> bundle.</p>
</div>
<!-- ═══ 10. A DAY IN THE LIFE ═══ -->
<div class="chapter" id="ch10">
<div class="num">Chapter 10</div>
<h2>A day in the life — from morning brief to EOD retrospective</h2>
<div class="lede">Concrete operator timeline. Every step touches a real endpoint that exists today.</div>
<div class="step"><div class="n">07:00</div><div class="body"><strong>Overnight housekeeping.</strong> Scheduled ingest runs — the configured cron picks up the client's latest ATS CSV delta, runs it through the pipeline in Ch2, marks workers_500k embeddings stale. Autotune agent promotes any Pareto-winner HNSW configs from overnight trials.</div></div>
<div class="step"><div class="n">07:30</div><div class="body"><strong>Embedding refresh.</strong> Background job re-embeds the new rows. Old index keeps serving. Hot-swap promotes when done.</div></div>
<div class="step"><div class="n">08:00</div><div class="body"><strong>Sarah (staffer) opens devop.live/lakehouse.</strong> Page loads in ~3s. Forecast panel shows: "$275M construction coming, 4 tight roles this week." Live Contracts section shows 6 Chicago permits with proposed fills + boost chips + pattern signals.</div></div>
<div class="step"><div class="n">08:15</div><div class="body"><strong>Sarah drills into a $5M permit.</strong> Top candidate card: Carmen Green, Endorsed · 3 playbooks chip, boost +0.166, pattern line reads "leader archetype · 47% OSHA-10." Sarah hovers the chip — narrative tooltip: "filled Welder x2 in Toledo (2026-04-15), Welder x1 in Toledo (2026-04-18)."</div></div>
<div class="step"><div class="n">08:30</div><div class="body"><strong>Sarah calls Carmen.</strong> Clicks Call button → <code>/log</code> fires → <code>playbook_memory.seed</code> → <code>persist_sql</code> → successful_playbooks_live grows by one. Button flashes "Logged" for 1.4s. No modal, no form, no second click.</div></div>
<div class="step"><div class="n">09:00</div><div class="body"><strong>Kim (another staffer) opens the same UI.</strong> Her profile loads. Her workspaces show her own contracts. She searches "reliable forklift Chicago" — MEMORY chip shows the pattern discovered across Sarah's morning work AND prior fills. Carmen, already logged by Sarah, shows up with an updated citation count.</div></div>
<div class="step"><div class="n">12:30</div><div class="body"><strong>Client pushes 20 new contracts + 1M ATS delta.</strong> Ch7 scale flow fires. Ingest in seconds; embedding refresh kicks off as a background job. Searches continue against old embeddings.</div></div>
<div class="step"><div class="n">14:00</div><div class="body"><strong>Emergency: worker Dave no-showed.</strong> Sarah types "Dave running late site 4422" into the search box. ~250ms later: triage card with Dave's profile + reliability + responsiveness, draft SMS to client ("dispatching X from local bench, 96% reliability, will confirm arrival"), and 5 same-role same-geo backfills sorted by responsiveness rendered as a green list below. Sarah clicks Copy SMS, pastes to client, clicks Call on the top backfill. <code>/log_failure</code> on Dave records the penalty for the next similar query.</div></div>
<div class="step"><div class="n">15:00</div><div class="body"><strong>New embeddings live.</strong> Hot-swap promotion. Searches now see all 1M new profiles. Sarah's noon query re-run would produce different top-5.</div></div>
<div class="step"><div class="n">17:00</div><div class="body"><strong>End-of-day retrospective.</strong> Any staffer who ran <code>tests/multi-agent/scenario.ts</code> gets <code>report.md</code> auto-generated. Workspace activity logs aggregate per staffer. <code>GET /vectors/playbook_memory/status</code> shows active vs retired counts. KB indexes the run (<code>kb.indexRun</code>) and the overview model synthesizes a pathway recommendation for the next matching signature. Every event outcome has already streamed to <code>lakehouse-observer.service</code> on :3800 for ERROR_ANALYZER + PLAYBOOK_BUILDER consumption.</div></div>
<div class="step"><div class="n">17:15</div><div class="body"><strong>Kim fires a natural-language query from the search box.</strong> "need 3 forklift operators in Joliet by Monday" → <code>POST /memory/query</code> on the Bun MCP. Regex normalizer extracts role / city / count / deadline / intent in 0ms (no LLM call). The unified response returns playbook workers (auto-surfaced reliable performers for Joliet Forklift with citation counts), pathway recommendation from KB, prior T3 lessons for Joliet, and top staffers by competence — all in ~300ms.</div></div>
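<p>The 0ms extraction is just anchored regexes over the query string — no model in the loop. An illustrative sketch; these patterns are stand-ins, not the production normalizer:</p>

```typescript
interface ParsedQuery {
  count: number | null;
  role: string | null;
  city: string | null;
  deadline: string | null;
}

// Pull role / city / count / deadline out of a natural-language query with
// plain regexes. The role alternation here is a tiny illustrative subset.
function normalize(q: string): ParsedQuery {
  const count = q.match(/\b(\d+)\b/);
  const role = q.match(/\b(forklift operators?|welders?|electricians?)\b/i);
  const city = q.match(/\bin\s+([A-Z][a-z]+)\b/);
  const deadline = q.match(/\bby\s+(\w+)\b/i);
  return {
    count: count ? Number(count[1]) : null,
    role: role ? role[1].toLowerCase() : null,
    city: city ? city[1] : null,
    deadline: deadline ? deadline[1] : null,
  };
}
```

<p>Synchronous string matching is why the normalizer contributes nothing to the ~300ms budget; the time goes to the downstream playbook / KB / competence lookups.</p>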
<div class="step"><div class="n">22:00</div><div class="body"><strong>Overnight trial cycle.</strong> Autotune agent continues in the background. Trial journal grows. KB's <code>detectErrorCorrections</code> scans today's outcomes for fail→succeed deltas on the same signature; any correction gets logged to <code>data/_kb/error_corrections.jsonl</code> with the config diff. Tomorrow morning, the system is measurably better at something it got asked about today.</div></div>
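<p>The fail→succeed scan can be sketched as one pass over time-ordered outcomes per signature. The real <code>detectErrorCorrections</code> internals aren't shown in this spec, so the shapes below are assumptions:</p>

```typescript
interface Outcome {
  signature: string; // e.g. role + geo + mode fingerprint
  ok: boolean;
  config: string;    // serialized config used for this run
  ts: number;
}

interface Correction {
  signature: string;
  failedConfig: string;
  fixedConfig: string;
}

// A success that follows a failure on the same signature, under a different
// config, is a correction worth journaling.
function detectErrorCorrections(outcomes: Outcome[]): Correction[] {
  const lastFail = new Map<string, Outcome>();
  const corrections: Correction[] = [];
  for (const o of [...outcomes].sort((a, b) => a.ts - b.ts)) {
    if (!o.ok) {
      lastFail.set(o.signature, o);
      continue;
    }
    const fail = lastFail.get(o.signature);
    if (fail && fail.config !== o.config) {
      corrections.push({
        signature: o.signature,
        failedConfig: fail.config,
        fixedConfig: o.config,
      });
      lastFail.delete(o.signature);
    }
  }
  return corrections;
}
```

<p>Each emitted correction would append as one JSON line — signature plus config diff — to <code>data/_kb/error_corrections.jsonl</code>.</p>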
<h3>SMS + email drafts in the pipeline</h3>
<p>After each sealed fill (via scenario.ts or manual <code>/log</code> flow with downstream hooks), <code>generateArtifacts</code> in the scenario runner produces: (a) one SMS per worker (TO: Name, message under 180 chars), (b) one client confirmation email. Drafts are saved to <code>sms.md</code> and <code>emails.md</code> under the scenario output dir. Ollama drafts them; the staffer reviews and sends. No auto-send; human-in-the-loop.</p>
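<p>The shape of one SMS artifact, with the 180-char budget enforced before anything lands in <code>sms.md</code> — helper name and message template here are illustrative (the real drafts come from Ollama):</p>

```typescript
interface Fill { worker: string; role: string; site: string }

// One SMS per worker: "TO: Name" header plus a message under 180 chars.
function draftSms(fill: Fill): string {
  const msg = `Hi ${fill.worker}, you're confirmed for ${fill.role} at site ${fill.site}. Reply YES to confirm.`;
  // Enforce the SMS budget before the draft reaches sms.md.
  if (msg.length > 180) throw new Error(`SMS over budget: ${msg.length} chars`);
  return `TO: ${fill.worker}\n${msg}`;
}
```

<p>A hard length check at draft time keeps the human-in-the-loop step about content review, not truncation fixes.</p>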
</div>
<!-- ═══ 11. LIMITS & NON-GOALS ═══ -->
<div class="chapter" id="ch11">
<div class="num">Chapter 11</div>
<h2>Known limits &amp; non-goals</h2>
<div class="lede">Honesty is a feature. Everything below is either deferred or explicitly out of scope.</div>
<h4>Deferred — real architectural work, just not shipped yet</h4>
<ul>
<li><strong>BAI persistence + backtesting.</strong> Building Activity Index is computed live per page load. To validate the thesis (permit activity precedes equity moves) we need the daily series saved over months. Architectural support exists (<code>data/_kb/audit_baselines.jsonl</code> append pattern); just hasn't run long enough.</li>
<li><strong>NYC DOB adapter.</strong> Architecture is metro-agnostic — Chicago is Phase 1. NYC DOB ships next as a config-only Socrata adapter; LA County, Houston BCD, Boston ISD, DC DCRA queue behind it. Each new metro multiplies network edges without multiplying the codebase.</li>
<li><strong>12 public-data sources awaiting integration into the contractor profile.</strong> DOL Wage &amp; Hour, EPA ECHO, MSHA, BBB, PACER civil suits, UCC liens, D&amp;B credit, state licensure, surety bonds, DOT/FMCSA, state UI claims, DOL RAPIDS apprenticeships. Each is listed by name on every contractor profile with a one-line "would show:" sample, and each ships as a Socrata-style adapter; the engineering scope is concrete.</li>
<li><strong>Rate / margin awareness.</strong> Worker pay expectations vs contract bill rate not modeled. Requires adding <code>pay_rate</code> to workers, <code>bill_rate</code> to contracts, and a filter + warning path. Partially addressed via <code>ContractTerms.budget_per_hour_max</code> passed to T3/rescue prompts, but the match-time filter isn't wired yet.</li>
<li><strong>Mem0-style UPDATE / DELETE / NOOP operations on playbooks.</strong> Today <code>/seed</code> only ADDs. Same <code>(operation, date)</code> pair appends a duplicate instead of refining an existing entry. Cheap to add, moderate payoff.</li>
<li><strong>Letta working-memory hot cache.</strong> Every boost query scans all active playbook entries from in-memory state. ~5K today; cheap. Will bite somewhere north of 100K. Deferred until the ceiling approaches.</li>
<li><strong>Confidence calibration.</strong> Top-K is a rank, not a probability. No calibrated "85% likely to accept" score. Requires outcome-labeled training data.</li>
<li><strong>SEC name-to-ticker fuzzy precision.</strong> Current matcher requires ≥2 non-stopword overlap; rare false positives still surface (saw FLG attach to a PNC-adjacent contractor once). Can be tightened to an explicit allow-list for production trading use.</li>
<li><strong>Tighter integration of pathway memory + scrum loop.</strong> ADR-021 substrate is shipped (88 traces, 11/11 replays). The hot-swap gate fires correctly; what's deferred is the automatic mode-runner short-circuit that skips the cloud call entirely when a high-confidence pathway match is available.</li>
</ul>
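<p>The deferred Mem0-style upsert from the list above is small enough to sketch. Entry shape and return values here are assumptions, not the <code>/seed</code> contract:</p>

```typescript
interface PlaybookEntry { operation: string; date: string; note: string }

type SeedResult = "ADD" | "UPDATE" | "NOOP";

// Upsert keyed on (operation, date): refine an existing entry in place
// instead of appending a duplicate; identical payloads become a NOOP.
function seed(store: PlaybookEntry[], entry: PlaybookEntry): SeedResult {
  const existing = store.find(
    (e) => e.operation === entry.operation && e.date === entry.date,
  );
  if (!existing) {
    store.push(entry);
    return "ADD";
  }
  if (existing.note === entry.note) return "NOOP";
  existing.note = entry.note;
  return "UPDATE";
}
```

<p>The key lookup is exactly the <code>(operation, date)</code> pair the current ADD-only path ignores, which is why the payoff is cheap.</p>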
<h4>Non-goals — explicitly out of scope</h4>
<ul>
<li><strong>Cloud deployment.</strong> Local-first by design. Works offline after setup.</li>
<li><strong>Full ACID transactions.</strong> Single-writer model is sufficient; Delta Lake-grade MVCC is deliberately not attempted.</li>
<li><strong>Real-time streaming / CDC.</strong> Batch ingest is the model. Scheduled refresh, not transactional replication.</li>
<li><strong>Replacing the CRM.</strong> This is the analytical + AI layer <em>behind</em> the CRM. Operational CRUD stays with the existing system.</li>
<li><strong>Custom file formats.</strong> Parquet for datasets, sidecar indexes for vectors. No proprietary formats (ADR-008, ADR-018 reaffirm).</li>
<li><strong>Hard multi-tenant isolation.</strong> Profiles and federation provide soft isolation. Adversarial multi-tenant is not a goal — this system assumes a single-trust operator.</li>
</ul>
<div class="narr">
<strong>Overall bet.</strong> The substrate is conservative: Parquet + DataFusion + HNSW + Ollama + object storage. Every layer is replaceable, open, auditable. The intelligence layer (playbook_memory, patterns, autotune) is statistical, not neural — cheaper, explainable, rebuildable from the journal alone. If the statistical floor plateaus below what a real client needs, a later phase adds neural re-rank on top. We don't make that call until measurement demands it.
</div>
</div>
</div>
</div>
<div class="footer">Lakehouse spec · v3 2026-04-27 · Phases 19-45 shipped (playbook boost, KB, staffer competence, observer ingest, validity windows, distillation v1.0.0 substrate frozen at e7636f2, gateway as OpenAI-compat drop-in, mode runner, validator + iterate, pathway memory ADR-021, per-staffer hot-swap, Construction Activity Signal Engine) · maintained from <code>docs/DECISIONS.md</code> · <a href="proof">architecture live-tested</a> · <a href="console">walkthrough</a> · <a href="profiler">profiler</a></div>
</body></html>