Lakehouse — Technical Specification

v1 · 2026-04-20
Chapter 1

Repository layout

What lives where. Every folder below has a single, bounded responsibility. A maintainer reading this should know — in under ten minutes — which crate owns a failing behavior.
Path · Owns
crates/shared/ · Types, errors, Arrow helpers, schema fingerprints, PII detection, secrets provider. Every other crate depends on this.
crates/storaged/ · Raw object I/O. BucketRegistry (multi-bucket, rescue-aware), AppendLog (write-once batched append), ErrorJournal (bucket op failures). ADR-017 (federation), ADR-018 (append pattern).
crates/catalogd/ · Metadata authority. Dataset manifests, schema fingerprints (ADR-020), tombstones (soft delete), AI-safe views, model profiles (Phase 17). In-memory index persisted as Parquet on storage.
crates/queryd/ · SQL engine. DataFusion over Parquet + MemTable cache + delta merge-on-read + compaction. Registers every bucket as an object_store so SQL can join across them.
crates/ingestd/ · Data on-ramp. CSV / JSON / PDF (+OCR via Tesseract) / Postgres streaming / MySQL streaming / inbox watcher / cron schedules. Every ingest path auto-tags PII (emails, phones, SSNs, addresses), records lineage, and marks embeddings stale.
crates/vectord/ · The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here.
crates/vectord-lance/ · Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides the secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019).
crates/journald/ · Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit.
crates/aibridge/ · Rust ↔ Python sidecar. HTTP client over a FastAPI wrapper around Ollama. VRAM introspection via nvidia-smi. All LLM calls (embed, generate, rerank) flow through here.
crates/gateway/ · Axum HTTP (:3100) + gRPC (:3101). Auth middleware, tools registry (Phase 12 — governed actions), CORS. Every external request enters here.
crates/ui/ · Dioxus WASM developer UI. Internal tool. Not exposed externally.
mcp-server/ · Bun/TypeScript recruiter-facing app. Serves devop.live/lakehouse. Routes: /search /match /log /log_failure /clients/:c/blacklist /intelligence/* /memory/query /models/matrix /system/summary. Observer sibling at observer.ts with an HTTP listener on :3800 for scenario event ingest. Proxies to the Rust gateway for heavy work.
tests/multi-agent/ · Dual-agent scenario harness + memory stack. agent.ts (prompts, continuation + tree-split primitives, cloud routing), orchestrator.ts, scenario.ts (contracts + staffer + tool_level), kb.ts (KB indexing, competence scoring, neighbor retrieval), normalize.ts (input normalizer — structured / regex / LLM), memory_query.ts (unified /memory/query), gen_scenarios.ts + gen_staffer_demo.ts (corpus generators), run_e2e_rated.ts, chain_of_custody.ts. Unit tests colocated (kb.test.ts, normalize.test.ts).
config/ · models.json — the authoritative 5-tier model matrix (T1 hot local / T2 review local / T3 overview cloud / T4 strategic / T5 gatekeeper). Per-tier context_window + context_budget + overflow_policy. Read at runtime by scenario.ts; hot-swap friendly.
docs/ · PRD.md, PHASES.md, DECISIONS.md (20 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.
data/ · Default local object store. Parquet files per dataset, append-log batches, HNSW trial journals, promotion registries, _playbook_memory/state.json (now with retirement fields — Phase 25), catalog manifests. Plus four learning-loop directories: _kb/ (signatures, outcomes, recommendations, error_corrections, config_snapshots, staffers), _playbook_lessons/ (T3 cross-day lessons archived per run), _observer/ops.jsonl (append journal, durable scenario outcome stream), _chunk_cache/ (spec'd for Phase 21 Rust port). Rebuildable from repo + this dir alone.
Chapter 2

Data ingest pipeline

How staffing data gets into the system — whether from a CSV drop, an ATS export, a Postgres replica, or a PDF resume. Every path ends at the same place: a registered dataset with known schema, known lineage, known sensitivity.
1. Source arrives. Four shapes: (a) file upload via POST /ingest/file, (b) inbox watcher (drops in ./inbox/ → auto-ingested in under 15s), (c) Postgres or MySQL streaming connector (POST /ingest/db with DSN), (d) scheduled ingest via ingestd::schedule with cron.
2. Parse + normalize. CSV parser infers types per column; defaults to String on ambiguity (ADR-010 — better to ingest everything than reject on type mismatch). JSON parser flattens nested objects. PDF extractor uses lopdf first; falls back to Tesseract OCR for scanned/image PDFs. Output is always an Arrow RecordBatch.
3. Auto-detect PII. shared::pii scans column values and names. Identifies emails, phone numbers, SSNs, salaries, street addresses, medical terms. Tags columns with sensitivity: PII | PHI | Financial | Internal | Public (Phase 10 catalog v2).
4. Deduplicate by content hash. Every uploaded file's SHA-256 is checked against the catalog's seen-hash log. Re-ingesting the same file is a no-op (ADR invariant #5).
5. Write Parquet to object storage. arrow_helpers::record_batch_to_parquet → storaged::ops::put → file lands under data/datasets/<name>.parquet (or bucket-scoped via BucketRegistry). Schema fingerprint computed.
6. Register in catalog. catalogd::Registry::register(name, fingerprint, objects) — idempotent on (name, fingerprint). Same name + same fingerprint = reuse manifest, bump updated_at. Same name + different fingerprint = 409 Conflict (ADR-020 — prevents silent schema drift). New name = create new manifest with owner, lineage, freshness SLA, column metadata, PII tags.
7. Mark embeddings stale. If the dataset already has a vector index, the new rows mean that index is now behind. Registry::mark_embeddings_stale flips a flag; POST /vectors/refresh/<dataset> runs an incremental re-embed (only new rows, not the whole corpus).
8. Queryable immediately. queryd::context picks up the new manifest on next query. Hot-cache warms on first hit. Delta merge-on-read means updates land without rewriting the base Parquet.
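Steps 4 and 6 together are what make re-ingest safe to retry blindly. A minimal sketch of that pair — content-hash dedupe plus a register idempotent on (name, fingerprint) — is below; the types and in-memory stores are illustrative, not the catalogd API:

```typescript
import { createHash } from "node:crypto";

// Illustrative stand-ins for the catalog's seen-hash log and manifest store.
type Manifest = { name: string; fingerprint: string; updatedAt: number };
const seenHashes = new Set<string>();
const manifests = new Map<string, Manifest>();

function sha256(bytes: string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Invariant #5: re-ingesting identical bytes is a no-op.
function dedupe(fileBytes: string): boolean {
  const h = sha256(fileBytes);
  if (seenHashes.has(h)) return false; // already seen → skip
  seenHashes.add(h);
  return true;
}

// ADR-020: same name + same fingerprint reuses the manifest; a different
// fingerprint under the same name is a conflict (mapped to HTTP 409).
function register(name: string, fingerprint: string): "created" | "reused" | "conflict" {
  const existing = manifests.get(name);
  if (!existing) {
    manifests.set(name, { name, fingerprint, updatedAt: Date.now() });
    return "created";
  }
  if (existing.fingerprint === fingerprint) {
    existing.updatedAt = Date.now(); // bump freshness only
    return "reused";
  }
  return "conflict";
}
```

The same decision table drives both the ingest pipeline and the catalog's dedupe pass.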
Code: crates/ingestd/src/{service.rs, csv.rs, json.rs, pdf.rs, pg_stream.rs, my_stream.rs, schedule.rs}
Chapter 3

Measurement & indexing

Once data is in, the system describes it rigorously and builds fast-access indexes over the parts that will be queried. Every measurement is deterministic, versioned, and visible via HTTP.

What gets measured per dataset

  • Row count (from parquet footer, not a SELECT COUNT). O(1).
  • Schema fingerprint — SHA-256 over (column_name, type, nullability, sort) tuples. Drives ADR-020 idempotent register.
  • Owner / sensitivity / freshness SLA — catalog v2 metadata. PII auto-detected; owner assigned on ingest.
  • Lineage — source_system → ingest_job → dataset. Who put this here, when, from what.
  • Last embedded at — when the vector index covering this dataset was last refreshed. Drives stale-detection.
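The fingerprint in the list above can be sketched directly: hash an ordered canonical encoding of the column tuples, so any change to a name, type, nullability, or sort order changes the digest. Field names here are illustrative:

```typescript
import { createHash } from "node:crypto";

// Illustrative column descriptor matching the (column_name, type,
// nullability, sort) tuple described in the text.
type Column = { name: string; type: string; nullable: boolean; sort: number };

// SHA-256 over a canonical, order-preserving encoding of the tuples.
function schemaFingerprint(cols: Column[]): string {
  const canon = cols
    .map(c => `${c.name}|${c.type}|${c.nullable}|${c.sort}`)
    .join("\n");
  return createHash("sha256").update(canon).digest("hex");
}
```

Because the encoding is deterministic, two ingests of the same schema always produce the same fingerprint, which is what makes the ADR-020 register idempotent.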

How vector indexes are built

Two backends, chosen per profile (ADR-019):

HNSW over Parquet (primary) vs Lance (secondary):
  • Storage · HNSW: embeddings as Parquet columns (doc_id, chunk_text, vector) · Lance: native Lance dataset
  • Index · HNSW: in RAM, serialized sidecar · Lance: IVF_PQ on disk
  • Build time (100K × 768d) · HNSW: ~230s · Lance: ~16s (14× faster)
  • Search p50 (100K) · HNSW: ~873μs · Lance: ~7.4ms at recall 1.0
  • Append · HNSW: rewrite required · Lance: structural (0.08s for 100 rows)
  • Random fetch by doc_id · HNSW: full scan · Lance: ~311μs (112× faster)
  • RAM ceiling · HNSW: ~5M vectors · Lance: scales past RAM — disk-resident

Autotune

The vectord::agent background task runs continuously. Per index, it proposes HNSW configurations (ef_construction × ef_search), executes a trial against a stored eval set, journals the result as JSONL, and — if recall beats the min_recall gate (0.9) and latency wins the Pareto test — promotes the new config atomically via promotion_registry. No downtime. Rollback in milliseconds.
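The promotion decision described above reduces to a small gate function. A hedged sketch, with illustrative types (not the vectord API) and an assumed small recall tolerance for the Pareto trade:

```typescript
// Illustrative trial record: a proposed HNSW config plus its measured
// recall and p50 latency against the stored eval set.
type Trial = { efConstruction: number; efSearch: number; recall: number; p50Us: number };

const MIN_RECALL = 0.9; // the min_recall gate from the text

// Promote only if the hard recall gate passes AND latency strictly wins.
// The 0.01 recall tolerance is an assumption, not a documented constant.
function shouldPromote(current: Trial, candidate: Trial): boolean {
  if (candidate.recall < MIN_RECALL) return false;    // hard gate
  if (candidate.p50Us >= current.p50Us) return false; // must win on latency
  return candidate.recall >= current.recall - 0.01;   // Pareto: no real recall loss
}
```

The atomic pointer flip in promotion_registry is what makes the winning config live without downtime; this gate only decides whether the flip happens.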

Per-profile / per-staffer indexing

Model profiles (Phase 17) are not routing strings — they are named scopes. Each profile has bound_datasets[], hnsw_config, vector_backend, and bucket. When a staffer activates a profile:

  • EmbeddingCache warms for bound indexes only
  • HNSW is rebuilt with the profile's config (if different from current)
  • Search via POST /vectors/profile/<id>/search rejects out-of-scope queries with 403 + list of allowed bindings
  • Ollama swaps to the profile's model via keep_alive=0; only one model in VRAM at a time

Model matrix (Phase 20)

Five tiers declared in config/models.json. Each call site picks the tier appropriate to its purpose — hot-path JSON emitters get fast local, overview/strategic/gatekeeper decisions get thinking models on cloud. Every tier carries context_window, context_budget, and overflow_policy.

  • T1 hot · Purpose: per tool call — SQL gen, hybrid_search, propose_done · Model: qwen3.5:latest local, think:false · Frequency: 50-200/scenario
  • T2 review · Purpose: per-step consensus, drift flagging · Model: qwen3:latest local, think:false · Frequency: 5-14/event
  • T3 overview · Purpose: mid-day checkpoints + cross-day lesson distill · Model: gpt-oss:120b Ollama Cloud, thinking on · Frequency: 1-3/scenario
  • T4 strategic · Purpose: pattern re-ranking, weekly gap audit · Model: qwen3.5:397b cloud · Frequency: 1-10/day
  • T5 gatekeeper · Purpose: schema migrations, autotune config changes · Model: kimi-k2-thinking cloud, audit-logged · Frequency: 1-5/day
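The matrix above implies a shape for config/models.json. The fragment below is a hypothetical sketch only — the key names beyond context_window, context_budget, and overflow_policy, the numeric values, and the overflow_policy vocabulary are all invented for illustration:

```json
{
  "tiers": {
    "t1_hot": {
      "model": "qwen3.5:latest",
      "host": "local",
      "think": false,
      "context_window": 32768,
      "context_budget": 24000,
      "overflow_policy": "tree_split"
    },
    "t3_overview": {
      "model": "gpt-oss:120b",
      "host": "ollama-cloud",
      "think": true,
      "context_window": 131072,
      "context_budget": 100000,
      "overflow_policy": "continue"
    }
  }
}
```

Because scenario.ts reads this at runtime, swapping a tier's model is a config edit, not a redeploy.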

Key mechanical finding (2026-04-21): qwen3.5 and qwen3 are thinking models — they burn ~650 tokens of hidden reasoning before emitting the visible response. For hot-path JSON emitters this meant 400-token budgets returned empty strings. Fix: think: false plumbed through sidecar's /generate endpoint; hot path disables thinking (structure matters more than reasoning depth), overseer tiers keep it on. Mistral was dropped entirely after a 0/14 fill rate on complex scenarios (decoder-level malformed-JSON bug, not a prompt issue).

Continuation primitive (Phase 21): generateContinuable() handles output-overflow without max_tokens tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. generateTreeSplit() handles input-overflow via map-reduce with running scratchpad. Both respect assertContextBudget() so silent truncation can't happen.
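A minimal sketch of the continuation primitive's control flow — empty response triggers a geometric backoff retry, truncated JSON triggers a continue-with-scratchpad call. The `generate` parameter stands in for the sidecar call, and the prompt shape and brace-balance check are illustrative, not the agent.ts implementation:

```typescript
// Brace-balance check used to detect truncated JSON output.
function balancedJson(s: string): boolean {
  let depth = 0;
  for (const ch of s) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
  }
  return depth === 0 && s.trim().length > 0;
}

async function generateContinuable(
  generate: (prompt: string) => Promise<string>,
  prompt: string,
  maxContinuations = 3,
): Promise<string> {
  let backoffMs = 250; // assumed starting backoff
  let out = "";
  for (let i = 0; i <= maxContinuations; i++) {
    const chunk = await generate(
      out ? `${prompt}\nPartial so far:\n${out}\nContinue:` : prompt,
    );
    if (chunk.trim() === "") {
      // Empty response → geometric backoff, then retry.
      await new Promise(r => setTimeout(r, backoffMs));
      backoffMs *= 2;
      continue;
    }
    out += chunk;
    if (balancedJson(out)) return out; // complete JSON → done
  }
  throw new Error("max_continuations exceeded"); // bounded, never silent
}
```

The bound on continuations is what replaces a max_tokens ceiling: the loop either completes the JSON or fails loudly.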

Per-staffer tool_level (Phase 23)

Scenarios can be scoped to a specific coordinator (staffer: {id, name, tenure_months, role, tool_level}). tool_level controls which tiers are available:

  • full — T1/T2 local, T3 cloud, cloud rescue on failure
  • local — T1/T2/T3 all local (gpt-oss:20b as overseer)
  • basic — kimi-k2.5 cloud executor + local reviewer + local T3, no rescue
  • minimal — kimi-k2.5 cloud executor, no T3, no rescue. Playbook inheritance is the only signal.

Measured 2026-04-21 on a 36-run demo (4 staffers × 3 contracts × 3 rounds): James Park (mid, local tools) ranked first at 92.9% fill and 36.8 cites/run, Maria Chen (senior, full tools) second at 81.0%. Cloud T3 adds latency without measurable benefit on this workload. Alex Rivera (trainee, minimal) still hit 59.5% fill purely from playbook inheritance — proof the memory carries knowledge when the model is capable.

Code: crates/vectord/src/{hnsw.rs, autotune.rs, agent.rs, promotion.rs} · tests/multi-agent/{agent.ts, scenario.ts} · config/models.json · ADR-019
Chapter 4

Contract inference from external signal

Most CRMs wait for a contract to land. This system watches upstream demand and pre-builds the ranking before the contract lands.

The concrete example running on devop.live/lakehouse is Chicago Department of Buildings permit data (public Socrata API). Every permit is a signal that construction — and therefore staffing — is coming.

Flow

1. Fetch. /intelligence/market and /intelligence/permit_contracts hit data.cityofchicago.org/resource/ydr8-5enu.json live. No caching of permit data — every page load is fresh.
2. Map work_type → role. Industry dictionary: "Electrical Work" → "Electrician", "Masonry Work" → "Production Worker", "Mechanical Work" → "Maintenance Tech", etc.
3. Derive worker count. Heuristic: ~1 worker per $150K of permit cost, capped 2-8 per contract for staffing realism. Operator-configurable when real client history is available.
4. Derive timeline. Permit issued → construction starts ~45 days later → staffing window opens ~14 days before construction. Classifies each permit as overdue, urgent, soon, or scheduled.
5. Run hybrid search against the bench. For each derived contract, POST /vectors/hybrid with sql_filter on role+state+city+availability, use_playbook_memory: true, playbook_memory_k: 200. Returns top-5 candidates with boost + citations.
6. Query the meta-index. POST /vectors/playbook_memory/patterns aggregates traits across similar past playbooks — recurring certs, skills, archetype, reliability distribution. Surfaces signal the operator didn't query for.
7. Render on the dashboard. Each card shows permit + derived contract + top 3 candidates with memory chips + discovered pattern + urgency. All of this pre-computed before any staffer opens the UI.
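Steps 3 and 4 are pure arithmetic and can be sketched directly. The cost heuristic and day offsets come from the text; the exact urgency cutoffs below are assumptions for illustration:

```typescript
// Step 3: ~1 worker per $150K of permit cost, clamped to 2-8.
function workerCount(permitCostUsd: number): number {
  return Math.min(8, Math.max(2, Math.round(permitCostUsd / 150_000)));
}

type Urgency = "overdue" | "urgent" | "soon" | "scheduled";
const DAY_MS = 86_400_000;

// Step 4: issue date + 45 days to construction start, staffing window
// opens 14 days before that. Cutoffs (-14, 0, 14 days) are illustrative.
function classify(issueDate: Date, now: Date): Urgency {
  const constructionStart = issueDate.getTime() + 45 * DAY_MS;
  const staffingOpens = constructionStart - 14 * DAY_MS;
  const daysToOpen = (staffingOpens - now.getTime()) / DAY_MS;
  if (daysToOpen < -14) return "overdue";  // window long past
  if (daysToOpen <= 0) return "urgent";    // window open now
  if (daysToOpen <= 14) return "soon";
  return "scheduled";
}
```

A $5M permit issued 40 days ago would derive 8 workers with an urgent staffing window under these assumptions.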

Coverage forecast

/intelligence/staffing_forecast aggregates the last 30 days of permits into predicted role-level demand, joins against the IL bench supply, computes coverage %, and classifies each role as critical / tight / watch / ok. The dashboard's top panel renders this — staffers see supply gaps before they query.

Chapter 5

What a CRM can't do (and why)

A CRM stores. This system infers, predicts, re-ranks, and compounds. The six capabilities below are load-bearing — missing any of them is the gap between "software that logs calls" and "software that makes the next call better."
  • Store candidate records · CRM: yes · This system: yes (workers_500k, candidates)
  • Search by structured field · CRM: yes · This system: yes (DataFusion SQL, sub-100ms on 3M rows)
  • Search by semantic meaning · CRM: no · This system: yes (HNSW + nomic-embed-text)
  • Combine SQL filter + semantic rank · CRM: no · This system: yes (/vectors/hybrid)
  • Boost workers based on past success · CRM: no · This system: yes (Phase 19 playbook_memory)
  • Penalize workers based on past failure · CRM: no · This system: yes (/log_failure + 0.5^n penalty)
  • Surface traits across past fills · CRM: no · This system: yes (/vectors/playbook_memory/patterns)
  • Predict staffing demand from external data · CRM: no · This system: yes (Chicago permit feed + 30-day rolling forecast)
  • Count down to staffing deadline per contract · CRM: no · This system: yes (permit issue_date + heuristic timeline)
  • Explain why each candidate ranked · CRM: no · This system: yes (boost chip + narrative citations + memory pattern)
  • Improve ranking from operator actions · CRM: no · This system: yes (every Call/SMS/No-show click → re-rank signal)
Chapter 6

How it gets better over time

Compounding learning across seven paths. The first three are automatic background loops. Paths 4-7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven happen without operator intervention.

Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)

Every sealed fill is seeded to playbook_memory. The boost fires inside /vectors/hybrid when use_playbook_memory: true. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:

per_worker = cosine(query_emb, playbook_emb) × 0.5 × e^(-age/30) × 0.5^failures / n_workers
boost[(city, state, name)] = min(Σ per_worker, 0.25)
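A direct transcription of that math, assuming cosine similarity is precomputed per playbook; the types are illustrative, not the playbook_memory.rs structs:

```typescript
// One retrieved playbook: its similarity to the query, age, failure count,
// and the workers it endorses.
type Playbook = {
  cosine: number;   // cosine(query_emb, playbook_emb)
  ageDays: number;
  failures: number;
  workers: Array<{ city: string; state: string; name: string }>;
};

// per_worker = cosine × 0.5 × e^(-age/30) × 0.5^failures / n_workers,
// summed per (city, state, name) and capped at 0.25.
function applyBoosts(playbooks: Playbook[]): Map<string, number> {
  const boost = new Map<string, number>();
  for (const p of playbooks) {
    const perWorker =
      (p.cosine * 0.5 * Math.exp(-p.ageDays / 30) * Math.pow(0.5, p.failures)) /
      p.workers.length;
    for (const w of p.workers) {
      const key = `${w.city}|${w.state}|${w.name}`;
      boost.set(key, Math.min((boost.get(key) ?? 0) + perWorker, 0.25)); // cap
    }
  }
  return boost;
}
```

The e^(-age/30) term halves a playbook's influence roughly every three weeks, and each logged failure halves it again — which is why stale or burned playbooks fade without being deleted.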

Multi-strategy retrieval (new): before cosine, compute_boost_for_filtered_with_role(target_geo, target_role) prefilters to same-city playbooks, then gives exact (role, city, state) matches similarity=1.0 and fills up to half the top-k. Cosine fills the rest. Mirrors 2026 Mem0/Zep guidance on parallel-strategy rerank.

Measured lift: before geo-filter, Nashville Welder query returned boosts=170 matched=0 (zero intersection with candidate pool). After: boosts=36 matched=11. On the Riverfront Steel scenario, total playbook citations went from 2 → 28 per run — a 14× delta on identical inputs. The diagnostic log playbook_boost: boosts=N sources=N parsed=N matched=N target_geo=? target_role=? runs on every hybrid call so the class of silent-miss bug stays visible.

Path 2 — Pattern discovery (meta-index)

/vectors/playbook_memory/patterns goes beyond "who was endorsed" to answer "what did past similar fills have in common?" Aggregates recurring certifications, skills, archetype, reliability distribution across the top-K semantically similar playbooks. Surfaces signal the operator didn't explicitly query for.

Path 3 — Autotune agent

The vectord::agent background task runs continuously. Watches the HNSW trial journal, proposes configs, executes trials, promotes Pareto winners — without human intervention. Operator sees "the index got faster overnight" and doesn't know why. The journal knows why.

Path 4 — Knowledge Base + pathway recommender (Phase 22)

Meta-layer over playbook_memory. Files under data/_kb/:

  • signatures.jsonl — sig_hash + embedding of every run's event shape
  • outcomes.jsonl — per-run summary (models, fill rate, turns, citations, rescue stats, staffer, elapsed)
  • pathway_recommendations.jsonl — AI-synthesized advice for the next run of a similar sig
  • error_corrections.jsonl — fail→succeed deltas on the same sig
  • config_snapshots.jsonl — hash of active models + tool_level at each run

Cycle: scenario ends → kb.indexRun() appends outcome → kb.recommendFor(nextSpec) finds k-NN signatures, feeds outcome history to an overview model, writes structured JSON advice → next scenario reads it via kb.loadRecommendation(spec) and injects pathway_notes into the executor's context alongside prior T3 lessons.
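The index-then-recommend half of that cycle can be sketched as an append plus a nearest-signature lookup. Function names mirror the text, but the bodies, the in-memory store, and the cosine ranking are illustrative (production appends to data/_kb/outcomes.jsonl and feeds history to an overview model):

```typescript
// Illustrative per-run outcome record with its signature embedding.
type Outcome = { sigHash: string; embedding: number[]; fillRate: number };

const outcomes: Outcome[] = [];

function indexRun(o: Outcome): void {
  outcomes.push(o); // stand-in for the JSONL append
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// k-NN over signature embeddings; the real cycle hands these outcomes to
// an overview model that writes structured advice.
function recommendFor(sigEmbedding: number[], k = 3): Outcome[] {
  return [...outcomes]
    .sort((x, y) => cosine(sigEmbedding, y.embedding) - cosine(sigEmbedding, x.embedding))
    .slice(0, k);
}
```

The key property is that recommendations are keyed by event-shape similarity, not by dataset name, so a new scenario inherits advice from any structurally similar run.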

Path 5 — Cloud rescue on failure (Phase 22 item B)

When an event fails (drift abort, JSON parse, pool exhaustion) and cloud T3 is enabled, requestCloudRemediation() feeds the full failure trace — SQL filters attempted, row counts, reviewer drift notes, gap signals, contract terms — to gpt-oss:120b on Ollama Cloud. Cloud returns structured {retry, new_city, new_state, new_role, new_count, rationale}. Event retries once with the pivot. Verified on stress_01: Gary IN (zero workers indexed) misplacement → cloud proposed South Bend IN → retry filled 1/1.

Path 6 — Staffer competence-weighted retrieval (Phase 23)

Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries staffer: {id, name, tenure_months, role, tool_level}. After every run, recomputeStafferStats(staffer_id) aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single competence_score (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).

findNeighbors returns weighted_score = cosine × max_staffer_competence — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.

Path 7 — Observer outcome ingest (Phase 24)

Observer runs as lakehouse-observer.service, now with an HTTP listener on :3800. Scenarios POST per-event outcomes to /event with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old /ingest/file REPLACE path to an append-only data/_observer/ops.jsonl journal so the trace survives across restarts.

Input normalizer + unified memory query

Two surfaces added 2026-04-21 to make the memory stack respond coherently to any input shape:

  • normalizeInput(raw) — accepts structured JSON, natural language, or mixed. Three-tier: structured fast-path → regex (handles "need 3 welders in Nashville, TN" in 0ms without an LLM call) → qwen3 LLM fallback for low-signal inputs.
  • POST /memory/query — one endpoint, returns every memory surface in a single bundle: playbook_workers (with boost + citations), pathway_recommendation (KB), neighbor_signatures (competence-weighted), prior_lessons (T3 overseer history), top_staffers (leaderboard), discovered_patterns (auto-surfaced reliable workers for this role+city), latency_ms (per-source). Natural-language query end-to-end: 319ms.
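The normalizer's three-tier dispatch can be sketched compactly. The regex below targets the example shape from the text ("need 3 welders in Nashville, TN"); the actual normalize.ts patterns and LLM fallback are not shown, and the null return stands in for handing off to the LLM tier:

```typescript
// Target shape after normalization; field names are illustrative.
type Normalized = { count: number; role: string; city: string; state: string };

function normalizeInput(raw: string | Normalized): Normalized | null {
  // Tier 1: structured fast-path — already-parsed input passes through.
  if (typeof raw !== "string") return raw;
  // Tier 2: regex — handles "need 3 welders in Nashville, TN" with no LLM call.
  const m = raw.match(/(\d+)\s+(\w+?)s?\s+in\s+([A-Za-z ]+),\s*([A-Z]{2})/);
  if (m) {
    return { count: Number(m[1]), role: m[2], city: m[3].trim(), state: m[4] };
  }
  // Tier 3: low-signal input → LLM fallback (stubbed as null here).
  return null;
}
```

Ordering the tiers cheapest-first is the whole design: most operator queries never pay for inference.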

Honest gaps — what we can still implement

Three of the five 2026-era memory findings remain unwired. Flagged for near-term implementation, not hidden:

  • Zep-style validity windows — playbook entries have timestamps but no valid_until or schema_fingerprint. Load-bearing: when a schema migration changes a column, stale playbooks silently keep boosting. Biggest-value remaining fix.
  • Mem0-style UPDATE / DELETE / NOOP ops — /seed only ADDs. The same (operation, date) pair appends a duplicate instead of refining an existing entry, so the playbook file grows faster than necessary.
  • Letta working-memory hot cache — every query scans all 1560 playbook entries from disk. Cheap today at 1.5K, not at 100K. Solution: LRU in-process cache for the last N playbooks or the current sig's neighborhood.
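The proposed hot cache in the last bullet is a standard bounded LRU. A sketch of the idea, exploiting Map's insertion-order iteration for eviction — illustrative, since this surface is explicitly unbuilt:

```typescript
// Bounded LRU keyed by playbook id; Map preserves insertion order, so the
// first key is always the least-recently-used entry.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      // Refresh recency by re-inserting at the back.
      this.map.delete(key);
      this.map.set(key, v);
    }
    return v;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least-recently-used entry (first insertion-order key).
      this.map.delete(this.map.keys().next().value as string);
    }
  }
}
```

Caching the current signature's neighborhood rather than individual lookups would likely give the better hit rate here, since queries cluster by (role, city).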

Validity windows come next: they preserve the trust signal (boost only fires on playbooks that are still true given the current schema) rather than the latency signal (which the current scale doesn't need yet).

Code: crates/vectord/src/{playbook_memory.rs, service.rs} · tests/multi-agent/{kb.ts, memory_query.ts, normalize.ts, scenario.ts} · mcp-server/{observer.ts, index.ts} · data/_kb/ · data/_observer/
Chapter 7

Scale story — 20 staffers, 300 contracts, a surge

What happens when the demo-level load becomes the production-level load, and midday a client pushes 20 more contracts plus a 1M-row ATS delta. Honest: some of this is architectural headroom, not measured scale. The designed behaviors are below.

20 concurrent staffers

Axum is async. The gateway handles concurrent requests on Tokio with work-stealing. No per-request thread. Tested at 10 parallel queries in 82ms total on this hardware.

Per-staffer profile isolation. Each staffer activates their own profile (Phase 17) or workspace (Phase 8.5). Profile scopes their search to bound datasets. Workspace carries their in-progress contracts across sessions.

Per-client blacklists. Auto-applied when the caller passes client: "X" on /search. Staffer A filling for Acme never sees Acme's flagged workers. Staffer B filling for MidState sees them normally.

300 active contracts

SQL on job_orders is cheap. 300 rows is nothing — a scan is microseconds.

Workspace per contract. Each contract gets its own workspace with saved searches, shortlists, activity log. Zero-copy handoff between staffers (pointer swap, not data copy).

Forecast remains coherent. /intelligence/staffing_forecast aggregates 30-day permit data regardless of contract count. The bench supply query (GROUP BY role over workers_500k) is a single sub-second SQL.

Midday surge: +20 contracts, +1M profiles

The delta arrives at 12:30. Here's what happens in the following minutes:

1. +20 contracts arrive via /ingest/db or /ingest/file. Parsed, schema-checked, Parquet-written, catalog-registered. No queries blocked — register holds a write lock across the manifest write only.
2. +1M worker profiles arrive as a delta to workers_500k. The append-log pattern (ADR-018) means the new rows write to a fresh batch file — the base Parquet is NOT rewritten. Queries against workers_500k immediately merge-on-read the new batches.
3. Embeddings marked stale. The vector index for workers_500k_v1 now has 1M rows it hasn't seen. mark_embeddings_stale flips the flag.
4. Incremental refresh fires. POST /vectors/refresh/workers_500k reads only the new rows (diff against existing embeddings), embeds them in batches of 64 via Ollama, writes delta embedding Parquet. Measured on threat_intel: 34 new rows in 970ms (6× faster than full re-embed).
5. Search degrades gracefully. During the refresh, searches against workers_500k_v1 still work — they serve from the old embeddings. Brute-force cosine over new-rows-without-embeddings is allowed but costs more. HNSW rebuild happens after all embeds complete.
6. Hot-swap promotion. When the new index is ready, promotion_registry atomically flips the active pointer. Next search hits the new config. Rollback stays available.
7. Autotune re-enters the loop. The agent queue picks up a DatasetAppended trigger and schedules a fresh HNSW trial cycle against the expanded index.
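The diff-then-batch core of step 4 can be sketched in a few lines. `embedBatch` stands in for the Ollama sidecar call; the id-set diff and batch size of 64 come from the text:

```typescript
// Embed only rows not already covered by existing embeddings, in batches
// of 64. Returns the number of newly embedded rows.
async function incrementalRefresh(
  allRowIds: string[],
  embeddedIds: Set<string>,
  embedBatch: (ids: string[]) => Promise<void>,
): Promise<number> {
  // Diff: skip everything the index has already seen.
  const fresh = allRowIds.filter(id => !embeddedIds.has(id));
  for (let i = 0; i < fresh.length; i += 64) {
    await embedBatch(fresh.slice(i, i + 64)); // sidecar call per batch
  }
  fresh.forEach(id => embeddedIds.add(id));
  return fresh.length;
}
```

Because only the delta is embedded, refresh cost scales with the append size rather than the corpus size — the property the whole surge story depends on.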

Known pain points at this scale

  • Ollama inference is serial. Embedding 1M rows at ~50 chunks/sec through nomic-embed-text = ~6 hours. Acceptable for overnight refresh, not for "immediate." Mitigated by incremental refresh (only deltas).
  • RAM ceiling on HNSW. Around 5M vectors × 768d, HNSW stops fitting in 128GB comfortably. Mitigation: per-profile vector_backend: lance flip — disk-resident IVF_PQ scales past the RAM line (ADR-019).
  • VRAM ceiling for model variety. A4000 16GB holds 1-2 loaded models. Multi-model recruiter surfaces are a sequential swap, not parallel (Ollama keep_alive=0). Phase 17 profile activation unloads the prior model on swap.
  • playbook_memory growth. 1936 entries today. Phase 25 (2026-04-21) added retirement via valid_until + schema_fingerprint fields + POST /vectors/playbook_memory/retire endpoint (manual or schema-drift triggered). Active vs retired split surfaced on GET /vectors/playbook_memory/status. Brute-force cosine still sub-ms at current size; Letta-style working-memory hot cache deferred until entry count crosses ~100K.
Chapter 8

Error surfaces & recovery

Every failure mode has a named surface, a structured response, and a recovery path. No silent failures.
Failure mode → surface / response → recovery:
  • Ingest receives a file with a schema mismatch vs an existing dataset. Surface: 409 Conflict with both fingerprints named (ADR-020). Recovery: re-ingest under a new name, or migrate the existing dataset via Phase 14 schema evolution.
  • Bucket unreachable on write. Surface: hard 503, error journaled to primary://_errors/bucket_errors/. Recovery: GET /storage/errors lists failures; GET /storage/bucket-health shows per-bucket status.
  • Bucket unreachable on read. Surface: rescue bucket fallback, X-Lakehouse-Rescue-Used: true header on the response. Recovery: the response still succeeds; the operator sees the rescue flag.
  • /log receives a name that doesn't exist in workers_500k. Surface: seed is SKIPPED; response includes rejected_ghost_names: [...] and a note. Recovery: the operator sees exactly which names were rejected and why.
  • Dual-agent executor malforms a tool call. Surface: result appended to the log with an error field; counter increments. Recovery: after 3 consecutive, abort with a full log dump at tests/multi-agent/playbooks/<id>-FAILED.json.
  • Dual-agent drifts from target. Surface: reviewer verdict = drift, counter increments. Recovery: after 3 consecutive drifts, abort with the full log.
  • Hybrid search finds zero candidates. Surface: returns empty sources[] + sql_matches: 0. Recovery: gap signal captured by the scenario runner; operator prompted to broaden the filter.
  • Ollama sidecar down. Surface: 502 Bad Gateway from aibridge; embed calls fail fast. Recovery: systemctl restart lakehouse-sidecar; vector search falls back to pre-computed embeddings.
  • Gateway restart mid-operation. Surface: in-memory state (playbook_memory, HNSW) reloaded from persisted state.json / trial journals. Recovery: zero data loss; catalog, storage, and journals are all source-of-truth.
  • Schema fingerprint diverges across manifests. Surface: catalog::dedupe reports a DedupeReport with winner selection (non-null row_count first, then newest updated_at). Recovery: POST /catalog/dedupe collapses duplicates idempotently.
  • Scenario event fails on a zero-supply city. Surface: cloud rescue (Phase 22 item B) fires — gpt-oss:120b sees the SQL filters attempted, row counts, reviewer drift notes, contract terms; returns structured {retry, new_city, new_state, new_role, new_count, rationale}. Recovery: retry with pivot runs the same executor loop with the new geography; verified Gary IN → South Bend IN filled 1/1 after the original drift-abort.
  • LLM response truncated mid-JSON (thinking model ate the token budget). Surface: Phase 21 generateContinuable() detects via brace-balance + JSON.parse; no silent truncation. Recovery: auto-continue with the partial as scratchpad, or geometric backoff if the initial call returned empty. Bounded by max_continuations.
  • Schema migration invalidates existing playbooks. Surface: Phase 25 — POST /vectors/playbook_memory/retire with the current fingerprint retires all mismatched entries in scope; diagnostic log shows counts. Recovery: retired entries stay in the journal for forensics but are skipped by all boost calculations. Scoped by (city, state) so unrelated geos aren't touched.
  • Observer fails to reach the scenario outcome stream. Surface: scenario postObserverEvent() uses a 2s AbortSignal.timeout; silent skip if :3800 is down. Recovery: the scenario log is still the source of truth; observer re-ingest on the next run restores the stream. data/_observer/ops.jsonl is append-only so prior events survive.
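The observer-down row is the simplest recovery path to show concretely: a bounded, fire-and-forget POST that can never block a scenario. A sketch, with an illustrative payload shape and the URL made a parameter for testability:

```typescript
// Fire-and-forget outcome post: 2s timeout, silent skip on any failure.
// The scenario log remains the source of truth either way.
async function postObserverEvent(
  event: Record<string, unknown>,
  url = "http://127.0.0.1:3800/event",
): Promise<boolean> {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(event),
      signal: AbortSignal.timeout(2000), // abort rather than hang
    });
    return res.ok;
  } catch {
    return false; // observer down → skip; re-ingest restores the stream later
  }
}
```

Returning a boolean instead of throwing is the point: callers may log the miss, but no failure mode in the observer can become a failure mode in the scenario.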
Chapter 9

Per-staffer context

Twenty staffers don't see the same UI state. Each one's session is shaped by their active profile, their workspaces, their assigned contracts, and their client's blacklists.

Active profile (Phase 17)

Scopes every search. A staffing-recruiter profile bound to workers_500k sees only that dataset. A security-analyst profile bound to threat_intel cannot see worker data. GET /vectors/profile/<id>/audit records every tool invocation by model identity.

Workspace (Phase 8.5)

Per-contract state. Each workspace has daily/weekly/monthly tiers, saved searches, shortlists, activity logs. Survives across sessions. Instant zero-copy handoff between staffers — pointer swap, not data copy. Persisted to object storage, rebuilt on startup.

Client blacklist

Per-client worker exclusion. Populated via POST /clients/:client/blacklist. Auto-applied when the caller passes client: "X" on /search. JSON-backed; would move to catalog table under real client load.

Audit trail

Phase 12 tool registry logs every governed-action invocation (who called what, with what args, when, outcome). GET /tools/audit queryable. Phase 13 access control layers on top — role-based field masking, query audit log.

Daily summary per staffer

Workspace activity log + per-staffer filter on the event journal gives "what did Sarah do today" as a direct query. The foundation for shift-handoff reports.

Staffer identity + competence-weighted retrieval (Phase 23)

Each scenario run carries an explicit staffer: {id, name, tenure_months, role, tool_level}. The KB aggregates per-staffer stats (data/_kb/staffers.jsonl) that roll up into a single competence_score:

competence_score = 0.45·fill_rate + 0.20·turn_efficiency + 0.20·citation_density + 0.15·rescue_rate
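The formula above, plus the weighted retrieval it feeds, transcribes directly. Inputs are assumed pre-normalized to [0, 1]; the types are illustrative, not the kb.ts signatures:

```typescript
// Per-staffer aggregates, each normalized to [0, 1].
type StafferStats = {
  fillRate: number;
  turnEfficiency: number;
  citationDensity: number;
  rescueRate: number;
};

// competence_score = 0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue
function competenceScore(s: StafferStats): number {
  return 0.45 * s.fillRate + 0.20 * s.turnEfficiency +
         0.20 * s.citationDensity + 0.15 * s.rescueRate;
}

// Neighbor ranking: cosine similarity scaled by the best coordinator who
// ran that signature, so stronger staffers' playbooks outrank marginally
// closer ones from juniors.
function weightedScore(cosineSim: number, maxStafferCompetence: number): number {
  return cosineSim * maxStafferCompetence;
}
```

With these weights, a junior's playbook at cosine 0.95 and competence 0.6 (score 0.57) still loses to a senior's at cosine 0.85 and competence 0.9 (score 0.765) — the behavior the text describes.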

When any query runs kb.findNeighbors(spec, k), the ranking isn't just cosine similarity — it's weighted_score = cosine × max_staffer_competence over the best coordinator who ran that signature. Senior staffers' playbooks surface above juniors' on similar scenarios, even when the juniors' scenario was marginally closer in embedding space.

The tool_level knob (full / local / basic / minimal) controls which tiers are available to a given staffer's runs. See Ch 3 for the mapping. Variance is real and measurable: the 36-run demo produced a 33pt fill-rate delta between James (local tools, 93%) and Alex (minimal tools, 60%) on identical contracts.

Auto-discovered reliable-performer labels

A second-order output of the competence path: when multiple staffers independently endorse the same worker on similar-signature playbooks, that worker accumulates cross-staffer endorsements. scripts/kb_staffer_report.py surfaces them — after 36 runs, Rachel D. Lewis (Welder Nashville) had 18 endorsements across 4 staffers, Angela U. Ward (Machine Op Indianapolis) 19. These are high-confidence "reliable" labels the system produced without human tagging. The UI could badge these workers on future queries; today they're visible via /memory/query's discovered_patterns bundle.

Chapter 10

A day in the life — from morning brief to EOD retrospective

Concrete operator timeline. Every step touches a real endpoint that exists today.
07:00
Overnight housekeeping. Scheduled ingest runs — the configured cron picks up the client's latest ATS CSV delta, runs it through the pipeline in Ch2, marks workers_500k embeddings stale. Autotune agent promotes any Pareto-winner HNSW configs from overnight trials.
07:30
Embedding refresh. Background job re-embeds the new rows. Old index keeps serving. Hot-swap promotes when done.
08:00
Sarah (staffer) opens devop.live/lakehouse. Page loads in ~3s. Forecast panel shows: "$275M construction coming, 4 tight roles this week." Live Contracts section shows 6 Chicago permits with proposed fills + boost chips + pattern signals.
08:15
Sarah drills into a $5M permit. Top candidate card: Carmen Green, Endorsed · 3 playbooks chip, boost +0.166, pattern line reads "leader archetype · 47% OSHA-10." Sarah hovers the chip — narrative tooltip: "filled Welder x2 in Toledo (2026-04-15), Welder x1 in Toledo (2026-04-18)."
08:30
Sarah calls Carmen. Clicks Call button → /log fires → playbook_memory.seed → persist_sql → successful_playbooks_live grows by one. Button flashes "Logged" for 1.4s. No modal, no form, no second click.
09:00
Kim (another staffer) opens the same UI. Her profile loads. Her workspaces show her own contracts. She searches "reliable forklift Chicago" — MEMORY chip shows the pattern discovered across Sarah's morning work AND prior fills. Carmen, already logged by Sarah, shows up with an updated citation count.
12:30
Client pushes 20 new contracts + 1M ATS delta. Ch7 scale flow fires. Ingest in seconds; embedding refresh kicks off as a background job. Searches continue against old embeddings.
14:00
Emergency: worker Dave no-showed. Sarah clicks No-show button on Dave's card → /log_failure → mark_failed records a penalty. Next similar query dampens Dave's boost by 0.5. Sarah continues the refill — the refill excludes Dave and the 2 others already booked for this shift.
15:00
New embeddings live. Hot-swap promotion. Searches now see all 1M new profiles. Sarah's noon query re-run would produce different top-5.
17:00
End-of-day retrospective. Any staffer who ran tests/multi-agent/scenario.ts gets report.md auto-generated. Workspace activity logs aggregate per staffer. GET /vectors/playbook_memory/status shows active vs retired counts. KB indexes the run (kb.indexRun) and the overview model synthesizes a pathway recommendation for the next matching signature. Every event outcome has already streamed to lakehouse-observer.service on :3800 for ERROR_ANALYZER + PLAYBOOK_BUILDER consumption.
17:15
Kim fires a natural-language query from the search box. "need 3 forklift operators in Joliet by Monday" → POST /memory/query on the Bun MCP. Regex normalizer extracts role / city / count / deadline / intent in 0ms (no LLM call). The unified response returns playbook workers (auto-surfaced reliable performers for Joliet Forklift with citation counts), pathway recommendation from KB, prior T3 lessons for Joliet, and top staffers by competence — all in ~300ms.
22:00
Overnight trial cycle. Autotune agent continues in the background. Trial journal grows. KB's detectErrorCorrections scans today's outcomes for fail→succeed deltas on the same signature; any correction gets logged to data/_kb/error_corrections.jsonl with the config diff. Tomorrow morning, the system is measurably better at something it got asked about today.
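
The 17:15 regex normalizer can be sketched like this; the pattern below is illustrative and far narrower than the real one, but shows how role / city / count / deadline fall out of one match with no LLM call:

```typescript
// Extract a structured spec from a natural-language staffing request.
// Field names and the regex itself are assumptions for illustration.
interface QuerySpec {
  count?: number;
  role?: string;
  city?: string;
  deadline?: string;
}

function normalize(q: string): QuerySpec {
  const spec: QuerySpec = {};
  const m = /need\s+(\d+)\s+([a-z ]+?)\s+in\s+([A-Z][a-zA-Z]+)(?:\s+by\s+(\w+))?/.exec(q);
  if (m) {
    spec.count = Number(m[1]);      // "3"
    spec.role = m[2].trim();        // "forklift operators"
    spec.city = m[3];               // "Joliet"
    if (m[4]) spec.deadline = m[4]; // "Monday" (optional)
  }
  return spec;
}

console.log(normalize("need 3 forklift operators in Joliet by Monday"));
// → count 3, role "forklift operators", city "Joliet", deadline "Monday"
```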

SMS + email drafts in the pipeline

After each sealed fill (via scenario.ts or manual /log flow with downstream hooks), generateArtifacts in the scenario runner produces: (a) one SMS per worker (TO: Name, message under 180 chars), (b) one client confirmation email. Drafts are saved to sms.md and emails.md under the scenario output dir. Ollama drafts them; the staffer reviews and sends. No auto-send; human-in-the-loop.
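
A sketch of the per-worker SMS draft with its 180-char cap, substituting a plain template for the Ollama call; field names are illustrative, not the real generateArtifacts signature:

```typescript
// One SMS per worker (TO: Name, message under 180 chars). In the real flow
// Ollama drafts the body and a staffer reviews before sending; here a
// template stands in so the length constraint is checkable.
interface Fill {
  worker: string;
  role: string;
  site: string;
  start: string;
}

function draftSms(f: Fill): string {
  const msg = `TO: ${f.worker}, you're confirmed as ${f.role} at ${f.site}, starting ${f.start}. Reply YES to confirm.`;
  if (msg.length > 180) {
    throw new Error(`SMS draft exceeds 180 chars (${msg.length})`);
  }
  return msg;
}

const sms = draftSms({
  worker: "Carmen Green",
  role: "Welder",
  site: "Toledo site 4",
  start: "Mon 7am",
});
console.log(sms.length <= 180); // true
```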

Chapter 11

Known limits & non-goals

Honesty is a feature. Everything below is either deferred or explicitly out of scope.

Deferred — real architectural work, just not shipped yet

  • Rate / margin awareness. Worker pay expectations vs contract bill rate not modeled. Requires adding pay_rate to workers, bill_rate to contracts, and a filter + warning path. Partially addressed via ContractTerms.budget_per_hour_max passed to T3/rescue prompts, but the match-time filter isn't wired yet.
  • Mem0-style UPDATE / DELETE / NOOP operations on playbooks. Today /seed only ADDs. Same (operation, date) pair appends a duplicate instead of refining an existing entry. Phase 26 item — cheap to add, moderate payoff.
  • Letta working-memory hot cache. Every boost query scans all active playbook entries from in-memory state. 1.9K today; cheap. Will bite somewhere north of 100K. LRU for the last-N playbooks or current-sig neighborhood deferred until that ceiling approaches.
  • Chunking cache (Phase 21 Rust port). TS primitives generateContinuable + generateTreeSplit are wired, but crates/aibridge/src/{continuation.rs, tree_split.rs} + crates/storaged/src/chunk_cache.rs remain queued. Gateway-side callers currently don't have the same protection against silent truncation that the TS test harness does.
  • Confidence calibration. Top-K is a rank, not a probability. No calibrated "85% likely to accept" score. Requires outcome-labeled training data.
  • Neural re-ranker. Phase 19 is statistical + semantic (now with geo + role prefilter, Phase 25 retirement). A (query, candidate, outcome)-trained re-ranker stays deferred unless the statistical floor plateaus below usable recall; the current 14× citation lift on identical inputs suggests it hasn't.
  • Observer → autotune feedback wire. Phase 24 streams scenario outcomes into data/_observer/ops.jsonl; autotune agent still runs on its own HNSW-trial schedule and hasn't subscribed to the outcome metric stream yet. Phase 26+ item — connects the last loop.
  • call_log cross-reference. Infrastructure present; current synthetic candidates table is too small to cross-ref. Fixes when real ATS lands.
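
The first deferred item (rate / margin awareness) could eventually take a shape like this. Everything here is hypothetical: pay_rate on workers, bill_rate on contracts, and the margin threshold are not shipped schema.

```typescript
// Hypothetical match-time filter + warning path for the deferred
// rate/margin item. None of these fields exist in the current schema.
interface Worker { name: string; pay_rate: number }  // $/hr expectation
interface Contract { bill_rate: number }             // $/hr billed to client

type RateCheck =
  | { ok: true }
  | { ok: false; warning: string };

function rateFilter(w: Worker, c: Contract, minMarginPct = 0.2): RateCheck {
  if (w.pay_rate > c.bill_rate) {
    return { ok: false, warning: `${w.name} expects more than the bill rate` };
  }
  const margin = (c.bill_rate - w.pay_rate) / c.bill_rate;
  if (margin < minMarginPct) {
    return { ok: false, warning: `${w.name} leaves only ${(margin * 100).toFixed(0)}% margin` };
  }
  return { ok: true };
}

console.log(rateFilter({ name: "Dave", pay_rate: 30 }, { bill_rate: 34 }));
// not ok: margin ≈ 12%, below the assumed 20% threshold
```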

Non-goals — explicitly out of scope

  • Cloud deployment. Local-first by design. Works offline after setup.
  • Full ACID transactions. Single-writer model is sufficient; Delta Lake-grade MVCC is deliberately not attempted.
  • Real-time streaming / CDC. Batch ingest is the model. Scheduled refresh, not transactional replication.
  • Replacing the CRM. This is the analytical + AI layer behind the CRM. Operational CRUD stays with the existing system.
  • Custom file formats. Parquet for datasets, sidecar indexes for vectors. No proprietary formats (ADR-008, ADR-018 reaffirm).
  • Hard multi-tenant isolation. Profiles and federation provide soft isolation. Adversarial multi-tenant is not a goal — this system assumes a single-trust operator.

Overall bet. The substrate is conservative: Parquet + DataFusion + HNSW + Ollama + object storage. Every layer is replaceable, open, auditable. The intelligence layer (playbook_memory, patterns, autotune) is statistical, not neural — cheaper, explainable, rebuildable from the journal alone. If the statistical floor plateaus below what a real client needs, Phase 20+ adds neural re-rank on top. We don't make that call until measurement demands it.