Lakehouse — Architecture & Reproduction

Chapter 1

Receipts, not promises

Every test below ran live against the real gateway when you loaded this page. Sub-100ms SQL on multi-million-row Parquet, hybrid search with playbook boost applied. No fixtures. If a test fails, you'll see ✗.
Chapter 2

Architecture — 13 crates, one object store, one local AI runtime

Request flows top to bottom. Every node is independently swappable. Every line is a real HTTP or gRPC hop that you can trace with tcpdump.
                            HTTP :3100  +  gRPC :3101
                                    │
                            ┌───────▼───────┐
                            │   gateway     │   Rust · Axum · routing, CORS, auth, tools
                            └───────┬───────┘
           ┌────────────┬───────────┼───────────┬────────────┐
           │            │           │           │            │
      ┌────▼───┐   ┌────▼───┐  ┌────▼───┐  ┌────▼───┐   ┌────▼───┐
      │catalog │   │ query  │  │ vector │  │ ingest │   │aibridge│
      │   d    │   │   d    │  │   d    │  │   d    │   │        │
      └────┬───┘   └────┬───┘  └────┬───┘  └────┬───┘   └────┬───┘
           │            │           │           │            │
           └────────────┴───────────┼───────────┴────────────┘
                                    ▼
                          ┌─────────────────┐
                          │ object storage  │   Parquet files (local / S3)
                          └─────────────────┘
                                    ▲
                                    │
                            ┌───────┴────────┐
                            │ Python sidecar │   FastAPI → Ollama
                            │   (aibridge)   │   local models only
                            └────────────────┘

Per-crate responsibility

Crate · Role · Path
shared · Types, errors, Arrow helpers, PII detection, secrets provider · crates/shared/
storaged · object_store I/O, BucketRegistry (multi-bucket), AppendLog, ErrorJournal · crates/storaged/
catalogd · Metadata authority — manifests, views, tombstones, profiles, schema fingerprints · crates/catalogd/
queryd · DataFusion SQL engine, MemTable cache, delta merge-on-read, compaction · crates/queryd/
ingestd · CSV/JSON/PDF(+OCR)/Postgres/MySQL ingest, cron schedules, auto-PII · crates/ingestd/
vectord · Embeddings as Parquet, HNSW, trial system, autotune agent, playbook_memory · crates/vectord/
vectord-lance · Firewall crate — Lance 4.0 + Arrow 57 isolated from main Arrow 55 · crates/vectord-lance/
journald · Append-only mutation event log for time-travel & audit · crates/journald/
aibridge · Rust↔Python sidecar, Ollama HTTP client, VRAM introspection · crates/aibridge/
gateway · Axum HTTP :3100 + gRPC :3101, middleware, tools registry · crates/gateway/
ui · Dioxus WASM internal developer UI · crates/ui/
mcp-server · Bun TypeScript recruiter-facing app (this server) · mcp-server/
Source: git.agentview.dev/profit/lakehouse  ·  ADRs: docs/DECISIONS.md (currently 20 records)
Chapter 3

Dual-agent recursive consensus loop

The system we use to execute staffing fills is a dual-agent recursive protocol. Two agents with distinct roles iterate against a shared log until one of three terminal states is reached. It is deterministic in structure, stochastic in content, and verifiable through the per-run log artifact.

Agents and protocol

  task in
    │
    ▼
  ┌───────────────────────────────────────────────────────────┐
  │  EXECUTOR (mistral:latest)                                │
  │  ──────────────────────────────────────────────────────── │
  │  input:   task spec + shared log + seen-candidates ledger │
  │  output:  one JSON action per turn                        │
  │             · {kind:"plan",steps:[…]}                     │
  │             · {kind:"tool_call",tool,args,rationale}      │
  │             · {kind:"propose_done",fills:[N of N]}        │
  └───────────┬───────────────────────────────┬───────────────┘
              │ tool_call                     │ propose_done
              ▼                               │
  ┌──────────────────────────┐                │
  │  TOOL DISPATCH           │                │
  │  hybrid_search / sql     │                │
  │  (against live gateway)  │                │
  └──────────┬───────────────┘                │
             │ result (trimmed, exclusions)   │
             ▼                                ▼
  ┌───────────────────────────────────────────────────────────┐
  │  REVIEWER (qwen2.5:latest)                                │
  │  ──────────────────────────────────────────────────────── │
  │  input:   task spec + shared log (including tool result)  │
  │  output:  {kind:"critique",verdict:"continue|drift|       │
  │                                    approve_done",notes}   │
  └───────────┬───────────────────────────────────────────────┘
              │
        ┌─────┴─────┬───────────┐
        ▼           ▼           ▼
    continue     drift       approve_done + propose_done ⟹ SEAL
    (next turn)  (cap ≈ 3 →
                  hard abort)
    
Code: tests/multi-agent/agent.ts (protocol + prompts)  ·  tests/multi-agent/orchestrator.ts (run loop)  ·  tests/multi-agent/scenario.ts (5-event warehouse week)
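
In TypeScript terms, the per-turn messages form a small discriminated union. A minimal sketch follows; field shapes beyond what the diagram shows are illustrative, and the actual definitions live in tests/multi-agent/agent.ts.

// Sketch of the per-turn message shapes implied by the diagram above.
// Fields not shown in the diagram (e.g. the fill tuple) are illustrative.
type ExecutorAction =
  | { kind: "plan"; steps: string[] }
  | { kind: "tool_call"; tool: "hybrid_search" | "sql"; args: Record<string, unknown>; rationale: string }
  | { kind: "propose_done"; fills: { worker: string; city: string; state: string }[] };

type ReviewerCritique = {
  kind: "critique";
  verdict: "continue" | "drift" | "approve_done";
  notes: string;
};

type SharedLogEntry =
  | { role: "executor"; action: ExecutorAction }
  | { role: "tool"; result: unknown }
  | { role: "reviewer"; critique: ReviewerCritique };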

Why "dual" — role specialization

The executor is an optimist. Its job is to produce progress: pull candidates, verify SQL, propose consensus. It's instructed to be decisive.

The reviewer is a pessimist. Its job is to catch drift: proposals that don't match the task's geography, fill count, or role. It's authorized to stop the loop.

This adversarial separation is cheaper and more deterministic than asking a single model to self-critique. The reviewer has a hard rule: on the turn after a propose_done, it MUST emit either approve_done or drift — it cannot stall with continue.

Why "parallel" — orchestrator can fan out

Independent pairs run concurrently. tests/multi-agent/run_e2e_rated.ts runs two task-specific agent pairs via Promise.all. Ollama serializes inference at the model level, so "parallel" is concurrent orchestration — but the substrate (gateway, queryd, vectord) handles concurrent requests cleanly. Verified in the scenario harness: two contracts sealing simultaneously.
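
The fan-out itself is ordinary Promise.all. A sketch, with runPair standing in for the orchestrator's actual entry point (its real name is not shown here):

// Hypothetical fan-out over two independent executor/reviewer pairs.
// `runPair` stands in for the orchestrator's real entry point.
async function runPair(taskSpec: string): Promise<{ sealed: boolean; turns: number }> {
  // ... full executor/reviewer loop against the live gateway ...
  return { sealed: true, turns: 7 };
}

const tasks = [
  "fill: Forklift Operator x3 in Chicago, IL",
  "fill: Welder x2 in Toledo, OH",
];

// Orchestration is concurrent; Ollama still serializes inference per model,
// but gateway, queryd, and vectord absorb the overlapping HTTP traffic.
const results = await Promise.all(tasks.map(runPair));
console.log(results);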

Why "recursive" — each seal feeds the next

Consensus does not end at the sealed playbook. Every sealed playbook is persisted to playbook_memory via POST /vectors/playbook_memory/seed. The next hybrid search for a semantically similar operation consults that memory via compute_boost_for(query_embedding, top_k, base_weight) and re-ranks the candidate pool. The system builds on itself turn over turn, playbook over playbook.

Termination guarantees

// every run terminates through exactly one of these:
sealed = executor.propose_done ∧ reviewer.approve_done ∧ fills.count == target
abort = consecutive_tool_errors ≥ MAX_TOOL_ERRORS (3)   // executor can't form a valid call
abort = consecutive_drifts ≥ MAX_CONSECUTIVE_DRIFTS (3)  // reviewer keeps flagging
abort = turn > MAX_TURNS (12)               // no consensus reached in window
Every abort dumps the full log to tests/multi-agent/playbooks/<id>-FAILED.json for forensic review. No consensus is ever implicit.
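
Put together, the loop and its exits fit in a page of TypeScript. A compressed sketch, assuming hooks for the model calls and tool dispatch; the real loop is tests/multi-agent/orchestrator.ts.

// Compressed sketch of the consensus loop and its exits.
// `hooks` stands in for the model calls and tool dispatch in orchestrator.ts.
type Action =
  | { kind: "plan"; steps: string[] }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> }
  | { kind: "propose_done"; fills: unknown[] };
type Critique = { verdict: "continue" | "drift" | "approve_done"; notes: string };

interface Hooks {
  callExecutor(task: string, log: unknown[]): Promise<Action>;
  callReviewer(task: string, log: unknown[]): Promise<Critique>;
  dispatchTool(action: Action): Promise<unknown>;
}

const MAX_TURNS = 12;
const MAX_TOOL_ERRORS = 3;
const MAX_CONSECUTIVE_DRIFTS = 3;

async function runLoop(task: string, targetFills: number, hooks: Hooks) {
  const log: unknown[] = [];
  let toolErrors = 0;
  let drifts = 0;
  let proposed: { fills: unknown[] } | null = null;

  for (let turn = 1; turn <= MAX_TURNS; turn++) {
    const action = await hooks.callExecutor(task, log);
    log.push(action);

    if (action.kind === "tool_call") {
      try {
        log.push(await hooks.dispatchTool(action));   // trimmed result, exclusions applied
        toolErrors = 0;
      } catch {
        if (++toolErrors >= MAX_TOOL_ERRORS) return { aborted: "tool_errors", log };
        continue;   // skip the review turn after a failed call; let the executor retry
      }
    }
    if (action.kind === "propose_done") proposed = action;

    const critique = await hooks.callReviewer(task, log);
    log.push(critique);

    if (critique.verdict === "approve_done" && proposed?.fills.length === targetFills)
      return { sealed: true, log };                                   // SEAL
    if (critique.verdict === "drift") {
      if (++drifts >= MAX_CONSECUTIVE_DRIFTS) return { aborted: "consecutive_drifts", log };
    } else {
      drifts = 0;
    }
  }
  return { aborted: "max_turns", log };   // no consensus reached in the window
}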
Chapter 4

Playbook memory — the compounding feedback loop

A CRM stores events. This system turns events into re-ranking signal. Every sealed playbook endorses specific (worker, city, state) tuples. Every failure penalizes them. Every similar future query inherits the signal through cosine similarity.

Seed shape

PlaybookEntry {
  playbook_id,       // pb-seed-<sha8>
  operation,         // "fill: Welder x2 in Toledo, OH"
  approach, context, // short canonical strings — long strings dilute the embedding
  timestamp,         // RFC3339
  endorsed_names[],  // validated against workers_500k for city+state
  city, state,       // parsed from operation
  embedding          // 768-d nomic-embed-text embedding of the entry's canonical text
}
Code: crates/vectord/src/playbook_memory.rs (PlaybookEntry, FailureRecord, PlaybookMemoryState)
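
Seeding is a single POST. A sketch of the call: the payload fields mirror the entry shape above, but the exact request contract is an assumption, and playbook_id, timestamp, and the embedding are presumably derived server-side.

// Illustrative seed request. Field values are examples; the accepted body shape
// is an assumption based on the PlaybookEntry fields, not the handler's contract.
const seed = {
  operation: "fill: Welder x2 in Toledo, OH",
  approach: "hybrid search, then SQL verification of geo and availability",
  context: "warehouse week, night shift",
  endorsed_names: ["<worker name>", "<worker name>"],
  city: "Toledo",
  state: "OH",
};

const res = await fetch("http://localhost:3100/vectors/playbook_memory/seed", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(seed),
});
console.log(res.status, await res.json());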

Boost math (positive + decay + negative)

// For each playbook pb among top-K most cosine-similar:
// given query embedding qv, constant base_weight, n_workers = |pb.endorsed_names|

similarity = cosine(qv, pb.embedding)    // skip if ≤ 0.05
age_days = (now - pb.timestamp) / 86_400 seconds
decay = e^(-age_days / 30)   // half-life = 30 days

// For each endorsed worker in pb:
key = (pb.city, pb.state, name)
fail_count = failures[key]   // # times this worker was marked no-show for same geo
penalty = 0.5^min(fail_count, 20)

per_worker = similarity × base_weight × decay × penalty / n_workers
boost[key] = min(boost[key] + per_worker, MAX_BOOST_PER_WORKER)

// MAX_BOOST_PER_WORKER = 0.25 — cap stops one popular worker from always winning
Code: crates/vectord/src/playbook_memory.rs::compute_boost_for  ·  constants: MAX_BOOST_PER_WORKER, DEFAULT_TOP_K_PLAYBOOKS, BOOST_HALF_LIFE_DAYS
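
The same math, transcribed into runnable TypeScript. This is a transcription of the formulas above, not the Rust in playbook_memory.rs.

// TypeScript transcription of the boost math above; the real implementation is
// compute_boost_for in crates/vectord/src/playbook_memory.rs.
const MAX_BOOST_PER_WORKER = 0.25;

interface Playbook {
  embedding: number[];
  timestamp: number;            // unix seconds
  endorsed_names: string[];
  city: string;
  state: string;
}

const cosine = (a: number[], b: number[]) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

function computeBoostFor(
  qv: number[],
  topK: Playbook[],                          // top-K most cosine-similar playbooks
  baseWeight: number,
  failures: Map<string, number>,             // "city|state|name" → no-show count for same geo
  nowSecs = Date.now() / 1000,
): Map<string, number> {
  const boost = new Map<string, number>();
  for (const pb of topK) {
    const similarity = cosine(qv, pb.embedding);
    if (similarity <= 0.05) continue;                         // skip near-orthogonal playbooks
    const ageDays = (nowSecs - pb.timestamp) / 86_400;
    const decay = Math.exp(-ageDays / 30);                    // exponential age decay, 30-day constant
    for (const name of pb.endorsed_names) {
      const key = `${pb.city}|${pb.state}|${name}`;
      const failCount = failures.get(key) ?? 0;
      const penalty = 0.5 ** Math.min(failCount, 20);         // halve per recorded no-show, capped
      const perWorker = (similarity * baseWeight * decay * penalty) / pb.endorsed_names.length;
      boost.set(key, Math.min((boost.get(key) ?? 0) + perWorker, MAX_BOOST_PER_WORKER));
    }
  }
  return boost;
}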

Application at query time

// In /vectors/hybrid handler (crates/vectord/src/service.rs):
1. SQL filter narrows workers_500k to geo/role/availability
2. Vector index returns top_k × 5 candidates by cosine to question
3. compute_boost_for(qv, k=200) returns boost map
4. For each candidate: parse (name, city, state) from chunk, look up boost, add to score
5. Re-sort sources by boosted score
6. Truncate to requested top_k, return with playbook_boost and playbook_citations
Why k=200. Direct measurement showed cosine similarity clusters in the 0.55-0.67 band across all playbooks regardless of geo (nomic-embed-text has narrow discrimination on this kind of structured operation text). A k of 25 silently missed geo-matched playbooks. k=200 is the measured floor for reliably catching compounding. Brute-force over 200 × 768-d is sub-ms even on this hardware.
Evidence: Chicago Electrician compounding test 2026-04-20 — Carmen Green, Anna Patel, Fatima Wilson went from rank >5 / boost 0 / 0 citations (run 0, no seed) to rank 1/2/3 / boost +0.250 (capped) / 3 citations each (run 3, after 3 identical seeds). Each seed increments citations; total boost caps at 0.25/worker.
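
Steps 4 through 6 reduce to a lookup, an add, and a sort. A sketch, with the candidate shape assumed rather than taken from the handler:

// Sketch of steps 4-6: add each candidate's boost to its raw score, re-sort, truncate.
// The Candidate shape is assumed; the real types live in crates/vectord/src/service.rs.
interface Candidate {
  name: string;
  city: string;
  state: string;
  score: number;      // raw cosine-based score from step 2
}

function applyPlaybookBoost(
  candidates: Candidate[],
  boost: Map<string, number>,     // from the boost computation, keyed on "city|state|name"
  topK: number,
) {
  return candidates
    .map(c => {
      const b = boost.get(`${c.city}|${c.state}|${c.name}`) ?? 0;
      return { ...c, score: c.score + b, playbook_boost: b };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}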

Write-through to SQL

successful_playbooks_live is a DataFusion-queryable Parquet surface maintained by POST /vectors/playbook_memory/persist_sql. Every /log from the recruiter UI triggers seed → persist_sql. The in-memory store and the SQL surface stay synchronized (full snapshot on each persist, safe because memory is source of truth).
Code: crates/vectord/src/playbook_memory.rs::persist_to_sql  ·  catalog-registered under "successful_playbooks_live"
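
The surface is queryable like any other table. A quick check from TypeScript, using SELECT * so no column names are guessed:

// Verify the write-through surface with plain SQL against the gateway.
const res = await fetch("http://localhost:3100/query/sql", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ sql: "SELECT * FROM successful_playbooks_live LIMIT 5" }),
});
console.log(await res.json());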

Pattern discovery (Path 2 — meta-index)

Beyond "who was endorsed." POST /vectors/playbook_memory/patterns takes a query, finds top-K similar past playbooks, pulls each endorsed worker's full workers_500k profile, and aggregates shared traits: recurring certifications, skill frequencies, modal archetype, reliability distribution. Returns a discovered_pattern string showing operator-actionable signal the user didn't explicitly query for.
Code: crates/vectord/src/playbook_memory.rs::discover_patterns  ·  Surfaces: /vectors/playbook_memory/patterns endpoint, /intelligence/chat response, /intelligence/permit_contracts cards
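
At its core the aggregation is frequency counting against a threshold. A simplified sketch of the idea; the profile fields here are illustrative, not the workers_500k schema, and the real logic is discover_patterns.

// Simplified idea behind pattern discovery: count how often a trait recurs across
// the endorsed workers' profiles and keep anything above min_trait_frequency.
// Profile fields are illustrative, not the workers_500k schema.
interface WorkerProfile {
  certifications: string[];
  skills: string[];
  archetype: string;
}

function sharedTraits(profiles: WorkerProfile[], minTraitFrequency = 0.3) {
  const counts = new Map<string, number>();
  for (const p of profiles)
    for (const trait of [...p.certifications, ...p.skills, `archetype:${p.archetype}`])
      counts.set(trait, (counts.get(trait) ?? 0) + 1);

  return [...counts.entries()]
    .map(([trait, n]) => ({ trait, frequency: n / profiles.length }))
    .filter(t => t.frequency >= minTraitFrequency)
    .sort((a, b) => b.frequency - a.frequency);
}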
Chapter 5

Key architectural choices — what was picked and why

Each choice is documented in docs/DECISIONS.md (Architecture Decision Records). If you dispute any of these, the ADR names the alternatives we rejected and the measurement that drove the call.
ADR-001 · Object storage as source of truth
No traditional database. All data is Parquet on S3-compatible object storage. Eliminates DB operational overhead; every engine can read Parquet.
ADR-008 · Embeddings stored as Parquet, not a vector DB
Keeps all data in one portable format. No Pinecone/Weaviate/Qdrant lock-in. Trade-off: brute-force search up to ~100K; HNSW beyond.
ADR-012 · Append-only event journal — never destroy evidence
Every mutation is appended. Compliance, audit, AI-decision forensics. Impossible to retrofit; easy to add now.
ADR-015 · Tool registry before raw SQL for agents
Named, governed, audited actions for agents. Permission checks, rate limits, parameter validation. MCP-compatible.
ADR-019 · Hybrid Parquet+HNSW ⊕ Lance vector backend
Parquet+HNSW primary (2.55× faster search at 100K). Lance secondary for index-build speed (14× faster), random fetch (112× faster), append (structural). Per-profile vector_backend: Parquet | Lance.
ADR-020 · Idempotent register() with schema-fingerprint gate
Same (name, fingerprint) reuses manifest. Different fingerprint = 409 Conflict. Prevents silent duplicate manifests. Cleanup run collapsed 374 → 31 datasets.
Phase 19 design note · Statistical + semantic, not neural
Meta-index is cosine similarity + endorsement aggregation. No model training. Rebuildable from successful_playbooks alone. Neural re-ranker deferred to Phase 20+ only if statistical floor plateaus.
Chapter 6

Measured at scale, on this machine

Hardware: i9 + 128GB RAM + Nvidia A4000 16GB VRAM. Numbers below are from this running instance. Refresh the page and they'll recompute.
Chapter 7

Verify or dispute — reproduce it yourself

Every claim below is a curl away from falsification.
Health. Should return lakehouse ok.
curl http://localhost:3100/health
Any SQL on multi-million-row Parquet. Sub-100ms typical.
curl -s -X POST http://localhost:3100/query/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql":"SELECT role, COUNT(*) FROM workers_500k WHERE state=\"IL\" GROUP BY role LIMIT 5"}'
Hybrid search with playbook boost. The whole Phase 19 feedback loop in one request.
curl -s -X POST http://localhost:3100/vectors/hybrid \
  -H 'Content-Type: application/json' \
  -d '{"index_name":"workers_500k_v1",
       "sql_filter":"role = '\''Forklift Operator'\'' AND city = '\''Chicago'\'' AND CAST(availability AS DOUBLE) > 0.5",
       "question":"reliable forklift operator",
       "top_k":5,"use_playbook_memory":true,"playbook_memory_k":200}'
Playbook memory stats. Count + endorsed names + sample.
curl http://localhost:3100/vectors/playbook_memory/stats
Pattern discovery. What do past similar fills have in common?
curl -s -X POST http://localhost:3100/vectors/playbook_memory/patterns \
  -H 'Content-Type: application/json' \
  -d '{"query":"Forklift Operator in Chicago, IL","top_k_playbooks":25,"min_trait_frequency":0.3}'
Run the dual-agent scenario yourself. All 5 events, real fills, real artifacts.
cd /home/profit/lakehouse
bun run tests/multi-agent/scenario.ts
# Output: tests/multi-agent/playbooks/scenario-<timestamp>/report.md
Chapter 8

What we are not claiming

Every impressive-sounding number comes with a footnote. Here are the honest limits.
workers_500k is synthetic.
A real client ATS export would replace this table. The schema is deliberately identical to a production ATS.
candidates table has 1,000 rows.
Intentionally small for the demo. call_log references candidate_ids beyond those 1,000 rows, so some calls don't cross-reference; this is a dataset alignment issue, not a pipeline issue.
Chicago permit data is real.
Pulled live from data.cityofchicago.org/resource/ydr8-5enu.json (Socrata API). Not synthetic. Not cached.
Playbook memory is seeded from demo runs.
The pipeline that seeds it is identical to what a live recruiter would trigger via /log. Same code path.
Local 7B models (mistral, qwen2.5) are imperfect.
They occasionally malform tool calls or drop fields. Multi-agent scenarios seal roughly 40-80% in one run. Larger models or constrained decoding would improve this. Not a substrate problem.
No rate/margin awareness yet.
Worker pay expectations vs contract bill rates are not modeled. Flagged as a Phase 20 item; no architectural blocker.