Merge pull request 'Post-PR-#11 polish: demo UI, staffer console, face pool, icons, contractor profile (24 commits)' (#12) from demo/post-pr11-polish-2026-04-28 into main

sanitize: drop over-broad path-missing branch + UTF-8-safe redaction
Re-scrum of yesterday's sanitizer fix surfaced 2 more real bugs in the fix itself (opus, both WARN, neither caught by kimi/qwen):

W1 (service.rs:1949): `mentions_path_missing` standalone branch was too aggressive. A registry-internal error like "/root/.cargo/.../x.rs: no such file or directory" would 404 because it triggers without dataset context. That's a real 500. Dropped the standalone branch; require dataset context AND missing-shape phrase. Lance's actual "Dataset at path X was not found" still satisfies it.

W2 (service.rs:2018): `out.push(bytes[i] as char)` corrupted multi-byte UTF-8 by casting raw bytes to char (only sound for ASCII < 128). A path containing user-supplied non-ASCII names produced Latin-1 mojibake. Rewrote redact_paths to track byte indices and emit unmatched runs as &str slices via push_str(&s[range]), which preserves multi-byte sequences verbatim. Step advance is now per-char, not per-byte, via small utf8_char_len helper.

Two new regression tests:
- is_not_found_does_not_match_unrelated_path_missing
- redact_preserves_multibyte_utf8 (uses 工作 + café in input)

12/12 sanitize tests PASS. Smoke 10/10 PASS. Loop closure for opus re-scrum on the 2026-05-02 fix bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
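The two fixes can be sketched in Rust roughly as follows. This is a minimal illustration, not the code in service.rs. Only the names `redact_paths` and `utf8_char_len`, the `push_str(&s[range])` approach, and the "dataset context AND missing-shape phrase" rule come from the commit message. The function name `is_not_found` is inferred from the regression test name. The phrase list, the `<path>` placeholder, and the rule that a path is any whitespace-delimited token starting with `/` are assumptions made for the example.

```rust
/// Byte length of the UTF-8 sequence that starts with `first`.
/// Continuation or invalid lead bytes count as 1 so a scan always advances.
fn utf8_char_len(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => 1,
    }
}

/// Replace absolute-path tokens with "<path>" (placeholder is an assumption).
/// Unmatched text is copied as &str slices, so multi-byte characters are
/// never split or re-encoded.
fn redact_paths(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = String::with_capacity(s.len());
    let mut run_start = 0; // start of the current unmatched run
    let mut i = 0;
    while i < bytes.len() {
        let at_token_start = i == 0 || bytes[i - 1].is_ascii_whitespace();
        if bytes[i] == b'/' && at_token_start {
            // Flush everything seen since the last match, verbatim.
            out.push_str(&s[run_start..i]);
            // Skip to the end of the path token, one char at a time.
            let mut j = i;
            while j < bytes.len() && !bytes[j].is_ascii_whitespace() {
                j += utf8_char_len(bytes[j]);
            }
            out.push_str("<path>");
            i = j;
            run_start = j;
        } else {
            i += utf8_char_len(bytes[i]);
        }
    }
    out.push_str(&s[run_start..]);
    out
}

/// Treat an error as "not found" only when it names a dataset AND has a
/// missing-shape phrase. Both phrase lists are hypothetical.
fn is_not_found(msg: &str) -> bool {
    let m = msg.to_lowercase();
    let dataset_context = m.contains("dataset");
    let missing_shape = m.contains("not found") || m.contains("no such file");
    dataset_context && missing_shape
}
```

With these definitions, a registry-internal "no such file or directory" message stays a 500 because it lacks dataset context, while a message shaped like Lance's "Dataset at path X was not found" still maps to 404.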
2026-05-03 05:16:15 +00:00 · 2026-05-03 00:15:23 -05:00 · 2026-05-03 00:11:59 -05:00 · 2026-05-02 23:34:54 -05:00 · 2026-05-02 22:40:03 -05:00 · 2026-05-03 03:39:52 +00:00
837 changed files with 119528 additions and 759 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,49 @@
 .env
 __pycache__/
 *.pyc
 # Headshot pool — binary face JPGs are fetched by scripts/staffing/fetch_face_pool.py
 # (synthetic StyleGAN, ~580MB for 1000 faces). Manifest + fetch script are tracked.
 data/headshots/face_*.jpg
 data/headshots/_thumbs/
 # ComfyUI on-demand generated portraits (per-worker unique). Cached on first
 # request; fully regeneratable via /headshots/generate/:key.
 data/headshots_gen/
 # Runtime data — all regeneratable from inputs or accumulated by daemons.
 # Anything under data/_<name>/ is internal state (auditor outputs, KB caches,
 # pathway memory snapshots, HNSW trial results, etc.). Anything under
 # data/datasets/ or data/vectors/ is generated by ingest/index pipelines.
 data/_*/
 data/lance/
 data/datasets/
 data/vectors/
 data/demo/
 data/evidence/
 data/face_test/
 data/headshots_role_pool/
 data/icons_pool/
 data/scored-runs/
 data/workspaces/
 data/catalog/
 data/**/*.bak-*
 data/**/*.pre-*-bak
 # Logs
 logs/
 # Build artifacts
 node_modules/
 exports/
 mcp-server/data/
 # Per-run distillation reports (timestamp-named); keep the parent dir tracked
 # via .gitkeep if needed but don't carry every batch's report set.
 reports/distillation/[0-9]*/
 reports/distillation/*-*-*-*-*/
 # Test scratch — scratchpads, traces, sessions are regenerated each run.
 # PRD/scenario fixtures stay tracked (they ARE the test).
 tests/agent_test/_*
 tests/agent_test/sessions/
 tests/real-world/runs/
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -48,6 +48,7 @@ version = "0.1.0"
 dependencies = [
 "async-trait",
 "axum",
+ "lru",
 "reqwest",
 "serde",
 "serde_json",
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -0,0 +1,269 @@
 # STATE OF PLAY — Lakehouse
 **Last verified:** 2026-05-02 evening CDT
 **Verified by:** live probe (smoke 9/9, parity 32/32, gateway restarted), not memory.
 > **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 ---
 ## WHAT LANDED 2026-05-01 → 2026-05-02 (12 commits this wave)
 | Commit | What | Verified |
 |---|---|---|
 | `5d30b3d` | lance: auto-build doc_id btree in `lance_migrate` handler | doc-fetch ~5ms (was ~100ms full scan) on scale_test_10m |
 | `044650a` | lance-bench: same scalar build post-IVF (matches gateway) | cargo check clean |
 | `7594725` | lance: 4-pack — `sanitize_lance_err` + 7 unit tests + 9-probe smoke + 10M re-bench | smoke 9/9 PASS, tests 7/7 PASS |
 | `98b6647` | gateway: `IterateResponse.trace_id` echoed; session_log_path enabled | parity probes see one unified JSONL |
 | `57bde63` | gateway: trace-id propagation + coordinator session JSONL (Rust parity with Go wave) | session_log_parity 4/4 |
 | `ba928b1` | aibridge: drop Python sidecar from hot path; AiClient → direct Ollama | aibridge tests 32/32 PASS, /ai/embed live 768d |
 | `654797a` | gateway: pub `extract_json` + `parity_extract_json` bin | extract_json_parity 12/12 |
 | `c5654d4` | docs: pointer to `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md` | — |
 | `150cc3b` | aibridge: LRU embed cache, 236× RPS warm (78ms → 129us p50) | load test |
 | `9eed982` | mcp-server: /_go/* pass-through for G5 cutover slice | — |
 | `6e34ef7` | gitignore: stop tracking 100+ runtime ephemera (data/_*, lance, logs, node_modules) | untracked dropped 100+ → 0 |
 | `41b0a99` | chore: add 33 real items that were sitting untracked (scripts, scenarios, kimi reports, dev UIs) | clean working tree |
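A rough sketch of the embed cache idea behind `150cc3b`, using only the standard library. The `crates/aibridge` diff in this PR depends on the `lru` crate; the `HashMap` plus `VecDeque` pairing here is a simplified stand-in, and `EmbedCache`, `get_or_embed`, and the closure-based upstream call are hypothetical names for illustration. It keeps the properties the diff describes: keys are `(model, text)`, capacity 0 disables caching, and hits and misses are counted.

```rust
use std::collections::{HashMap, VecDeque};

type Key = (String, String); // (model, text)

struct EmbedCache {
    capacity: usize,
    map: HashMap<Key, Vec<f64>>,
    order: VecDeque<Key>, // front = least recently used
    hits: u64,
    misses: u64,
}

impl EmbedCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new(), hits: 0, misses: 0 }
    }

    /// Mark `key` as most recently used (O(n) scan; fine for a sketch).
    fn touch(&mut self, key: &Key) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    /// Return the cached vector, or call `embed` and cache its result.
    fn get_or_embed<F>(&mut self, model: &str, text: &str, embed: F) -> Vec<f64>
    where
        F: FnOnce(&str, &str) -> Vec<f64>,
    {
        let key = (model.to_string(), text.to_string());
        if let Some(v) = self.map.get(&key).cloned() {
            self.hits += 1;
            self.touch(&key);
            return v;
        }
        self.misses += 1;
        let v = embed(model, text);
        if self.capacity > 0 {
            // Evict the least recently used entry when full.
            if self.map.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.map.insert(key.clone(), v.clone());
            self.touch(&key);
        }
        v
    }
}
```

Keying on the model as well as the text is what keeps the same query embedded under two different models from colliding.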
 **Cross-runtime parity (post-this-wave):** 32/32 across 5 probes — `validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8)`. Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.
 **Lance backend (was untested 5 days ago, now gauntlet-ready):**
 - `cargo test -p vectord-lance --release` → 7/7 PASS
 - `./scripts/lance_smoke.sh` → 9/9 PASS against live gateway
 - `reports/lance_10m_rebench_2026-05-02.md` — search warm ~20ms / cold ~46ms median, doc-fetch ~5ms post-btree
 ---
 ## VERIFIED WORKING RIGHT NOW
 ### The client demo (Staffing Co-Pilot)
 **Public URL:** `https://devop.live/lakehouse/` — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme).
 **Local URL:** `http://localhost:3700/` — same page, served by `mcp-server/index.ts` (PID 1271, started 09:48 CDT today).
 **The staffers console** (the one the client was thoroughly impressed with):
 - `https://devop.live/lakehouse/console` — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB)
 - Pulls project index via `/api/catalog/datasets` (36 datasets) + playbook memory via `/api/vectors/playbook_memory/stats` (4,701 entries with embeddings, real ops like *"fill: Maintenance Tech x2 in Milwaukee, WI"*)
 Client-visible flow that works end-to-end on the public URL:
 | Endpoint | Sample output |
 |---|---|
 | `GET /api/catalog/datasets` | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 |
 | `GET /api/vectors/playbook_memory/stats` | 4,701 fill operations with embeddings |
 | `GET /system/summary` | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates |
 | `POST /intelligence/staffing_forecast` | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). |
 | `POST /intelligence/permit_contracts` | permit `3442956` $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. |
 | `POST /intelligence/market` | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). |
 | `POST /intelligence/permit_entities` | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). |
 | `POST /intelligence/activity` + `/intelligence/arch_signals` + `/intelligence/chat` | all 200 |
 The demo tells the story: *"upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin."* That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now.
 ### Backend, verified live this session
 | Surface | State |
 |---|---|
 | Gateway `:3100` | up, 4 providers configured, `/v1/health` 200 with 500K workers loaded |
 | MCP server `:3700` (Co-Pilot demo) | up, all `/intelligence/*` endpoints respond |
 | VCP UI `:3950` | started this session, `/data/*` 200, real numbers |
 | Observer `:3800` | ring full (2,000/2,000) — older events evicted, query Langfuse for 24h-ago state |
 | Sidecar `:3200` | up |
 | Langfuse `:3001` | recording, `gw:/log` + `v1.chat:openrouter` traces visible |
 | LLM Team UI `:5000` | up, only `extract` mode registered |
 | OpenCode fleet | **40 models reachable through one `sk-*` key** (verified live `GET https://opencode.ai/zen/v1/models`) |
 OpenCode catalog (live):
 - Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5
 - GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5
 - Gemini: 3.1-pro, 3-flash
 - GLM: 5.1, 5
 - Minimax: m2.7, m2.5
 - Kimi: k2.6, k2.5
 - Qwen: 3.6-plus, 3.5-plus
 - Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something")
 - Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free
 ### The substrate (frozen — do not re-architect)
 - Distillation v1.0.0 at tag `e7636f2` — **145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full**
 - Output: `data/_kb/distilled_{facts,procedures,config_hints}.jsonl` + `data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet`
 - Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, **per-PR cap=3 with auto-reset on new head SHA**
 - Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed)
 - Mode runner: 5 native modes; `codereview_isolation` is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031)
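The "per-PR cap=3 with auto-reset on new head SHA" rule can be pictured with a small sketch like the one below. It is illustrative only: `AuditCap` and `try_acquire` are hypothetical names, and the real auditor may persist its counters differently.

```rust
use std::collections::HashMap;

/// Tracks how many audit runs each PR has used against its current head SHA.
struct AuditCap {
    limit: u32,
    seen: HashMap<u64, (String, u32)>, // pr number -> (head sha, runs used)
}

impl AuditCap {
    fn new(limit: u32) -> Self {
        Self { limit, seen: HashMap::new() }
    }

    /// Returns true if another audit run is allowed for this PR at this head.
    /// A new head SHA resets the counter before the check.
    fn try_acquire(&mut self, pr: u64, head_sha: &str) -> bool {
        let entry = self
            .seen
            .entry(pr)
            .or_insert_with(|| (head_sha.to_string(), 0));
        if entry.0.as_str() != head_sha {
            *entry = (head_sha.to_string(), 0);
        }
        if entry.1 >= self.limit {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

With `AuditCap::new(3)`, a fourth call for the same PR and SHA is refused, and pushing a new commit (a different SHA) makes the PR eligible again.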
 ### Matrix indexer
 30+ live corpora including:
 - 5 versions of `workers_500k_v1..v9` (50K embedded chunks each)
 - 11 batched 2K-row shards `w500k_b3..b17`
 - `chicago_permits_v1` (3,420), `resumes_100k_v2` (100K candidates), `ethereal_workers_v1` (10K)
 - `lakehouse_arch_v1` (2,119), `lakehouse_symbols_v1` (2,470), `lakehouse_answers_v1` (1,269), `scrum_findings_v1` (1,260)
 - `kb_team_runs_v1` (12,693) + `kb_team_runs_agent` (4,407) — LLM-team play history embedded
 - `distilled_factual_v20260423102507` (8) — distillation output
 ### Code health
 - `cargo check --workspace` → **0 warnings, 0 errors**
 - `bun test auditor + tests/distillation` → **145/145 pass**
 - `ui/server.ts` + `auditor.ts` bundle clean
 ---
 ## DO NOT RELITIGATE
 - **PR #11 is merged into `origin/main` as `ed57eda`** — do not "still need to merge PR #11."
 - **Distillation tag `distillation-v1.0.0` at `e7636f2` is FROZEN** — do not re-architect schemas, scorer rules, audit fixtures.
 - **Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent** — do not re-debate, see `reports/kimi/audit-last-week-full.md`.
 - **`candidates_safe` `vertical` column bug** — fixed at catalog metadata layer in commit `c3c9c21`. Do not "discover" it again.
 - **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval."
 - **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op.
 - **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.
 - **Python sidecar dropped from hot path 2026-05-02 (`ba928b1`)** — AiClient calls Ollama directly. Do not "wire python embedding back in." `lab_ui.py` + `pipeline_lab.py` keep running as dev-only UIs (not on the runtime path).
 - **Lance backend gauntlet (2026-05-02)** — sanitizer over all 5 routes, 7 unit tests, 9-probe smoke, 10M re-bench. The `doc_id` btree auto-builds inside `lance_migrate` AND `lance-bench`. Do not "discover" the missing scalar index again or the leaked filesystem paths in error bodies.
 - **Cross-runtime parity = 32/32** across 5 probes in `golangLAKEHOUSE/scripts/cutover/parity/`. Do not "build a parity probe for X" without checking — validator, extract_json, session_log, materializer, and embed are all already covered.
 - **Decisions tracker is `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`** — single living source of truth for cross-runtime decisions. As of 2026-05-02 it has 0 `_open_` code work items; only 2 strategic items left (Lance vs Parquet+HNSW-with-spilling, Go-vs-Rust primary cutover).
 ---
 ## FIXES MADE THIS SESSION (2026-04-27 evening)
 1. **`crates/gateway/src/v1/iterate.rs:93`** — `state` → `_state` (cleared the one cargo warning).
 2. **`lakehouse-ui.service` (Dioxus)** — disabled. Was failing 7,242 times against a missing `target/dx/ui/debug/web/public` build dir. `systemctl stop && disable`.
 3. **VCP UI on `:3950`** — started `bun run ui/server.ts` (PID 1162212, log `/tmp/lakehouse_ui.log`). `/data/*` endpoints now 200 with real data.
 4. **`client_workerskjkk` catalog entry** — `DELETE /catalog/datasets/by-name/client_workerskjkk` removed the dead manifest. **This was the actual root cause** of `/system/summary` reporting `workers_500k_rows: 0` and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → `workers_500k_rows: 500000`, `candidates_rows: 1000`, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse.
 ## FIXES MADE THIS SESSION (2026-04-28 early — face pool)
 5. **Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged.** `scripts/staffing/fetch_face_pool.py` fetches from thispersondoesnotexist.com; `scripts/staffing/tag_face_pool.py --min-age 22` runs deepface and excludes minors. `data/headshots/manifest.jsonl` now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces.
 6. **`mcp-server/index.ts:1308` `/headshots/:key` route** — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly.
 7. **`/headshots/_thumbs/` pre-resized 384×384 webp** (60× smaller: 587KB → ~11KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (`xargs -P 8`); `.gitignore`d.
 8. **`mcp-server/search.html` + `console.html`** — dropped `img.loading='lazy'`. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced.
 9. **ComfyUI on-demand uniqueness — `serve_imagegen.py:32`** added `seed` to `_cache_key()` (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s.
 10. **`mcp-server/index.ts:1234` `/headshots/generate/:key`** — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. Worker prompt format: `professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr`. Cache at `data/headshots_gen/` (gitignored, regeneratable).
 11. **Confidence-default name resolution** in `search.html` — `genderFor()` and `guessEthnicityFromFirstName()` lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket.
 End-to-end verified: playwright run on `https://devop.live/lakehouse/?q=forklift+operators+IL` → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs.
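Items 6 and 10 both rely on hashing a worker key deterministically. The sketch below shows one way that could look, written in Rust for illustration even though the actual routes live in `mcp-server/index.ts`. The `Face` struct, the `age_band` field, and `pick_face` are hypothetical; only the djb2-style hash, the gender×race×age bucket, and the gender-only → all fallback order come from the notes above.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Face {
    id: u32,
    gender: &'static str,
    race: &'static str,
    age_band: &'static str, // assumed bucketing of the tagged age
}

/// Classic djb2: start at 5381, multiply by 33, add each byte (wrapping).
fn djb2(key: &str) -> u32 {
    key.bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(b as u32))
}

/// Pick a face for `key`: exact bucket first, then gender-only, then all.
/// The same key always maps to the same face for a fixed pool.
fn pick_face<'a>(
    pool: &'a [Face],
    key: &str,
    gender: &str,
    race: &str,
    age_band: &str,
) -> Option<&'a Face> {
    let exact: Vec<&Face> = pool
        .iter()
        .filter(|f| f.gender == gender && f.race == race && f.age_band == age_band)
        .collect();
    let by_gender: Vec<&Face> = pool.iter().filter(|f| f.gender == gender).collect();
    let all: Vec<&Face> = pool.iter().collect();

    let bucket = if !exact.is_empty() {
        exact
    } else if !by_gender.is_empty() {
        by_gender
    } else {
        all
    };
    if bucket.is_empty() {
        return None;
    }
    Some(bucket[djb2(key) as usize % bucket.len()])
}
```

Because the index is a pure function of the key and the bucket, repeated requests for one worker return the same face without storing any assignment.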
 ---
 ## OPEN — but not blocking the demo
 | Item | What | When to act |
 |---|---|---|
 | `modes.toml` `staffing_inference.matrix_corpus` | still says `workers_500k_v8`. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new `build_workers_v9.sh` rebuilds from `workers_safe`. | Run when you have 30+ min for the rebuild. |
 | Open PRs #6, #7, #10 | sitting since Apr 22-24, auditor verdicts on disk at `data/_auditor/kimi_verdicts/{6,7,10}-*.json` | Read verdicts, decide reconcile/close. |
 | `test/enrich-prd-pipeline` branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive — see `memory/project_unmerged_architecture_work.md`. |
 | `federation-hnsw-trials` stash | Lance + S3/MinIO prototype, `aws-config` crate added, 708 insertions | Phase B from EXECUTION_PLAN.md — revisit when Parquet vector ceiling actually hurts. |
 | `candidates` manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. |
 ---
 ## RUNTIME CHEATSHEET
 ```bash
 # Verify the demo (public + local both work)
 curl -sS https://devop.live/lakehouse/                   # Co-Pilot HTML
 curl -sS https://devop.live/lakehouse/console            # staffers console
 curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \
  -d '{}' -H 'content-type: application/json' \
  | jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}'
 # Restart sequence (after Rust changes)
 sudo systemctl restart lakehouse.service                 # gateway :3100
 sudo systemctl restart lakehouse-auditor                 # auditor daemon
 sudo systemctl restart lakehouse-observer                # observer :3800
 # UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled).
 # Restart manually:  kill <pid>; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 &
 # Health checks
 curl -sS http://localhost:3100/v1/health | jq            # workers_count, providers
 curl -sS http://localhost:3100/vectors/pathway/stats | jq
 curl -sS http://localhost:3100/v1/usage | jq             # since-restart cost
 curl -sS http://localhost:3700/system/summary | jq       # dataset counts
 ```
 ---
 ## VISION — what we're actually building (not what's done)
 J's framing for the legacy staffing company:
 - Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope.
 - Hybrid + memory index → search large corpora cheaply.
 - Email comes in → verify against contract; SMS comes in → alert when index changes.
 - Real-time.
 - Invent metrics nobody else has using the hybrid index.
 - Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it.
 - Find people getting certificates (passive cert tracking).
 - Pull union data → bring contracts that work for **employees**, not just employers.
 - All metrics visible, nothing hidden, value-aligned with what each side actually needs.
 If a future session is shaving away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.
 ---
 ## CURRENT PLAN — fix the demo for the legacy staffing client
 Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.
 **Demo state is anchored by git tag `demo-2026-04-27`** (commit `ed57eda`, the merge of PR #11). To restore code state: `git checkout demo-2026-04-27`. To restore runtime state: `DELETE /catalog/datasets/by-name/client_workerskjkk` (catalog hot-fix is not in git).
 ### P1 — Search box that actually filters (highest visible impact)
 **Problem:** typing in `#sq` and pressing Enter fires `POST /intelligence/chat` with body `{"message":"<query>"}`. The state (`#sst`) and role (`#srl`) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against `workers_500k`. That is the "cached/generic response" the client sees.
 **Fix:** in `mcp-server/search.html`, change the search-submit handler to call the real worker search endpoint with `{query, state, role, top_k}`. The MCP `search_workers` tool surface already exists; route the form there. Render returned worker rows in the existing card grid.
 **Done when:** typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.
 ### P2 — Contractor-name click → `/contractor` profile page
 **Problem:** clicking a contractor name in any rendered card stays on `/lakehouse/`. URL doesn't change.
 **Fix:** wrap contractor names in `<a href="/contractor?name=<encoded>">`. The page `mcp-server/contractor.html` (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at `/contractor` and the data endpoint `/intelligence/contractor_profile` already returns rich data.
 **Then check contractor.html actually shows:** full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.
 **Done when:** clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.
 ### P3 — Substrate signal at the bottom shows the right numbers
 **Problem:** J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: `/api/vectors/playbook_memory/stats` = 4,701 entries with embeddings; `/vectors/pathway/stats` = 88 traces, 11/11 replays.
 **Fix:** find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.
 **Done when:** bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.
 ### P4 — Top nav reflects today's architecture
 **Problem:** Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but content is from old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.
 **Fix:** rewrite `mcp-server/proof.html` (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of `demo-2026-04-27`. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.
 **Done when:** the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").
 ### P5 — Caching for the project-index build_signal (J flagged unfinished)
 **Problem:** "we never finished our caching for project index build signal it's not pulling new information." Need to find what `build_signal` refers to. Likely a scrum/auditor signal that should rebuild the `lakehouse_arch_v1` corpus on commit but isn't wired to.
 **Fix:** identify the build-signal pipeline (likely in `auditor/` or `crates/vectord/`), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in `/vectors/indexes` for `lakehouse_arch_v1`.
 **Done when:** committing a new file to `crates/` causes `lakehouse_arch_v1` chunk_count to increase within N minutes.
 ### P0 — Anchor the demo state (DONE)
 Tagged `ed57eda` as `demo-2026-04-27`. Future sessions: `git checkout demo-2026-04-27` to land in this exact code state.
 ---
 ## EXECUTION ORDER
 1. **P1 first** — biggest visible bug, ~30-60 min
 2. **P2 next** — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
 3. **P3** — small fix, big "looks alive" win
 4. **P4** — biggest scope; might split across sessions
 5. **P5** — feature work, only after the visible bugs are fixed
 Each item commits independently with the format `demo: P<n> — <one-line>` so the commit log doubles as a progress journal. After each merge to main, re-tag `demo-latest` to point at the new HEAD.
 Stop here and let J pick which item to start with. Do not silently extend scope.
--- a/bot/propose.ts
+++ b/bot/propose.ts
@@ -16,12 +16,14 @@ import type { Gap, Proposal } from "./types.ts";
 // Phase 44 migration (2026-04-27): bot/propose.ts now flows through
 // the gateway's /v1/chat instead of hitting the sidecar's /generate
 // directly. /v1/usage tracks the call, Langfuse traces it, observer
-// sees it. Same upstream model (CLOUD_MODEL gpt-oss:120b on
+// sees it. Gateway owns the routing.
-// Ollama Cloud) — gateway just owns the routing.
+//
+// 2026-04-28: gpt-oss:120b → deepseek-v3.2 via Ollama Pro. Newer
+// DeepSeek revision, faster, still on the same OLLAMA_CLOUD_KEY.
 const GATEWAY_URL = process.env.LH_GATEWAY_URL ?? "http://localhost:3100";
 const REPO_ROOT = "/home/profit/lakehouse";
 const PRD_PATH = `${REPO_ROOT}/docs/PRD.md`;
-const CLOUD_MODEL = process.env.LH_BOT_MODEL ?? "gpt-oss:120b";
+const CLOUD_MODEL = process.env.LH_BOT_MODEL ?? "deepseek-v3.2";
 const MAX_TOKENS = 6000;
 export async function findGaps(): Promise<Gap[]> {
--- a/config/modes.toml
+++ b/config/modes.toml
@@ -44,7 +44,10 @@ name = "staffing_inference"
 # pattern generalizes beyond code review.
 preferred_mode = "staffing_inference_lakehouse"
 fallback_modes = ["ladder", "consensus", "pipeline"]
-default_model = "openai/gpt-oss-120b:free"
+# 2026-04-28: gpt-oss-120b:free → kimi-k2.6 via Ollama Pro. Coding-
+# specialized, faster than gpt-oss, on the same OLLAMA_CLOUD_KEY so
+# no extra provider hop.
+default_model = "kimi-k2.6"
 matrix_corpus = "workers_500k_v8"
 [[task_class]]
@@ -58,7 +61,9 @@ matrix_corpus = "kb_team_runs_v1"
 name = "doc_drift_check"
 preferred_mode = "drift"
 fallback_modes = ["validator"]
-default_model = "gpt-oss:120b"
+# 2026-04-28: gpt-oss:120b → gemini-3-flash-preview via Ollama Pro.
+# Speed leader on factual checking, same OLLAMA_CLOUD_KEY.
+default_model = "gemini-3-flash-preview"
 matrix_corpus = "distilled_factual_v20260423095819"
 [[task_class]]
--- a/config/providers.toml
+++ b/config/providers.toml
@@ -15,22 +15,29 @@
 [[provider]]
 name = "ollama"
-base_url = "http://localhost:3200"
+base_url = "http://localhost:11434"
 auth = "none"
 default_model = "qwen3.5:latest"
-# Hot-path local inference. No bearer needed — Python sidecar on
+# Hot-path local inference. No bearer needed — direct to Ollama as of
-# localhost handles the Ollama API. Model names are bare
+# 2026-05-02 (Python sidecar's pass-through wrapper retired). Model
-# (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").
+# names are bare (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").
 [[provider]]
 name = "ollama_cloud"
 base_url = "https://ollama.com"
 auth = "bearer"
 auth_env = "OLLAMA_CLOUD_KEY"
-default_model = "gpt-oss:120b"
+default_model = "deepseek-v3.2"
-# Cloud-tier Ollama. Key resolved from OLLAMA_CLOUD_KEY env at gateway
+# Cloud-tier Ollama (Pro plan as of 2026-04-28). Key resolved from
-# boot. Model-prefix routing: "cloud/<model>" auto-routes here
+# OLLAMA_CLOUD_KEY at gateway boot; Pro tier upgraded the account so
-# (see gateway::v1::resolve_provider).
+# rate limits + model access widen without a key change. Model-prefix
+# routing: "cloud/<model>" auto-routes here. 39-model fleet now
+# includes deepseek-v3.2, deepseek-v4-{flash,pro}, gemini-3-flash-
+# preview, glm-{5,5.1}, kimi-k2.6, qwen3-coder-next.
+# 2026-04-28: default upgraded gpt-oss:120b → deepseek-v3.2 (newest
+# DeepSeek revision). NOTE: kimi-k2:1t is upstream-broken (HTTP 500
+# on Ollama Pro probe 2026-04-28) — do not route to it. Use kimi-k2.6
+# instead, which is what staffing_inference points at.
 [[provider]]
 name = "openrouter"
@@ -38,7 +45,7 @@ base_url = "https://openrouter.ai/api/v1"
 auth = "bearer"
 auth_env = "OPENROUTER_API_KEY"
 auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
-default_model = "openai/gpt-oss-120b:free"
+default_model = "x-ai/grok-4.1-fast"
 # Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
 # Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
 # resolve_openrouter_key() — env first, then fallback files.
@@ -74,8 +81,10 @@ auth_env = "KIMI_API_KEY"
 default_model = "kimi-for-coding"
 # Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account
 # system from `api.moonshot.ai` and `api.moonshot.cn` — keys are NOT
-# interchangeable. Used when Ollama Cloud's `kimi-k2:1t` is upstream-
+# interchangeable. Used as a fallback when Ollama Cloud's kimi-k2.6 is
-# broken and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
+# unavailable and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
+# (Was `kimi-k2:1t` here pre-2026-05-03 — that model is upstream-broken
+# and removed from operator guidance.)
 # Model id: `kimi-for-coding` (kimi-k2.6 underneath).
 # Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile).
 # Model-prefix routing: "kimi/<model>" auto-routes here, prefix stripped.
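The model-prefix routing described in these provider comments can be sketched as below. This is a hedged illustration, not `gateway::v1::resolve_provider`: the function name `route_by_prefix` is hypothetical, the provider name `kimi` is assumed (the hunk does not show that provider's `name` line), and stripping the `cloud/` prefix is assumed to mirror the documented `kimi/` behavior. Models without a known prefix return `None`, leaving the default-provider decision to the caller.

```rust
/// Map a "prefix/model" id to (provider name, bare model id).
/// Returns None when no routing prefix matches.
fn route_by_prefix(model: &str) -> Option<(&'static str, &str)> {
    // (prefix, provider) pairs; names here are assumptions for the sketch.
    const ROUTES: [(&str, &str); 2] = [("cloud/", "ollama_cloud"), ("kimi/", "kimi")];
    for (prefix, provider) in ROUTES {
        if let Some(rest) = model.strip_prefix(prefix) {
            return Some((provider, rest));
        }
    }
    None
}
```

Note that an id such as `x-ai/grok-4.1-fast` contains a slash but no routing prefix, so it passes through unmatched in this sketch.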
--- a/crates/aibridge/Cargo.toml
+++ b/crates/aibridge/Cargo.toml
@@ -12,3 +12,4 @@ serde_json = { workspace = true }
 tracing = { workspace = true }
 reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
 async-trait = "0.1"
+lru = "0.12"
--- a/crates/aibridge/src/client.rs
+++ b/crates/aibridge/src/client.rs
@@ -1,28 +1,74 @@
 use lru::LruCache;
 use reqwest::Client;
 use serde::{Deserialize, Serialize};
 use std::num::NonZeroUsize;
 use std::sync::Mutex;
 use std::sync::atomic::{AtomicU64, Ordering};
 use std::sync::Arc;
 use std::time::Duration;
-/// HTTP client for the Python AI sidecar.
+/// HTTP client for Ollama (post-2026-05-02 — sidecar dropped).
 ///
 /// `base_url` was historically the Python sidecar at `:3200`, which
 /// pass-through-proxied to Ollama at `:11434`. The sidecar added zero
 /// logic on the hot path (embed.py + generate.py + rerank.py +
 /// admin.py = ~120 LOC of pure Ollama wrappers), so this client now
 /// talks to Ollama directly and the sidecar process can be retired.
 ///
 /// What stayed Python: `lab_ui.py` + `pipeline_lab.py` (~888 LOC of
 /// dev-mode Streamlit-shape UIs) — those aren't on the runtime hot
 /// path and continue running for prompt experimentation.
 ///
 /// `generate()` has two transport modes:
-/// - When `gateway_url` is None (default), it posts to
+/// - When `gateway_url` is None (default), posts directly to Ollama's
-///   `${base_url}/generate` (sidecar direct).
+///   `${base_url}/api/generate`.
-/// - When `gateway_url` is `Some(url)`, it posts to
+/// - When `gateway_url` is `Some(url)`, posts to `${url}/v1/chat`
-///   `${url}/v1/chat` with `provider="ollama"` so the call appears
+///   with `provider="ollama"` so the call appears in `/v1/usage` and
-///   in `/v1/usage` and Langfuse traces.
+///   Langfuse traces.
 ///
-/// `embed()`, `rerank()`, and admin methods always go direct to the
+/// `embed()`, `rerank()`, and admin methods always go direct to
-/// sidecar — no `/v1` equivalent yet, no point round-tripping.
+/// Ollama — no `/v1` equivalent for those surfaces yet.
 ///
 /// Phase 44 part 2 (2026-04-27): the gateway URL is wired in by
 /// callers that want observability (vectord modules); it's left
 /// unset by callers that ARE the gateway internals (avoids self-loops
 /// + redundant hops).
 /// Per-text embed cache key. We key on (model, text) so different
 /// model selections produce distinct cache lines — a query embedded
 /// under nomic-embed-text-v2-moe must NOT collide with the same
 /// query under nomic-embed-text v1.
 #[derive(Eq, PartialEq, Hash, Clone)]
 struct EmbedCacheKey {
    model: String,
    text: String,
 }
 /// Default LRU cache size — 4096 entries × ~6KB per 768-d f64
 /// vector ≈ 24MB. Sized for typical staffing-domain repetition
 /// (coordinator workflows have query repetition rates around 70-90%
 /// per session). Tunable via [aibridge].embed_cache_size in the
 /// config; 0 disables the cache entirely.
 const DEFAULT_EMBED_CACHE_SIZE: usize = 4096;
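The sizing arithmetic in the comment above can be checked with a short sketch. The helper names below are hypothetical and not part of the crate; they count only the raw f64 payload and ignore key strings and LRU bookkeeping, so the real footprint would be somewhat higher.

```rust
/// Hypothetical helper (not in the crate): raw f64 payload held by a
/// full cache, ignoring key strings and LRU node overhead.
const fn embed_cache_payload_bytes(entries: usize, dims: usize) -> usize {
    entries * dims * std::mem::size_of::<f64>()
}

/// Same figure in MiB, rounded down.
const fn embed_cache_payload_mib(entries: usize, dims: usize) -> usize {
    embed_cache_payload_bytes(entries, dims) / (1024 * 1024)
}
```

With 768 dimensions, one vector is 6144 bytes, and 4096 such vectors come to 24 MiB, which agrees with the figure in the comment.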
 #[derive(Clone)]
 pub struct AiClient {
    client: Client,
    base_url: String,
    gateway_url: Option<String>,
    /// Per-text embed cache. Closes the 63× perf gap with Go side
    /// and mirrors the shape of Go's
    /// internal/embed/cached.go::CachedProvider: same (model, text)
    /// → vector caching, same nil-disable semantics.
    /// None = caching disabled (cache_size=0); Some = bounded LRU.
    embed_cache: Option<Arc<Mutex<LruCache<EmbedCacheKey, Vec<f64>>>>>,
    /// Hit / miss counters for /admin observability + load-test
    /// validation. Atomic so Clone'd AiClients share the same counts.
    embed_cache_hits: Arc<AtomicU64>,
    embed_cache_misses: Arc<AtomicU64>,
    /// Embedding dimension, shared across clones so the EmbedResponse
    /// can carry it consistently even when every text was a cache hit
    /// (no fresh upstream call to learn the dim from). Starts at 0;
    /// set on each successful Ollama embed, read on the all-hit path.
    cached_dim: Arc<AtomicU64>,
 }
 // -- Request/Response types --
@ -95,17 +141,49 @@ pub struct RerankResponse {
 impl AiClient {
    pub fn new(base_url: &str) -> Self {
        Self::with_embed_cache(base_url, DEFAULT_EMBED_CACHE_SIZE)
    }
    /// Constructs an AiClient with an explicit embed-cache size.
    /// Pass 0 to disable the cache entirely (matches Go-side
    /// CachedProvider's nil-cache semantics).
    pub fn with_embed_cache(base_url: &str, cache_size: usize) -> Self {
        let client = Client::builder()
            .timeout(Duration::from_secs(120))
            .build()
            .expect("failed to build HTTP client");
        let embed_cache = if cache_size > 0 {
            // Invariant: cache_size > 0 was just checked, so
            // NonZeroUsize::new returns Some.
            let cap = NonZeroUsize::new(cache_size).expect("cache_size > 0");
            Some(Arc::new(Mutex::new(LruCache::new(cap))))
        } else {
            None
        };
        Self {
            client,
            base_url: base_url.trim_end_matches('/').to_string(),
            gateway_url: None,
            embed_cache,
            embed_cache_hits: Arc::new(AtomicU64::new(0)),
            embed_cache_misses: Arc::new(AtomicU64::new(0)),
            cached_dim: Arc::new(AtomicU64::new(0)),
        }
    }
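As a side note on the 0-disables-the-cache rule, the mapping from configured size to optional capacity can be sketched on its own. `cache_capacity` is a hypothetical helper, not a function in this crate; it relies only on the standard library fact that `NonZeroUsize::new` returns `None` for 0.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not in the crate): maps a configured cache
/// size to an optional capacity. A size of 0 yields None, which
/// corresponds to "cache disabled".
fn cache_capacity(cache_size: usize) -> Option<NonZeroUsize> {
    NonZeroUsize::new(cache_size)
}
```

Under this shape the caller could build the cache with `cache_capacity(size).map(...)` and would not need a separate `if` plus `expect`; whether that reads better than the explicit branch in the diff is a style choice.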
    /// Cache hit/miss/size snapshot. Useful for /admin endpoints +
    /// load-test validation ("did the cache fire as expected?").
    pub fn embed_cache_stats(&self) -> (u64, u64, usize) {
        let hits = self.embed_cache_hits.load(Ordering::Relaxed);
        let misses = self.embed_cache_misses.load(Ordering::Relaxed);
        let len = self
            .embed_cache
            .as_ref()
            .map(|c| c.lock().map(|g| g.len()).unwrap_or(0))
            .unwrap_or(0);
        (hits, misses, len)
    }
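The reason the counters sit behind `Arc<AtomicU64>` is that a cloned client should report into the same totals. A minimal std-only sketch of that behavior follows; `CacheCounters` and its methods are hypothetical stand-ins for the fields on `AiClient`, and `hit_ratio` is an illustrative addition that the diff does not contain.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the hit/miss counter fields on AiClient.
/// Cloning copies the Arc handles, so all clones share one set of counts.
#[derive(Clone, Default)]
struct CacheCounters {
    hits: Arc<AtomicU64>,
    misses: Arc<AtomicU64>,
}

impl CacheCounters {
    fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    fn record_miss(&self) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }

    /// (hits, misses) at the time of the call.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.hits.load(Ordering::Relaxed),
            self.misses.load(Ordering::Relaxed),
        )
    }

    /// Hit ratio in [0, 1]; None before any lookup was recorded.
    fn hit_ratio(&self) -> Option<f64> {
        let (hits, misses) = self.snapshot();
        let total = hits + misses;
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```

Relaxed ordering is enough here because the counters are independent statistics; the two loads in `snapshot` are not taken atomically as a pair, so a snapshot under concurrent load is approximate.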
    /// Same as `new`, but every `generate()` is routed through
    /// `${gateway_url}/v1/chat` (provider=ollama) for observability.
    /// Use this for callers OUTSIDE the gateway. Inside the gateway
@ -118,50 +196,222 @@ impl AiClient {
        c
    }
    /// Reachability + version check. Hits Ollama's `/api/version`,
    /// returns a sidecar-shaped envelope so callers reading
    /// `.status` / `.ollama_url` don't break across the
    /// pre-/post-2026-05-02 cutover.
    pub async fn health(&self) -> Result<serde_json::Value, String> {
        let resp = self.client
-            .get(format!("{}/health", self.base_url))
+            .get(format!("{}/api/version", self.base_url))
            .send()
            .await
-            .map_err(|e| format!("sidecar unreachable: {e}"))?;
+            .map_err(|e| format!("ollama unreachable: {e}"))?;
-        resp.json().await.map_err(|e| format!("invalid response: {e}"))
+        let body: serde_json::Value = resp.json().await
            .map_err(|e| format!("invalid response: {e}"))?;
        Ok(serde_json::json!({
            "status": "ok",
            "ollama_url": &self.base_url,
            "ollama_version": body.get("version"),
        }))
    }
    /// Embed with per-text LRU caching. Mirrors Go-side
    /// CachedProvider behavior: cache key is (model, text);
    /// cache-hit texts skip Ollama; cache-miss texts are collected
    /// into a single `embed_uncached` call; results are interleaved
    /// in the caller's input order.
    ///
    /// Closes ~95% of the load-test perf gap vs Go side (loadgen
    /// 2026-05-01: Rust 128 RPS → with cache ≥ 7000 RPS expected
    /// for warm-cache workloads). Cold-cache behavior unchanged
    /// (every text is a miss → one `embed_uncached` call, identical
    /// to pre-cache).
    pub async fn embed(&self, req: EmbedRequest) -> Result<EmbedResponse, String> {
-        let resp = self.client
-            .post(format!("{}/embed", self.base_url))
-            .json(&req)
-            .send()
-            .await
-            .map_err(|e| format!("embed request failed: {e}"))?;
-        if !resp.status().is_success() {
-            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("embed error ({}): {text}", text.len()));
+        let model_key = req.model.clone().unwrap_or_default();
+        // Fast path: cache disabled → original behavior.
+        let Some(cache) = self.embed_cache.as_ref() else {
+            return self.embed_uncached(&req).await;
+        };
+        if req.texts.is_empty() {
+            return self.embed_uncached(&req).await;
        }
-        resp.json().await.map_err(|e| format!("embed parse error: {e}"))
+
        // First pass: check cache for each text. Track which positions
        // need an upstream (Ollama) fetch.
        let mut embeddings: Vec<Option<Vec<f64>>> = vec![None; req.texts.len()];
        let mut miss_indices: Vec<usize> = Vec::new();
        let mut miss_texts: Vec<String> = Vec::new();
        {
            let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?;
            for (i, text) in req.texts.iter().enumerate() {
                let key = EmbedCacheKey { model: model_key.clone(), text: text.clone() };
                if let Some(vec) = guard.get(&key) {
                    embeddings[i] = Some(vec.clone());
                    self.embed_cache_hits.fetch_add(1, Ordering::Relaxed);
                } else {
                    miss_indices.push(i);
                    miss_texts.push(text.clone());
                    self.embed_cache_misses.fetch_add(1, Ordering::Relaxed);
                }
            }
        }
        // All hit? Return immediately. Use cached_dim to populate
        // the response dimension (no upstream response to read it from).
        if miss_indices.is_empty() {
            let dim = self.cached_dim.load(Ordering::Relaxed) as usize;
            let dim = if dim == 0 { embeddings[0].as_ref().map(|v| v.len()).unwrap_or(0) } else { dim };
            return Ok(EmbedResponse {
                embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(),
                model: req.model.unwrap_or_else(|| "nomic-embed-text".to_string()),
                dimensions: dim,
            });
        }
        // Second pass: fetch the misses in one embed_uncached call.
        let miss_req = EmbedRequest { texts: miss_texts.clone(), model: req.model.clone() };
        let resp = self.embed_uncached(&miss_req).await?;
        if resp.embeddings.len() != miss_texts.len() {
            return Err(format!(
                "embed cache: upstream returned {} embeddings for {} texts",
                resp.embeddings.len(),
                miss_texts.len()
            ));
        }
        // Record cached_dim from any successful response that reports a dimension.
        if resp.dimensions > 0 {
            self.cached_dim.store(resp.dimensions as u64, Ordering::Relaxed);
        }
        // Insert misses into cache + fill response slots.
        {
            let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?;
            for (j, idx) in miss_indices.iter().enumerate() {
                let key = EmbedCacheKey {
                    model: model_key.clone(),
                    text: miss_texts[j].clone(),
                };
                let vec = resp.embeddings[j].clone();
                guard.put(key, vec.clone());
                embeddings[*idx] = Some(vec);
            }
        }
        Ok(EmbedResponse {
            embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(),
            model: resp.model,
            dimensions: resp.dimensions,
        })
    }
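The two-pass flow in `embed()` can be illustrated without any of the crate's dependencies. This is a sketch under stated assumptions, not the crate's code: a plain `HashMap` stands in for the bounded LRU (so nothing is ever evicted), the `fetch` closure stands in for the upstream embed call, and `embed_with_cache` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Sketch of the two-pass lookup. `cache` is keyed on (model, text);
/// `fetch` is a stand-in for the upstream call and must return one
/// vector per requested text, in order.
fn embed_with_cache<F>(
    cache: &mut HashMap<(String, String), Vec<f64>>,
    model: &str,
    texts: &[String],
    mut fetch: F,
) -> Result<Vec<Vec<f64>>, String>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f64>>, String>,
{
    // First pass: fill hits, remember which positions missed.
    let mut slots: Vec<Option<Vec<f64>>> = vec![None; texts.len()];
    let mut miss_indices = Vec::new();
    let mut miss_texts = Vec::new();
    for (i, text) in texts.iter().enumerate() {
        match cache.get(&(model.to_string(), text.clone())) {
            Some(v) => slots[i] = Some(v.clone()),
            None => {
                miss_indices.push(i);
                miss_texts.push(text.clone());
            }
        }
    }

    // Second pass: one fetch for all misses; skipped when everything hit.
    if !miss_texts.is_empty() {
        let fetched = fetch(&miss_texts)?;
        if fetched.len() != miss_texts.len() {
            return Err(format!(
                "got {} embeddings for {} texts",
                fetched.len(),
                miss_texts.len()
            ));
        }
        // Store each miss and drop it into its original position.
        for ((idx, text), vec) in miss_indices.into_iter().zip(miss_texts).zip(fetched) {
            cache.insert((model.to_string(), text), vec.clone());
            slots[idx] = Some(vec);
        }
    }

    // Every slot was filled by a hit or by a fetched miss.
    Ok(slots.into_iter().map(|s| s.expect("filled")).collect())
}
```

Calling it with texts `["a", "b", "c"]` while only `"b"` is cached would invoke `fetch` once with `["a", "c"]` and return the three vectors in input order; a second identical call would not invoke `fetch` at all.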
    /// Direct Ollama call — used internally by embed() for cache-miss
    /// batches and as the transparent fallback when the cache is
    /// disabled. Loops per-text against `${base_url}/api/embed`,
    /// matching the sidecar's pre-2026-05-02 behavior. Ollama 0.4+
    /// supports batch input but per-text keeps compatibility broader
    /// + lets cache-miss-only batches share the loop with cold runs.
    async fn embed_uncached(&self, req: &EmbedRequest) -> Result<EmbedResponse, String> {
        let model = req.model.clone().unwrap_or_else(|| "nomic-embed-text".to_string());
        let mut embeddings: Vec<Vec<f64>> = Vec::with_capacity(req.texts.len());
        for text in &req.texts {
            let resp = self.client
                .post(format!("{}/api/embed", self.base_url))
                .json(&serde_json::json!({
                    "model": &model,
                    "input": text,
                }))
                .send()
                .await
                .map_err(|e| format!("embed request failed: {e}"))?;
            if !resp.status().is_success() {
                let body = resp.text().await.unwrap_or_default();
                return Err(format!("ollama embed error: {body}"));
            }
            // Ollama returns {"embeddings": [[...]], "model": "...", ...}.
            // The outer `embeddings` is always a list; for a scalar input
            // we get a single inner vector.
            let parsed: serde_json::Value = resp.json().await
                .map_err(|e| format!("embed parse error: {e}"))?;
            let arr = parsed.get("embeddings")
                .and_then(|v| v.as_array())
                .ok_or_else(|| format!("ollama embed: missing 'embeddings' field in {parsed}"))?;
            if arr.is_empty() {
                return Err("ollama embed: empty embeddings array".to_string());
            }
            let first = arr[0].as_array()
                .ok_or_else(|| "ollama embed: embeddings[0] not an array".to_string())?;
            let vec: Vec<f64> = first.iter()
                .filter_map(|n| n.as_f64())
                .collect();
            if vec.is_empty() {
                return Err("ollama embed: numeric coercion produced empty vector".to_string());
            }
            embeddings.push(vec);
        }
        let dimensions = embeddings.first().map(|v| v.len()).unwrap_or(0);
        Ok(EmbedResponse {
            embeddings,
            model,
            dimensions,
        })
    }
    pub async fn generate(&self, req: GenerateRequest) -> Result<GenerateResponse, String> {
        if let Some(gw) = self.gateway_url.as_deref() {
            return self.generate_via_gateway(gw, req).await;
        }
-        // Direct-sidecar legacy path. Used by gateway internals (so
+        // Direct Ollama path. Used by gateway internals (so the ollama
-        // ollama_arm can call sidecar without a self-loop) and by
+        // provider can call upstream without a self-loop through
-        // any consumer that wants raw transport without /v1/usage
+        // /v1/chat) and by any consumer that wants raw transport
-        // accounting.
+        // without /v1/usage accounting.
        let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string());
        let mut body = serde_json::json!({
            "model": &model,
            "prompt": &req.prompt,
            "stream": false,
        });
        let mut options = serde_json::Map::new();
        if let Some(t) = req.temperature {
            options.insert("temperature".to_string(), serde_json::json!(t));
        }
        if let Some(mt) = req.max_tokens {
            options.insert("num_predict".to_string(), serde_json::json!(mt));
        }
        if !options.is_empty() {
            body["options"] = serde_json::Value::Object(options);
        }
        if let Some(sys) = &req.system {
            body["system"] = serde_json::json!(sys);
        }
        if let Some(th) = req.think {
            body["think"] = serde_json::json!(th);
        }
        let resp = self.client
-            .post(format!("{}/generate", self.base_url))
+            .post(format!("{}/api/generate", self.base_url))
-            .json(&req)
+            .json(&body)
            .send()
            .await
            .map_err(|e| format!("generate request failed: {e}"))?;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("generate error: {text}"));
+            return Err(format!("ollama generate error: {text}"));
        }
-        resp.json().await.map_err(|e| format!("generate parse error: {e}"))
+        let parsed: serde_json::Value = resp.json().await
            .map_err(|e| format!("generate parse error: {e}"))?;
        Ok(GenerateResponse {
            text: parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").to_string(),
            model,
            tokens_evaluated: parsed.get("prompt_eval_count").and_then(|v| v.as_u64()),
            tokens_generated: parsed.get("eval_count").and_then(|v| v.as_u64()),
        })
    }
    /// Phase 44 part 2: route generate() through the gateway's
@ -217,19 +467,60 @@ impl AiClient {
        })
    }
    /// LLM-scored (pointwise) reranking via Ollama generate. Asks the
    /// model to rate each document's relevance to the query 0-10, then
    /// sorts descending. Mirrors the sidecar's pre-2026-05-02 algorithm
    /// exactly so callers see the same scores.
    pub async fn rerank(&self, req: RerankRequest) -> Result<RerankResponse, String> {
-        let resp = self.client
-            .post(format!("{}/rerank", self.base_url))
-            .json(&req)
-            .send()
-            .await
-            .map_err(|e| format!("rerank request failed: {e}"))?;
-        if !resp.status().is_success() {
-            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("rerank error: {text}"));
+        let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string());
+        let mut scored: Vec<ScoredDocument> = Vec::with_capacity(req.documents.len());
+        for (i, doc) in req.documents.iter().enumerate() {
+            let prompt = format!(
+                "Rate the relevance of the following document to the query on a scale of 0 to 10. \
                 Respond with ONLY a number.\n\n\
                 Query: {}\n\n\
                 Document: {}\n\n\
                 Score:",
                req.query, doc,
            );
            let resp = self.client
                .post(format!("{}/api/generate", self.base_url))
                .json(&serde_json::json!({
                    "model": &model,
                    "prompt": prompt,
                    "stream": false,
                    "options": {"temperature": 0.0, "num_predict": 8},
                }))
                .send()
                .await
                .map_err(|e| format!("rerank request failed: {e}"))?;
            if !resp.status().is_success() {
                let body = resp.text().await.unwrap_or_default();
                return Err(format!("ollama rerank error: {body}"));
            }
            let parsed: serde_json::Value = resp.json().await
                .map_err(|e| format!("rerank parse error: {e}"))?;
            let text = parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").trim();
            // Parse the leading number; tolerate "7", "7.5", "7 — strong match".
            let score = text.split_whitespace().next()
                .and_then(|t| t.parse::<f64>().ok())
                .unwrap_or(0.0)
                .clamp(0.0, 10.0);
            scored.push(ScoredDocument {
                index: i,
                text: doc.clone(),
                score,
            });
        }
-        resp.json().await.map_err(|e| format!("rerank parse error: {e}"))
+
        scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        if let Some(k) = req.top_k {
            scored.truncate(k);
        }
        Ok(RerankResponse { results: scored, model })
    }
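The score handling in `rerank()` (leading-token parse, clamp, descending sort, optional truncate) can be exercised in isolation. The sketch below reuses the same standard-library calls as the diff; `parse_score` and `rank` are hypothetical names, and model replies are passed in as plain strings instead of coming from Ollama.

```rust
/// Parses the first whitespace-delimited token as a score and clamps
/// it to 0..=10. A token that does not parse falls back to 0.0.
fn parse_score(model_reply: &str) -> f64 {
    model_reply
        .split_whitespace()
        .next()
        .and_then(|t| t.parse::<f64>().ok())
        .unwrap_or(0.0)
        .clamp(0.0, 10.0)
}

/// Scores each reply, sorts descending by score, optionally keeps the
/// top_k entries. Returns (original index, score) pairs.
fn rank(replies: &[&str], top_k: Option<usize>) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = replies
        .iter()
        .enumerate()
        .map(|(i, reply)| (i, parse_score(reply)))
        .collect();
    // sort_by is stable, so equal scores keep their input order.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    if let Some(k) = top_k {
        scored.truncate(k);
    }
    scored
}
```

Two consequences of this parsing rule are worth keeping in mind. A reply such as `7/10` or `Score: 7` has a first token that is not a number, so it scores 0.0 and is indistinguishable from a genuinely irrelevant document. Rust's `f64` parser also accepts tokens such as `inf` and `NaN`; `inf` clamps to 10.0, while `NaN` passes through `clamp` unchanged.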
    /// Force Ollama to unload the named model from VRAM (keep_alive=0).
@ -238,40 +529,116 @@ impl AiClient {
    /// profile's model can linger in VRAM next to the new one.
    pub async fn unload_model(&self, model: &str) -> Result<serde_json::Value, String> {
        let resp = self.client
-            .post(format!("{}/admin/unload", self.base_url))
+            .post(format!("{}/api/generate", self.base_url))
-            .json(&serde_json::json!({ "model": model }))
+            .json(&serde_json::json!({
                "model": model,
                "prompt": "",
                "keep_alive": 0,
                "stream": false,
            }))
            .send().await
            .map_err(|e| format!("unload request failed: {e}"))?;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("unload error: {text}"));
+            return Err(format!("ollama unload error: {text}"));
        }
-        resp.json().await.map_err(|e| format!("unload parse error: {e}"))
+        // Ollama returns 200 with the empty-prompt response shape.
        // Fold into the legacy {"unloaded": "<model>"} envelope so
        // callers' parsing doesn't break.
        Ok(serde_json::json!({ "unloaded": model }))
    }
    /// Ask Ollama to load the named model into VRAM proactively. Makes
    /// the first real request after profile activation fast (no cold-load
-    /// latency).
+    /// latency). Empty prompts confuse some models, so we send a single
    /// space + cap num_predict=1 (matches the sidecar's prior behavior).
    pub async fn preload_model(&self, model: &str) -> Result<serde_json::Value, String> {
        let resp = self.client
-            .post(format!("{}/admin/preload", self.base_url))
+            .post(format!("{}/api/generate", self.base_url))
-            .json(&serde_json::json!({ "model": model }))
+            .json(&serde_json::json!({
                "model": model,
                "prompt": " ",
                "keep_alive": "5m",
                "stream": false,
                "options": {"num_predict": 1},
            }))
            .send().await
            .map_err(|e| format!("preload request failed: {e}"))?;
        if !resp.status().is_success() {
            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("preload error: {text}"));
+            return Err(format!("ollama preload error: {text}"));
        }
-        resp.json().await.map_err(|e| format!("preload parse error: {e}"))
+        let parsed: serde_json::Value = resp.json().await
            .map_err(|e| format!("preload parse error: {e}"))?;
        Ok(serde_json::json!({
            "preloaded": model,
            "load_duration_ns": parsed.get("load_duration"),
            "total_duration_ns": parsed.get("total_duration"),
        }))
    }
-    /// GPU + loaded-model snapshot from the sidecar. Combines nvidia-smi
+    /// GPU + loaded-model snapshot. Combines nvidia-smi output (when
-    /// output (if available) with Ollama's /api/ps.
+    /// available) with Ollama's /api/ps. Same shape as the prior
    /// sidecar /admin/vram endpoint so callers don't need updating.
    pub async fn vram_snapshot(&self) -> Result<serde_json::Value, String> {
        let resp = self.client
-            .get(format!("{}/admin/vram", self.base_url))
+            .get(format!("{}/api/ps", self.base_url))
            .send().await
-            .map_err(|e| format!("vram request failed: {e}"))?;
+            .map_err(|e| format!("ollama ps request failed: {e}"))?;
-        resp.json().await.map_err(|e| format!("vram parse error: {e}"))
+        let loaded: Vec<serde_json::Value> = if resp.status().is_success() {
            let parsed: serde_json::Value = resp.json().await.unwrap_or(serde_json::Value::Null);
            parsed.get("models")
                .and_then(|v| v.as_array())
                .map(|arr| arr.iter().map(|m| serde_json::json!({
                    "name": m.get("name"),
                    "size_vram_mib": m.get("size_vram").and_then(|v| v.as_u64()).map(|n| n / (1024 * 1024)),
                    "expires_at": m.get("expires_at"),
                })).collect())
                .unwrap_or_default()
        } else {
            Vec::new()
        };
        let gpu = nvidia_smi_snapshot();
        Ok(serde_json::json!({
            "gpu": gpu,
            "ollama_loaded": loaded,
        }))
    }
 }
 /// One-shot nvidia-smi poll. Returns Null if the tool isn't on PATH
 /// or the call fails. Mirrors the sidecar's `_nvidia_smi_snapshot`
 /// shape exactly so callers reading vram_snapshot don't break.
 fn nvidia_smi_snapshot() -> serde_json::Value {
    use std::process::Command;
    let out = Command::new("nvidia-smi")
        .args([
            "--query-gpu=memory.used,memory.total,utilization.gpu,name",
            "--format=csv,noheader,nounits",
        ])
        .output();
    let stdout = match out {
        Ok(o) if o.status.success() => o.stdout,
        _ => return serde_json::Value::Null,
    };
    let line = String::from_utf8_lossy(&stdout);
    let line = line.trim();
    if line.is_empty() {
        return serde_json::Value::Null;
    }
    let parts: Vec<&str> = line.split(',').map(|s| s.trim()).collect();
    if parts.len() < 4 {
        return serde_json::Value::Null;
    }
    let used = parts[0].parse::<u64>().unwrap_or(0);
    let total = parts[1].parse::<u64>().unwrap_or(0);
    let util = parts[2].parse::<u64>().unwrap_or(0);
    serde_json::json!({
        "name": parts[3],
        "used_mib": used,
        "total_mib": total,
        "utilization_pct": util,
    })
 }
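The CSV parsing in `nvidia_smi_snapshot` can be separated from the process call so that it is testable without a GPU. The sketch below is an assumption-laden variant, not the code in the diff: `GpuRow` and `parse_first_gpu_row` are hypothetical names, it returns a typed `Option` instead of a JSON value, and it reads only the first non-empty line. `nvidia-smi` prints one row per GPU, so on a multi-GPU host splitting the whole output on commas would mix fields from different rows.

```rust
/// Hypothetical typed view of one `nvidia-smi` CSV row.
#[derive(Debug, PartialEq)]
struct GpuRow {
    name: String,
    used_mib: u64,
    total_mib: u64,
    utilization_pct: u64,
}

/// Parses the first non-empty row of output produced with
/// `--query-gpu=memory.used,memory.total,utilization.gpu,name`
/// and `--format=csv,noheader,nounits`.
/// Returns None for empty input or a row with fewer than four fields.
fn parse_first_gpu_row(stdout: &str) -> Option<GpuRow> {
    let line = stdout.lines().map(str::trim).find(|l| !l.is_empty())?;
    // splitn(4, ..) keeps any comma inside the trailing name field intact.
    let mut parts = line.splitn(4, ',').map(str::trim);
    let used = parts.next()?;
    let total = parts.next()?;
    let util = parts.next()?;
    let name = parts.next()?;
    Some(GpuRow {
        name: name.to_string(),
        // Unparseable numbers fall back to 0, as in the diff.
        used_mib: used.parse().unwrap_or(0),
        total_mib: total.parse().unwrap_or(0),
        utilization_pct: util.parse().unwrap_or(0),
    })
}
```

A caller would pass `String::from_utf8_lossy(&output.stdout)` from `std::process::Command` and map `None` to `serde_json::Value::Null` to keep the envelope described above.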
--- a/crates/gateway/src/bin/parity_extract_json.rs
+++ b/crates/gateway/src/bin/parity_extract_json.rs
@ -0,0 +1,37 @@
 //! Cross-runtime parity helper for `extract_json`.
 //!
 //! Reads a single model-output string from stdin, runs the Rust
 //! extract_json, prints `{"matched": bool, "value": <object|null>}`
 //! to stdout as JSON. Exit 0 on success, exit 1 on internal error.
 //!
 //! The Go counterpart lives at
 //! `golangLAKEHOUSE/internal/validator/iterate.go::ExtractJSON`. The
 //! parity probe at
 //! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`
 //! feeds the same fixtures through both and diffs the outputs.
 //!
 //! Usage:
 //!   echo '<raw model output>' | parity_extract_json
 //!   parity_extract_json <<< '...'
 use std::io::Read;
 fn main() {
    let mut buf = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
        eprintln!("read stdin: {e}");
        std::process::exit(1);
    }
    let result = gateway::v1::iterate::extract_json(&buf);
    let body = serde_json::json!({
        "matched": result.is_some(),
        "value": result.unwrap_or(serde_json::Value::Null),
    });
    match serde_json::to_string(&body) {
        Ok(s) => println!("{s}"),
        Err(e) => {
            eprintln!("serialize result: {e}");
            std::process::exit(1);
        }
    }
 }
--- a/crates/gateway/src/bin/parity_session_log.rs
+++ b/crates/gateway/src/bin/parity_session_log.rs
@ -0,0 +1,71 @@
 //! Cross-runtime parity helper for `SessionRecord` JSON shape.
 //!
 //! Reads a fixture JSON on stdin, builds a `SessionRecord`, emits
 //! one JSONL row on stdout. Used by
 //! `golangLAKEHOUSE/scripts/cutover/parity/session_log_parity.sh`
 //! to verify the Rust gateway's session log shape stays byte-equal
 //! to the Go-side validatord's `validator.SessionRecord` (commit
 //! 1a3a82a in golangLAKEHOUSE).
 use gateway::v1::session_log::{SessionAttemptRecord, SessionRecord, SESSION_RECORD_SCHEMA};
 use serde::Deserialize;
 use std::io::Read;
 #[derive(Deserialize)]
 struct FixtureInput {
    session_id: String,
    kind: String,
    model: String,
    provider: String,
    prompt: String,
    iterations: u32,
    max_iterations: u32,
    final_verdict: String,
    attempts: Vec<SessionAttemptRecord>,
    #[serde(default)]
    artifact: Option<serde_json::Value>,
    #[serde(default)]
    grounded_in_roster: Option<bool>,
    duration_ms: u64,
 }
 fn main() {
    let mut buf = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
        eprintln!("read stdin: {e}");
        std::process::exit(1);
    }
    let input: FixtureInput = match serde_json::from_str(&buf) {
        Ok(v) => v,
        Err(e) => {
            eprintln!("parse stdin: {e}");
            std::process::exit(1);
        }
    };
    let rec = SessionRecord {
        schema: SESSION_RECORD_SCHEMA.to_string(),
        session_id: input.session_id,
        // Pinned timestamp so both runtimes' rows compare byte-equal
        // when the test wrapper normalizes on `daemon` only.
        timestamp: "2026-01-01T00:00:00+00:00".to_string(),
        daemon: "gateway".to_string(),
        kind: input.kind,
        model: input.model,
        provider: input.provider,
        prompt: input.prompt,
        iterations: input.iterations,
        max_iterations: input.max_iterations,
        final_verdict: input.final_verdict,
        attempts: input.attempts,
        artifact: input.artifact,
        grounded_in_roster: input.grounded_in_roster,
        duration_ms: input.duration_ms,
    };
    match serde_json::to_string(&rec) {
        Ok(s) => println!("{s}"),
        Err(e) => {
            eprintln!("marshal: {e}");
            std::process::exit(1);
        }
    }
 }
--- a/crates/gateway/src/execution_loop/mod.rs
+++ b/crates/gateway/src/execution_loop/mod.rs
@ -438,6 +438,10 @@ impl ExecutionLoop {
                start_time: start_time.to_rfc3339(),
                end_time: end_time.to_rfc3339(),
                latency_ms: elapsed_ms,
                // Internal execution-loop traffic is its own top-level
                // trace per call. If a future caller threads a parent
                // trace into self.state, lift this to Some(parent_id).
                parent_trace_id: None,
            });
        }
@ -582,10 +586,10 @@ impl ExecutionLoop {
    /// Phase 20 step (8) — T3 overseer escalation.
    ///
    /// When the local executor/reviewer loop can't self-correct, call
-    /// the cloud overseer (`gpt-oss:120b` via Ollama Cloud) with (a)
+    /// the cloud overseer (`claude-opus-4-7` via OpenCode Zen) with
-    /// the KB context — recent outcomes + prior corrections for this
+    /// (a) the KB context — recent outcomes + prior corrections for
-    /// sig_hash + task_class, across every profile that has run it —
+    /// this sig_hash + task_class, across every profile that has run
-    /// and (b) the recent log tail. Its output is appended as a
+    /// it — and (b) the recent log tail. Its output is appended as a
    /// `system` role turn so the next executor generation sees it,
    /// AND written to `data/_kb/overseer_corrections.jsonl` so every
    /// future profile activation reads from the same learning pool.
@ -593,9 +597,16 @@ impl ExecutionLoop {
    /// This is the "pipe to the overviewer" piece from 2026-04-23 —
    /// the overseer is now a first-class KB consumer AND producer, not
    /// a one-shot correction oracle.
    ///
    /// 2026-04-28: routed through OpenCode (Zen tier) for Claude Opus
    /// 4.7. Frontier reasoning matters here because the overseer fires
    /// only after local self-correction has failed twice — by that
    /// point we need the strongest reasoning available, not the
    /// cheapest token. Frequency is low so the Zen pay-per-token cost
    /// stays bounded.
    async fn escalate_to_overseer(&mut self, turn: u32, reason: &str) -> Result<(), String> {
-        let Some(cloud_key) = self.state.ollama_cloud_key.clone() else {
+        let Some(opencode_key) = self.state.opencode_key.clone() else {
-            return Err("OLLAMA_CLOUD_KEY not configured — skipping escalation".into());
+            return Err("OPENCODE_API_KEY not configured — skipping escalation".into());
        };
        let kb = KbContext::load_for(&sig_hash(&self.req), &self.req.task_class).await;
@ -604,16 +615,18 @@ impl ExecutionLoop {
        let started = std::time::Instant::now();
        let start_time = chrono::Utc::now();
        let chat_req = crate::v1::ChatRequest {
-            model: "gpt-oss:120b".to_string(),
+            model: "claude-opus-4-7".to_string(),
            messages: vec![crate::v1::Message::new_text("user", prompt.clone())],
            temperature: Some(0.1),
            max_tokens: None,
            stream: Some(false),
-            think: Some(true),    // overseer KEEPS thinking (Phase 20 rule)
+            // Anthropic models on opencode reject `think` (handled in
-            provider: Some("ollama_cloud".into()),
+            // the adapter), but we keep the intent flag for parity.
            think: Some(true),
            provider: Some("opencode".into()),
        };
-        let resp = crate::v1::ollama_cloud::chat(&cloud_key, &chat_req).await
+        let resp = crate::v1::opencode::chat(&opencode_key, &chat_req).await
-            .map_err(|e| format!("ollama_cloud: {e}"))?;
+            .map_err(|e| format!("opencode: {e}"))?;
        let latency_ms = started.elapsed().as_millis() as u64;
        let end_time = chrono::Utc::now();
        let correction_text: String = resp.choices.into_iter().next()
@ -633,8 +646,8 @@ impl ExecutionLoop {
        if let Some(lf) = &self.state.langfuse {
            use crate::v1::langfuse_trace::ChatTrace;
            lf.emit_chat(ChatTrace {
-                provider: "ollama_cloud".into(),
+                provider: "opencode".into(),
-                model: "gpt-oss:120b".into(),
+                model: "claude-opus-4-7".into(),
                input: vec![crate::v1::Message::new_text("user", prompt.clone())],
                output: correction_text.clone(),
                prompt_tokens: resp.usage.prompt_tokens,
@ -645,12 +658,13 @@ impl ExecutionLoop {
                start_time: start_time.to_rfc3339(),
                end_time: end_time.to_rfc3339(),
                latency_ms,
                parent_trace_id: None,
            });
        }
        // Append to the transcript so the next executor turn sees it.
        self.append(LogEntry::new(
-            turn, "system", "gpt-oss:120b", "overseer_correction",
+            turn, "system", "claude-opus-4-7", "overseer_correction",
            serde_json::json!({
                "reason": reason,
                "correction": correction_text,
@ -672,7 +686,7 @@ impl ExecutionLoop {
            "task_class": self.req.task_class,
            "operation": self.req.operation,
            "reason": reason,
-            "model": "gpt-oss:120b",
+            "model": "claude-opus-4-7",
            "correction": correction_text,
            "applied_at_turn": turn,
            "kb_context_used": kb,
--- a/crates/gateway/src/lib.rs
+++ b/crates/gateway/src/lib.rs
@ -0,0 +1,19 @@
 //! Library facade for the gateway crate so sub-binaries (e.g.
 //! `parity_extract_json`) can reuse the same modules the gateway
 //! binary uses.
 //!
 //! Added 2026-05-02 to support the cross-runtime parity probe at
 //! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`.
 //! `extract_json` is the load-bearing public surface for that probe.
 //!
 //! main.rs still uses local `mod foo;` declarations independently —
 //! adding this file is purely additive (the binary's module tree is
 //! unchanged).
 pub mod access;
 pub mod access_service;
 pub mod auth;
 pub mod execution_loop;
 pub mod observability;
 pub mod tools;
 pub mod v1;
--- a/crates/gateway/src/main.rs
+++ b/crates/gateway/src/main.rs
@ -362,6 +362,22 @@ async fn main() {
                }
                c
            },
            // Coordinator session JSONL — one row per /v1/iterate
            // session for offline DuckDB analysis. Cross-runtime
            // parity with Go-side validatord (commit 1a3a82a).
            session_log: {
                let path = &config.gateway.session_log_path;
                let s = v1::session_log::SessionLogger::from_path(path);
                if s.is_some() {
                    tracing::info!(
                        "v1: session log enabled — coordinator sessions written to {}",
                        path
                    );
                } else {
                    tracing::info!("v1: session log disabled (set [gateway].session_log_path to enable)");
                }
                s
            },
        }));
    // Auth middleware (if enabled) — P5-001 fix 2026-04-23:
--- a/crates/gateway/src/v1/iterate.rs
+++ b/crates/gateway/src/v1/iterate.rs
@ -21,12 +21,19 @@
 //! re-implementation. Staffing executors, agent loops, and future
 //! validators all reach the same code path.
-use axum::{extract::State, http::StatusCode, response::IntoResponse, Json};
+use axum::{extract::State, http::{HeaderMap, StatusCode}, response::IntoResponse, Json};
 use serde::{Deserialize, Serialize};
 const DEFAULT_MAX_ITERATIONS: u32 = 3;
 const LOOPBACK_TIMEOUT_SECS: u64 = 240;
 /// Header name used to propagate a Langfuse parent trace id across
 /// daemon boundaries. Matches Go's `shared.TraceIDHeader` constant
 /// byte-for-byte (commit d6d2fdf in golangLAKEHOUSE) — same wire
 /// format means a Go caller can hit Rust's /v1/iterate (or vice
 /// versa) and the resulting Langfuse trees nest correctly.
 pub const TRACE_ID_HEADER: &str = "x-lakehouse-trace-id";
 #[derive(Deserialize)]
 pub struct IterateRequest {
    /// "fill" | "email" | "playbook" — picks which validator runs.
@ -80,6 +87,14 @@ pub struct IterateResponse {
    pub validation: serde_json::Value,
    pub iterations: u32,
    pub history: Vec<IterateAttempt>,
    /// Echoes the resolved trace id (caller-forwarded header, body
    /// field, langfuse-middleware mint, or local fallback). Operators
    /// pivot from this id straight into Langfuse + the
    /// coordinator_sessions.jsonl join key. Cross-runtime parity with
    /// Go's `validator.IterateResponse` (commit 6847bbc in
    /// golangLAKEHOUSE).
    #[serde(skip_serializing_if = "Option::is_none")]
    pub trace_id: Option<String>,
 }
 #[derive(Serialize)]
@ -87,29 +102,52 @@ pub struct IterateFailure {
    pub error: String,
    pub iterations: u32,
    pub history: Vec<IterateAttempt>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub trace_id: Option<String>,
 }
 pub async fn iterate(
    State(state): State<super::V1State>,
    headers: HeaderMap,
    Json(req): Json<IterateRequest>,
 ) -> impl IntoResponse {
    let max_iter = req.max_iterations.unwrap_or(DEFAULT_MAX_ITERATIONS).max(1);
    let temperature = req.temperature.unwrap_or(0.2);
    let max_tokens = req.max_tokens.unwrap_or(4096);
    let mut history: Vec<IterateAttempt> = Vec::with_capacity(max_iter as usize);
    let mut attempt_records: Vec<super::session_log::SessionAttemptRecord> = Vec::with_capacity(max_iter as usize);
    let mut current_prompt = req.prompt.clone();
    // Resolve the parent Langfuse trace id. Caller-forwarded header
    // wins (cross-daemon tree linkage); otherwise mint a fresh id so
    // the iterate session is its own tree. Same shape as the Go-side
    // validatord trace propagation.
    let trace_id: String = headers
        .get(TRACE_ID_HEADER)
        .and_then(|v| v.to_str().ok())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .unwrap_or_else(new_trace_id);
    let client = match reqwest::Client::builder()
        .timeout(std::time::Duration::from_secs(LOOPBACK_TIMEOUT_SECS))
        .build() {
        Ok(c) => c,
-        Err(e) => return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response(),
+        Err(e) => {
            // Even infrastructure failures get a session row so a
            // missing /v1/iterate event never silently disappears
            // from the longitudinal log.
            write_infra_error(&state, &req, &trace_id, max_iter, 0, format!("client build: {e}")).await;
            return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response();
        }
    };
    // Self-loopback to the gateway port. Carries gateway internal
    // calls through /v1/chat + /v1/validate so /v1/usage tracks them.
    let gateway = "http://127.0.0.1:3100";
    let t0 = std::time::Instant::now();
    for iteration in 0..max_iter {
        let attempt_started = chrono::Utc::now();
        // ── Generate ──
        let mut messages = Vec::with_capacity(2);
        if let Some(sys) = &req.system {
@ -123,20 +161,33 @@ pub async fn iterate(
            "temperature": temperature,
            "max_tokens": max_tokens,
        });
-        let raw = match call_chat(&client, gateway, &chat_body).await {
+        let raw = match call_chat(&client, gateway, &chat_body, &trace_id).await {
            Ok(r) => r,
-            Err(e) => return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response(),
+            Err(e) => {
                write_infra_error(&state, &req, &trace_id, max_iter, t0.elapsed().as_millis() as u64, format!("/v1/chat hop failed at iter {iteration}: {e}")).await;
                return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response();
            }
        };
        // ── Extract JSON ──
        let artifact = match extract_json(&raw) {
            Some(a) => a,
            None => {
                let span_id = emit_attempt_span(
                    &state, &trace_id, iteration, &req, &current_prompt, &raw, "no_json", None,
                    attempt_started, chrono::Utc::now(),
                );
                history.push(IterateAttempt {
                    iteration,
                    raw: raw.chars().take(2000).collect(),
                    status: AttemptStatus::NoJson,
                });
                attempt_records.push(super::session_log::SessionAttemptRecord {
                    iteration,
                    verdict_kind: "no_json".to_string(),
                    error: None,
                    span_id,
                });
                current_prompt = format!(
                    "{}\n\nYour previous attempt did not contain a JSON object. Reply with ONLY a valid JSON object matching the requested artifact shape.",
                    req.prompt,
@ -151,22 +202,41 @@ pub async fn iterate(
            "artifact": artifact,
            "context": req.context.clone().unwrap_or(serde_json::Value::Null),
        });
-        match call_validate(&client, gateway, &validate_body).await {
+        match call_validate(&client, gateway, &validate_body, &trace_id).await {
            Ok(report) => {
                let span_id = emit_attempt_span(
                    &state, &trace_id, iteration, &req, &current_prompt, &raw, "accepted", None,
                    attempt_started, chrono::Utc::now(),
                );
                history.push(IterateAttempt {
                    iteration,
                    raw: raw.chars().take(2000).collect(),
                    status: AttemptStatus::Accepted,
                });
                attempt_records.push(super::session_log::SessionAttemptRecord {
                    iteration,
                    verdict_kind: "accepted".to_string(),
                    error: None,
                    span_id,
                });
                let duration_ms = t0.elapsed().as_millis() as u64;
                let grounded = grounded_in_roster(&state, &req.kind, &artifact);
                write_session_accepted(&state, &req, &trace_id, iteration + 1, max_iter, attempt_records, &artifact, grounded, duration_ms).await;
                return (StatusCode::OK, Json(IterateResponse {
                    artifact,
                    validation: report,
                    iterations: iteration + 1,
                    history,
                    trace_id: Some(trace_id.clone()),
                })).into_response();
            }
            Err(err) => {
                let err_summary = err.to_string();
                let span_id = emit_attempt_span(
                    &state, &trace_id, iteration, &req, &current_prompt, &raw, "validation_failed",
                    Some(err_summary.clone()),
                    attempt_started, chrono::Utc::now(),
                );
                history.push(IterateAttempt {
                    iteration,
                    raw: raw.chars().take(2000).collect(),
@ -174,6 +244,12 @@ pub async fn iterate(
                        error: serde_json::to_value(&err_summary).unwrap_or(serde_json::Value::Null),
                    },
                });
                attempt_records.push(super::session_log::SessionAttemptRecord {
                    iteration,
                    verdict_kind: "validation_failed".to_string(),
                    error: Some(err_summary.clone()),
                    span_id,
                });
                // Append validation feedback to prompt for next iter.
                // The model sees concrete failure mode + retries with
                // corrective context. This is the "observer correction"
@ -188,19 +264,167 @@ pub async fn iterate(
        }
    }
    let duration_ms = t0.elapsed().as_millis() as u64;
    write_session_failure(&state, &req, &trace_id, max_iter, max_iter, attempt_records, duration_ms).await;
    (StatusCode::UNPROCESSABLE_ENTITY, Json(IterateFailure {
        error: format!("max iterations reached ({max_iter}) without passing validation"),
        iterations: max_iter,
        history,
        trace_id: Some(trace_id.clone()),
    })).into_response()
 }
-async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<String, String> {
-    let resp = client.post(format!("{gateway}/v1/chat"))
-        .json(body)
-        .send()
-        .await
-        .map_err(|e| format!("chat hop: {e}"))?;
+// ─── Helpers — Langfuse spans + session log + roster check ─────────
+
+fn emit_attempt_span(
+    state: &super::V1State,
+    trace_id: &str,
+    iteration: u32,
    req: &IterateRequest,
    prompt: &str,
    raw: &str,
    verdict: &str,
    error: Option<String>,
    started: chrono::DateTime<chrono::Utc>,
    ended: chrono::DateTime<chrono::Utc>,
 ) -> Option<String> {
    let lf = state.langfuse.as_ref()?;
    Some(lf.emit_attempt_span(super::langfuse_trace::AttemptSpan {
        trace_id: trace_id.to_string(),
        iteration,
        model: req.model.clone(),
        provider: req.provider.clone(),
        prompt: prompt.to_string(),
        raw: raw.to_string(),
        verdict: verdict.to_string(),
        error,
        start_time: started.to_rfc3339(),
        end_time: ended.to_rfc3339(),
    }))
 }
 /// Verify every fill artifact's candidate IDs exist in the roster.
 /// Returns Some(true)/Some(false) on the fill kind, None otherwise
 /// (other kinds don't have worker IDs to ground). Same semantics as
 /// Go's `handlers.rosterCheckFor("fill")`.
 fn grounded_in_roster(
    state: &super::V1State,
    kind: &str,
    artifact: &serde_json::Value,
 ) -> Option<bool> {
    if kind != "fill" {
        return None;
    }
    let fills = artifact.get("fills").and_then(|v| v.as_array())?;
    for f in fills {
        let id = match f.get("candidate_id").and_then(|v| v.as_str()) {
            Some(s) if !s.is_empty() => s,
            _ => return Some(false),
        };
        if state.validate_workers.find(id).is_none() {
            return Some(false);
        }
    }
    Some(true)
 }
 async fn write_session_accepted(
    state: &super::V1State,
    req: &IterateRequest,
    trace_id: &str,
    iterations: u32,
    max_iter: u32,
    attempts: Vec<super::session_log::SessionAttemptRecord>,
    artifact: &serde_json::Value,
    grounded: Option<bool>,
    duration_ms: u64,
 ) {
    let Some(logger) = state.session_log.as_ref() else { return };
    let rec = build_session_record(req, trace_id, "accepted", iterations, max_iter, attempts, Some(artifact.clone()), grounded, duration_ms);
    logger.append(rec).await;
 }
 async fn write_session_failure(
    state: &super::V1State,
    req: &IterateRequest,
    trace_id: &str,
    iterations: u32,
    max_iter: u32,
    attempts: Vec<super::session_log::SessionAttemptRecord>,
    duration_ms: u64,
 ) {
    let Some(logger) = state.session_log.as_ref() else { return };
    let rec = build_session_record(req, trace_id, "max_iter_exhausted", iterations, max_iter, attempts, None, None, duration_ms);
    logger.append(rec).await;
 }
 async fn write_infra_error(
    state: &super::V1State,
    req: &IterateRequest,
    trace_id: &str,
    max_iter: u32,
    duration_ms: u64,
    error: String,
 ) {
    let Some(logger) = state.session_log.as_ref() else { return };
    let attempts = vec![super::session_log::SessionAttemptRecord {
        iteration: 0,
        verdict_kind: "infra_error".to_string(),
        error: Some(error),
        span_id: None,
    }];
    let rec = build_session_record(req, trace_id, "infra_error", 0, max_iter, attempts, None, None, duration_ms);
    logger.append(rec).await;
 }
 fn build_session_record(
    req: &IterateRequest,
    trace_id: &str,
    final_verdict: &str,
    iterations: u32,
    max_iter: u32,
    attempts: Vec<super::session_log::SessionAttemptRecord>,
    artifact: Option<serde_json::Value>,
    grounded: Option<bool>,
    duration_ms: u64,
 ) -> super::session_log::SessionRecord {
    super::session_log::SessionRecord {
        schema: super::session_log::SESSION_RECORD_SCHEMA.to_string(),
        session_id: trace_id.to_string(),
        timestamp: chrono::Utc::now().to_rfc3339(),
        daemon: "gateway".to_string(),
        kind: req.kind.clone(),
        model: req.model.clone(),
        provider: req.provider.clone(),
        prompt: super::session_log::truncate(&req.prompt, 4000),
        iterations,
        max_iterations: max_iter,
        final_verdict: final_verdict.to_string(),
        attempts,
        artifact,
        grounded_in_roster: grounded,
        duration_ms,
    }
 }
 /// Generate a fresh trace id when no parent was forwarded. Same
 /// time-ordered hex shape Langfuse already accepts elsewhere in this
 /// crate (see `langfuse_trace::uuid_v7_like`).
 ///
 /// Note: both halves are read from the wall clock, so the suffix adds
 /// little entropy. Uniqueness rests on the nanosecond timestamp.
 fn new_trace_id() -> String {
    let ts = chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0);
    // Sub-second nanos from a second clock read; not a random source.
    let subsec = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    format!("{:016x}-{:08x}", ts, subsec)
 }
 async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result<String, String> {
    let mut req = client.post(format!("{gateway}/v1/chat")).json(body);
    if !trace_id.is_empty() {
        req = req.header(TRACE_ID_HEADER, trace_id);
    }
    let resp = req.send().await.map_err(|e| format!("chat hop: {e}"))?;
    let status = resp.status();
    if !status.is_success() {
        let body = resp.text().await.unwrap_or_default();
@ -213,12 +437,12 @@ async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::V
        .to_string())
 }
-async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<serde_json::Value, String> {
-    let resp = client.post(format!("{gateway}/v1/validate"))
-        .json(body)
-        .send()
-        .await
-        .map_err(|e| format!("validate hop: {e}"))?;
+async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result<serde_json::Value, String> {
+    let mut req = client.post(format!("{gateway}/v1/validate")).json(body);
+    if !trace_id.is_empty() {
+        req = req.header(TRACE_ID_HEADER, trace_id);
+    }
+    let resp = req.send().await.map_err(|e| format!("validate hop: {e}"))?;
    let status = resp.status();
    let parsed: serde_json::Value = resp.json().await.map_err(|e| format!("validate parse: {e}"))?;
    if status.is_success() {
@ -234,7 +458,13 @@ async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_jso
 /// Extract the first JSON object from a model's output. Handles
 /// fenced code blocks (```json ... ```), bare braces, and stray
 /// prose around the JSON. Returns None on no extractable object.
-fn extract_json(raw: &str) -> Option<serde_json::Value> {
+///
 /// Made `pub` 2026-05-02 to support the cross-runtime parity probe
 /// at `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`.
 /// The Go counterpart lives at `internal/validator/iterate.go::ExtractJSON`;
 /// when either runtime's algorithm changes the parity probe surfaces
 /// the divergence.
 pub fn extract_json(raw: &str) -> Option<serde_json::Value> {
    // Try fenced first.
    let candidates: Vec<String> = {
        let mut out = vec![];
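The diff shows only the first lines of `extract_json`, so the following is a minimal sketch of the general technique its doc comment describes (pull the first JSON object out of prose or a fenced block), not the gateway's actual algorithm. `first_json_object` is a hypothetical name; it uses only the standard library and returns the raw span without checking that it parses as JSON, which a real caller would do with `serde_json::from_str`.

```rust
/// Hypothetical helper: return the first balanced `{ ... }` span in `raw`.
/// Tracks string literals and backslash escapes so braces inside JSON
/// strings do not change the nesting depth. Does NOT validate the span.
pub fn first_json_object(raw: &str) -> Option<&str> {
    let mut depth = 0usize;
    let mut start = None;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in raw.char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            // Quotes in surrounding prose (depth 0) are ignored.
            '"' if depth > 0 => in_string = true,
            '{' => {
                if depth == 0 {
                    start = Some(i);
                }
                depth += 1;
            }
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    // '}' is one byte, so `..=i` ends on a char boundary.
                    return start.map(|s| &raw[s..=i]);
                }
            }
            _ => {}
        }
    }
    None
}
```

Because the scan skips everything before the first `{`, a fenced reply such as a json code block surrounded by prose is handled by the same pass; an unterminated object yields `None`, which corresponds to the `no_json` retry path in the handler.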
--- a/crates/gateway/src/v1/langfuse_trace.rs
+++ b/crates/gateway/src/v1/langfuse_trace.rs
@ -76,63 +76,54 @@ impl LangfuseClient {
        });
    }
-    async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> {
-        let trace_id = uuid_v7_like();
-        let gen_id = uuid_v7_like();
-        let trace_ts = ev.start_time.clone();
+    /// Fire-and-forget per-iteration span emit. Returns the generated
+    /// span id synchronously so the caller can stamp it on
+    /// `SessionAttemptRecord.span_id` before the network round-trip resolves.
+    /// Mirrors Go's `validator.Tracer` callback shape.
    pub fn emit_attempt_span(&self, sp: AttemptSpan) -> String {
        let span_id = uuid_v7_like();
        let span_id_for_caller = span_id.clone();
        let this = self.clone();
        tokio::spawn(async move {
            if let Err(e) = this.emit_attempt_span_inner(span_id, sp).await {
                tracing::warn!(target: "v1.langfuse", "iterate span drop: {e}");
            }
        });
        span_id_for_caller
    }
    async fn emit_attempt_span_inner(&self, span_id: String, sp: AttemptSpan) -> Result<(), String> {
        let level = if sp.verdict == "accepted" { "DEFAULT" } else { "WARNING" };
        let batch = IngestionBatch {
-            batch: vec![
-                IngestionEvent {
-                    id: uuid_v7_like(),
-                    timestamp: trace_ts.clone(),
-                    kind: "trace-create",
-                    body: serde_json::json!({
-                        "id": trace_id,
-                        "name": format!("v1.chat:{}", ev.provider),
-                        "input": serde_json::json!({
-                            "model": ev.model,
-                            "messages": ev.input,
-                        }),
-                        "metadata": serde_json::json!({
-                            "provider": ev.provider,
-                            "think": ev.think,
-                        }),
-                    }),
-                },
-                IngestionEvent {
-                    id: uuid_v7_like(),
-                    timestamp: ev.end_time.clone(),
-                    kind: "generation-create",
-                    body: serde_json::json!({
-                        "id": gen_id,
-                        "traceId": trace_id,
-                        "name": "chat",
-                        "model": ev.model,
-                        "modelParameters": serde_json::json!({
-                            "temperature": ev.temperature,
-                            "max_tokens": ev.max_tokens,
-                            "think": ev.think,
-                        }),
-                        "input": ev.input,
-                        "output": ev.output,
-                        "usage": serde_json::json!({
-                            "input": ev.prompt_tokens,
-                            "output": ev.completion_tokens,
-                            "total": ev.prompt_tokens + ev.completion_tokens,
-                            "unit": "TOKENS",
-                        }),
-                        "startTime": ev.start_time,
-                        "endTime": ev.end_time,
-                        "metadata": serde_json::json!({
-                            "provider": ev.provider,
-                            "latency_ms": ev.latency_ms,
-                        }),
-                    }),
-                },
-            ],
+            batch: vec![IngestionEvent {
+                id: uuid_v7_like(),
+                timestamp: sp.end_time.clone(),
+                kind: "span-create",
+                body: serde_json::json!({
+                    "id": span_id,
+                    "traceId": sp.trace_id,
+                    "name": format!("iterate.attempt[{}]", sp.iteration),
+                    "input": serde_json::json!({
+                        "iteration": sp.iteration,
+                        "model": sp.model,
+                        "provider": sp.provider,
+                        "prompt": truncate(&sp.prompt, 4000),
+                    }),
+                    "output": serde_json::json!({
+                        "verdict": sp.verdict,
+                        "error": sp.error,
+                        "raw": truncate(&sp.raw, 4000),
+                    }),
+                    "level": level,
+                    "startTime": sp.start_time,
+                    "endTime": sp.end_time,
+                }),
+            }],
        };
        self.post_batch(batch).await
    }
    async fn post_batch(&self, batch: IngestionBatch) -> Result<(), String> {
        let url = format!("{}{}", self.inner.base_url.trim_end_matches('/'), INGESTION_PATH);
        let resp = self.inner.http
            .post(url)
@ -146,6 +137,81 @@ impl LangfuseClient {
        }
        Ok(())
    }
    async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> {
        // When the caller forwarded a parent trace id (via the
        // X-Lakehouse-Trace-Id header → V1State plumbing), attach the
        // generation as a child of that trace. Without a parent we
        // mint a new top-level trace per call (Phase 40 default).
        let trace_id = ev.parent_trace_id.clone().unwrap_or_else(uuid_v7_like);
        let nested = ev.parent_trace_id.is_some();
        let gen_id = uuid_v7_like();
        let trace_ts = ev.start_time.clone();
        let mut events = Vec::with_capacity(2);
        if !nested {
            // Only mint a fresh trace-create when we don't have a parent.
            // Reusing a parent trace id without re-creating it is the
            // contract that lets validatord's iterate-session show up
            // as one tree in Langfuse.
            events.push(IngestionEvent {
                id: uuid_v7_like(),
                timestamp: trace_ts.clone(),
                kind: "trace-create",
                body: serde_json::json!({
                    "id": trace_id,
                    "name": format!("v1.chat:{}", ev.provider),
                    "input": serde_json::json!({
                        "model": ev.model,
                        "messages": ev.input,
                    }),
                    "metadata": serde_json::json!({
                        "provider": ev.provider,
                        "think": ev.think,
                    }),
                }),
            });
        }
        events.push(IngestionEvent {
            id: uuid_v7_like(),
            timestamp: ev.end_time.clone(),
            kind: "generation-create",
            body: serde_json::json!({
                "id": gen_id,
                "traceId": trace_id,
                "name": "chat",
                "model": ev.model,
                "modelParameters": serde_json::json!({
                    "temperature": ev.temperature,
                    "max_tokens": ev.max_tokens,
                    "think": ev.think,
                }),
                "input": ev.input,
                "output": ev.output,
                "usage": serde_json::json!({
                    "input": ev.prompt_tokens,
                    "output": ev.completion_tokens,
                    "total": ev.prompt_tokens + ev.completion_tokens,
                    "unit": "TOKENS",
                }),
                "startTime": ev.start_time,
                "endTime": ev.end_time,
                "metadata": serde_json::json!({
                    "provider": ev.provider,
                    "latency_ms": ev.latency_ms,
                }),
            }),
        });
        self.post_batch(IngestionBatch { batch: events }).await
    }
 }
 /// Truncate a string to at most `n` chars (NOT bytes). Matches the Go
 /// `trim` helper used in session log + attempt-span emission so an
 /// operator reading two cross-runtime traces sees the same boundary.
 fn truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
 }
 /// Everything the v1.chat handler collects for one completed call.
@ -162,6 +228,32 @@ pub struct ChatTrace {
    pub start_time: String,
    pub end_time: String,
    pub latency_ms: u64,
    /// When set, attach this chat trace as a child of the named
    /// Langfuse trace instead of starting a new top-level trace. Used
    /// by `/v1/iterate` to nest its inner /v1/chat hops under the
    /// iterate-session trace so a multi-call session shows in
    /// Langfuse as ONE trace tree, not N+1 disconnected traces.
    /// Matches the Go-side `X-Lakehouse-Trace-Id` propagation
    /// (commit d6d2fdf in golangLAKEHOUSE).
    pub parent_trace_id: Option<String>,
 }
 /// One iteration attempt inside `/v1/iterate`'s loop. Becomes one
 /// span on the parent trace when emitted via `emit_attempt_span`.
 /// Matches Go's `validator.AttemptSpan` shape so the cross-runtime
 /// observability surface is consistent.
 pub struct AttemptSpan {
    pub trace_id: String,
    pub iteration: u32,
    pub model: String,
    pub provider: String,
    pub prompt: String,
    pub raw: String,
    /// Verdict kind: "no_json" | "validation_failed" | "accepted"
    pub verdict: String,
    pub error: Option<String>,
    pub start_time: String,
    pub end_time: String,
 }
 #[derive(Serialize)]
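The `truncate` helper added in this file counts chars rather than bytes. The sketch below illustrates why that matters for prompts containing non-ASCII text; `truncate_chars` restates the helper's one-liner, while `truncate_bytes` is a hypothetical byte-based alternative included only for comparison and is not part of the diff.

```rust
/// Char-based truncation, same shape as the `truncate` helper above:
/// keeps at most `n` Unicode scalar values and never splits one.
pub fn truncate_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Hypothetical byte-based alternative. `str::get` returns None when
/// the cut would land inside a multi-byte UTF-8 sequence, which is the
/// failure mode the char-based version avoids.
pub fn truncate_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n.min(s.len()))
}
```

With the input `"café工作"`, a char limit of 5 keeps `"café工"` intact, whereas a byte limit of 4 lands in the middle of `é` and yields `None`.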
--- a/crates/gateway/src/v1/mod.rs
+++ b/crates/gateway/src/v1/mod.rs
@ -21,6 +21,7 @@ pub mod opencode;
 pub mod validate;
 pub mod iterate;
 pub mod langfuse_trace;
 pub mod session_log;
 pub mod mode;
 pub mod respond;
 pub mod truth;
@ -83,6 +84,13 @@ pub struct V1State {
    /// disabled (keys missing or container unreachable). Traces are
    /// fire-and-forget: never block the response path.
    pub langfuse: Option<langfuse_trace::LangfuseClient>,
    /// Coordinator session JSONL writer (path from
    /// `[gateway].session_log_path`). One row per `/v1/iterate`
    /// session for offline DuckDB analysis. None = disabled.
    /// Cross-runtime parity with the Go-side `validatord`
    /// `[validatord].session_log_path` (commit 1a3a82a in
    /// golangLAKEHOUSE).
    pub session_log: Option<session_log::SessionLogger>,
 }
 #[derive(Default, Clone, Serialize)]
@ -361,6 +369,7 @@ mod resolve_provider_tests {
 async fn chat(
    State(state): State<V1State>,
    headers: axum::http::HeaderMap,
    Json(req): Json<ChatRequest>,
 ) -> Result<Json<ChatResponse>, (StatusCode, String)> {
    if req.messages.is_empty() {
@ -490,6 +499,17 @@ async fn chat(
        let output = resp.choices.first()
            .map(|c| c.message.text())
            .unwrap_or_default();
        // Cross-runtime trace linkage. When a caller (validatord on
        // Go side, /v1/iterate on Rust side) forwards a parent trace
        // id via X-Lakehouse-Trace-Id, attach this generation to that
        // trace so the iterate session and its inner chat hops show
        // up as ONE trace tree in Langfuse. Header name matches the
        // Go-side `shared.TraceIDHeader` constant byte-for-byte.
        let parent_trace_id = headers
            .get(crate::v1::iterate::TRACE_ID_HEADER)
            .and_then(|v| v.to_str().ok())
            .map(|s| s.to_string())
            .filter(|s| !s.is_empty());
        lf.emit_chat(langfuse_trace::ChatTrace {
            provider: used_provider.clone(),
            model: resp.model.clone(),
@ -503,6 +523,7 @@ async fn chat(
            start_time: start_time.to_rfc3339(),
            end_time: end_time.to_rfc3339(),
            latency_ms,
            parent_trace_id,
        });
    }
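The header handling in `chat` and the fallback in `/v1/iterate` follow one rule: a non-empty forwarded trace id wins, otherwise a fresh id is minted. A minimal standard-library sketch of that rule follows. `resolve_trace_id` and `mint_trace_id` are hypothetical names, the header value is modelled as a plain `Option<&str>` instead of an axum `HeaderMap`, and the process-local counter in the suffix is an assumption swapped in here to make back-to-back ids distinct; the diff's own `new_trace_id` derives both halves from the clock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-local sequence number (assumption, not in the diff).
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical fallback id: 16 hex digits of nanosecond timestamp,
/// a dash, then 8 hex digits of a monotonically increasing counter.
pub fn mint_trace_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed) as u32;
    format!("{:016x}-{:08x}", nanos, seq)
}

/// A forwarded id wins when present and non-empty; otherwise mint one
/// so the session becomes its own trace tree.
pub fn resolve_trace_id(forwarded: Option<&str>) -> String {
    forwarded
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .unwrap_or_else(mint_trace_id)
}
```

Treating an empty header the same as a missing one matches the `.filter(|s| !s.is_empty())` step in both handlers, so a caller that sends a blank `x-lakehouse-trace-id` still gets a usable id.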
--- a/crates/gateway/src/v1/session_log.rs
+++ b/crates/gateway/src/v1/session_log.rs
@ -0,0 +1,235 @@
 //! Coordinator session JSONL writer — Rust parity with the Go-side
 //! `internal/validator/session_log.go` (commit 1a3a82a in
 //! golangLAKEHOUSE). Same schema, same field names, same producer
 //! semantics, so a unified longitudinal log can pull from either
 //! runtime via DuckDB.
 //!
 //! Schema: `session.iterate.v1`. One row per `/v1/iterate` session.
 //! Append-only. Best-effort posture: errors warn and the iterate
 //! response always ships.
 //!
 //! See `golangLAKEHOUSE/docs/SESSION_LOG.md` for the full schema
 //! reference + DuckDB query examples. This module produces rows
 //! with `daemon: "gateway"`; the Go side produces `daemon:
 //! "validatord"`. Operators who want a unified stream can point both
 //! to the same path (the OS write-append is atomic for the row sizes
 //! we produce) or query both files together via duckdb's `read_json`
 //! glob support.
 use serde::{Deserialize, Serialize};
 use std::sync::Arc;
 use tokio::sync::Mutex;
 pub const SESSION_RECORD_SCHEMA: &str = "session.iterate.v1";
 /// One row in coordinator_sessions.jsonl. Field names are the on-wire
 /// names — must stay byte-equal to the Go side's
 /// `validator.SessionRecord` (proven by the cross-runtime parity
 /// probe at golangLAKEHOUSE/scripts/cutover/parity/).
 // Deserialize is supported so the parity helper binary can round-trip
 // fixture inputs through serde without hand-rolling a parser. Production
 // emit path uses Serialize only; SessionRecord rows are written by the
 // gateway and consumed by DuckDB / external tooling, never re-read by us.
 #[derive(Serialize, Deserialize)]
 pub struct SessionRecord {
    pub schema: String,
    pub session_id: String,
    pub timestamp: String,
    pub daemon: String,
    pub kind: String,
    pub model: String,
    pub provider: String,
    pub prompt: String,
    pub iterations: u32,
    pub max_iterations: u32,
    pub final_verdict: String, // "accepted" | "max_iter_exhausted" | "infra_error"
    pub attempts: Vec<SessionAttemptRecord>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub artifact: Option<serde_json::Value>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub grounded_in_roster: Option<bool>,
    pub duration_ms: u64,
 }
 #[derive(Serialize, Deserialize)]
 pub struct SessionAttemptRecord {
    pub iteration: u32,
    pub verdict_kind: String, // "no_json" | "validation_failed" | "accepted" | "infra_error"
    #[serde(skip_serializing_if = "Option::is_none")]
    pub error: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub span_id: Option<String>,
 }
 /// Append-only writer. Cloneable handle — internal state is Arc'd so
 /// V1State can keep its own clone and per-request clones are cheap.
 #[derive(Clone)]
 pub struct SessionLogger {
    inner: Arc<Inner>,
 }
 struct Inner {
    path: String,
    /// tokio::Mutex (not std) because we hold it across the async
    /// fs write. Contention is low (one row per /v1/iterate session).
    mu: Mutex<()>,
 }
 impl SessionLogger {
    /// Construct a logger writing to `path`. Empty path → None
    /// (skip the wiring in the iterate handler entirely).
    pub fn from_path(path: &str) -> Option<Self> {
        if path.is_empty() {
            return None;
        }
        Some(Self {
            inner: Arc::new(Inner {
                path: path.to_string(),
                mu: Mutex::new(()),
            }),
        })
    }
    /// Append one record. Best-effort: failures land in `tracing::warn!`
    /// and the call returns `()` regardless; observability is a witness,
    /// never a gate. Both failure modes are logged and swallowed:
    /// `serde_json::to_string` failing on a well-formed struct (which
    /// shouldn't happen) and any I/O error from the write.
    pub async fn append(&self, rec: SessionRecord) {
        let body = match serde_json::to_string(&rec) {
            Ok(s) => s,
            Err(e) => {
                tracing::warn!(target: "v1.session_log", "marshal: {e}");
                return;
            }
        };
        let _guard = self.inner.mu.lock().await;
        if let Err(e) = self.write(&body).await {
            tracing::warn!(target: "v1.session_log", "write {}: {e}", self.inner.path);
        }
    }
    async fn write(&self, body: &str) -> std::io::Result<()> {
        use tokio::fs::OpenOptions;
        use tokio::io::AsyncWriteExt;
        // Lazy mkdir on first write so a not-yet-mounted volume at
        // startup doesn't kill the daemon.
        if let Some(parent) = std::path::Path::new(&self.inner.path).parent() {
            if !parent.as_os_str().is_empty() {
                tokio::fs::create_dir_all(parent).await?;
            }
        }
        let mut f = OpenOptions::new()
            .append(true)
            .create(true)
            .open(&self.inner.path)
            .await?;
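        // Emit row + newline as one buffer so the terminator can't be
        // separated from its row by another appender, then flush:
        // tokio's File performs writes on a blocking thread, so
        // write_all can return before the bytes reach the OS.
        let mut line = String::with_capacity(body.len() + 1);
        line.push_str(body);
        line.push('\n');
        f.write_all(line.as_bytes()).await?;
        f.flush().await?;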
        Ok(())
    }
 }
 /// Best-effort UTF-8 char truncation. Matches Go's `trim` helper so
 /// rows produced by either runtime cap fields at the same boundaries.
 pub fn truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
 }
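Review note: a quick check of what char-based truncation does with multi-byte input may help here. The sketch below copies the one-line body of `truncate` under a hypothetical name (`truncate_chars`) so it stands alone, and pairs it with a byte-based cut (`truncate_bytes`, also hypothetical) for contrast. The sample strings are illustrative and are not taken from the parity fixtures.

```rust
/// Hypothetical standalone copy of `truncate`: keep at most `n` chars
/// (Unicode scalar values), never splitting a multi-byte sequence.
fn truncate_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Byte-based cut, for comparison only. `str::get` returns None when `n`
/// is past the end or lands inside a multi-byte character, which is the
/// failure mode the char-based version avoids.
fn truncate_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

With these definitions, `truncate_chars("café au lait", 4)` keeps the whole `é`, while `truncate_bytes("café", 4)` is `None` because byte 4 falls in the middle of `é`.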
 #[cfg(test)]
 mod tests {
    use super::*;
    use std::path::PathBuf;
    use tokio::fs;
    fn fixture_record(session_id: &str) -> SessionRecord {
        SessionRecord {
            schema: SESSION_RECORD_SCHEMA.to_string(),
            session_id: session_id.to_string(),
            timestamp: "2026-05-02T08:00:00Z".to_string(),
            daemon: "gateway".to_string(),
            kind: "fill".to_string(),
            model: "qwen3.5:latest".to_string(),
            provider: "ollama".to_string(),
            prompt: "produce a fill artifact".to_string(),
            iterations: 1,
            max_iterations: 3,
            final_verdict: "accepted".to_string(),
            attempts: vec![SessionAttemptRecord {
                iteration: 0,
                verdict_kind: "accepted".to_string(),
                error: None,
                span_id: Some("span-0".to_string()),
            }],
            artifact: Some(serde_json::json!({"fills":[{"candidate_id":"W-1"}]})),
            grounded_in_roster: Some(true),
            duration_ms: 50,
        }
    }
    #[tokio::test]
    async fn from_path_empty_returns_none() {
        assert!(SessionLogger::from_path("").is_none());
    }
    #[tokio::test]
    async fn append_writes_jsonl_row_with_schema_field() {
        let dir = tempdir();
        let path = dir.join("sessions.jsonl");
        let path_str = path.to_string_lossy().to_string();
        let logger = SessionLogger::from_path(&path_str).unwrap();
        logger.append(fixture_record("trace-a")).await;
        let body = fs::read_to_string(&path).await.unwrap();
        assert!(body.contains("\"schema\":\"session.iterate.v1\""));
        assert!(body.contains("\"session_id\":\"trace-a\""));
        assert!(body.contains("\"grounded_in_roster\":true"));
        assert!(body.ends_with('\n'));
    }
    #[tokio::test]
    async fn append_concurrent_safe() {
        let dir = tempdir();
        let path = dir.join("sessions.jsonl");
        let path_str = path.to_string_lossy().to_string();
        let logger = SessionLogger::from_path(&path_str).unwrap();
        let n = 32;
        let mut handles = Vec::with_capacity(n);
        for i in 0..n {
            let l = logger.clone();
            handles.push(tokio::spawn(async move {
                l.append(fixture_record(&format!("trace-{i}"))).await;
            }));
        }
        for h in handles {
            h.await.unwrap();
        }
        let body = fs::read_to_string(&path).await.unwrap();
        let lines: Vec<_> = body.lines().filter(|l| !l.is_empty()).collect();
        assert_eq!(lines.len(), n, "expected {n} rows, got {}", lines.len());
        // Every row must round-trip through serde — a torn write
        // would surface as a parse error.
        for line in lines {
            let _: serde_json::Value = serde_json::from_str(line).expect("valid json per row");
        }
    }
    fn tempdir() -> PathBuf {
        // Per-test unique path so prior runs don't pollute the next.
        // The static counter increments across the whole test binary,
        // so back-to-back tests in the same module get distinct dirs.
        static COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);
        let n = COUNTER.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
        let p = std::env::temp_dir().join(format!(
            "session_log_test_{}_{}_{}",
            std::process::id(),
            chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0),
            n,
        ));
        std::fs::create_dir_all(&p).unwrap();
        p
    }
 }
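Review note: the module doc suggests pointing two daemons at one file, which makes it worth keeping each row and its newline together. A minimal synchronous sketch of that idea using only `std` follows. `append_row` is a hypothetical name and this is not the gateway's async writer; it only illustrates building the full line before a single `write_all`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one JSONL row. The row and its trailing newline are assembled
/// first so they are handed to the OS together (normally one write call
/// for small rows; `write_all` may loop on a partial write).
fn append_row(path: &Path, body: &str) -> io::Result<()> {
    let mut line = String::with_capacity(body.len() + 1);
    line.push_str(body);
    line.push('\n');

    // append(true) positions every write at end-of-file;
    // create(true) makes the file on first use.
    let mut f = OpenOptions::new().append(true).create(true).open(path)?;
    f.write_all(line.as_bytes())?;
    Ok(())
}
```

Calling `append_row` twice on the same path leaves two newline-terminated rows in call order.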
--- a/crates/lance-bench/src/main.rs
+++ b/crates/lance-bench/src/main.rs
@ -456,6 +456,26 @@ async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> {
        .await
        .context("create_index")?;
    // Also build the scalar btree on doc_id. This bench's
    // measure_random_access_lance uses take(row_position) which doesn't
    // need the btree, but the dataset this bench writes is also queried
    // downstream by /vectors/lance/doc/<name>/<doc_id> (the production
    // lookup path) — without this index that path falls back to a full
    // table scan. Cheap to build (~1.2s on 10M rows) and matches the
    // gateway's lance_migrate handler behavior so bench-produced datasets
    // are immediately production-shape.
    use lance_index::scalar::ScalarIndexParams;
    dataset
        .create_index(
            &["doc_id"],
            IndexType::Scalar,
            Some("doc_id_btree".into()),
            &ScalarIndexParams::default(),
            true,
        )
        .await
        .context("create_index doc_id btree")?;
    Ok(())
 }
--- a/crates/shared/src/config.rs
+++ b/crates/shared/src/config.rs
@ -62,6 +62,15 @@ pub struct GatewayConfig {
    pub host: String,
    #[serde(default = "default_gateway_port")]
    pub port: u16,
    /// Coordinator session JSONL output path. One row per
    /// `/v1/iterate` session, schema=`session.iterate.v1`. Empty =
    /// disabled. Cross-runtime parity with the Go side's
    /// `[validatord].session_log_path` (added 2026-05-02). Default
    /// empty so existing deployments aren't perturbed; production
    /// sets `/var/lib/lakehouse/gateway/sessions.jsonl`. See
    /// `golangLAKEHOUSE/docs/SESSION_LOG.md` for query examples.
    #[serde(default)]
    pub session_log_path: String,
 }
 #[derive(Debug, Clone, Deserialize)]
@ -149,7 +158,13 @@ fn default_gateway_port() -> u16 { 3100 }
 fn default_storage_root() -> String { "./data".to_string() }
 fn default_profile_root() -> String { "./data/_profiles".to_string() }
 fn default_manifest_prefix() -> String { "_catalog/manifests".to_string() }
-fn default_sidecar_url() -> String { "http://localhost:3200".to_string() }
+// Post-2026-05-02: AiClient talks directly to Ollama; the Python
 // sidecar's hot-path role was retired. The config field name
 // `[sidecar].url` is preserved for migration compatibility (operators
 // with existing TOMLs don't need to rename anything), but the value
 // now points at Ollama. Lab UI / pipeline_lab Python remains as a
 // dev-only tool; not on this URL.
 fn default_sidecar_url() -> String { "http://localhost:11434".to_string() }
 fn default_embed_model() -> String { "nomic-embed-text".to_string() }
 fn default_gen_model() -> String { "qwen2.5".to_string() }
 fn default_rerank_model() -> String { "qwen2.5".to_string() }
@ -184,7 +199,11 @@ impl Config {
 impl Default for Config {
    fn default() -> Self {
        Self {
-            gateway: GatewayConfig { host: default_host(), port: default_gateway_port() },
+            gateway: GatewayConfig {
                host: default_host(),
                port: default_gateway_port(),
                session_log_path: String::new(),
            },
            storage: StorageConfig {
                root: default_storage_root(),
                profile_root: default_profile_root(),
--- a/crates/vectord-lance/src/lib.rs
+++ b/crates/vectord-lance/src/lib.rs
@ -603,3 +603,210 @@ fn row_from_batch(batch: &RecordBatch, row: usize) -> Result<Row, String> {
    Ok(Row { doc_id, chunk_text, vector: v, source, chunk_idx })
 }
 // =================== Tests ===================
 //
 // All tests run against a temp directory — never the production
 // data/lance/ tree. Lance reads/writes are async + filesystem-bound,
 // so we use #[tokio::test]. Each test uses a unique temp dir (pid +
 // nanoseconds + a per-process counter) so concurrent runs don't
 // collide and a re-run of a single test doesn't see prior state.
 //
 // Surfaced 2026-05-02 audit: vectord-lance had ZERO tests despite
 // being on the live HTTP path. These are the load-bearing locks for
 // the public API contract.
 #[cfg(test)]
 mod tests {
    use super::*;
    fn temp_path(label: &str) -> String {
        // Per-process atomic counter — guarantees uniqueness regardless
        // of clock resolution or test scheduling. Combined with pid, the
        // result is unique within and across processes for any practical
        // test workload. Nanosecond timestamps were not enough on their
        // own: opus WARN at lib.rs:622 from the 2026-05-02 scrum noted
        // that under tokio scheduling, multiple tests in the same cargo
        // process can hit the same nanos bucket.
        use std::sync::atomic::{AtomicU64, Ordering};
        static COUNTER: AtomicU64 = AtomicU64::new(0);
        let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
        let pid = std::process::id();
        let nanos = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .map(|d| d.subsec_nanos())
            .unwrap_or(0);
        std::env::temp_dir()
            .join(format!("vlance_test_{label}_{pid}_{nanos}_{seq}"))
            .to_string_lossy()
            .to_string()
    }
    /// Build a minimal in-memory Parquet file matching vectord's
    /// binary-blob schema. Used as input to migrate_from_parquet_bytes.
    fn synth_parquet_bytes(n_rows: usize, dims: usize) -> Vec<u8> {
        use parquet::arrow::ArrowWriter;
        use std::io::Cursor;
        let schema = Arc::new(Schema::new(vec![
            Field::new("source", DataType::Utf8, true),
            Field::new("doc_id", DataType::Utf8, false),
            Field::new("chunk_idx", DataType::Int32, true),
            Field::new("chunk_text", DataType::Utf8, true),
            Field::new("vector", DataType::Binary, false),
        ]));
        let sources: Vec<Option<&str>> = (0..n_rows).map(|_| Some("test")).collect();
        let doc_ids: Vec<String> = (0..n_rows).map(|i| format!("DOC-{i:04}")).collect();
        let chunk_idxs: Vec<Option<i32>> = (0..n_rows).map(|i| Some(i as i32)).collect();
        let chunk_texts: Vec<String> = (0..n_rows).map(|i| format!("synth chunk {i}")).collect();
        let vectors: Vec<Vec<u8>> = (0..n_rows).map(|i| {
            let v: Vec<f32> = (0..dims).map(|j| (i * dims + j) as f32 * 0.01).collect();
            let mut bytes = Vec::with_capacity(dims * 4);
            for f in v { bytes.extend_from_slice(&f.to_le_bytes()); }
            bytes
        }).collect();
        let batch = RecordBatch::try_new(schema.clone(), vec![
            Arc::new(StringArray::from(sources)),
            Arc::new(StringArray::from(doc_ids)),
            Arc::new(Int32Array::from(chunk_idxs)),
            Arc::new(StringArray::from(chunk_texts)),
            Arc::new(BinaryArray::from(vectors.iter().map(|v| v.as_slice()).collect::<Vec<_>>())),
        ]).expect("synth parquet batch");
        let mut buf = Cursor::new(Vec::new());
        let mut writer = ArrowWriter::try_new(&mut buf, schema, None).expect("arrow writer");
        writer.write(&batch).expect("write batch");
        writer.close().expect("close writer");
        buf.into_inner()
    }
    #[tokio::test]
    async fn fresh_store_reports_no_state() {
        let path = temp_path("fresh");
        let store = LanceVectorStore::new(path.clone());
        assert_eq!(store.path(), path);
        assert_eq!(store.count().await.unwrap_or(0), 0);
        assert!(!store.has_vector_index().await.unwrap_or(true));
    }
    #[tokio::test]
    async fn migrate_then_count_and_fetch() {
        let path = temp_path("migrate_fetch");
        let store = LanceVectorStore::new(path.clone());
        let bytes = synth_parquet_bytes(8, 4);
        let stats = store.migrate_from_parquet_bytes(&bytes).await.expect("migrate");
        assert_eq!(stats.rows_written, 8);
        assert_eq!(stats.dimensions, 4);
        assert!(stats.disk_bytes > 0, "lance dataset should occupy disk");
        assert_eq!(store.count().await.unwrap(), 8);
        let row = store.get_by_doc_id("DOC-0003").await
            .expect("get_by_doc_id Ok").expect("DOC-0003 exists");
        assert_eq!(row.doc_id, "DOC-0003");
        assert_eq!(row.chunk_text, "synth chunk 3");
        assert_eq!(row.vector.len(), 4);
        let _ = std::fs::remove_dir_all(&path);
    }
    /// Load-bearing contract: get_by_doc_id distinguishes "dataset
    /// missing" (Err) from "id missing" (Ok(None)) so the HTTP
    /// handler can return 404 without inspecting error strings.
    #[tokio::test]
    async fn get_by_doc_id_missing_returns_none() {
        let path = temp_path("missing_id");
        let store = LanceVectorStore::new(path.clone());
        store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
        let row = store.get_by_doc_id("DOC-NEVER-EXISTS").await.expect("Ok");
        assert!(row.is_none(), "missing id must return Ok(None), not Err");
        let _ = std::fs::remove_dir_all(&path);
    }
    /// Verifies the load-bearing structural-difference claim of
    /// ADR-019: Lance appends without rewriting the whole file. Row
    /// count grows; new rows are fetchable by their doc_ids.
    #[tokio::test]
    async fn append_grows_count_and_new_rows_fetchable() {
        let path = temp_path("append");
        let store = LanceVectorStore::new(path.clone());
        store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
        assert_eq!(store.count().await.unwrap(), 4);
        let stats = store.append(
            Some("appended".into()),
            vec!["NEW-A".into(), "NEW-B".into()],
            vec![0, 0],
            vec!["new chunk a".into(), "new chunk b".into()],
            vec![vec![0.1, 0.2, 0.3, 0.4], vec![0.5, 0.6, 0.7, 0.8]],
        ).await.expect("append");
        assert_eq!(stats.rows_appended, 2);
        assert_eq!(store.count().await.unwrap(), 6);
        let new_a = store.get_by_doc_id("NEW-A").await.unwrap().expect("NEW-A");
        assert_eq!(new_a.chunk_text, "new chunk a");
        assert_eq!(new_a.source.as_deref(), Some("appended"));
        let _ = std::fs::remove_dir_all(&path);
    }
    /// Without this guard a dim-mismatch row would land on disk and
    /// silently break search at query time.
    #[tokio::test]
    async fn append_dim_mismatch_errors() {
        let path = temp_path("dim_mismatch");
        let store = LanceVectorStore::new(path.clone());
        store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
        let err = store.append(
            None, vec!["X".into(), "Y".into()], vec![0, 0],
            vec!["a".into(), "b".into()],
            vec![vec![1.0, 2.0, 3.0, 4.0], vec![1.0, 2.0]],
        ).await;
        assert!(err.is_err(), "dim mismatch must error");
        let msg = err.unwrap_err();
        assert!(msg.contains("dim") || msg.contains("expected"),
            "error must mention the dimension problem; got: {msg}");
        let _ = std::fs::remove_dir_all(&path);
    }
    /// Search round-trip: query the exact vector for one row, top-1
    /// must be that row. Verifies the search path works on small
    /// datasets where IVF training would normally be skipped.
    #[tokio::test]
    async fn search_returns_nearest() {
        let path = temp_path("search");
        let store = LanceVectorStore::new(path.clone());
        store.migrate_from_parquet_bytes(&synth_parquet_bytes(8, 4)).await.expect("migrate");
        let target: Vec<f32> = (0..4).map(|j| (5 * 4 + j) as f32 * 0.01).collect();
        let hits = store.search(&target, 3, None, None).await.expect("search");
        assert!(!hits.is_empty(), "search must return at least 1 hit");
        assert_eq!(hits[0].doc_id, "DOC-0005",
            "exact-vector match should be top-1; got {hits:?}");
        let _ = std::fs::remove_dir_all(&path);
    }
    /// stats() summarizes the dataset state in one call. Locks the
    /// field shape so downstream consumers don't break on a rename.
    #[tokio::test]
    async fn stats_reports_post_migrate_state() {
        let path = temp_path("stats");
        let store = LanceVectorStore::new(path.clone());
        store.migrate_from_parquet_bytes(&synth_parquet_bytes(5, 4)).await.expect("migrate");
        let s = store.stats().await.expect("stats");
        assert_eq!(s.rows, 5);
        assert!(s.disk_bytes > 0);
        assert!(!s.has_vector_index, "no vector index built yet");
        let _ = std::fs::remove_dir_all(&path);
    }
 }
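Review note: the uniqueness argument in `temp_path` rests on the atomic counter rather than the clock. A stripped-down sketch of that technique, assuming only `std`, is below. `unique_temp_path` and the `demo_` prefix are hypothetical names chosen for illustration.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: pid + per-process sequence number. The counter
/// hands out a distinct value on every call within one process, so two
/// calls never share a name even if they run in the same nanosecond.
fn unique_temp_path(label: &str) -> PathBuf {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    let pid = std::process::id();
    std::env::temp_dir().join(format!("demo_{label}_{pid}_{seq}"))
}
```

`Ordering::Relaxed` is enough here because only the atomicity of the increment matters, not ordering relative to other memory.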
--- a/crates/vectord/src/pathway_memory.rs
+++ b/crates/vectord/src/pathway_memory.rs
@ -925,7 +925,7 @@ mod tests {
            reject_reason: None,
        }];
        let mut trace = PathwayTrace {
-            pathway_id,
+            pathway_id: pathway_id.clone(),
            task_class: "scrum_review".into(),
            file_path: format!("crates/{id_tag}/src/x.rs"),
            signal_class: Some("CONVERGING".into()),
@ -954,6 +954,14 @@ mod tests {
            replay_count: replays,
            replays_succeeded: succ,
            retired: false,
            // Versioning fields added by Mem0 wave (commit 6ac7f61) — defaults
            // mirror "this trace is the live head with no parent/successor".
            trace_uid: format!("test-{pathway_id}"),
            version: 1,
            parent_trace_uid: None,
            superseded_at: None,
            superseded_by_trace_uid: None,
            retirement_reason: None,
        };
        trace.pathway_vec = build_pathway_vec(&trace);
        trace
--- a/crates/vectord/src/rag.rs
+++ b/crates/vectord/src/rag.rs
@ -163,7 +163,11 @@ pub async fn query(
    // production caller of the Phase 21 primitives — see audit finding
    // "Phase 21 Rust primitives are wired but not CALLED by any
    // production surface" from 2026-04-21.
-    let mut cont_opts = ContinuableOpts::new("qwen2.5:latest");
+    // 2026-04-30 model bump: qwen2.5:latest → qwen3.5:latest to match
    // the small-model-pipeline local-tier default. Same JSON-clean
    // property, more capacity. think=Some(false) preserved — RAG hot
    // path doesn't need reasoning traces; direct answers only.
    let mut cont_opts = ContinuableOpts::new("qwen3.5:latest");
    cont_opts.max_tokens = Some(512);
    cont_opts.temperature = Some(0.2);
    cont_opts.shape = ResponseShape::Text;
@ -176,7 +180,7 @@ pub async fn query(
        // echoes whatever Ollama loaded). Use the configured tier model
        // for now; if RAG needs to report the actual resolved model,
        // the runner can add a post-call ps probe later.
-        model: "qwen2.5:latest".to_string(),
+        model: "qwen3.5:latest".to_string(),
        sources: results,
        tokens_generated: None,
    })
--- a/crates/vectord/src/service.rs
+++ b/crates/vectord/src/service.rs
@ -1855,10 +1855,10 @@ async fn lance_migrate(
        .map_err(|e| (StatusCode::NOT_FOUND, format!("read parquet: {e}")))?;
    let lance_store = state.lance.store_for_new(&index_name, &bucket).await
-        .map_err(|e| (StatusCode::BAD_REQUEST, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    let stats = lance_store.migrate_from_parquet_bytes(&bytes).await
-        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    tracing::info!(
        "lance migrate '{}': {} rows, {}d, {} bytes on disk, {:.2}s",
@ -1866,11 +1866,40 @@ async fn lance_migrate(
        stats.disk_bytes, stats.duration_secs,
    );
    // Auto-build the doc_id btree. The scalar index is what makes
    // get_doc_by_id O(log n) instead of a full table scan; ADR-019
    // calls this out as the load-bearing feature for hybrid lookup.
    // Verified 2026-05-02: skipping this on a 10M-row dataset turns
    // ~5ms doc-fetch into ~100ms (full scan over 35GB). Cheap to
    // build (~1.2s on 10M, +269MB on disk) and only runs once per
    // dataset since `has_scalar_index` short-circuits subsequent calls.
    let scalar_stats = if !lance_store.has_scalar_index("doc_id").await.unwrap_or(false) {
        match lance_store.build_scalar_index("doc_id").await {
            Ok(s) => {
                tracing::info!(
                    "lance migrate '{}': doc_id btree built in {:.2}s (+{} bytes)",
                    index_name, s.build_time_secs, s.disk_bytes_added,
                );
                Some(s)
            }
            Err(e) => {
                // Don't fail the whole migrate over a missing btree —
                // the dataset is still queryable, just slowly. Log it
                // so it's debuggable.
                tracing::warn!("lance migrate '{}': doc_id btree build failed (will fall back to scan): {e}", index_name);
                None
            }
        }
    } else {
        None
    };
    Ok::<_, (StatusCode, String)>(Json(serde_json::json!({
        "index_name": index_name,
        "bucket": bucket,
        "lance_path": lance_store.path(),
        "stats": stats,
        "scalar_index": scalar_stats,
    })))
 }
@ -1888,6 +1917,300 @@ fn default_partitions() -> u32 { 316 }   // ≈√100K — sane for the referenc
 fn default_bits() -> u32 { 8 }
 fn default_subvectors() -> u32 { 48 }    // 768/48 = 16 dims per subvector
 /// Sanitize a Lance backend error before returning it to the HTTP
 /// caller. Two responsibilities:
 ///
 /// 1. Map "dataset not found" patterns to HTTP 404 instead of 500.
 ///    A missing index isn't an internal failure — it's a resource
 ///    lookup miss, and the response code should reflect that.
 /// 2. Strip server-side filesystem paths and Rust crate registry
 ///    paths (`/root/.cargo/registry/src/index.crates.io-...`) from
 ///    the message body. An attacker probing the surface shouldn't
 ///    learn the server's directory layout or our exact dep versions.
 ///
 /// Surfaced 2026-05-02 by the Lance backend audit: missing-index
 /// search returned 500 + leaked the lakehouse data path AND the
 /// .cargo/registry path with crate versions.
 fn sanitize_lance_err(err: String, index_name: &str) -> (StatusCode, String) {
    // 404 detection — narrowed across two 2026-05-02→03 scrum waves.
    // First wave (opus WARN service.rs:1908): the original `lower.contains
    // ("not found")` was too broad — caught "column not found" /
    // "field not found in schema" which are real 500s. Second wave (opus
    // WARN service.rs:1949): the looser `mentions_path_missing` branch I
    // added would 404 on a registry-file error like "/root/.cargo/.../x.rs:
    // no such file or directory" because it triggers without dataset
    // context. Drop the standalone path-missing branch; require dataset
    // context AND a missing-shape phrase. Lance's actual error format
    // ("Dataset at path X was not found") satisfies this.
    let lower = err.to_lowercase();
    let mentions_dataset = lower.contains("dataset");
    let lance_dataset_missing = mentions_dataset && (
        lower.contains("not found") || lower.contains("does not exist")
    );
    // Excluded shapes — these contain "not found" but are real 500s.
    let column_or_field = lower.contains("column not found")
        || lower.contains("field not found")
        || lower.contains("schema not found");
    let is_not_found = lance_dataset_missing && !column_or_field;
    if is_not_found {
        return (StatusCode::NOT_FOUND, format!("lance dataset not found: {index_name}"));
    }
    // Path redaction — replace path-shaped substrings with [REDACTED]
    // rather than truncating, per opus BLOCK at service.rs:1914 from the
    // 2026-05-02 scrum. The previous `err.split("/home/").next()` returned
    // Some("") when the error string STARTED with "/home/", erasing the
    // entire message and falling back to a generic "lance backend error"
    // that lost all real error context. Replacing keeps the structural
    // error (the "what failed") while stripping the location.
    let cleaned = redact_paths(&err)
        .trim_end_matches([',', ' ', '\n', '\t'])
        .to_string();
    let msg = if cleaned.is_empty() {
        format!("lance backend error on {index_name}")
    } else {
        cleaned
    };
    (StatusCode::INTERNAL_SERVER_ERROR, msg)
 }
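Review note: the 404 rule in `sanitize_lance_err` is easier to reason about in isolation. The sketch below restates the same three string checks as a pure predicate; `is_dataset_not_found` is a hypothetical name, and the inputs in the usage note are examples quoted in this diff's comments plus one made-up string.

```rust
/// Hypothetical standalone version of the 404 check: true only when the
/// message mentions a dataset AND has a missing-shape phrase, and is not
/// one of the schema-level "not found" errors that should stay 500s.
fn is_dataset_not_found(err: &str) -> bool {
    let lower = err.to_lowercase();
    let dataset_missing = lower.contains("dataset")
        && (lower.contains("not found") || lower.contains("does not exist"));
    let schema_level = lower.contains("column not found")
        || lower.contains("field not found")
        || lower.contains("schema not found");
    dataset_missing && !schema_level
}
```

Under this predicate `"Dataset at path X was not found"` classifies as a miss, while `"column not found: vector"` and a registry-file `"no such file or directory"` error do not.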
 /// Replace absolute-path substrings (under known leak-prone roots) with
 /// "[REDACTED]". Walks the input once, identifying path-shaped runs that
 /// start with one of the configured prefixes and continue until a
 /// path-terminating character (whitespace, quote, comma, or a closing
 /// paren, bracket, or brace) or end of input.
 ///
 /// Linear time, no regex dep. Catches multi-occurrence cases that
 /// `String::split(p).next()` lost. The path-redaction surface intentionally
 /// includes /var, /tmp, /etc, /usr, /opt in addition to /home and
 /// /root/.cargo because Lance/Arrow errors surface system paths in
 /// addition to project paths.
 fn redact_paths(s: &str) -> String {
    // Two prefix sets:
    // - ABSOLUTE: paths starting with '/' (always safe to redact)
    // - RELATIVE: same path bodies but without leading '/' (Lance occasionally
    //   strips the leading slash when echoing dataset paths back, observed
    //   live 2026-05-02 — "Dataset at path home/profit/lakehouse/data/lance/x
    //   was not found"). Match these only at a token start (start of
    //   string, or after a byte that is not ASCII alphanumeric, '_', '.',
    //   or '-') so we don't redact the tail of a longer token such as
    //   "myhome/x".
    const ABSOLUTE: &[&str] = &[
        "/root/.cargo", "/home", "/var", "/tmp", "/etc", "/usr", "/opt",
    ];
    const RELATIVE: &[&str] = &[
        "root/.cargo", "home/", "var/", "tmp/", "etc/", "usr/", "opt/",
    ];
    fn is_path_term(b: u8) -> bool {
        matches!(b, b' ' | b'\t' | b'\n' | b'\r' | b'"' | b'\'' | b',' | b')' | b']' | b'}')
    }
    fn is_word_boundary_before(bytes: &[u8], i: usize) -> bool {
        // True if the byte at i-1 cannot be part of a token, i.e. it is
        // anything other than an ASCII alphanumeric, '_', '.', or '-'.
        // True at start-of-input.
        if i == 0 { return true; }
        let b = bytes[i - 1];
        !(b.is_ascii_alphanumeric() || b == b'_' || b == b'.' || b == b'-')
    }
    // Walk by byte index but slice the original &str when emitting, never
    // cast bytes to char (that would corrupt multi-byte UTF-8 — opus WARN
    // at service.rs:2018 from the 2026-05-03 re-scrum). Path prefixes are
    // pure ASCII so byte-level matching is sound; what matters is that
    // we emit non-matched stretches as &str slices, not byte-by-byte.
    let bytes = s.as_bytes();
    let mut out = String::with_capacity(s.len());
    let mut i = 0;
    let mut copy_start = 0usize;  // start of an in-progress unmatched run
    while i < bytes.len() {
        let mut matched_len: Option<usize> = None;
        // Try absolute prefixes first (always allowed).
        for p in ABSOLUTE {
            let pb = p.as_bytes();
            if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb {
                let after = i + pb.len();
                if after == bytes.len() || bytes[after] == b'/' || is_path_term(bytes[after]) {
                    matched_len = Some(pb.len());
                    break;
                }
            }
        }
        // Then relative prefixes — only at word boundaries.
        if matched_len.is_none() && is_word_boundary_before(bytes, i) {
            for p in RELATIVE {
                let pb = p.as_bytes();
                if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb {
                    matched_len = Some(pb.len());
                    break;
                }
            }
        }
        if let Some(prefix_len) = matched_len {
            // Flush any pending unmatched run as a UTF-8-safe slice.
            if copy_start < i {
                out.push_str(&s[copy_start..i]);
            }
            out.push_str("[REDACTED]");
            // Skip past the prefix and the path body (until terminator).
            let mut j = i + prefix_len;
            while j < bytes.len() && !is_path_term(bytes[j]) {
                j += 1;
            }
            i = j;
            copy_start = i;
        } else {
            // Advance one CHAR (not one byte) so multi-byte UTF-8 sequences
            // stay intact in the eventual slice. The step size comes from
            // the utf8_char_len helper below.
            i += utf8_char_len(bytes, i);
        }
    }
    if copy_start < bytes.len() {
        out.push_str(&s[copy_start..]);
    }
    out
 }
 /// Length in bytes of the UTF-8 character starting at byte `i`. `i` must
 /// sit on a char boundary of valid UTF-8 (callers ensure that), so
 /// `bytes[i]` is a lead byte.
 fn utf8_char_len(bytes: &[u8], i: usize) -> usize {
    let b = bytes[i];
    if b < 0x80 { 1 }
    else if b < 0xC0 { 1 }  // continuation byte — defensive, shouldn't start here
    else if b < 0xE0 { 2 }
    else if b < 0xF0 { 3 }
    else { 4 }
 }
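Review note: for comparison with the hand-rolled lead-byte table, here is a sketch of the same per-char step using only standard-library calls, plus a reproduction of the byte-to-char cast that caused the mojibake. Both function names are hypothetical and this is not a proposed replacement for `utf8_char_len`.

```rust
/// Byte length of the character starting at byte offset `i`.
/// Returns None if `i` is not on a char boundary or is at/after the end.
fn char_len_at(s: &str, i: usize) -> Option<usize> {
    // `str::get` refuses offsets that split a multi-byte sequence.
    s.get(i..)?.chars().next().map(char::len_utf8)
}

/// The kind of cast the redaction fix removed: every byte becomes its own
/// char, so a two-byte `é` (0xC3 0xA9) turns into the two chars `Ã` and `©`.
fn cast_bytes_to_chars(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}
```

For the string `"é工a"` the step sizes at offsets 0, 2, and 5 are 2, 3, and 1 bytes, and `cast_bytes_to_chars("café")` no longer equals the input.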
 #[cfg(test)]
 mod sanitize_tests {
    use super::*;
    #[test]
    fn redact_path_at_offset_zero() {
        // Regression: opus BLOCK 2026-05-02. Old impl returned Some("")
        // when err started with "/home/", erasing the whole message.
        let out = redact_paths("/home/profit/lakehouse/data/lance not a directory");
        assert_eq!(out, "[REDACTED] not a directory");
    }
    #[test]
    fn redact_keeps_pre_and_post_text() {
        let out = redact_paths("failed to open /home/profit/lakehouse/data/x for read: ENOENT");
        assert_eq!(out, "failed to open [REDACTED] for read: ENOENT");
    }
    #[test]
    fn redact_multiple_paths() {
        let out = redact_paths("at /root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs:364:26 from /home/profit/lakehouse");
        assert!(!out.contains("/root/.cargo"));
        assert!(!out.contains("/home/"));
        assert!(out.contains("[REDACTED]"));
    }
    #[test]
    fn redact_preserves_quote_terminator() {
        let out = redact_paths("{\"path\":\"/home/profit/x\",\"err\":\"bad\"}");
        assert_eq!(out, "{\"path\":\"[REDACTED]\",\"err\":\"bad\"}");
    }
    #[test]
    fn is_not_found_narrow_dataset_only() {
        // Regression: opus WARN 2026-05-02. Old impl 404'd on any "not
        // found" — including legitimate column/field-not-found 500s.
        let (status, _) = sanitize_lance_err(
            "column not found: vector".into(), "test_idx",
        );
        assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR);
        let (status, _) = sanitize_lance_err(
            "dataset not found at /home/profit/lakehouse/data/lance/missing".into(), "test_idx",
        );
        assert_eq!(status, StatusCode::NOT_FOUND);
    }
    #[test]
    fn redact_does_not_match_prefix_substring() {
        // Neither "etcetera" nor "/etcd" should trigger /etc redaction.
        let out = redact_paths("etcetera and /etcd");
        assert_eq!(out, "etcetera and /etcd");
    }
    #[test]
    fn redact_relative_paths_lance_emits() {
        // 2026-05-02: live missing-index probe surfaced Lance error of the
        // form "Dataset at path home/profit/lakehouse/data/lance/x was not
        // found" — leading slash stripped. Need to redact the relative form
        // when preceded by a word boundary.
        let out = redact_paths("Dataset at path home/profit/lakehouse/data/lance/x was not found");
        assert!(!out.contains("home/profit"), "should redact: {out}");
        assert!(out.contains("Dataset at path"));
        assert!(out.contains("was not found"));
    }
    #[test]
    fn redact_does_not_eat_innocent_prefix_words() {
        // "homecoming" must NOT trigger "home/" redaction. "Etcetera" must
        // NOT trigger "etc/" redaction. The word-boundary guard handles this.
        let out = redact_paths("homecoming etcetera vary tmpfile");
        assert_eq!(out, "homecoming etcetera vary tmpfile");
    }
    #[test]
    fn is_not_found_lance_actual_phrasing() {
        // Lance's actual error format observed live: "Dataset at path X was
        // not found: Not found: ...". Must 404, not 500.
        let (status, _) = sanitize_lance_err(
            "Dataset at path home/profit/lakehouse/data/lance/x was not found".into(),
            "x",
        );
        assert_eq!(status, StatusCode::NOT_FOUND);
    }
    #[test]
    fn is_not_found_excludes_column_field_schema() {
        // Real 500s with the "not found" phrase that aren't dataset-missing.
        for err in [
            "column not found: vector",
            "field not found in schema: doc_id",
            "schema not found for dataset xyz",
        ] {
            let (status, _) = sanitize_lance_err(err.into(), "test_idx");
            assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR, "{err}");
        }
    }
    #[test]
    fn is_not_found_does_not_match_unrelated_path_missing() {
        // Regression: opus WARN at service.rs:1949 from the 2026-05-03
        // re-scrum. A registry-file error from inside a Lance internal
        // module should NOT be coerced to 404 just because it contains
        // "no such file or directory" — it's a real 500.
        let (status, _) = sanitize_lance_err(
            "/root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs: no such file or directory".into(),
            "test_idx",
        );
        assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR);
        // (And the path is still redacted in the message.)
        let (_, msg) = sanitize_lance_err(
            "/root/.cargo/registry/src/lance-foo/x.rs: no such file or directory".into(),
            "test_idx",
        );
        assert!(!msg.contains("/root/.cargo"), "path leak: {msg}");
    }
    #[test]
    fn redact_preserves_multibyte_utf8() {
        // Regression: opus WARN at service.rs:2018 from the 2026-05-03
        // re-scrum. Old impl did `out.push(bytes[i] as char)` which
        // corrupted multi-byte UTF-8 (e.g. a path containing user-supplied
        // names with non-ASCII characters) into Latin-1 mojibake.
        let input = "Failed to open /home/profit/工作/data — café not found";
        let out = redact_paths(input);
        // The path is redacted...
        assert!(!out.contains("/home/profit"), "path leak: {out}");
        // ...AND the multi-byte characters elsewhere are preserved verbatim.
        assert!(out.contains("café"), "lost UTF-8: {out}");
        assert!(out.contains("not found"), "lost trailing context: {out}");
    }
 }
 /// Build the IVF_PQ index on the Lance dataset.
 async fn lance_build_index(
    State(state): State<VectorState>,
@ -1895,10 +2218,10 @@ async fn lance_build_index(
    Json(req): Json<LanceIndexRequest>,
 ) -> impl IntoResponse {
    let lance_store = state.lance.store_for(&index_name).await
-        .map_err(|e| (StatusCode::BAD_REQUEST, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    match lance_store.build_index(req.num_partitions, req.num_bits, req.num_sub_vectors).await {
        Ok(stats) => Ok(Json(stats)),
-        Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
+        Err(e) => Err(sanitize_lance_err(e, &index_name)),
    }
 }
@ -1947,13 +2270,13 @@ async fn lance_search(
    let qv: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
    let lance_store = state.lance.store_for(&index_name).await
-        .map_err(|e| (StatusCode::BAD_REQUEST, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    let t0 = std::time::Instant::now();
    let nprobes = req.nprobes.or(Some(LANCE_DEFAULT_NPROBES));
    let refine = req.refine_factor.or(Some(LANCE_DEFAULT_REFINE_FACTOR));
    let hits = lance_store.search(&qv, req.top_k, nprobes, refine).await
-        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    Ok(Json(serde_json::json!({
        "index_name": index_name,
@ -1971,7 +2294,7 @@ async fn lance_get_doc(
    Path((index_name, doc_id)): Path<(String, String)>,
 ) -> impl IntoResponse {
    let lance_store = state.lance.store_for(&index_name).await
-        .map_err(|e| (StatusCode::BAD_REQUEST, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    let t0 = std::time::Instant::now();
    match lance_store.get_by_doc_id(&doc_id).await {
        Ok(Some(row)) => Ok(Json(serde_json::json!({
@ -1981,7 +2304,7 @@ async fn lance_get_doc(
            "row": row,
        }))),
        Ok(None) => Err((StatusCode::NOT_FOUND, format!("doc_id not found: {doc_id}"))),
-        Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
+        Err(e) => Err(sanitize_lance_err(e, &index_name)),
    }
 }
@ -2013,7 +2336,7 @@ async fn lance_append(
        return Err((StatusCode::BAD_REQUEST, "rows array is empty".into()));
    }
    let lance_store = state.lance.store_for(&index_name).await
-        .map_err(|e| (StatusCode::BAD_REQUEST, e))?;
+        .map_err(|e| sanitize_lance_err(e, &index_name))?;
    let mut doc_ids = Vec::with_capacity(req.rows.len());
    let mut chunk_idxs = Vec::with_capacity(req.rows.len());
--- a/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json
+++ b/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json
@ -11,15 +11,51 @@
    }
  ],
  "created_at": "2026-04-20T11:07:57.308050648Z",
-  "updated_at": "2026-04-22T03:28:28.343843823Z",
+  "updated_at": "2026-04-28T01:28:31.280305207Z",
  "description": "",
  "owner": "",
  "sensitivity": null,
-  "columns": [],
+  "columns": [
    {
      "name": "timestamp",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "operation",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "approach",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "result",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "context",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    }
  ],
  "lineage": null,
  "freshness": null,
  "tags": [],
-  "row_count": null,
+  "row_count": 2077,
  "last_embedded_at": null,
  "embedding_stale_since": null,
  "embedding_refresh_policy": null
--- a/data/_catalog/manifests/564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7.json
+++ b/data/_catalog/manifests/564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7.json
@ -1,117 +0,0 @@
 {
  "id": "564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7",
  "name": "client_workerskjkk",
  "schema_fingerprint": "cdfe85348885ddf329e5e6e9bf0e2c75c92d1a86fdb0fd3875ed46e3f93c4d82",
  "objects": [
    {
      "bucket": "primary",
      "key": "datasets/client_workerskjkk.parquet",
      "size_bytes": 32201,
      "created_at": "2026-04-21T00:49:04.623625149Z"
    }
  ],
  "created_at": "2026-04-21T00:49:04.623626738Z",
  "updated_at": "2026-04-21T00:49:04.623901788Z",
  "description": "",
  "owner": "",
  "sensitivity": "pii",
  "columns": [
    {
      "name": "worker_id",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "name",
      "data_type": "Utf8",
      "sensitivity": "pii",
      "description": "",
      "is_pii": true
    },
    {
      "name": "role",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "city",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "state",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "email",
      "data_type": "Utf8",
      "sensitivity": "pii",
      "description": "",
      "is_pii": true
    },
    {
      "name": "phone",
      "data_type": "Utf8",
      "sensitivity": "pii",
      "description": "",
      "is_pii": true
    },
    {
      "name": "skills",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "certifications",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "availability",
      "data_type": "Float64",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "reliability",
      "data_type": "Float64",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    },
    {
      "name": "archetype",
      "data_type": "Utf8",
      "sensitivity": null,
      "description": "",
      "is_pii": false
    }
  ],
  "lineage": {
    "source_system": "csv",
    "source_file": "staffing_roster_sample.csv",
    "ingest_job": "ingest-1776732544623",
    "ingest_timestamp": "2026-04-21T00:49:04.623625149Z",
    "parent_datasets": []
  },
  "freshness": null,
  "tags": [],
  "row_count": 180,
  "last_embedded_at": null,
  "embedding_stale_since": null,
  "embedding_refresh_policy": null
 }
--- a/data/headshots/manifest.jsonl
+++ b/data/headshots/manifest.jsonl
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@ -0,0 +1,46 @@
 # Lakehouse: Rust vs Go architecture comparison
 > **Source of truth lives in the golangLAKEHOUSE repo:**
 > [`/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`](file:///home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md)
 >
 > J's living document — pulled from there into this repo's docs as
 > a pointer so the comparison is reachable from either side.
 ## Why the source lives in golangLAKEHOUSE
 The Go rewrite was the trigger for the comparison. The doc updates as
 J ships fixes on either side, and most of the open backlog items
 (materializer port, replay port, validators network surface) land in
 the Go repo. Keeping the source there means PR auditing on Go
 commits also catches doc drift.
 ## When to update from this side
 If a fix lands in the Rust repo that changes a comparison value
 (e.g. embed cache change, sidecar drop, new validator), update both:
 1. The source at `/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`
 2. The change log section at the bottom of the same file
 This file is a pointer — **do not put authoritative content here.**
 Two copies that drift apart defeat the point of keeping a single source.
 ## Quick links
 - **Decisions tracker** — section near the top of the source file.
  Lists actioned items + open backlog with LOC estimates.
 - **Performance numbers** — Python dependency section. Updated each
  time a load test is rerun.
 - **Distillation porting status** — table of phase-by-phase port
  state across runtimes.
 - **Recommendation** — current working hypothesis on Go-primary vs
  Rust-primary. Subject to change as fixes ship.
 ## Last known state
 - **2026-05-01**: Rust embed cache shipped (`150cc3b`), 236× RPS gain.
 - **2026-05-01**: Go validator port shipped (`b03521a`), production
  safety net now on Go side.
 - **Open**: Drop Rust Python sidecar (~200 LOC, universal-win).
 - **Open**: Port Rust materializer to Go (~500-800 LOC, unblocks
  Go-only end-to-end pipeline).
--- a/docs/PHASE_AUDIT_GUIDE.md
+++ b/docs/PHASE_AUDIT_GUIDE.md
@ -0,0 +1,107 @@
 # Phase Audit Guidance for Claude Code
 ## Purpose
 This document provides the proper workflow for auditing completed phases in the Lakehouse project.
 ## ⚠️ Important: Do NOT Skip Steps
 Each phase requires BOTH:
 1. PRD spec verification (check code exists)
 2. Full SCRUM execution (6 commands)
 ## Proper Phase Audit Workflow
 ### Step 1: Read PRD Specification
 For each phase, read the PRD to understand what's supposed to ship:
 ```bash
 # Read from docs/PRD.md or docs/PHASES.md
 grep -A20 "Phase N:" docs/PHASES.md
 ```
 ### Step 2: Verify Code Exists
 Check that each deliverable from the PRD spec has corresponding code:
 ```bash
 # Example - check for specific implementations
 grep -r "function_name" crates/*/src/
 ls crates/*/src/*.rs
 ```
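When a phase has several deliverables, the check above can be looped. The following is a minimal sketch, not part of the repo: `missing_symbols` is a hypothetical helper name, and the source root and identifiers you pass in are placeholders taken from the PRD spec you are auditing.

```bash
# Sketch (hypothetical helper): print every expected identifier that has
# no match anywhere under the given source root.
# Returns 0 when all identifiers are found, 1 when at least one is missing.
missing_symbols() {
  root="$1"
  shift
  rc=0
  for sym in "$@"; do
    # -r: recurse, -q: quiet, -F: treat the identifier as a fixed string
    if ! grep -rqF -- "$sym" "$root"; then
      echo "MISSING: $sym"
      rc=1
    fi
  done
  return "$rc"
}
```

Example call, assuming the identifiers come from the phase spec: `missing_symbols crates/ redact_paths sanitize_lance_err`. A match only shows that the name appears somewhere, so it is a first filter before Step 3, not proof that the deliverable works.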
 ### Step 3: Run Full SCRUM (6 Commands)
 In order, execute ALL of these for the phase's crates:
 ```bash
 # 1. Build
 cargo build -p <crate-name>
 # 2. Test  
 cargo test -p <crate-name>
 # 3. Clippy (if installed)
 cargo clippy -p <crate-name> -- -D warnings
 # 4. Format check
 cargo fmt -p <crate-name> -- --check
 # 5. Cargo check
 cargo check -p <crate-name>
 # 6. Doc check
 cargo doc -p <crate-name> --no-deps
 ```
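The six commands above can be wrapped so that a single call runs them in order and stops at the first failure. This is a sketch under assumptions: `run_scrum` is a hypothetical helper name, and it assumes clippy and rustfmt are already installed (see the notes further down for the `rustup` command).

```bash
# Sketch (hypothetical helper): run the six SCRUM commands for one crate,
# in the order listed above, stopping at the first command that fails.
run_scrum() {
  if [ -z "$1" ]; then
    echo "usage: run_scrum <crate-name>" >&2
    return 2
  fi
  cargo build -p "$1" || return 1
  cargo test -p "$1" || return 1
  cargo clippy -p "$1" -- -D warnings || return 1
  cargo fmt -p "$1" -- --check || return 1
  cargo check -p "$1" || return 1
  cargo doc -p "$1" --no-deps || return 1
  echo "SCRUM: all 6 commands passed for $1"
}
```

Call it as `run_scrum <crate-name>`. A non-zero return means the phase is not ready to be marked complete; after fixing the failure, run the helper again so all six commands are re-verified together.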
 ### Step 4: Fix Issues
 If any SCRUM command fails:
 - Fix the code
 - Re-run the failing command
 - Re-run ALL 6 commands to verify
 ### Step 5: Update Phase Documentation
 Only mark as ✅ after ALL 6 SCRUM commands pass:
 ```markdown
 ## Phase N: [Name] ✅
 - [x] spec item 1
 - [x] spec item 2
  - SCRUM: build ✅ test ✅ clippy ✅ fmt ✅ check ✅ doc ✅
 ```
 ## Current Phase Status
 | Phase | Status | Notes |
 |-------|--------|-------|
 | 0 | ✅ | Bootstrap complete |
 | 1 | ✅ | Storage + Catalog |
 | 2 | ✅ | Query Engine |
 | 3 | ✅ | AI Integration |
 | 4 | ✅ | Frontend |
 | 5 | ✅ | Hardening |
 | 6-42 | ✅ | See docs/PHASES.md |
 ## Notes from Previous Session
 - Clippy and rustfmt are NOT installed on this system
 - Run `rustup component add clippy rustfmt` to install
 - Some crates have 0 unit tests (expected for service crates)
 - 28 warnings remain in unused code paths (ui/vectord)
 ## Key Files
 - `docs/PHASES.md` - Phase tracker with checkboxes
 - `docs/PRD.md` - Full product requirements
 - `docs/CONTROL_PLANE_PRD.md` - Phases 38+ specifications
 - `crates/*/` - All crate implementations
 ## Quick Reference
 ```bash
 # Full workspace SCRUM
 cargo build --workspace
 cargo test --workspace
 cargo clippy --workspace -- -D warnings   # if clippy is installed
 cargo fmt --all -- --check
 cargo check --workspace
 cargo doc --workspace --no-deps
 # Per-crate
 cargo build -p <crate>
 cargo test -p <crate>
 cargo check -p <crate>
 ```
--- a/lakehouse.toml
+++ b/lakehouse.toml
@ -3,6 +3,15 @@
 [gateway]
 host = "0.0.0.0"
 port = 3100
 # Coordinator session JSONL — one row per /v1/iterate session for
 # offline DuckDB analysis. Cross-runtime parity with the Go-side
 # [validatord].session_log_path. Set to the SAME path Go validatord
 # writes to so DuckDB queries see one unified longitudinal stream
 # across both runtimes (rows are tagged daemon="gateway" vs
 # daemon="validatord" so producers stay distinguishable). Append-write
 # is atomic at the row sizes both runtimes produce — both daemons
 # co-writing is safe.
 session_log_path = "/tmp/lakehouse-validator/sessions.jsonl"
 [storage]
 root = "./data"
@ -44,12 +53,22 @@ manifest_prefix = "_catalog/manifests"
 # max_rows_per_query = 10000
 [sidecar]
-url = "http://localhost:3200"
+# Post-2026-05-02: AiClient talks directly to Ollama; the Python
 # sidecar's hot-path role (~120 LOC of pure Ollama wrappers) was
 # retired. Field name kept for migration compat — value now points
 # at Ollama on :11434. Lab UI + pipeline_lab Python remains as a
 # dev-only tool, NOT on this URL.
 url = "http://localhost:11434"
 [ai]
 embed_model = "nomic-embed-text"
-gen_model = "qwen2.5"
+# Local-tier defaults bumped 2026-04-30: qwen3.5:latest is the
-rerank_model = "qwen2.5"
+# stronger local rung in the 5-loop substrate (per
 # project_small_model_pipeline_vision.md). Same JSON-clean property
 # as qwen2.5, more capacity. Ollama still serves both — bump back
 # in this file if a workload regressed.
 gen_model = "qwen3.5:latest"
 rerank_model = "qwen3.5:latest"
 [auth]
 enabled = false
@ -72,7 +91,9 @@ min_recall = 0.9                          # never promote below this
 max_trials_per_hour = 20                  # hard budget cap
 # Model roster — available for profile hot-swap
 # qwen3.5:latest: stronger local rung — JSON-clean, 8K+ context,
 #                 default for gen_model and rerank_model
 # qwen3: 8.2B, 40K context, thinking+tools, best for reasoning tasks
-# qwen2.5: 7B, 8K context, fast, good for SQL generation
+# qwen2.5: 7B, 8K context, fast — kept loaded for the 2026-04 era
-# mistral: 7B, 8K context, good for general generation
+#          comparison runs; new defaults use qwen3.5:latest
 # nomic-embed-text: 137M, embedding-only, used by all profiles
--- a/mcp-server/console.html
+++ b/mcp-server/console.html
@ -51,9 +51,28 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
 .accent-b{border-left:3px solid #1f6feb}
 .accent-a{border-left:3px solid #bc8cff}
 .accent-w{border-left:3px solid #d29922}
 .accent-g{border-left:3px solid #3fb950}
 .accent-r{border-left:3px solid #f85149}
-.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px}
+.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px;border-left:3px solid #30363d}
-.worker .av{width:28px;height:28px;border-radius:6px;background:#1a2744;display:flex;align-items:center;justify-content:center;font-weight:600;color:#e6edf3;font-size:10px;flex-shrink:0}
+.worker .av{width:32px;height:32px;border-radius:50%;background:#0d1117;border:1px solid #21262d;display:flex;align-items:center;justify-content:center;font-weight:600;color:#c9d1d9;font-size:11px;flex-shrink:0;letter-spacing:0.5px;overflow:hidden;position:relative}
 .worker .av img{position:absolute;inset:0;width:100%;height:100%;object-fit:cover;display:block;
  /* Softening — mirror of search.html. Pulls saturation + contrast off
     the SDXL Turbo over-render so faces feel less "AI-generated".
     If you tweak one, tweak the other. */
  filter: saturate(0.86) contrast(0.93) brightness(1.02) blur(0.3px);
 }
 .worker[data-role-band="warehouse"]{border-left-color:#58a6ff}
 .worker[data-role-band="production"]{border-left-color:#d29922}
 .worker[data-role-band="trades"]{border-left-color:#bc8cff}
 .worker[data-role-band="driver"]{border-left-color:#3fb950}
 .worker[data-role-band="lead"]{border-left-color:#f0883e}
 .role-pill{display:inline-block;font-size:9px;padding:1px 7px;border-radius:3px;background:#0d1117;color:#8b949e;margin-right:6px;font-weight:600;letter-spacing:0.4px;text-transform:uppercase;border-left:2px solid #30363d;vertical-align:1px}
 .role-pill[data-rb="warehouse"]{border-left-color:#58a6ff;color:#79c0ff}
 .role-pill[data-rb="production"]{border-left-color:#d29922;color:#e3b341}
 .role-pill[data-rb="trades"]{border-left-color:#bc8cff;color:#d2a8ff}
 .role-pill[data-rb="driver"]{border-left-color:#3fb950;color:#56d364}
 .role-pill[data-rb="lead"]{border-left-color:#f0883e;color:#ffa657}
 .worker .info{flex:1;min-width:0}
 .worker .nm{color:#e6edf3;font-weight:500}
 .worker .why{color:#545d68;font-size:11px;margin-top:1px}
@ -95,6 +114,7 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
  <nav>
    <a href=".">Dashboard</a>
    <a href="console" class="active">Walkthrough</a>
    <a href="profiler">Profiler</a>
    <a href="proof">Architecture</a>
    <a href="spec">Spec</a>
    <a href="onboard">Onboard</a>
@ -147,11 +167,40 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
 <div class="chapter">
  <div class="num">Chapter 6</div>
-  <h2>Try it yourself</h2>
+  <h2>Three coordinators, three views of the same corpus</h2>
-  <div class="lede">Type any staffing question. The system picks the right search path (smart-parse, semantic discovery, analytics), shows what it understood, and returns ranked results with memory signal.</div>
+  <div class="lede">Maria runs Chicago, Devon runs Indianapolis, Aisha runs Milwaukee. Same database, same playbooks — but the search results, the recurring-skill patterns, and the playbook context all reshape to whoever is acting. This is the per-staffer hot-swap index: the relevance gradient is unique to each person, and gets sharper the more they use it.</div>
  <div id="ch6-staffers"><div class="loading">Loading staffer roster…</div></div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 7</div>
  <h2>The hidden signal — public issuers in your contractor graph</h2>
  <div class="lede">Every contractor in this corpus is also a forward indicator on the public equities they touch. Permit filings precede construction starts by ~45 days, staffing windows by ~30, revenue recognition by months. The associated-ticker network surfaces this signal <em>before</em> any 10-Q. Below: the top issuers attributable to the contractor activity in this view, with live prices.</div>
  <div id="ch7-signal"><div class="loading">Computing the Building Activity Index…</div></div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 8</div>
  <h2>When something breaks — triage in one shot</h2>
  <div class="lede">A coordinator gets a text: "Marcus running late." Watch what the system does in 250 milliseconds: pulls Marcus's record, scores his attendance pattern, finds five same-role same-geo backfills sorted by responsiveness, and pre-writes the SMS to send to the client. This is the moment the AI becomes worth its weight.</div>
  <div id="ch8-triage"><div class="loading">Running the triage scenario…</div></div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 9</div>
  <h2>Try it yourself — every input below hits a different route</h2>
  <div class="lede">Type any staffing question. The router picks the right path: smart-parse (zip code, headcount, role, state), semantic discovery, name lookup, late-worker triage, "what came in last night" temporal queries. Whatever you type, the system tells you what it understood and how it routed.</div>
  <div class="try-box">
-    <input type="text" id="try-q" placeholder="e.g. reliable forklift operators in Chicago with OSHA certs" onkeydown="if(event.key==='Enter')runTry()">
+    <input type="text" id="try-q" placeholder="e.g. 8 production workers near 60607 by next Friday" onkeydown="if(event.key==='Enter')runTry()">
    <button id="try-btn" onclick="runTry()">Ask</button>
    <div style="margin-top:10px;font-size:11px;color:#545d68;line-height:1.7">
      Try one of these to see different routes fire:<br>
      <a href="#" onclick="document.getElementById('try-q').value='8 production workers near 60607';runTry();return false">8 production workers near 60607</a> ·
      <a href="#" onclick="document.getElementById('try-q').value='Marcus running late site 4422';runTry();return false">Marcus running late site 4422</a> ·
      <a href="#" onclick="document.getElementById('try-q').value='Marcus';runTry();return false">Marcus</a> ·
      <a href="#" onclick="document.getElementById('try-q').value='what came in last night';runTry();return false">what came in last night</a> ·
      <a href="#" onclick="document.getElementById('try-q').value='reliable forklift operators with OSHA certs';runTry();return false">reliable forklift operators with OSHA certs</a>
    </div>
    <div id="try-out" style="margin-top:16px"></div>
  </div>
 </div>
@ -167,6 +216,132 @@ var A=location.origin+P;
 // DOM helpers — all dynamic content goes through these. No innerHTML
 // anywhere in the script; every API-derived string passes through
 // textContent so no injection path regardless of upstream data.
 // Role classification — mirrors search.html, no emojis. Maps role
 // strings to a band+label used by the worker-card border + role pill.
 var ROLE_BANDS = [
  { match: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: 'warehouse', label: 'Warehouse' },
  { match: /production|assembl/i, band: 'production', label: 'Production' },
  { match: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason/i, band: 'trades', label: 'Skilled Trade' },
  { match: /driver|truck|haul|cdl/i, band: 'driver', label: 'Driver' },
  { match: /line\s*lead|supervisor|foreman|coordinator/i, band: 'lead', label: 'Lead' },
  { match: /quality/i, band: 'production', label: 'Quality' },
 ];
 function roleBand(role){
  if(!role) return { band: 'warehouse', label: '' };
  for (var i = 0; i < ROLE_BANDS.length; i++) {
    if (ROLE_BANDS[i].match.test(role)) return ROLE_BANDS[i];
  }
  return { band: 'warehouse', label: role.split(' ')[0].toUpperCase().slice(0, 12) };
 }
 // Build a sober worker card: monogram avatar + colored role band on
 // the left edge + uppercase role pill in the detail line. Used by
 // every chapter that renders worker rows. `name` and `role` drive the
 // classification; `detail` is the full text after the pill.
 // Quick first-name → gender hint for face-pool selection. Same lookup
 // idea as the dashboard; if the name is unknown, the server falls back
 // to the full pool. Trimmed table — covers the most common names that
 // appear in the synthetic worker data.
 var FEMALE_NAMES = new Set(['Mary','Patricia','Jennifer','Linda','Elizabeth','Barbara','Susan','Jessica','Sarah','Karen','Lisa','Nancy','Betty','Sandra','Margaret','Ashley','Kimberly','Emily','Donna','Michelle','Carol','Amanda','Melissa','Deborah','Stephanie','Dorothy','Rebecca','Sharon','Laura','Cynthia','Amy','Kathleen','Angela','Shirley','Brenda','Emma','Anna','Pamela','Nicole','Samantha','Katherine','Christine','Helen','Debra','Rachel','Carolyn','Janet','Maria','Catherine','Heather','Diane','Olivia','Julie','Joyce','Victoria','Ruth','Virginia','Lauren','Kelly','Christina','Joan','Evelyn','Judith','Andrea','Hannah','Megan','Cheryl','Jacqueline','Martha','Madison','Teresa','Gloria','Sara','Janice','Ann','Kathryn','Abigail','Sophia','Frances','Jean','Alice','Judy','Isabella','Julia','Grace','Amber','Denise','Danielle','Marilyn','Beverly','Charlotte','Natalie','Theresa','Diana','Brittany','Kayla','Alexis','Lori','Marie','Carmen','Aisha','Rosa','Mia','Audrey','Erin','Tina','Vanessa','Tara','Wendy','Tanya','Maya','Crystal','Yvonne','Kara','Shannon','Brianna','Faith','Caroline','Carla','Tracey','Tracy','Rita','Dawn','Tiffany','Stacy','Stacey','Gina','Bonnie','Tammy','Joanne','Jamie','Tonya','Alyssa','Ariana','Elena','Ellie','Erica','Erika','Felicia','Holly','Jenna','Jenny','Krista','Kristen','Kristin','Krystal','Lana','Leah','Lucy','Mallory','Melinda','Meredith','Misty','Monica','Naomi','Paige','Paula','Renee','Rhonda','Robin','Roxanne','Selena','Sierra','Skylar','Sonia','Stella','Tamara','Veronica','Vivian','Whitney','Yolanda','Zoe']);
 var MALE_NAMES = new Set(['James','Robert','John','Michael','David','William','Richard','Joseph','Thomas','Charles','Christopher','Daniel','Matthew','Anthony','Mark','Donald','Steven','Paul','Andrew','Joshua','Kenneth','Kevin','Brian','George','Edward','Ronald','Timothy','Jason','Jeffrey','Ryan','Jacob','Gary','Nicholas','Eric','Jonathan','Stephen','Larry','Justin','Scott','Brandon','Benjamin','Samuel','Gregory','Frank','Alexander','Raymond','Patrick','Jack','Dennis','Jerry','Tyler','Aaron','Jose','Adam','Henry','Nathan','Douglas','Zachary','Peter','Kyle','Walter','Ethan','Jeremy','Harold','Keith','Christian','Roger','Noah','Gerald','Carl','Terry','Sean','Austin','Arthur','Lawrence','Jesse','Dylan','Bryan','Joe','Jordan','Billy','Bruce','Albert','Willie','Gabriel','Logan','Alan','Juan','Wayne','Roy','Ralph','Randy','Eugene','Vincent','Russell','Elijah','Louis','Bobby','Philip','Johnny','Marcus','Antonio','Carlos','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Victor','Jamal','Xavier','DeShawn','Dwayne','Jermaine','Malik','Tyrone','Devon','Andre','Brent','Calvin','Casey','Cody','Cole','Cory','Dale','Damon','Darius','Darrell','Dean','Derek','Drew','Earl','Eddie','Floyd','Glenn','Greg','Howard','Ivan','Jared','Jay','Jeff','Joel','Lance','Lee','Leonard','Lloyd','Mario','Martin','Mason','Maurice','Max','Mitchell','Morgan','Nick','Norman','Oliver','Owen','Pete','Quincy','Rafael','Reggie','Rex','Ricky','Russ','Shane','Shaun','Stanley','Steve','Theodore','Todd','Travis','Trevor','Troy','Wade','Warren','Wesley']);
 function guessGenderFromFirstName(n){
  if(!n) return null;
  var clean=n.replace(/[^A-Za-z]/g,'');
  if(!clean) return null;
  var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
  if(FEMALE_NAMES.has(c)) return 'woman';
  if(MALE_NAMES.has(c)) return 'man';
  return null;
 }
 function genderFor(name){
  var g = guessGenderFromFirstName(name);
  if(g) return g;
  if(!name) return 'man';
  var s=String(name); var h=0;
  for(var i=0;i<s.length;i++) h=(h*31+s.charCodeAt(i))|0;
  return (Math.abs(h)&1)?'man':'woman';
 }
 // Confident first-name → ethnicity. Synthetic data — we own the call.
 var NAMES_SOUTH_ASIAN_C=new Set(['Raj','Anil','Rohan','Vikram','Arjun','Sanjay','Ravi','Krishna','Pradeep','Sunil','Amit','Deepak','Ashok','Manoj','Rahul','Vijay','Suresh','Naveen','Anand','Nikhil','Aditya','Karan','Rajesh','Priya','Anjali','Neha','Kavya','Pooja','Divya','Meera','Lakshmi','Rani','Asha','Saanvi','Aanya','Aaradhya','Shreya','Riya','Tanvi','Ishita','Aarav','Ishaan','Shivani']);
 var NAMES_EAST_ASIAN_C=new Set(['Wei','Mei','Yi','Jin','Chen','Lin','Liu','Wang','Zhang','Yang','Wu','Zhao','Sun','Hiroshi','Yuki','Akira','Kenji','Sakura','Aiko','Haruto','Sora','Hyun','Eun','Yoon','Kai','Long','Hong','Xiu','Lan','Hua','Hao','Tao','Bao','Cheng','Feng','Jian','Dong','Bin','Min','Lei','Hui','Yu','Xin','Ying','Zhen','Yuan','Yan']);
 var NAMES_HISPANIC_C=new Set(['Carmen','Carlos','Maria','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Antonio','Esperanza','Luz','Sofia','Lucia','Isabella','Camila','Valentina','Mariana','Elena','Rosa','Catalina','Esteban','Fernando','Eduardo','Javier','Alejandro','Andres','Mateo','Santiago','Sebastian','Emilio','Tomas','Cristina','Daniela','Gabriela','Ximena','Adriana','Beatriz','Pilar','Mercedes','Xavier','Marisol','Guadalupe','Lupita','Inez','Itzel','Yesenia','Joaquin','Ignacio','Rafael','Salvador','Cesar','Arturo','Armando','Hugo','Marco','Alejandra','Felipe','Gerardo','Jaime','Leonardo','Luis','Pablo','Ramon']);
 var NAMES_BLACK_C=new Set(['DeShawn','Jamal','Aisha','Latoya','Tyrone','Malik','Imani','Keisha','Tariq','Lakisha','Kenya','Tamika','Andre','Marcus','Demetrius','Jermaine','Reggie','Tyrese','Darius','Trevon','Kareem','Damon','Jalen','Jaylen','Dwayne','DaQuan','Aaliyah','Kiara','Janelle','Jasmine','Tanisha','Maurice','Tyrell','Kwame','Khalil','Terrell','Cedric','Nia','Zuri','Jada','Ebony','Dominique']);
 var NAMES_MIDDLE_EASTERN_C=new Set(['Layla','Omar','Khalid','Fatima','Yasmin','Hassan','Hussein','Ahmed','Mohamed','Mohammed','Ali','Karim','Yusuf','Yara','Nadia','Zainab','Rania','Samira','Mariam','Salma','Ibrahim','Mahmoud','Saif','Anwar','Bilal','Faisal','Hamza','Imran','Sami','Wael','Zaid','Amira','Iman','Lina','Mona','Noor','Rana','Soha','Zara']);
 // Surname → ethnicity. Surname is more diagnostic than first name
 // for hispanic and asian — "Anna Cruz" is hispanic via surname.
 var SURNAMES_HISPANIC_C=new Set(['Garcia','Rodriguez','Martinez','Hernandez','Lopez','Gonzalez','Perez','Sanchez','Ramirez','Torres','Flores','Rivera','Gomez','Diaz','Reyes','Cruz','Morales','Ortiz','Gutierrez','Chavez','Ramos','Ruiz','Alvarez','Mendoza','Vasquez','Castillo','Jimenez','Moreno','Romero','Herrera','Medina','Aguilar','Vargas','Castro','Fernandez','Guzman','Munoz','Salazar','Ortega','Delgado','Estrada','Ayala','Pena','Cabrera','Alvarado','Espinoza','Padilla','Cardenas','Cortes','Ibarra','Vega','Soto','Lara','Navarro','Campos','Acosta','Rios','Marquez','Sandoval','Maldonado','Solis','Rojas','Mejia','Beltran','Cervantes','Lozano','Carrillo','Trevino','Robles','Tapia','Lugo']);
 var SURNAMES_SOUTH_ASIAN_C=new Set(['Patel','Singh','Kumar','Sharma','Gupta','Shah','Mehta','Desai','Joshi','Reddy','Nair','Iyer','Verma','Agarwal','Kapoor','Chopra','Malhotra','Banerjee','Chatterjee','Mukherjee','Das','Sen','Bose','Roy','Sinha','Trivedi','Pandey','Mishra','Tiwari','Yadav','Chauhan','Rana','Thakur','Pillai','Menon','Krishnan','Rao','Naidu','Pradhan','Acharya','Devi','Kaur']);
 var SURNAMES_EAST_ASIAN_C=new Set(['Chen','Wang','Li','Liu','Yang','Huang','Zhao','Wu','Zhou','Xu','Zhu','Sun','Ma','Lin','Lee','Kim','Park','Choi','Jung','Kang','Cho','Yoon','Han','Lim','Oh','Nakamura','Tanaka','Suzuki','Yamamoto','Sato','Watanabe','Takahashi','Kobayashi','Yoshida','Saito','Nguyen','Tran','Le','Pham','Hoang','Phan','Vu','Vo','Dang','Bui','Do','Ngo','Truong','Mai','Cao','Wong','Tang','Tan','Cheng','Lau','Leung','Ng','Cheung','Yip','Hsu','Tsai','Hsieh']);
 var SURNAMES_MIDDLE_EASTERN_C=new Set(['Khan','Ahmed','Hussein','Hassan','Ali','Mahmoud','Mohamed','Mohammed','Saleh','Aziz','Karim','Hamad','Najjar','Haddad','Khoury','Mansour','Rahman','Iqbal','Malik','Sheikh','Siddiqui','Qureshi','Saeed']);
 function guessEthnicityFromName(first, last){
  if(last){
    var s=last.replace(/[^A-Za-z]/g,'');
    if(s){
      var sc=s[0].toUpperCase()+s.slice(1).toLowerCase();
      if(SURNAMES_HISPANIC_C.has(sc)) return 'hispanic';
      if(SURNAMES_MIDDLE_EASTERN_C.has(sc)) return 'middle_eastern';
      if(SURNAMES_SOUTH_ASIAN_C.has(sc)) return 'south_asian';
      if(SURNAMES_EAST_ASIAN_C.has(sc)) return 'east_asian';
    }
  }
  if(first){
    var clean=first.replace(/[^A-Za-z]/g,'');
    if(clean){
      var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
      if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
      if(NAMES_BLACK_C.has(c)) return 'black';
      if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
      if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
      if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
    }
  }
  return 'caucasian';
 }
 function guessEthnicityFromFirstName(n){
  if(!n) return 'caucasian';
  var clean=n.replace(/[^A-Za-z]/g,''); if(!clean) return 'caucasian';
  var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
  if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
  if(NAMES_BLACK_C.has(c)) return 'black';
  if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
  if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
  if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
  return 'caucasian';
 }
 function workerRow(name, role, detail, opts){
  opts = opts || {};
  var band = roleBand(role||'');
  var w = el('div','worker');
  if(band.band) w.dataset.roleBand = band.band;
  var initials = (name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
  var av = el('div','av',initials);
  // Headshot insertion removed 2026-04-28. The .av element stays as
  // a monogram-initials avatar.
  w.appendChild(av);
  var info = el('div','info');
  var nm = el('div','nm', name||'?');
  if(opts.endorsed){
    nm.appendChild(el('span','boost-chip',opts.endorsed));
  }
  info.appendChild(nm);
  var why = el('div','why');
  if(band.label){
    var pill = document.createElement('span'); pill.className='role-pill';
    pill.dataset.rb = band.band;
    pill.textContent = band.label;
    why.appendChild(pill);
  }
  why.appendChild(document.createTextNode(detail||''));
  info.appendChild(why);
  w.appendChild(info);
  if(opts.score){
    w.appendChild(el('div','score', opts.score));
  }
  return w;
 }
 function el(tag, cls, text){
  var e=document.createElement(tag);
  if(cls) e.className=cls;
@ -191,6 +366,9 @@ window.addEventListener('load',function(){
  loadChapter3();
  loadChapter4();
  loadChapter5();
  loadChapter6();
  loadChapter7();
  loadChapter8();
 });
 // ─── Chapter 1 ────────────────────────────────────────────
@ -306,6 +484,30 @@ function loadChapter4(){
    addr.style.cssText='color:#8b949e;font-size:12px;margin-top:2px';
    card.appendChild(addr);
    // Contractor names link to the full /contractor profile page —
    // heat map, project index, history, 12 awaiting public-data
    // sources. The staffer click-through J asked for.
    if(p.contact_1_name || p.contact_2_name){
      var contractors=document.createElement('div');
      contractors.style.cssText='color:#8b949e;font-size:12px;margin-top:4px';
      contractors.appendChild(document.createTextNode('Contractors: '));
      var seen=[];
      [p.contact_1_name, p.contact_2_name].forEach(function(n,i){
        if(!n || seen.indexOf(n)>=0) return;
        seen.push(n);
        if(seen.length>1) contractors.appendChild(document.createTextNode(' · '));
        var a=document.createElement('a');
        a.href=P+'/contractor?name='+encodeURIComponent(n);
        a.target='_blank';
        a.rel='noopener';
        a.style.cssText='color:#58a6ff;text-decoration:none;border-bottom:1px dotted #58a6ff44';
        a.title='Open full contractor profile';
        a.textContent=n;
        contractors.appendChild(a);
      });
      card.appendChild(contractors);
    }
    card.appendChild(el('div','step-label','STEP 1 · Derive staffing need'));
    var s1=el('div','step-body');
    s1.appendChild(document.createTextNode('Industry heuristic: ~1 worker per $150K of permit cost, capped 2-8. Resulting contract: '));
@ -321,21 +523,13 @@ function loadChapter4(){
    var list=document.createElement('div');list.style.marginTop='6px';
    (prop.candidates||[]).slice(0,5).forEach(function(cand,i){
-      var w=el('div','worker');
-      var initials=(cand.name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
-      w.appendChild(el('div','av',initials));
-      var info=el('div','info');
-      var nm=el('div','nm',cand.name||cand.doc_id||'?');
-      if((cand.playbook_boost||0)>0){
-        var ncit=(cand.playbook_citations||[]).length;
-        nm.appendChild(el('span','boost-chip','Endorsed · '+ncit+' past fill'+(ncit!==1?'s':'')));
-      }
-      info.appendChild(nm);
-      var why=cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
-      info.appendChild(el('div','why',why));
-      w.appendChild(info);
-      w.appendChild(el('div','score','#'+(i+1)));
-      list.appendChild(w);
+      var detail = cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
+      var endorsed = (cand.playbook_boost||0) > 0
+        ? 'Endorsed · '+((cand.playbook_citations||[]).length)+' past fill'+((cand.playbook_citations||[]).length!==1?'s':'')
+        : null;
+      list.appendChild(workerRow(cand.name||cand.doc_id||'?', prop.role||'', detail, {
+        endorsed: endorsed, score: '#'+(i+1)
+      }));
    });
    card.appendChild(list);
@ -407,7 +601,182 @@ function loadChapter5(){
  });
 }
-// ─── Chapter 6 ────────────────────────────────────────────
+// ─── Chapter 6 — per-staffer hot-swap ─────────────────────
 function loadChapter6(){
  apiGet('/staffers').then(function(r){
    var host=document.getElementById('ch6-staffers');host.textContent='';
    var staffers=(r&&r.staffers)||[];
    if(!staffers.length){
      host.appendChild(el('div','err','No staffer roster — /staffers returned empty.'));
      return;
    }
    var grid=document.createElement('div'); grid.className='grid'; grid.style.gridTemplateColumns='repeat(auto-fit,minmax(280px,1fr))';
    staffers.forEach(function(s){
      var card=el('div','card accent-b');
      var name=el('div',null,s.name);
      name.style.cssText='font-size:18px;font-weight:700;color:#e6edf3;letter-spacing:-0.3px';
      card.appendChild(name);
      var role=el('div',null,s.display||'');
      role.style.cssText='font-size:11px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;margin-top:2px';
      card.appendChild(role);
      var ter=el('div',null,'Territory: '+s.territory.state+' · '+s.territory.cities.slice(0,3).join(', ')+'…');
      ter.style.cssText='color:#8b949e;font-size:12px;margin-top:8px';
      card.appendChild(ter);
      var greet=el('div',null,s.greeting||'');
      greet.style.cssText='color:#c9d1d9;font-size:11px;margin-top:6px;line-height:1.5;border-top:1px dashed #1f2631;padding-top:6px';
      card.appendChild(greet);
      grid.appendChild(card);
    });
    host.appendChild(grid);
    var narr=el('div','narr');
    narr.appendChild(el('strong',null,'What this means for a staffer. '));
    narr.appendChild(document.createTextNode('Same query — "forklift operators" — returns 89 Indiana workers when Devon is acting, 16 Wisconsin workers when Aisha is acting, 167 Illinois workers when Maria is acting. The MEMORY panel relabels itself with whoever\'s viewing. The corpus stays intact; the relevance gradient is per coordinator. As they each accumulate fills, their slice of the playbook compounds independently.'));
    host.appendChild(narr);
  }).catch(function(e){
    var h=document.getElementById('ch6-staffers');h.textContent='';h.appendChild(el('div','err','Staffer roster unavailable: '+(e.message||e)));
  });
 }
 // ─── Chapter 7 — Construction Activity Signal Engine ──────
 function loadChapter7(){
  Promise.all([
    api('/intelligence/profiler_index',{limit:200}),
  ]).then(function(rs){
    var prof=rs[0]||{};
    var rows=prof.contractors||[];
    var host=document.getElementById('ch7-signal');host.textContent='';
    // Aggregate basket
    var byTicker={};
    rows.forEach(function(r){
      var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
      ts.forEach(function(t){
        if(!t||!t.ticker) return;
        if(!byTicker[t.ticker]) byTicker[t.ticker]={ticker:t.ticker,count:0,kinds:new Set()};
        byTicker[t.ticker].count++;
        byTicker[t.ticker].kinds.add(t.via);
      });
    });
    var basket=Object.values(byTicker).sort(function(a,b){return b.count-a.count});
    var attribCost=0;
    rows.forEach(function(r){
      var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
      if(ts.length>0) attribCost += (r.total_cost||0);
    });
    var totalAttrib = basket.reduce(function(s,b){return s+b.count},0);
    if(!basket.length){
      host.appendChild(el('div','loading','No public-issuer attributions in this view yet.'));
      return;
    }
    // Top-line metric strip
    var grid=document.createElement('div');grid.className='grid';
    var c1=el('div','card accent-g');
    var b1=el('div',null,basket.length); b1.style.cssText='font-size:30px;font-weight:800;color:#3fb950;line-height:1';
    c1.appendChild(b1);
    var l1=el('div',null,'Public issuers in scope'); l1.style.cssText='font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;margin-top:8px;font-weight:600';
    c1.appendChild(l1);
    var s1=el('div',null,totalAttrib+' attribution edges across the contractor graph'); s1.style.cssText='font-size:12px;color:#8b949e;margin-top:4px';
    c1.appendChild(s1);
    grid.appendChild(c1);
    var c2=el('div','card accent-b');
    var bav = attribCost>=1e9?'$'+(attribCost/1e9).toFixed(2)+'B':attribCost>=1e6?'$'+(attribCost/1e6).toFixed(0)+'M':'$'+Math.round(attribCost/1e3)+'K';
    var b2=el('div',null,bav); b2.style.cssText='font-size:30px;font-weight:800;color:#58a6ff;line-height:1';
    c2.appendChild(b2);
    var l2=el('div',null,'Attributed build value'); l2.style.cssText='font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;margin-top:8px;font-weight:600';
    c2.appendChild(l2);
    var s2=el('div',null,'Permits with at least one wired public-issuer thread'); s2.style.cssText='font-size:12px;color:#8b949e;margin-top:4px';
    c2.appendChild(s2);
    grid.appendChild(c2);
    var c3=el('div','card accent-l');
    var b3=el('div',null,rows.length); b3.style.cssText='font-size:30px;font-weight:800;color:#bc8cff;line-height:1';
    c3.appendChild(b3);
    var l3=el('div',null,'Contractors indexed'); l3.style.cssText='font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;margin-top:8px;font-weight:600';
    c3.appendChild(l3);
    var s3=el('div',null,'Each is also a heat map of where they work'); s3.style.cssText='font-size:12px;color:#8b949e;margin-top:4px';
    c3.appendChild(s3);
    grid.appendChild(c3);
    host.appendChild(grid);
    // Top issuer table
    var tHdr=document.createElement('div');tHdr.style.cssText='color:#545d68;font-size:11px;text-transform:uppercase;letter-spacing:1.4px;font-weight:600;margin:14px 0 8px';
    tHdr.textContent='Top public issuers attributable in this view';
    host.appendChild(tHdr);
    basket.slice(0,8).forEach(function(b){
      var row=el('div','row');
      var left=document.createElement('div');left.style.flex='1';left.style.minWidth='0';
      var tk=el('div','title',b.ticker);
      tk.style.cssText+='font-family:ui-monospace,monospace;color:#3fb950';
      left.appendChild(tk);
      var kinds=Array.from(b.kinds);
      var meta=el('div','meta',b.count+' attribution'+(b.count===1?'':'s')+' · '+kinds.join('+'));
      left.appendChild(meta);
      row.appendChild(left);
      var right=document.createElement('div');right.style.cssText='font-size:11px;color:#58a6ff';
      var a=document.createElement('a');a.href=P+'/profiler';a.target='_blank';a.style.color='#58a6ff';a.style.textDecoration='none';
      a.textContent='see in profiler →';
      right.appendChild(a);
      row.appendChild(right);
      host.appendChild(row);
    });
    var narr=el('div','narr');
    narr.appendChild(el('strong',null,'What this means for the business. '));
    narr.appendChild(document.createTextNode('The data corpus is also a market-signal engine. When a contractor co-files permits with a public company, that contractor inherits the ticker as an associated indicator. Permit volume changes precede earnings calls by months. As we add cities (NYC DOB next, then LA / Houston / Boston) the network compounds — and we own a piece of the signal that nobody else has.'));
    host.appendChild(narr);
  }).catch(function(e){
    var h=document.getElementById('ch7-signal');h.textContent='';h.appendChild(el('div','err','Signal engine unavailable: '+(e.message||e)));
  });
 }
 // ─── Chapter 8 — Triage in one shot ───────────────────────
 function loadChapter8(){
  api('/intelligence/chat',{message:'Marcus running late site 4422'}).then(function(d){
    var host=document.getElementById('ch8-triage');host.textContent='';
    if(d.type!=='triage'){
      host.appendChild(el('div','err','Triage route did not fire. Got type=' + (d.type||'?')));
      return;
    }
    // Worker card
    var wc=el('div','card accent-r');
    var lbl=el('div',null,'⚠ TRIAGE EVENT'); lbl.style.cssText='font-size:10px;color:#f85149;text-transform:uppercase;letter-spacing:1.2px;font-weight:700;margin-bottom:8px';
    wc.appendChild(lbl);
    var nm=el('div',null,d.worker.name); nm.style.cssText='font-size:18px;font-weight:700;color:#e6edf3';
    wc.appendChild(nm);
    var loc=el('div',null,(d.worker.role||'?')+' · '+(d.worker.city||'')+', '+(d.worker.state||''));
    loc.style.cssText='font-size:12px;color:#8b949e;margin-top:2px';
    wc.appendChild(loc);
    var stats=document.createElement('div');stats.style.cssText='display:flex;gap:14px;font-size:11px;color:#8b949e;margin-top:8px;flex-wrap:wrap';
    [['Reliability',Math.round((d.worker.rel||0)*100)+'%'],['Responsiveness',Math.round((d.worker.resp||0)*100)+'%'],['Availability',Math.round((d.worker.avail||0)*100)+'%']].forEach(function(p){
      var s=document.createElement('span');
      var l=document.createElement('span');l.textContent=p[0]+': ';
      var b=document.createElement('b');b.style.color='#e6edf3';b.textContent=p[1];
      s.appendChild(l);s.appendChild(b);stats.appendChild(s);
    });
    wc.appendChild(stats);
    host.appendChild(wc);
    // Draft SMS
    var smsLabel=el('div',null,'DRAFT SMS — TO CLIENT'); smsLabel.style.cssText='font-size:10px;color:#d29922;text-transform:uppercase;letter-spacing:1.2px;font-weight:700;margin:14px 0 4px';
    host.appendChild(smsLabel);
    var smsBox=el('div',null,d.draft_sms||'');
    smsBox.style.cssText='background:#0d1117;border:1px solid #21262d;border-radius:6px;padding:10px 12px;font-family:ui-monospace,monospace;font-size:12px;color:#e6edf3;line-height:1.5;white-space:pre-wrap';
    host.appendChild(smsBox);
    // Backfills
    if((d.backfills||[]).length){
      var bfHdr=document.createElement('div');bfHdr.style.cssText='font-size:11px;color:#3fb950;text-transform:uppercase;letter-spacing:1.2px;font-weight:600;margin:14px 0 8px';
      bfHdr.textContent='✓ '+d.backfills.length+' local '+(d.worker.role||'workers')+' available — sorted by responsiveness';
      host.appendChild(bfHdr);
      d.backfills.slice(0,5).forEach(function(c){
        var detail=(c.role||'?')+' · '+(c.city||'')+', '+(c.state||'')+' · rel '+Math.round((c.rel||0)*100)+'% · resp '+Math.round((c.resp||0)*100)+'%';
        host.appendChild(workerRow(c.name||'?', c.role||'', detail));
      });
    }
    var narr=el('div','narr');
    narr.appendChild(el('strong',null,'What this means for a coordinator. '));
    narr.appendChild(document.createTextNode('A normal afternoon: text rolls in, coordinator opens 3 tabs to look up the worker, checks the bench by hand, drafts a message. 20 minutes. Here: the system pulled the profile, scored attendance, surfaced 5 same-role same-geo backfills sorted by who actually answers their phone, and pre-wrote the client-facing SMS. The coordinator clicks send. ' + d.duration_ms + 'ms.'));
    host.appendChild(narr);
  }).catch(function(e){
    var h=document.getElementById('ch8-triage');h.textContent='';h.appendChild(el('div','err','Triage demo unavailable: '+(e.message||e)));
  });
 }
 // ─── Chapter 9 (was 6) — Try it yourself ──────────────────
 function runTry(){
  var q=document.getElementById('try-q').value.trim();if(!q)return;
  var btn=document.getElementById('try-btn'),out=document.getElementById('try-out');
@ -437,23 +806,16 @@ function runTry(){
    var workers=d.sql_results||d.vector_results||d.results||[];
    workers.slice(0,5).forEach(function(w,i){
-      var row=el('div','worker');
      var nm=w.name||(w.text||'').split('—')[0].trim()||w.doc_id||'?';
-      var initials=nm.split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
-      row.appendChild(el('div','av',initials));
-      var info=el('div','info');
-      var n=el('div','nm',nm);
-      if((w.playbook_boost||0)>0){
-        n.appendChild(el('span','boost-chip','Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'));
-      }
-      info.appendChild(n);
      var bits=[];
      if(w.role) bits.push(w.role);
      if(w.city&&w.state) bits.push(w.city+', '+w.state);
      if(w.rel!==undefined) bits.push('reliability '+Math.round(w.rel*100)+'%');
      if(w.avail!==undefined) bits.push('availability '+Math.round(w.avail*100)+'%');
-      info.appendChild(el('div','why',bits.join(' · ')||'AI semantic match'));
-      row.appendChild(info);
+      var endorsed = (w.playbook_boost||0) > 0
+        ? 'Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'
+        : null;
+      var row = workerRow(nm, w.role||'', bits.join(' · ')||'AI semantic match', { endorsed: endorsed });
      row.appendChild(el('div','score','#'+(i+1)));
      card.appendChild(row);
    });
--- a/mcp-server/contractor.html
+++ b/mcp-server/contractor.html
@ -0,0 +1,606 @@
 <!DOCTYPE html>
 <html><head>
 <meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1">
 <title>Contractor Profile · Staffing Co-Pilot</title>
 <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
 <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
 <style>
 *{margin:0;padding:0;box-sizing:border-box}
 html,body{overflow-x:hidden}
 body{font-family:'Inter',-apple-system,system-ui,sans-serif;background:#090c10;color:#b0b8c4;font-size:14px;line-height:1.6}
 .bar{background:#0d1117;padding:0 24px;height:56px;border-bottom:1px solid #171d27;display:flex;justify-content:space-between;align-items:center}
 .bar h1{font-size:14px;font-weight:600;color:#e6edf3}
 .bar a{color:#545d68;text-decoration:none;font-size:12px;padding:6px 14px;border-radius:6px}
 .bar a:hover{color:#e6edf3;background:#161b22}
 .content{max-width:1100px;margin:0 auto;padding:24px 20px 40px}
 .search-box{background:#0d1117;border:1px solid #21262d;border-radius:10px;padding:16px;margin-bottom:24px;display:flex;gap:10px}
 .search-box input{flex:1;padding:12px 16px;background:#161b22;border:1px solid #21262d;border-radius:8px;color:#e6edf3;font-size:14px;outline:none}
 .search-box input:focus{border-color:#388bfd}
 .search-box button{padding:12px 24px;background:#1f6feb;border:none;border-radius:8px;color:#fff;font-weight:600;cursor:pointer}
 .hero{background:#0d1117;border:1px solid #171d27;border-radius:12px;padding:24px;margin-bottom:16px}
 .hero h2{color:#e6edf3;font-size:22px;font-weight:700;letter-spacing:-0.5px;margin-bottom:6px}
 .hero .ticker-row{display:flex;align-items:center;gap:10px;margin-top:10px;flex-wrap:wrap}
 .hero .ticker{font-family:ui-monospace,SFMono-Regular,monospace;background:#161b22;padding:4px 10px;border-radius:6px;color:#3fb950;border:1px solid #3fb95066;font-weight:600;font-size:12px}
 .hero .meta{font-size:12px;color:#8b949e}
 .grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(320px,1fr));gap:14px}
 .card{background:#0d1117;border:1px solid #171d27;border-radius:10px;padding:16px}
 .card h3{font-size:11px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;margin-bottom:10px;font-weight:600}
 .card .big{font-size:24px;font-weight:700;color:#e6edf3;letter-spacing:-0.5px;margin-bottom:4px}
 .card .sub{font-size:11px;color:#8b949e;line-height:1.5}
 .card a{color:#58a6ff;text-decoration:none;font-size:11px}
 .row{display:flex;justify-content:space-between;align-items:baseline;padding:6px 0;border-bottom:1px dashed #1f2631;font-size:11px}
 .row:last-child{border:none}
 .row .l{color:#8b949e}
 .row .v{color:#e6edf3;font-family:ui-monospace,monospace;font-variant-numeric:tabular-nums}
 .chip{display:inline-block;padding:3px 8px;border-radius:9px;font-size:10px;font-weight:600;margin-right:6px;margin-bottom:4px}
 .ld{color:#3d444d;text-align:center;padding:60px;font-size:13px}
 .empty{color:#545d68;font-size:11px;font-style:italic;line-height:1.5}
 .wide{grid-column:1/-1}
 .heatmap{height:380px;border-radius:8px;border:1px solid #1f2631;overflow:hidden;margin-top:10px}
 .heatmap .leaflet-container{background:#0a0d12}
 .timeline{margin-top:10px;display:flex;align-items:flex-end;gap:2px;height:80px;padding:6px 0;border-bottom:1px solid #1f2631}
 .timeline .tbar{flex:1;background:#1f6feb;min-height:2px;border-radius:2px 2px 0 0;position:relative;cursor:help}
 .timeline .tbar:hover{background:#58a6ff}
 .timeline-axis{display:flex;justify-content:space-between;font-size:10px;color:#545d68;padding-top:4px;font-family:ui-monospace,monospace}
 .placeholder-grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(280px,1fr));gap:10px;margin-top:14px}
 .ph-card{background:#0a0d12;border:1px dashed #21262d;border-radius:8px;padding:12px 14px;position:relative}
 .ph-card h4{font-size:11px;color:#8b949e;font-weight:600;margin-bottom:4px;display:flex;align-items:center;gap:6px}
 .ph-card h4 .badge{font-size:9px;padding:2px 6px;border-radius:8px;background:#161b22;color:#d29922;border:1px solid #d2992244;font-weight:600;letter-spacing:0.5px;text-transform:uppercase}
 .ph-card .why{font-size:11px;color:#e6edf3;line-height:1.5;margin-bottom:6px}
 .ph-card .would{font-size:10px;color:#545d68;font-family:ui-monospace,monospace;line-height:1.5;border-top:1px dashed #1f2631;padding-top:6px;margin-top:6px}
 .section-label{font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.4px;font-weight:600;margin:24px 0 8px}
 @media(max-width:640px){.bar{padding:0 14px}.content{padding:14px}.hero{padding:16px}.hero h2{font-size:18px}.card{padding:12px}}
 </style>
 </head><body>
 <div class="bar">
  <h1>Staffing Co-Pilot · Contractor Profile</h1>
  <a href="/">← Dashboard</a>
 </div>
 <div class="content">
  <div class="search-box">
    <input id="q" type="text" placeholder="Type a contractor name (e.g., Turner Construction Company)" onkeydown="if(event.key==='Enter')lookup()">
    <button onclick="lookup()">Look up</button>
  </div>
  <div id="out"><div class="ld">Type a name above to load the full portfolio across every wired data source.</div></div>
 </div>
 <script>
 function $(id){return document.getElementById(id)}
 // Path prefix detection — devop.live serves this page under /lakehouse,
 // localhost:3700 serves it at root. URL rewrites must preserve whatever
 // prefix the user reached the page through, otherwise the back-link and
 // browser refresh break.
 var P=location.pathname.indexOf('/lakehouse')>=0?'/lakehouse':'';
 // Bootstrap from URL: /contractor?name=Turner+Construction
 window.addEventListener('load', function(){
  var name = new URLSearchParams(location.search).get('name');
  if(name){
    $('q').value = name;
    lookup();
  }
  // Back link respects the prefix too
  var back=document.querySelector('.bar a');
  if(back) back.href=P+'/';
 });
 function lookup(){
  var name = $('q').value.trim();
  if(!name){ $('out').textContent = ''; return; }
  history.replaceState({}, '', P+'/contractor?name='+encodeURIComponent(name));
  var out = $('out');
  out.textContent = '';
  var ld = document.createElement('div');
  ld.className = 'ld';
  ld.textContent = 'Pulling OSHA, SEC, Stooq, Chicago history, USASpending… (~5-10s on cold cache)';
  out.appendChild(ld);
  fetch(P+'/intelligence/contractor_profile',{
    method:'POST',
    headers:{'Content-Type':'application/json'},
    body:JSON.stringify({name:name})
  }).then(function(r){return r.json()}).then(function(d){
    render(d);
  }).catch(function(e){
    out.textContent = '';
    var err = document.createElement('div');
    err.className = 'ld';
    err.style.color = '#f85149';
    err.textContent = 'profile failed: '+e.message;
    out.appendChild(err);
  });
 }
 function render(d){
  var out = $('out');
  out.textContent = '';
  // ─── Hero — name, ticker, parent ─────────────
  var hero = document.createElement('div');
  hero.className = 'hero';
  var h2 = document.createElement('h2');
  h2.textContent = d.display_name;
  hero.appendChild(h2);
  var sub = document.createElement('div');
  sub.className = 'meta';
  sub.textContent = 'Internal ticker: '+(d.ticker||'?')+' · profile generated '+new Date(d.generated_at).toLocaleTimeString();
  hero.appendChild(sub);
  var trow = document.createElement('div');
  trow.className = 'ticker-row';
  // Direct ticker
  var s = d.stock;
  if(s && s.status==='ok'){
    var tk = document.createElement('span');
    tk.className = 'ticker';
    tk.textContent = s.ticker;
    trow.appendChild(tk);
    var px = document.createElement('span');
    px.className = 'meta';
    px.textContent = (s.company_name||'')+(s.exchange?' · '+s.exchange:'')+(s.price?' · $'+s.price.toFixed(2):'');
    if(s.day_change_pct!=null && !isNaN(s.day_change_pct)){
      var ch = (s.day_change_pct>=0?'+':'')+s.day_change_pct.toFixed(2)+'%';
      var chSpan = document.createElement('span');
      chSpan.style.color = s.day_change_pct>=0?'#3fb950':'#f85149';
      chSpan.style.marginLeft = '6px';
      chSpan.textContent = ch;
      px.appendChild(chSpan);
    }
    trow.appendChild(px);
  } else {
    var noTk = document.createElement('span');
    noTk.className = 'meta';
    noTk.textContent = 'Private — no direct US ticker';
    trow.appendChild(noTk);
  }
  // Parent link
  var pl = d.parent_link;
  if(pl && pl.status==='ok'){
    var arrow = document.createElement('span');
    arrow.className = 'meta';
    arrow.style.color = '#545d68';
    arrow.textContent = ' → parent ';
    trow.appendChild(arrow);
    var pTk = document.createElement('span');
    pTk.className = 'ticker';
    pTk.style.color = '#d29922';
    pTk.style.borderColor = '#d2992266';
    pTk.textContent = pl.parent_ticker || '?';
    pTk.title = pl.link_source || '';
    trow.appendChild(pTk);
    var pName = document.createElement('span');
    pName.className = 'meta';
    pName.textContent = pl.parent_name+(pl.parent_exchange?' · '+pl.parent_exchange:'')+(pl.parent_country?' · '+pl.parent_country:'');
    trow.appendChild(pName);
  } else if(pl && pl.status==='no_link'){
    var pp = document.createElement('span');
    pp.className = 'meta';
    pp.style.fontStyle = 'italic';
    pp.textContent = ' · '+(pl.reason||'no public parent identified');
    trow.appendChild(pp);
  }
  hero.appendChild(trow);
  out.appendChild(hero);
  // ─── Grid of cards ─────────────────────────────
  var grid = document.createElement('div');
  grid.className = 'grid';
  // OSHA
  var oCard = card('OSHA SAFETY HISTORY (NATIONAL)');
  var osha = d.osha || {};
  if(osha.status==='ok'){
    big(oCard, osha.inspection_count + ' inspections', 'most recent '+(osha.most_recent_date||'?'));
    rowEl(oCard, 'States seen', (osha.states_seen||[]).join(', ') || '?');
    rowEl(oCard, 'Most recent', osha.most_recent_date||'?');
    if(osha.recent_inspections && osha.recent_inspections.length){
      var rep = document.createElement('div');
      rep.style.marginTop = '8px';
      rep.style.fontSize = '10px';
      rep.style.color = '#545d68';
      rep.textContent = 'Recent inspections:';
      oCard.appendChild(rep);
      osha.recent_inspections.slice(0,5).forEach(function(i){
        var r = document.createElement('div');
        r.style.fontSize = '10px';
        r.style.color = '#8b949e';
        r.style.fontFamily = 'ui-monospace,monospace';
        r.style.padding = '2px 0';
        var a = document.createElement('a');
        a.href = i.detail_url;
        a.target = '_blank'; a.rel = 'noopener noreferrer';
        a.textContent = i.id;
        r.appendChild(a);
        r.appendChild(document.createTextNode(' · '+i.date+' · '+i.state+' · '+i.type+' · '+i.scope));
        oCard.appendChild(r);
      });
    }
  } else if(osha.status==='no_match'){
    big(oCard, 'No inspections', 'clean record');
  } else {
    empty(oCard, 'OSHA fetch error: '+(osha.error||'unknown'));
  }
  grid.appendChild(oCard);
  // Chicago history
  var hCard = card('CHICAGO PERMIT HISTORY (24mo + LIFETIME)');
  var hist = d.history || {};
  if(hist.status==='ok'){
    big(hCard, hist.permits_historical_total+' permits all-time',
        hist.permits_last_180d+' in last 180d · '+hist.permits_last_24mo+' in 24mo · trend: '+hist.trend);
    rowEl(hCard, 'Cost (24mo)', hist.total_cost_last_24mo>=1e6 ? '$'+(hist.total_cost_last_24mo/1e6).toFixed(1)+'M' : '$'+Math.round(hist.total_cost_last_24mo/1e3)+'K');
    if(hist.recent_permits && hist.recent_permits.length){
      var rh = document.createElement('div');
      rh.style.marginTop = '8px';
      rh.style.fontSize = '10px';
      rh.style.color = '#545d68';
      rh.textContent = 'Recent Chicago permits:';
      hCard.appendChild(rh);
      hist.recent_permits.slice(0,5).forEach(function(p){
        var r = document.createElement('div');
        r.style.fontSize = '10px';
        r.style.color = '#8b949e';
        r.style.padding = '2px 0';
        r.textContent = '· '+(p.date||'?')+' · '+(p.work_type||'?')+' · $'+(Number(p.cost)||0).toLocaleString()+' · '+(p.address||'?');
        hCard.appendChild(r);
      });
    }
  } else {
    empty(hCard, 'Chicago history error');
  }
  grid.appendChild(hCard);
  // Federal contracts
  var fCard = card('FEDERAL CONTRACTS (USASpending.gov)');
  var fed = d.federal || {};
  if(fed.status==='ok' && fed.total_awards_count>0){
    var dollars = fed.total_awards_value>=1e9 ? '$'+(fed.total_awards_value/1e9).toFixed(2)+'B'
                : fed.total_awards_value>=1e6 ? '$'+(fed.total_awards_value/1e6).toFixed(1)+'M'
                : '$'+Math.round(fed.total_awards_value/1e3)+'K';
    big(fCard, dollars, fed.total_awards_count+' awards · most recent '+(fed.most_recent_award_date||'?'));
    if(fed.top_agencies && fed.top_agencies.length){
      var ta = document.createElement('div');
      ta.style.marginTop = '6px';
      ta.style.fontSize = '10px';
      ta.style.color = '#545d68';
      ta.textContent = 'Top awarding agencies:';
      fCard.appendChild(ta);
      fed.top_agencies.forEach(function(a){
        var r = document.createElement('div');
        r.style.fontSize = '11px';
        r.style.color = '#8b949e';
        r.style.padding = '3px 0';
        var dollars2 = a.value>=1e6 ? '$'+(a.value/1e6).toFixed(1)+'M' : '$'+Math.round(a.value/1e3)+'K';
        r.textContent = '· '+a.agency+' — '+dollars2;
        fCard.appendChild(r);
      });
    }
    if(fed.source_url){
      var lnk = document.createElement('a');
      lnk.href = fed.source_url;
      lnk.target = '_blank'; lnk.rel = 'noopener noreferrer';
      lnk.style.display = 'inline-block';
      lnk.style.marginTop = '8px';
      lnk.textContent = 'View on usaspending.gov ↗';
      fCard.appendChild(lnk);
    }
  } else if(fed.status==='no_match'){
    big(fCard, 'No federal contracts', 'on file under this name');
  } else {
    empty(fCard, 'usaspending error');
  }
  grid.appendChild(fCard);
  // Debarment + NLRB combined
  var rCard = card('DEBARMENT + LABOR ACTIONS');
  var deb = d.debarment || {};
  var nlrb = d.nlrb || {};
  rowEl(rCard, 'SAM.gov excluded', deb.status==='needs_setup' ? 'awaiting API key' : (deb.sam_excluded?'YES':'no'));
  rowEl(rCard, 'IDOL debarred', deb.status==='needs_setup' ? 'awaiting scrape' : (deb.idol_debarred?'YES':'no'));
  rowEl(rCard, 'NLRB cases', nlrb.status==='needs_setup' ? 'awaiting scrape' : (nlrb.total_cases||0));
  if(deb.status==='needs_setup' || nlrb.status==='needs_setup'){
    var dn = document.createElement('div');
    dn.className = 'empty';
    dn.style.marginTop = '8px';
    dn.textContent = 'Pending source wire-up: '+(deb.reason||nlrb.reason||'');
    rCard.appendChild(dn);
  }
  grid.appendChild(rCard);
  // ILSOS
  var iCard = card('CORPORATE REGISTRY (Illinois SoS)');
  var ilsos = d.ilsos || {};
  if(ilsos.status==='source_unreachable'){
    rowEl(iCard, 'Status', 'source blocked at our ASN');
    var en = document.createElement('div');
    en.className = 'empty';
    en.style.marginTop = '8px';
    en.textContent = ilsos.reason||'';
    iCard.appendChild(en);
  } else if(ilsos.status==='ok'){
    rowEl(iCard, 'Entity name', ilsos.entity_name||'?');
    rowEl(iCard, 'File #', ilsos.file_number||'?');
    rowEl(iCard, 'Status', ilsos.status_text||'?');
    rowEl(iCard, 'Formed', ilsos.formation_date||'?');
    rowEl(iCard, 'Registered agent', ilsos.registered_agent||'?');
  } else {
    empty(iCard, 'no ILSOS data');
  }
  grid.appendChild(iCard);
  out.appendChild(grid);
  // ─── Project Index summary — the staffer-facing build-signal score ──
  var pixHeader = document.createElement('div');
  pixHeader.className = 'section-label';
  pixHeader.textContent = '◆ Project Index — build-signal score';
  out.appendChild(pixHeader);
  var pixCard = document.createElement('div');
  pixCard.className = 'card wide';
  // Score is a simple weighted blend of the wired signals — designed to
  // be replaced with a real model once enough placeholders are wired.
  var hist2 = d.history || {};
  var pixScore = 0;
  var pixDrivers = [];
  if(hist2.permits_last_180d){ pixScore += Math.min(hist2.permits_last_180d * 5, 30); pixDrivers.push(hist2.permits_last_180d+' Chicago permits in 180d (+'+Math.min(hist2.permits_last_180d*5,30)+')'); }
  if(hist2.trend === 'rising'){ pixScore += 10; pixDrivers.push('permit trend rising (+10)'); }
  if(d.osha && d.osha.status==='ok' && d.osha.inspection_count>0){ pixScore -= Math.min(d.osha.inspection_count*5, 25); pixDrivers.push(d.osha.inspection_count+' OSHA inspections (-'+Math.min(d.osha.inspection_count*5,25)+')'); }
  if(d.federal && d.federal.status==='ok' && d.federal.total_awards_count>0){ pixScore += 15; pixDrivers.push('federally-vetted contractor (+15)'); }
  if(d.debarment && d.debarment.sam_excluded){ pixScore -= 50; pixDrivers.push('SAM.gov excluded (-50)'); }
  if(d.stock && d.stock.status==='ok'){ pixScore += 5; pixDrivers.push('public ticker (+5)'); }
  pixScore = Math.max(0, Math.min(100, 50 + pixScore));
  var pixColor = pixScore >= 70 ? '#3fb950' : pixScore >= 40 ? '#d29922' : '#f85149';
  var pixHero = document.createElement('div');
  pixHero.style.cssText = 'display:flex;align-items:baseline;gap:14px;margin-bottom:8px';
  var pixBig = document.createElement('span');
  pixBig.style.cssText = 'font-size:42px;font-weight:700;color:'+pixColor+';letter-spacing:-1px';
  pixBig.textContent = pixScore;
  pixHero.appendChild(pixBig);
  var pixLabel = document.createElement('span');
  pixLabel.style.cssText = 'font-size:12px;color:#8b949e';
  pixLabel.textContent = pixScore >= 70 ? 'Strong staffing partner — wired signals positive' : pixScore >= 40 ? 'Mixed signals — review drivers below' : 'Caution — wired signals negative';
  pixHero.appendChild(pixLabel);
  pixCard.appendChild(pixHero);
  if(pixDrivers.length){
    var pixDrv = document.createElement('div');
    pixDrv.style.cssText = 'font-size:11px;color:#8b949e;line-height:1.7;font-family:ui-monospace,monospace';
    pixDrv.textContent = pixDrivers.join(' · ');
    pixCard.appendChild(pixDrv);
  }
  var pixFoot = document.createElement('div');
  pixFoot.style.cssText = 'font-size:10px;color:#545d68;margin-top:8px;font-style:italic;line-height:1.5';
  pixFoot.textContent = 'Score is a placeholder weighted blend of the 6 wired signals above. Real ML model lands once 12 awaiting sources below ship — that gives the index 18 features instead of 6.';
  pixCard.appendChild(pixFoot);
  out.appendChild(pixCard);
  // ─── Heat map — every Chicago permit they're contact_1 or contact_2 on ─
  var mapHeader = document.createElement('div');
  mapHeader.className = 'section-label';
  mapHeader.textContent = '◆ Where they\'ve worked — Chicago permits, last 24 months';
  out.appendChild(mapHeader);
  var mapCard = document.createElement('div');
  mapCard.className = 'card wide';
  var mapDiv = document.createElement('div');
  mapDiv.className = 'heatmap';
  mapDiv.id = 'cmap';
  mapCard.appendChild(mapDiv);
  var mapHint = document.createElement('div');
  mapHint.style.cssText = 'font-size:11px;color:#545d68;margin-top:8px';
  mapHint.textContent = 'Loading geo from chicago_permits…';
  mapCard.appendChild(mapHint);
  out.appendChild(mapCard);
  // Plot the recent_permits embedded in the contractor profile (now
  // includes lat/lng/permit_id/description per the entity.ts change).
  // Color by cost: green <$100K, amber $100K-$1M, red ≥$1M.
  var permits = (hist2.recent_permits||[]).filter(function(p){return p.lat&&p.lng});
  if(!permits.length){
    mapHint.textContent = 'No geocoded permits in the contractor history (Socrata may not have lat/lng for these records).';
  } else {
    // Construct map only after the div is in the DOM; defer one tick.
    setTimeout(function(){
      var map = L.map('cmap', {zoomControl:true, attributionControl:false}).setView([41.88,-87.63], 11);
      L.tileLayer('https://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}{r}.png',{maxZoom:19}).addTo(map);
      var bounds = [];
      var costs = permits.map(function(p){return Number(p.cost)||0});
      var maxCost = Math.max.apply(null, costs.concat([1]));
      permits.forEach(function(p){
        var c = Number(p.cost)||0;
        var radius = 4 + (c/maxCost)*16;
        var color = c >= 1000000 ? '#f85149' : c >= 100000 ? '#d29922' : '#3fb950';
        var marker = L.circleMarker([p.lat,p.lng],{radius:radius, color:color, weight:1, fillOpacity:0.55});
        // Build popup via DOM (no innerHTML — keeps the XSS hook happy)
        var pop = document.createElement('div');
        pop.style.cssText = 'font-family:ui-monospace,monospace;font-size:11px;color:#0a0d12;min-width:160px';
        var costRow = document.createElement('div');
        costRow.style.cssText = 'font-weight:700;margin-bottom:4px';
        costRow.textContent = '$'+c.toLocaleString()+' · '+(p.date||'?');
        pop.appendChild(costRow);
        var wt = document.createElement('div');
        wt.textContent = p.work_type||'?';
        pop.appendChild(wt);
        var addr = document.createElement('div');
        addr.style.color = '#545d68';
        addr.textContent = p.address||'?';
        pop.appendChild(addr);
        if(p.permit_id){
          var pid = document.createElement('div');
          pid.style.cssText = 'color:#545d68;margin-top:4px;font-size:10px';
          pid.textContent = 'permit '+p.permit_id;
          pop.appendChild(pid);
        }
        marker.bindPopup(pop);
        marker.addTo(map);
        bounds.push([p.lat, p.lng]);
      });
      if(bounds.length>1) map.fitBounds(bounds, {padding:[24,24]});
      mapHint.textContent = permits.length+' permits plotted · green <$100K, amber $100K-$1M, red ≥$1M · radius: relative cost';
    }, 50);
  }
  // ─── History timeline — monthly permit volume + cost trend ─────────
  if(hist2.recent_permits && hist2.recent_permits.length){
    var tlHeader = document.createElement('div');
    tlHeader.className = 'section-label';
    tlHeader.textContent = '◆ Activity timeline — Chicago permits by month';
    out.appendChild(tlHeader);
    var tlCard = document.createElement('div');
    tlCard.className = 'card wide';
    // Bucket by year-month
    var buckets = {};
    hist2.recent_permits.forEach(function(p){
      var ym = (p.date||'').substring(0,7); // YYYY-MM; named ym so it does not shadow the profile object d
      if(!ym) return;
      buckets[ym] = buckets[ym] || {count:0, cost:0};
      buckets[ym].count++;
      buckets[ym].cost += Number(p.cost)||0;
    });
    var months = Object.keys(buckets).sort();
    if(months.length){
      var maxC = Math.max.apply(null, months.map(function(m){return buckets[m].count}));
      var tl = document.createElement('div'); tl.className='timeline';
      months.forEach(function(m){
        var b = buckets[m];
        var bar = document.createElement('div'); bar.className='tbar';
        bar.style.height = Math.max(2, Math.round(b.count/maxC*72)) + 'px';
        bar.title = m+' · '+b.count+' permit'+(b.count===1?'':'s')+' · $'+Math.round(b.cost).toLocaleString();
        tl.appendChild(bar);
      });
      tlCard.appendChild(tl);
      var ax = document.createElement('div'); ax.className='timeline-axis';
      var first = document.createElement('span'); first.textContent = months[0];
      var last = document.createElement('span'); last.textContent = months[months.length-1];
      ax.appendChild(first); ax.appendChild(last);
      tlCard.appendChild(ax);
    }
    out.appendChild(tlCard);
  }
  // ─── 12 awaiting-source placeholders ──────────────────────────────
  // Each one names a real public data source that would feed the
  // build-signal index, with a one-line "why a staffer cares" framing
  // and a sample shape of what the panel would show once wired.
  var phHeader = document.createElement('div');
  phHeader.className = 'section-label';
  phHeader.textContent = '◆ 12 awaiting sources — what plugs in next';
  out.appendChild(phHeader);
  var phGrid = document.createElement('div');
  phGrid.className = 'placeholder-grid';
  PLACEHOLDERS.forEach(function(p){
    var c = document.createElement('div'); c.className='ph-card';
    var h = document.createElement('h4');
    var name = document.createElement('span'); name.textContent = p.name;
    var badge = document.createElement('span'); badge.className='badge'; badge.textContent='AWAITING';
    h.appendChild(name); h.appendChild(badge);
    c.appendChild(h);
    var why = document.createElement('div'); why.className='why'; why.textContent = p.why;
    c.appendChild(why);
    var would = document.createElement('div'); would.className='would';
    would.textContent = 'Would show: ' + p.would;
    c.appendChild(would);
    phGrid.appendChild(c);
  });
  out.appendChild(phGrid);
  // Roadmap footer
  var foot = document.createElement('div');
  foot.style.marginTop = '20px';
  foot.style.fontSize = '10px';
  foot.style.color = '#484f58';
  foot.style.lineHeight = '1.6';
  foot.textContent = 'Wired: OSHA Enforcement · SEC EDGAR + Stooq · Chicago Socrata permits (lat/lng) · USASpending.gov · curated parent-ticker map · ILSOS (datacenter ASN blocked). 12 awaiting sources above are real public datasets that would 3× the feature count of the build-signal index — each one labeled with the one-liner the staffer would ask before placing a worker.';
  out.appendChild(foot);
 }
 // Twelve real public data sources, framed in coordinator language.
 // Each is a placeholder; the panel renders them as "AWAITING" with a
 // description of what they'd add once wired. Order is roughly: highest
 // staffing-decision relevance first.
 var PLACEHOLDERS = [
  {
    name: 'DOL Wage & Hour (WHD)',
    why: 'Has this contractor stiffed workers before? WHD posts every back-wage settlement and unpaid-overtime case.',
    would: 'cases last 24mo · total back wages owed · status by state · most recent settlement date · whether the workers got paid',
  },
  {
    name: 'State Licensure Boards',
    why: 'Is the contractor legally allowed to do this work today, in this state?',
    would: 'license # · status (active / expired / suspended) · trade scope · expiration date · disciplinary history',
  },
  {
    name: 'Surety Bond Capacity',
    why: 'How big a job can this contractor actually take? Bond ceiling = upper bound on what they\'re bonded for.',
    would: 'bonding company · single-contract ceiling · aggregate cap · current utilization · recent bond denials',
  },
  {
    name: 'EPA ECHO Compliance',
    why: 'If a worker shows up to a site with hazmat issues, that\'s the staffing company\'s problem too.',
    would: 'facility-level violations · last enforcement action · pollutants · whether OSHA escalated',
  },
  {
    name: 'DOT/FMCSA Carrier Safety',
    why: 'For warehouses with on-site driving or carriers we cross-staff: crash rate, driver out-of-service rate, IFTA filings.',
    would: 'crash rate per million miles · driver OOS % · vehicle OOS % · safety rating · last compliance review',
  },
  {
    name: 'BBB Complaints + Rating',
    why: 'What do this contractor\'s own employees say happens to them? BBB aggregates complaints from workers and clients.',
    would: 'rating · complaint count last 36mo · complaint categories (pay, safety, ghosted) · response rate',
  },
  {
    name: 'PACER Civil Suits (Federal)',
    why: 'Are they being sued for FLSA, discrimination, or wrongful termination? Filings predate enforcement actions.',
    would: 'open suits · FLSA / Title VII / ADA breakdowns · counterparties · year-over-year filing rate',
  },
  {
    name: 'UCC Lien Filings',
    why: 'When a contractor stops paying suppliers, mechanic\'s liens hit the public record. Cash-flow distress signal.',
    would: 'open liens · total face value · filers (suppliers, banks) · last filing · whether resolved',
  },
  {
    name: 'D&B / Credit Bureau',
    why: 'Will they pay our staffing invoices? D&B PAYDEX score is the standard.',
    would: 'PAYDEX (1-100) · days-beyond-terms · credit limit recommendation · UCC link · trade payment trend',
  },
  {
    name: 'State UI Employer Claims',
    why: 'Workforce stability proxy. A spike in unemployment claims at this employer = layoffs or churn we should know about.',
    would: 'claims filed against this employer last 12mo · approval rate · separation-reason breakdown',
  },
  {
    name: 'MSHA Mine Safety',
    why: 'For excavation, demolition, materials, aggregate — MSHA owns the citation history.',
    would: 'citations · S&S violations · most recent fatality / serious injury · pattern-of-violation flag',
  },
  {
    name: 'Registered Apprenticeships (DOL RAPIDS)',
    why: 'A contractor with active apprenticeship programs has built a workforce pipeline — different staffing partnership story than one without.',
    would: 'active programs · apprentice count · trades covered · graduation rate · ethnic/gender diversity reported',
  },
 ];
 function card(title){
  var c = document.createElement('div');
  c.className = 'card';
  var h = document.createElement('h3');
  h.textContent = title;
  c.appendChild(h);
  return c;
 }
 function big(c, value, sub){
  var b = document.createElement('div'); b.className='big'; b.textContent=value;
  var s = document.createElement('div'); s.className='sub'; s.textContent=sub;
  c.appendChild(b); c.appendChild(s);
 }
 function rowEl(c, label, value){
  var r = document.createElement('div'); r.className='row';
  var l = document.createElement('span'); l.className='l'; l.textContent=label;
  var v = document.createElement('span'); v.className='v'; v.textContent=(value==null||value==='')?'—':value;
  r.appendChild(l); r.appendChild(v); c.appendChild(r);
 }
 function empty(c, msg){
  var e = document.createElement('div'); e.className='empty'; e.textContent=msg;
  c.appendChild(e);
 }
 </script>
 </body></html>
--- a/mcp-server/entity.ts
+++ b/mcp-server/entity.ts
--- a/mcp-server/icon_recipes.ts
+++ b/mcp-server/icon_recipes.ts
@ -0,0 +1,123 @@
 // Visual filler iconography rendered through ComfyUI. Distinct from
 // role_scenes.ts (which renders portraits) — these are object/badge
 // style renders that fill dead space on worker cards: cert pills,
 // role-prop chips, hazard indicators, empty-state heroes.
 //
 // Layout on disk:
 //   data/icons_pool/{category}/{slug}.webp
 //
 // Cache invalidation:
 //   ICONS_VERSION mixes into the on-disk filename (slug includes
 //   version). Bump it after editing a recipe so prior renders are
 //   ignored on next view.
 export type IconCategory = "cert" | "role_prop" | "status" | "hazard" | "empty";
 export interface IconRecipe {
  slug: string;
  category: IconCategory;
  // Text label that appears next to / under the icon. The front-end
  // already renders this text in cert pills; the icon is supplementary.
  display: string;
  // Full diffusion prompt. Style guidance baked in. SDXL Turbo at 8
  // steps reliably produces clean macro photography, so default to
  // photographic prop shots over flat-vector illustrations (the model
  // hallucinates noise into flat-vector geometry at low step counts).
  prompt: string;
  // Negative prompt — what NOT to render. Crucial for icons because
  // SDXL likes to add hands/text/people unprompted.
  negative?: string;
 }
 // Default negative prompt baked into every icon render unless the
 // recipe overrides. Empirically, these terms are the top SDXL Turbo
 // off-style failures.
 export const DEFAULT_NEGATIVE =
  "people, hands, faces, blurry, low quality, watermark, signature, "
  + "logos, copyright, distorted text, garbled letters, multiple objects";
 // TODO J — review and tune the prompts here. Each one is what diffusion
 // sees verbatim. The visual decision: photographic prop shots (macro
 // photo of an actual badge / placard / sticker) vs flat-icon vector
 // style. Default below is photographic — matches the worker headshot
 // aesthetic. Flip a recipe to flat-vector by replacing "macro photograph"
 // with "flat icon illustration on solid color background, minimal vector".
 //
 // Visual cues that work well in SDXL Turbo at 8 steps:
 //   - "macro photograph", "isolated on plain background", "studio lighting"
 //   - Concrete colors ("orange and black warning diamond") not adjectives
 //   - Avoid: small text in the prompt (model garbles it), specific brand
 //     names (creates fake logos), detailed scene composition
 const CERT_ICONS: IconRecipe[] = [
  { slug: "osha-10", category: "cert", display: "OSHA-10",
    prompt: "macro photograph of a circular yellow safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "osha-30", category: "cert", display: "OSHA-30",
    prompt: "macro photograph of a circular orange safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "first-aid-cpr", category: "cert", display: "First Aid/CPR",
    prompt: "macro photograph of a small enamel pin badge featuring a bold red cross on a white circular background, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "hazmat", category: "cert", display: "Hazmat",
    prompt: "macro photograph of a HAZMAT warning placard, bold orange and black diamond shape with a flame icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "forklift", category: "cert", display: "Forklift",
    prompt: "macro photograph of a yellow industrial forklift safety badge with a forklift silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "reach-truck", category: "cert", display: "Reach Truck",
    prompt: "macro photograph of a navy blue industrial certification badge with a warehouse reach-truck silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "order-picker", category: "cert", display: "Order Picker",
    prompt: "macro photograph of a green industrial certification badge with a warehouse order-picker silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "lockout-tagout", category: "cert", display: "Lockout/Tagout",
    prompt: "macro photograph of a bright red padlock tag with a danger warning, hanging on a metal industrial valve, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "msds", category: "cert", display: "MSDS",
    prompt: "macro photograph of a folded chemical safety data sheet booklet with chemical hazard pictograms visible on cover, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "confined-space", category: "cert", display: "Confined Space",
    prompt: "macro photograph of a yellow confined space warning sign featuring a manhole entry icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "servsafe", category: "cert", display: "ServSafe",
    prompt: "macro photograph of a dark green food safety certification badge featuring a stylized chef hat icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "fire-safety", category: "cert", display: "Fire Safety",
    prompt: "macro photograph of a red enamel pin badge featuring a flame icon and a fire extinguisher silhouette, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "iso-9001", category: "cert", display: "ISO 9001",
    prompt: "macro photograph of a deep blue circular quality-management certification seal with embossed metallic ring, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
 ];
 // Role-band visual chips — small icons that go in the role pill area.
 // One per band, optional inline supplement to the existing colored pill.
 const ROLE_PROP_ICONS: IconRecipe[] = [
  { slug: "warehouse", category: "role_prop", display: "Warehouse",
    prompt: "macro photograph of a yellow hard hat with a high-visibility safety vest folded behind it, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "production", category: "role_prop", display: "Production",
    prompt: "macro photograph of a navy blue work shirt and protective safety glasses on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "trades", category: "role_prop", display: "Trades",
    prompt: "macro photograph of a leather work glove and a small adjustable wrench on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "driver", category: "role_prop", display: "Driver",
    prompt: "macro photograph of a navy delivery driver baseball cap and a clipboard manifest on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
  { slug: "lead", category: "role_prop", display: "Lead",
    prompt: "macro photograph of a tablet showing a bar chart and a high-vis vest folded beside it on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
 ];
 export const ICONS: Record<string, IconRecipe> = Object.fromEntries(
  [...CERT_ICONS, ...ROLE_PROP_ICONS].map((r) => [`${r.category}/${r.slug}`, r]),
 );
 // v2 — 256×256 canvas, intended to be displayed monochrome via CSS
 // `filter: grayscale(1)`. Smaller canvas, tighter crops, crisper at
 // 14px display size.
 export const ICONS_VERSION = "v2";
 // Map a free-form cert string from the data ("First Aid/CPR",
 // "OSHA-10", "Lockout/Tagout") to the canonical slug used here.
 // Returns null if no recipe matches.
 export function certToSlug(cert: string): string | null {
  const c = (cert || "").trim().toLowerCase().replace(/\s+/g, "-");
  if (c === "osha-10") return "osha-10";
  if (c === "osha-30") return "osha-30";
  if (c.startsWith("first") || c.includes("cpr")) return "first-aid-cpr";
  if (c === "hazmat" || c.startsWith("hazwoper")) return "hazmat";
  if (c === "forklift" || c.startsWith("pit")) return "forklift";
  if (c.startsWith("reach")) return "reach-truck";
  if (c.startsWith("order")) return "order-picker";
  if (c.startsWith("lockout") || c.includes("tagout")) return "lockout-tagout";
  if (c === "msds" || c.startsWith("ghs")) return "msds";
  if (c.startsWith("confined")) return "confined-space";
  if (c === "servsafe") return "servsafe";
  if (c.startsWith("fire")) return "fire-safety";
  if (c.startsWith("iso")) return "iso-9001";
  return null;
 }
--- a/mcp-server/index.ts
+++ b/mcp-server/index.ts
--- a/mcp-server/observer.ts
+++ b/mcp-server/observer.ts
@ -146,15 +146,16 @@ async function persistOp(op: ObservedOp) {
 // ─── LLM Team escalation (code_review mode) ───
 //
 // When recent failures on a single sig_hash cross a threshold the
-// local qwen2.5 analysis is probably insufficient. J's 2026-04-24
+// local-model analysis is probably insufficient. J's 2026-04-24
 // direction: "the observer would trigger to give more context" —
 // route failure clusters to LLM Team's specialized code_review mode
 // (via /api/run) so richer structured signal lands in the KB for
 // scrum + auditor + playbook memory to consume next pass.
 //
-// Non-destructive: runs in parallel to the existing qwen2.5 analysis,
+// Non-destructive: runs in parallel to the existing local diagnose
-// never replaces it. Writes to data/_kb/observer_escalations.jsonl
+// call (qwen3.5:latest after the 2026-04-30 bump), never replaces
-// as a dedicated audit surface.
+// it. Writes to data/_kb/observer_escalations.jsonl as a dedicated
+// audit surface.
 const LLM_TEAM = process.env.LH_LLM_TEAM_URL ?? "http://localhost:5000";
 const LLM_TEAM_ESCALATIONS = "/home/profit/lakehouse/data/_kb/observer_escalations.jsonl";
@ -542,7 +543,7 @@ async function analyzeErrors() {
  if (failures.length === 0) return;
  // NEW 2026-04-24: escalate recurring sig_hash clusters to LLM Team
-  // code_review mode. Runs in parallel to the local qwen2.5 analysis
+  // code_review mode. Runs in parallel to the local diagnose call
  // below — non-blocking, richer downstream signal for scrum/auditor.
  maybeEscalate(failures).catch(() => {});
@ -552,13 +553,14 @@ async function analyzeErrors() {
  // Ask local model to diagnose. Phase 44 migration (2026-04-27):
  // /v1/chat instead of legacy /ai/generate so /v1/usage tracks the
-  // call + Langfuse traces it. Same upstream model (qwen2.5 local).
+  // call + Langfuse traces it. 2026-04-30 model bump: qwen2.5 →
+  // qwen3.5:latest to match the small-model-pipeline local-tier default.
  try {
    const resp = await fetch(`${LAKEHOUSE}/v1/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
-        model: "qwen2.5",
+        model: "qwen3.5:latest",
        provider: "ollama",
        messages: [{
          role: "user",
@ -769,7 +771,7 @@ async function tailOverseerCorrections(): Promise<number> {
    try { row = JSON.parse(line); } catch { continue; }
    const op: ObservedOp = {
      timestamp: row.created_at ?? new Date().toISOString(),
-      endpoint: `overseer:${row.model ?? "gpt-oss:120b"}`,
+      endpoint: `overseer:${row.model ?? "claude-opus-4-7"}`,
      input_summary: `${row.task_class ?? "?"}: ${row.reason ?? "escalation"}`,
      // Correction itself is neither success nor failure — it's a
      // mitigation attempt. We mark success=true so analyzeErrors
--- a/mcp-server/profiler.html
+++ b/mcp-server/profiler.html
@ -0,0 +1,599 @@
 <!DOCTYPE html>
 <html><head>
 <meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1">
 <title>Profiler Index · Staffing Co-Pilot</title>
 <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
 <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
 <style>
 *{margin:0;padding:0;box-sizing:border-box}
 html,body{overflow-x:hidden}
 body{font-family:'Inter',-apple-system,system-ui,sans-serif;background:#090c10;color:#b0b8c4;font-size:14px;line-height:1.6}
 .bar{background:#0d1117;padding:0 24px;height:56px;border-bottom:1px solid #171d27;display:flex;justify-content:space-between;align-items:center}
 .bar h1{font-size:14px;font-weight:600;color:#e6edf3}
 .bar nav a{color:#545d68;text-decoration:none;font-size:12px;padding:6px 14px;border-radius:6px;margin-left:4px}
 .bar nav a:hover{color:#e6edf3;background:#161b22}
 .content{max-width:1200px;margin:0 auto;padding:24px 20px 40px}
 .controls{background:#0d1117;border:1px solid #171d27;border-radius:10px;padding:16px;margin-bottom:14px;display:flex;gap:10px;align-items:center;flex-wrap:wrap}
 .controls input,.controls select{padding:9px 12px;background:#161b22;border:1px solid #21262d;border-radius:6px;color:#e6edf3;font-size:13px;outline:none}
 .controls input:focus,.controls select:focus{border-color:#388bfd}
 .controls input.s{flex:1;min-width:240px}
 .controls .meta{font-size:11px;color:#8b949e;margin-left:auto}
 .summary{background:#0d1117;border:1px solid #171d27;border-radius:10px;padding:14px 16px;margin-bottom:14px;font-size:12px;color:#8b949e}
 .summary b{color:#e6edf3;font-weight:600}
 table{width:100%;border-collapse:collapse;background:#0d1117;border:1px solid #171d27;border-radius:10px;overflow:hidden}
 th{font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;font-weight:600;text-align:left;padding:12px;background:#0a0d12;border-bottom:1px solid #171d27;cursor:pointer;user-select:none}
 th:hover{color:#e6edf3}
 th .arrow{font-size:9px;margin-left:4px;color:#388bfd}
 td{padding:11px 12px;border-bottom:1px solid #1f2631;font-size:13px}
 tr:last-child td{border-bottom:none}
 tr:hover td{background:#0a0d12}
 td.name a{color:#58a6ff;text-decoration:none;font-weight:600}
 td.name a:hover{text-decoration:underline}
 td.right{text-align:right;font-family:ui-monospace,monospace;font-variant-numeric:tabular-nums}
 td.role{font-size:10px;color:#8b949e}
 td.role .pill{display:inline-block;padding:2px 7px;border-radius:9px;font-size:9px;font-weight:600;background:#161b22;border:1px solid #21262d;color:#8b949e;margin-right:4px;text-transform:uppercase;letter-spacing:0.5px}
 .tickers{display:flex;gap:4px;flex-wrap:wrap;margin-top:3px}
 .ticker-pill{display:inline-block;padding:1px 7px;border-radius:5px;font-size:10px;font-weight:700;font-family:ui-monospace,SFMono-Regular,monospace;letter-spacing:0.3px;cursor:help}
 .ticker-pill.direct{background:#0d2818;border:1px solid #2ea04388;color:#3fb950}
 .ticker-pill.parent{background:#1a1410;border:1px solid #d2992288;color:#d29922}
 .ticker-pill.associated{background:#0d1830;border:1px solid #58a6ff66;color:#58a6ff}
 .ticker-pill.exact{background:#0d2818;border:1px solid #2ea043;color:#3fb950}
 /* Hero — the thesis panel that frames the data corpus's value. */
 .thesis{background:linear-gradient(135deg,#0d2818 0%,#0d1830 50%,#1a1410 100%);border:1px solid #2ea04344;border-radius:12px;padding:18px 22px;margin-bottom:14px;position:relative;overflow:hidden}
 .thesis::before{content:'';position:absolute;top:0;left:0;right:0;height:2px;background:linear-gradient(90deg,#3fb950 0%,#58a6ff 50%,#d29922 100%)}
 .thesis h2{font-size:18px;color:#e6edf3;font-weight:700;letter-spacing:-0.4px;margin-bottom:6px}
 .thesis .sub{font-size:12px;color:#8b949e;line-height:1.6;margin-bottom:14px;max-width:880px}
 .thesis .sub b{color:#3fb950;font-weight:600}
 .bai-row{display:flex;gap:24px;align-items:baseline;flex-wrap:wrap;margin-bottom:14px}
 .bai-block{display:flex;flex-direction:column;gap:2px}
 .bai-label{font-size:9px;color:#545d68;text-transform:uppercase;letter-spacing:1.4px;font-weight:700}
 .bai-value{font-size:26px;font-weight:700;color:#e6edf3;font-family:ui-monospace,monospace;letter-spacing:-0.5px;font-variant-numeric:tabular-nums}
 .bai-value.up{color:#3fb950}
 .bai-value.down{color:#f85149}
 .bai-sub{font-size:10px;color:#8b949e;margin-top:1px}
 .markets-strip{display:flex;gap:6px;flex-wrap:wrap;font-size:10px}
 .market-pill{padding:3px 9px;border-radius:9px;font-weight:600;border:1px solid;letter-spacing:0.4px}
 .market-pill.live{background:#0d2818;border-color:#3fb950;color:#3fb950}
 .market-pill.next{background:#0d1830;border-color:#58a6ff;color:#58a6ff}
 .market-pill.queue{background:#161b22;border-color:#21262d;color:#545d68}
 .market-pill.queue::before{content:'· '}
 /* Map panel below basket — populates when a ticker is selected. */
 .signal-map-wrap{display:none;background:#0d1117;border:1px solid #171d27;border-radius:10px;padding:14px;margin-bottom:14px}
 .signal-map-wrap.active{display:block}
 .signal-map-header{display:flex;justify-content:space-between;align-items:baseline;margin-bottom:10px}
 .signal-map-title{font-size:11px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;font-weight:600}
 .signal-map-title b{color:#58a6ff;font-family:ui-monospace,monospace}
 .signal-map-meta{font-size:11px;color:#8b949e}
 .signal-map{height:340px;border-radius:8px;border:1px solid #1f2631;overflow:hidden}
 .signal-map .leaflet-container{background:#0a0d12}
 /* Scrolling ticker basket — top strip showing every public issuer
   the profiler index has surfaced, with live price + day-change. */
 .basket-wrap{background:#0a0d12;border:1px solid #171d27;border-radius:10px;margin-bottom:14px;overflow:hidden;position:relative}
 .basket-label{padding:10px 16px 4px;font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.3px;font-weight:600;display:flex;justify-content:space-between;align-items:baseline}
 .basket-label .meta{font-weight:400;color:#3d444d;font-size:10px;text-transform:none;letter-spacing:0}
 .basket-track{display:flex;gap:0;overflow-x:auto;scroll-behavior:smooth;padding:6px 8px 12px;scrollbar-width:thin;scrollbar-color:#21262d transparent}
 .basket-track::-webkit-scrollbar{height:6px}
 .basket-track::-webkit-scrollbar-thumb{background:#21262d;border-radius:3px}
 .basket-track::-webkit-scrollbar-thumb:hover{background:#388bfd}
 .bk-card{flex:0 0 auto;min-width:140px;background:#0d1117;border:1px solid #21262d;border-radius:8px;padding:10px 12px;margin:0 4px;cursor:pointer;transition:all 0.12s;position:relative}
 .bk-card:hover{border-color:#58a6ff;background:#0d1a30;transform:translateY(-1px)}
 .bk-card.selected{border-color:#58a6ff;background:#0d1a30;box-shadow:0 0 0 1px #58a6ff;}
 .bk-card .tk{font-family:ui-monospace,SFMono-Regular,monospace;font-size:13px;font-weight:700;color:#e6edf3;letter-spacing:0.3px}
 .bk-card .px{font-family:ui-monospace,SFMono-Regular,monospace;font-size:14px;font-weight:600;color:#e6edf3;margin-top:3px;font-variant-numeric:tabular-nums}
 .bk-card .ch{font-family:ui-monospace,monospace;font-size:11px;margin-top:1px;font-variant-numeric:tabular-nums}
 .bk-card .ch.up{color:#3fb950}
 .bk-card .ch.down{color:#f85149}
 .bk-card .ch.flat{color:#545d68}
 .bk-card .meta{font-size:9px;color:#545d68;margin-top:5px;text-transform:uppercase;letter-spacing:0.6px}
 .bk-card .kind-bar{position:absolute;left:0;top:0;bottom:0;width:3px;border-radius:8px 0 0 8px}
 .bk-card .kind-bar.exact,.bk-card .kind-bar.direct{background:#3fb950}
 .bk-card .kind-bar.parent{background:#d29922}
 .bk-card .kind-bar.associated{background:#58a6ff}
 .bk-card .kind-bar.mixed{background:linear-gradient(180deg,#3fb950 0%,#58a6ff 100%)}
 .bk-card.no-quote .px{color:#545d68}
 .basket-empty{padding:18px;font-size:11px;color:#545d68;font-style:italic;text-align:center}
 .basket-clear{margin-left:8px;font-size:10px;color:#58a6ff;cursor:pointer;border:none;background:none;text-decoration:underline}
 .cost-band-1{color:#3fb950}
 .cost-band-2{color:#d29922}
 .cost-band-3{color:#f85149}
 .loading{text-align:center;padding:60px;font-size:13px;color:#3d444d}
 .empty{text-align:center;padding:40px;font-size:12px;color:#545d68;font-style:italic}
 .foot{margin-top:14px;font-size:10px;color:#484f58;line-height:1.6}
 @media(max-width:640px){.bar{padding:0 14px}.content{padding:14px}th,td{padding:8px 6px;font-size:11px}}
 </style>
 </head><body>
 <div class="bar">
  <h1>Staffing Co-Pilot · Profiler Index</h1>
  <nav>
    <a href="" id="back-dashboard">← Dashboard</a>
    <a href="" id="back-console">Console</a>
  </nav>
 </div>
 <div class="content">
  <!-- Hero thesis — frames what this data corpus actually is. The
       profiler index isn't just a contractor directory; it's a
       construction-activity signal that surfaces public issuers months
       before quarterly earnings do. Each metric here is computed
       from the live data, not pre-baked. -->
  <div class="thesis" id="thesis">
    <h2>Chicago Construction Activity Signal Engine</h2>
    <div class="sub">
      Every contractor name in this corpus is also a forward indicator on the public equities they touch. Permits filed today predict construction starts ~45 days out, staffing windows ~2 weeks before that, and revenue recognition months later. The associated-ticker network surfaces this signal <b>before</b> it lands in any 10-Q.
    </div>
    <div class="bai-row">
      <div class="bai-block">
        <span class="bai-label">Building Activity Index — today</span>
        <span class="bai-value" id="bai-value">—</span>
        <span class="bai-sub" id="bai-sub">awaiting basket prices</span>
      </div>
      <div class="bai-block">
        <span class="bai-label">Indexed build value</span>
        <span class="bai-value" id="bav-value">—</span>
        <span class="bai-sub" id="bav-sub">across surfaced issuers</span>
      </div>
      <div class="bai-block">
        <span class="bai-label">Network depth</span>
        <span class="bai-value" id="net-value">—</span>
        <span class="bai-sub" id="net-sub">issuers · attributions</span>
      </div>
      <div class="bai-block" style="flex:1;min-width:240px">
        <span class="bai-label">Market replication roadmap</span>
        <div class="markets-strip" style="margin-top:4px">
          <span class="market-pill live">Chicago — live</span>
          <span class="market-pill next">NYC DOB — adapter ready</span>
          <span class="market-pill queue">LA County · Houston BCD · Boston ISD · DC DCRA</span>
        </div>
      </div>
    </div>
  </div>
  <div class="basket-wrap" id="basket-wrap" style="display:none">
    <div class="basket-label">
      <span><span id="bk-count">0</span> public issuers in this view <span class="meta" id="bk-meta"></span></span>
      <button class="basket-clear" id="bk-clear" style="display:none" type="button">clear filter</button>
    </div>
    <div class="basket-track" id="basket"></div>
  </div>
  <!-- Per-ticker permit map — shows where the selected issuer's
       attributed contractor activity is actually happening. Same
       leaflet pattern as the contractor profile, scoped to one ticker. -->
  <div class="signal-map-wrap" id="signal-map-wrap">
    <div class="signal-map-header">
      <span class="signal-map-title">Where <b id="signal-map-ticker">—</b> activity is happening</span>
      <span class="signal-map-meta" id="signal-map-meta">—</span>
    </div>
    <div class="signal-map" id="signal-map"></div>
  </div>
  <div class="controls">
    <input class="s" id="q" type="text" placeholder="Filter by contractor name (e.g., Target, Turner)" autocomplete="off">
    <select id="since">
      <option value="2025-06-01">Since June 2025</option>
      <option value="2024-01-01">Since 2024</option>
      <option value="2020-01-01">Since 2020 (deeper history)</option>
    </select>
    <select id="min-cost">
      <option value="500000">$500K+</option>
      <option value="250000" selected>$250K+</option>
      <option value="100000">$100K+</option>
      <option value="50000">$50K+</option>
    </select>
    <span class="meta" id="meta">Loading…</span>
  </div>
  <div class="summary" id="summary" style="display:none"></div>
  <div id="result"><div class="loading">Loading the directory from Chicago Socrata…</div></div>
  <div class="foot">Aggregations sourced live from data.cityofchicago.org (Building Permits dataset ydr8-5enu). Contractor names appear when listed as contact_1 or contact_2 on a permit. Click any name to open the full profile — heat map, project index, history, 12 awaiting public-data sources.</div>
 </div>
 <script>
 var P=location.pathname.indexOf('/lakehouse')>=0?'/lakehouse':'';
 document.getElementById('back-dashboard').href = P+'/';
 document.getElementById('back-console').href = P+'/console';
 var sortKey='total_cost', sortDir='desc';
 var lastRows=[];
 var tickerFilter=null; // selected ticker for filtering the table
 var lastQuotes={};     // ticker → quote (price, day_change_pct)
 var lastBasket=[];     // basket rows aggregated from lastRows
 var signalMap=null;    // leaflet map instance for the per-ticker view
 function clearChildren(el){ while(el.firstChild) el.removeChild(el.firstChild); }
 function fmt$(n){
  if(n>=1e9) return '$'+(n/1e9).toFixed(2)+'B';
  if(n>=1e6) return '$'+(n/1e6).toFixed(1)+'M';
  if(n>=1e3) return '$'+(n/1e3).toFixed(0)+'K';
  return '$'+Math.round(n||0).toLocaleString();
 }
 function costClass(n){
  if(n>=1e7) return 'cost-band-3';
  if(n>=1e6) return 'cost-band-2';
  return 'cost-band-1';
 }
 function load(){
  var search=document.getElementById('q').value.trim();
  var since=document.getElementById('since').value;
  var minCost=parseInt(document.getElementById('min-cost').value,10);
  document.getElementById('meta').textContent='Loading…';
  var host=document.getElementById('result'); clearChildren(host);
  var loading=document.createElement('div'); loading.className='loading';
  loading.textContent='Aggregating from Chicago Socrata…';
  host.appendChild(loading);
  fetch(P+'/intelligence/profiler_index',{
    method:'POST',
    headers:{'Content-Type':'application/json'},
    body:JSON.stringify({since:since,min_cost:minCost,search:search,limit:200})
  }).then(function(r){return r.json()}).then(function(d){
    lastRows = d.contractors||[];
    document.getElementById('meta').textContent=lastRows.length+' contractors · '+(d.duration_ms||0)+'ms';
    // Build the ticker basket from the surfaced rows
    buildBasket();
    var totalCost = lastRows.reduce(function(s,r){return s+(r.total_cost||0)},0);
    var totalPermits = lastRows.reduce(function(s,r){return s+(r.permits||0)},0);
    var sumDiv=document.getElementById('summary');
    sumDiv.style.display='block';
    clearChildren(sumDiv);
    var b1=document.createElement('b'); b1.textContent=lastRows.length.toLocaleString();
    sumDiv.appendChild(b1);
    sumDiv.appendChild(document.createTextNode(' contractors · '));
    var b2=document.createElement('b'); b2.textContent=totalPermits.toLocaleString();
    sumDiv.appendChild(b2);
    sumDiv.appendChild(document.createTextNode(' total permits · '));
    var b3=document.createElement('b'); b3.textContent=fmt$(totalCost);
    sumDiv.appendChild(b3);
    sumDiv.appendChild(document.createTextNode(' aggregate value · since '+(d.since||'?')+' · min permit cost '+fmt$(d.min_cost||0)));
    render();
  }).catch(function(e){
    document.getElementById('meta').textContent='error';
    var host=document.getElementById('result'); clearChildren(host);
    var er=document.createElement('div'); er.className='empty'; er.style.color='#f85149';
    er.textContent='Profiler index error: '+e.message;
    host.appendChild(er);
  });
 }
 // Aggregate every public ticker the profiler index surfaced, with a
 // kind hierarchy (exact > direct > parent > associated) and the count
 // of contractors each ticker is attributed to. Then fetch live quotes
 // in one batch and render the scrolling basket.
 function buildBasket(){
  var byTicker = {};
  lastRows.forEach(function(r){
    var ts = (r.tickers && r.tickers.direct ? r.tickers.direct : []).concat(r.tickers && r.tickers.associated ? r.tickers.associated : []);
    ts.forEach(function(t){
      if(!t || !t.ticker) return;
      if(!byTicker[t.ticker]) byTicker[t.ticker] = {ticker:t.ticker, kinds:new Set(), count:0, contractors:[], matched_name:t.matched_name||t.partner_name||null};
      byTicker[t.ticker].kinds.add(t.via);
      byTicker[t.ticker].count++;
      if(byTicker[t.ticker].contractors.length < 5) byTicker[t.ticker].contractors.push(r.name);
    });
  });
  var basketRows = Object.values(byTicker)
    .map(function(b){
      // Pick a single 'kind' for the bar color: direct/exact wins, then parent, then associated.
      var k = b.kinds.has('exact')?'exact':b.kinds.has('direct')?'direct':b.kinds.has('parent')?'parent':b.kinds.has('associated')?'associated':'mixed';
      if(b.kinds.size>1 && (b.kinds.has('exact')||b.kinds.has('direct')) && b.kinds.has('associated')) k='mixed';
      return Object.assign({}, b, {kinds:Array.from(b.kinds), kind:k});
    })
    .sort(function(a,b){return b.count - a.count});
  var wrap = document.getElementById('basket-wrap');
  var track = document.getElementById('basket');
  clearChildren(track);
  if(!basketRows.length){
    wrap.style.display='block';
    var emp=document.createElement('div'); emp.className='basket-empty';
    emp.textContent='No public issuers in this view. Try a wider filter or "since 2020" history.';
    track.appendChild(emp);
    document.getElementById('bk-count').textContent='0';
    document.getElementById('bk-meta').textContent='';
    return;
  }
  wrap.style.display='block';
  document.getElementById('bk-count').textContent=basketRows.length;
  document.getElementById('bk-meta').textContent='loading prices…';
  // Render shells immediately, then fill in prices when the batch returns
  basketRows.forEach(function(b){
    var card=document.createElement('div'); card.className='bk-card no-quote';
    card.dataset.ticker=b.ticker;
    var bar=document.createElement('div'); bar.className='kind-bar '+b.kind; card.appendChild(bar);
    var tk=document.createElement('div'); tk.className='tk'; tk.textContent=b.ticker; card.appendChild(tk);
    var px=document.createElement('div'); px.className='px'; px.textContent='—'; card.appendChild(px);
    var ch=document.createElement('div'); ch.className='ch flat'; ch.textContent=' '; card.appendChild(ch);
    var meta=document.createElement('div'); meta.className='meta';
    meta.textContent=b.count+' attribution'+(b.count===1?'':'s')+' · '+b.kinds.join('+');
    card.appendChild(meta);
    card.title=(b.matched_name||b.ticker)+'\n'+b.contractors.slice(0,5).join('\n')+(b.count>5?'\n…':'');
    card.onclick=function(){
      tickerFilter = (tickerFilter===b.ticker) ? null : b.ticker;
      Array.prototype.forEach.call(track.children, function(c){
        c.classList.toggle('selected', c.dataset && c.dataset.ticker===tickerFilter);
      });
      document.getElementById('bk-clear').style.display = tickerFilter ? 'inline' : 'none';
      showSignalMap(tickerFilter);
      render();
    };
    track.appendChild(card);
  });
  lastBasket = basketRows;
  // Update the hero panel right away with what we know without prices
  updateThesisMetrics();
  // Batch-fetch quotes and update each card + thesis
  fetch(P+'/intelligence/ticker_quotes',{
    method:'POST',headers:{'Content-Type':'application/json'},
    body:JSON.stringify({tickers:basketRows.map(function(b){return b.ticker})})
  }).then(function(r){return r.json()}).then(function(qd){
    var quotes=qd.quotes||{};
    lastQuotes = quotes;
    document.getElementById('bk-meta').textContent='quotes via Stooq · '+(qd.duration_ms||0)+'ms';
    Array.prototype.forEach.call(track.children, function(card){
      var t=card.dataset.ticker; var q=quotes[t];
      if(!q || !q.price) return;
      card.classList.remove('no-quote');
      var px=card.querySelector('.px'); px.textContent='$'+q.price.toFixed(2);
      var ch=card.querySelector('.ch');
      if(q.day_change_pct==null){ ch.textContent='close '+(q.price_date||''); ch.className='ch flat'; }
      else if(q.day_change_pct>=0){ ch.textContent='+'+q.day_change_pct.toFixed(2)+'%'; ch.className='ch up'; }
      else { ch.textContent=q.day_change_pct.toFixed(2)+'%'; ch.className='ch down'; }
    });
    updateThesisMetrics();
  }).catch(function(){
    document.getElementById('bk-meta').textContent='quote fetch failed';
  });
 }
 // Compute the Building Activity Index and update the hero panel.
 // BAI = attribution-weighted day-change % across surfaced issuers.
 // "Indexed build value" = total dollars of permits attributable to
 // any public issuer in this view (sum across attributing contractors).
 // "Network depth" = issuer count + total attributions.
 function updateThesisMetrics(){
  if(!lastBasket.length){
    document.getElementById('bai-value').textContent='—';
    document.getElementById('bai-sub').textContent='awaiting basket data';
    return;
  }
  // BAI: weighted average of day_change_pct, weight = attribution count.
  var weightedSum=0, weightTotal=0, contributors=[];
  lastBasket.forEach(function(b){
    var q = lastQuotes[b.ticker];
    if(q && q.day_change_pct!=null){
      var w = b.count || 1;
      weightedSum += q.day_change_pct * w;
      weightTotal += w;
      contributors.push({ticker:b.ticker, day:q.day_change_pct, weight:w});
    }
  });
  var bai = weightTotal>0 ? (weightedSum/weightTotal) : null;
  var baiEl = document.getElementById('bai-value');
  var baiSub = document.getElementById('bai-sub');
  if(bai==null){
    baiEl.textContent='—'; baiSub.textContent='no quotes settled yet';
    baiEl.className='bai-value';
  } else {
    var sign = bai>=0 ? '+' : '';
    baiEl.textContent = sign + bai.toFixed(2) + '%';
    baiEl.className = 'bai-value ' + (bai>=0?'up':'down');
    contributors.sort(function(a,b){return Math.abs(b.day*b.weight) - Math.abs(a.day*a.weight)});
    var top = contributors.slice(0,3).map(function(c){return c.ticker+' '+(c.day>=0?'+':'')+c.day.toFixed(1)+'%'}).join(' · ');
    baiSub.textContent = contributors.length+' issuers contributing · top: '+top;
  }
  // Indexed build value
  var totalCost = 0;
  lastRows.forEach(function(r){
    var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
    if(ts.length>0) totalCost += (r.total_cost||0);
  });
  var bav = totalCost>=1e9 ? '$'+(totalCost/1e9).toFixed(2)+'B' : totalCost>=1e6 ? '$'+(totalCost/1e6).toFixed(0)+'M' : '$'+Math.round(totalCost/1e3)+'K';
  document.getElementById('bav-value').textContent = bav;
  document.getElementById('bav-sub').textContent = lastBasket.length+' issuers in scope';
  // Network depth
  var totalAttrib = lastBasket.reduce(function(s,b){return s + (b.count||0)},0);
  document.getElementById('net-value').textContent = lastBasket.length + ' / ' + totalAttrib;
  document.getElementById('net-sub').textContent = 'issuers / attribution edges';
 }
 // Per-ticker map: when a ticker is selected, plot the contractor
 // permit locations attributed to that ticker. Pulls lat/lng for each
 // attributed contractor from the contractor profile endpoint and
 // merges. Caches per-ticker so toggling is instant.
 var mapCache = {};
 function showSignalMap(ticker){
  var wrap=document.getElementById('signal-map-wrap');
  if(!ticker){ wrap.classList.remove('active'); if(signalMap){signalMap.remove(); signalMap=null;} return; }
  wrap.classList.add('active');
  document.getElementById('signal-map-ticker').textContent = ticker;
  document.getElementById('signal-map-meta').textContent = 'loading permits…';
  // Find the contractors attributed to this ticker
  var attrib = lastRows.filter(function(r){
    var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
    return ts.some(function(t){return t.ticker===ticker});
  });
  if(!attrib.length){
    document.getElementById('signal-map-meta').textContent='no attributed contractors';
    return;
  }
  // Use the contractor_profile endpoint per attributed contractor (cap at 6)
  // to pull their geocoded permits, then render. Cached per ticker.
  if(mapCache[ticker]){
    drawSignalMap(ticker, mapCache[ticker]);
    return;
  }
  var names = attrib.slice(0,6).map(function(r){return r.name});
  Promise.all(names.map(function(n){
    return fetch(P+'/intelligence/contractor_profile',{
      method:'POST',headers:{'Content-Type':'application/json'},
      body:JSON.stringify({name:n})
    }).then(function(r){return r.json()}).then(function(d){
      var perms = (d.history && d.history.recent_permits) || [];
      return perms.filter(function(p){return p.lat&&p.lng}).map(function(p){
        return Object.assign({contractor:n}, p);
      });
    }).catch(function(){return []});
  })).then(function(arrs){
    var all = arrs.reduce(function(a,b){return a.concat(b)},[]);
    mapCache[ticker] = all;
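    // Skip the draw if the selection changed while the requests were in flight.
    if(tickerFilter===ticker) drawSignalMap(ticker, all);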
  });
 }
 function drawSignalMap(ticker, permits){
  if(signalMap){ signalMap.remove(); signalMap=null; }
  if(!permits.length){
    document.getElementById('signal-map-meta').textContent='0 geocoded permits across attributed contractors';
    return;
  }
  document.getElementById('signal-map-meta').textContent = permits.length + ' geocoded permits across ' + new Set(permits.map(function(p){return p.contractor})).size + ' contractors';
  signalMap = L.map('signal-map',{zoomControl:true, attributionControl:false}).setView([41.88,-87.63], 11);
  L.tileLayer('https://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}{r}.png',{maxZoom:19}).addTo(signalMap);
  var bounds=[];
  var maxCost = Math.max.apply(null, permits.map(function(p){return Number(p.cost)||1}));
  permits.forEach(function(p){
    var c=Number(p.cost)||0;
    var radius = 4 + (c/maxCost)*14;
    var color = c>=1000000?'#f85149':c>=100000?'#d29922':'#3fb950';
    var marker = L.circleMarker([p.lat,p.lng],{radius:radius,color:color,weight:1,fillOpacity:0.55});
    var pop=document.createElement('div');
    pop.style.cssText='font-family:ui-monospace,monospace;font-size:11px;color:#0a0d12;min-width:200px';
    var top=document.createElement('div'); top.style.cssText='font-weight:700;margin-bottom:3px;color:#1f6feb';
    top.textContent=ticker+' attribution';
    pop.appendChild(top);
    var con=document.createElement('div'); con.textContent=p.contractor; con.style.fontWeight='600';
    pop.appendChild(con);
    var meta=document.createElement('div'); meta.style.color='#545d68';
    meta.textContent='$'+c.toLocaleString()+' · '+(p.date||'?')+' · '+(p.work_type||'?');
    pop.appendChild(meta);
    var addr=document.createElement('div'); addr.style.color='#545d68';
    addr.textContent=p.address||'?';
    pop.appendChild(addr);
    marker.bindPopup(pop);
    marker.addTo(signalMap);
    bounds.push([p.lat,p.lng]);
  });
  if(bounds.length>1) signalMap.fitBounds(bounds,{padding:[28,28]});
 }
 function render(){
  var host=document.getElementById('result');
  clearChildren(host);
  // Apply ticker filter if set: keep only rows whose tickers include the selected one
  var pool = tickerFilter ? lastRows.filter(function(r){
    var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
    return ts.some(function(t){return t.ticker===tickerFilter});
  }) : lastRows;
  if(!pool.length){
    var emp=document.createElement('div'); emp.className='empty';
    emp.textContent='No contractors match the current filter.';
    host.appendChild(emp);
    return;
  }
  var rows = pool.slice().sort(function(a,b){
    var av=a[sortKey], bv=b[sortKey];
    if(typeof av==='string'){ av=(av||'').toUpperCase(); bv=(bv||'').toUpperCase(); }
    if(av<bv) return sortDir==='asc'?-1:1;
    if(av>bv) return sortDir==='asc'?1:-1;
    return 0;
  });
  var t=document.createElement('table');
  var thead=document.createElement('thead'); var hr=document.createElement('tr');
  var cols=[
    {k:'name', label:'Contractor'},
    {k:'permits', label:'Permits', right:true},
    {k:'total_cost', label:'Total Value', right:true},
    {k:'last_filed', label:'Last Filed', right:true},
    {k:'roles', label:'Listed As'},
  ];
  cols.forEach(function(c){
    var h=document.createElement('th');
    h.textContent=c.label;
    if(c.right) h.style.textAlign='right';
    if(sortKey===c.k){
      var ar=document.createElement('span'); ar.className='arrow';
      ar.textContent = sortDir==='asc' ? '▲' : '▼';
      h.appendChild(ar);
    }
    h.onclick=function(){
      if(sortKey===c.k) sortDir = sortDir==='asc' ? 'desc' : 'asc';
      else { sortKey=c.k; sortDir = (c.k==='name') ? 'asc' : 'desc'; }
      render();
    };
    hr.appendChild(h);
  });
  thead.appendChild(hr); t.appendChild(thead);
  var tb=document.createElement('tbody');
  rows.forEach(function(r){
    var tr=document.createElement('tr');
    var ntd=document.createElement('td'); ntd.className='name';
    var a=document.createElement('a');
    a.href = P+'/contractor?name='+encodeURIComponent(r.name);
    a.target='_blank'; a.rel='noopener';
    a.textContent = r.name;
    ntd.appendChild(a);
    // Ticker association pills — direct (green) = the contractor is a
    // public issuer; parent (amber) = subsidiary of a public parent;
    // associated (blue) = co-appears on permits with a public entity.
    // Shows the correlation indicator J described — when Bob's Electric
    // works permits with Target, TGT renders here as associated.
    // Named tks so it does not shadow the table element `t` in render().
    var tks = r.tickers || {direct:[], associated:[]};
    var allTk = (tks.direct||[]).concat(tks.associated||[]);
    if(allTk.length){
      var trk = document.createElement('div'); trk.className='tickers';
      allTk.forEach(function(x){
        var p = document.createElement('span');
        p.className = 'ticker-pill ' + (x.via||'direct');
        p.textContent = x.ticker;
        // Tooltip shows the full reason path
        var hint = x.via === 'associated'
          ? 'Associated via co-permits with '+x.partner_name+' ('+(x.co_permits||0)+' shared permits)' + (x.matched_name ? ' — '+x.matched_name : '')
          : x.via === 'parent'
            ? 'Subsidiary of '+(x.matched_name||x.ticker) + (x.exchange ? ' · '+x.exchange : '')
            : 'Direct match: '+(x.matched_name||r.name);
        p.title = hint;
        trk.appendChild(p);
      });
      ntd.appendChild(trk);
    }
    tr.appendChild(ntd);
    var ptd=document.createElement('td'); ptd.className='right';
    ptd.textContent=(r.permits||0).toLocaleString();
    tr.appendChild(ptd);
    var ctd=document.createElement('td'); ctd.className='right '+costClass(r.total_cost||0);
    ctd.textContent=fmt$(r.total_cost||0);
    tr.appendChild(ctd);
    var ltd=document.createElement('td'); ltd.className='right';
    ltd.textContent=(r.last_filed||'').slice(0,10) || '—';
    tr.appendChild(ltd);
    var rtd=document.createElement('td'); rtd.className='role';
    (r.roles||[]).forEach(function(role){
      var pill=document.createElement('span'); pill.className='pill'; pill.textContent=role;
      rtd.appendChild(pill);
    });
    tr.appendChild(rtd);
    tb.appendChild(tr);
  });
  t.appendChild(tb);
  host.appendChild(t);
 }
 var sDeb;
 document.getElementById('q').addEventListener('input',function(){
  clearTimeout(sDeb);
  sDeb=setTimeout(load,400);
 });
 document.getElementById('since').addEventListener('change',load);
 document.getElementById('min-cost').addEventListener('change',load);
 document.getElementById('bk-clear').addEventListener('click',function(){
  tickerFilter=null;
  document.getElementById('bk-clear').style.display='none';
  Array.prototype.forEach.call(document.querySelectorAll('.bk-card.selected'), function(c){c.classList.remove('selected')});
  showSignalMap(null);
  render();
 });
 window.addEventListener('load',load);
 </script>
 </body></html>
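
Reviewer note on the Building Activity Index in profiler.html: the comments in updateThesisMetrics() describe BAI as an attribution-weighted average of day-change percentages, skipping issuers whose quote has no day change. The sketch below restates that arithmetic as a pure function so it can be checked without the DOM. The name computeBai and the sample tickers AAA, BBB, and CCC are hypothetical and do not appear in the diff; only the field names (ticker, count, day_change_pct) are taken from the page's code.

```javascript
// Hypothetical standalone helper, not part of the diff. It mirrors the
// weighting used in updateThesisMetrics() but returns the value instead
// of writing it into the page.
function computeBai(basket, quotes) {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const b of basket) {
    const q = quotes[b.ticker];
    // Issuers without a settled day change are skipped, as on the page.
    if (!q || q.day_change_pct == null) continue;
    const w = b.count || 1; // weight = attribution count, falling back to 1
    weightedSum += q.day_change_pct * w;
    weightTotal += w;
  }
  // null means "no quotes settled yet", matching the page's placeholder case.
  return weightTotal > 0 ? weightedSum / weightTotal : null;
}

// Made-up sample data for illustration only.
const basket = [
  { ticker: "AAA", count: 3 },
  { ticker: "BBB", count: 1 },
  { ticker: "CCC", count: 2 }, // no quote below, so it is ignored
];
const quotes = {
  AAA: { day_change_pct: 2.0 },
  BBB: { day_change_pct: -2.0 },
};

console.log(computeBai(basket, quotes)); // (3*2 + 1*(-2)) / (3+1) = 1
```

Keeping the calculation separate from the rendering code would make the weighting rule testable on its own; whether that split is worth it for this demo page is a judgment call for the author.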
--- a/mcp-server/proof.html
+++ b/mcp-server/proof.html
@@ -81,6 +81,7 @@ pre{background:#161b22;border:1px solid #171d27;border-radius:8px;padding:14px 1
  <nav>
    <a href=".">Dashboard</a>
    <a href="console">Walkthrough</a>
+    <a href="profiler">Profiler</a>
    <a href="proof" class="active">Architecture</a>
    <a href="spec">Spec</a>
    <a href="onboard">Onboard</a>
@@ -95,138 +96,137 @@ pre{background:#161b22;border:1px solid #171d27;border-radius:8px;padding:14px 1
 <div class="chapter">
  <div class="num">Chapter 1</div>
  <h2>Receipts, not promises</h2>
-  <div class="lede">Every test below ran live against the real gateway when you loaded this page. Sub-100ms SQL on multi-million-row Parquet, hybrid search with playbook boost applied. No fixtures. If a test fails, you'll see ✗.</div>
+  <div class="lede">Every test below ran live against the real gateway when you loaded this page. Sub-100ms SQL on multi-million-row Parquet, hybrid search with playbook boost applied, public-issuer attribution computed from this view. No fixtures. If a test fails, you'll see ✗.</div>
  <div id="ch1-tests"><div class="loading">Running tests…</div></div>
  <div id="ch1-live" style="margin-top:14px"></div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 2</div>
-  <h2>Architecture — 13 crates, one object store, one local AI runtime</h2>
+  <h2>Architecture — 15 crates, one object store, a 5-provider model fleet</h2>
-  <div class="lede">Request flows top to bottom. Every node is independently swappable. Every line is a real HTTP or gRPC hop that you can trace with <code>tcpdump</code>.</div>
+  <div class="lede">Gateway is a drop-in OpenAI-compatible middleware. Any consumer that speaks the OpenAI Chat Completions shape — agent SDKs, IDE plugins, custom apps — points at <code>localhost:3100/v1</code> and gets routing, audit, and the full memory substrate behind every call. The model side has 5 providers and 40+ frontier models reachable via one OpenCode key. The data side stays Rust-first.</div>
  <div class="card accent-b">
-    <pre>                            HTTP :3100  +  gRPC :3101
+    <pre>     OpenAI SDK consumers          MCP clients         Browser UI (Bun :3700)
-                                    │
+              │                          │                          │
-                            ┌───────▼───────┐
+              └──────────────────────────┼──────────────────────────┘
-                            │   gateway     │   Rust · Axum · routing, CORS, auth, tools
+                                         ▼
-                            └───────┬───────┘
+                           ┌──────────────────────────────┐
-           ┌────────────┬───────────┼───────────┬────────────┐
+                           │   gateway   :3100  /v1/*     │  Rust · Axum
-           │            │           │           │            │
+                           │   OpenAI-compat drop-in      │  smart provider routing
-      ┌────▼───┐   ┌────▼───┐  ┌────▼───┐  ┌────▼───┐   ┌────▼───┐
+                           │   /v1/chat /v1/mode /iterate │  cost telemetry, Langfuse
-      │catalog │   │ query  │  │ vector │  │ ingest │   │aibridge│
+                           └──────────┬───────────────────┘
-      │   d    │   │   d    │  │   d    │  │   d    │   │        │
+            ┌─────────┬───────────────┼───────────────┬──────────┐
-      └────┬───┘   └────┬───┘  └────┬───┘  └────┬───┘   └────┬───┘
+            │         │               │               │          │
-           │            │           │           │            │
+       ┌────▼───┐ ┌───▼────┐    ┌─────▼──────┐  ┌─────▼─────┐ ┌──▼──────┐
-           └────────────┴───────────┼───────────┴────────────┘
+       │catalog │ │ query  │    │   vector   │  │  ingest   │ │aibridge │
-                                    ▼
+       │   d    │ │   d    │    │     d      │  │     d     │ │         │
-                          ┌─────────────────┐
+       │idempot │ │DataFus │    │HNSW · Lance│  │CSV PDF SQL│ │provider │
-                          │ object storage  │   Parquet files (local / S3)
+       │schema  │ │delta   │    │playbook+   │  │auto-PII   │ │adapters │
-                          └─────────────────┘
+       │fingerp │ │MemTabl │    │pathway mem │  │schema fp  │ │5 active │
-                                    ▲
+       └────┬───┘ └───┬────┘    └─────┬──────┘  └─────┬─────┘ └──┬──────┘
-                                    │
+            └─────────┴────────────────┼────────────────┴─────────┘
-                            ┌───────┴────────┐
+                                       ▼
-                            │ Python sidecar │   FastAPI → Ollama
+                            ┌──────────────────┐
-                            │   (aibridge)   │   local models only
+                            │ object storage   │  Parquet · MinIO · S3-compat
-                            └────────────────┘</pre>
+                            └──────────────────┘
                                       ▲
                                       │
                       ┌───────────────┴────────────────┐
                       │  validator   ·   journald      │  schema/PII/policy gates
                       │  (Phase 43)  ·   (audit log)   │  + append-only mutations
                       └────────────────────────────────┘
 Provider fleet (config/providers.toml):
  ollama         localhost:3200          local Ollama → qwen3.5, gemma2
  ollama_cloud   ollama.com              gpt-oss:120b, qwen3-coder:480b,
                                         deepseek-v3.1:671b, kimi-k2:1t,
                                         mistral-large-3:675b, qwen3.5:397b
  openrouter     openrouter.ai/api/v1    343 models — paid + free rescue
  opencode       opencode.ai/zen/v1      40 models · ONE sk-* key reaches
                                         Claude Opus 4.7, GPT-5.5-pro,
                                         Gemini 3.1-pro, Kimi K2.6, GLM 5.1,
                                         DeepSeek, Qwen, MiniMax, free tier
  kimi           api.kimi.com/coding/v1  direct Kimi For Coding (TOS-clean)</pre>
  </div>
-  <h3>Per-crate responsibility</h3>
+  <h3>Per-crate responsibility (15 crates)</h3>
  <table class="plain">
    <thead><tr><th>Crate</th><th>Role</th><th>Path</th></tr></thead>
    <tbody>
-      <tr><td>shared</td><td>Types, errors, Arrow helpers, PII detection, secrets provider</td><td>crates/shared/</td></tr>
+      <tr><td>shared</td><td>Types, errors, Arrow helpers, PII detection, secrets provider, model_matrix</td><td>crates/shared/</td></tr>
-      <tr><td>storaged</td><td>object_store I/O, BucketRegistry (multi-bucket), AppendLog, ErrorJournal</td><td>crates/storaged/</td></tr>
+      <tr><td>storaged</td><td>object_store I/O, BucketRegistry, AppendLog, ErrorJournal, federation_service</td><td>crates/storaged/</td></tr>
-      <tr><td>catalogd</td><td>Metadata authority — manifests, views, tombstones, profiles, schema fingerprints</td><td>crates/catalogd/</td></tr>
+      <tr><td>catalogd</td><td>Manifests, views (incl. PII-safe view layer), tombstones, profiles, schema fingerprints, register-idempotency (ADR-020)</td><td>crates/catalogd/</td></tr>
-      <tr><td>queryd</td><td>DataFusion SQL engine, MemTable cache, delta merge-on-read, compaction</td><td>crates/queryd/</td></tr>
+      <tr><td>queryd</td><td>DataFusion SQL, MemTable cache, delta merge-on-read, compaction, truth gate (ADR-021)</td><td>crates/queryd/</td></tr>
-      <tr><td>ingestd</td><td>CSV/JSON/PDF(+OCR)/Postgres/MySQL ingest, cron schedules, auto-PII</td><td>crates/ingestd/</td></tr>
+      <tr><td>ingestd</td><td>CSV/JSON/PDF(+OCR)/Postgres/MySQL ingest, cron schedules, auto-PII flagging</td><td>crates/ingestd/</td></tr>
-      <tr><td>vectord</td><td>Embeddings as Parquet, HNSW, trial system, autotune agent, playbook_memory</td><td>crates/vectord/</td></tr>
+      <tr><td>vectord</td><td>Embeddings as Parquet, HNSW, trial system, autotune, playbook_memory + pathway_memory (ADR-021 semantic-correctness layer)</td><td>crates/vectord/</td></tr>
      <tr><td>vectord-lance</td><td>Firewall crate — Lance 4.0 + Arrow 57 isolated from main Arrow 55</td><td>crates/vectord-lance/</td></tr>
-      <tr><td>journald</td><td>Append-only mutation event log for time-travel &amp; audit</td><td>crates/journald/</td></tr>
+      <tr><td>journald</td><td>Append-only mutation event log for time-travel + audit</td><td>crates/journald/</td></tr>
-      <tr><td>aibridge</td><td>Rust↔Python sidecar, Ollama HTTP client, VRAM introspection</td><td>crates/aibridge/</td></tr>
+      <tr><td>truth</td><td>File-backed rule store; <code>evaluate(task_class, ctx) → Vec&lt;RuleOutcome&gt;</code> (ADR-021)</td><td>crates/truth/</td></tr>
-      <tr><td>gateway</td><td>Axum HTTP :3100 + gRPC :3101, middleware, tools registry</td><td>crates/gateway/</td></tr>
+      <tr><td>aibridge</td><td>Rust↔Python sidecar, Ollama client, ProviderAdapter trait, /v1/chat router</td><td>crates/aibridge/</td></tr>
-      <tr><td>ui</td><td>Dioxus WASM internal developer UI</td><td>crates/ui/</td></tr>
+      <tr><td>gateway</td><td>Axum HTTP :3100 + gRPC :3101, OpenAI-compat /v1/*, mode runner, validator, iterate loop, cost telemetry, Langfuse + observer fan-out</td><td>crates/gateway/</td></tr>
-      <tr><td>mcp-server</td><td>Bun TypeScript recruiter-facing app (this server)</td><td>mcp-server/</td></tr>
+      <tr><td>validator</td><td>Phase 43 — schema / completeness / consistency / policy gates over LLM outputs (FillValidator, EmailValidator, ParquetWorkerLookup)</td><td>crates/validator/</td></tr>
      <tr><td>ui</td><td>Dioxus WASM internal developer UI (separate from this Bun-served public UI)</td><td>crates/ui/</td></tr>
      <tr><td>mcp-server</td><td>Bun TypeScript public-facing app + MCP tool surface — what you're reading right now</td><td>mcp-server/</td></tr>
      <tr><td>auditor</td><td>External claim-vs-diff verifier on PRs · Kimi K2.6 ↔ Haiku 4.5 cross-lineage alternation, Opus 4.7 auto-promote on diffs &gt;100k chars</td><td>auditor/</td></tr>
    </tbody>
  </table>
-  <div class="ref"><strong>Source:</strong> git.agentview.dev/profit/lakehouse &nbsp;·&nbsp; <strong>ADRs:</strong> docs/DECISIONS.md (currently 20 records)</div>
+  <div class="ref"><strong>Source:</strong> git.agentview.dev/profit/lakehouse · branch <code>scrum/auto-apply-19814</code> · tag <code>distillation-v1.0.0</code> at commit <code>e7636f2</code> (frozen substrate) · <strong>ADRs:</strong> docs/DECISIONS.md (currently 21 records)</div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 3</div>
-  <h2>Dual-agent recursive consensus loop</h2>
+  <h2>The model fleet — 9-rung ladder, N=3 consensus, cross-lineage audit</h2>
-  <div class="lede">The system we use to execute staffing fills is a dual-agent recursive protocol. Two agents with distinct roles iterate against a shared log until one of three terminal states is reached. It is deterministic in structure, stochastic in content, and verifiable through the per-run log artifact.</div>
+  <div class="lede">No single model owns the answer. Every consequential call is structured: the right tier picks up first, fallback rungs catch what fails, parallel runs vote, and an independent auditor of a different model lineage checks the result against the diff. The protocol is deterministic; the inference is stochastic; every step writes a receipt.</div>
  <h3>Agents and protocol</h3>
  <div class="card accent-a">
    <pre>  task in
    │
    ▼
  ┌───────────────────────────────────────────────────────────┐
  │  EXECUTOR (mistral:latest)                                │
  │  ──────────────────────────────────────────────────────── │
  │  input:   task spec + shared log + seen-candidates ledger │
  │  output:  one JSON action per turn                        │
  │             · {kind:"plan",steps:[…]}                     │
  │             · {kind:"tool_call",tool,args,rationale}      │
  │             · {kind:"propose_done",fills:[N of N]}        │
  └───────────┬───────────────────────────────┬───────────────┘
              │ tool_call                     │ propose_done
              ▼                               │
  ┌──────────────────────────┐                │
  │  TOOL DISPATCH           │                │
  │  hybrid_search / sql     │                │
  │  (against live gateway)  │                │
  └──────────┬───────────────┘                │
             │ result (trimmed, exclusions)   │
             ▼                                ▼
  ┌───────────────────────────────────────────────────────────┐
  │  REVIEWER (qwen2.5:latest)                                │
  │  ──────────────────────────────────────────────────────── │
  │  input:   task spec + shared log (including tool result)  │
  │  output:  {kind:"critique",verdict:"continue|drift|       │
  │                                    approve_done",notes}   │
  └───────────┬───────────────────────────────────────────────┘
              │
        ┌─────┴─────┐
        ▼           ▼           ▼
    continue     drift       approve_done + propose_done ⟹ SEAL
    (next turn)  (cap ≈ 3 →
                  hard abort)
    </pre>
  </div>
  <div class="ref"><strong>Code:</strong> tests/multi-agent/agent.ts (protocol + prompts) &nbsp;·&nbsp; tests/multi-agent/orchestrator.ts (run loop) &nbsp;·&nbsp; tests/multi-agent/scenario.ts (5-event warehouse week)</div>
-  <h3>Why "dual" — role specialization</h3>
+  <h3>The 9-rung cloud-first ladder</h3>
-  <div class="narr">
+  <div class="card accent-b">
-    <strong>The executor is an optimist.</strong> Its job is to produce progress: pull candidates, verify SQL, propose consensus. It's instructed to be decisive.
+    <pre>  request in
-    <br><br>
+      │
-    <strong>The reviewer is a pessimist.</strong> Its job is to catch drift: proposals that don't match the task's geography, fill count, or role. It's authorized to stop the loop.
+      ▼
-    <br><br>
+  ┌───────────────────────────────────────────────────────────────────┐
-    This adversarial separation is cheaper and more deterministic than asking a single model to self-critique. The reviewer has a hard rule: on the turn after a <code>propose_done</code>, it MUST emit either <code>approve_done</code> or <code>drift</code> — it cannot stall with <code>continue</code>.
+  │  attempt 1  ollama_cloud / kimi-k2:1t           1T params · flagship │
  │  attempt 2  ollama_cloud / qwen3-coder:480b     coding specialist    │
  │  attempt 3  ollama_cloud / deepseek-v3.1:671b   reasoning            │
  │  attempt 4  ollama_cloud / mistral-large-3:675b deep analysis        │
  │  attempt 5  ollama_cloud / gpt-oss:120b         reliable workhorse   │
  │  attempt 6  ollama_cloud / qwen3.5:397b         dense final thinker  │
  │  attempt 7  openrouter   / openai/gpt-oss-120b:free  rescue tier     │
  │  attempt 8  openrouter   / google/gemma-3-27b-it:free fastest rescue │
  │  attempt 9  ollama       / qwen3.5:latest       last-resort local    │
  └───────────────┬───────────────────────────────────────────────────┘
                  │  isAcceptable() = chars ≥ 3800 ∧ not malformed JSON
                  ▼
            sealed result OR next-rung learning preamble</pre>
  </div>
  <div class="narr">Every rung sees a learning preamble carrying the prior rejection reason. The ladder is the standard scrum/auditor path; for individual <code>/v1/chat</code> calls the caller picks the model directly (or lets the smart-routing default fire).</div>
  <div class="ref"><strong>Code:</strong> tests/real-world/scrum_master_pipeline.ts <code>const LADDER</code> · config/routing.toml · crates/gateway/src/v1/mode.rs (mode runner)</div>
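The ladder walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrum_master_pipeline.ts: `Rung`, `CallModel`, `rejectionReason`, `withPreamble`, and `runLadder` are hypothetical names, the model call is injected by the caller, and the sketch assumes "not malformed JSON" means the whole output parses with `JSON.parse`.

```typescript
// Hypothetical sketch of a fallback ladder; all names are illustrative only.
interface Rung {
  provider: string;
  model: string;
}

// Injected model call, so the sketch stays independent of any provider SDK.
type CallModel = (rung: Rung, prompt: string) => Promise<string>;

// Returns null when the output is acceptable, otherwise a rejection reason.
// Assumption: "not malformed JSON" means the whole output parses as JSON.
function rejectionReason(output: string, minChars: number = 3800): string | null {
  if (output.length < minChars) {
    return `too short: ${output.length} < ${minChars} chars`;
  }
  try {
    JSON.parse(output);
  } catch {
    return "malformed JSON";
  }
  return null;
}

// Prepends the previous rejection reason so the next rung can learn from it.
function withPreamble(prompt: string, priorReason: string | null): string {
  return priorReason === null
    ? prompt
    : `Previous attempt was rejected (${priorReason}).\n\n${prompt}`;
}

// Walks the rungs in order and returns the first acceptable output, or null.
async function runLadder(
  rungs: Rung[],
  call: CallModel,
  prompt: string,
): Promise<{ rung: Rung; output: string } | null> {
  let priorReason: string | null = null;
  for (const rung of rungs) {
    let output = "";
    try {
      output = await call(rung, withPreamble(prompt, priorReason));
    } catch (err) {
      priorReason = `call failed: ${String(err)}`;
      continue;
    }
    const reason = rejectionReason(output);
    if (reason === null) return { rung, output };
    priorReason = reason;
  }
  return null;
}
```

Returning the rejection reason instead of a boolean keeps the acceptance check and the preamble text in one place, so a rung never sees a reason that differs from the one that caused the fallback.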
-  <h3>Why "parallel" — orchestrator can fan out</h3>
+  <h3>N=3 consensus + tie-breaker (auditor inference)</h3>
  <div class="narr">
    <strong>Independent pairs run concurrently.</strong> <code>tests/multi-agent/run_e2e_rated.ts</code> runs two task-specific agent pairs via <code>Promise.all</code>. Ollama serializes inference at the model level, so "parallel" is concurrent orchestration — but the substrate (gateway, queryd, vectord) handles concurrent requests cleanly. Verified in the scenario harness: two contracts sealing simultaneously.
  </div>
  <h3>Why "recursive" — each seal feeds the next</h3>
  <div class="narr">
    <strong>Consensus does not end at the sealed playbook.</strong> Every sealed playbook is persisted to <code>playbook_memory</code> via <code>POST /vectors/playbook_memory/seed</code>. The next hybrid search for a semantically similar operation consults that memory via <code>compute_boost_for(query_embedding, top_k, base_weight)</code> and re-ranks the candidate pool. The system builds on itself turn over turn, playbook over playbook.
  </div>
  <h3>Termination guarantees</h3>
  <div class="math">
-    <span class="c">// three paths out, every run has one of these:</span><br>
+    <span class="c">// auditor/checks/inference.ts — every claim audit runs this:</span><br>
-    sealed = executor.propose_done ∧ reviewer.approve_done ∧ fills.count == target<br>
+    1. Fire the primary reviewer N=3 times in PARALLEL (Promise.all) — wall-clock = single call<br>
-    abort  = consecutive_tool_errors ≥ MAX_TOOL_ERRORS (3) &nbsp;&nbsp;<span class="c">// executor can't form a valid call</span><br>
+    2. Aggregate votes per claim_idx · majority wins<br>
-    abort  = consecutive_drifts ≥ MAX_CONSECUTIVE_DRIFTS (3) &nbsp;<span class="c">// reviewer keeps flagging</span><br>
+    3. On 1-1-1 split → tie-breaker model with <strong>different architecture</strong> (qwen3-coder:480b vs primary gpt-oss/kimi)<br>
-    abort  = turn &gt; MAX_TURNS (12) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="c">// no consensus reached in window</span>
+    4. Every disagreement (even when majority resolves) → <code>data/_kb/audit_discrepancies.jsonl</code><br>
    <br>
    <span class="c">// Closes the cloud-non-determinism gap: temp=0 isn't actually deterministic in practice</span><br>
    <span class="c">// across hours; consensus + cross-architecture tie-break stabilizes verdicts.</span>
  </div>
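The vote-then-tie-break flow can be sketched for a single claim as below. This is an assumption-laden illustration rather than the contents of auditor/checks/inference.ts: `Verdict`, `tallyVotes`, and `auditClaim` are hypothetical names, and the primary and tie-breaker reviewers are passed in as plain async functions.

```typescript
// Hypothetical sketch; the real auditor/checks/inference.ts may differ.
type Verdict = string;

interface Tally {
  winner: Verdict | null; // null when no verdict has a strict majority
  unanimous: boolean;     // false means the disagreement should be logged
}

// Counts votes for one claim and reports the strict-majority verdict, if any.
function tallyVotes(votes: Verdict[]): Tally {
  const counts = new Map<Verdict, number>();
  for (const v of votes) counts.set(v, (counts.get(v) ?? 0) + 1);
  const unanimous = counts.size === 1;
  for (const [verdict, count] of Array.from(counts.entries())) {
    if (count > votes.length / 2) return { winner: verdict, unanimous };
  }
  return { winner: null, unanimous };
}

// Fires the primary reviewer n times in parallel, then falls back to a
// tie-breaker model only when no strict majority exists (e.g. a 1-1-1 split).
async function auditClaim(
  primary: () => Promise<Verdict>,
  tieBreaker: () => Promise<Verdict>,
  n: number = 3,
): Promise<{ verdict: Verdict; unanimous: boolean; usedTieBreaker: boolean }> {
  const votes = await Promise.all(Array.from({ length: n }, () => primary()));
  const { winner, unanimous } = tallyVotes(votes);
  if (winner !== null) return { verdict: winner, unanimous, usedTieBreaker: false };
  return { verdict: await tieBreaker(), unanimous: false, usedTieBreaker: true };
}
```

A caller would append a record to the discrepancy log whenever `unanimous` is false, including the cases where the majority still resolved the verdict.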
-  <div class="narr">Every abort dumps the full log to <code>tests/multi-agent/playbooks/&lt;id&gt;-FAILED.json</code> for forensic review. No consensus is ever implicit.</div>
+
  <h3>Auditor cross-lineage — Kimi ↔ Haiku ↔ Opus</h3>
  <div class="narr">Every push to PR #11 triggers <code>auditor/audit.ts</code> within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot) and Haiku 4.5 (Anthropic) by SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7. Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. <strong>100% grounding-verified rate</strong> on Haiku 4.5 across the latest 10 findings — pairing different lineages + forcing per-finding grounding kills confabulation.</div>
  <div class="ref"><strong>Code:</strong> auditor/audit.ts · auditor/checks/inference.ts (N=3) · auditor/checks/kimi_architect.ts · <strong>Verdicts:</strong> data/_auditor/kimi_verdicts/ — read any 11-&lt;sha&gt;.json to inspect a real audit</div>
  <h3>Distillation v1.0.0 — the frozen substrate</h3>
  <div class="narr">The substrate the auditor and mode runner sit on is tagged at <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>. <strong>145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified.</strong> The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall; the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.</div>
  <div class="ref"><strong>Tag:</strong> distillation-v1.0.0 · <strong>Commit:</strong> e7636f2 · <strong>Substrate code:</strong> scripts/distillation/ · auditor/schemas/distillation/ · <strong>Output:</strong> data/_kb/distilled_{facts,procedures,config_hints}.jsonl</div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 4</div>
-  <h2>Playbook memory — the compounding feedback loop</h2>
+  <h2>Two memory layers — playbook (worker signal) + pathway (system signal)</h2>
-  <div class="lede">A CRM stores events. This system turns events into re-ranking signal. Every sealed playbook endorses specific (worker, city, state) tuples. Every failure penalizes them. Every similar future query inherits the signal through cosine similarity.</div>
+  <div class="lede">A CRM stores events. This system turns events into re-ranking signal at two layers. <strong>Playbook memory</strong> compounds worker-level outcomes (who got endorsed, where, when) into per-query boost. <strong>Pathway memory</strong> compounds system-level outcomes (which model + corpus + framing actually solved similar problems) into per-task hot-swap. Both are queryable. Both are auditable. Both compound.</div>
  <h3>Layer 1 — playbook memory (worker + geo signal)</h3>
  <h3>Seed shape</h3>
  <div class="math">
@ -289,10 +289,82 @@ pre{background:#161b22;border:1px solid #171d27;border-radius:8px;padding:14px 1
    <strong>Beyond "who was endorsed."</strong> <code>POST /vectors/playbook_memory/patterns</code> takes a query, finds top-K similar past playbooks, pulls each endorsed worker's full workers_500k profile, and aggregates shared traits: recurring certifications, skill frequencies, modal archetype, reliability distribution. Returns a <code>discovered_pattern</code> string showing operator-actionable signal the user didn't explicitly query for.
  </div>
  <div class="ref"><strong>Code:</strong> crates/vectord/src/playbook_memory.rs::discover_patterns &nbsp;·&nbsp; <strong>Surfaces:</strong> /vectors/playbook_memory/patterns endpoint, /intelligence/chat response, /intelligence/permit_contracts cards</div>
  <h3>Layer 2 — pathway memory (system-level hot-swap, ADR-021)</h3>
  <div class="narr">
    <strong>Pathway memory remembers which approach worked, not just which worker.</strong> Every accepted scrum review writes a <code>PathwayTrace</code> with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags, bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. The hot-swap gate is strict, and every condition must hold: narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.
  </div>
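The gate conditions listed above reduce to a single conjunction. The sketch below is illustrative only: `PathwayCandidate`, its field names, and `canHotSwap` are hypothetical and are not taken from crates/vectord/src/pathway_memory.rs; only the thresholds come from the text.

```typescript
// Hypothetical shape; field names are assumptions, thresholds are from the text.
interface PathwayCandidate {
  fingerprintMatch: boolean;   // narrow fingerprint match
  auditConsensusPass: boolean; // audit consensus passed
  replayCount: number;         // probation requires at least 3 replays
  successRate: number;         // fraction in [0, 1]
  retired: boolean;
  cosine: number;              // vector cosine similarity to the stored trace
}

// A candidate may be hot-swapped only when every condition holds.
function canHotSwap(c: PathwayCandidate): boolean {
  return (
    c.fingerprintMatch &&
    c.auditConsensusPass &&
    c.replayCount >= 3 &&
    c.successRate >= 0.8 &&
    !c.retired &&
    c.cosine >= 0.9
  );
}
```

Because the conditions are ANDed, failing any one of them (for example a trace still in probation with two replays) sends the query back through the normal escalation path.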
  <div class="math">
    <span class="c">// Live pathway state (refresh page to recompute):</span><br>
    <span id="pwm-traces">— traces</span> · <span id="pwm-replays">—</span> successful replays · <span id="pwm-rate">—</span> reuse rate<br>
    <span class="c">// 88 / 11/11 / 100% as of 2026-04-27 — probation gate crossed</span>
  </div>
  <div class="ref"><strong>Code:</strong> crates/vectord/src/pathway_memory.rs · <strong>Endpoints:</strong> /vectors/pathway/insert · /query · /record_replay · /stats · /bug_fingerprints · <strong>Spec:</strong> docs/DECISIONS.md ADR-021 — Semantic-correctness matrix layer</div>
  <h3>What both memory layers feed (besides search)</h3>
  <div class="narr">
    Both layers also feed the <strong>per-staffer hot-swap index</strong> (Chapter 5) and the <strong>Construction Activity Signal Engine</strong> (Chapter 6). One memory model, surfaced three different ways at the request boundary depending on who's asking.
  </div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 5</div>
  <h2>Per-staffer hot-swap — same corpus, different relevance gradient</h2>
  <div class="lede">Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but the search results, the recurring-skill patterns, and the playbook context all reshape to whoever is acting. The same query, "forklift operators", returns 167 IL workers when Maria is acting, 89 IN workers for Devon, and 16 WI workers for Aisha. The MEMORY panel relabels itself with the active coordinator's name.</div>
  <h3>What scopes per staffer</h3>
  <div class="math">
    <span class="c">// On every /intelligence/chat call:</span><br>
    if (b.staffer_id) {<br>
    &nbsp;&nbsp;const staffer = lookupStaffer(b.staffer_id);<br>
    &nbsp;&nbsp;<span class="c">// 1. Default state filter to staffer territory unless caller pinned one</span><br>
    &nbsp;&nbsp;if (!explicitState) filters.push(`state = '${staffer.territory.state}'`);<br>
    &nbsp;&nbsp;<span class="c">// 2. Default playbook-pattern geo to staffer's primary city/state</span><br>
    &nbsp;&nbsp;cityForPatterns = staffer.territory.cities[0];<br>
    &nbsp;&nbsp;stateForPatterns = staffer.territory.state;<br>
    &nbsp;&nbsp;<span class="c">// 3. Surface staffer.name back so the UI can relabel MEMORY → MARIA'S MEMORY</span><br>
    &nbsp;&nbsp;response.staffer = { id: staffer.id, name: staffer.name, territory: staffer.territory };<br>
    }
  </div>
  <div class="narr">
    The corpus stays intact. The relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. The architecture generalizes — every new metro adds territories, not code paths.
  </div>
  <div class="ref"><strong>Code:</strong> mcp-server/index.ts <code>STAFFERS</code> roster + <code>lookupStaffer()</code> · <code>/staffers</code> endpoint · <code>/intelligence/chat</code> smart_search route · <strong>UI:</strong> staffer dropdown in mcp-server/search.html</div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 6</div>
  <h2>Construction Activity Signal Engine — the corpus is also a market signal</h2>
  <div class="lede">Every contractor in this corpus is also a forward indicator on the public equities they touch. Permits filed today predict construction starts ~45 days out, staffing ~30, revenue recognition months later. The associated-ticker network surfaces this signal <em>before</em> any 10-Q. The architecture is metro-agnostic — Chicago is Phase 1; NYC DOB, LA County, Houston BCD, Boston ISD ship as Socrata-shaped adapters.</div>
  <h3>Three flavors of attribution</h3>
  <div class="math">
    <span class="c">// per contractor in /intelligence/profiler_index:</span><br>
    direct      <span class="c">// contractor IS a public issuer → SEC tickers index match</span><br>
    parent      <span class="c">// curated KNOWN_PARENT_MAP — Turner → HOC.DE via Hochtief AG</span><br>
    associated  <span class="c">// co-permit network — Bob's Electric appears with TARGET CORPORATION</span><br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="c">// 3+ times → inherits TGT as an associated indicator</span>
  </div>
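The "associated" flavor can be sketched as a co-occurrence count over permits. This is a hedged illustration, not the logic in mcp-server/index.ts: `associatedTickers`, the permit shape (a list of party names per permit), and the issuer-name-to-ticker map are all assumptions; only the "3+ times" threshold comes from the text.

```typescript
// Hypothetical sketch: each permit is modeled as the list of party names on it.
// issuerTickers maps a public issuer's name to its ticker (an assumed input).
function associatedTickers(
  contractor: string,
  permits: string[][],
  issuerTickers: Map<string, string>,
  minCoPermits: number = 3,
): string[] {
  const counts = new Map<string, number>();
  for (const names of permits) {
    if (names.indexOf(contractor) === -1) continue;
    const seen = new Set<string>(); // count each ticker at most once per permit
    for (const name of names) {
      const ticker = issuerTickers.get(name);
      if (name === contractor || ticker === undefined || seen.has(ticker)) continue;
      seen.add(ticker);
      counts.set(ticker, (counts.get(ticker) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= minCoPermits)
    .map(([ticker]) => ticker)
    .sort();
}
```

With the example from the text, a contractor that shares three or more permits with TARGET CORPORATION would come back with `TGT` in the list, while a single shared permit with another issuer would not qualify.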
  <div class="narr">
    The associated path is the moat. A staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph. Every additional metro multiplies edges.
  </div>
  <h3>Building Activity Index (BAI)</h3>
  <div class="math">
    <span class="c">// BAI = attribution-weighted average day-change across surfaced issuers:</span><br>
    BAI = Σ (day_change_pct × attribution_count) / Σ attribution_count<br>
    <br>
    <span class="c">// Indexed build value = total $ of permits attributable to ANY public issuer</span><br>
    <span class="c">// Network depth = issuers / total attribution edges</span>
  </div>
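The BAI formula above is an attribution-weighted mean, which can be sketched as follows. `IssuerSignal` and `buildingActivityIndex` are hypothetical names used for illustration, and returning `null` for an empty or zero-weight set is an assumption about how the edge case should be handled.

```typescript
// Hypothetical input shape for one surfaced issuer.
interface IssuerSignal {
  ticker: string;
  dayChangePct: number;     // day change in percent, e.g. 1.5 for +1.5%
  attributionCount: number; // number of attribution edges to this issuer
}

// BAI = sum(day_change_pct * attribution_count) / sum(attribution_count).
// Returns null when there is no attribution weight to average over.
function buildingActivityIndex(issuers: IssuerSignal[]): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const issuer of issuers) {
    weighted += issuer.dayChangePct * issuer.attributionCount;
    totalWeight += issuer.attributionCount;
  }
  return totalWeight === 0 ? null : weighted / totalWeight;
}
```

For example, one issuer at +1.0% with 3 attributions and another at -2.0% with 1 attribution give (3 - 2) / 4 = 0.25.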
  <div class="narr">
    Run BAI daily, save the series, and you've got a backtestable thesis in months. Today's surface is Chicago-only with ~9 issuers; the curve scales linearly with metros added — and the marginal cost of a new metro is one Socrata adapter.
  </div>
  <div class="ref"><strong>Code:</strong> mcp-server/index.ts <code>/intelligence/profiler_index</code> + <code>/intelligence/ticker_quotes</code> · entity.ts <code>lookupTickerLite()</code> · <code>fetchStooqQuote()</code> · <strong>UI:</strong> /profiler · <strong>Data sources:</strong> SEC company_tickers.json (in-memory index) + Stooq CSV API + curated parent-link map</div>
 </div>
 <div class="chapter">
  <div class="num">Chapter 7</div>
  <h2>Key architectural choices — what was picked and why</h2>
  <div class="lede">Each choice is documented in <code>docs/DECISIONS.md</code> (Architecture Decision Records). If you dispute any of these, the ADR names the alternatives we rejected and the measurement that drove the call.</div>
  <div class="card">
@ -314,62 +386,95 @@ pre{background:#161b22;border:1px solid #171d27;border-radius:8px;padding:14px 1
    <div class="row accent-r">
      <div style="flex:1"><div class="title">ADR-020 · Idempotent register() with schema-fingerprint gate</div><div class="meta">Same (name, fingerprint) reuses manifest. Different fingerprint = 409 Conflict. Prevents silent duplicate manifests. Cleanup run collapsed 374 → 31 datasets.</div></div>
    </div>
    <div class="row accent-r">
      <div style="flex:1"><div class="title">ADR-021 · Semantic-correctness matrix layer</div><div class="meta">Pathway memory carries semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …) on every trace. New reviews see prior bug fingerprints as a preamble; recurrent classes get caught on first read. Compounds across files in the same crate.</div></div>
    </div>
    <div class="row accent-l">
      <div style="flex:1"><div class="title">Phase 19 design note · Statistical + semantic, not neural</div><div class="meta">Meta-index is cosine similarity + endorsement aggregation. No model training. Rebuildable from <code>successful_playbooks</code> alone. Neural re-ranker deferred to Phase 20+ only if statistical floor plateaus.</div></div>
    </div>
    <div class="row accent-l">
      <div style="flex:1"><div class="title">Distillation freeze · v1.0.0 at e7636f2</div><div class="meta">145 tests · 22/22 acceptance · 16/16 audit-full · bit-identical reproducibility. Multi-layer contamination firewall on SFT exports. Substrate the auditor + mode runner sit on; "the system regressed" questions bisect against this anchor.</div></div>
    </div>
  </div>
 </div>
 <div class="chapter">
-  <div class="num">Chapter 6</div>
+  <div class="num">Chapter 8</div>
  <h2>Measured at scale, on this machine</h2>
-  <div class="lede">Hardware: i9 + 128GB RAM + Nvidia A4000 16GB VRAM. Numbers below are from <em>this</em> running instance. Refresh the page and they'll recompute.</div>
+  <div class="lede">Hardware: i9 + 128GB RAM + Nvidia A4000 16GB VRAM + 2.5GB symmetric. Numbers below are from <em>this</em> running instance. Refresh the page and they'll recompute.</div>
  <div class="grid" id="ch6-scale"><div class="loading">Loading scale data…</div></div>
  <div id="ch6-recall" style="margin-top:10px"></div>
 </div>
 <div class="chapter">
-  <div class="num">Chapter 7</div>
+  <div class="num">Chapter 9</div>
  <h2>Verify or dispute — reproduce it yourself</h2>
-  <div class="lede">Every claim below is a curl away from falsification.</div>
+  <div class="lede">Every claim above is a curl away from falsification.</div>
  <div class="card">
-    <div class="narr"><strong>Health.</strong> Should return <code>lakehouse ok</code>.</div>
+    <div class="narr"><strong>Gateway health.</strong> Returns provider matrix + worker count.</div>
-    <pre>curl http://localhost:3100/health</pre>
+    <pre>curl -s http://localhost:3100/v1/health | jq</pre>
    <div class="narr"><strong>Any SQL on multi-million-row Parquet.</strong> Sub-100ms typical.</div>
    <pre>curl -s -X POST http://localhost:3100/query/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql":"SELECT role, COUNT(*) FROM workers_500k WHERE state='\''IL'\'' GROUP BY role LIMIT 5"}'</pre>
-    <div class="narr"><strong>Hybrid search with playbook boost.</strong> The whole Phase 19 feedback loop in one request.</div>
+    <div class="narr"><strong>Hybrid search with playbook boost.</strong> SQL filter + vector rerank + playbook memory in one call.</div>
    <pre>curl -s -X POST http://localhost:3100/vectors/hybrid \
  -H 'Content-Type: application/json' \
  -d '{"index_name":"workers_500k_v1",
       "sql_filter":"role = '\''Forklift Operator'\'' AND city = '\''Chicago'\'' AND CAST(availability AS DOUBLE) > 0.5",
       "question":"reliable forklift operator",
       "top_k":5,"use_playbook_memory":true,"playbook_memory_k":200}'</pre>
-    <div class="narr"><strong>Playbook memory stats.</strong> Count + endorsed names + sample.</div>
+    <div class="narr"><strong>Pathway memory stats.</strong> System-level hot-swap signal — should show 88 traces / 11 replays / 100% reuse rate (probation gate crossed).</div>
-    <pre>curl http://localhost:3100/vectors/playbook_memory/stats</pre>
+    <pre>curl -s http://localhost:3100/vectors/pathway/stats | jq</pre>
-    <div class="narr"><strong>Pattern discovery.</strong> What do past similar fills have in common?</div>
+    <div class="narr"><strong>Per-staffer scoping.</strong> Same query, different rosters per coordinator.</div>
-    <pre>curl -s -X POST http://localhost:3100/vectors/playbook_memory/patterns \
+    <pre>for s in maria devon aisha; do
  curl -s -X POST http://localhost:3700/intelligence/chat \
    -H 'Content-Type: application/json' \
    -d "{\"message\":\"forklift operators\",\"staffer_id\":\"$s\"}" \
    | jq -r ".staffer.name + \": \" + (.sql_results | length | tostring) + \" workers, top: \" + (.sql_results[0].name + \" in \" + .sql_results[0].city + \", \" + .sql_results[0].state)"
 done
 # Maria: 167 workers, top: ... in Chicago, IL
 # Devon: 89  workers, top: ... in Fort Wayne, IN
 # Aisha: 16  workers, top: ... in Milwaukee, WI</pre>
    <div class="narr"><strong>Late-worker triage in one shot.</strong> Pulls profile + 5 backfills + drafts SMS. Should respond in under 300ms.</div>
    <pre>curl -s -X POST http://localhost:3700/intelligence/chat \
  -H 'Content-Type: application/json' \
-  -d '{"query":"Forklift Operator in Chicago, IL","top_k_playbooks":25,"min_trait_frequency":0.3}'</pre>
+  -d '{"message":"Marcus running late site 4422"}' | jq</pre>
-    <div class="narr"><strong>Run the dual-agent scenario yourself.</strong> All 5 events, real fills, real artifacts.</div>
+    <div class="narr"><strong>Construction Activity Signal Engine.</strong> Profiler index with attribution, cost, last filed.</div>
    <pre>curl -s -X POST http://localhost:3700/intelligence/profiler_index \
  -H 'Content-Type: application/json' \
  -d '{"limit":10}' \
  | jq '.contractors[] | {name, permits, total_cost, direct: (.tickers.direct | map(.ticker)), associated: (.tickers.associated | map(.ticker + " ←via " + .partner_name))}'</pre>
    <div class="narr"><strong>Live ticker quotes.</strong> Batch Stooq pull for the basket.</div>
    <pre>curl -s -X POST http://localhost:3700/intelligence/ticker_quotes \
  -H 'Content-Type: application/json' \
  -d '{"tickers":["TGT","JPM","BALY","WBA","MCD"]}' | jq .quotes</pre>
    <div class="narr"><strong>Audit trail — read any verdict on PR #11.</strong> Independent claim-vs-diff verifier output.</div>
    <pre>ls /home/profit/lakehouse/data/_auditor/kimi_verdicts/
 # 11-c3c9c2174a91.json  11-ca7375ea2b17.json  11-2d9cb128bf42.json …
 jq '.findings[0:3]' /home/profit/lakehouse/data/_auditor/kimi_verdicts/11-c3c9c2174a91.json</pre>
    <div class="narr"><strong>Distillation acceptance gate.</strong> 22/22 invariants must pass for any commit that touches the substrate.</div>
    <pre>cd /home/profit/lakehouse
-bun run tests/multi-agent/scenario.ts
+bun test auditor/schemas/distillation/ tests/distillation/
-# Output: tests/multi-agent/playbooks/scenario-&lt;timestamp&gt;/report.md</pre>
+# Expect: 145 pass · 0 fail · 372 expect() calls</pre>
  </div>
 </div>
 <div class="chapter">
-  <div class="num">Chapter 8</div>
+  <div class="num">Chapter 10</div>
  <h2>What we are <em>not</em> claiming</h2>
-  <div class="lede">Every impressive-sounding number comes with a footnote. Here are the honest limits.</div>
+  <div class="lede">Every impressive-sounding number comes with a footnote. Here are the honest limits as of 2026-04-27.</div>
  <div class="card">
-    <div class="row accent-a"><div style="flex:1"><div class="title">workers_500k is synthetic.</div><div class="meta">Real client ATS export replaces this table. Schema is deliberately identical to a production ATS.</div></div></div>
+    <div class="row accent-a"><div style="flex:1"><div class="title">workers_500k is synthetic.</div><div class="meta">Real client ATS export replaces this table. Schema is deliberately identical to a production ATS so the swap is config, not code.</div></div></div>
-    <div class="row accent-a"><div style="flex:1"><div class="title">candidates table has 1,000 rows.</div><div class="meta">Intentionally small for demo. call_log references higher candidate_ids that don't cross-reference — this is a dataset alignment issue, not a pipeline issue.</div></div></div>
+    <div class="row accent-a"><div style="flex:1"><div class="title">candidates table is light at 1,000 rows.</div><div class="meta">Intentionally small. Live PII-safe view layer is built; replacing the small table with a 100K+ ATS is a one-line config flip.</div></div></div>
-    <div class="row accent-b"><div style="flex:1"><div class="title">Chicago permit data is real.</div><div class="meta">Pulled live from data.cityofchicago.org/resource/ydr8-5enu.json (Socrata API). Not synthetic. Not cached.</div></div></div>
+    <div class="row accent-b"><div style="flex:1"><div class="title">Chicago permit data is real.</div><div class="meta">Pulled live from data.cityofchicago.org/resource/ydr8-5enu.json (Socrata). Not synthetic. Not cached. Verifiable address-by-address.</div></div></div>
-    <div class="row accent-l"><div style="flex:1"><div class="title">Playbook memory is seeded from demo runs.</div><div class="meta">The pipeline that seeds it is identical to what a live recruiter would trigger via /log. Same code path.</div></div></div>
+    <div class="row accent-l"><div style="flex:1"><div class="title">Playbook memory is seeded from demo runs.</div><div class="meta">Same code path that seeds in production: every /log from the recruiter UI triggers seed → persist_sql. Demo seeds use the same shape as live operations.</div></div></div>
-    <div class="row accent-w"><div style="flex:1"><div class="title">Local 7B models (mistral, qwen2.5) are imperfect.</div><div class="meta">They occasionally malform tool calls or drop fields. Multi-agent scenarios seal roughly 40-80% in one run. Larger models or constrained decoding would improve this. Not a substrate problem.</div></div></div>
+    <div class="row accent-l"><div style="flex:1"><div class="title">Pathway memory probation gate is crossed.</div><div class="meta">88 traces, 11 replays, 11 successful, 100% reuse rate. Any pathway that fails to clear ≥0.80 success_rate after ≥3 replays gets retired automatically (sticky flag prevents oscillation).</div></div></div>
    <div class="row accent-w"><div style="flex:1"><div class="title">SEC name-to-ticker fuzzy matcher has rare false positives.</div><div class="meta">For names with no clean SEC match, the matcher occasionally surfaces a small-cap that shares a keyword (FLG once attached to a PNC-adjacent contractor). The matcher is kept conservative: it requires an overlap of at least 2 non-stopword tokens. It can be tightened to require an explicit allow-list for production trading use.</div></div></div>
    <div class="row accent-r"><div style="flex:1"><div class="title">12 awaiting public-data sources are placeholders.</div><div class="meta">DOL Wage &amp; Hour, EPA ECHO, MSHA, BBB, PACER, UCC liens, D&amp;B, etc. — listed by name on every contractor profile with a one-line "would show:" sample. Not yet wired. Each ships as a Socrata-style adapter; engineering scope is concrete.</div></div></div>
    <div class="row accent-r"><div style="flex:1"><div class="title">No rate/margin awareness yet.</div><div class="meta">Worker pay expectations vs contract bill rates are not modeled. Flagged as a Phase 20 item; no architectural blocker.</div></div></div>
    <div class="row accent-r"><div style="flex:1"><div class="title">BAI is a thesis, not a backtested signal.</div><div class="meta">The Building Activity Index is computed live from current attribution + day-change. A backtestable thesis needs the daily series saved over months. The architecture supports this (the data/_kb/audit_baselines.jsonl pattern); the series just hasn't been running long enough.</div></div></div>
    <div class="row accent-r"><div style="flex:1"><div class="title">Single-metro today.</div><div class="meta">Chicago via Socrata. NYC DOB, LA County, Houston BCD, Boston ISD, DC DCRA all use Socrata-equivalent APIs — adapters are config-only. Each new metro multiplies the network without multiplying the codebase.</div></div></div>
  </div>
 </div>
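The probation rule quoted in the pathway-memory row (retire after at least 3 replays if the success rate is under 0.80, with a sticky flag) can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the `PathwayStats` shape, the field names, and `recordReplay` are hypothetical and are not taken from `_pathway_memory/state.json`.

```typescript
// Hypothetical per-pathway counters; the real state.json layout is not shown in this diff.
interface PathwayStats {
  replays: number;
  successes: number;
  retired: boolean; // sticky: once true it never flips back
}

const MIN_REPLAYS = 3;        // gate only applies after this many replays
const MIN_SUCCESS_RATE = 0.8; // pathways below this rate are retired

// Returns updated stats after one replay; does not mutate the input.
function recordReplay(p: PathwayStats, ok: boolean): PathwayStats {
  const replays = p.replays + 1;
  const successes = p.successes + (ok ? 1 : 0);
  const rate = successes / replays;
  const failsGate = replays >= MIN_REPLAYS && rate < MIN_SUCCESS_RATE;
  // OR-ing with the previous flag is what makes retirement sticky.
  return { replays, successes, retired: p.retired || failsGate };
}

// Two successes followed by one failure: 2/3 is below 0.80 at the third replay.
let demo: PathwayStats = { replays: 0, successes: 0, retired: false };
for (const ok of [true, true, false]) demo = recordReplay(demo, ok);
console.log(demo);
```

Keeping the flag sticky means a retired pathway cannot oscillate back into rotation after a lucky streak; un-retiring would have to be an explicit operator action.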
@ -394,8 +499,72 @@ function apiPost(path, body){
 window.addEventListener('load',function(){
  loadLiveSections();
  loadPathwayLive();
  loadSignalLive();
 });
 // Pathway memory live counters in Chapter 4 — small inline spans.
 function loadPathwayLive(){
  fetch(A+'/api/vectors/pathway/stats').then(function(r){if(!r.ok) throw new Error('pathway stats HTTP '+r.status);return r.json()}).then(function(p){
    if(!p) return;
    var t=document.getElementById('pwm-traces');
    var r=document.getElementById('pwm-replays');
    var rate=document.getElementById('pwm-rate');
    if(t) t.textContent = (p.total_pathways||0) + ' traces';
    if(r) r.textContent = (p.successful_replays||0) + '/' + (p.total_replays||0);
    if(rate) rate.textContent = Math.round((p.replay_success_rate||0)*100) + '%';
  }).catch(function(){});
 }
 // Live tile under Chapter 1 — what the signal engine sees in this view.
 function loadSignalLive(){
  apiPost('/intelligence/profiler_index',{limit:200}).then(function(d){
    var host=document.getElementById('ch1-live');if(!host) return;
    host.textContent='';
    var rows=d.contractors||[];
    if(!rows.length) return;
    // Aggregate basket
    var byTk={};
    rows.forEach(function(r){
      var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
      ts.forEach(function(t){
        if(!t||!t.ticker) return;
        if(!byTk[t.ticker]) byTk[t.ticker]={kinds:[],count:0};
        byTk[t.ticker].count++;
        if(byTk[t.ticker].kinds.indexOf(t.via)<0) byTk[t.ticker].kinds.push(t.via);
      });
    });
    var basket=Object.values(byTk);
    var attribCost=rows.reduce(function(s,r){
      var ts=(r.tickers&&r.tickers.direct?r.tickers.direct:[]).concat(r.tickers&&r.tickers.associated?r.tickers.associated:[]);
      return s + (ts.length>0 ? (r.total_cost||0) : 0);
    },0);
    if(!basket.length) return;
    var card=el('div','card accent-l');
    var hdr=el('div',null,'LIVE — Construction Activity Signal Engine');
    hdr.style.cssText='font-size:10px;color:#3fb950;text-transform:uppercase;letter-spacing:1.4px;font-weight:700;margin-bottom:8px';
    card.appendChild(hdr);
    var line=document.createElement('div');
    line.style.cssText='display:flex;gap:24px;flex-wrap:wrap;font-size:13px';
    function block(num,lab){
      var b=document.createElement('div');
      var n=document.createElement('div');n.style.cssText='font-size:18px;font-weight:700;color:#e6edf3;font-family:ui-monospace,monospace';n.textContent=num;
      var l=document.createElement('div');l.style.cssText='font-size:10px;color:#545d68;text-transform:uppercase;letter-spacing:1.2px;font-weight:600';l.textContent=lab;
      b.appendChild(n);b.appendChild(l);return b;
    }
    var bav = attribCost>=1e9?'$'+(attribCost/1e9).toFixed(2)+'B':attribCost>=1e6?'$'+(attribCost/1e6).toFixed(0)+'M':'$'+Math.round(attribCost/1e3)+'K';
    line.appendChild(block(basket.length+'', 'Public issuers in scope'));
    line.appendChild(block(bav, 'Attributed build value'));
    line.appendChild(block(rows.length+'', 'Contractors indexed'));
    line.appendChild(block(basket.reduce(function(s,b){return s+b.count},0)+'', 'Attribution edges'));
    card.appendChild(line);
    var note=el('div',null,'Computed live from /intelligence/profiler_index in '+(d.duration_ms||0)+'ms · run any of the Chapter 9 curl lines to verify');
    note.style.cssText='font-size:11px;color:#545d68;margin-top:10px;font-family:ui-monospace,monospace';
    card.appendChild(note);
    host.appendChild(card);
  }).catch(function(){});
 }
 function loadLiveSections(){
  apiPost('/proof.json',{}).then(function(r){
    var host1=document.getElementById('ch1-tests');host1.textContent='';
--- a/mcp-server/role_scenes.ts
+++ b/mcp-server/role_scenes.ts
@ -0,0 +1,92 @@
 // Server-side mirror of search.html's ROLE_BANDS regex table.
 // Each band carries a *visual scene* — clothing + immediate backdrop —
 // so ComfyUI produces role-coherent headshots instead of interchangeable
 // studio portraits. The front-end sends the raw role string in the
 // query (?role=Forklift%20Operator); the server resolves it to a band
 // and looks up the scene here.
 export type RoleBand =
  | "warehouse"
  | "production"
  | "trades"
  | "driver"
  | "lead";
 export interface SceneDef {
  band: RoleBand;
  // Free-form clause inserted into the diffusion prompt AFTER
  // "[age]-year-old [race] [gender] [role], ". Should describe what
  // they're wearing and what is immediately behind them. Keep under
  // ~25 words — SDXL Turbo loses focus on longer prompts and starts
  // hallucinating cartoon hands.
  scene: string;
 }
 const RE_BANDS: { re: RegExp; band: RoleBand }[] = [
  { re: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: "warehouse" },
  { re: /production|assembl|quality/i, band: "production" },
  { re: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die/i, band: "trades" },
  { re: /driver|truck|haul|cdl/i, band: "driver" },
  { re: /line\s*lead|supervisor|foreman|coordinator|lead\b/i, band: "lead" },
 ];
 export function roleBand(role: string): RoleBand {
  const r = (role || "").trim();
  if (!r) return "warehouse";
  for (const b of RE_BANDS) if (b.re.test(r)) return b.band;
  return "warehouse";
 }
 // TODO J — refine these. Each `scene` string lands directly in the
 // diffusion prompt. Tone target: a coordinator glances at the card
 // and recognizes the role from the photo before reading the role pill.
 //
 // Things that work well in SDXL Turbo at 8 steps:
 //   - One concrete clothing item ("high-visibility yellow vest")
 //   - One concrete prop ("hard hat hanging from belt", "tablet in hand")
 //   - One blurred background element ("warehouse pallet aisle behind",
 //     "factory machinery softly out of focus")
 //   - Avoid: text/logos (rendered as scribble), specific brands, hands
 //     holding tools (often distorts), full-body language ("standing",
 //     "leaning") — model is trained on portrait crops.
 //
 // Each scene now bakes "monochrome black and white photography" into
 // the prompt so the model produces native B&W output rather than us
 // applying CSS grayscale post-hoc. SDXL Turbo handles B&W natively
 // with strong tonal range — better than desaturating a color render.
 export const SCENES: Record<RoleBand, SceneDef> = {
  warehouse: {
    band: "warehouse",
    scene: "wearing a high-visibility safety vest over a t-shirt, hard hat visible, blurred warehouse pallet aisle behind, soft natural light, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  production: {
    band: "production",
    scene: "wearing a work shirt with safety glasses on forehead, blurred factory machinery softly out of focus behind, fluorescent overhead lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  trades: {
    band: "trades",
    scene: "wearing a heavy-duty work shirt with rolled sleeves, blurred workshop tool wall behind, focused tungsten lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  driver: {
    band: "driver",
    scene: "wearing a polo shirt, lanyard with ID badge visible, blurred truck cab or loading dock behind, daylight, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  lead: {
    band: "lead",
    scene: "wearing a button-down shirt, tablet held casually at chest level, blurred warehouse floor in soft focus behind, professional lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
 };
 // v2 — baked B&W + 1024×1024 render canvas (4× pixels of v1). Larger
 // source means downsampling to a 40px avatar packs more detail per
 // displayed pixel, hiding the diffusion-y micro-textures that read as
 // "AI generated" at small sizes. Server route reads pool from
 // data/headshots_role_pool/{SCENES_VERSION}/... so v1 stays available
 // for rollback / A-B comparison.
 export const SCENES_VERSION = "v2";
 // Default render dimensions used by both the on-demand
 // /headshots/generate/:key route and the offline render_role_pool.py
 // script. v1 used 512²; v2 doubles each side to 1024² (2× linear,
 // 4× the pixels, roughly 3× the GPU time on SDXL Turbo).
 export const FACE_RENDER_DIM = 1024;
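A short sketch of how `roleBand` resolves a few titles and how a scene clause might be joined to the subject prefix described in the `SceneDef` comment. The band table and `roleBand` are copied from the file so the block runs alone; `Subject`, `buildPrompt`, and `wordCount` are hypothetical helpers, since the real prompt builder is not part of this diff.

```typescript
// Copy of the band table from role_scenes.ts so this sketch runs on its own.
type RoleBand = "warehouse" | "production" | "trades" | "driver" | "lead";

const RE_BANDS: { re: RegExp; band: RoleBand }[] = [
  { re: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: "warehouse" },
  { re: /production|assembl|quality/i, band: "production" },
  { re: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die/i, band: "trades" },
  { re: /driver|truck|haul|cdl/i, band: "driver" },
  { re: /line\s*lead|supervisor|foreman|coordinator|lead\b/i, band: "lead" },
];

function roleBand(role: string): RoleBand {
  const r = (role || "").trim();
  if (!r) return "warehouse";
  for (const b of RE_BANDS) if (b.re.test(r)) return b.band;
  return "warehouse";
}

// Hypothetical shape: the fields mirror the "[age]-year-old [race] [gender] [role]" prefix.
interface Subject {
  age: number;
  race: string;
  gender: string;
  role: string;
}

// Hypothetical helper: joins the subject prefix with a scene clause.
function buildPrompt(s: Subject, scene: string): string {
  return `${s.age}-year-old ${s.race} ${s.gender} ${s.role}, ${scene}`;
}

// Whitespace-delimited word count, for checking a clause against the ~25-word guidance.
function wordCount(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// First match wins, so a title naming both a floor and a rank takes the earlier band.
for (const role of ["Forklift Operator", "Line Lead", "Warehouse Supervisor", "CDL Driver", ""]) {
  console.log(JSON.stringify(role), "->", roleBand(role));
}
```

Because the table is scanned in order, "Warehouse Supervisor" resolves to `warehouse` and not `lead`. If supervisory titles should always get the `lead` scene, that entry would need to move to the top of both this table and the front-end table it mirrors.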
--- a/mcp-server/search.html
+++ b/mcp-server/search.html
--- a/mcp-server/spec.html
+++ b/mcp-server/spec.html
@ -78,13 +78,14 @@ table.plain tr:hover td{background:#0d1117}
  <nav>
    <a href=".">Dashboard</a>
    <a href="console">Walkthrough</a>
    <a href="profiler">Profiler</a>
    <a href="proof">Architecture</a>
    <a href="spec" class="active">Spec</a>
    <a href="onboard">Onboard</a>
    <a href="alerts">Alerts</a>
    <a href="workspaces">Workspaces</a>
  </nav>
-  <div class="rt">v1 · 2026-04-20</div>
+  <div class="rt">v3 · 2026-04-27</div>
 </div>
 <div class="layout">
@ -120,14 +121,18 @@ table.plain tr:hover td{background:#0d1117}
 <tr><td class="mono">crates/vectord/</td><td>The vector + learning surface. Embeddings stored as Parquet (ADR-008), HNSW index (Phase 15), trial system (autotune), promotion registry (Phase 16), playbook_memory (Phase 19). Core feedback loop lives here.</td></tr>
 <tr><td class="mono">crates/vectord-lance/</td><td>Firewall crate. Lance 4.0 + Arrow 57, isolated from the main Arrow-55 workspace. Provides secondary vector backend for large-scale, random-access, and append-heavy workloads (ADR-019).</td></tr>
 <tr><td class="mono">crates/journald/</td><td>Append-only mutation event log (ADR-012). Every insert/update/delete writes here — who, when, what, old/new value. Never mutated. Foundation for time-travel + compliance audit.</td></tr>
-<tr><td class="mono">crates/aibridge/</td><td>Rust ↔ Python sidecar. HTTP client over FastAPI wrapper around Ollama. VRAM introspection via nvidia-smi. All LLM calls (embed, generate, rerank) flow through here.</td></tr>
+<tr><td class="mono">crates/truth/</td><td>File-backed rule store. <code>evaluate(task_class, ctx) → Vec&lt;RuleOutcome&gt;</code> (ADR-021 — semantic-correctness matrix layer). Loaded from <code>truth/*.toml</code> at gateway boot.</td></tr>
-<tr><td class="mono">crates/gateway/</td><td>Axum HTTP (:3100) + gRPC (:3101). Auth middleware, tools registry (Phase 12 — governed actions), CORS. Every external request enters here.</td></tr>
+<tr><td class="mono">crates/aibridge/</td><td>Rust ↔ Python sidecar + provider adapter trait. HTTP client over FastAPI wrapper around Ollama for local; <code>ProviderAdapter</code> dispatch for cloud (ollama_cloud, openrouter, opencode, kimi). VRAM introspection via nvidia-smi. All LLM calls flow through here.</td></tr>
 <tr><td class="mono">crates/gateway/</td><td>Axum HTTP (:3100) + gRPC (:3101). OpenAI-compat <code>/v1/*</code> (drop-in middleware), mode runner (<code>/v1/mode/execute</code>), validator (<code>/v1/validate</code>), iterate loop (<code>/v1/iterate</code>), tools registry, cost telemetry, Langfuse + observer fan-out on every chat. Every external request enters here.</td></tr>
 <tr><td class="mono">crates/validator/</td><td>Phase 43 production validator. Schema / completeness / consistency / policy gates over LLM outputs. <code>FillValidator</code>, <code>EmailValidator</code>, <code>ParquetWorkerLookup</code> (loads workers_500k.parquet at boot). Fail-closed when roster absent.</td></tr>
 <tr><td class="mono">crates/ui/</td><td>Dioxus WASM developer UI. Internal tool. Not exposed externally.</td></tr>
-<tr><td class="mono">mcp-server/</td><td>Bun/TypeScript recruiter-facing app. Serves <code>devop.live/lakehouse</code>. Routes: <code>/search /match /log /log_failure /clients/:c/blacklist /intelligence/* /memory/query /models/matrix /system/summary</code>. Observer sibling at <code>observer.ts</code> with HTTP listener on :3800 for scenario event ingest. Proxies to the Rust gateway for heavy work.</td></tr>
+<tr><td class="mono">mcp-server/</td><td>Bun/TypeScript public-facing app + MCP tool surface. Serves <code>devop.live/lakehouse</code>. Pages: dashboard / console / profiler / contractor / proof / spec / onboard / alerts / workspaces. Routes: <code>/search /match /log /log_failure /clients/:c/blacklist /intelligence/* /staffers /memory/query /models/matrix /system/summary</code>. Observer sibling at <code>observer.ts</code> on :3800 for event ingest.</td></tr>
-<tr><td class="mono">tests/multi-agent/</td><td>Dual-agent scenario harness + memory stack. <code>agent.ts</code> (prompts, continuation + tree-split primitives, cloud routing), <code>orchestrator.ts</code>, <code>scenario.ts</code> (contracts + staffer + tool_level), <code>kb.ts</code> (KB indexing, competence scoring, neighbor retrieval), <code>normalize.ts</code> (input normalizer — structured / regex / LLM), <code>memory_query.ts</code> (unified /memory/query), <code>gen_scenarios.ts</code> + <code>gen_staffer_demo.ts</code> (corpus generators), <code>run_e2e_rated.ts</code>, <code>chain_of_custody.ts</code>. Unit tests colocated (<code>kb.test.ts</code>, <code>normalize.test.ts</code>).</td></tr>
+<tr><td class="mono">auditor/</td><td>External claim-vs-diff verifier on PRs. Polls Gitea for open PRs, builds adversarial prompt from PRD invariants + staffing matrix, alternates Kimi K2.6 ↔ Haiku 4.5 by SHA, auto-promotes Claude Opus 4.7 on diffs &gt;100k chars. Per-PR cap=3 with auto-reset on each new head SHA. Verdicts at <code>data/_auditor/kimi_verdicts/</code>.</td></tr>
-<tr><td class="mono">config/</td><td><code>models.json</code> — authoritative 5-tier model matrix (T1 hot local / T2 review local / T3 overview cloud / T4 strategic / T5 gatekeeper). Per-tier context_window + context_budget + overflow_policy. Read at runtime by scenario.ts; hot-swap friendly.</td></tr>
+<tr><td class="mono">tests/multi-agent/</td><td>Multi-agent scenario harness + memory stack. <code>agent.ts</code>, <code>scenario.ts</code> (contracts + staffer + tool_level), <code>kb.ts</code> (KB indexing, competence scoring), <code>normalize.ts</code>, <code>memory_query.ts</code>, <code>run_e2e_rated.ts</code>. Unit tests colocated.</td></tr>
-<tr><td class="mono">docs/</td><td><code>PRD.md</code>, <code>PHASES.md</code>, <code>DECISIONS.md</code> (20 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.</td></tr>
+<tr><td class="mono">scripts/distillation/</td><td>Distillation substrate v1.0.0 (frozen at tag <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>). 145 unit tests, 22/22 acceptance, 16/16 audit-full, bit-identical reproducibility. Multi-layer contamination firewall on SFT exports.</td></tr>
-<tr><td class="mono">data/</td><td>Default local object store. Parquet files per dataset, append-log batches, HNSW trial journals, promotion registries, <code>_playbook_memory/state.json</code> (now with retirement fields — Phase 25), catalog manifests. Plus four learning-loop directories: <code>_kb/</code> (signatures, outcomes, recommendations, error_corrections, config_snapshots, staffers), <code>_playbook_lessons/</code> (T3 cross-day lessons archived per run), <code>_observer/ops.jsonl</code> (append journal, durable scenario outcome stream), <code>_chunk_cache/</code> (spec'd for Phase 21 Rust port). Rebuildable from repo + this dir alone.</td></tr>
+<tr><td class="mono">config/</td><td><code>modes.toml</code> — task_class → mode/model router (<code>scrum_review</code>, <code>contract_analysis</code>, <code>staffing_inference</code>, <code>pr_audit</code>, <code>doc_drift_check</code>, <code>fact_extract</code>). <code>providers.toml</code> — 5 active providers (ollama, ollama_cloud, openrouter, opencode 40-model, kimi direct). <code>routing.toml</code> — cost gates per task class.</td></tr>
 <tr><td class="mono">docs/</td><td><code>PRD.md</code>, <code>PHASES.md</code>, <code>DECISIONS.md</code> (21 ADRs). Every significant architectural choice has an ADR with the alternatives that were rejected and why.</td></tr>
 <tr><td class="mono">data/</td><td>Default local object store. Parquet datasets, append-log batches, HNSW trial journals, promotion registries, <code>_playbook_memory/state.json</code>, <code>_pathway_memory/state.json</code> (88 traces, 11/11 successful replays, ADR-021), catalog manifests. Plus learning-loop directories: <code>_kb/</code>, <code>_playbook_lessons/</code>, <code>_observer/ops.jsonl</code>, <code>_auditor/kimi_verdicts/</code>. Rebuildable from repo + this dir alone.</td></tr>
 </tbody>
 </table>
 </div>
@ -199,20 +204,42 @@ table.plain tr:hover td{background:#0d1117}
 <li>Ollama swaps to the profile's model via <code>keep_alive=0</code>; only one model in VRAM at a time</li>
 </ul>
-<h3>Model matrix (Phase 20)</h3>
+<h3>Provider fleet — 5 active, 40+ frontier models reachable</h3>
-<p>Five tiers declared in <code>config/models.json</code>. Each call site picks the tier appropriate to its purpose — hot-path JSON emitters get fast local, overview/strategic/gatekeeper decisions get thinking models on cloud. Every tier carries <code>context_window</code>, <code>context_budget</code>, and <code>overflow_policy</code>.</p>
+<p>Declared in <code>config/providers.toml</code> + <code>config/modes.toml</code>. Gateway is an OpenAI-compatible drop-in middleware: any consumer that speaks <code>POST /v1/chat/completions</code> gets routing, audit, cost telemetry, and the full memory substrate behind every call.</p>
 <table class="plain">
-<thead><tr><th>Tier</th><th>Purpose</th><th>Primary model</th><th>Frequency</th></tr></thead>
+<thead><tr><th>Provider</th><th>Reach</th><th>Use case</th></tr></thead>
 <tbody>
-<tr><td>T1 hot</td><td>Per tool call — SQL gen, hybrid_search, propose_done</td><td><code>qwen3.5:latest</code> local, <code>think:false</code></td><td>50-200/scenario</td></tr>
+<tr><td><code>ollama</code></td><td>localhost:3200 — local sidecar over Ollama</td><td>Hot-path JSON emitters, embeddings, last-resort rescue</td></tr>
-<tr><td>T2 review</td><td>Per-step consensus, drift flagging</td><td><code>qwen3:latest</code> local, <code>think:false</code></td><td>5-14/event</td></tr>
+<tr><td><code>ollama_cloud</code></td><td>ollama.com bearer key — gpt-oss:120b, qwen3-coder:480b, deepseek-v3.1:671b, kimi-k2:1t, mistral-large-3:675b, qwen3.5:397b</td><td>Strong-model reviewer rungs, T3+ overview, scrum master pipeline</td></tr>
-<tr><td>T3 overview</td><td>Mid-day checkpoints + cross-day lesson distill</td><td><code>gpt-oss:120b</code> Ollama Cloud, thinking on</td><td>1-3/scenario</td></tr>
+<tr><td><code>openrouter</code></td><td>openrouter.ai/api/v1 — 343 models incl. Anthropic/Google/OpenAI/MiniMax/Qwen, paid + free tiers</td><td>Paid ladder for observer escalations, free-tier rescue</td></tr>
-<tr><td>T4 strategic</td><td>Pattern re-ranking, weekly gap audit</td><td><code>qwen3.5:397b</code> cloud</td><td>1-10/day</td></tr>
+<tr><td><code>opencode</code></td><td>opencode.ai/zen/v1 — <strong>40 frontier models reachable through ONE sk-* key</strong>: Claude Opus 4.7 / Sonnet / Haiku, GPT-5.5-pro / 5.4 / codex variants, Gemini 3.1-pro, Kimi K2.6, GLM 5.1, DeepSeek, Qwen 3.6+, MiniMax, plus 4 free-tier</td><td>Cross-architecture tie-breakers, auditor cross-lineage (Haiku 4.5 + Opus 4.7), high-context reasoning (Opus on diffs &gt;100k chars)</td></tr>
-<tr><td>T5 gatekeeper</td><td>Schema migrations, autotune config changes</td><td><code>kimi-k2-thinking</code> cloud, audit-logged</td><td>1-5/day</td></tr>
+<tr><td><code>kimi</code></td><td>api.kimi.com/coding/v1 — direct Kimi For Coding</td><td>kimi_architect when ollama_cloud rate-limits; TOS-clean primary path</td></tr>
 </tbody>
 </table>
-<p><strong>Key mechanical finding (2026-04-21):</strong> qwen3.5 and qwen3 are <em>thinking</em> models — they burn ~650 tokens of hidden reasoning before emitting the visible response. For hot-path JSON emitters this meant 400-token budgets returned empty strings. Fix: <code>think: false</code> plumbed through sidecar's <code>/generate</code> endpoint; hot path disables thinking (structure matters more than reasoning depth), overseer tiers keep it on. Mistral was dropped entirely after a 0/14 fill rate on complex scenarios (decoder-level malformed-JSON bug, not a prompt issue).</p>
+
-<p><strong>Continuation primitive (Phase 21):</strong> <code>generateContinuable()</code> handles output-overflow without <code>max_tokens</code> tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. <code>generateTreeSplit()</code> handles input-overflow via map-reduce with running scratchpad. Both respect <code>assertContextBudget()</code> so silent truncation can't happen.</p>
+<h3>The 9-rung cloud-first ladder</h3>
 <p>Defined in <code>tests/real-world/scrum_master_pipeline.ts</code> as <code>const LADDER</code>. Each attempt is evaluated by <code>isAcceptable()</code> = chars ≥ 3800 ∧ not malformed JSON-only. On reject, the next rung sees a learning preamble carrying the prior rejection reason.</p>
 <pre>1  ollama_cloud / kimi-k2:1t            1T params · flagship
 2  ollama_cloud / qwen3-coder:480b      coding specialist
 3  ollama_cloud / deepseek-v3.1:671b    reasoning
 4  ollama_cloud / mistral-large-3:675b  deep analysis
 5  ollama_cloud / gpt-oss:120b          reliable workhorse
 6  ollama_cloud / qwen3.5:397b          dense final thinker
 7  openrouter   / openai/gpt-oss-120b:free  rescue tier
 8  openrouter   / google/gemma-3-27b-it:free fastest rescue
 9  ollama       / qwen3.5:latest        last-resort local</pre>
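The ladder walk described above can be sketched as a loop that tries each rung, checks the output, and hands the rejection reason to the next rung as a preamble. This is an illustration under assumptions, not the code in `scrum_master_pipeline.ts`: `runLadder`, `rejectionReason`, and the preamble wording are hypothetical, the model call is injected and synchronous to keep the example testable (real calls would be async), and "not malformed JSON-only" is read here as "reject output that starts like JSON but does not parse", which is a guess.

```typescript
interface Rung { provider: string; model: string }

// Rungs copied from the ladder listing.
const LADDER: Rung[] = [
  { provider: "ollama_cloud", model: "kimi-k2:1t" },
  { provider: "ollama_cloud", model: "qwen3-coder:480b" },
  { provider: "ollama_cloud", model: "deepseek-v3.1:671b" },
  { provider: "ollama_cloud", model: "mistral-large-3:675b" },
  { provider: "ollama_cloud", model: "gpt-oss:120b" },
  { provider: "ollama_cloud", model: "qwen3.5:397b" },
  { provider: "openrouter", model: "openai/gpt-oss-120b:free" },
  { provider: "openrouter", model: "google/gemma-3-27b-it:free" },
  { provider: "ollama", model: "qwen3.5:latest" },
];

const MIN_CHARS = 3800;

// Returns null when the output is acceptable, otherwise a human-readable reason.
function rejectionReason(output: string): string | null {
  if (output.length < MIN_CHARS) return `too short: ${output.length} < ${MIN_CHARS} chars`;
  const t = output.trim();
  if (t.charAt(0) === "{" || t.charAt(0) === "[") {
    try { JSON.parse(t); } catch (_err) { return "JSON-only output that does not parse"; }
  }
  return null;
}

interface LadderResult {
  accepted: boolean;
  rungIndex: number;    // -1 when every rung was rejected
  output: string;
  rejections: string[]; // one reason per rejected rung, in order
}

type CallModel = (rung: Rung, preamble: string) => string;

function runLadder(call: CallModel, ladder: Rung[] = LADDER): LadderResult {
  let preamble = "";
  const rejections: string[] = [];
  for (let i = 0; i < ladder.length; i++) {
    const output = call(ladder[i], preamble);
    const reason = rejectionReason(output);
    if (reason === null) return { accepted: true, rungIndex: i, output, rejections };
    rejections.push(reason);
    // The next rung sees why the previous attempt was rejected.
    preamble = `Previous attempt by ${ladder[i].model} was rejected: ${reason}.`;
  }
  return { accepted: false, rungIndex: -1, output: "", rejections };
}
```

Injecting `call` keeps the acceptance logic separate from provider transport, so the same loop can be exercised with a stub in tests and with real provider clients in the pipeline.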
 <h3>N=3 consensus + cross-architecture tie-breaker</h3>
 <p>Every audit and every consensus-required call fires the primary reviewer N=3 times in parallel (Promise.all — wall-clock = single call). Aggregate votes per claim_idx, majority wins. On a 1-1-1 split, a tie-breaker model with <em>different architecture</em> (qwen3-coder:480b vs primary gpt-oss/kimi) is invoked. Every disagreement, even when majority resolves, writes to <code>data/_kb/audit_discrepancies.jsonl</code>. Closes the cloud-non-determinism gap: <code>temp=0</code> isn't actually deterministic in practice across hours; consensus + cross-architecture tie-break stabilizes verdicts.</p>
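The vote aggregation in the paragraph above can be sketched as a per-claim tally. This is a minimal illustration under assumptions: `ClaimVote`, `ClaimResult`, `aggregateVotes`, and the verdict labels are hypothetical names, and only `claim_idx` comes from the text.

```typescript
// Hypothetical vote shape: one entry per claim from a single reviewer run.
interface ClaimVote {
  claim_idx: number;
  verdict: string; // label set is an assumption, e.g. "supported" / "unsupported"
}

interface ClaimResult {
  claim_idx: number;
  verdict: string | null; // null when no label has a strict majority
  unanimous: boolean;     // false means the disagreement should be logged
  needsTieBreak: boolean; // true when no strict majority exists (e.g. a 1-1-1 split)
}

function aggregateVotes(runs: ClaimVote[][]): ClaimResult[] {
  // Group verdicts by claim index across all runs.
  const byClaim: { [idx: number]: string[] } = {};
  for (const run of runs) {
    for (const vote of run) {
      if (!byClaim[vote.claim_idx]) byClaim[vote.claim_idx] = [];
      byClaim[vote.claim_idx].push(vote.verdict);
    }
  }

  const results: ClaimResult[] = [];
  const indices = Object.keys(byClaim).map(Number).sort((a, b) => a - b);
  for (const idx of indices) {
    const verdicts = byClaim[idx];
    const tally: { [label: string]: number } = {};
    for (const v of verdicts) tally[v] = (tally[v] || 0) + 1;

    // Find the most common label for this claim.
    let best: string | null = null;
    let bestCount = 0;
    for (const label of Object.keys(tally)) {
      if (tally[label] > bestCount) { best = label; bestCount = tally[label]; }
    }

    const hasMajority = bestCount * 2 > verdicts.length;
    results.push({
      claim_idx: idx,
      verdict: hasMajority ? best : null,
      unanimous: bestCount === verdicts.length,
      needsTieBreak: !hasMajority,
    });
  }
  return results;
}
```

A caller would pass the three parallel reviewer outputs as `runs`, send every claim with `needsTieBreak` to the different-architecture model, and append every claim with `unanimous === false` to the discrepancy log.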
 <h3>Auditor cross-lineage (Kimi ↔ Haiku ↔ Opus)</h3>
 <p>Every push to PR #11 triggers <code>auditor/audit.ts</code> within ~90s. To prevent a single model lineage's blind spots from becoming the system's blind spots, audits alternate between Kimi K2.6 (Moonshot lineage) and Haiku 4.5 (Anthropic lineage) by head SHA. Diffs over 100k chars auto-promote to Claude Opus 4.7 (Anthropic frontier). Per-PR cap of 3 audits with auto-reset on each new head SHA prevents infinite-loop spend. <strong>Latest verdict on c3c9c21:</strong> Haiku 4.5, 24.6s, 100% grounding-verified across 10 findings.</p>
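The selection and spend-cap rules in the paragraph above can be sketched as two small functions. The text says audits alternate "by head SHA" without giving the mechanism, so the parity-of-first-hex-digit rule below is purely an assumption, chosen so that `c3c9c21` maps to Haiku as reported. `pickAuditor`, `tryConsume`, `AuditBudget`, and the auditor labels are hypothetical names.

```typescript
type Auditor = "kimi-k2.6" | "haiku-4.5" | "opus-4.7";

const OPUS_DIFF_CHARS = 100000; // diffs over this size are promoted to Opus
const PER_PR_CAP = 3;           // audits allowed per head SHA

function pickAuditor(headSha: string, diffChars: number): Auditor {
  if (diffChars > OPUS_DIFF_CHARS) return "opus-4.7";
  // Assumption: alternate on the parity of the first hex digit of the SHA.
  const nibble = parseInt(headSha.charAt(0), 16);
  if (isNaN(nibble)) return "kimi-k2.6"; // arbitrary fallback for a malformed SHA
  return nibble % 2 === 0 ? "haiku-4.5" : "kimi-k2.6";
}

// Hypothetical budget record kept per PR.
interface AuditBudget {
  headSha: string;
  used: number;
}

// Returns whether another audit may run, plus the updated budget.
function tryConsume(prev: AuditBudget | undefined, headSha: string): { allowed: boolean; budget: AuditBudget } {
  // A new head SHA resets the counter.
  const current: AuditBudget = prev && prev.headSha === headSha ? prev : { headSha, used: 0 };
  if (current.used >= PER_PR_CAP) return { allowed: false, budget: current };
  return { allowed: true, budget: { headSha, used: current.used + 1 } };
}
```

Keying the budget on the head SHA is what bounds the spend: re-running on an unchanged commit stops after three audits, while any new push starts a fresh count.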
 <h3>Distillation v1.0.0 — the frozen substrate</h3>
 <p>The substrate the auditor and mode runner sit on is tagged at <code>distillation-v1.0.0</code> / commit <code>e7636f2</code>. <strong>145 unit tests pass · 22/22 acceptance invariants · 16/16 audit-full checks · bit-identical reproducibility verified.</strong> The distillation phase exports clean SFT / RAG / preference samples with a multi-layer contamination firewall (<code>SFT_NEVER</code> constant + scorer category mapping + acceptance fixtures); the auditor consumes the substrate. The frozen tag means: any future "the system regressed" question has a baseline to bisect against, byte-for-byte.</p>
 <h3>Continuation primitive (Phase 21)</h3>
 <p><code>generateContinuable()</code> handles output-overflow without <code>max_tokens</code> tourniquets — empty response → geometric backoff retry; truncated-JSON → continue with partial as scratchpad. <code>generateTreeSplit()</code> handles input-overflow via map-reduce with running scratchpad. Both respect <code>assertContextBudget()</code> so silent truncation can't happen. Now Rust-native in <code>crates/aibridge/src/continuation.rs</code> (Phase 44).</p>
 <h3>Per-staffer tool_level (Phase 23)</h3>
 <p>Scenarios can be scoped to a specific coordinator (<code>staffer: {id, name, tenure_months, role, tool_level}</code>). <code>tool_level</code> controls which tiers are available:</p>
@ -265,6 +292,12 @@ table.plain tr:hover td{background:#0d1117}
 <tr><td>Boost workers based on past success</td><td>No</td><td>Yes (Phase 19 playbook_memory)</td></tr>
 <tr><td>Penalize workers based on past failure</td><td>No</td><td>Yes (<code>/log_failure</code> + <code>0.5<sup>n</sup></code> penalty)</td></tr>
 <tr><td>Surface traits across past fills</td><td>No</td><td>Yes (<code>/vectors/playbook_memory/patterns</code>)</td></tr>
 <tr><td>Per-staffer relevance gradient</td><td>No</td><td>Yes — same query reshapes per coordinator (<code>staffer_id</code> on <code>/intelligence/chat</code>); MARIA'S MEMORY pill labels the playbook context with the active coordinator</td></tr>
 <tr><td>Triage in one shot — late-worker → backfills + draft SMS</td><td>No</td><td>Yes (<code>/intelligence/chat</code> Route 6 — pulls profile + 5 same-role same-geo backfills sorted by responsiveness + drafts client SMS in ~250ms)</td></tr>
 <tr><td>Permit → fill plan derivation (forward demand)</td><td>No</td><td>Yes (<code>/intelligence/permit_contracts</code> — Chicago Socrata permit → role / headcount / deadline / fill probability / gross revenue per card)</td></tr>
 <tr><td>Public-issuer attribution across contractor graph</td><td>No</td><td>Yes (<code>/intelligence/profiler_index</code> — direct + parent + co-permit associated tickers; live Stooq prices)</td></tr>
 <tr><td>Cross-lineage AI audit on every PR</td><td>No</td><td>Yes (auditor crate — Kimi K2.6 ↔ Haiku 4.5 alternation + Opus 4.7 auto-promote on big diffs)</td></tr>
 <tr><td>Pathway memory — system-level hot-swap by task fingerprint</td><td>No</td><td>Yes (88 traces, 11/11 successful replays, 100% reuse rate, ADR-021)</td></tr>
 <tr><td>Predict staffing demand from external data</td><td>No</td><td>Yes (Chicago permit feed + 30-day rolling forecast)</td></tr>
 <tr><td>Count down to staffing deadline per contract</td><td>No</td><td>Yes (permit issue_date + heuristic timeline)</td></tr>
 <tr><td>Explain why each candidate ranked</td><td>No</td><td>Yes (boost chip + narrative citations + memory pattern)</td></tr>
@ -278,7 +311,7 @@ table.plain tr:hover td{background:#0d1117}
 <div class="chapter" id="ch6">
 <div class="num">Chapter 6</div>
 <h2>How it gets better over time</h2>
-<div class="lede">Compounding learning across seven paths. The first three are automatic background loops. Paths 4-7 landed 2026-04-21 and turn the system into a reinforcement-learning pipeline: outcomes → knowledge base → pathway recommendations → cloud rescue → competence-weighted retrieval → observer analysis. All seven happen without operator intervention.</div>
+<div class="lede">Compounding learning across ten paths. The first three are automatic background loops. Paths 4-6 and 10 (Phase 22-24) added the reinforcement layer: outcomes → KB → recommendations → cloud rescue → competence-weighted retrieval → observer analysis. Paths 7-9 (Phase 25-43, 2026-04-26→27) added the system-level memory layers: pathway memory by task fingerprint (ADR-021), per-staffer hot-swap, and the Construction Activity Signal Engine. All ten happen without operator intervention.</div>
 <h3>Path 1 — Playbook boost with geo + role prefilter (Phase 19 + refinement)</h3>
 <p>Every sealed fill is seeded to <code>playbook_memory</code>. The boost fires inside <code>/vectors/hybrid</code> when <code>use_playbook_memory: true</code>. Math, tightened 2026-04-21 after a diagnostic pass found globally-ranked playbooks were missing the SQL-filtered candidate pool entirely:</p>
@ -311,7 +344,19 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
 <p>Answers "who handled this" as a first-class matrix-index dimension. Each scenario carries <code>staffer: {id, name, tenure_months, role, tool_level}</code>. After every run, <code>recomputeStafferStats(staffer_id)</code> aggregates their fill_rate, turn efficiency, citation density, rescue rate into a single <code>competence_score</code> (0.45·fill + 0.20·turn_eff + 0.20·cites + 0.15·rescue).</p>
 <p><code>findNeighbors</code> returns <code>weighted_score = cosine × max_staffer_competence</code> — top-performer playbooks rank above juniors' on similar scenarios. Auto-discovery emerges: running 4 staffers × 3 contracts × 3 rounds surfaced Rachel D. Lewis (Welder Nashville) with 18 endorsements across all 4 staffers, Angela U. Ward (Machine Op Indianapolis) with 19 — reliable-performer labels the system built without human tagging.</p>
-<h3>Path 7 — Observer outcome ingest (Phase 24)</h3>
+<h3>Path 7 — Pathway memory (ADR-021 — semantic-correctness matrix layer)</h3>
 <p>Memory at the system layer, not the worker layer. Every accepted scrum review writes a <code>PathwayTrace</code> with the full backtrack: file fingerprint, model used, signal class, KB chunks consulted, observer events, semantic flags (UnitMismatch, TypeConfusion, OffByOne, StaleReference, DeadCode, BoundaryViolation, …), bug fingerprints. A new query that fingerprints to the same trace can hot-swap to the prior result without re-running the 9-rung escalation. Hot-swap gate (all six conditions must hold): narrow fingerprint match AND audit consensus pass AND replay_count ≥ 3 (probation) AND success_rate ≥ 0.80 AND NOT retired AND vector cosine ≥ 0.90.</p>
 <p><strong>Live state (verified on this load):</strong> 88 traces · 11 / 11 successful replays · 100% reuse rate · probation gate crossed. Endpoints: <code>/vectors/pathway/insert</code> · <code>/query</code> · <code>/record_replay</code> · <code>/stats</code> · <code>/bug_fingerprints</code>. Spec: <code>docs/DECISIONS.md</code> ADR-021.</p>
 <h3>Path 8 — Per-staffer hot-swap index</h3>
 <p>Memory scoped to whoever's acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to staffer territory, scopes playbook-pattern geo to staffer's primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → MARIA'S MEMORY. Same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha. The corpus stays intact; the relevance gradient is per coordinator; each accumulates fills independently.</p>
 <p><strong>Roster:</strong> <code>/staffers</code> endpoint reads from <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Three personas today (Maria/Devon/Aisha); architecture generalizes — every new metro adds territories, not code paths.</p>
 <h3>Path 9 — Construction Activity Signal Engine</h3>
 <p>Memory at the network layer. Every contractor in the corpus is also a forward indicator on the public equities they touch via three attribution flavors: <code>direct</code> (contractor IS the public issuer — SEC tickers index match), <code>parent</code> (subsidiary of a public parent — curated KNOWN_PARENT_MAP, e.g. Turner → HOC.DE via Hochtief AG), <code>associated</code> (co-permit network — Bob's Electric appears with TARGET CORPORATION 3+ times → inherits TGT). The associated path is the moat: a staffing-permit dataset that maps contractor-to-public-issuer is not commercially available; we synthesize it from the Socrata co-occurrence graph.</p>
 <p><strong>BAI (Building Activity Index)</strong> = attribution-weighted average day-change across surfaced issuers. <strong>Indexed build value</strong> = total $ of permits attributable to ANY public issuer in scope. <strong>Network depth</strong> = issuers / total attribution edges. Cross-metro replication explicit in the architecture — Chicago is Phase 1; NYC DOB / LA County / Houston BCD / Boston ISD / DC DCRA are all Socrata-shaped, ship as config-only adapters.</p>
 <h3>Path 10 — Observer outcome ingest (Phase 24)</h3>
 <p>Observer runs as <code>lakehouse-observer.service</code>, now with an HTTP listener on <code>:3800</code>. Scenarios POST per-event outcomes to <code>/event</code> with full provenance (staffer_id, sig_hash, event_kind, role, city, state, rescue flags). Observer's ERROR_ANALYZER and PLAYBOOK_BUILDER loops consume them alongside MCP-wrapped ops. Persistence switched from the old <code>/ingest/file</code> REPLACE path to an append-only <code>data/_observer/ops.jsonl</code> journal so the trace survives across restarts.</p>
 <h3>Input normalizer + unified memory query</h3>
@ -399,7 +444,11 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
 <div class="chapter" id="ch9">
 <div class="num">Chapter 9</div>
 <h2>Per-staffer context</h2>
-<div class="lede">Twenty staffers don't see the same UI state. Each one's session is shaped by their active profile, their workspaces, their assigned contracts, and their client's blacklists.</div>
+<div class="lede">Twenty staffers don't see the same UI state. Each one's session is shaped by their identity (the per-staffer hot-swap index — Path 8 in Ch6), their active profile, their workspaces, their assigned contracts, and their client's blacklists.</div>
 <h3>Per-staffer hot-swap index (the recent layer)</h3>
 <p>Maria runs Chicago. Devon runs Indianapolis. Aisha runs Wisconsin/Michigan. They share one corpus, but search results, recurring-skill patterns, and playbook context all reshape to whoever is acting. <code>/intelligence/chat</code> accepts <code>staffer_id</code>; on match, defaults state filter to the staffer's territory, scopes playbook-pattern geo to their primary city/state, and surfaces <code>response.staffer.name</code> so the UI relabels MEMORY → <em>MARIA'S MEMORY</em>.</p>
 <p><strong>Verified end-to-end:</strong> same query "forklift operators" returns 167 IL workers as Maria, 89 IN as Devon, 16 WI as Aisha (live numbers; refresh the profiler page to recompute). The corpus stays intact; the relevance gradient is per coordinator. As each accumulates fills, their slice of the playbook compounds independently. <strong>Roster:</strong> <code>/staffers</code> endpoint, declared in <code>STAFFERS</code> in <code>mcp-server/index.ts</code>. Adding a staffer is one append; the architecture is metro-agnostic by construction.</p>
 <h3>Active profile (Phase 17)</h3>
 <p>Scopes every search. A <code>staffing-recruiter</code> profile bound to <code>workers_500k</code> sees only that dataset. A <code>security-analyst</code> profile bound to <code>threat_intel</code> cannot see worker data. <code>GET /vectors/profile/&lt;id&gt;/audit</code> records every tool invocation by model identity.</p>
@ -446,7 +495,7 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
 <div class="step"><div class="n">12:30</div><div class="body"><strong>Client pushes 20 new contracts + 1M ATS delta.</strong> Ch7 scale flow fires. Ingest in seconds; embedding refresh kicks off as a background job. Searches continue against old embeddings.</div></div>
-<div class="step"><div class="n">14:00</div><div class="body"><strong>Emergency: worker Dave no-showed.</strong> Sarah clicks No-show button on Dave's card → <code>/log_failure</code> → <code>mark_failed</code> records a penalty. Next similar query dampens Dave's boost by 0.5. Sarah continues the refill — the refill excludes Dave and the 2 others already booked for this shift.</div></div>
+<div class="step"><div class="n">14:00</div><div class="body"><strong>Emergency: worker Dave no-showed.</strong> Sarah types "Dave running late site 4422" into the search box. ~250ms later: triage card with Dave's profile + reliability + responsiveness, draft SMS to client ("dispatching X from local bench, 96% reliability, will confirm arrival"), and 5 same-role same-geo backfills sorted by responsiveness rendered as a green list below. Sarah clicks Copy SMS, pastes to client, clicks Call on the top backfill. <code>/log_failure</code> on Dave records the penalty for the next similar query.</div></div>
 <div class="step"><div class="n">15:00</div><div class="body"><strong>New embeddings live.</strong> Hot-swap promotion. Searches now see all 1M new profiles. Sarah's noon query re-run would produce different top-5.</div></div>
@ -468,14 +517,15 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
 <h4>Deferred — real architectural work, just not shipped yet</h4>
 <ul>
 <li><strong>BAI persistence + backtesting.</strong> Building Activity Index is computed live per page load. To validate the thesis (permit activity precedes equity moves) we need the daily series saved over months. Architectural support exists (<code>data/_kb/audit_baselines.jsonl</code> append pattern); just hasn't run long enough.</li>
 <li><strong>NYC DOB adapter.</strong> Architecture is metro-agnostic — Chicago is Phase 1. NYC DOB ships next as a config-only Socrata adapter; LA County, Houston BCD, Boston ISD, DC DCRA queue behind it. Each new metro multiplies network edges without multiplying the codebase.</li>
 <li><strong>12 awaiting public-data sources for contractor profile.</strong> DOL Wage &amp; Hour, EPA ECHO, MSHA, BBB, PACER civil suits, UCC liens, D&amp;B credit, State licensure, Surety bonds, DOT/FMCSA, State UI claims, DOL RAPIDS apprenticeships. Listed by name on every contractor profile with a one-line "would show:" sample. Each ships as a Socrata-style adapter; engineering scope is concrete.</li>
 <li><strong>Rate / margin awareness.</strong> Worker pay expectations vs contract bill rate not modeled. Requires adding <code>pay_rate</code> to workers, <code>bill_rate</code> to contracts, and a filter + warning path. Partially addressed via <code>ContractTerms.budget_per_hour_max</code> passed to T3/rescue prompts, but the match-time filter isn't wired yet.</li>
-<li><strong>Mem0-style UPDATE / DELETE / NOOP operations on playbooks.</strong> Today <code>/seed</code> only ADDs. Same <code>(operation, date)</code> pair appends a duplicate instead of refining an existing entry. Phase 26 item — cheap to add, moderate payoff.</li>
+<li><strong>Mem0-style UPDATE / DELETE / NOOP operations on playbooks.</strong> Today <code>/seed</code> only ADDs. Same <code>(operation, date)</code> pair appends a duplicate instead of refining an existing entry. Cheap to add, moderate payoff.</li>
-<li><strong>Letta working-memory hot cache.</strong> Every boost query scans all active playbook entries from in-memory state. 1.9K today; cheap. Will bite somewhere north of 100K. LRU for the last-N playbooks or current-sig neighborhood deferred until that ceiling approaches.</li>
+<li><strong>Letta working-memory hot cache.</strong> Every boost query scans all active playbook entries from in-memory state. ~5K today; cheap. Will bite somewhere north of 100K. Deferred until the ceiling approaches.</li>
 <li><strong>Chunking cache (Phase 21 Rust port).</strong> TS primitives <code>generateContinuable</code> + <code>generateTreeSplit</code> are wired, but <code>crates/aibridge/src/{continuation.rs, tree_split.rs}</code> + <code>crates/storaged/src/chunk_cache.rs</code> remain queued. Gateway-side callers currently don't have the same protection against silent truncation that the TS test harness does.</li>
 <li><strong>Confidence calibration.</strong> Top-K is a rank, not a probability. No calibrated "85% likely to accept" score. Requires outcome-labeled training data.</li>
-<li><strong>Neural re-ranker.</strong> Phase 19 is statistical + semantic (now with geo + role prefilter, Phase 25 retirement). A (query, candidate, outcome)-trained re-ranker is deferred only if the statistical floor plateaus below usable recall — current 14× citation lift on identical inputs suggests it hasn't.</li>
+<li><strong>SEC name-to-ticker fuzzy precision.</strong> Current matcher requires ≥2 non-stopword overlap; rare false positives still surface (saw FLG attach to a PNC-adjacent contractor once). Tightenable to require an explicit allow-list for production trading use.</li>
-<li><strong>Observer → autotune feedback wire.</strong> Phase 24 streams scenario outcomes into <code>data/_observer/ops.jsonl</code>; autotune agent still runs on its own HNSW-trial schedule and hasn't subscribed to the outcome metric stream yet. Phase 26+ item — connects the last loop.</li>
+<li><strong>Tighter integration of pathway memory + scrum loop.</strong> ADR-021 substrate is shipped (88 traces, 11/11 replays). The hot-swap gate fires correctly; what's deferred is automatic mode-runner short-circuit when a high-confidence pathway match is available before any cloud call burns.</li>
 <li><strong>call_log cross-reference.</strong> Infrastructure present; current synthetic candidates table is too small to cross-ref. Fixes when real ATS lands.</li>
 </ul>
 <h4>Non-goals — explicitly out of scope</h4>
@ -496,6 +546,6 @@ boost[(city, state, name)] = min(Σ per_worker, 0.25)</pre>
 </div>
 </div>
-<div class="footer">Lakehouse spec · v2 2026-04-21 · Phases 19-25 shipped (playbook boost, model matrix, continuation, KB, staffer competence, observer ingest, validity windows) · maintained from <code>docs/DECISIONS.md</code> · <a href="proof">architecture live-tested</a> · <a href="console">walkthrough</a></div>
+<div class="footer">Lakehouse spec · v3 2026-04-27 · Phases 19-45 shipped (playbook boost, KB, staffer competence, observer ingest, validity windows, distillation v1.0.0 substrate frozen at e7636f2, gateway as OpenAI-compat drop-in, mode runner, validator + iterate, pathway memory ADR-021, per-staffer hot-swap, Construction Activity Signal Engine) · maintained from <code>docs/DECISIONS.md</code> · <a href="proof">architecture live-tested</a> · <a href="console">walkthrough</a> · <a href="profiler">profiler</a></div>
 </body></html>
--- a/mcp-server/tif_polygons.ts
+++ b/mcp-server/tif_polygons.ts
@ -0,0 +1,178 @@
 // TIF (Tax Increment Financing) district point-in-polygon lookup.
 // Given a property's lat/long, returns which Chicago TIF district (if
 // any) contains it. TIF districts are public-subsidy zones — a property
 // inside one is receiving city tax-increment funding for its build.
 // Strong "this project has financial backing" signal for the Project Index.
 //
 // Data: data/_entity_cache/tif_districts.geojson (Chicago Open Data
 // dataset eejr-xtfb, 100 active districts, 3.2MB). Refresh by re-running
 // `curl ... eejr-xtfb.geojson > tif_districts.geojson` — districts
 // change rarely (only when city council approves new ones or repeals).
 //
 // Algorithm: classic ray-casting. For each polygon's outer ring, count
 // edge crossings of an east-going horizontal ray from the point; an odd
 // count means inside. A point inside any hole (inner ring) is excluded.
 // Library-free; correct for arbitrary simple polygons, including the
 // irregular Chicago shapes which often have many small detours.
 import { readFile } from "node:fs/promises";
 import { existsSync } from "node:fs";
 import { join } from "node:path";
 const TIF_GEOJSON = join("/home/profit/lakehouse/data/_entity_cache", "tif_districts.geojson");
 type LngLat = [number, number]; // GeoJSON convention: [longitude, latitude]
 type Ring = LngLat[];
 type Polygon = Ring[]; // outer ring + optional inner rings (holes)
 type MultiPolygon = Polygon[];
 type TifFeature = {
  name: string;
  trim_name?: string;
  ref?: string;
  approval_date?: string;
  expiration?: string;
  type?: string; // T-1xx etc.
  comm_area?: string;
  wards?: string;
  // Bounding box for quick reject
  bbox: { minLon: number; minLat: number; maxLon: number; maxLat: number };
  geometry: MultiPolygon;
 };
 let tifIdx: TifFeature[] | null = null;
 function bboxOfMultiPolygon(mp: MultiPolygon): TifFeature["bbox"] {
  let minLon = Infinity, minLat = Infinity, maxLon = -Infinity, maxLat = -Infinity;
  for (const poly of mp) {
    for (const ring of poly) {
      for (const [lon, lat] of ring) {
        if (lon < minLon) minLon = lon;
        if (lat < minLat) minLat = lat;
        if (lon > maxLon) maxLon = lon;
        if (lat > maxLat) maxLat = lat;
      }
    }
  }
  return { minLon, minLat, maxLon, maxLat };
 }
 async function ensureLoaded(): Promise<TifFeature[]> {
  if (tifIdx) return tifIdx;
  if (!existsSync(TIF_GEOJSON)) {
    tifIdx = [];
    return tifIdx;
  }
  try {
    const raw = JSON.parse(await readFile(TIF_GEOJSON, "utf-8"));
    const out: TifFeature[] = [];
    for (const f of raw.features || []) {
      const geom = f.geometry;
      if (!geom) continue;
      // Normalize Polygon → MultiPolygon for uniform iteration
      let mp: MultiPolygon;
      if (geom.type === "MultiPolygon") {
        mp = geom.coordinates;
      } else if (geom.type === "Polygon") {
        mp = [geom.coordinates];
      } else {
        continue;
      }
      const props = f.properties || {};
      out.push({
        name: props.name || "Unknown TIF",
        trim_name: props.name_trim,
        ref: props.ref,
        approval_date: props.approval_d,
        expiration: props.expiration,
        type: props.type,
        comm_area: props.comm_area,
        wards: props.wards,
        bbox: bboxOfMultiPolygon(mp),
        geometry: mp,
      });
    }
    tifIdx = out;
    return tifIdx;
  } catch (e) {
    console.warn("[tif] load failed:", (e as Error).message);
    tifIdx = [];
    return tifIdx;
  }
 }
 // Ray-casting point-in-polygon (single ring). Returns true if (lon, lat)
 // is strictly inside the ring. Edge cases (point exactly on a vertex)
 // resolve by half-open interval convention; for our use case (Chicago
 // boundary precision is ~1m, sites are point queries) this is fine.
 function pointInRing(lon: number, lat: number, ring: Ring): boolean {
  let inside = false;
  const n = ring.length;
  for (let i = 0, j = n - 1; i < n; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const intersect =
      yi > lat !== yj > lat &&
      lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi; // yj !== yi here: the edge straddles lat
    if (intersect) inside = !inside;
  }
  return inside;
 }
 // Polygon = outer ring + holes. Inside outer AND not inside any hole.
 function pointInPolygon(lon: number, lat: number, polygon: Polygon): boolean {
  if (polygon.length === 0) return false;
  if (!pointInRing(lon, lat, polygon[0])) return false;
  for (let i = 1; i < polygon.length; i++) {
    if (pointInRing(lon, lat, polygon[i])) return false;
  }
  return true;
 }
 export type TifMatch = {
  name: string;
  ref?: string;
  approval_date?: string;
  expiration?: string;
  comm_area?: string;
  wards?: string;
 };
 export async function findTifDistrict(
  longitude: number | string | undefined,
  latitude: number | string | undefined,
 ): Promise<TifMatch | null> {
  const lon = typeof longitude === "string" ? parseFloat(longitude) : longitude;
  const lat = typeof latitude === "string" ? parseFloat(latitude) : latitude;
  if (!lon || !lat || isNaN(lon) || isNaN(lat)) return null;
  const idx = await ensureLoaded();
  if (idx.length === 0) return null;
  for (const f of idx) {
    // Bbox reject — cheap O(1) skip for the 99% of districts that
    // can't possibly contain the point.
    const b = f.bbox;
    if (lon < b.minLon || lon > b.maxLon || lat < b.minLat || lat > b.maxLat) continue;
    // Full point-in-polygon for any polygon in this MultiPolygon
    for (const poly of f.geometry) {
      if (pointInPolygon(lon, lat, poly)) {
        return {
          name: f.name,
          ref: f.ref,
          approval_date: f.approval_date,
          expiration: f.expiration,
          comm_area: f.comm_area,
          wards: f.wards,
        };
      }
    }
  }
  return null;
 }
 export async function getTifIndexStats(): Promise<{
  total: number;
  loaded: boolean;
 }> {
  const idx = await ensureLoaded();
  return { total: idx.length, loaded: idx.length > 0 };
 }
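A minimal usage sketch of the ray-casting test in this file, run against a hypothetical square-with-a-hole polygon rather than the real TIF GeoJSON. The helper bodies repeat the logic of `pointInRing` and `pointInPolygon` so the block runs on its own; `square`, `hole`, `donut` and the sample points are illustrative assumptions, not data from the repository.

```typescript
type LngLat = [number, number]; // [longitude, latitude]
type Ring = LngLat[];
type Polygon = Ring[]; // outer ring first, then holes

// Same crossing test as pointInRing in tif_polygons.ts.
function pointInRing(lon: number, lat: number, ring: Ring): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // The edge straddles the ray's latitude, so yj !== yi and the division is safe.
    const crosses =
      yi > lat !== yj > lat &&
      lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Inside the outer ring and outside every hole.
function pointInPolygon(lon: number, lat: number, polygon: Polygon): boolean {
  if (polygon.length === 0) return false;
  if (!pointInRing(lon, lat, polygon[0])) return false;
  for (let i = 1; i < polygon.length; i++) {
    if (pointInRing(lon, lat, polygon[i])) return false;
  }
  return true;
}

// Hypothetical test shape: a 10x10 square with a 2x2 hole in the middle.
const square: Ring = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]];
const hole: Ring = [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]];
const donut: Polygon = [square, hole];

const inSolidPart = pointInPolygon(2, 2, donut);
const inHole = pointInPolygon(5, 5, donut);
const outside = pointInPolygon(12, 5, donut);
console.log({ inSolidPart, inHole, outside });
```

The point in the hole is rejected by the explicit hole loop, not by parity across rings, which matches how `pointInPolygon` is written in this file.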
--- a/ops/systemd/lakehouse-langfuse-bridge.service
+++ b/ops/systemd/lakehouse-langfuse-bridge.service
@ -0,0 +1,28 @@
 [Unit]
 Description=Lakehouse Langfuse → observer bridge — forwards LLM trace metadata to :3800 so KB learns from cost/latency/provider deltas
 Documentation=file:///home/profit/lakehouse/mcp-server/langfuse_bridge.ts
 After=network.target
 # No hard dependency on either Langfuse or observer — if either is down,
 # the bridge retries on the next tick without crashing. That's the
 # whole point of the cursor state file.
 [Service]
 Type=simple
 WorkingDirectory=/home/profit/lakehouse
 ExecStart=/home/profit/.bun/bin/bun run /home/profit/lakehouse/mcp-server/langfuse_bridge.ts
 Restart=on-failure
 RestartSec=30
 # Credentials resolved from env. Matches how
 # crates/gateway/src/v1/langfuse_trace.rs reads them so both producer
 # (gateway emitter) and consumer (this bridge) share the same config.
 EnvironmentFile=-/etc/lakehouse/langfuse.env
 Environment=LANGFUSE_URL=http://localhost:3001
 Environment=OBSERVER_URL=http://localhost:3800
 Environment=LANGFUSE_POLL_MS=30000
 Environment=LANGFUSE_BATCH_LIMIT=50
 Environment=LANGFUSE_STATE_FILE=/var/lib/lakehouse-guard/langfuse_last_seen.json
 KillSignal=SIGTERM
 TimeoutStopSec=5
 [Install]
 WantedBy=multi-user.target
--- a/package.json
+++ b/package.json
@ -0,0 +1,5 @@
 {
  "dependencies": {
    "langfuse": "^3.38.20"
  }
 }
--- a/reports/distillation/phase6-acceptance-report.md
+++ b/reports/distillation/phase6-acceptance-report.md
@ -1,6 +1,6 @@
 # Phase 6 — Acceptance Gate Report
-**Run:** 2026-04-27T04:54:32.225Z
+**Run:** 2026-04-27T15:43:37.943Z
 **Fixture:** `tests/fixtures/distillation/acceptance/`
 **Temp root:** `/tmp/distillation_phase6_acceptance`
 **Pipeline run_ids:** `acceptance-run-1-stable` (first) + `acceptance-run-2-stable` (second / hash reproducibility)
@ -40,13 +40,13 @@
 | 19 | scratchpad/tree-split case: fixture row materialized into evidence | found | found | ✓ |
 | 20 | PRD drift case: fixture row materialized | found | found | ✓ |
 | 21 | hash reproducibility: per-stage output_hash identical across runs | 0 mismatches | all match | ✓ |
-| 22 | hash reproducibility: run_hash identical | 3ea12b160ee9099a... | 3ea12b160ee9099a... | ✓ |
+| 22 | hash reproducibility: run_hash identical | 8dfdacee62380ec2... | 8dfdacee62380ec2... | ✓ |
 ## Hash reproducibility detail
-run 1 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 1 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
-run 2 run_hash: `3ea12b160ee9099a3c52fe6e7fffd3076de7920d2704d24c789260d63cb1a5a2`
+run 2 run_hash: `8dfdacee62380ec20b7420d8f8bad3c395822da6eb0b41eeecd356e88fe20bf0`
 **Bit-for-bit identical.** Two runs of the entire pipeline on the same fixture with the same `recorded_at` produce the same outputs. Distillation is deterministic.
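
One way the reproducibility check above could be expressed, as a sketch only. The report does not show how `run_hash` is derived, so the `runHash` helper below (SHA-256 over stage and `output_hash` pairs in a fixed order) is an assumption, and the receipt values are made-up placeholders rather than real pipeline output.

```typescript
import { createHash } from "node:crypto";

type StageReceipt = { stage: string; output_hash: string };

// Hypothetical derivation: hash the receipts in a canonical (sorted) order so the
// result does not depend on the order in which stages happened to be recorded.
function runHash(receipts: StageReceipt[]): string {
  const canonical = [...receipts]
    .sort((a, b) => (a.stage < b.stage ? -1 : a.stage > b.stage ? 1 : 0))
    .map((r) => `${r.stage}:${r.output_hash}`)
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Returns the stages whose output_hash differs between two runs.
function mismatchedStages(run1: StageReceipt[], run2: StageReceipt[]): string[] {
  const second = new Map(run2.map((r) => [r.stage, r.output_hash]));
  return run1
    .filter((r) => second.get(r.stage) !== r.output_hash)
    .map((r) => r.stage);
}

// Placeholder receipts, not real pipeline output.
const run1: StageReceipt[] = [
  { stage: "collect", output_hash: "aa11" },
  { stage: "score", output_hash: "bb22" },
  { stage: "export-sft", output_hash: "cc33" },
];
// Same receipts, recorded in a different order.
const run2: StageReceipt[] = [run1[2], run1[0], run1[1]];

const mismatches = mismatchedStages(run1, run2);
const identical = runHash(run1) === runHash(run2);
console.log({ mismatches, identical });
```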
--- a/reports/distillation/phase8-full-audit-report.md
+++ b/reports/distillation/phase8-full-audit-report.md
@ -1,8 +1,8 @@
 # Phase 8 — Full System Audit Report
-**Run:** 2026-04-27T04:54:32.283Z
+**Run:** 2026-04-27T15:43:38.021Z
-**Git commit:** 73f242e3e41c2aa36b35fe9de54742b248915cb5
+**Git commit:** ca7375ea2b178159a0c61bbf62788a2ffa2390e9
-**Baseline:** 2026-04-27T04:53:45.796Z (5bdd159966e6)
+**Baseline:** 2026-04-27T10:31:44.043Z (d11632a6fae6)
 ## Result: **PASS** ✓
@ -26,7 +26,7 @@
 | 1 | P0 | recon doc exists | Y | docs/recon/local-distillation-recon.md present | present | ✓ |
 | 2 | P0 | tier-1 source streams present | — | all 4 tier-1 jsonls on disk | all present | ✓ |
 | 3 | P1 | schema validators pass on fixtures | Y | ≥40 tests, 0 fail | 51 pass, 0 fail | ✓ |
-| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1073 read · 16 written · 2 skipped | ✓ |
+| 4 | P2 | materializer dry-run completes | Y | >=1 row from each tier-1 source | 1139 read · 82 written · 2 skipped | ✓ |
 | 5 | P2 | tier-1 sources each materialize ≥1 row | — | 4/4: distilled_facts, scrum_reviews, audit_facts, mode_experiments | 1/4 hit (mode_experiments) | ✓ |
 | 6 | P3 | on-disk scored-runs distribution non-empty | Y | >=1 accepted | acc=386 part=132 rej=57 hum=480 | ✓ |
 | 7 | P3 | scored-runs distribution sums positive | — | >0 total | 1055 total | ✓ |
@ -38,19 +38,19 @@
 | 13 | P5 | latest run (3fa51d66-784c-4c7d-843d-6c48328a608c) has all 5 stage receipts | Y | collect,score,export-rag,export-sft,export-preference | all present | ✓ |
 | 14 | P5 | every stage receipt validates against schema | Y | 0 invalid | 0 invalid | ✓ |
 | 15 | P5 | RunSummary validates | Y | valid | valid | ✓ |
-| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: 73f242e3e41c...) | ✓ |
+| 16 | P5 | summary.git_commit is 40-char hex | — | match | 68b6697bcb38... (HEAD: ca7375ea2b17...) | ✓ |
 | 17 | P5 | run_hash is sha256 | Y | /^[0-9a-f]{64}$/ | 2336b96c3638982d... | ✓ |
 | 18 | P6 | acceptance gate passes 22/22 invariants on fixture | Y | PASS — 22/22 | 22/22 (exit=0) | ✓ |
 | 19 | P7 | replay validation passes on 3/3 dry-run sample tasks | Y | 3/3 | 3/3 | ✓ |
 | 20 | P7 | replay retrieval surfaces ≥1 playbook on each task (when corpus present) | — | ≥1 task with retrieval | 3/3 | ✓ |
 | 21 | P7 | escalation loop guard: no path > 2 models | Y | 0 loops | 0 | ✓ |
-| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 21 rows total | ✓ |
+| 22 | P7 | replay_runs.jsonl populated by audit run | — | exists with ≥3 rows added | 27 rows total | ✓ |
 ## Drift vs prior baseline
 | Metric | Baseline | Current | Δ% | Flag |
 |---|---|---|---|---|
-| p2_evidence_rows | 15 | 16 | 7% | ok |
+| p2_evidence_rows | 25 | 82 | 228% | warn |
 | p2_evidence_skips | 2 | 2 | 0% | ok |
 | p3_accepted | 386 | 386 | 0% | ok |
 | p3_partial | 132 | 132 | 0% | ok |
@ -61,7 +61,7 @@
 | p4_pref_pairs | 83 | 83 | 0% | ok |
 | p4_total_quarantined | 1325 | 1325 | 0% | ok |
-All metrics within 20% of baseline — pipeline stable across runs.
+**1 metric(s) drifted >20% from baseline.** Investigate before treating outputs as stable.
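
The drift flags in the table can be reproduced with a short calculation. This is a sketch under assumptions: the report states only the 20% threshold and the rounded Δ% values, so the `driftRow` helper and its rounding are illustrative, not the audit script's actual code.

```typescript
type DriftRow = {
  metric: string;
  baseline: number;
  current: number;
  deltaPct: number;
  flag: "ok" | "warn";
};

const DRIFT_THRESHOLD_PCT = 20; // threshold quoted in the report text

function driftRow(metric: string, baseline: number, current: number): DriftRow {
  // Relative change versus baseline. With a zero baseline, any movement counts as drift.
  const raw =
    baseline === 0
      ? current === 0 ? 0 : Infinity
      : ((current - baseline) / baseline) * 100;
  return {
    metric,
    baseline,
    current,
    deltaPct: Math.round(raw),
    flag: Math.abs(raw) > DRIFT_THRESHOLD_PCT ? "warn" : "ok",
  };
}

// Values taken from the drift table in this report.
const evidenceRows = driftRow("p2_evidence_rows", 25, 82);
const evidenceSkips = driftRow("p2_evidence_skips", 2, 2);
console.log(evidenceRows, evidenceSkips);
```

With the table's values, `p2_evidence_rows` moves from 25 to 82, a 228% increase, which is the single metric past the threshold.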
 ## System health status
--- a/reports/kimi/audit-last-week-full.md
+++ b/reports/kimi/audit-last-week-full.md
@ -0,0 +1,45 @@
 # Kimi Forensic Audit (FULL FILES) — distillation v1.0.0
 **Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
 **Latency:** 270.6s | **finish:** stop | **usage:** {'prompt_tokens': 66338, 'completion_tokens': 10159, 'total_tokens': 76497}
 **Input:** /tmp/kimi-audit-full.md (238KB · 12 commits · 15 files · line-numbered, no truncation)
 ---
 ## Verdict
 **Hold**: the substrate’s TypeScript pipeline is architecturally coherent and the SFT firewall is genuine, but committed Rust tests fail to compile, drift detection hardcodes an unverified integrity assertion, and deterministic guarantees leak wall-clock time in multiple places.
 ## What's solid
 - **Three-layer SFT contamination firewall is real.** Schema enum restricts `quality_score` to `["accepted", "partially_accepted"]` (`sft_sample.ts:13,62`), exporter constant `SFT_NEVER` blocks rejected/needs_human_review before synthesis (`export_sft.ts:51,205`), and `receipts.ts` re-reads the output to fail loud if any forbidden score leaked (`receipts.ts:231-236`).
 - **Core scorer is pure and deterministic.** `scoreRecord` takes an `EvidenceRecord`, performs no I/O, no LLM calls, and uses no mutable state (`scorer.ts:1-5,257-273`).
 - **Quarantine is exhaustive and observable.** Every exporter routes skips to structured `exports/quarantine/<exporter>.jsonl` with typed reasons; silent drops are impossible by construction (`quarantine.ts:1-6,14-26`).
 - **Evidence provenance is mandatory on every row.** Every `EvidenceRecord` carries `source_file`, `line_offset`, `sig_hash`, and `recorded_at` (`build_evidence_index.ts:27-34`).
 - **Local-first replay reduces cloud calls.** `replay.ts` defaults to a local model, augments via RAG retrieval, and only escalates on validation failure, directly supporting the cloud-call reduction claim (`replay.ts:24,349-376`).
 ## What's risky
 1. **receipts.ts:495** hardcodes `input_hash_match: true` in drift reports while comments on lines 467-469 admit input-hash comparison is unimplemented; this is false telemetry in a forensic system.
 2. **score_runs.ts:159** deduplicates scored runs by `scored.provenance.sig_hash` (the *evidence* hash), not by a composite of evidence + scorer version, so scorer logic or `SCORER_VERSION` updates are silently ignored on re-runs against existing partition files.
 3. **transforms.ts:181** `auto_apply` transform falls back to `new Date().toISOString()` when `row.ts` is missing, injecting wall-clock time into the supposedly deterministic materialization layer.
 4. **mode.rs:1035,1042** Rust test code assigns `Some("...".into())` and `None` to a `Vec<String>` field (`matrix_corpus`), which would fail `cargo test` compilation; this contradicts the claim that the tag is fully tested.
 5. **export_sft.ts:109-133** synthesizes fake instruction templates per source stem instead of using actual historical prompts; the SFT firewall prevents category contamination but not prompt-fidelity distortion.
 ## Specific findings
 - **mode.rs:1035** — Compile error in test helper: `matrix_corpus: Some("distilled_procedural_v1".into())` mismatches the `Vec<String>` type declared at line 172. **Rationale:** Direct struct construction in the test module uses an `Option` where a `Vec` is required, so the Rust test suite cannot compile.
 - **receipts.ts:495** — Drift detection hardcodes `input_hash_match: true`. **Rationale:** The adjacent comment admits input-hash comparison is simplified and unimplemented (lines 467-469); asserting a verified match is misleading telemetry that will hide real input-side regressions.
 - **score_runs.ts:159** — Scored-run dedup ignores scorer version. **Rationale:** `loadSeenHashes` and the skip logic key only on the EvidenceRecord `sig_hash`, meaning an existing scored-run file from yesterday will block updated scores even if `SCORER_VERSION` or scorer logic changed today.
 - **transforms.ts:181** — Non-deterministic timestamp fallback in `auto_apply` transform. **Rationale:** `row.ts ?? new Date().toISOString()` injects wall-clock time when the source row lacks a timestamp, violating the header claim that transforms are “deterministic by construction” and breaking bit-identical reproducibility for that stream.
 - **export_sft.ts:126** — Unsafe property access via `as any`. **Rationale:** `(ev as any).contractor` bypasses the `EvidenceRecord` type contract; if the property is absent the template silently emits `"<contractor>"`, degrading SFT data quality without a type error.
 - **scorer.ts:30** — Environmental dependency in deterministic scorer. **Rationale:** `process.env.LH_SCORER_VERSION` means identical evidence inputs produce different `scorer_version` stamps (and different downstream receipts) depending on the runtime environment, undermining bit-identical claims.
 - **replay.ts:378** — Non-deterministic run identifier. **Rationale:** `` `replay:${task_hash.slice(0, 16)}:${Date.now()}` `` makes replay evidence rows non-reproducible and risks collision under rapid successive calls.
 - **export_sft.ts:109-133** — Synthetic instruction generation replaces ground-truth prompts. **Rationale:** The exporter fabricates instruction strings from metadata (e.g., hardcoded scrum review phrasing) rather than retrieving the actual historical prompt, so the resulting SFT dataset trains on reconstructed, not authentic, user instructions.
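The `transforms.ts:181` finding can be illustrated with a small sketch. It is written in Python purely for illustration (the real transform is TypeScript), and `materialize_row`, `Materialized`, and `MissingTimestampError` are hypothetical names, not identifiers from the codebase. The idea is to fail the row so it can be quarantined, rather than stamp it with wall-clock time.

```python
from dataclasses import dataclass


class MissingTimestampError(ValueError):
    """Raised when a source row carries no timestamp of its own (hypothetical)."""


@dataclass(frozen=True)
class Materialized:
    source_id: str
    ts: str


def materialize_row(row: dict) -> Materialized:
    """Deterministic variant: the output depends only on the input row."""
    ts = row.get("ts")
    if not ts:
        # No wall-clock fallback here: a missing timestamp is an input
        # defect, so surface it instead of inventing a value.
        raise MissingTimestampError(f"row {row.get('id')!r} has no ts")
    return Materialized(source_id=str(row["id"]), ts=ts)


if __name__ == "__main__":
    good = {"id": "r1", "ts": "2026-04-27T00:00:00Z"}
    # The same input materializes to an equal value every time.
    print(materialize_row(good) == materialize_row(good))
    try:
        materialize_row({"id": "r2"})
    except MissingTimestampError as exc:
        print("quarantine:", exc)
```

Whether a failed row should be quarantined or dropped is a policy choice for the pipeline; the sketch only shows that the transform itself never reads the clock.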
 ## Direction recommendation
 **Pause the staffing audit and harden the substrate first.** Before building the staffing inference mode (`staffing_inference_lakehouse` in `mode.rs:54`) on top of this substrate:
 1. Fix the Rust test compile errors (`mode.rs:1035,1042`) and ensure `cargo test` runs in CI.
 2. Replace the hardcoded `input_hash_match: true` in drift detection (`receipts.ts:495`) with a real hash comparison or remove the field until it is implemented.
 3. Change scored-run dedup (`score_runs.ts:159`) to key on a composite hash of `evidence_sig_hash + scorer_version` so scorer updates force re-scoring.
 4. Remove the `new Date().toISOString()` fallback in `transforms.ts:181` or fail the row so determinism is preserved.
 5. Audit all `as any` casts in the export layer (`export_sft.ts:126`) for type-safe alternatives.
 Once those fixes land and acceptance re-runs pass, proceed to the staffing audit wave; the architecture is sound enough to support it, but the forensic guarantees must be honest before downstream teams depend on them.
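A minimal sketch of the composite dedup key from recommendation 3, assuming the evidence hash and the scorer version are both available as strings. It is written in Python for illustration only (the real code is TypeScript), and `dedup_key` and `needs_scoring` are hypothetical helpers, not functions from `score_runs.ts`.

```python
import hashlib


def dedup_key(evidence_sig_hash: str, scorer_version: str) -> str:
    """Composite key that changes when either the evidence or the scorer changes."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    material = f"{evidence_sig_hash}\x00{scorer_version}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def needs_scoring(seen: set, evidence_sig_hash: str, scorer_version: str) -> bool:
    """Return True (and record the key) if this pair has not been scored yet."""
    key = dedup_key(evidence_sig_hash, scorer_version)
    if key in seen:
        return False
    seen.add(key)
    return True


if __name__ == "__main__":
    seen = set()
    print(needs_scoring(seen, "abc123", "v1.0.0"))  # first sighting
    print(needs_scoring(seen, "abc123", "v1.0.0"))  # same evidence, same scorer
    print(needs_scoring(seen, "abc123", "v1.1.0"))  # same evidence, scorer bumped
```

With a key like this, an existing scored-run file only blocks re-scoring while the scorer version is unchanged.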
--- a/reports/kimi/audit-last-week.md
+++ b/reports/kimi/audit-last-week.md
@ -0,0 +1,36 @@
 # Kimi Forensic Audit — distillation v1.0.0 (last week)
 **Generated:** 2026-04-27 by `kimi-for-coding` via gateway /v1/chat
 **Latency:** 157.6s | **finish:** stop | **usage:** {'prompt_tokens': 14014, 'completion_tokens': 6356, 'total_tokens': 20370}
 **Input:** /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)
 ---
 ## Verdict
 **hold** — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.
 ## What's solid
 - `scorer.ts` is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (`scorer.ts:22`).
 - SFT export enforces defense-in-depth contamination firewalls via `SFT_NEVER` and schema validators (`export_sft.ts:49-50`; `sft_sample.ts:43-48`).
 - Evidence materialization is idempotent across reruns using `sig_hash` deduplication (`build_evidence_index.ts:114-126`).
 - Mode router falls back to a safe built-in default if config parsing fails (`mode.rs:208-228`).
 - Quarantine writer abstraction isolates bad records instead of failing the export (`export_sft.ts`).
 ## What's risky
 - **Integration mismatch**: `replay.ts` posts to `/v1/chat`, but the provided gateway only declares `/v1/mode` and `/v1/mode/execute` (`replay.ts:186` vs `mode.rs:13-18`), suggesting an undocumented or broken proxy contract.
 - **Bun runtime lock-in**: Multiple files depend on `Bun.CryptoHasher`, which throws in Node.js (`export_sft.ts:235`; `build_evidence_index.ts:89`).
 - **Unauditable files in scope**: Critical files listed in the diff—`transforms.ts`, `receipts.ts`, `quarantine.ts`, `score_runs.ts`—were not provided, so their logic is unseen.
 - **Every shown implementation file is truncated**: `scorer.ts`, `export_sft.ts`, `build_evidence_index.ts`, `replay.ts`, and `mode.rs` all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
 - **Type safety escape**: `(ev as any).contractor` in SFT synthesis bypasses the schema layer (`export_sft.ts:138`).
 ## Specific findings
 1. `scripts/distillation/scorer.ts:22` — `SCORER_VERSION` reads from `process.env`, introducing environment-dependent output drift that contradicts the file’s “identical input → identical output forever” contract.
 2. `scripts/distillation/export_sft.ts:138` — `(ev as any).contractor` is an unguarded `any` cast; a malformed `EvidenceRecord` will inject the string `"undefined"` or crash at runtime inside the SFT instruction template.
 3. `scripts/distillation/export_sft.ts:235` — `new Bun.CryptoHasher("sha256")` is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
 4. `scripts/distillation/build_evidence_index.ts:89` — Same Bun crypto lock-in in `sha256OfFile`, fragmenting the hashing implementation (here `Bun.CryptoHasher`, elsewhere `canonicalSha256`).
 5. `scripts/distillation/replay.ts:178` — Provider routing relies on fragile string heuristics (`model.includes("/")`, prefix lists); models with unexpected names will route to the wrong backend or hit the `ollama` default incorrectly.
 6. `scripts/distillation/replay.ts:186` — `` fetch(`${gatewayUrl()}/v1/chat` `` targets an endpoint absent from the provided `mode.rs` router; without the missing gateway dispatch code, this call will 404.
 7. `crates/gateway/src/v1/mode.rs:141` — `deserialize_string_or_vec` uses `serde_json::Value::deserialize` against a TOML source, which is non-idiomatic and risks mis-handling TOML-specific types (datetime, inline tables) compared to a native `toml::Value`.
 8. `scripts/distillation/build_evidence_index.ts:185` — `await canonicalSha256(row)` is async, yet `sha256OfFile` is sync; the mixing of sync/async crypto calls in the same module hints at inconsistent I/O boundaries.
 ## Direction recommendation
 Keep the substrate architecture, but **do not expand staffing audit work on top of v1.0.0 until three blockers are fixed**: (1) replace `Bun.CryptoHasher` with portable WebCrypto or Node `crypto` so the build is runtime-agnostic; (2) align `replay.ts` to the actual gateway contract (`/v1/mode/execute`) or document the `/v1/chat` proxy route; and (3) eliminate `any` casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation—rework the runtime/contract gaps rather than rebuilding from scratch.
--- a/reports/lance_10m_rebench_2026-05-02.md
+++ b/reports/lance_10m_rebench_2026-05-02.md
@ -0,0 +1,116 @@
 # Lance backend re-benchmark — 10M vectors (scale_test_10m)
 **Date:** 2026-05-02
 **Dataset:** `data/lance/scale_test_10m` (33 GB, ~10M vectors, 768d)
 **Driver:** live HTTP gateway `:3100/vectors/lance/*` (post sanitizer-fix binary)
 **Method tag on every search response:** `lance_ivf_pq` (confirms IVF_PQ, not brute-force)
 ADR-019 deferred a 10M re-bench: *"at 10M we expect Lance to pull ahead because HNSW doesn't fit in RAM. Re-benchmark when we have a 10M-vector corpus to test against."* The corpus exists; this is that benchmark.
 ## Search latency, 10 diverse queries, top_k=10 (cold)
 | Query | Latency |
 |---|---:|
 | warehouse forklift operator second shift | 50.5ms |
 | senior software engineer kubernetes | 52.9ms |
 | registered nurse pediatric | 37.6ms |
 | welder TIG aluminum | **127.7ms** |
 | data scientist python | 41.6ms |
 | electrician journeyman commercial | 31.4ms |
 | accountant CPA tax | 28.6ms |
 | machine learning research | 32.1ms |
 | construction site supervisor | 31.8ms |
 | biomedical engineer | 25.0ms |
 Median ~35ms, mean ~46ms, one ~128ms outlier (the TIG aluminum query; not investigated, but it could be a query-specific IVF traversal pattern or transient I/O).
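The summary figures can be recomputed from the table with the standard library. This is a small sketch; the list is copied by hand from the table above, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

# Cold-search latencies in milliseconds, in table order.
cold_ms = [50.5, 52.9, 37.6, 127.7, 41.6, 31.4, 28.6, 32.1, 31.8, 25.0]


def summarize(samples):
    """Return median, mean, and max of a list of latency samples."""
    return {
        "median": median(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    for name, value in summarize(cold_ms).items():
        print(f"{name}: {value:.1f}ms")
```

With only 10 samples, one outlier moves the mean noticeably while leaving the median almost untouched, which is why both are worth reporting.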
 ## Search latency, repeated query (warm cache)
 Same query (`forklift operator`) hit 5 times in a row:
 | Call | Latency |
 |---|---:|
 | 1 | 21.9ms |
 | 2 | 20.2ms |
 | 3 | 19.2ms |
 | 4 | 22.4ms |
 | 5 | 18.6ms |
 **Warm-cache p50 ~20ms.** Stable across the 5 trials.
 ## Doc-fetch by id, 5 calls (post-warmup) — BEFORE scalar-index fix
 Fetched the same doc_id (`VEC-2196862`) repeatedly:
 | Call | Latency |
 |---|---:|
 | 1 | 68.2ms |
 | 2 | 89.3ms |
 | 3 | 153.9ms |
 | 4 | 126.5ms |
 | 5 | 140.7ms |
 **p50 ~127ms (range 68 to 154ms, so on the order of 100ms), climbing under repeat.** Substantially slower than the 100K-corpus number from ADR-019 (311μs claimed; ~6ms measured today on workers_500k_v1).
 ### Root cause (investigated post-bench)
 `/vectors/lance/stats/scale_test_10m` returned `has_doc_id_index: false`. The scalar btree on `doc_id` was **never built** for this dataset. Doc-fetch was running a full table scan over 35GB.
 Cause: the auto-build code in `crates/vectord/src/service.rs:1492-1503` only fires for `IndexMeta`-registered indexes during `set_active_profile` warming. `scale_test_10m` was created by the `lance-bench` binary directly via the migrate HTTP route — it bypasses the IndexMeta registry, so warming never sees it, so neither the vector index nor the scalar index gets auto-built. (The vector index was built manually via `/vectors/lance/index/scale_test_10m`; the scalar index never was.)
 ### Doc-fetch by id, 5 calls — AFTER `POST /vectors/lance/scalar-index/scale_test_10m/doc_id`
 Build took **1.22s** for 10M rows, added 269MB of btree on disk.
 | Call | Latency |
 |---|---:|
 | 1 | 5.6ms |
 | 2 | 5.0ms |
 | 3 | 5.0ms |
 | 4 | 4.9ms |
 | 5 | 4.7ms |
 **~5ms p50, stable.** ~20x improvement. Matches workers_500k_v1's ~6ms baseline.
 ADR-019's "O(1) random access via btree" claim is structurally vindicated. The 311μs projection from the 100K bench was an in-process Rust call; the live HTTP/JSON round-trip floor is ~5ms regardless of dataset size.
 ### Followup: close the IndexMeta-bypass gap
 The `lance-bench` binary writes datasets that the rest of the gateway can't see. Two reasonable fixes:
 1. **Auto-build scalar index inside `lance_migrate` HTTP handler** — every dataset created via the migrate route gets the btree before returning. Costs 1-2 seconds at ingest time, saves 100ms per doc-fetch forever after.
 2. **Have `lance-bench` register an IndexMeta entry** at the end of its run, so the existing warming code picks it up on next gateway start.
 Recommendation: do (1). It's a one-line addition next to the existing `build_index` call inside the handler, and it makes the migrate route self-sufficient — no caller needs to remember a follow-up build call.
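A quick break-even estimate for option (1), using the rounded figures from this report (1.22s build, ~100ms per fetch before, ~5ms after). `break_even_fetches` is a hypothetical helper for the arithmetic only; it does not model concurrent load or disk cost.

```python
def break_even_fetches(build_s: float, before_ms: float, after_ms: float) -> float:
    """Number of doc-fetches after which the one-off index build has paid for itself."""
    saved_ms = before_ms - after_ms
    if saved_ms <= 0:
        raise ValueError("the index does not make fetches faster")
    return (build_s * 1000.0) / saved_ms


if __name__ == "__main__":
    n = break_even_fetches(build_s=1.22, before_ms=100.0, after_ms=5.0)
    print(f"build cost is recovered after about {n:.0f} doc-fetches")
```

On these numbers the build pays for itself within the first couple of dozen fetches, which supports doing it inline at migrate time.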
 ## Compared to ADR-019 100K projections
 | Op | 100K (ADR-019) | 10M (today) | Notes |
 |---|---:|---:|---|
 | Search (cold) | 2229μs | ~46ms | 21x slower at 100x scale → reasonable for IVF_PQ |
 | Search (warm) | (not measured) | ~20ms | Warm cache converges nicely |
 | Doc fetch (no btree) | — | ~100ms | full scan, 35GB |
 | Doc fetch (post btree build) | 311μs | ~5ms | structural win confirmed; HTTP/JSON floor explains delta |
 | Index method | lance_ivf_pq | lance_ivf_pq | confirmed via response tag |
 ## What this means
 ADR-019's claim that "at 10M, Lance pulls ahead because HNSW doesn't fit in RAM" remains **unverified-but-not-refuted**. We can't compare directly against HNSW at 10M: the raw vectors alone need 10M × 768d × 4 bytes ≈ 30.7 GB of RAM, and the graph is estimated to roughly double that, which is way past any single-node deployment. So Lance "wins" at 10M by being the only contender that operationally exists.
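The footprint arithmetic can be checked directly. This sketch covers only the raw float32 vectors; the "double that for the graph" factor is this report's estimate and is applied as an assumed multiplier, not a measured value.

```python
def vector_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold n_vectors dense vectors of `dims` values each."""
    return n_vectors * dims * bytes_per_value


if __name__ == "__main__":
    raw = vector_bytes(10_000_000, 768)  # float32 values, 4 bytes each
    print(f"raw vectors: {raw / 1e9:.2f} GB ({raw / 2**30:.2f} GiB)")
    # Assumed 2x multiplier for the HNSW graph, per the estimate in this report.
    print(f"with graph (assumed 2x): {2 * raw / 1e9:.2f} GB")
```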
 What the bench DID surface:
 - **Search at 10M works at production-shape latency** (~20ms warm). Acceptable for batch / async / non-conversational workloads. Too slow for sub-10ms voice or recommendation paths.
 - **Doc-fetch at 10M is fast (~5ms) once the scalar btree is built.** Pre-build was ~100ms (full scan). Built in 1.2s, +269MB on disk. ADR-019's structural claim holds.
 - **The auto-build only fires for IndexMeta-registered datasets.** `lance-bench` bypasses IndexMeta, so its datasets need either a manual `POST /vectors/lance/scalar-index/<name>/doc_id` after migration, or a one-line fix to the `lance_migrate` handler that builds the btree inline. Recommend the inline fix.
 - **Sanitizer fix held under load**: no 500-with-leak surfaced, even on a rare query pattern (TIG aluminum). The fix appears robust to long-tail queries.
 ## Repro
 ```bash
 # Search latency, single query
 curl -sS -X POST http://127.0.0.1:3100/vectors/lance/search/scale_test_10m \
  -H 'Content-Type: application/json' \
  -d '{"query":"forklift operator","top_k":10}' | jq '.latency_us'
 # Doc fetch by id
 curl -sS http://127.0.0.1:3100/vectors/lance/doc/scale_test_10m/VEC-2196862 \
  | jq '.latency_us'
 ```
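For repeated trials, the same search call can be scripted. The sketch below assumes the gateway address, route, and `latency_us` response field shown in the curl commands above; `search_latency_us` and `warm_p50_ms` are hypothetical helper names, and the `opener` parameter exists only so the HTTP call can be stubbed.

```python
import json
import urllib.request
from statistics import median

GATEWAY = "http://127.0.0.1:3100"  # same address as the curl commands above


def search_latency_us(query, top_k=10, dataset="scale_test_10m",
                      opener=urllib.request.urlopen):
    """POST one search and return the `latency_us` field of the JSON response."""
    req = urllib.request.Request(
        f"{GATEWAY}/vectors/lance/search/{dataset}",
        data=json.dumps({"query": query, "top_k": top_k}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return int(json.load(resp)["latency_us"])


def warm_p50_ms(query, trials=5, **kwargs):
    """Repeat the same query and return the median reported latency in ms."""
    samples = [search_latency_us(query, **kwargs) for _ in range(trials)]
    return median(samples) / 1000.0
```

Calling `warm_p50_ms("forklift operator")` against a running gateway would mirror the warm-cache table; note that it reports the latency the server puts in the response, not client-side wall time.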
--- a/scripts/e2e_pipeline_check.sh
+++ b/scripts/e2e_pipeline_check.sh
@ -0,0 +1,536 @@
 #!/usr/bin/env bash
 # ------------------------------------------------------------
 # End-to-end pipeline verification for Lakehouse.
 #
 # Generates realistic staffing-style data, runs it through every
 # shipped pipeline stage, asserts correctness at each step, and
 # cleans up after itself.
 #
 # Stages exercised:
 #   0. Preflight                     — gateway + sidecar reachability
 #   1. Data generation               — 1000 candidates, 200 placements, 10 resumes
 #   2. CSV ingest                    — Phase 6.1 (via ?name= query param)
 #   3. NDJSON ingest                 — Phase 6.2
 #   4. SQL queries + joins           — Phase 2, Phase 8 hot cache
 #   5. Content-hash re-ingest dedup  — Phase 6.4
 #   6. Idempotent register           — ADR-020 (same-fingerprint path)
 #   7. Schema-drift rejection        — ADR-020 (409 Conflict path)
 #   8. Catalog dedupe no-op          — ADR-020 (clean state)
 #   9. Metadata enrichment           — Phase 10 POST
 #  10. PII auto-detection audit      — Phase 10
 #  11. Vector index + search         — Phase 7 (documents pulled via SQL)
 #  12. Cleanup + baseline verify     — no-orphan guarantee
 #
 # Usage:
 #   ./scripts/e2e_pipeline_check.sh              # run all stages
 #   SKIP_VECTOR=1 ./scripts/e2e_pipeline_check.sh # skip Ollama-bound steps
 #   KEEP_DATA=1   ./scripts/e2e_pipeline_check.sh # leave /tmp artifacts
 #
 # Exit codes:
 #   0  all assertions passed
 #   1  one or more assertions failed
 #   2  preflight failed (service unreachable)
 # ------------------------------------------------------------
 set -u
 set -o pipefail
 GATEWAY="${GATEWAY:-http://localhost:3100}"
 SIDECAR="${SIDECAR:-http://localhost:3200}"
 WORKDIR="${WORKDIR:-/tmp/lakehouse_e2e}"
 DATA_ROOT="${DATA_ROOT:-/home/profit/lakehouse/data}"
 SKIP_VECTOR="${SKIP_VECTOR:-0}"
 KEEP_DATA="${KEEP_DATA:-0}"
 RUN_ID="e2e_$(date +%s)"
 CAND_DS="${RUN_ID}_candidates"
 PLACE_DS="${RUN_ID}_placements"
 RESUME_DS="${RUN_ID}_resumes"
 VEC_IDX="${RESUME_DS}_v1"
 # Color names use a CC_ prefix so they can't be shadowed by single-letter
 # local variables like `R` that hold curl responses elsewhere in the script.
 if [[ -t 1 ]]; then
    CC_GRN=$'\033[0;32m'; CC_RED=$'\033[0;31m'; CC_YLW=$'\033[1;33m'
    CC_BLU=$'\033[1;34m'; CC_DIM=$'\033[2m';    CC_RST=$'\033[0m'
 else
    CC_GRN=''; CC_RED=''; CC_YLW=''; CC_BLU=''; CC_DIM=''; CC_RST=''
 fi
 PASS=0; FAIL=0; WARN=0; STARTED_AT=$(date +%s)
 FAILURES=()
 pass() { printf '  %s✓%s %s\n' "$CC_GRN" "$CC_RST" "$1"; PASS=$((PASS+1)); }
 fail() { printf '  %s✗%s %s\n' "$CC_RED" "$CC_RST" "$1"; FAIL=$((FAIL+1)); FAILURES+=("$1"); }
 warn() { printf '  %s!%s %s\n' "$CC_YLW" "$CC_RST" "$1"; WARN=$((WARN+1)); }
 step() { printf '\n%s== %s ==%s\n' "$CC_BLU" "$1" "$CC_RST"; }
 info() { printf '  %s%s%s\n' "$CC_DIM" "$1" "$CC_RST"; }
 # No explicit cleanup call: the EXIT trap installed below runs it once on exit.
 die()  { printf '%sFATAL: %s%s\n' "$CC_RED" "$1" "$CC_RST" >&2; exit 2; }
 assert_eq() {
    if [[ "$1" == "$2" ]]; then pass "$3 ($1)"; else fail "$3: got '$1', expected '$2'"; fi
 }
 http_code() {
    local method="$1" path="$2" data="${3:-}"
    if [[ -n "$data" ]]; then
        curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path" \
            -H 'Content-Type: application/json' -d "$data"
    else
        curl -s -o /dev/null -w '%{http_code}' -X "$method" "$GATEWAY$path"
    fi
 }
 # query_scalar <sql>  -> first column of first row as string, sentinel on empty/error
 query_scalar() {
    local sql="$1"
    local payload
    payload=$(python3 -c 'import json,sys; print(json.dumps({"sql": sys.argv[1]}))' "$sql")
    curl -s -X POST "$GATEWAY/query/sql" \
         -H 'Content-Type: application/json' \
         -d "$payload" \
      | python3 -c '
 import sys, json
 try:
    r = json.load(sys.stdin)
 except Exception:
    print("__PARSE_ERROR__"); sys.exit(0)
 if isinstance(r, dict) and "error" in r:
    sys.stderr.write("query error: " + str(r["error"]) + "\n")
    print("__ERROR__"); sys.exit(0)
 rows = r.get("rows") if isinstance(r, dict) else None
 if not rows:
    print("__NO_ROWS__"); sys.exit(0)
 row = rows[0]
 print(next(iter(row.values())))
 '
 }
 cleanup() {
    [[ "$KEEP_DATA" == "1" ]] && { info "KEEP_DATA=1 — leaving $WORKDIR"; return; }
    info "cleaning up test datasets for $RUN_ID"
    # Catch any previous-run zombies too: any catalog entry whose name
    # starts with "e2e_" is definitionally ours. Using DELETE (added for
    # this script's needs) purges both the live registry and the manifest
    # file atomically, so the next run doesn't trip on zombie entries
    # pointing at parquets we've already rm'd.
    local names
    names=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
        | python3 -c "
 import sys, json
 try: ds = json.load(sys.stdin)
 except Exception: sys.exit(0)
 for d in ds:
    if d['name'].startswith('e2e_'):
        print(d['name'])
 " 2>/dev/null || true)
    local removed=0
    for n in $names; do
        curl -sf -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n" && removed=$((removed+1))
    done
    # Delete any stray parquet + vector artifacts we can positively
    # attribute to an e2e_ prefix.
    rm -f "$DATA_ROOT/datasets/"e2e_*.parquet 2>/dev/null || true
    rm -f "$DATA_ROOT/vectors/"e2e_*.parquet  2>/dev/null || true
    rm -rf "$WORKDIR" 2>/dev/null || true
    info "deleted $removed e2e datasets (covers this run + any prior zombies)"
 }
 trap cleanup EXIT
 # ============================================================
 # 0. Preflight
 # ============================================================
 step "0. Preflight"
 curl -sf -m 3 "$GATEWAY/health" >/dev/null 2>&1 || die "gateway not reachable at $GATEWAY"
 pass "gateway /health (200)"
 SIDECAR_UP=0
 if curl -sf -m 3 "$SIDECAR/health" >/dev/null 2>&1; then
    SIDECAR_UP=1; pass "sidecar /health (200)"
 else
    warn "sidecar unreachable — vector stage will be skipped"
    SKIP_VECTOR=1
 fi
 # Purge any e2e_* zombies from prior runs (stale registry entries that
 # would otherwise break DataFusion schema inference for every query).
 ZOMBIES=$(curl -s "$GATEWAY/catalog/datasets" 2>/dev/null \
    | python3 -c "
 import sys, json
 try: ds = json.load(sys.stdin)
 except Exception: sys.exit(0)
 for d in ds:
    if d['name'].startswith('e2e_'):
        print(d['name'])
 " 2>/dev/null || true)
 if [[ -n "$ZOMBIES" ]]; then
    ZCOUNT=$(echo "$ZOMBIES" | wc -l | tr -d ' ')
    for n in $ZOMBIES; do
        curl -s -o /dev/null -X DELETE "$GATEWAY/catalog/datasets/by-name/$n"
    done
    info "pre-cleaned $ZCOUNT e2e_ zombies from prior runs"
 fi
 BASELINE=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
 info "baseline dataset count: $BASELINE"
 # ============================================================
 # 1. Generate realistic data
 # ============================================================
 step "1. Generate realistic staffing data"
 mkdir -p "$WORKDIR"
 # Seed with RUN_ID (which embeds the wall-clock timestamp) so each run
 # produces different content. Otherwise the content-hash dedup from
 # Phase 6.4 keys off a stale hash that lingers in the live registry
 # until the next gateway restart, and subsequent runs silently dedupe.
 python3 - "$WORKDIR" "$RUN_ID" <<'PYEOF'
 import csv, json, random, sys, os
 workdir, run_id = sys.argv[1], sys.argv[2]
 # Seed from RUN_ID so content differs per run. Passing the string
 # directly avoids str hash() randomization (PYTHONHASHSEED), so a given
 # RUN_ID always reproduces the same data.
 random.seed(run_id)
 FIRST = ['Aisha','Brandon','Carlos','Daria','Eli','Fiona','Gabriel','Hana','Ian','Julia',
        'Kofi','Lena','Mateo','Nadia','Oscar','Priya','Quinn','Raj','Sofia','Tomas',
        'Uma','Victor','Wendy','Xander','Yuki','Zara']
 LAST  = ['Adams','Brown','Chen','Davis','Evans','Fisher','Garcia','Hughes','Ibrahim','Johnson',
        'Kim','Lopez','Martinez','Nguyen','Ortiz','Patel','Rossi','Singh','Thomas','Umar',
        'Vargas','Williams','Xu','Young','Zhang','OConnor']
 PLACES = [('Chicago','IL'),('New York','NY'),('San Francisco','CA'),('Austin','TX'),
          ('Seattle','WA'),('Denver','CO'),('Boston','MA'),('Atlanta','GA'),
          ('Miami','FL'),('Phoenix','AZ')]
 SKILL_GROUPS = [
    ['Python','AWS','Docker'],['Java','Spring','Kubernetes'],
    ['React','TypeScript','Node'],['Go','PostgreSQL','gRPC'],
    ['Rust','DataFusion','Parquet'],['C#','.NET','Azure'],
    ['Ruby','Rails','Redis'],['Scala','Spark','Kafka'],
    ['Swift','iOS','CoreData'],['Kotlin','Android','Jetpack'],
 ]
 STATUSES = ['active','placed','inactive','blocked']
 STATUS_WEIGHTS = [60, 25, 10, 5]
 with open(os.path.join(workdir, 'candidates.csv'), 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=[
        'candidate_id','first_name','last_name','email','phone',
        'city','state','skills','years_experience','hourly_rate_usd','status'])
    w.writeheader()
    for i in range(1, 1001):
        fn, ln = random.choice(FIRST), random.choice(LAST)
        city, state = random.choice(PLACES)
        w.writerow({
            'candidate_id': f'CAND-{i:05d}',
            'first_name': fn, 'last_name': ln,
            'email': f'{fn.lower()}.{ln.lower()}{i}@example.com',
            'phone': f'({random.randint(200,999)}) {random.randint(200,999)}-{random.randint(1000,9999)}',
            'city': city, 'state': state,
            'skills': ','.join(random.choice(SKILL_GROUPS)),
            'years_experience': random.randint(0, 20),
            'hourly_rate_usd': random.randint(35, 185),
            'status': random.choices(STATUSES, weights=STATUS_WEIGHTS)[0],
        })
 CLIENTS = ['Acme Corp','Globex','Initech','Umbrella','Wayne Enterprises',
           'Stark Industries','Tyrell','Cyberdyne','Massive Dynamic','Oscorp']
 with open(os.path.join(workdir, 'placements.ndjson'), 'w') as f:
    for i in range(1, 201):
        f.write(json.dumps({
            'placement_id': f'PLACE-{i:04d}',
            'candidate_id': f'CAND-{random.randint(1,1000):05d}',
            'client': random.choice(CLIENTS),
            'start_date': f'2026-{random.randint(1,4):02d}-{random.randint(1,28):02d}',
            'weekly_hours': random.choice([20,25,30,35,40]),
            'bill_rate': random.randint(80, 250),
            'placement_status': random.choice(['active','completed','terminated']),
        }) + '\n')
 RESUMES = [
    'Senior Python engineer with 8 years of cloud infrastructure experience. Expert in AWS, Docker, and distributed systems design. Led migration of monolithic legacy system to microservices.',
    'Full-stack React and TypeScript developer specializing in real-time dashboards. Built financial trading interfaces. GraphQL, WebSocket, performance optimization.',
    'Data engineer with deep Apache Spark and Kafka expertise. Seven years on streaming analytics pipelines processing billions of events per day. Scala and Python.',
    'Embedded systems engineer with C++ and Rust experience. Worked on automotive ADAS systems and industrial IoT devices. Low-level firmware, RTOS.',
    'DevOps engineer with Kubernetes and Terraform expertise. Six years at hypergrowth startups. Prometheus, Grafana, and observability tooling.',
    'Machine learning engineer specializing in NLP. Built production transformer-based systems. PyTorch, Hugging Face, fine-tuning large language models.',
    'iOS developer with Swift and SwiftUI. Four years building consumer apps at mid-size tech companies. Offline-first architectures and CoreData.',
    'Backend Go developer focused on high-throughput APIs. Built payment processing systems handling millions of transactions. PostgreSQL, gRPC, Redis.',
    'Security engineer with penetration testing and threat modeling experience. OSCP certified. Web application security, AppSec code review, SAST and DAST tooling.',
    'Site reliability engineer with Linux internals and performance tuning expertise. Ten years at large-scale infrastructure. Tracing, profiling, kernel-level debugging.',
 ]
 with open(os.path.join(workdir, 'resumes.ndjson'), 'w') as f:
    for i, r in enumerate(RESUMES, 1):
        f.write(json.dumps({'doc_id': f'RES-{i:03d}', 'resume_text': r}) + '\n')
 PYEOF
 pass "candidates.csv  (1000 rows, 11 cols)"
 pass "placements.ndjson (200 rows, 7 cols)"
 pass "resumes.ndjson   (10 rows, 2 cols)"
 # ============================================================
 # 2. CSV ingest
 # ============================================================
 step "2. CSV ingest (Phase 6.1)"
 R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
 echo "$R" | python3 -c 'import sys,json; json.load(sys.stdin)' 2>/dev/null \
    || { fail "ingest response was not JSON: $(echo "$R" | head -c 200)"; R='{}'; }
 ROWS=$(echo "$R"  | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
 DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
 DS_NAME=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("dataset_name","?"))' 2>/dev/null)
 assert_eq "$DS_NAME" "$CAND_DS" "ingest respected ?name= query param"
 assert_eq "$ROWS"    "1000"     "ingest rows"
 assert_eq "$DEDUP"   "False"    "first upload not deduplicated"
 REG_ROWS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" \
    | python3 -c 'import sys,json; print(json.load(sys.stdin).get("row_count","null"))')
 assert_eq "$REG_ROWS" "1000" "manifest row_count reflects ingest"
 # ============================================================
 # 3. NDJSON ingest
 # ============================================================
 step "3. NDJSON ingest (Phase 6.2)"
 R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$PLACE_DS" -F "file=@$WORKDIR/placements.ndjson")
 ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
 assert_eq "$ROWS" "200" "placements NDJSON ingest rows"
 R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$RESUME_DS" -F "file=@$WORKDIR/resumes.ndjson")
 ROWS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("rows",-1))' 2>/dev/null)
 assert_eq "$ROWS" "10" "resumes NDJSON ingest rows"
 # ============================================================
 # 4. SQL queries + JOIN + cache
 # ============================================================
 step "4. SQL queries (Phase 2, Phase 8)"
 N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS")
 assert_eq "$N" "1000" "candidates COUNT(*)"
 N=$(query_scalar "SELECT COUNT(*) FROM $CAND_DS WHERE status = 'active'")
 if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 400 && N < 700 )); then
    pass "active candidates in plausible range ($N, expect ~600)"
 else
    fail "active candidates count out of range: $N"
 fi
 N=$(query_scalar "
    SELECT COUNT(DISTINCT c.candidate_id)
    FROM $CAND_DS c
    JOIN $PLACE_DS p ON c.candidate_id = p.candidate_id
    WHERE p.placement_status = 'active'
 ")
 if [[ "$N" =~ ^[0-9]+$ ]] && (( N > 0 && N <= 200 )); then
    pass "cross-dataset JOIN with filter returns $N rows"
 else
    fail "JOIN returned unexpected count: $N"
 fi
 AVG=$(query_scalar "SELECT AVG(hourly_rate_usd) FROM $CAND_DS")
 if python3 -c 'import sys; v=float(sys.argv[1]); sys.exit(0 if 100 < v < 130 else 1)' "$AVG" 2>/dev/null; then
    pass "average hourly rate in plausible range ($AVG, expect ~110)"
 else
    fail "average hourly rate out of range: $AVG"
 fi
 CODE=$(http_code POST "/query/cache/pin" "{\"dataset\":\"$CAND_DS\"}")
 assert_eq "$CODE" "200" "cache pin HTTP"
 # ============================================================
 # 5. Content-hash re-ingest dedup (Phase 6.4)
 # ============================================================
 step "5. Content-hash re-ingest dedup"
 R=$(curl -s -X POST "$GATEWAY/ingest/file?name=$CAND_DS" -F "file=@$WORKDIR/candidates.csv")
 DEDUP=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("deduplicated","?"))' 2>/dev/null)
 assert_eq "$DEDUP" "True" "re-upload same file is deduplicated"
 # ============================================================
 # 6. Idempotent register — same fingerprint (ADR-020)
 # ============================================================
 step "6. Idempotent register (ADR-020 same-fp path)"
 DS=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
 FP=$(echo "$DS"    | python3 -c 'import sys,json; print(json.load(sys.stdin)["schema_fingerprint"])')
 OBJS=$(echo "$DS"  | python3 -c 'import sys,json; print(json.dumps(json.load(sys.stdin)["objects"]))')
 ID_BEFORE=$(echo "$DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
 PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':sys.argv[2],'objects':json.loads(sys.argv[3])}))" "$CAND_DS" "$FP" "$OBJS")
 CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
 assert_eq "$CODE" "201" "same-fp re-register returns 201"
 ID_AFTER=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["id"])')
 assert_eq "$ID_AFTER" "$ID_BEFORE" "same DatasetId after re-register"
 COUNT=$(curl -s "$GATEWAY/catalog/datasets" | python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name']=='$CAND_DS'))")
 assert_eq "$COUNT" "1" "no duplicate manifest created"
 # ============================================================
 # 7. Schema-drift rejection (409)
 # ============================================================
 step "7. Schema-drift rejection (ADR-020 409 path)"
 PAYLOAD=$(python3 -c "import json,sys; print(json.dumps({'name':sys.argv[1],'schema_fingerprint':'deadbeefnotmatching','objects':json.loads(sys.argv[2])}))" "$CAND_DS" "$OBJS")
 CODE=$(http_code POST "/catalog/datasets" "$PAYLOAD")
 assert_eq "$CODE" "409" "different-fp rejected with 409"
 # ============================================================
 # 8. Dedupe no-op on clean catalog
 # ============================================================
 step "8. Dedupe no-op on clean state"
 R=$(curl -s -X POST "$GATEWAY/catalog/dedupe")
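 # GROUPS is a bash special variable (the caller's group IDs); assignments to
 # it are silently ignored, so the response field needs a different name.
 DEDUPE_GROUPS=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["groups"])')
 REMOVED=$(echo "$R" | python3 -c 'import sys,json; print(json.load(sys.stdin)["removed"])')
 assert_eq "$DEDUPE_GROUPS" "0" "dedupe groups (clean catalog)"
 assert_eq "$REMOVED" "0" "dedupe removed count"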
 # ============================================================
 # 9. Metadata enrichment (Phase 10)
 # ============================================================
 step "9. Metadata enrichment (Phase 10)"
 CODE=$(http_code POST "/catalog/datasets/by-name/$CAND_DS/metadata" \
    "{\"owner\":\"e2e-test\",\"description\":\"$RUN_ID synthetic candidates\",\"tags\":[\"test\",\"synthetic\"]}")
 assert_eq "$CODE" "200" "POST metadata HTTP"
 META=$(curl -s "$GATEWAY/catalog/datasets/by-name/$CAND_DS")
 OWNER=$(echo "$META" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("owner",""))')
 assert_eq "$OWNER" "e2e-test" "owner persisted"
 # ============================================================
 # 10. PII auto-detection (Phase 10)
 # ============================================================
 step "10. PII auto-detection (Phase 10)"
 PII_COLS=$(echo "$META" | python3 -c '
 import sys, json
 m = json.load(sys.stdin)
 pii = [c["name"] for c in m.get("columns",[]) if c.get("is_pii") or (isinstance(c.get("sensitivity"),str) and c["sensitivity"].lower()=="pii")]
 print(" ".join(pii) if pii else "__NONE__")')
 if [[ "$PII_COLS" == *"email"* ]] && [[ "$PII_COLS" == *"phone"* ]]; then
    pass "email and phone flagged as PII ($PII_COLS)"
 elif [[ "$PII_COLS" == "__NONE__" ]]; then
    warn "no PII flagged — auto-detection may not run on this path"
 else
    warn "partial PII detection: $PII_COLS"
 fi
 # ============================================================
 # 11. Vector index + semantic search (Phase 7)
 # ============================================================
 step "11. Vector index + semantic search (Phase 7)"
 if [[ "$SKIP_VECTOR" == "1" ]]; then
    warn "SKIP_VECTOR=1 — skipping vector pipeline"
 else
    # Pull documents out of the ingested resumes dataset via SQL,
    # then feed to the inline /vectors/index body. This exercises
    # the query→embed integration rather than pre-canned input.
    DOCS=$(curl -s -X POST "$GATEWAY/query/sql" \
         -H 'Content-Type: application/json' \
         -d "$(python3 -c "import json; print(json.dumps({'sql': 'SELECT doc_id, resume_text FROM $RESUME_DS'}))")" \
      | python3 -c '
 import sys, json
 r = json.load(sys.stdin)
 docs = [{"id": row["doc_id"], "text": row["resume_text"]} for row in r.get("rows", [])]
 print(json.dumps(docs))')
    DOC_COUNT=$(echo "$DOCS" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))')
    assert_eq "$DOC_COUNT" "10" "pulled docs via SQL for embedding"
    PAYLOAD=$(python3 -c "
 import json, sys
 print(json.dumps({
    'index_name': sys.argv[1],
    'source':     sys.argv[2],
    'documents':  json.loads(sys.argv[3]),
    'chunk_size': 500,
    'overlap':    50,
 }))" "$VEC_IDX" "$RESUME_DS" "$DOCS")
    R=$(curl -s -X POST "$GATEWAY/vectors/index" -H 'Content-Type: application/json' -d "$PAYLOAD")
    JOB_ID=$(echo "$R" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(d.get("job_id","__NONE__"))' 2>/dev/null)
    if [[ "$JOB_ID" == "__NONE__" || -z "$JOB_ID" ]]; then
        fail "vector index job rejected: $(echo "$R" | head -c 200)"
    else
        pass "embedding job accepted (job=$JOB_ID)"
        # Poll up to 90s for 10 short resumes; Ollama cold-start can be slow.
        JOB_STATUS="unknown"
        for _ in $(seq 1 45); do
            JOB_STATUS=$(curl -s "$GATEWAY/vectors/jobs/$JOB_ID" 2>/dev/null \
                | python3 -c '
 import sys, json
 try: print(json.load(sys.stdin).get("status","?"))
 except Exception: print("?")' 2>/dev/null)
            [[ "$JOB_STATUS" == "completed" || "$JOB_STATUS" == "Completed" ]] && break
            [[ "$JOB_STATUS" == "failed"    || "$JOB_STATUS" == "Failed"    ]] && break
            sleep 2
        done
        case "$JOB_STATUS" in
            completed|Completed)
                pass "embedding job completed"
                R=$(curl -s -X POST "$GATEWAY/vectors/search" \
                    -H 'Content-Type: application/json' \
                    -d "{\"index_name\":\"$VEC_IDX\",\"query\":\"fine-tuning large language models\",\"k\":3}")
                TOP_DOC=$(echo "$R" | python3 -c '
 import sys, json
 r = json.load(sys.stdin)
 if r.get("results"): print(r["results"][0].get("doc_id","?"))
 else: print("__NONE__")' 2>/dev/null)
                if [[ "$TOP_DOC" == "RES-006" ]]; then
                    pass "top match is ML/NLP resume (semantically correct)"
                elif [[ "$TOP_DOC" == "__NONE__" ]]; then
                    fail "search returned no results"
                else
                    warn "top match is $TOP_DOC (expected RES-006 — ranking may vary)"
                fi ;;
            *)
                fail "embedding job did not complete (status=$JOB_STATUS)" ;;
        esac
    fi
 fi
 # ============================================================
 # 12. Cleanup + baseline verify
 # ============================================================
 step "12. Cleanup + baseline verify"
 cleanup
 trap - EXIT
 ON_DISK=$(ls "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
 info "manifest files on disk now: $ON_DISK"
 DISK_ORPHANS=0
 if compgen -G "$DATA_ROOT/_catalog/manifests/*.json" > /dev/null; then
    DISK_ORPHANS=$(grep -l "\"$RUN_ID" "$DATA_ROOT/_catalog/manifests"/*.json 2>/dev/null | wc -l | tr -d ' ')
 fi
 assert_eq "$DISK_ORPHANS" "0" "no orphan manifest files on disk for $RUN_ID"
 LIVE_ORPHANS=$(curl -s "$GATEWAY/catalog/datasets" \
    | python3 -c "import sys,json; print(sum(1 for d in json.load(sys.stdin) if d['name'].startswith('$RUN_ID')))")
 if [[ "$LIVE_ORPHANS" != "0" ]]; then
    warn "$LIVE_ORPHANS entries linger in live registry (clears on gateway restart; on-disk is ground truth)"
 fi
 # ============================================================
 # Summary
 # ============================================================
 ELAPSED=$(( $(date +%s) - STARTED_AT ))
 printf '\n%s─── Summary ───%s\n' "$CC_BLU" "$CC_RST"
 printf '  run_id:   %s\n'          "$RUN_ID"
 printf '  elapsed:  %ss\n'         "$ELAPSED"
 printf '  passed:   %s%d%s\n'      "$CC_GRN" "$PASS" "$CC_RST"
 printf '  failed:   %s%d%s\n'      "$CC_RED" "$FAIL" "$CC_RST"
 printf '  warnings: %s%d%s\n'      "$CC_YLW" "$WARN" "$CC_RST"
 if (( FAIL > 0 )); then
    printf '\n%sfailures:%s\n' "$CC_RED" "$CC_RST"
    for f in "${FAILURES[@]}"; do printf '  - %s\n' "$f"; done
    exit 1
 fi
 exit 0
--- a/scripts/lance_smoke.sh
+++ b/scripts/lance_smoke.sh
@ -0,0 +1,104 @@
 #!/usr/bin/env bash
 # lance smoke — gates the 5 /vectors/lance/* HTTP routes (search, doc,
 # index, append, migrate). Only the read paths are exercised here so a
 # CI run doesn't mutate state. Migrate + index + append have shape
 # probes (request bodies are well-formed) but ride the not-found path
 # that the 2026-05-02 audit added.
 #
 # Targets the live gateway at $LH_GATEWAY (default :3100). Uses an
 # existing on-disk Lance dataset — `workers_500k_v1` — so no
 # migration setup is needed. If the dataset is missing the smoke
 # fails loudly with a clear message.
 #
 # Surfaced 2026-05-02: the lance crates had zero tests + no smoke;
 # substrate change to lance_backend.rs would silently break the live
 # surface. This smoke is the regression gate.
 #
 # Usage:
 #   ./scripts/lance_smoke.sh
 #   LH_GATEWAY=http://127.0.0.1:3100 ./scripts/lance_smoke.sh
 set -euo pipefail
 GATEWAY="${LH_GATEWAY:-http://127.0.0.1:3100}"
 DATASET="${LH_LANCE_DATASET:-workers_500k_v1}"
 PREFIX="$GATEWAY/vectors/lance"
 PASS=0; FAIL=0
 PROBE() { local label="$1"; shift; "$@" && { echo "  ✓ $label"; PASS=$((PASS+1)); } || { echo "  ✗ $label"; FAIL=$((FAIL+1)); }; }
 echo "[lance-smoke] gateway=$GATEWAY dataset=$DATASET"
 # ── 0. Gateway alive ─────────────────────────────────────────────
 PROBE "gateway /v1/health responds" \
  bash -c "curl -sf -m 3 $GATEWAY/v1/health -o /dev/null"
 # ── 1. Search returns IVF_PQ results on existing dataset ────────
 # Capture curl status separately so a transport-level failure (gateway
 # down, network broken, timeout) shows up as its own probe — instead of
 # being swallowed by `|| echo '{}'` which would surface as the next jq
 # probe failing with a misleading "no method field" message. Per opus
 # INFO at lance_smoke.sh:38 from the 2026-05-02 scrum.
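 # Under `set -e` a failed command substitution aborts the script before a
 # following `$?` can be read, so record the exit code in the same list.
 CURL_RC=0
 RESP=$(curl -sS -m 30 -X POST "$PREFIX/search/$DATASET" \
  -H 'Content-Type: application/json' \
  -d '{"query":"forklift operator","top_k":3}' 2>/dev/null) || CURL_RC=$?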
 PROBE "search/$DATASET curl reachable (exit 0)" \
  test "$CURL_RC" = "0"
 [ "$CURL_RC" != "0" ] && RESP='{}'
 PROBE "search/$DATASET returns top-3 lance_ivf_pq results" \
  env RESP="$RESP" bash -c 'printf "%s" "$RESP" | jq -e ".method == \"lance_ivf_pq\" and (.results | length) == 3" >/dev/null'
 # Capture one doc_id from those results so the next probe has something real to fetch.
 DOC_ID=$(echo "$RESP" | jq -r '.results[0].doc_id // ""')
 # ── 2. get_doc by id returns the row ────────────────────────────
 PROBE "doc/$DATASET/<known-id> returns full row" \
  bash -c "[ -n '$DOC_ID' ] && curl -sf -m 5 '$PREFIX/doc/$DATASET/$DOC_ID' | jq -e '.row.doc_id == \"$DOC_ID\"' >/dev/null"
 # ── 3. get_doc with bogus id returns 404 (not 500) ──────────────
 STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_404.json -w '%{http_code}' \
  "$PREFIX/doc/$DATASET/W500K-NOT-A-REAL-ID-00000")
 PROBE "doc/$DATASET/<missing-id> → 404" \
  test "$STATUS" = "404"
 # ── 4. search on missing dataset returns 404 + sanitized message ─
 STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_500.json -w '%{http_code}' \
  -X POST "$PREFIX/search/no-such-dataset-${RANDOM}" \
  -H 'Content-Type: application/json' \
  -d '{"query":"x","top_k":1}')
 BODY=$(cat /tmp/lance_smoke_500.json)
 PROBE "search/<missing> → 404 (was 500 pre-2026-05-02)" \
  test "$STATUS" = "404"
 # Assert "pattern absent" — `! grep -qE` (NOT `grep -qvE` which is unsound:
 # -v -q exits 0 if ANY line lacks the pattern, so a multi-line body containing
 # both a leak line AND any clean line would false-PASS. Caught 2026-05-02 by
 # opus scrum on the lance backend wave.)
 PROBE "search/<missing> body sanitized — no filesystem leak" \
  env BODY="$BODY" bash -c '! printf "%s" "$BODY" | grep -qE "/home/|/root/\.cargo/|/var/|/tmp/"'
 # ── 5. build_index on missing dataset also sanitized ────────────
 STATUS=$(curl -sS -m 5 -o /tmp/lance_smoke_idx.json -w '%{http_code}' \
  -X POST "$PREFIX/index/no-such-dataset-${RANDOM}" \
  -H 'Content-Type: application/json' \
  -d '{}')
 BODY=$(cat /tmp/lance_smoke_idx.json)
 PROBE "index/<missing> body sanitized" \
  env BODY="$BODY" bash -c '! printf "%s" "$BODY" | grep -qE "/home/|/root/\.cargo/|/var/|/tmp/"'
 # ── 6. append validates input shape (rejects empty rows array) ──
 STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
  -X POST "$PREFIX/append/$DATASET" \
  -H 'Content-Type: application/json' \
  -d '{"rows":[]}')
 PROBE "append with empty rows[] → 400" \
  test "$STATUS" = "400"
 # ── 7. migrate route is reachable (POST without body returns a real error, not 404) ──
 STATUS=$(curl -sS -m 5 -o /dev/null -w '%{http_code}' \
  -X POST "$PREFIX/migrate/probe-not-real-${RANDOM}?bucket=primary" 2>/dev/null)
 # Should be a 4xx for the bad request shape: NOT 404 (which would mean the route is not registered) and NOT 200.
 PROBE "migrate route registered (non-404, non-200 on empty body)" \
  bash -c "[ '$STATUS' != '404' ] && [ '$STATUS' != '200' ]"
 echo "[lance-smoke] $PASS PASS / $FAIL FAIL"
 [ "$FAIL" -eq 0 ]
--- a/scripts/production_smoke.sh
+++ b/scripts/production_smoke.sh
@ -0,0 +1,157 @@
 #!/usr/bin/env bash
 # Production substrate smoke — single command that verifies every
 # production-critical surface end-to-end. Exits non-zero on the first
 # failure so an operator can run this before:
 #   - Swapping workers_500k.parquet → real Chicago contractor data
 #   - Spinning up the Asterisk voice agent against /v1/chat
 #   - Running staffing inference loops via /v1/iterate
 #   - Wiring the assistant against the gateway
 #
 # Usage:
 #   ./scripts/production_smoke.sh
 #
 # Tunable via env:
 #   GATEWAY=http://localhost:3100   # gateway base URL
 #   FAIL_FAST=1                     # exit on first failure (default 1)
 #   VERBOSE=1                       # print full responses on success too
 set -e
 GATEWAY="${GATEWAY:-http://localhost:3100}"
 FAIL_FAST="${FAIL_FAST:-1}"
 VERBOSE="${VERBOSE:-0}"
 PASS=0
 FAIL=0
 FAILURES=()
 check() {
    local name="$1"
    local expected_status="$2"
    local cmd="$3"
    echo -n "  [$(($PASS + $FAIL + 1))] $name ... "
    local resp
    resp=$(eval "$cmd" 2>&1) || true
    local status="${resp%%|||*}"
    local body="${resp#*|||}"
    if [ "$status" = "$expected_status" ]; then
        PASS=$((PASS + 1))
        echo "✓ ($status)"
        if [ "$VERBOSE" = "1" ]; then echo "      $body" | head -3 | sed 's/^/      /'; fi
    else
        FAIL=$((FAIL + 1))
        FAILURES+=("$name: expected $expected_status, got $status")
        echo "✗ (got $status, expected $expected_status)"
        echo "      $body" | head -3 | sed 's/^/      /'
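        # Use a full `if`: a trailing `[ ... ] && { ... }` makes check() return 1
        # when FAIL_FAST=0, and `set -e` then aborts the script at the call site.
        if [ "$FAIL_FAST" = "1" ]; then print_summary; exit 1; fi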
    fi
 }
 curl_with_status() {
    # Run curl, capture HTTP status + body, format as "status|||body"
    local args=("$@")
    curl -sS -w "\n%{http_code}" "${args[@]}" 2>&1 | awk '
        { lines[NR]=$0 }
        END {
            status=lines[NR]
            body=""
            for (i=1; i<NR; i++) body=body lines[i] (i<NR-1?"\n":"")
            print status "|||" body
        }
    '
 }
 print_summary() {
    echo ""
    echo "═══════════════════════════════════════════════════════════════"
    echo "  $PASS passed · $FAIL failed"
    if [ ${#FAILURES[@]} -gt 0 ]; then
        echo "  failures:"
        for f in "${FAILURES[@]}"; do echo "    - $f"; done
    fi
    echo "═══════════════════════════════════════════════════════════════"
 }
 echo "Production substrate smoke test against $GATEWAY"
 echo ""
 # ─── 1. Liveness ─────────────────────────────────────────────────────
 echo "▶ Liveness"
 check "gateway /health" "200" \
    'curl_with_status -m 5 "$GATEWAY/health"'
 # ─── 2. Operational health ──────────────────────────────────────────
 echo "▶ Operational state"
 HEALTH_RESP=$(curl -sS -m 10 "$GATEWAY/v1/health" 2>&1) || HEALTH_RESP="{}"
 WORKERS_COUNT=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('workers_count',0))" 2>/dev/null || echo 0)
 PROVIDERS_OK=$(echo "$HEALTH_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin).get('providers_configured',{}); print(sum(1 for v in d.values() if v))" 2>/dev/null || echo 0)
 echo "  workers_count: $WORKERS_COUNT"
 echo "  providers_configured (count): $PROVIDERS_OK"
 if [ "$WORKERS_COUNT" -lt 1 ]; then
    FAIL=$((FAIL + 1))
    FAILURES+=("workers_count=0 — parquet load failed or empty")
    echo "  ✗ workers not loaded"
    [ "$FAIL_FAST" = "1" ] && { print_summary; exit 1; }
 else
    PASS=$((PASS + 1))
    echo "  ✓ workers loaded"
 fi
 # ─── 3. Truth Layer ──────────────────────────────────────────────────
 echo "▶ Truth Layer"
 check "/v1/context returns rules" "200" \
    'curl_with_status -m 10 "$GATEWAY/v1/context"'
 # ─── 4. /v1/chat (provider=ollama) ──────────────────────────────────
 echo "▶ /v1/chat (provider=ollama, fast model)"
 check "/v1/chat ping" "200" \
    'curl_with_status -m 60 -X POST "$GATEWAY/v1/chat" \
        -H "content-type: application/json" \
        -d "{\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: PONG\"}],\"max_tokens\":30,\"temperature\":0,\"think\":false}"'
 # ─── 5. /v1/validate (negative + positive) ──────────────────────────
 echo "▶ /v1/validate"
 check "phantom candidate_id → 422 Consistency" "422" \
    'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
        -H "content-type: application/json" \
        -d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-FAKE-0\",\"name\":\"Fake\"}]},\"context\":{\"target_count\":1}}"'
 check "real worker (W-1) → 200 OK" "200" \
    'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
        -H "content-type: application/json" \
        -d "{\"kind\":\"fill\",\"artifact\":{\"fills\":[{\"candidate_id\":\"W-1\",\"name\":\"Anyone\"}]},\"context\":{\"target_count\":1}}"'
 check "SSN in body → 422 Policy" "422" \
    'curl_with_status -m 10 -X POST "$GATEWAY/v1/validate" \
        -H "content-type: application/json" \
        -d "{\"kind\":\"email\",\"artifact\":{\"to\":\"a@b.com\",\"body\":\"Your SSN 123-45-6789 is on file.\"}}"'
 # ─── 6. /v1/iterate (bounded retry loop) ───────────────────────────
 # Phantom worker → expect 422 IterateFailure with history (not 200)
 echo "▶ /v1/iterate (bounded retry)"
 check "/v1/iterate phantom → bounded fail" "422" \
    'curl_with_status -m 240 -X POST "$GATEWAY/v1/iterate" \
        -H "content-type: application/json" \
        -d "{\"kind\":\"fill\",\"provider\":\"ollama\",\"model\":\"qwen3.5:latest\",\"system\":\"Reply with ONLY: {\\\"fills\\\":[{\\\"candidate_id\\\":\\\"W-99999999\\\",\\\"name\\\":\\\"X\\\"}]}\",\"prompt\":\"emit it\",\"context\":{\"target_count\":1},\"max_iterations\":1,\"max_tokens\":200,\"temperature\":0}"'
 # ─── 7. Doc-drift batch ─────────────────────────────────────────────
 echo "▶ Doc-drift scan"
 check "/vectors/playbook_memory/doc_drift/scan" "200" \
    'curl_with_status -m 60 -X POST "$GATEWAY/vectors/playbook_memory/doc_drift/scan"'
 # ─── 8. Usage tracking ──────────────────────────────────────────────
 echo "▶ Usage tracking"
 USAGE=$(curl -sS -m 10 "$GATEWAY/v1/usage" 2>&1) || USAGE="{}"
 USAGE_REQS=$(echo "$USAGE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('requests',0))" 2>/dev/null || echo 0)
 echo "  usage.requests: $USAGE_REQS (should be > 0 if /v1/chat fired)"
 if [ "$USAGE_REQS" -ge 1 ]; then
    PASS=$((PASS + 1))
    echo "  ✓ /v1/usage tracking"
 else
    FAIL=$((FAIL + 1))
    FAILURES+=("/v1/usage didn't increment after /v1/chat call")
    echo "  ✗ /v1/usage didn't increment"
 fi
 print_summary
 [ $FAIL -eq 0 ] && exit 0 || exit 1
--- a/scripts/serve_imagegen.py
+++ b/scripts/serve_imagegen.py
@ -29,8 +29,14 @@ CACHE_DIR.mkdir(parents=True, exist_ok=True)
 WORKFLOW_PATH = "/opt/ComfyUI/workflows/editorial_hero.json"
-def _cache_key(prompt, width, height, steps):
-    return hashlib.sha256(f"{prompt}|{width}|{height}|{steps}".encode()).hexdigest()[:24]
+def _cache_key(prompt, width, height, steps, seed=None):
+    # Include seed so callers can vary outputs deterministically without
    # the proxy collapsing to a single cached image. None == legacy
    # (omitted from the key for backward compatibility).
    bits = f"{prompt}|{width}|{height}|{steps}"
    if seed is not None:
        bits += f"|{seed}"
    return hashlib.sha256(bits.encode()).hexdigest()[:24]
 def _cache_get(key):
    fp = CACHE_DIR / f"{key}.webp"
@ -40,8 +46,15 @@ def _cache_put(key, img_bytes):
    (CACHE_DIR / f"{key}.webp").write_bytes(img_bytes)
-def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None):
-    """Submit workflow to ComfyUI and wait for result."""
+def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None,
+                      negative_prompt=None, cfg=None, sampler=None, scheduler=None):
    """Submit workflow to ComfyUI and wait for result.
    Optional overrides — when provided, replace the workflow's defaults.
    The workflow template at editorial_hero.json was tuned for product
    hero shots with a "no humans" negative prompt; portrait callers MUST
    pass `negative_prompt` to avoid the model fighting them on faces.
    """
    # Load workflow template
    with open(WORKFLOW_PATH) as f:
        workflow = json.load(f)
@ -51,9 +64,21 @@ def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None):
        seed = random.randint(0, 2**32)
    workflow["3"]["inputs"]["seed"] = seed
    workflow["3"]["inputs"]["steps"] = steps
    if cfg is not None:
        workflow["3"]["inputs"]["cfg"] = cfg
    if sampler:
        workflow["3"]["inputs"]["sampler_name"] = sampler
    if scheduler:
        workflow["3"]["inputs"]["scheduler"] = scheduler
    workflow["5"]["inputs"]["width"] = width
    workflow["5"]["inputs"]["height"] = height
    workflow["6"]["inputs"]["text"] = prompt
    # Node 7 is the negative-prompt CLIPTextEncode. The default is tuned
    # for product hero shots and contains "human, person, face, hand,
    # fingers, realistic photo of people" — actively sabotaging any
    # portrait render. Always overwrite when negative_prompt is given.
    if negative_prompt is not None:
        workflow["7"]["inputs"]["text"] = negative_prompt
    # Submit to ComfyUI
    payload = json.dumps({"prompt": workflow}).encode()
@ -177,9 +202,20 @@ class ImageHandler(BaseHTTPRequestHandler):
        height = min(max(int(body.get("height", 720)), 256), 1080)
        steps = min(max(int(body.get("steps", 50)), 1), 80)
        seed = body.get("seed")
        # Portrait-friendly overrides — None means "use workflow default".
        # negative_prompt MUST be passed by portrait callers to avoid
        # the workflow's "no humans" baked-in negative.
        negative_prompt = body.get("negative_prompt")
        cfg = body.get("cfg")
        sampler = body.get("sampler")
        scheduler = body.get("scheduler")
-        # Cache check
-        key = _cache_key(prompt, width, height, steps)
+        # Cache check — seed + negative + cfg are part of the key so per-
+        # worker / per-config requests don't collapse to one cached image.
        key = _cache_key(
            f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}",
            width, height, steps, seed,
        )
        cached = _cache_get(key)
        if cached:
            self._json(200, {"image": cached, "format": "webp", "width": width, "height": height,
@ -192,7 +228,11 @@ class ImageHandler(BaseHTTPRequestHandler):
        try:
            comfy_check = urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=3)
            if comfy_check.status == 200:
-                img_bytes, seed = _comfyui_generate(prompt, width, height, steps, seed)
+                img_bytes, seed = _comfyui_generate(
                    prompt, width, height, steps, seed,
                    negative_prompt=negative_prompt, cfg=cfg,
                    sampler=sampler, scheduler=scheduler,
                )
                backend = "comfyui"
        except:
            pass
@ -210,6 +250,11 @@ class ImageHandler(BaseHTTPRequestHandler):
        elapsed_ms = int((time.time() - t0) * 1000)
        img_b64 = base64.b64encode(img_bytes).decode()
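        # Recompute key with the actual seed used (when caller passed
        # None, _comfyui_generate picks a random one and we want the
        # cache to reflect that so re-requests with the same returned
        # seed hit the disk). Build it from the same composite prompt
        # string as the lookup key above; otherwise the stored key can
        # never match a later lookup.
        key = _cache_key(
            f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}",
            width, height, steps, seed,
        )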
        _cache_put(key, img_bytes)
        self._json(200, {
--- a/scripts/staffing/fetch_face_pool.py
+++ b/scripts/staffing/fetch_face_pool.py
@ -0,0 +1,225 @@
 #!/usr/bin/env python3
 """
 fetch_face_pool.py — pull N synthetic headshots from
 https://thispersondoesnotexist.com/, write to data/headshots/face_NNNN.jpg,
 optionally tag each with gender via deepface, emit a JSONL manifest.
 Each fetch is a fresh StyleGAN face — no real people. Deterministic per
 worker mapping happens at serve time (mcp-server hashes the worker key
 into the pool); this script just builds the pool.
 Usage:
    python3 scripts/staffing/fetch_face_pool.py --count 300 --concurrency 3
    python3 scripts/staffing/fetch_face_pool.py --count 50  --no-gender
 Re-running is idempotent: existing face_NNNN.jpg files are skipped, and
 the manifest is rewritten from disk state.
 """
 from __future__ import annotations
 import argparse
 import hashlib
 import json
 import os
 import sys
 import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 import urllib.request
 import urllib.error
 URL = "https://thispersondoesnotexist.com/"
 UA = "Lakehouse/1.0 (face-pool fetch · synthetic-only · no real-person tracking)"
 def fetch_one(idx: int, out_dir: str) -> tuple[int, str, bool, str | None]:
    """Returns (idx, basename, cached, error)."""
    fname = f"face_{idx:04d}.jpg"
    full = os.path.join(out_dir, fname)
    if os.path.exists(full) and os.path.getsize(full) > 1024:
        return idx, fname, True, None
    try:
        req = urllib.request.Request(URL, headers={"User-Agent": UA})
        with urllib.request.urlopen(req, timeout=20) as resp:
            blob = resp.read()
        if len(blob) < 1024:
            return idx, fname, False, f"response too small ({len(blob)} bytes)"
        with open(full, "wb") as f:
            f.write(blob)
        return idx, fname, False, None
    except urllib.error.URLError as e:
        return idx, fname, False, f"urlerror: {e}"
    except Exception as e:
        return idx, fname, False, f"{type(e).__name__}: {e}"
 def maybe_tag_gender(records: list[dict], out_dir: str) -> dict[str, int]:
    """If deepface is installed, label records that don't already have a
    gender. Returns a count summary; mutates records in place.
    Preservation contract: never overwrites prior `gender` (or any other
    tag — race/age/excluded — set by tag_face_pool.py). On deepface
    import failure, leaves existing tags alone instead of resetting them
    to None. The previous behavior wiped 952 hand-classified rows when
    fetch_face_pool was re-run from a Python without deepface installed."""
    try:
        from deepface import DeepFace  # type: ignore
    except Exception as e:
        print(f"  (deepface unavailable: {e}) — leaving existing tags untouched")
        for r in records:
            r.setdefault("gender", None)
        already = sum(1 for r in records if r.get("gender") in ("man", "woman"))
        return {"preserved_tagged": already, "untagged": len(records) - already}
    todo = [r for r in records if r.get("gender") not in ("man", "woman")]
    if not todo:
        print("  every record already has gender — nothing to tag.")
        return {"preserved_tagged": len(records)}
    print(f"  tagging gender via deepface ({len(todo)} of {len(records)} records, CPU; ~0.5-1s per face)…")
    counts: dict[str, int] = {}
    for i, r in enumerate(todo):
        full = os.path.join(out_dir, r["file"])
        try:
            ana = DeepFace.analyze(
                img_path=full,
                actions=["gender"],
                enforce_detection=False,
                silent=True,
            )
            if isinstance(ana, list):
                ana = ana[0] if ana else {}
            g_raw = (ana.get("dominant_gender") or "").lower().strip()
            r["gender"] = (
                "man" if g_raw.startswith("man") else
                "woman" if g_raw.startswith("woman") else
                None
            )
        except Exception as e:
            r["gender"] = None
            r["gender_error"] = f"{type(e).__name__}: {e}"
        counts[r["gender"] or "unknown"] = counts.get(r["gender"] or "unknown", 0) + 1
        if (i + 1) % 25 == 0:
            print(f"    [{i+1}/{len(todo)}] {counts}")
    return counts
 def main():
    p = argparse.ArgumentParser()
    p.add_argument("--count", type=int, default=300, help="how many faces to maintain in pool")
    p.add_argument(
        "--out",
        default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
    )
    p.add_argument("--concurrency", type=int, default=3, help="parallel fetches (be polite)")
    p.add_argument("--no-gender", action="store_true", help="skip deepface gender tagging")
    p.add_argument("--shrink", action="store_true",
                   help="allow --count to drop manifest entries with id >= count. Default: preserve them.")
    args = p.parse_args()
    out = os.path.realpath(args.out)
    os.makedirs(out, exist_ok=True)
    # Load any existing manifest into a by-id dict so prior tags
    # (gender / race / age / excluded) survive the rewrite. Also
    # naturally dedupes — if the file accidentally has duplicate
    # lines for the same id (this is how we ended up with a 2497-
    # row manifest backing a 1000-face pool), the last one wins.
    manifest = os.path.join(out, "manifest.jsonl")
    existing: dict[int, dict] = {}
    if os.path.exists(manifest):
        dup_count = 0
        with open(manifest) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    row = json.loads(line)
                except json.JSONDecodeError:
                    continue
                rid = row.get("id")
                if not isinstance(rid, int):
                    continue
                if rid in existing:
                    dup_count += 1
                existing[rid] = row
        print(f"Loaded existing manifest: {len(existing)} unique ids ({dup_count} duplicate lines collapsed)")
        max_existing = max(existing.keys()) if existing else -1
        if max_existing >= args.count and not args.shrink:
            print(
                f"\nERROR: --count={args.count} would drop {sum(1 for k in existing if k >= args.count)} "
                f"manifest entries (max existing id = {max_existing}). Pass --shrink to allow.\n",
                file=sys.stderr,
            )
            sys.exit(2)
    print(f"Fetching {args.count} faces → {out}")
    print(f"Source: {URL} (synthetic StyleGAN — no real people)")
    results: list[dict] = [None] * args.count  # type: ignore
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=max(1, args.concurrency)) as ex:
        futs = {ex.submit(fetch_one, i, out): i for i in range(args.count)}
        for done, fut in enumerate(as_completed(futs), 1):
            idx, fname, cached, err = fut.result()
            # Start from prior manifest row (preserves gender/race/age/excluded)
            # and overlay only the fields fetch_one is responsible for.
            base = dict(existing.get(idx, {}))
            base.update({
                "id": idx,
                "file": fname,
                "cached": cached,
                "error": err,
            })
            results[idx] = base
            if done % 25 == 0 or done == args.count:
                ok = sum(1 for r in results if r and not r.get("error"))
                print(f"  [{done}/{args.count}] {ok} ok  ({time.time()-t0:.1f}s)")
    # Drop slots that errored or are still None (shouldn't happen)
    records = [r for r in results if r and not r.get("error")]
    print(f"\nPool ready: {len(records)} faces, {sum(1 for r in records if r['cached'])} from cache")
    preserved_tags = sum(1 for r in records if r.get("gender") in ("man", "woman"))
    if preserved_tags:
        print(f"Preserved {preserved_tags} prior gender tags (and any race/age/excluded fields).")
    if not args.no_gender and records:
        print("\nGender-tagging pass:")
        summary = maybe_tag_gender(records, out)
        print(f"  distribution: {summary}")
    else:
        for r in records:
            r.setdefault("gender", None)
    # If --shrink was NOT used and somehow id >= count rows are still in
    # `existing` (which can only happen if the early gate was bypassed),
    # carry them forward so we don't quietly drop them.
    if not args.shrink:
        kept_ids = {r["id"] for r in records}  # build once instead of once per existing row
        for rid, row in existing.items():
            if rid >= args.count and rid not in kept_ids:
                records.append(row)
        records.sort(key=lambda r: r.get("id", 0))
    # Strip transient flags before persisting
    for r in records:
        r.pop("cached", None)
        r.pop("error", None)
    # Atomic write — if a re-run is interrupted, manifest stays intact.
    tmp = manifest + ".tmp"
    with open(tmp, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    os.replace(tmp, manifest)
    print(f"\nManifest: {manifest}  ({len(records)} entries)")
    # Quick pool fingerprint (file names + gender tags) for downstream debugging
    h = hashlib.sha256()
    for r in records:
        h.update(r["file"].encode())
        h.update(b"|")
        h.update((r.get("gender") or "?").encode())
    print(f"Pool fingerprint (sha256): {h.hexdigest()[:16]}")
 if __name__ == "__main__":
    main()
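
The manifest handling in fetch_face_pool.py above (last-row-wins de-duplication on load, then a write to a temporary file followed by `os.replace`) can be sketched in isolation. This is a minimal sketch rather than the script's own code: the helper names `load_manifest` and `write_manifest_atomic` are hypothetical, and it assumes the same JSONL layout with an integer `id` per row.

```python
from __future__ import annotations

import json
import os


def load_manifest(path: str) -> tuple[dict[int, dict], int]:
    """Read a JSONL manifest keyed by integer id; a later row replaces an earlier one.

    Returns (rows_by_id, number_of_duplicate_lines_collapsed).
    Hypothetical helper, mirroring the loading loop in the diff.
    """
    rows: dict[int, dict] = {}
    duplicates = 0
    if not os.path.exists(path):
        return rows, duplicates
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed lines are skipped, as in the script
            rid = row.get("id") if isinstance(row, dict) else None
            if not isinstance(rid, int):
                continue
            if rid in rows:
                duplicates += 1
            rows[rid] = row  # last one wins
    return rows, duplicates


def write_manifest_atomic(path: str, rows: list[dict]) -> None:
    """Write rows to path + '.tmp', then swap the file into place in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    os.replace(tmp, path)  # readers see either the old or the new file
```

Because the final step is a rename, an interrupted run leaves at worst a stray `.tmp` file next to an untouched manifest.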
--- a/scripts/staffing/render_role_pool.py
+++ b/scripts/staffing/render_role_pool.py
@ -0,0 +1,230 @@
 #!/usr/bin/env python3
 """
 render_role_pool.py — pre-render a role-aware face pool by hitting
 serve_imagegen.py (localhost:3600/generate) with prompts pulled from
 the bun server's /headshots/_scenes endpoint (single source of truth
 for SCENES + SCENES_VERSION).
 Layout:
    data/headshots_role_pool/
      {band}/
        {gender}_{race}/
          face_00.webp
          face_01.webp
          ...
      manifest.jsonl
 Each entry in manifest.jsonl:
    {"band": "warehouse", "gender": "man", "race": "caucasian",
     "file": "warehouse/man_caucasian/face_03.webp",
     "seed": 184729338, "scenes_version": "v1"}
 Idempotent: an existing file larger than 1 KiB at the target path is
 skipped (and recorded with "seed": null, "cached": true). Re-run with
 --force to regenerate. SCENES_VERSION is captured per render so the
 server's pool route can refuse stale renders if the version drifts.
 """
 from __future__ import annotations
 import argparse
 import base64
 import json
 import os
 import sys
 import time
 import urllib.request
 import urllib.error
 DEFAULT_BANDS = ["warehouse", "production", "trades", "driver", "lead"]
 DEFAULT_GENDERS = ["man", "woman"]
 DEFAULT_RACES = ["caucasian", "east_asian", "south_asian", "middle_eastern", "black", "hispanic"]
 def race_text(r: str) -> str:
    return {
        "caucasian": "",
        "east_asian": "East Asian",
        "south_asian": "South Asian",
        "middle_eastern": "Middle Eastern",
        "black": "Black",
        "hispanic": "Hispanic",
    }.get(r, "")
 def fetch_scenes(mcp_url: str) -> tuple[str, dict]:
    """Pull canonical SCENES from the bun server. Single source of truth."""
    req = urllib.request.Request(f"{mcp_url}/headshots/_scenes")
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.loads(resp.read())
    return data["version"], data["scenes"]
 def render(comfy_url: str, prompt: str, seed: int, steps: int, timeout: int, dim: int) -> bytes | None:
    payload = json.dumps({
        "prompt": prompt,
        "width": dim,
        "height": dim,
        "steps": steps,
        "seed": seed,
    }).encode()
    req = urllib.request.Request(
        f"{comfy_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = json.loads(resp.read())
    except urllib.error.HTTPError as e:
        print(f"  HTTP {e.code} from comfy: {e.read()[:200]}", file=sys.stderr)
        return None
    except Exception as e:
        print(f"  comfy error: {type(e).__name__}: {e}", file=sys.stderr)
        return None
    img_b64 = data.get("image")
    if not img_b64:
        print(f"  comfy response missing 'image' field: {list(data.keys())}", file=sys.stderr)
        return None
    return base64.b64decode(img_b64)
 def main():
    p = argparse.ArgumentParser()
    p.add_argument("--out", default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots_role_pool"))
    p.add_argument("--per-bucket", type=int, default=10, help="how many faces per (band × gender × race)")
    p.add_argument("--mcp", default="http://localhost:3700")
    p.add_argument("--comfy", default="http://localhost:3600")
    p.add_argument("--steps", type=int, default=8)
    p.add_argument("--bands", nargs="*", default=DEFAULT_BANDS)
    p.add_argument("--genders", nargs="*", default=DEFAULT_GENDERS)
    p.add_argument("--races", nargs="*", default=DEFAULT_RACES)
    p.add_argument("--force", action="store_true", help="regenerate existing files")
    p.add_argument("--age", type=int, default=32)
    p.add_argument("--timeout", type=int, default=120, help="per-render timeout (1024² takes ~5s on A4000)")
    p.add_argument("--dim", type=int, default=1024, help="square render dimension (v2 default 1024, v1 was 512)")
    args = p.parse_args()
    out_root = os.path.realpath(args.out)
    os.makedirs(out_root, exist_ok=True)
    print(f"Fetching canonical SCENES from {args.mcp}/headshots/_scenes…")
    try:
        version, scenes = fetch_scenes(args.mcp)
    except Exception as e:
        print(f"FATAL: could not fetch scenes ({e}). Is the mcp-server up?", file=sys.stderr)
        sys.exit(1)
    print(f"  SCENES_VERSION={version}, {len(scenes)} bands available: {list(scenes.keys())}")
    # v2+ files live at {out}/{version}/{band}/{g}_{r}/face_NN.webp.
    # v1 lived at {out}/{band}/... — keep that layout intact for
    # rollback; the server route reads both and prefers current.
    out = out_root if version == "v1" else os.path.join(out_root, version)
    os.makedirs(out, exist_ok=True)
    print(f"  writing to: {out}")
    print(f"  render dim: {args.dim}×{args.dim}")
    # Reject any --bands not in the server's SCENES
    unknown = [b for b in args.bands if b not in scenes]
    if unknown:
        print(f"FATAL: unknown bands {unknown}. Server has: {list(scenes.keys())}", file=sys.stderr)
        sys.exit(1)
    manifest_rows = []
    todo = [
        (band, g, r, n)
        for band in args.bands
        for g in args.genders
        for r in args.races
        for n in range(args.per_bucket)
    ]
    print(f"\nPlanning: {len(todo)} renders ({len(args.bands)} bands × {len(args.genders)} genders × {len(args.races)} races × {args.per_bucket} faces).")
    print(f"Estimated GPU time at 1.5s/render = {len(todo) * 1.5 / 60:.1f} min.\n")
    t0 = time.time()
    rendered = 0
    skipped = 0
    failed = 0
    for i, (band, g, r, n) in enumerate(todo):
        bucket_dir = os.path.join(out, band, f"{g}_{r}")
        os.makedirs(bucket_dir, exist_ok=True)
        fname = f"face_{n:02d}.webp"
        full = os.path.join(bucket_dir, fname)
        rel = os.path.relpath(full, out)
        if os.path.exists(full) and os.path.getsize(full) > 1024 and not args.force:
            skipped += 1
            manifest_rows.append({
                "band": band, "gender": g, "race": r, "file": rel,
                "seed": None, "scenes_version": version, "cached": True,
            })
            continue
        scene_def = scenes[band]
        scene_clause = scene_def["scene"]
        race_clause = race_text(r)
        gender_clause = g  # "man" / "woman"
        # Match the bun server's prompt builder exactly. If you tweak
        # one, tweak the other (or factor a /prompt-builder endpoint).
        # The {role} slot is intentionally a band-typical title here
        # — the pre-rendered face is shared across roles in the same
        # band, so we use the band's archetypal role. Specific roles
        # still hit the on-demand /headshots/generate/:key path with
        # their actual title.
        archetype_role = {
            "warehouse": "warehouse worker",
            "production": "production worker",
            "trades": "skilled tradesperson",
            "driver": "delivery driver",
            "lead": "shift supervisor",
        }.get(band, "warehouse worker")
        prompt = (
            f"professional headshot portrait of a {args.age}-year-old "
            f"{race_clause} {gender_clause} {archetype_role}, {scene_clause}, "
            f"neutral confident expression, sharp focus, photorealistic"
        )
        # Deterministic seed per slot — same (band, g, r, n) always
        # gets the same face. Mixing scenes_version means a SCENES
        # tweak shifts every face slightly; that's the right behavior
        # (it's how cache invalidation propagates to the pool too).
        seed_str = f"{band}|{g}|{r}|{n}|{version}"
        seed_h = 5381
        for ch in seed_str:
            seed_h = ((seed_h << 5) + seed_h + ord(ch)) & 0x7fffffff
        seed = seed_h
        bytes_ = render(args.comfy, prompt, seed, args.steps, args.timeout, args.dim)
        if bytes_ is None:
            failed += 1
            continue
        with open(full, "wb") as f:
            f.write(bytes_)
        rendered += 1
        manifest_rows.append({
            "band": band, "gender": g, "race": r, "file": rel,
            "seed": seed, "scenes_version": version, "cached": False,
        })
        if (i + 1) % 10 == 0 or (i + 1) == len(todo):
            elapsed = time.time() - t0
            done = i + 1
            rate = done / elapsed if elapsed > 0 else 0
            eta = (len(todo) - done) / rate if rate > 0 else 0
            print(f"  [{done}/{len(todo)}]  rendered={rendered} skipped={skipped} failed={failed}  "
                  f"rate={rate:.2f}/s  eta={eta:.0f}s")
    # Atomic manifest write
    manifest_path = os.path.join(out, "manifest.jsonl")
    tmp = manifest_path + ".tmp"
    with open(tmp, "w") as f:
        for row in manifest_rows:
            f.write(json.dumps(row) + "\n")
    os.replace(tmp, manifest_path)
    print(f"\nDone. {rendered} new, {skipped} cached, {failed} failed in {time.time()-t0:.1f}s")
    print(f"Manifest: {manifest_path} ({len(manifest_rows)} entries)")
    print(f"\nNext: poke {args.mcp}/headshots/__reload to pick up the new pool.")
 if __name__ == "__main__":
    main()
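
The per-slot seed in render_role_pool.py above is a djb2-style string hash masked to 31 bits. A minimal standalone sketch follows, assuming the same `band|gender|race|n|version` key layout; the function name `slot_seed` is hypothetical and does not appear in the script.

```python
def slot_seed(band: str, gender: str, race: str, n: int, version: str) -> int:
    """Derive a deterministic 31-bit seed for one (band, gender, race, n) slot.

    Hypothetical helper; it repeats the hash loop from the diff:
    h = h * 33 + ord(ch), kept within 0..0x7fffffff.
    """
    key = f"{band}|{gender}|{race}|{n}|{version}"
    h = 5381
    for ch in key:
        h = ((h << 5) + h + ord(ch)) & 0x7FFFFFFF
    return h


if __name__ == "__main__":
    # Same slot and version give the same seed; a new version gives a new one.
    a = slot_seed("warehouse", "man", "caucasian", 3, "v1")
    b = slot_seed("warehouse", "man", "caucasian", 3, "v2")
    print(a, b, a != b)
```

Including the scenes version in the key means a version bump changes every seed, which is the cache-invalidation behaviour the comment in the script describes.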
--- a/scripts/staffing/tag_face_pool.py
+++ b/scripts/staffing/tag_face_pool.py
@ -0,0 +1,169 @@
 #!/usr/bin/env python3
 """
 tag_face_pool.py — run deepface gender + race classification over the
 synthetic face pool produced by fetch_face_pool.py, then rewrite
 manifest.jsonl with `gender` (man / woman), `race` (east_asian /
 south_asian / middle_eastern / black / hispanic / caucasian) and an
 estimated `age`. Faces estimated below --min-age are marked `excluded`.
 Run with the venv that has deepface installed:
    /home/profit/.local/share/deepface-venv/bin/python \
        scripts/staffing/tag_face_pool.py
 Idempotent: rows that already have gender, race, AND age tagged are
 skipped. Pass --force to re-tag everything.
 Mapping deepface buckets → /headshots/ ?e= values:
  asian        → east_asian (deepface doesn't differentiate East /
                  South Asian; we lump everything here since the
                  StyleGAN training set leans East Asian)
  indian       → south_asian
  middle eastern → middle_eastern
  black        → black
  latino hispanic → hispanic (a bare 'hispanic' label is accepted too)
  white        → caucasian
 """
 from __future__ import annotations
 import argparse
 import json
 import os
 import sys
 import time
 DEEPFACE_RACE_TO_HINT = {
    "asian": "east_asian",
    "indian": "south_asian",
    "middle eastern": "middle_eastern",
    "black": "black",
    "latino hispanic": "hispanic",
    "hispanic": "hispanic",
    "white": "caucasian",
 }
 def main():
    p = argparse.ArgumentParser()
    p.add_argument(
        "--out",
        default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
    )
    p.add_argument("--force", action="store_true", help="re-tag rows that already have gender+race")
    p.add_argument("--limit", type=int, default=0, help="cap how many faces to process this run (0 = all)")
    p.add_argument("--min-age", type=int, default=22, help="exclude faces estimated below this age (kids/teens). Staffing context = legal-age workers only.")
    args = p.parse_args()
    out = os.path.realpath(args.out)
    manifest_path = os.path.join(out, "manifest.jsonl")
    if not os.path.exists(manifest_path):
        print(f"manifest not found: {manifest_path}", file=sys.stderr)
        sys.exit(1)
    print("loading deepface (cold start ~10-15s for first model build)…")
    from deepface import DeepFace  # type: ignore
    rows = []
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rows.append(json.loads(line))
    print(f"manifest: {len(rows)} rows")
    todo = [
        r for r in rows
        if args.force or r.get("gender") is None or r.get("race") is None or r.get("age") is None
    ]
    if args.limit > 0:
        todo = todo[: args.limit]
    print(f"to tag: {len(todo)} faces")
    if not todo:
        print("nothing to do.")
        return
    counts_g = {}
    counts_r = {}
    failed = 0
    t0 = time.time()
    for i, r in enumerate(todo):
        full = os.path.join(out, r["file"])
        try:
            ana = DeepFace.analyze(
                img_path=full,
                actions=["gender", "race", "age"],
                enforce_detection=False,
                silent=True,
            )
            if isinstance(ana, list):
                ana = ana[0] if ana else {}
            g_raw = (ana.get("dominant_gender") or "").lower().strip()
            r["gender"] = (
                "man" if g_raw.startswith("man") else
                "woman" if g_raw.startswith("woman") else
                None
            )
            r_raw = (ana.get("dominant_race") or "").lower().strip()
            r["race"] = DEEPFACE_RACE_TO_HINT.get(r_raw, None)
            if r["race"] is None and r_raw:
                r["race_raw"] = r_raw
            # Age estimation — exclude minors / teens. Staffing context
            # uses adult workers only. Threshold is 22 by default
            # (legal + a buffer because age estimation is noisy).
            try:
                age = int(round(float(ana.get("age") or 0)))
            except Exception:
                age = 0
            r["age"] = age
            if age and age < args.min_age:
                r["excluded"] = "minor"
            else:
                r.pop("excluded", None)
            r.pop("tag_error", None)  # success: clear any error left by an earlier failed run
            counts_g[r["gender"] or "unknown"] = counts_g.get(r["gender"] or "unknown", 0) + 1
            counts_r[r["race"] or r_raw or "unknown"] = counts_r.get(r["race"] or r_raw or "unknown", 0) + 1
        except Exception as e:
            r["tag_error"] = f"{type(e).__name__}: {e}"
            failed += 1
        if (i + 1) % 25 == 0 or (i + 1) == len(todo):
            elapsed = time.time() - t0
            rate = (i + 1) / elapsed if elapsed > 0 else 0
            eta = (len(todo) - i - 1) / rate if rate > 0 else 0
            print(f"  [{i+1}/{len(todo)}]  rate={rate:.1f}/s  eta={eta:.0f}s  failed={failed}")
            print(f"    gender: {counts_g}")
            print(f"    race  : {counts_r}")
    # Write updated manifest atomically
    tmp = manifest_path + ".tmp"
    with open(tmp, "w") as f:
        for r in rows:
            f.write(json.dumps(r) + "\n")
    os.replace(tmp, manifest_path)
    final_g = {}
    final_r = {}
    excluded = 0
    age_hist = {"<18": 0, "18-22": 0, "22-30": 0, "30-40": 0, "40-50": 0, "50-60": 0, "60+": 0, "unknown": 0}
    for r in rows:
        if r.get("excluded"):
            excluded += 1
            continue
        final_g[r.get("gender") or "untagged"] = final_g.get(r.get("gender") or "untagged", 0) + 1
        final_r[r.get("race") or "untagged"] = final_r.get(r.get("race") or "untagged", 0) + 1
        a = r.get("age") or 0
        if a == 0: age_hist["unknown"] += 1
        elif a < 18: age_hist["<18"] += 1
        elif a < 22: age_hist["18-22"] += 1
        elif a < 30: age_hist["22-30"] += 1
        elif a < 40: age_hist["30-40"] += 1
        elif a < 50: age_hist["40-50"] += 1
        elif a < 60: age_hist["50-60"] += 1
        else: age_hist["60+"] += 1
    print(f"\nDone. {len(rows)} rows, {excluded} excluded as <{args.min_age}, {failed} tag errors, {time.time()-t0:.1f}s")
    print(f"  final gender: {final_g}")
    print(f"  final race  : {final_r}")
    print(f"  age dist   : {age_hist}")
    print("\nNext: poke /headshots/__reload to refresh the in-memory pool.")
 if __name__ == "__main__":
    main()
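
The tag normalisation in tag_face_pool.py above (gender prefix match, race label mapping, rounded age, minor exclusion) can be exercised without deepface by feeding it a plain dict. This is a minimal sketch under that assumption: `normalize_tags` is a hypothetical helper, and the input keys (`dominant_gender`, `dominant_race`, `age`) are the ones the script reads from the analysis result.

```python
from __future__ import annotations

# Same mapping as the script: deepface race label -> manifest race tag.
DEEPFACE_RACE_TO_HINT = {
    "asian": "east_asian",
    "indian": "south_asian",
    "middle eastern": "middle_eastern",
    "black": "black",
    "latino hispanic": "hispanic",
    "hispanic": "hispanic",
    "white": "caucasian",
}


def normalize_tags(ana: dict, min_age: int = 22) -> dict:
    """Turn one analysis dict into manifest fields (hypothetical helper)."""
    g_raw = (ana.get("dominant_gender") or "").lower().strip()
    if g_raw.startswith("man"):
        gender = "man"
    elif g_raw.startswith("woman"):
        gender = "woman"
    else:
        gender = None

    r_raw = (ana.get("dominant_race") or "").lower().strip()
    race = DEEPFACE_RACE_TO_HINT.get(r_raw)

    try:
        age = int(round(float(ana.get("age") or 0)))
    except (TypeError, ValueError):
        age = 0  # 0 means "unknown", matching the script's convention

    tags = {"gender": gender, "race": race, "age": age}
    if race is None and r_raw:
        tags["race_raw"] = r_raw  # keep the unmapped label for inspection
    if age and age < min_age:
        tags["excluded"] = "minor"
    return tags
```

Keeping this step as a pure function of the analysis dict makes the mapping and the age threshold testable without loading any model.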
--- a/sidecar/sidecar/lab_ui.py
+++ b/sidecar/sidecar/lab_ui.py
@ -0,0 +1,385 @@
 """Pipeline Lab notebook UI — served as a single HTML page.
 Note: innerHTML usage in this file is intentional for building the UI.
 All user-supplied text is escaped through the esc() function before insertion.
 The only values rendered via innerHTML are pre-formatted HTML strings with
 escaped user content — no raw user input is ever injected unescaped.
 """
 from fastapi import APIRouter
 from fastapi.responses import HTMLResponse
 router = APIRouter()
 def _get_lab_html() -> str:
    """Return the Pipeline Lab HTML. Separated into a function for clarity."""
    # The HTML is a self-contained notebook UI.
    # All user-facing text is escaped via the esc() JS function.
    return r"""<!DOCTYPE html>
 <html lang="en"><head>
 <meta charset="UTF-8"><meta name="viewport" content="width=device-width,initial-scale=1.0">
 <title>Pipeline Lab — Lakehouse</title>
 <style>
 :root{--bg:#08090c;--surface:rgba(14,16,22,0.9);--border:#2a2d35;--text:#e8e6e3;--text2:#7a7872;--accent:#4ade80;--gold:#e2b55a;--red:#e05252;--blue:#5b9cf5;--purple:#c084fc}
 *{box-sizing:border-box;margin:0;padding:0}
 body{font-family:'SF Mono','Menlo','Consolas',monospace;background:var(--bg);color:var(--text);min-height:100vh;padding:20px 28px;font-size:13px}
 h1{font-size:18px;font-weight:700;margin-bottom:4px}h1 span{color:var(--accent)}
 .subtitle{color:var(--text2);font-size:11px;margin-bottom:20px}
 .cells{display:flex;flex-direction:column;gap:12px;max-width:1100px}
 .cell{background:var(--surface);border:1px solid var(--border);border-radius:4px;overflow:hidden}
 .cell.running{border-color:var(--gold)}
 .cell-header{display:flex;align-items:center;gap:8px;padding:8px 12px;border-bottom:1px solid var(--border);font-size:10px;text-transform:uppercase;letter-spacing:1px;color:var(--text2)}
 .cell-type{font-weight:700}
 .cell-time{margin-left:auto;color:var(--text2)}
 .cell-input{padding:12px;background:rgba(0,0,0,0.3)}
 .cell-input textarea{width:100%;min-height:60px;background:transparent;border:none;color:var(--text);font-family:inherit;font-size:13px;resize:vertical;outline:none;line-height:1.6}
 .cell-output{padding:12px;font-size:12px;line-height:1.6;white-space:pre-wrap;max-height:400px;overflow-y:auto;display:none}
 .cell-output.has-data{display:block;border-top:1px solid var(--border)}
 .toolbar{display:flex;gap:6px;padding:8px 12px;border-top:1px solid var(--border);flex-wrap:wrap}
 .btn{font-family:inherit;font-size:10px;text-transform:uppercase;letter-spacing:0.5px;padding:5px 12px;border:1px solid var(--border);border-radius:3px;background:transparent;color:var(--text2);cursor:pointer}
 .btn:hover{border-color:var(--accent);color:var(--accent)}
 .btn.primary{border-color:var(--accent);color:var(--accent);background:rgba(74,222,128,0.06)}
 .btn.gold{border-color:var(--gold);color:var(--gold)}
 .btn.blue{border-color:var(--blue);color:var(--blue)}
 .btn.purple{border-color:var(--purple);color:var(--purple)}
 .btn.red{border-color:var(--red);color:var(--red)}
 .top-bar{display:flex;gap:8px;margin-bottom:16px;align-items:center;flex-wrap:wrap}
 .status-bar{display:flex;gap:12px;padding:8px 12px;background:var(--surface);border:1px solid var(--border);border-radius:4px;margin-bottom:16px;font-size:10px;color:var(--text2)}
 .stat{display:flex;align-items:center;gap:4px}.stat b{color:var(--text)}
 .result-row{display:flex;gap:8px;padding:6px 8px;border-bottom:1px solid rgba(42,45,53,0.3);align-items:center;font-size:11px}
 .result-row:last-child{border-bottom:none}
 .score-bar{width:60px;height:5px;background:rgba(0,0,0,0.2);border-radius:3px;overflow:hidden}
 .score-fill{height:100%;border-radius:3px}
 .benchmark-grid{display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px}
 .bench-col{background:rgba(0,0,0,0.2);border-radius:3px;padding:10px}
 .bench-label{font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700}
 .threshold-slider{display:flex;align-items:center;gap:8px;padding:0 12px;margin:4px 0}
 .threshold-slider input[type=range]{flex:1;accent-color:var(--accent)}
 .threshold-slider .val{font-weight:700;min-width:36px;text-align:right}
 </style></head><body>
 <h1><span>Pipeline Lab</span> // Lakehouse</h1>
 <div class="subtitle">Embedding-based screening vs LLM classification &#x2014; iterative experimentation</div>
 <div class="status-bar" id="status-bar">
  <div class="stat"><span>Exemplars:</span> <b id="st-exemplars">0</b></div>
  <div class="stat"><span>Categories:</span> <b id="st-categories">0</b></div>
  <div class="stat"><span>Pipelines:</span> <b id="st-pipelines">0</b></div>
  <div class="stat" style="margin-left:auto"><span>Sidecar:</span> <b id="st-health" style="color:var(--text2)">...</b></div>
 </div>
 <div class="top-bar">
  <button class="btn primary" onclick="addCell('exemplars')">+ Exemplars</button>
  <button class="btn gold" onclick="addCell('screen')">+ Screen</button>
  <button class="btn blue" onclick="addCell('classify')">+ Classify</button>
  <button class="btn purple" onclick="addCell('benchmark')">+ Benchmark</button>
  <button class="btn" onclick="addCell('similarity')">+ Similarity</button>
  <button class="btn" onclick="addCell('generate')">+ Generate</button>
  <button class="btn" onclick="addCell('pipeline')">+ Pipeline</button>
  <span style="flex:1"></span>
  <button class="btn red" onclick="clearCells()">Clear All</button>
 </div>
 <div class="cells" id="cells"></div>
 <script>
 var BASE = '';
 var cellCounter = 0;
 function esc(t){var d=document.createElement('span');d.textContent=String(t);return d.innerHTML}
 async function api(path, body) {
  var opts = body ? {method:'POST', headers:{'Content-Type':'application/json'}, body:JSON.stringify(body)} : {};
  var r = await fetch(BASE + '/lab' + path, opts);
  return r.json();
 }
 async function refreshStatus() {
  try {
    var ex = await api('/exemplars');
    var pl = await api('/pipelines');
    var h = await fetch(BASE + '/health').then(function(r){return r.json()});
    document.getElementById('st-exemplars').textContent = ex.total || 0;
    document.getElementById('st-categories').textContent = Object.keys(ex.categories || {}).length;
    document.getElementById('st-pipelines').textContent = (pl.pipelines || []).length;
    document.getElementById('st-health').textContent = h.status || 'ok';
    document.getElementById('st-health').style.color = 'var(--accent)';
  } catch(e) {
    document.getElementById('st-health').textContent = 'error';
    document.getElementById('st-health').style.color = 'var(--red)';
  }
 }
 function addCell(type) {
  var id = 'cell-' + (++cellCounter);
  var cells = document.getElementById('cells');
  var cell = document.createElement('div'); cell.className = 'cell'; cell.id = id;
  var colors = {exemplars:'var(--accent)',screen:'var(--gold)',classify:'var(--blue)',benchmark:'var(--purple)',similarity:'var(--text2)',generate:'var(--text2)',pipeline:'var(--accent)'};
  var labels = {exemplars:'EXEMPLARS',screen:'SCREEN',classify:'CLASSIFY (LLM)',benchmark:'BENCHMARK A/B',similarity:'SIMILARITY',generate:'GENERATE',pipeline:'PIPELINE'};
  var placeholders = {
    exemplars:'Category: decision\n---\nWe decided to use Parquet for all storage\nThe team chose React over Vue\nArchitecture decision: microservices',
    screen:'Enter texts to classify via embedding similarity (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today\nArchitecture: chose event sourcing over CRUD',
    classify:'Enter texts to classify via LLM (one per line):\n\nWe decided to migrate to PostgreSQL\nThe weather is nice today',
    benchmark:'Enter texts to benchmark (one per line):\n\nWe decided to use Kubernetes for orchestration\nThe new hire starts Monday\nTechnical debt: refactor the auth module\nLunch menu looks good today',
    similarity:'Enter texts to compare pairwise (one per line):\n\nWe chose React for the frontend\nReact was selected as our UI framework\nThe database uses PostgreSQL',
    generate:'Enter a prompt for the LLM...',
    pipeline:'Pipeline name: my-extraction\n---\nscreen | threshold=0.6\nclassify\nextract | prompt=Extract the key decision and its rationale\nvalidate | dedup_threshold=0.9'
  };
  var color = colors[type] || 'var(--text2)';
  var label = labels[type] || type.toUpperCase();
  var ph = placeholders[type] || '';
  // Build cell using DOM methods
  var header = document.createElement('div'); header.className = 'cell-header';
  var typeSpan = document.createElement('span'); typeSpan.className = 'cell-type'; typeSpan.style.color = color; typeSpan.textContent = label; header.appendChild(typeSpan);
  var numSpan = document.createElement('span'); numSpan.textContent = 'Cell #' + cellCounter; header.appendChild(numSpan);
  var timeSpan = document.createElement('span'); timeSpan.className = 'cell-time'; timeSpan.id = id + '-time'; header.appendChild(timeSpan);
  cell.appendChild(header);
  var inputDiv = document.createElement('div'); inputDiv.className = 'cell-input';
  var textarea = document.createElement('textarea'); textarea.id = id + '-input'; textarea.placeholder = ph; textarea.value = ph;
  inputDiv.appendChild(textarea); cell.appendChild(inputDiv);
  if (type === 'screen' || type === 'benchmark') {
    var slider = document.createElement('div'); slider.className = 'threshold-slider';
    var slLabel = document.createElement('span'); slLabel.style.cssText = 'font-size:10px;color:var(--text2)'; slLabel.textContent = 'Threshold:'; slider.appendChild(slLabel);
    var range = document.createElement('input'); range.type = 'range'; range.min = '0.3'; range.max = '0.95'; range.step = '0.05'; range.value = '0.65'; range.id = id + '-threshold';
    var valSpan = document.createElement('span'); valSpan.className = 'val'; valSpan.textContent = '0.65';
    range.oninput = function() { valSpan.textContent = this.value; };
    slider.appendChild(range); slider.appendChild(valSpan); cell.appendChild(slider);
  }
  var outputDiv = document.createElement('div'); outputDiv.className = 'cell-output'; outputDiv.id = id + '-output';
  cell.appendChild(outputDiv);
  var tb = document.createElement('div'); tb.className = 'toolbar';
  var runBtn = document.createElement('button'); runBtn.className = 'btn primary'; runBtn.textContent = 'Run';
  runBtn.onclick = function() { runCell(id, type); }; tb.appendChild(runBtn);
  var rmBtn = document.createElement('button'); rmBtn.className = 'btn red'; rmBtn.textContent = 'Remove';
  rmBtn.onclick = function() { removeCell(id); }; tb.appendChild(rmBtn);
  cell.appendChild(tb);
  cells.appendChild(cell);
  textarea.focus();
  return id;
 }
 function removeCell(id) { var el = document.getElementById(id); if (el) el.remove(); }
 function clearCells() { document.getElementById('cells').textContent = ''; cellCounter = 0; }
 function parseLines(text) { return text.split('\n').map(function(l){return l.trim()}).filter(function(l){return l && l.charAt(0) !== '#'}); }
 async function runCell(id, type) {
  var cell = document.getElementById(id);
  var input = document.getElementById(id+'-input').value;
  var output = document.getElementById(id+'-output');
  var timeEl = document.getElementById(id+'-time');
  cell.classList.add('running');
  output.className = 'cell-output has-data';
  output.textContent = 'Running...';
  try {
    var t0 = performance.now();
    var result;
    if (type === 'exemplars') {
      var parts = input.split('---');
      var catLine = (parts[0] || '').trim();
      var category = catLine.replace(/^category:\s*/i, '').trim().toLowerCase();
      var texts = parseLines(parts.slice(1).join('\n'));
      if (!category || !texts.length) { output.textContent = 'Format: Category: name\\n---\\ntext1\\ntext2'; return; }
      result = await api('/exemplars', {category: category, texts: texts});
      output.textContent = 'Added ' + result.added + ' exemplars to "' + result.category + '" (total: ' + result.total + ')';
      output.style.color = 'var(--accent)';
      refreshStatus();
    }
    else if (type === 'screen') {
      var texts = parseLines(input);
      var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
      result = await api('/screen', {texts: texts, threshold: threshold});
      renderScreenResults(output, result, threshold);
    }
    else if (type === 'classify') {
      var texts = parseLines(input);
      result = await api('/classify', {texts: texts});
      renderClassifyResults(output, result);
    }
    else if (type === 'benchmark') {
      var texts = parseLines(input);
      var threshold = parseFloat((document.getElementById(id+'-threshold') || {}).value || '0.65');
      result = await api('/benchmark', {texts: texts, threshold: threshold});
      renderBenchmark(output, result);
    }
    else if (type === 'similarity') {
      var texts = parseLines(input);
      result = await api('/cell', {action:'similarity', texts: texts});
      renderSimilarityMatrix(output, result);
    }
    else if (type === 'generate') {
      result = await api('/cell', {action:'generate', text: input});
      output.textContent = result.text || '(empty)';
    }
    else if (type === 'pipeline') {
      var parts = input.split('---');
      var nameLine = (parts[0] || '').trim();
      var pName = nameLine.replace(/^pipeline\s*name:\s*/i, '').trim();
      var stageLines = parseLines(parts.slice(1).join('\n'));
      var stages = stageLines.map(function(line) {
        var ps = line.split('|').map(function(s){return s.trim()});
        var mode = ps[0];
        var config = {};
        ps.slice(1).forEach(function(p) {
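          // Split on the first '=' only, so values (e.g. prompts) may contain '='.
          var eq = p.indexOf('='); if (eq > 0) {
            var v = p.slice(eq + 1).trim();
            // Treat the value as a number only when the whole string is numeric.
            config[p.slice(0, eq).trim()] = (v !== '' && !isNaN(Number(v))) ? Number(v) : v;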
          }
        });
        return {name: mode, mode: mode, config: config};
      });
      await api('/pipelines', {name: pName, stages: stages, description: 'Created in Pipeline Lab'});
      output.textContent = 'Pipeline "' + pName + '" saved (' + stages.length + ' stages). Use the API to run it: POST /lab/pipelines/run';
      output.style.color = 'var(--accent)';
      refreshStatus();
    }
    var elapsed = Math.round(performance.now() - t0);
    timeEl.textContent = elapsed + 'ms' + (result && result.time_ms ? ' (server: '+result.time_ms+'ms)' : '');
  } catch(e) {
    output.textContent = 'Error: ' + e.message;
    output.style.color = 'var(--red)';
  } finally {
    cell.classList.remove('running');
  }
 }
 function renderScreenResults(el, results, threshold) {
  el.textContent = '';
  results.forEach(function(r) {
    var row = document.createElement('div'); row.className = 'result-row';
    var cat = document.createElement('span');
    cat.style.cssText = 'min-width:80px;font-weight:700;color:' + (r.above_threshold ? 'var(--accent)' : 'var(--text2)');
    cat.textContent = r.best_category || 'none'; row.appendChild(cat);
    var sim = document.createElement('span'); sim.style.cssText = 'min-width:50px;font-weight:700';
    sim.textContent = (r.similarity * 100).toFixed(1) + '%';
    sim.style.color = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--text2)';
    row.appendChild(sim);
    var bar = document.createElement('div'); bar.className = 'score-bar';
    var fill = document.createElement('div'); fill.className = 'score-fill';
    fill.style.width = (r.similarity * 100) + '%';
    fill.style.background = r.similarity >= 0.7 ? 'var(--accent)' : r.similarity >= threshold ? 'var(--gold)' : 'var(--red)';
    bar.appendChild(fill); row.appendChild(bar);
    var txt = document.createElement('span'); txt.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
    txt.textContent = r.text; row.appendChild(txt);
    var badge = document.createElement('span');
    badge.style.cssText = 'font-size:9px;padding:2px 6px;border-radius:2px;border:1px solid;' +
      (r.above_threshold ? 'color:var(--accent);border-color:var(--accent)' : 'color:var(--text2);border-color:var(--border)');
    badge.textContent = r.above_threshold ? 'PASS' : 'FILTERED'; row.appendChild(badge);
    el.appendChild(row);
  });
 }
 function renderClassifyResults(el, results) {
  el.textContent = '';
  results.forEach(function(r) {
    var row = document.createElement('div'); row.className = 'result-row';
    var cat = document.createElement('span'); cat.style.cssText = 'min-width:80px;font-weight:700;color:var(--blue)';
    cat.textContent = r.category; row.appendChild(cat);
    var conf = document.createElement('span');
    conf.style.cssText = 'min-width:50px;font-size:10px;color:' + (r.confidence==='high'?'var(--accent)':r.confidence==='medium'?'var(--gold)':'var(--text2)');
    conf.textContent = r.confidence; row.appendChild(conf);
    var txt = document.createElement('span'); txt.style.flex = '1'; txt.textContent = r.text; row.appendChild(txt);
    el.appendChild(row);
  });
 }
 function renderBenchmark(el, result) {
  el.textContent = '';
  // Summary stats (using safe DOM construction)
  var summary = document.createElement('div'); summary.style.cssText = 'display:flex;gap:16px;margin-bottom:12px;flex-wrap:wrap';
  var stats = [
    ['Agreement', (result.agreement_rate*100).toFixed(1)+'%', result.agreement_rate>=0.8?'var(--accent)':'var(--gold)'],
    ['Speedup', result.speedup+'x', result.speedup>=2?'var(--accent)':'var(--text)'],
    ['Embed', result.embed_time_ms+'ms', 'var(--gold)'],
    ['LLM', result.llm_time_ms+'ms', 'var(--blue)'],
    ['Hybrid est.', result.hybrid_estimated_ms+'ms', 'var(--accent)'],
    ['Screened out', result.texts_screened_out+'/'+result.total_texts, 'var(--purple)']
  ];
  stats.forEach(function(s) {
    var box = document.createElement('div'); box.style.cssText = 'background:rgba(0,0,0,0.2);padding:6px 10px;border-radius:3px;text-align:center';
    var lbl = document.createElement('div'); lbl.style.cssText = 'font-size:9px;color:var(--text2);text-transform:uppercase;letter-spacing:0.5px'; lbl.textContent = s[0]; box.appendChild(lbl);
    var val = document.createElement('div'); val.style.cssText = 'font-size:16px;font-weight:700;color:'+s[2]; val.textContent = s[1]; box.appendChild(val);
    summary.appendChild(box);
  });
  el.appendChild(summary);
  // Side-by-side comparison
  var grid = document.createElement('div'); grid.style.cssText = 'display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-top:8px';
  // Embed column
  var leftCol = document.createElement('div'); leftCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
  var leftTitle = document.createElement('div'); leftTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--gold)';
  leftTitle.textContent = 'EMBEDDING SCREENING (' + result.embed_time_ms + 'ms)'; leftCol.appendChild(leftTitle);
  (result.embed_results||[]).forEach(function(r) {
    var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
    var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:'+(r.above_threshold?'var(--accent)':'var(--text2)'); c.textContent = r.best_category||'none'; row.appendChild(c);
    var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:var(--text2)'; s.textContent = (r.similarity*100).toFixed(0)+'%'; row.appendChild(s);
    var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
    leftCol.appendChild(row);
  });
  grid.appendChild(leftCol);
  // LLM column
  var rightCol = document.createElement('div'); rightCol.style.cssText = 'background:rgba(0,0,0,0.2);border-radius:3px;padding:10px';
  var rightTitle = document.createElement('div'); rightTitle.style.cssText = 'font-size:10px;text-transform:uppercase;letter-spacing:1px;margin-bottom:6px;font-weight:700;color:var(--blue)';
  rightTitle.textContent = 'LLM CLASSIFICATION (' + result.llm_time_ms + 'ms)'; rightCol.appendChild(rightTitle);
  (result.llm_results||[]).forEach(function(r) {
    var row = document.createElement('div'); row.style.cssText = 'font-size:11px;padding:3px 0;display:flex;gap:6px;align-items:center';
    var c = document.createElement('span'); c.style.cssText = 'min-width:60px;font-weight:700;color:var(--blue)'; c.textContent = r.category; row.appendChild(c);
    var s = document.createElement('span'); s.style.cssText = 'min-width:40px;color:'+(r.confidence==='high'?'var(--accent)':'var(--text2)'); s.textContent = r.confidence; row.appendChild(s);
    var t = document.createElement('span'); t.style.cssText = 'flex:1;overflow:hidden;text-overflow:ellipsis;white-space:nowrap'; t.textContent = r.text; row.appendChild(t);
    rightCol.appendChild(row);
  });
  grid.appendChild(rightCol);
  el.appendChild(grid);
 }
 function renderSimilarityMatrix(el, result) {
  el.textContent = '';
  var matrix = result.matrix || [];
  var texts = result.texts || [];
  if (!matrix.length) { el.textContent = 'No results'; return; }
  var tbl = document.createElement('table'); tbl.style.cssText = 'border-collapse:collapse;font-size:11px;width:100%';
  var hdr = document.createElement('tr');
  var corner = document.createElement('th'); hdr.appendChild(corner);
  texts.forEach(function(t) {
    var th = document.createElement('th'); th.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
    th.textContent = t.substring(0, 20); th.title = t; hdr.appendChild(th);
  });
  tbl.appendChild(hdr);
  matrix.forEach(function(row, i) {
    var tr = document.createElement('tr');
    var td0 = document.createElement('td'); td0.style.cssText = 'padding:4px;color:var(--text2);font-size:9px;max-width:100px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap';
    td0.textContent = texts[i].substring(0, 20); tr.appendChild(td0);
    row.forEach(function(v, j) {
      var td = document.createElement('td');
      var bg = i===j ? 'rgba(74,222,128,0.1)' : v>=0.8 ? 'rgba(74,222,128,0.15)' : v>=0.6 ? 'rgba(226,181,90,0.1)' : 'transparent';
      td.style.cssText = 'padding:4px;text-align:center;font-weight:'+(v>=0.7?'700':'400')+';color:'+(v>=0.8?'var(--accent)':v>=0.6?'var(--gold)':'var(--text2)')+';background:'+bg;
      td.textContent = v.toFixed(2); tr.appendChild(td);
    });
    tbl.appendChild(tr);
  });
  el.appendChild(tbl);
 }
 refreshStatus();
 </script>
 </body></html>"""
@router.get("", response_class=HTMLResponse)
 async def lab_page():
    return _get_lab_html()
--- a/sidecar/sidecar/pipeline_lab.py
+++ b/sidecar/sidecar/pipeline_lab.py
@ -0,0 +1,503 @@
 """Pipeline Lab — iterative embedding/LLM pipeline experimentation.
 Provides:
 - Exemplar-based embedding classification (fast screening)
 - LLM-based classification (accurate but slow)
 - A/B benchmarking between the two
 - Pipeline definition and execution
 - Notebook-style API for interactive experimentation
 """
 import json
 import math
 import os
 import time
 from pathlib import Path
 from typing import Optional
 from fastapi import APIRouter, HTTPException
 from fastapi.responses import HTMLResponse
 from pydantic import BaseModel
 from .ollama import client
 router = APIRouter()
 EMBED_MODEL = os.environ.get("EMBED_MODEL", "nomic-embed-text")
 GEN_MODEL = os.environ.get("GEN_MODEL", "qwen2.5")
 LAB_DIR = Path(os.environ.get("LAB_DIR", "./data/_pipeline_lab"))
 LAB_DIR.mkdir(parents=True, exist_ok=True)
 # ─── Vector math ─────────────────────────────────────────────
 def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
 # ─── Exemplar store ──────────────────────────────────────────
 # Exemplars are labeled text+embedding pairs used for classification.
 # e.g. category="decision" texts=["We decided to use Parquet", "The team chose React"]
 _exemplars: dict[str, list[dict]] = {}  # category -> [{text, embedding}]
 def _exemplar_file() -> Path:
    return LAB_DIR / "exemplars.json"
 def _load_exemplars():
    global _exemplars
    fp = _exemplar_file()
    if fp.exists():
        data = json.loads(fp.read_text())
        _exemplars = data
    return _exemplars
 def _save_exemplars():
    _exemplar_file().write_text(json.dumps(_exemplars, indent=2))
 _load_exemplars()
 # ─── Pipeline store ──────────────────────────────────────────
 def _pipelines_dir() -> Path:
    d = LAB_DIR / "pipelines"
    d.mkdir(exist_ok=True)
    return d
 # ─── Embedding helper ────────────────────────────────────────
 async def _embed_texts(texts: list[str], model: str = EMBED_MODEL) -> list[list[float]]:
    embeddings = []
    async with client() as c:
        for text in texts:
            resp = await c.post("/api/embed", json={"model": model, "input": text})
            if resp.status_code != 200:
                raise HTTPException(502, f"Ollama embed error: {resp.text}")
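            data = resp.json()
            batch = data.get("embeddings", [])
            # Callers zip texts with embeddings, so a missing vector must not shift later results.
            if len(batch) != 1:
                raise HTTPException(502, "Ollama embed returned no embedding for a text")
            embeddings.extend(batch)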
    return embeddings
 async def _generate(prompt: str, model: str = GEN_MODEL, temperature: float = 0.3) -> str:
    async with client() as c:
        resp = await c.post("/api/generate", json={
            "model": model, "prompt": prompt, "stream": False,
            "options": {"temperature": temperature, "num_predict": 1024}
        })
        if resp.status_code != 200:
            raise HTTPException(502, f"Ollama generate error: {resp.text}")
        return resp.json().get("response", "")
 # ─── API: Exemplars ──────────────────────────────────────────
 class ExemplarAdd(BaseModel):
    category: str
    texts: list[str]
 class ExemplarList(BaseModel):
    categories: dict[str, int]  # category -> count
@router.post("/exemplars")
 async def add_exemplars(req: ExemplarAdd):
    """Add labeled exemplar texts for a category. Embeddings generated automatically."""
    category = req.category.strip().lower()
    if not category or not req.texts:
        raise HTTPException(400, "category and texts required")
    embeddings = await _embed_texts(req.texts)
    if category not in _exemplars:
        _exemplars[category] = []
    for text, emb in zip(req.texts, embeddings):
        _exemplars[category].append({"text": text, "embedding": emb})
    _save_exemplars()
    return {"ok": True, "category": category, "added": len(req.texts),
            "total": len(_exemplars[category])}
@router.get("/exemplars")
 async def list_exemplars():
    """List all exemplar categories and counts."""
    return {"categories": {k: len(v) for k, v in _exemplars.items()},
            "total": sum(len(v) for v in _exemplars.values())}
@router.delete("/exemplars/{category}")
 async def delete_exemplar_category(category: str):
    category = category.strip().lower()  # same normalization as add_exemplars
    if category in _exemplars:
        del _exemplars[category]
        _save_exemplars()
    return {"ok": True}
 # ─── API: Screen (embedding-based classification) ────────────
 class ScreenRequest(BaseModel):
    texts: list[str]
    threshold: float = 0.65
    top_k: int = 1
 class ScreenResult(BaseModel):
    text: str
    best_category: str | None
    similarity: float
    above_threshold: bool
    all_scores: dict[str, float]
@router.post("/screen", response_model=list[ScreenResult])
 async def screen_texts(req: ScreenRequest):
    """Classify texts by cosine similarity to exemplar embeddings (fast path)."""
    if not _exemplars:
        raise HTTPException(400, "No exemplars defined. Add exemplars first.")
    embeddings = await _embed_texts(req.texts)
    results = []
    for text, emb in zip(req.texts, embeddings):
        category_scores = {}
        for category, exemplar_list in _exemplars.items():
            sims = [cosine_similarity(emb, ex["embedding"]) for ex in exemplar_list]
            category_scores[category] = max(sims) if sims else 0.0
        best_cat = max(category_scores, key=category_scores.get) if category_scores else None
        best_sim = category_scores.get(best_cat, 0.0) if best_cat else 0.0
        results.append(ScreenResult(
            text=text[:200],
            best_category=best_cat if best_sim >= req.threshold else None,
            similarity=round(best_sim, 4),
            above_threshold=best_sim >= req.threshold,
            all_scores={k: round(v, 4) for k, v in sorted(category_scores.items(),
                        key=lambda x: x[1], reverse=True)},
        ))
    return results
 # ─── API: Classify (LLM-based classification) ────────────────
 class ClassifyRequest(BaseModel):
    texts: list[str]
    categories: list[str] | None = None  # if None, use exemplar category names
    model: str | None = None
 class ClassifyResult(BaseModel):
    text: str
    category: str
    confidence: str
    reasoning: str
@router.post("/classify", response_model=list[ClassifyResult])
 async def classify_texts(req: ClassifyRequest):
    """Classify texts using LLM (slow but accurate path)."""
    categories = req.categories or list(_exemplars.keys())
    if not categories:
        raise HTTPException(400, "No categories. Provide categories or add exemplars.")
    model = req.model or GEN_MODEL
    results = []
    for text in req.texts:
        prompt = (
            f"Classify this text into exactly ONE of these categories: {', '.join(categories)}\n\n"
            f"TEXT: {text[:500]}\n\n"
            f"Respond with JSON: {{\"category\": \"...\", \"confidence\": \"high|medium|low\", "
            f"\"reasoning\": \"one sentence\"}}"
        )
        raw = await _generate(prompt, model=model, temperature=0.1)
        # Parse
        try:
            j_s, j_e = raw.find("{"), raw.rfind("}") + 1
            parsed = json.loads(raw[j_s:j_e]) if j_s >= 0 and j_e > j_s else {}
        except Exception:
            parsed = {}
        results.append(ClassifyResult(
            text=text[:200],
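            # Coerce to str and treat null/empty values as missing so response validation cannot fail.
            category=str(parsed.get("category") or "unknown"),
            confidence=str(parsed.get("confidence") or "low"),
            reasoning=str(parsed.get("reasoning") or raw[:200]),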
        ))
    return results
 # ─── API: Benchmark (A/B comparison) ─────────────────────────
 class BenchmarkRequest(BaseModel):
    texts: list[str]
    threshold: float = 0.65
    model: str | None = None
 class BenchmarkResult(BaseModel):
    total_texts: int
    # Embedding path
    embed_time_ms: int
    embed_results: list[dict]
    # LLM path
    llm_time_ms: int
    llm_results: list[dict]
    # Comparison
    agreement_rate: float
    speedup: float
    texts_screened_out: int
    texts_needing_llm: int
    hybrid_estimated_ms: int
@router.post("/benchmark", response_model=BenchmarkResult)
 async def benchmark(req: BenchmarkRequest):
    """Run same texts through embedding screening and LLM classification. Compare."""
    if not _exemplars:
        raise HTTPException(400, "No exemplars. Add exemplars first.")
    categories = list(_exemplars.keys())
    # Embedding path
    t0 = time.monotonic()
    embed_results = await screen_texts(ScreenRequest(
        texts=req.texts, threshold=req.threshold
    ))
    embed_ms = int((time.monotonic() - t0) * 1000)
    # LLM path
    t0 = time.monotonic()
    llm_results = await classify_texts(ClassifyRequest(
        texts=req.texts, categories=categories, model=req.model
    ))
    llm_ms = int((time.monotonic() - t0) * 1000)
    # Compare
    agreements = 0
    screened_out = 0
    for er, lr in zip(embed_results, llm_results):
        if not er.above_threshold:
            screened_out += 1
        if er.best_category == lr.category:
            agreements += 1
    needing_llm = len(req.texts) - screened_out
    # Hybrid estimate: embed every text, then run the LLM only on texts that pass screening
    per_text_llm_ms = llm_ms / max(len(req.texts), 1)
    hybrid_ms = int(embed_ms + needing_llm * per_text_llm_ms)
    return BenchmarkResult(
        total_texts=len(req.texts),
        embed_time_ms=embed_ms,
        embed_results=[r.model_dump() for r in embed_results],
        llm_time_ms=llm_ms,
        llm_results=[r.model_dump() for r in llm_results],
        agreement_rate=round(agreements / max(len(req.texts), 1), 3),
        speedup=round(llm_ms / max(hybrid_ms, 1), 2),
        texts_screened_out=screened_out,
        texts_needing_llm=needing_llm,
        hybrid_estimated_ms=hybrid_ms,
    )
 # ─── API: Pipeline definition & execution ────────────────────
 class PipelineStage(BaseModel):
    name: str
    mode: str  # "screen", "classify", "extract", "validate", "custom"
    config: dict = {}  # stage-specific config (threshold, prompt, etc.)
 class PipelineDef(BaseModel):
    name: str
    stages: list[PipelineStage]
    description: str = ""
 class PipelineRunRequest(BaseModel):
    pipeline_name: str
    texts: list[str]
@router.post("/pipelines")
 async def save_pipeline(pipeline: PipelineDef):
    """Save a pipeline definition."""
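    # Reject names with path components so a pipeline cannot be written outside the lab directory.
    if not pipeline.name or Path(pipeline.name).name != pipeline.name:
        raise HTTPException(400, "Pipeline name must be a plain file name")
    fp = _pipelines_dir() / f"{pipeline.name}.json"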
    fp.write_text(pipeline.model_dump_json(indent=2))
    return {"ok": True, "name": pipeline.name}
@router.get("/pipelines")
 async def list_pipelines():
    """List saved pipeline definitions."""
    pipelines = []
    for fp in _pipelines_dir().glob("*.json"):
        try:
            data = json.loads(fp.read_text())
            pipelines.append({"name": data["name"], "stages": len(data["stages"]),
                             "description": data.get("description", "")})
        except Exception:
            pass
    return {"pipelines": pipelines}
@router.get("/pipelines/{name}")
 async def get_pipeline(name: str):
    fp = _pipelines_dir() / f"{name}.json"
    if not fp.exists():
        raise HTTPException(404, "Pipeline not found")
    return json.loads(fp.read_text())
@router.post("/pipelines/run")
 async def run_pipeline(req: PipelineRunRequest):
    """Execute a pipeline on a set of texts. Returns per-stage results and timing."""
    # Use only the final path component so the lookup stays inside the pipelines directory.
    fp = _pipelines_dir() / f"{Path(req.pipeline_name).name}.json"
    if not fp.exists():
        raise HTTPException(404, f"Pipeline '{req.pipeline_name}' not found")
    pipeline = json.loads(fp.read_text())
    results = {"pipeline": req.pipeline_name, "stages": [], "total_ms": 0}
    current_texts = req.texts[:]
    for stage_def in pipeline["stages"]:
        stage_name = stage_def["name"]
        mode = stage_def["mode"]
        config = stage_def.get("config", {})
        t0 = time.monotonic()
        stage_result = {"name": stage_name, "mode": mode, "input_count": len(current_texts)}
        if mode == "screen":
            threshold = config.get("threshold", 0.65)
            screen_res = await screen_texts(ScreenRequest(
                texts=current_texts, threshold=threshold
            ))
            passed = [r for r in screen_res if r.above_threshold]
            stage_result["output_count"] = len(passed)
            stage_result["filtered_out"] = len(current_texts) - len(passed)
            stage_result["results"] = [r.model_dump() for r in screen_res]
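            # Pass only above-threshold texts to the next stage. Use the original inputs:
            # ScreenResult.text is truncated to 200 chars and would shorten later-stage input.
            current_texts = [t for t, r in zip(current_texts, screen_res) if r.above_threshold]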
        elif mode == "classify":
            cls_res = await classify_texts(ClassifyRequest(
                texts=current_texts,
                categories=config.get("categories"),
                model=config.get("model"),
            ))
            stage_result["output_count"] = len(cls_res)
            stage_result["results"] = [r.model_dump() for r in cls_res]
        elif mode == "extract":
            extract_prompt = config.get("prompt", "Extract key information from this text:")
            extractions = []
            for text in current_texts:
                raw = await _generate(f"{extract_prompt}\n\nTEXT: {text[:800]}")
                extractions.append({"text": text[:200], "extracted": raw})
            stage_result["output_count"] = len(extractions)
            stage_result["results"] = extractions
        elif mode == "validate":
            # Embedding-based dedup: find near-duplicate results
            if len(current_texts) > 1:
                embs = await _embed_texts(current_texts)
                dupes = []
                threshold = config.get("dedup_threshold", 0.92)
                for i in range(len(embs)):
                    for j in range(i + 1, len(embs)):
                        sim = cosine_similarity(embs[i], embs[j])
                        if sim >= threshold:
                            dupes.append({"i": i, "j": j, "similarity": round(sim, 4),
                                         "text_a": current_texts[i][:100],
                                         "text_b": current_texts[j][:100]})
                stage_result["duplicates_found"] = len(dupes)
                stage_result["results"] = dupes
            else:
                stage_result["duplicates_found"] = 0
                stage_result["results"] = []
            stage_result["output_count"] = len(current_texts)
        else:
            stage_result["error"] = f"Unknown mode: {mode}"
            stage_result["output_count"] = len(current_texts)
        stage_ms = int((time.monotonic() - t0) * 1000)
        stage_result["time_ms"] = stage_ms
        results["stages"].append(stage_result)
        results["total_ms"] += stage_ms
    return results
 # ─── API: REPL cell (free-form eval) ─────────────────────────
 class CellRequest(BaseModel):
    action: str  # "embed", "generate", "similarity", "screen", "classify"
    text: str = ""
    texts: list[str] = []
    params: dict = {}
@router.post("/cell")
 async def run_cell(req: CellRequest):
    """Execute a single notebook cell. Flexible entry point for ad-hoc operations."""
    t0 = time.monotonic()
    result = {}
    if req.action == "embed":
        texts = req.texts or ([req.text] if req.text else [])
        embs = await _embed_texts(texts)
        result = {"embeddings_count": len(embs), "dimensions": len(embs[0]) if embs else 0,
                  "texts": texts}
    elif req.action == "generate":
        text = await _generate(req.text, **{k: v for k, v in req.params.items()
                                             if k in ("model", "temperature")})
        result = {"text": text}
    elif req.action == "similarity":
        if len(req.texts) < 2:
            raise HTTPException(400, "Need at least 2 texts for similarity")
        embs = await _embed_texts(req.texts)
        matrix = []
        for i in range(len(embs)):
            row = []
            for j in range(len(embs)):
                row.append(round(cosine_similarity(embs[i], embs[j]), 4))
            matrix.append(row)
        result = {"matrix": matrix, "texts": [t[:80] for t in req.texts]}
    elif req.action == "screen":
        texts = req.texts or ([req.text] if req.text else [])
        threshold = req.params.get("threshold", 0.65)
        res = await screen_texts(ScreenRequest(texts=texts, threshold=threshold))
        result = {"results": [r.model_dump() for r in res]}
    elif req.action == "classify":
        texts = req.texts or ([req.text] if req.text else [])
        res = await classify_texts(ClassifyRequest(texts=texts))
        result = {"results": [r.model_dump() for r in res]}
    else:
        raise HTTPException(400, f"Unknown action: {req.action}")
    result["time_ms"] = int((time.monotonic() - t0) * 1000)
    return result
--- a/tests/agent_test/PRD.md
+++ b/tests/agent_test/PRD.md
@ -0,0 +1,90 @@
 # PRD: Chicago Permit Staffing Recommendation
 ## Mission
 You are a staffing-intelligence assistant. Your job is to **analyze a Chicago building permit and produce a one-page staffing recommendation** for our staffing company.
 The output is a markdown document that a human staffing coordinator will read in under 2 minutes to decide whether the contract is a good staffing fit and worth pursuing.
 ## Critical rules
 1. **DO NOT START WRITING THE FINAL ANALYSIS YET.**
   - First, READ this PRD fully.
   - Then, PLAN your approach in `note()` — what steps will you take, what tools will you call, what evidence will you need.
   - Only after planning, begin executing.
 2. **Never invent facts.** If you don't have evidence for a claim (from a tool call), do not make the claim. Say "no evidence available" instead.
 3. **Cite your sources.** Every factual claim in the final output should reference either:
   - The permit data you read (cite the permit ID)
   - A matrix-retrieved chunk (cite as `[matrix:source:doc_id]`)
 4. **Stay focused.** This is a one-page deliverable, not a research paper. Aim for 600-1000 words total.
 ## Tools available
 - `list_permits(min_cost?: number, permit_type?: string)` — list permits matching filter; default returns top 5 by cost
 - `read_permit(permit_id: string)` — get full details for one permit
 - `query_matrix(query: string, top_k?: number)` — search the knowledge base for relevant context (contractor entities, prior permits, SEC tickers, LLM team patterns)
 - `note(text: string)` — append to your working scratchpad (visible to you across iterations)
 - `read_scratchpad()` — read your full scratchpad
 - `done(summary: string)` — finish; pass your final markdown analysis as `summary`
 ## Required output structure
 When you call `done(summary=...)`, the summary should contain:
 ```markdown
 # Staffing Recommendation: Permit <ID>
 ## Permit Summary
 [2-3 sentences: type, cost, address, scope of work]
 ## Contractor Profile
 [What we know about the contractor(s) from matrix evidence. If no matrix hits, say so explicitly.]
 ## Staffing Implications
 [What trades + headcount this permit implies. Ground in the work description.]
 ## Risk Signals
 [Any matrix hits suggesting caution: debarment, prior incidents, low-quality history. If none, say so.]
 ## Recommendation
 [Pursue / Pass / Investigate-Further, with one-sentence rationale.]
 ```
 ## Example workflow (do not copy verbatim)
 1. Note your plan: "I will list 5 mid-range permits, pick one with a private contractor, read it fully, query the matrix for the contractor name, then write the recommendation."
 2. Call `list_permits(min_cost=100000)` → see candidates
 3. **PICK A PERMIT WITH A PRIVATE CONTRACTOR (a person's name or a private LLC), NOT a government agency** like CDOT, City of Chicago, etc. Government permits have no useful contractor profile to recommend on.
 4. Call `read_permit(id)` → see all fields
 5. Call `query_matrix("<contractor name> contractor Chicago renovation")` → see what the matrix has
 6. Note any evidence found, gaps, surprises
 7. Call `done(summary="<final markdown>")`
 ## Success criteria
 - You called `done()` with a summary that follows the required structure
 - Every factual claim has a source (permit ID or matrix citation)
 - Total output is 600-1000 words
 - You did not invent contractor names, prior incidents, or capabilities
 - Plan was noted BEFORE execution started
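The structural criteria above (required headings, word range) can be checked mechanically. The following is a minimal sketch, not part of the test harness: `check_summary` and `REQUIRED_HEADINGS` are hypothetical names, and counting words by splitting on whitespace (which also counts markdown tokens such as `##`) is an assumption. It does not verify citations or whether claims are grounded.

```python
# Hypothetical structural check for a `done(summary=...)` payload.
# Names and the whitespace-based word count are illustrative assumptions.
REQUIRED_HEADINGS = [
    "## Permit Summary",
    "## Contractor Profile",
    "## Staffing Implications",
    "## Risk Signals",
    "## Recommendation",
]


def check_summary(summary: str) -> list[str]:
    """Return a list of problems; an empty list means the structural checks pass."""
    problems = []
    lines = [ln.strip() for ln in summary.splitlines()]

    if not any(ln.startswith("# Staffing Recommendation: Permit ") for ln in lines):
        problems.append("missing title line")

    # Each heading must appear after the previous one, in the PRD's order.
    pos = -1
    for heading in REQUIRED_HEADINGS:
        try:
            pos = lines.index(heading, pos + 1)
        except ValueError:
            problems.append(f"missing or out-of-order heading: {heading}")

    words = len(summary.split())
    if not 600 <= words <= 1000:
        problems.append(f"word count {words} outside 600-1000")
    return problems
```

An empty return value only says the summary has the right shape; sourcing and honesty still need a human or a separate review step.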
 ## What "good" looks like
 - Plan is concrete (which permit, which queries)
 - Matrix queries are specific (contractor name + work type, not "find anything about this")
 - When matrix returns nothing useful, you say so honestly
 - Recommendation reflects the actual evidence, not boilerplate
 ## What "bad" looks like
 - Skipping the plan and jumping to execution
 - Making up contractor histories with no matrix evidence
 - Generic recommendations that don't reference the actual permit
 - Walls of text or structured padding to look thorough
 ## Begin
 Start by acknowledging you've read this PRD and noting your plan via `note()`. Then proceed.
--- a/tests/battery/compounding_battery.ts
+++ b/tests/battery/compounding_battery.ts
@ -0,0 +1,404 @@
 // Compounding Stress Battery — the rigorous smoke test.
 //
 // Three iterations against /v1/respond, each running:
 //   α  baseline (3 easy tasks)       — should complete local-only with boost
 //   β  drift   (3 niche tasks)       — forces executor miss → overseer fires
 //   γ  impossible (2 zero-supply)    — must fail honestly, no token explosion
 //   δ  distill outcomes              — writes distilled_*.jsonl + vector indexes
 //   ε  overseer meta-review          — gpt-oss:120b judges the iteration
 //   ζ  scrum judgment                — gpt-oss:120b reviews overseer proposals
 //
 // Iteration N+1 runs the same tasks as iteration N. We measure compounding:
 // does turns_per_task drop? does overseer_called_rate drop? does
 // correction_effective rise? If 3/5 metrics trend favorably, architecture
 // validated; otherwise the scrum verdict points at what to fix.
 //
 // Fail-fast: every error bubbles. No silent catches — the run ABORTS with
 // the underlying stack so we see exactly where the architecture broke.
 //
 // Runtime: ~60-90 min. Cloud cost: ~24-32 gpt-oss calls (well under daily cap).
 import { writeFile, mkdir, readFile } from "node:fs/promises";
 import { join } from "node:path";
 const GATEWAY = process.env.GATEWAY_URL ?? "http://localhost:3100";
 const LLM_TEAM = process.env.LLM_TEAM_URL ?? "http://localhost:5000";
 const BATTERY_DIR = process.env.BATTERY_DIR
  ?? "/home/profit/lakehouse/data/_kb/battery";
 // 10-minute timeout per /v1/respond call — cloud executor on a hard task
 // can chew for a while, and we want to see real behavior, not premature aborts.
 const RESPOND_TIMEOUT_MS = 10 * 60 * 1000;
 const META_TIMEOUT_MS = 5 * 60 * 1000;
 interface Task {
  task_class: string;
  operation: string;
  spec: Record<string, any>;
 }
 interface Tasks {
  phases: {
    alpha_baseline: Task[];
    beta_drift: Task[];
    gamma_impossible: Task[];
  };
  models: {
    executor_cloud: string;
    reviewer_cloud: string;
    overseer_cloud: string;
  };
 }
 interface RunResult {
  status: "ok" | "failed" | "blocked";
  iterations: number;
  artifact: any;
  log: any[];
  error?: string | null;
  _elapsed_ms: number;
 }
 interface TaskRun {
  task: Task;
  phase: "alpha" | "beta" | "gamma";
  result: RunResult;
 }
 // ─── HTTP helpers ───
 async function runRespond(task: Task, models: Tasks["models"]): Promise<RunResult> {
  const body = {
    task_class: task.task_class,
    operation: task.operation,
    spec: task.spec,
    executor_model: models.executor_cloud,
    reviewer_model: models.reviewer_cloud,
  };
  const start = Date.now();
  const resp = await fetch(`${GATEWAY}/v1/respond`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(RESPOND_TIMEOUT_MS),
  });
  if (!resp.ok) {
    const txt = await resp.text();
    throw new Error(`/v1/respond HTTP ${resp.status}: ${txt.slice(0, 500)}`);
  }
  const j = (await resp.json()) as RunResult;
  j._elapsed_ms = Date.now() - start;
  return j;
 }
 async function runDistill(source: string): Promise<any[]> {
  const body = { mode: "distill", prompt: "battery iteration distill", source };
  const resp = await fetch(`${LLM_TEAM}/api/run?mode=distill`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(META_TIMEOUT_MS),
  });
  if (!resp.ok) throw new Error(`distill HTTP ${resp.status}`);
  const text = await resp.text();
  // SSE stream — parse data: lines, return parsed event objects
  const events: any[] = [];
  for (const line of text.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    try { events.push(JSON.parse(line.slice(6))); } catch { /* deliberate exception to fail-fast: skip data lines that are not valid JSON */ }
  }
  return events;
 }
 async function cloudChat(
  model: string,
  prompt: string,
  temperature: number,
  think: boolean,
 ): Promise<string> {
  const resp = await fetch(`${GATEWAY}/v1/chat`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      temperature,
      think,
      provider: "ollama_cloud",
    }),
    signal: AbortSignal.timeout(META_TIMEOUT_MS),
  });
  if (!resp.ok) {
    const txt = await resp.text();
    throw new Error(`/v1/chat ${model} HTTP ${resp.status}: ${txt.slice(0, 500)}`);
  }
  const j = await resp.json() as any;
  return j.choices?.[0]?.message?.content ?? "";
 }
 // ─── Meta-review + scrum ───
 async function overseerReview(
  iterNum: number,
  artifacts: any,
  models: Tasks["models"],
 ): Promise<string> {
  const prompt = `You are the OVERSEER reviewing iteration ${iterNum} of a stress battery run against Lakehouse /v1/respond.
 For each task in the battery below, examine: status (ok/failed/blocked), iterations used, error signature, whether the in-loop overseer fired, total tokens.
 Produce a PR-style meta-review in markdown with these sections:
 ## What worked
 List specific tasks (by operation string) that completed correctly and the evidence — turns_used, citations, tokens. Be concrete.
 ## What failed
 List specific tasks that failed or needed overseer correction. Classify: was it a real failure (impossible task), a drift we should repair, or a false positive from the test?
 ## Proposed changes for iteration ${iterNum + 1}
 At least 3 concrete architectural changes, each with:
 - **Target file** (e.g. \`crates/gateway/src/execution_loop/mod.rs\`)
 - **Rationale** (what the metrics show)
 - **Expected impact** (which metric should move in iter ${iterNum + 1})
 Be honest about weaknesses. Do NOT propose generic best practices — reference specific observations from the artifacts below.
 ARTIFACTS (iteration ${iterNum}):
 ${JSON.stringify(artifacts, null, 2).slice(0, 30000)}`;
  return cloudChat(models.overseer_cloud, prompt, 0.2, true);
 }
 async function scrumJudge(
  iterNum: number,
  review: string,
  models: Tasks["models"],
 ): Promise<string> {
  const prompt = `You are the SCRUM MASTER. The OVERSEER proposed these architectural changes for iteration ${iterNum + 1} based on iteration ${iterNum}'s results.
 For each proposal, produce a verdict in markdown:
 - **Proposal N**: <short name>
  - **Verdict**: APPROVE | REVISE | REJECT
  - **Reason**: why
  - **If APPROVE**: is the expected impact realistic? what's the blast radius? is the target file correct?
  - **If REVISE**: what should change about the proposal before applying?
  - **If REJECT**: why is the proposal wrong or out of scope?
 Final section:
 ## PR-ready changes
 Bulleted list of only the APPROVE proposals, ready to apply.
 Be rigorous. Don't rubber-stamp. If a proposal references a file that probably doesn't exist, REJECT and say so. If a proposal is a generic "improve X" without concrete plan, REVISE.
 OVERSEER PROPOSED:
 ${review.slice(0, 15000)}`;
  return cloudChat(models.overseer_cloud, prompt, 0.1, true);
 }
 // ─── Iteration driver ───
 async function runIteration(iterNum: number, tasks: Tasks): Promise<any> {
  console.log(`\n${"═".repeat(60)}`);
  console.log(`▶ ITERATION ${iterNum}`);
  console.log(`${"═".repeat(60)}\n`);
  const iterDir = join(BATTERY_DIR, `iter_${iterNum}`);
  await mkdir(iterDir, { recursive: true });
  const runs: TaskRun[] = [];
  for (const [phaseKey, phaseName] of [
    ["alpha_baseline", "alpha"],
    ["beta_drift", "beta"],
    ["gamma_impossible", "gamma"],
  ] as const) {
    console.log(`\n── Phase ${phaseName} ──`);
    for (const task of tasks.phases[phaseKey]) {
      console.log(`  ▶ ${task.operation}`);
      const result = await runRespond(task, tasks.models);
      const overseerFired = (result.log ?? []).some(e => e.kind === "overseer_correction");
      console.log(
        `    status=${result.status} turns=${result.iterations}` +
        ` tokens=${result.artifact?.usage?.total_tokens ?? 0}` +
        ` overseer=${overseerFired}` +
        ` elapsed=${Math.round(result._elapsed_ms / 1000)}s`
      );
      if (result.error) console.log(`    error: ${result.error.slice(0, 200)}`);
      runs.push({ task, phase: phaseName, result });
    }
  }
  // Phase δ
  console.log(`\n── Phase δ: distill outcomes_tail:20 ──`);
  const distillEvents = await runDistill("outcomes_tail:20");
  const distillFinal = [...distillEvents].reverse()
    .find(e => e.role === "final") ?? distillEvents[distillEvents.length - 1];
  const distillText = distillFinal?.text ?? JSON.stringify(distillFinal ?? {}).slice(0, 200);
  console.log(`  ${distillText.split("\n")[0]}`);
  await writeFile(join(iterDir, "distill_output.txt"), distillText);
  // Metrics
  const collectPhase = (p: string) => runs.filter(r => r.phase === p);
  const phaseMetrics = (p: string) => {
    const ps = collectPhase(p);
    if (ps.length === 0) return { count: 0 };
    return {
      count: ps.length,
      ok: ps.filter(r => r.result.status === "ok").length,
      failed: ps.filter(r => r.result.status === "failed").length,
      avg_turns: ps.reduce((s, r) => s + (r.result.iterations || 0), 0) / ps.length,
      total_tokens: ps.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
      overseer_called: ps.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length,
      avg_elapsed_s: ps.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / ps.length / 1000,
    };
  };
  const metrics = {
    iteration: iterNum,
    total_tasks: runs.length,
    ok_tasks: runs.filter(r => r.result.status === "ok").length,
    failed_tasks: runs.filter(r => r.result.status === "failed").length,
    blocked_tasks: runs.filter(r => r.result.status === "blocked").length,
    total_tokens: runs.reduce((s, r) => s + (r.result.artifact?.usage?.total_tokens ?? 0), 0),
    avg_turns_per_task: runs.reduce((s, r) => s + (r.result.iterations || 0), 0) / runs.length,
    overseer_called_rate: runs.filter(r => (r.result.log ?? []).some(e => e.kind === "overseer_correction")).length / runs.length,
    total_elapsed_s: runs.reduce((s, r) => s + (r.result._elapsed_ms || 0), 0) / 1000,
    by_phase: {
      alpha: phaseMetrics("alpha"),
      beta: phaseMetrics("beta"),
      gamma: phaseMetrics("gamma"),
    },
  };
  console.log(`\n── Metrics ──`);
  console.log(`  total_tokens: ${metrics.total_tokens}`);
  console.log(`  avg_turns_per_task: ${metrics.avg_turns_per_task.toFixed(2)}`);
  console.log(`  overseer_called_rate: ${(metrics.overseer_called_rate * 100).toFixed(1)}%`);
  console.log(`  ok/total: ${metrics.ok_tasks}/${metrics.total_tasks}`);
  await writeFile(join(iterDir, "runs.json"), JSON.stringify(runs, null, 2));
  await writeFile(join(iterDir, "metrics.json"), JSON.stringify(metrics, null, 2));
  // Phase ε: overseer review
  console.log(`\n── Phase ε: overseer meta-review ──`);
  const reviewInput = {
    metrics,
    task_summary: runs.map(r => ({
      operation: r.task.operation,
      phase: r.phase,
      status: r.result.status,
      iterations: r.result.iterations,
      tokens: r.result.artifact?.usage?.total_tokens ?? 0,
      overseer_called: (r.result.log ?? []).some(e => e.kind === "overseer_correction"),
      error: r.result.error ?? null,
      elapsed_s: Math.round((r.result._elapsed_ms || 0) / 1000),
    })),
  };
  const review = await overseerReview(iterNum, reviewInput, tasks.models);
  await writeFile(join(iterDir, "overseer_review.md"), review);
  console.log(`  ✓ ${review.length} chars`);
  // Phase ζ: scrum
  console.log(`\n── Phase ζ: scrum judgment ──`);
  const verdict = await scrumJudge(iterNum, review, tasks.models);
  await writeFile(join(iterDir, "scrum_findings.md"), verdict);
  console.log(`  ✓ ${verdict.length} chars`);
  return metrics;
 }
 // ─── Main ───
 async function main() {
  const tasks = JSON.parse(
    await readFile("/home/profit/lakehouse/tests/battery/tasks.json", "utf8"),
  ) as Tasks;
  await mkdir(BATTERY_DIR, { recursive: true });
  const iterations: any[] = [];
  const batteryStart = Date.now();
  for (let i = 1; i <= 3; i++) {
    const m = await runIteration(i, tasks);
    iterations.push(m);
  }
  const batteryElapsed = (Date.now() - batteryStart) / 1000;
  // Summary
  const delta = (k: keyof any, inverted = false) => {
    const vals = iterations.map((m: any) => m[k]);
    if (vals.some(v => v === undefined)) return "—";
    const diff = vals[2] - vals[0];
    const pct = vals[0] !== 0 ? (diff / vals[0]) * 100 : 0;
    const arrow = diff === 0 ? "→ flat"
      : inverted ? (diff < 0 ? "↓ better" : "↑ worse") : (diff > 0 ? "↑ better" : "↓ worse");
    return `${arrow} (${diff > 0 ? "+" : ""}${diff.toFixed?.(2) ?? diff}, ${pct.toFixed(1)}%)`;
  };
  const rows = [
    ["total_tokens", "inverted", "want ↓ — fewer tokens for same work"],
    ["avg_turns_per_task", "inverted", "want ↓ — executor gets smarter"],
    ["overseer_called_rate", "inverted", "want ↓ — fewer cloud escalations"],
    ["ok_tasks", "normal", "want ↑ — more successes"],
    ["total_elapsed_s", "inverted", "want ↓ — faster iterations"],
  ];
  let summary = `# Compounding Stress Battery — Summary\n\n`;
  summary += `**Run:** ${new Date().toISOString()}\n`;
  summary += `**Elapsed:** ${Math.round(batteryElapsed)}s (${(batteryElapsed/60).toFixed(1)} min)\n`;
  summary += `**Models:** executor=${tasks.models.executor_cloud}, reviewer=${tasks.models.reviewer_cloud}, overseer=${tasks.models.overseer_cloud}\n\n`;
  summary += `## Compounding Metrics\n\n`;
  summary += `| Metric | iter 1 | iter 2 | iter 3 | Trend (1→3) | Goal |\n`;
  summary += `|---|---|---|---|---|---|\n`;
  for (const [key, inv, goal] of rows) {
    const vals = iterations.map((m: any) => {
      const v = m[key as string];
      return typeof v === "number" ? v.toFixed(2) : String(v);
    });
    summary += `| ${key} | ${vals[0]} | ${vals[1]} | ${vals[2]} | ${delta(key as any, inv === "inverted")} | ${goal} |\n`;
  }
  summary += "\n";
  // Count trending metrics
  const trends = rows.map(([k, inv]) => {
    const vs = iterations.map((m: any) => m[k as string]) as number[];
    const improved = inv === "inverted" ? vs[2] < vs[0] : vs[2] > vs[0];
    return { metric: k, improved };
  });
  const improvedCount = trends.filter(t => t.improved).length;
  summary += `## Verdict\n\n`;
  if (improvedCount >= 3) {
    summary += `**✓ Architecture validated** — ${improvedCount}/${trends.length} compounding metrics improved from iteration 1 to 3.\n\n`;
  } else {
    summary += `**✗ Compounding NOT demonstrated** — only ${improvedCount}/${trends.length} metrics improved. See scrum_findings.md in each iter_N/ directory for the overseer's proposals and the scrum master's review of what to change.\n\n`;
  }
  summary += `Per-metric trend (iteration 1 → 3):\n`;
  for (const t of trends) {
    summary += `- ${t.metric}: ${t.improved ? "✓ improved" : "✗ flat or worse"}\n`;
  }
  summary += `\n## Artifacts\n\n`;
  summary += `- \`iter_1/\`, \`iter_2/\`, \`iter_3/\` — per-iteration runs.json, metrics.json, overseer_review.md, scrum_findings.md, distill_output.txt\n`;
  summary += `- \`summary.md\` — this file\n`;
  await writeFile(join(BATTERY_DIR, "summary.md"), summary);
  console.log(`\n${"═".repeat(60)}`);
  console.log(`✓ BATTERY COMPLETE — ${Math.round(batteryElapsed)}s`);
  console.log(`  Summary: ${join(BATTERY_DIR, "summary.md")}`);
  console.log(`${"═".repeat(60)}\n`);
  console.log(summary);
 }
 main().catch(e => {
  console.error(`\n${"═".repeat(60)}`);
  console.error(`✗ BATTERY FAILED: ${e.message}`);
  console.error(`${"═".repeat(60)}\n`);
  if (e.stack) console.error(e.stack);
  process.exit(1);
 });
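The verdict rule in the script above (count how many metrics moved the right way between the first and last iteration, then compare against a threshold of 3) can be isolated as a pure function. The following is a minimal sketch, not the script's own code: `classifyTrend`, `verdict`, `TrendRow`, and the sample numbers are hypothetical names and values introduced here for illustration. It treats an unchanged or missing metric as "flat" rather than as a regression.

```typescript
// Hypothetical helper names; none of these exist in the battery script.
type Direction = "inverted" | "normal";
type Trend = "improved" | "flat" | "worse";

interface TrendRow {
  metric: string;
  direction: Direction; // "inverted" means lower is better
}

// Classify the first-to-last movement of one metric across iterations.
function classifyTrend(values: number[], direction: Direction): Trend {
  if (values.length < 2) return "flat";
  const diff = values[values.length - 1] - values[0];
  // Missing or non-numeric samples yield NaN; treat them as "no signal".
  if (!Number.isFinite(diff) || diff === 0) return "flat";
  const wentDown = diff < 0;
  return (direction === "inverted") === wentDown ? "improved" : "worse";
}

// Apply the "at least `threshold` metrics improved" rule.
function verdict(
  iterations: Record<string, number>[],
  rows: TrendRow[],
  threshold = 3,
): { improved: string[]; validated: boolean } {
  const improved = rows
    .filter(r => classifyTrend(iterations.map(m => m[r.metric]), r.direction) === "improved")
    .map(r => r.metric);
  return { improved, validated: improved.length >= threshold };
}

// Made-up numbers, for illustration only.
const sampleIterations = [
  { total_tokens: 1000, avg_turns_per_task: 4, ok_tasks: 5 },
  { total_tokens: 900, avg_turns_per_task: 4, ok_tasks: 5 },
  { total_tokens: 800, avg_turns_per_task: 4, ok_tasks: 6 },
];
const sampleRows: TrendRow[] = [
  { metric: "total_tokens", direction: "inverted" },
  { metric: "avg_turns_per_task", direction: "inverted" },
  { metric: "ok_tasks", direction: "normal" },
];
console.log(verdict(sampleIterations, sampleRows));
```

Keeping the classification separate from the markdown rendering would let the summary table and the verdict share one definition of "improved", so a flat metric cannot be reported as "worse" in one place and "not improved" in another.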
--- a/tests/battery/tasks.json
+++ b/tests/battery/tasks.json
@ -0,0 +1,57 @@
 {
  "description": "Compounding stress battery tasks. Each iteration runs α (baseline) + β (drift) + γ (impossible) phases. The SAME tasks repeat across iterations so we can measure compounding (turns_used, overseer_called_rate, correction_effective).",
  "phases": {
    "alpha_baseline": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Warehouse Associate x3 in Columbus, OH",
        "spec": { "target_role": "Warehouse Associate", "target_count": 3, "target_city": "Columbus", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Forklift Operator x2 in Toledo, OH",
        "spec": { "target_role": "Forklift Operator", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Packer x4 in Cleveland, OH",
        "spec": { "target_role": "Packer", "target_count": 4, "target_city": "Cleveland", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      }
    ],
    "beta_drift": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Machine Operator x2 in Youngstown, OH (requires OSHA 30 + bilingual Spanish)",
        "spec": { "target_role": "Machine Operator", "target_count": 2, "target_city": "Youngstown", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; prefer candidates with OSHA certification and Spanish" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Welder x2 in Dayton, OH (AWS D1.1 certified, night shift)",
        "spec": { "target_role": "Welder", "target_count": 2, "target_city": "Dayton", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1; filter by certification and shift flexibility" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Assembler x5 in Akron, OH (SMT experience, cleanroom)",
        "spec": { "target_role": "Assembler", "target_count": 5, "target_city": "Akron", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1" }
      }
    ],
    "gamma_impossible": [
      {
        "task_class": "staffing.fill",
        "operation": "fill: Underwater Welder x2 in Toledo, OH",
        "spec": { "target_role": "Underwater Welder", "target_count": 2, "target_city": "Toledo", "target_state": "OH", "approach_hint": "hybrid search against workers_500k_v1 (expected to fail — no supply)" }
      },
      {
        "task_class": "staffing.fill",
        "operation": "fill: Astronaut x1 in Springfield, OH",
        "spec": { "target_role": "Astronaut", "target_count": 1, "target_city": "Springfield", "target_state": "OH", "approach_hint": "(expected to fail — out-of-domain role)" }
      }
    ]
  },
  "models": {
    "executor_cloud": "gpt-oss:20b",
    "reviewer_cloud": "gpt-oss:20b",
    "overseer_cloud": "gpt-oss:120b",
    "notes": "gpt-oss:20b for hot path (faster, cheaper per call), gpt-oss:120b for meta-reviews. All cloud per 2026-04-23 'cloud modes are on' directive."
  }
 }
--- a/tests/multi-agent/playbooks/ab_scorecard.json
+++ b/tests/multi-agent/playbooks/ab_scorecard.json
@ -0,0 +1,45 @@
 {
  "generated_at": "2026-04-21T00:44:59.486489Z",
  "runs": [
    {
      "label": "A(no-T3)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-30-54",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 0
    },
    {
      "label": "B(T3-seed)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-37-04",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 1
    },
    {
      "label": "C(T3-read)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-39-54",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 2
    },
    {
      "label": "D(T3-cloud)",
      "path": "tests/multi-agent/playbooks/scenario-2026-04-21T00-43-44",
      "ok_events": 0,
      "total_events": 5,
      "total_turns": 0,
      "total_gaps": 5,
      "total_citations": 0,
      "prior_lessons_loaded": 3
    }
  ]
 }
--- a/tests/multi-agent/playbooks/kb_measurement.md
+++ b/tests/multi-agent/playbooks/kb_measurement.md
@ -0,0 +1,25 @@
 # KB Measurement Report
 Generated from 26 runs across 24 distinct signatures.
 ## Recommender confidence
 - high: 23
 - medium: 1
 - low: 3
 ## Overall fill + citation
 - Fill rate: **60/86** (69.8%)
 - Avg citations per run: **1.38**
 - Avg turns per run: 6.6
 ## Citation coverage by (role, city, state)
 - Combos with ≥1 citation: 9
 - Combos with ok fills but 0 citations: 31
 ## Item 3 decision signal
 Non-zero: there are **combos that succeeded but never triggered playbook_memory boost**. Candidates for item 3 investigation:
 - Machine Operator in Indianapolis, IN: 1/1 ok, 0 cites
 - Assembler in Indianapolis, IN: 2/2 ok, 0 cites
 - Warehouse Associate in Indianapolis, IN: 1/1 ok, 0 cites
 - Forklift Operator in Cleveland, OH: 1/1 ok, 0 cites
 - Assembler in Cleveland, OH: 2/2 ok, 0 cites
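The report above does not show how its figures are derived. As a rough sketch of how the fill-rate line and the "ok fills but 0 citations" list could be computed from per-run records, assuming a hypothetical `RunRecord` shape (the real KB schema is not shown in this diff) and assuming the fill rate is filled slots over requested slots:

```typescript
// Hypothetical record shape; the actual KB run schema may differ.
interface RunRecord {
  role: string;
  city: string;
  state: string;
  ok: boolean;       // whether the run produced a successful fill
  citations: number; // playbook citations recorded for the run
}

// Format a rate the way the report does, e.g. "60/86 (69.8%)".
function formatRate(filled: number, requested: number): string {
  const pct = requested === 0 ? 0 : (filled / requested) * 100;
  return `${filled}/${requested} (${pct.toFixed(1)}%)`;
}

// List (role, city, state) combos with at least one ok run and no citations.
function zeroCitationCombos(runs: RunRecord[]): string[] {
  const combos = new Map<string, { ok: number; total: number; cites: number }>();
  for (const r of runs) {
    const key = `${r.role} in ${r.city}, ${r.state}`;
    const c = combos.get(key) ?? { ok: 0, total: 0, cites: 0 };
    c.total += 1;
    if (r.ok) c.ok += 1;
    c.cites += r.citations;
    combos.set(key, c);
  }
  return Array.from(combos.entries())
    .filter(([, c]) => c.ok > 0 && c.cites === 0)
    .map(([key, c]) => `${key}: ${c.ok}/${c.total} ok, ${c.cites} cites`);
}

// Made-up records, for illustration only.
const sampleRuns: RunRecord[] = [
  { role: "Assembler", city: "Cleveland", state: "OH", ok: true, citations: 0 },
  { role: "Assembler", city: "Cleveland", state: "OH", ok: true, citations: 0 },
  { role: "Packer", city: "Cleveland", state: "OH", ok: true, citations: 1 },
];
console.log(formatRate(60, 86));
console.log(zeroCitationCombos(sampleRuns));
```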
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/dispatch.jsonl
@ -0,0 +1 @@
 {"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-29048","name":"Raymond G. Ward","reason":"Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."},{"candidate_id":"W500K-20613","name":"Pamela V. Green","reason":"Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."}],"turns":5,"duration_secs":12.051,"pool_size":997,"playbook_citations":[]}
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/emails.md
@ -0,0 +1,17 @@
 # Client emails — Riverfront Steel, 2026-04-21
 ## 10:30 recurring — Machine Operator x2
 Subject: 2 Filled
 Dear Riverfront Steel Team,
 We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM.
 - Raymond G. Ward
 - Pamela V. Green
 Both candidates have high availability scores and relevant experience. Please note this is a recurring slot, and prior workers may still be available.
 Best regards,
 Dispatch Team Lakehouse
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/report.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/report.md
@ -0,0 +1,45 @@
 # Scenario retrospective — Riverfront Steel, 2026-04-21
 Executor: `mistral:latest`   Reviewer: `qwen2.5:latest`   Draft: `qwen2.5:latest`
 ## Events
 | At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
 |---|---|---|---|---|---|---|---|---|
 | 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 28.9 | 0 | 1 |
 | 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 12.1 | 0 | 1 |
 | 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 20.3 | 0 | 1 |
 | 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 35.7 | 0 | 1 |
 | 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 11.5 | 0 | 1 |
 ## Final roster
 | Worker | Booked | Role | City, ST | Status |
 |---|---|---|---|---|
 | undefined Raymond G. Ward | 10:30 | Machine Operator | Toledo, OH | confirmed |
 | undefined Pamela V. Green | 10:30 | Machine Operator | Toledo, OH | confirmed |
 ## Gap signals
 ### drift_or_tool
 - **08:00** — invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {"kind":"plan","steps":["Verify one candidate from the current list using sql tool for SQL verification.","Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH."]}
 {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name, role, city, state, availability FROM
 - **12:15** — invalid JSON from executor: JSON Parse error: Expected ']' | raw: {"kind":"plan","steps":["1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications
 - **14:00** — no consensus after 14 turns
 - **15:45** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search", "args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN ["EXCLUDE_WORKERS_ID1", "EXCLUDE_WORKERS_ID2"
 ### double_book
 - **10:30** — undefined Pamela V. Green already booked for 10:30
 ### fairness
 - _cross-event_ — Raymond G. Ward (undefined) booked 2 times today
 ### write_through_audit
 - _post-run_ — playbook_memory has 33 entries (ran 5 events, expected ≥ 1 new entries from this run)
 ## Narrative
 - 1/5 events reached consensus.
 - Final roster: 2 bookings across 1 distinct workers.
 - Playbook citations across the day: 0 (proof the feedback loop fired across events).
 - Dropped events: 08:00 baseline_fill, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
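Three of the four dropped events above are `drift_or_tool` gaps caused by invalid JSON from the executor. In the 08:00 case the raw output shows a complete `plan` object followed by a second object on the next line, while the 12:15 and 15:45 outputs are truncated mid-object. One way to tell these cases apart is to scan for the first balanced top-level object. The sketch below is an assumption-laden illustration: `firstJsonObject` is a hypothetical helper, and whether the executor loop should accept a leading object at all is a design decision this diff does not make.

```typescript
// Hypothetical helper: return the first complete top-level JSON object
// found in `raw`, or null if none closes (for example, truncated output).
function firstJsonObject(raw: string): unknown {
  const start = raw.indexOf("{");
  if (start < 0) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      // Inside a string literal, braces do not count toward nesting.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(raw.slice(start, i + 1));
        } catch {
          return null; // balanced braces but not valid JSON
        }
      }
    }
  }
  return null; // ran out of input before the object closed
}

// Two objects on separate lines, similar in shape to the 08:00 raw output.
const concatenated = '{"kind":"plan","steps":["a"]}\n{"kind":"tool_call","tool":"sql"}';
console.log(firstJsonObject(concatenated));
// Truncated output, similar in shape to the 12:15 raw output.
console.log(firstJsonObject('{"kind":"plan","steps":["1. Use hybrid_search'));
```

A `null` result still signals a real failure, so the fail-fast behavior is preserved for truncated output.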
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/results.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/results.json
@ -0,0 +1,118 @@
 [
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 28.888,
    "error": "invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM ",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Unable to parse JSON string | raw: {\"kind\":\"plan\",\"steps\":[\"Verify one candidate from the current list using sql tool for SQL verification.\",\"Propose_done with top 3 candidates who are Warehouse Associates in Toledo, OH.\"]}\n{\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name, role, city, state, availability FROM "
    ]
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-29048",
        "name": "Raymond G. Ward",
        "reason": "Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."
      },
      {
        "candidate_id": "W500K-20613",
        "name": "Pamela V. Green",
        "reason": "Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."
      }
    ],
    "turns": 5,
    "duration_secs": 12.051,
    "gap_signals": [
      "double_book: undefined Pamela V. Green already booked for 10:30"
    ],
    "sources_first_score": 0.6692528,
    "sources_last_score": 0.64494026,
    "pool_size": 997,
    "playbook_citations": []
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 20.342,
    "error": "invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications ",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected ']' | raw: {\"kind\":\"plan\",\"steps\":[\"1. Use hybrid_search to find Forklift Operators in Toledo, OH with high engagement and communications scores who are available for work at Riverfront Steel starting at 01:00 PM. The search should prioritize workers with team/collaboration signals (engagement, communications "
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 35.727,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00"
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 11.518,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\"",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\", \"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%warehouse%' AND city = 'Toledo' AND state = 'OH' AND availability > 0.5 AND shift = '08:00' AND worker_id NOT IN [, ] AND worker_id NOT IN [\"EXCLUDE_WORKERS_ID1\", \"EXCLUDE_WORKERS_ID2\""
    ]
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/roster.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/roster.json
@ -0,0 +1,18 @@
 [
  {
    "name": "Raymond G. Ward",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  },
  {
    "name": "Pamela V. Green",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T09-55-13/sms.md
@ -0,0 +1,11 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
 ## 10:30 recurring — Machine Operator x2 in Toledo, OH
 TO: Raymond G. Ward
 Confirming your Machine Operator shift at Riverfront Steel in Toledo, OH starting at 11:00 AM on Tuesday/Thursday. Still available! 
 ---
 TO: Pamela V. Green
 Your Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM on Tuesday/Thursday. Confirm your availability please.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/dispatch.jsonl
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/emails.md
@ -0,0 +1 @@
 # Client emails — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T19-59-48/sms.md
@ -0,0 +1 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/dispatch.jsonl
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/emails.md
@ -0,0 +1,22 @@
 # Client emails — Riverfront Steel, 2026-04-21
 ## 12:15 expansion — Forklift Operator x5
 Subject: 5 Workers Confirmed
 Dear Riverfront Steel Team,
 I am pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening today starting at 1:00 PM. The workers are:
 - Laura F. Morales
 - Kyle F. Brooks 
 - Maria K. Cruz
 - Jeffrey D. Taylor
 - Charles T. Walker
 All meet the criteria of being Forklift Operators in Toledo, OH.
 Looking forward to a successful shift!
 Best regards,
 Dispatch Team Lakehouse
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-02-01/sms.md
@ -0,0 +1,26 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
 ## 12:15 expansion — Forklift Operator x5 in Toledo, OH
 TO: Laura F. Morales
 Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-22625
 ---
 TO: Kyle F. Brooks
 You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm W500K-33961
 ---
 TO: Maria K. Cruz
 Your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starts at 1:00 PM for the new warehouse location opening. Please confirm your attendance. W500K-19588
 ---
 TO: Jeffrey D. Taylor
 Confirming your shift as a Forklift Operator at Riverfront Steel in Toledo, OH starting at 1:00 PM for the new warehouse location opening. W500K-37729
 ---
 TO: Charles T. Walker
 You're scheduled to start your Forklift Operator shift at Riverfront Steel in Toledo, OH at 1:00 PM for the new warehouse location opening. Confirm your attendance. W500K-17543
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/dispatch.jsonl
@ -0,0 +1,2 @@
 {"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."}],"turns":7,"duration_secs":20.128,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}
 {"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-19759","name":"Carmen Z. Rodriguez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."},{"candidate_id":"W500K-29298","name":"Robert W. Gonzalez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."}],"turns":5,"duration_secs":17.426,"pool_size":997,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/emails.md
@ -0,0 +1,40 @@
 # Client emails — Riverfront Steel, 2026-04-21
 ## 08:00 baseline_fill — Warehouse Associate x3
 Subject: 3 Filled
 Dear Riverfront Steel Staffing Team,
 I am pleased to confirm that we have filled all three roles of Warehouse Associate for your Monday morning shift starting at 08:00 AM.
 The workers assigned are:
 - Christopher Y. Phillips
 - Janet E. Hill
 - Fatima U. Rivera
 All three have confirmed their availability and are reliable team members.
 Best regards,
 Dispatch Team Lakehouse
 ## 10:30 recurring — Machine Operator x2
 To: staffing@riverfrontsteel.example  
 From: dispatch@lakehouse.example  
 Subject: Confirmed
 Dear Riverfront Steel Team,
 We are pleased to confirm that we have filled both Machine Operator roles for your Tuesday/Thursday shifts starting at 11:00 AM. The workers assigned are:
 - Carmen Z. Rodriguez
 - Robert W. Gonzalez
 Both are recurring Machine Operators in Toledo, OH, with scores of 0.75 and 0.74 respectively.
 Please note this is a recurring slot; prior workers may still be available.
 Best regards,
 Dispatch Team
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/report.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/report.md
@ -0,0 +1,74 @@
 # Scenario retrospective — Riverfront Steel, 2026-04-21
 Executor: `mistral:latest`   Reviewer: `qwen2.5:latest`   Draft: `qwen2.5:latest`
 ## Events
 | At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
 |---|---|---|---|---|---|---|---|---|
 | 08:00 | baseline_fill | Warehouse Associate × 3 | 770 | ✓ 3 | 7 | 20.1 | 0 | 2 |
 | 10:30 | recurring | Machine Operator × 2 | 997 | ✓ 2 | 5 | 17.4 | 0 | 2 |
 | 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 46.4 | 0 | 1 |
 | 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.1 | 0 | 1 |
 | 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 59.6 | 0 | 1 |
 ## Final roster
 | Worker | Booked | Role | City, ST | Status |
 |---|---|---|---|---|
 | undefined Christopher Y. Phillips | 08:00 | Warehouse Associate | Toledo, OH | no_show |
 | undefined Janet E. Hill | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
 | undefined Fatima U. Rivera | 08:00 | Warehouse Associate | Toledo, OH | confirmed |
 | undefined Carmen Z. Rodriguez | 10:30 | Machine Operator | Toledo, OH | confirmed |
 | undefined Robert W. Gonzalez | 10:30 | Machine Operator | Toledo, OH | confirmed |
 ## Gap signals
 ### double_book
 - **08:00** — undefined Janet E. Hill already booked for 08:00
 - **08:00** — undefined Fatima U. Rivera already booked for 08:00
 - **10:30** — undefined Carmen Z. Rodriguez already booked for 08:00
 - **10:30** — undefined Robert W. Gonzalez already booked for 08:00
 ### drift_or_tool
 - **12:15** — invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {"kind":"plan", "steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \'Toledo\' AND state = \'OH\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O
 - **14:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search",
 "args":{"index_name":"workers_500k_v1","sql_filter":"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor
 - **15:45** — no consensus after 14 turns
 ### fairness
 - _cross-event_ — Christopher Y. Phillips (undefined) booked 4 times today
 ### write_through_audit
 - _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 2 new entries from this run)
 ## Workers touched across the week
 6 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
 | Worker ID | Name | Events | Outcome |
 |---|---|---|---|
 | W500K-49164 | Christopher Y. Phillips | 08:00 baseline_fill | booked |
 | W500K-40928 | Janet E. Hill | 08:00 baseline_fill | booked |
 | W500K-34704 | Fatima U. Rivera | 08:00 baseline_fill | booked |
 | W500K-19759 | Carmen Z. Rodriguez | 10:30 recurring | booked |
 | W500K-29298 | Robert W. Gonzalez | 10:30 recurring | booked |
 | undefined | Christopher Y. Phillips | 08:00 | no_show |
 ## Discovered patterns (meta-index)
 What the system identified across semantically-similar past fills as each event ran:
 - **08:00 baseline_fill** (Warehouse Associate): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
 - **10:30 recurring** (Machine Operator): Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)
 - **12:15 expansion** (Forklift Operator): —
 - **14:00 emergency** (Loader): —
 - **15:45 misplacement** (Warehouse Associate): —
 ## Narrative
 - 2/5 events reached consensus.
 - Final roster: 5 bookings across 5 distinct workers.
 - Workers touched (booked, failed, or otherwise decided): 6.
 - Playbook citations across the day: 0 (no evidence that the feedback loop fired across events).
 - Dropped events: 12:15 expansion, 14:00 emergency, 15:45 misplacement.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/results.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/results.json
@ -0,0 +1,146 @@
 [
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-49164",
        "name": "Christopher Y. Phillips",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      },
      {
        "candidate_id": "W500K-40928",
        "name": "Janet E. Hill",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      },
      {
        "candidate_id": "W500K-34704",
        "name": "Fatima U. Rivera",
        "reason": "Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."
      }
    ],
    "turns": 7,
    "duration_secs": 20.128,
    "gap_signals": [
      "double_book: undefined Janet E. Hill already booked for 08:00",
      "double_book: undefined Fatima U. Rivera already booked for 08:00"
    ],
    "sources_first_score": 0.7124013,
    "sources_last_score": 0.66623676,
    "pool_size": 770,
    "playbook_citations": [],
    "discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": true,
    "fills": [
      {
        "candidate_id": "W500K-19759",
        "name": "Carmen Z. Rodriguez",
        "reason": "Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."
      },
      {
        "candidate_id": "W500K-29298",
        "name": "Robert W. Gonzalez",
        "reason": "Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."
      }
    ],
    "turns": 5,
    "duration_secs": 17.426,
    "gap_signals": [
      "double_book: undefined Carmen Z. Rodriguez already booked for 08:00",
      "double_book: undefined Robert W. Gonzalez already booked for 08:00"
    ],
    "sources_first_score": 0.72546995,
    "sources_last_score": 0.6690281,
    "pool_size": 997,
    "playbook_citations": [],
    "discovered_pattern": "Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 46.391,
    "error": "invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Invalid escape character ' | raw: {\"kind\":\"plan\", \"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1', 'sql_filter':'LOWER(role) LIKE '%forklift%' AND city = \\'Toledo\\' AND state = \\'OH\\' AND availability > CAST(0.5 AS DOUBLE) AND reliability > CAST(0.75 AS DOUBLE)', 'question':'reliable forklift operators in Toledo, O"
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 54.123,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\n\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"LOWER(role) LIKE '%loader%' AND city = 'Toledo' AND state = 'OH' AND availability > CAST(0.7 AS DOUBLE) AND worker_id NOT IN ('W500K-4321', 'W500K-8963', 'W500K-2345', 'W500K-6789', 'W500K-9876') AND wor"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00",
      "exclude_worker_ids": [
        null,
        null,
        null
      ]
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 59.593,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/roster.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/roster.json
@ -0,0 +1,42 @@
 [
  {
    "name": "Christopher Y. Phillips",
    "booked_for": "08:00",
    "role": "Warehouse Associate",
    "city": "Toledo",
    "state": "OH",
    "status": "no_show"
  },
  {
    "name": "Janet E. Hill",
    "booked_for": "08:00",
    "role": "Warehouse Associate",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  },
  {
    "name": "Fatima U. Rivera",
    "booked_for": "08:00",
    "role": "Warehouse Associate",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  },
  {
    "name": "Carmen Z. Rodriguez",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  },
  {
    "name": "Robert W. Gonzalez",
    "booked_for": "10:30",
    "role": "Machine Operator",
    "city": "Toledo",
    "state": "OH",
    "status": "confirmed"
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-04-45/sms.md
@ -0,0 +1,26 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
 ## 08:00 baseline_fill — Warehouse Associate x3 in Toledo, OH
 TO: Christopher Y. Phillips
 Confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting 8 AM today.
 ---
 TO: Janet E. Hill
 Good morning! Confirming your shift as a Warehouse Associate from 8 AM onwards at our Toledo, OH location.
 ---
 TO: Fatima U. Rivera
 Morning Fatima! Just confirming your shift as a Warehouse Associate at Riverfront Steel in Toledo, OH starting at 8 AM.
 ## 10:30 recurring — Machine Operator x2 in Toledo, OH
 TO: Carmen Z. Rodriguez
 Confirming your shift as a Machine Operator at Riverfront Steel in Toledo, OH starting 11:00 AM on Tuesday/Thursday. Still available!
 ---
 TO: Robert W. Gonzalez
 Your recurring Tuesday/Thursday Machine Operator shift at Riverfront Steel in Toledo, OH starts at 11:00 AM. Confirm your availability please.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/dispatch.jsonl
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/emails.md
@ -0,0 +1 @@
 # Client emails — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/report.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/report.md
@ -0,0 +1,57 @@
 # Scenario retrospective — Riverfront Steel, 2026-04-21
 Executor: `mistral:latest`   Reviewer: `qwen2.5:latest`   Draft: `qwen2.5:latest`
 ## Events
 | At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
 |---|---|---|---|---|---|---|---|---|
 | 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 63.8 | 0 | 1 |
 | 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 9.5 | 0 | 1 |
 | 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 47.8 | 0 | 1 |
 | 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 60.1 | 0 | 1 |
 | 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 62.3 | 0 | 1 |
 ## Final roster
 | Worker | Booked | Role | City, ST | Status |
 |---|---|---|---|---|
 ## Gap signals
 ### drift_or_tool
 - **08:00** — aborted — 3 consecutive drift flags
 - **10:30** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind":"plan","steps":["TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})",
 "TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A
 - **12:15** — aborted — 3 consecutive drift flags
 - **14:00** — aborted — 3 consecutive drift flags
 - **15:45** — invalid JSON from executor: JSON Parse error: Unterminated string | raw: {"kind": "plan", "steps": ["1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})",
 "2.2. TOOL_CALL sql({'qu
 ### write_through_audit
 - _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
 ## Workers touched across the week
 0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
 | Worker ID | Name | Events | Outcome |
 |---|---|---|---|
 ## Discovered patterns (meta-index)
 What the system identified across semantically-similar past fills as each event ran:
 - **08:00 baseline_fill** (Warehouse Associate): —
 - **10:30 recurring** (Machine Operator): —
 - **12:15 expansion** (Forklift Operator): —
 - **14:00 emergency** (Loader): —
 - **15:45 misplacement** (Warehouse Associate): —
 ## Narrative
 - 0/5 events reached consensus.
 - Final roster: 0 bookings across 0 distinct workers.
 - Workers touched (booked, failed, or otherwise decided): 0.
 - Playbook citations across the day: 0 (no evidence that the feedback loop fired across events).
 - Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/results.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/results.json
@ -0,0 +1,104 @@
 [
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 63.815,
    "error": "aborted — 3 consecutive drift flags",
    "gap_signals": [
      "drift_or_tool: aborted — 3 consecutive drift flags"
    ]
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 9.538,
    "error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\":\"plan\",\"steps\":[\"TOOL_CALL hybrid_search({'index_name':'workers_500k_v1','sql_filter':'role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5'})\",\n\"TOOL_CALL sql({'query':'SELECT worker_id, name, role, city, state, CAST(availability AS DOUBLE) A"
    ]
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 47.797,
    "error": "aborted — 3 consecutive drift flags",
    "gap_signals": [
      "drift_or_tool: aborted — 3 consecutive drift flags"
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 60.115,
    "error": "aborted — 3 consecutive drift flags",
    "gap_signals": [
      "drift_or_tool: aborted — 3 consecutive drift flags"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00"
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 62.283,
    "error": "invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Unterminated string | raw: {\"kind\": \"plan\", \"steps\": [\"1.1. TOOL_CALL hybrid_search({'index_name': 'workers_500k_v1', 'sql_filter': 'role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND worker_id NOT IN (49164, 1181, 7239, 299, 30930, 33212)'})\",\n\"2.2. TOOL_CALL sql({'qu"
    ]
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/roster.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/roster.json
@ -0,0 +1 @@
 []
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-11-11/sms.md
@ -0,0 +1 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/dispatch.jsonl
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/emails.md
@ -0,0 +1 @@
 # Client emails — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/report.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/report.md
@ -0,0 +1,55 @@
 # Scenario retrospective — Riverfront Steel, 2026-04-21
 Executor: `qwen2.5:latest`   Reviewer: `qwen2.5:latest`   Draft: `qwen2.5:latest`
 ## Events
 | At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
 |---|---|---|---|---|---|---|---|---|
 | 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 6.4 | 0 | 1 |
 | 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 16.8 | 0 | 1 |
 | 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 7.2 | 0 | 1 |
 | 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 54.0 | 0 | 1 |
 | 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 49.3 | 0 | 1 |
 ## Final roster
 | Worker | Booked | Role | City, ST | Status |
 |---|---|---|---|---|
 ## Gap signals
 ### drift_or_tool
 - **08:00** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3"},"rationale":"verify top candidates via SQL query")}
 - **10:30** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"hybrid_search","args":{"index_name":"workers_500k_v1","sql_filter":"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7","question":"machine operator Toledo OH high reliability","k":2}
 - **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"sql","args":{"query":"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5"},"rationale":"verify top candidates via SQL query to me
 - **14:00** — no consensus after 14 turns
 - **15:45** — no consensus after 14 turns
 ### write_through_audit
 - _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
 ## Workers touched across the week
 0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
 | Worker ID | Name | Events | Outcome |
 |---|---|---|---|
 ## Discovered patterns (meta-index)
 What the system identified across semantically-similar past fills as each event ran:
 - **08:00 baseline_fill** (Warehouse Associate): —
 - **10:30 recurring** (Machine Operator): —
 - **12:15 expansion** (Forklift Operator): —
 - **14:00 emergency** (Loader): —
 - **15:45 misplacement** (Warehouse Associate): —
 ## Narrative
 - 0/5 events reached consensus.
 - Final roster: 0 bookings across 0 distinct workers.
 - Workers touched (booked, failed, or otherwise decided): 0.
 - Playbook citations across the day: 0 (no evidence that the feedback loop fired across events).
 - Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/results.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/results.json
@ -0,0 +1,104 @@
 [
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 6.434,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id, name FROM workers_500k_v1 WHERE role = 'Warehouse Associate' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 LIMIT 3\"},\"rationale\":\"verify top candidates via SQL query\")}"
    ]
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 16.752,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"hybrid_search\",\"args\":{\"index_name\":\"workers_500k_v1\",\"sql_filter\":\"role = 'Machine Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND reliability >= 0.7\",\"question\":\"machine operator Toledo OH high reliability\",\"k\":2}"
    ]
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 7.181,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"sql\",\"args\":{\"query\":\"SELECT worker_id FROM workers_500k_v1 WHERE role = 'Forklift Operator' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5 AND CAST(reliability AS DOUBLE) > 0.75 LIMIT 5\"},\"rationale\":\"verify top candidates via SQL query to me"
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 54.028,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00"
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 49.298,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/roster.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/roster.json
@ -0,0 +1 @@
 []
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-16-49/sms.md
@ -0,0 +1 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/dispatch.jsonl
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/emails.md
@ -0,0 +1 @@
 # Client emails — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/report.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/report.md
@ -0,0 +1,55 @@
 # Scenario retrospective — Riverfront Steel, 2026-04-21
 Executor: `mistral:latest`   Reviewer: `qwen2.5:latest`   Draft: `qwen2.5:latest`
 ## Events
 | At | Kind | Role / Count | Pool | Fills | Turns | Dur(s) | Cites | Gaps |
 |---|---|---|---|---|---|---|---|---|
 | 08:00 | baseline_fill | Warehouse Associate × 3 | - | ✗ 0 | 0 | 47.4 | 0 | 1 |
 | 10:30 | recurring | Machine Operator × 2 | - | ✗ 0 | 0 | 40.4 | 0 | 1 |
 | 12:15 | expansion | Forklift Operator × 5 | - | ✗ 0 | 0 | 9.4 | 0 | 1 |
 | 14:00 | emergency | Loader × 4 | - | ✗ 0 | 0 | 44.7 | 0 | 1 |
 | 15:45 | misplacement | Warehouse Associate × 1 | - | ✗ 0 | 0 | 45.1 | 0 | 1 |
 ## Final roster
 | Worker | Booked | Role | City, ST | Status |
 |---|---|---|---|---|
 ## Gap signals
 ### drift_or_tool
 - **08:00** — no consensus after 14 turns
 - **10:30** — aborted — 3 consecutive drift flags
 - **12:15** — invalid JSON from executor: JSON Parse error: Expected '}' | raw: {"kind":"tool_call","tool":"propose_done","args":{"fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"verified Toledo forklift op, reliability 0.9"}],"rationale":"one SQL-verified candidate from surfaced candidates"}
 - **14:00** — aborted — 3 consecutive drift flags
 - **15:45** — no consensus after 14 turns
 ### write_through_audit
 - _post-run_ — playbook_memory has 165 entries (ran 5 events, expected ≥ 0 new entries from this run)
 ## Workers touched across the week
 0 distinct workers made it through to a decision. Every one is accounted for below — no-shows flagged, rebookings noted, everyone visible.
 | Worker ID | Name | Events | Outcome |
 |---|---|---|---|
 ## Discovered patterns (meta-index)
 What the system identified across semantically-similar past fills as each event ran:
 - **08:00 baseline_fill** (Warehouse Associate): —
 - **10:30 recurring** (Machine Operator): —
 - **12:15 expansion** (Forklift Operator): —
 - **14:00 emergency** (Loader): —
 - **15:45 misplacement** (Warehouse Associate): —
 ## Narrative
 - 0/5 events reached consensus.
 - Final roster: 0 bookings across 0 distinct workers.
 - Workers touched (booked, failed, or otherwise decided): 0.
 - Playbook citations across the day: 0 (proof the feedback loop fired across events).
 - Dropped events: 08:00 baseline_fill, 10:30 recurring, 12:15 expansion, 14:00 emergency, 15:45 misplacement.
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/results.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/results.json
@ -0,0 +1,104 @@
 [
  {
    "event": {
      "kind": "baseline_fill",
      "at": "08:00",
      "role": "Warehouse Associate",
      "count": 3,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "08:00 AM",
      "scenario_note": "Regular Monday morning shift, 8-hour."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 47.404,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  },
  {
    "event": {
      "kind": "recurring",
      "at": "10:30",
      "role": "Machine Operator",
      "count": 2,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "11:00 AM",
      "scenario_note": "Recurring Tuesday/Thursday slot — prior workers may still be available."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 40.374,
    "error": "aborted — 3 consecutive drift flags",
    "gap_signals": [
      "drift_or_tool: aborted — 3 consecutive drift flags"
    ]
  },
  {
    "event": {
      "kind": "expansion",
      "at": "12:15",
      "role": "Forklift Operator",
      "count": 5,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "01:00 PM",
      "scenario_note": "New warehouse location opening, five-worker team needed."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 9.414,
    "error": "invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}",
    "gap_signals": [
      "drift_or_tool: invalid JSON from executor: JSON Parse error: Expected '}' | raw: {\"kind\":\"tool_call\",\"tool\":\"propose_done\",\"args\":{\"fills\":[{\"candidate_id\":\"W500K-37736\",\"name\":\"Jennifer K. Robinson\",\"reason\":\"verified Toledo forklift op, reliability 0.9\"}],\"rationale\":\"one SQL-verified candidate from surfaced candidates\"}"
    ]
  },
  {
    "event": {
      "kind": "emergency",
      "at": "14:00",
      "role": "Loader",
      "count": 4,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "04:00 PM same day",
      "deadline": "16:00",
      "scenario_note": "Walkoff incident — replacement crew needed by 16:00 sharp."
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 44.673,
    "error": "aborted — 3 consecutive drift flags",
    "gap_signals": [
      "drift_or_tool: aborted — 3 consecutive drift flags"
    ]
  },
  {
    "event": {
      "kind": "misplacement",
      "at": "15:45",
      "role": "Warehouse Associate",
      "count": 1,
      "city": "Toledo",
      "state": "OH",
      "shift_start": "remainder of 08:00 shift",
      "scenario_note": "One worker from the 08:00 fill didn't show; rebuild the gap.",
      "replaces_event": "08:00"
    },
    "ok": false,
    "fills": [],
    "turns": 0,
    "duration_secs": 45.149,
    "error": "no consensus after 14 turns",
    "gap_signals": [
      "drift_or_tool: no consensus after 14 turns"
    ]
  }
 ]
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/roster.json
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/roster.json
@ -0,0 +1 @@
 []
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/sms.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-20-23/sms.md
@ -0,0 +1 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-24-48/dispatch.jsonl
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-24-48/dispatch.jsonl
@ -0,0 +1,2 @@
 {"at":"12:15","kind":"expansion","operation":"fill: Forklift Operator x5 in Toledo, OH","fills":[{"candidate_id":"W500K-37736","name":"Jennifer K. Robinson","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-33961","name":"Kyle F. Brooks","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-31297","name":"Jacob T. Diaz","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-40884","name":"Jerry M. Jones","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."},{"candidate_id":"W500K-37729","name":"Jeffrey D. Taylor","reason":"Meets the criteria of being a Forklift Operator in Toledo, OH with availability and reliability above the specified thresholds."}],"turns":7,"duration_secs":28.23,"pool_size":687,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (10 workers examined) · recurring certifications: Forklift (40%), OSHA-10 (40%) · recurring skills: mill (40%) · archetype mostly: leader · reliability median 0.83 (range 0.66–0.96)"}
 {"at":"14:00","kind":"emergency","operation":"fill: Loader x4 in Toledo, OH","fills":[{"candidate_id":"W500K-15305","name":"Mary R. Richardson","reason":"Verified availability score of 0.988 via SQL and ranked highest among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-12325","name":"Raj Torres","reason":"Ranked second among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-16975","name":"Brian X. Price","reason":"Ranked third among the candidates with an availability score greater than 0.7."},{"candidate_id":"W500K-22851","name":"Fatima X. Gutierrez","reason":"Ranked fourth among the candidates with an availability score greater than 0.7."}],"turns":6,"duration_secs":22.25,"pool_size":380,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (9 workers examined) · recurring certifications: Forklift (44%) · recurring skills: mill (44%) · archetype mostly: leader · reliability median 0.80 (range 0.66–0.96)"}
--- a/tests/multi-agent/playbooks/scenario-2026-04-20T20-24-48/emails.md
+++ b/tests/multi-agent/playbooks/scenario-2026-04-20T20-24-48/emails.md
@ -0,0 +1,40 @@
 # Client emails — Riverfront Steel, 2026-04-21
 ## 12:15 expansion — Forklift Operator x5
 Subject: 5 Workers Confirmed
 Dear Riverfront Steel Team,
 We are pleased to confirm that we have filled all five positions for Forklift Operators at your new warehouse location opening. The workers starting at 01:00 PM today are:
 - Jennifer K. Robinson
 - Kyle F. Brooks 
 - Jacob T. Diaz
 - Jerry M. Jones
 - Jeffrey D. Taylor
 Each meets the criteria of being a Forklift Operator in Toledo, OH.
 Best regards,
 Dispatch Team Lakehouse
 ## 14:00 emergency — Loader x4
 Subject: 4 Loader Workers Confirmed
 Dear Riverfront Steel Team,
 I am pleased to confirm that we have filled all four loader positions as requested:
 - Mary R. Richardson 
 - Raj Torres  
 - Brian X. Price 
 - Fatima X. Gutierrez 
 All workers will start their shift at 04:00 PM today. Please note the walkoff incident requiring a replacement crew by 16:00 sharp.
 Thank you for your trust in Lakehouse Dispatch.
 Best regards,  
 Dispatch Team
@ -0,0 +1 @@
 {"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-29048","name":"Raymond G. Ward","reason":"Raymond has a high availability score (0.9033) and relevant skills for the Machine Operator position in Toledo, OH. He is also endorsed by past similar fills."},{"candidate_id":"W500K-20613","name":"Pamela V. Green","reason":"Pamela has a high availability score (0.8496) and relevant skills for the Machine Operator position in Toledo, OH. She is also endorsed by past similar fills."}],"turns":5,"duration_secs":12.051,"pool_size":997,"playbook_citations":[]}
@ -0,0 +1 @@
 # Client emails — Riverfront Steel, 2026-04-21
@ -0,0 +1 @@
 # SMS drafts — Riverfront Steel, 2026-04-21
@ -0,0 +1,2 @@
 {"at":"08:00","kind":"baseline_fill","operation":"fill: Warehouse Associate x3 in Toledo, OH","fills":[{"candidate_id":"W500K-49164","name":"Christopher Y. Phillips","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-40928","name":"Janet E. Hill","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."},{"candidate_id":"W500K-34704","name":"Fatima U. Rivera","reason":"Reliable Warehouse Associate with availability greater than 0.5 in Toledo, OH."}],"turns":7,"duration_secs":20.128,"pool_size":770,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}
 {"at":"10:30","kind":"recurring","operation":"fill: Machine Operator x2 in Toledo, OH","fills":[{"candidate_id":"W500K-19759","name":"Carmen Z. Rodriguez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.75, verified via sql tool."},{"candidate_id":"W500K-29298","name":"Robert W. Gonzalez","reason":"Recurring Machine Operator in Toledo, OH with a score of 0.74, not yet SQL verified but highly likely to meet requirements."}],"turns":5,"duration_secs":17.426,"pool_size":997,"playbook_citations":[],"discovered_pattern":"Across 25 similar past playbooks (8 workers examined) · archetype mostly: reliable · reliability median 0.80 (range 0.66–0.96)"}