lakehouse/STATE_OF_PLAY.md
root 0af62861d2
STATE_OF_PLAY: refresh for 2026-05-02 wave (Lance gauntlet + parity + housekeeping)
Anchor was 5 days stale. Adds the 12-commit wave (Lance backend hardening,
sidecar drop, observability parity, gitignore cleanup, gray-zone content
add) with verification status for each. Updates DO NOT RELITIGATE with
the 4 new things this wave makes load-bearing:
- python sidecar dropped from hot path (don't wire it back)
- lance gauntlet shipped (don't re-discover the bugs we just fixed)
- 32/32 cross-runtime parity (don't build a 6th probe for already-covered surface)
- ARCHITECTURE_COMPARISON.md is the single source of truth for cross-runtime decisions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:23:36 -05:00


STATE OF PLAY — Lakehouse

Last verified: 2026-05-02 evening CDT. Verified by: live probe (smoke 9/9, parity 32/32, gateway restarted), not memory.

Read this FIRST. When the user says "we're working on lakehouse," they mean the working code captured below — NOT what git log framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.


WHAT LANDED 2026-05-01 → 2026-05-02 (12 commits this wave)

| Commit | What | Verified |
| --- | --- | --- |
| 5d30b3d | lance: auto-build doc_id btree in lance_migrate handler | doc-fetch ~5ms (was ~100ms full scan) on scale_test_10m |
| 044650a | lance-bench: same scalar build post-IVF (matches gateway) | cargo check clean |
| 7594725 | lance: 4-pack (sanitize_lance_err + 7 unit tests + 9-probe smoke + 10M re-bench) | smoke 9/9 PASS, tests 7/7 PASS |
| 98b6647 | gateway: IterateResponse.trace_id echoed; session_log_path enabled | parity probes see one unified JSONL |
| 57bde63 | gateway: trace-id propagation + coordinator session JSONL (Rust parity with Go wave) | session_log_parity 4/4 |
| ba928b1 | aibridge: drop Python sidecar from hot path; AiClient → direct Ollama | aibridge tests 32/32 PASS, /ai/embed live 768d |
| 654797a | gateway: pub extract_json + parity_extract_json bin | extract_json_parity 12/12 |
| c5654d4 | docs: pointer to golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md | |
| 150cc3b | aibridge: LRU embed cache, 236× RPS warm (78ms → 129us p50) | load test |
| 9eed982 | mcp-server: /_go/* pass-through for G5 cutover slice | |
| 6e34ef7 | gitignore: stop tracking 100+ runtime ephemera (data/_*, lance, logs, node_modules) | untracked dropped 100+ → 0 |
| 41b0a99 | chore: add 33 real items that were sitting untracked (scripts, scenarios, kimi reports, dev UIs) | clean working tree |

Cross-runtime parity (post-this-wave): 32/32 across 5 probes — validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8). Run cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done to re-verify.

Lance backend (was untested 5 days ago, now gauntlet-ready):

  • cargo test -p vectord-lance --release → 7/7 PASS
  • ./scripts/lance_smoke.sh → 9/9 PASS against live gateway
  • reports/lance_10m_rebench_2026-05-02.md — search warm ~20ms / cold ~46ms median, doc-fetch ~5ms post-btree

VERIFIED WORKING RIGHT NOW

The client demo (Staffing Co-Pilot)

Public URL: https://devop.live/lakehouse/ — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme). Local URL: http://localhost:3700/ — same page, served by mcp-server/index.ts (PID 1271, started 09:48 CDT today).

The staffers console (the one the client was thoroughly impressed with):

  • https://devop.live/lakehouse/console — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB)
  • Pulls project index via /api/catalog/datasets (36 datasets) + playbook memory via /api/vectors/playbook_memory/stats (4,701 entries with embeddings, real ops like "fill: Maintenance Tech x2 in Milwaukee, WI")

Client-visible flow that works end-to-end on the public URL:

| Endpoint | Sample output |
| --- | --- |
| GET /api/catalog/datasets | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 |
| GET /api/vectors/playbook_memory/stats | 4,701 fill operations with embeddings |
| GET /system/summary | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates |
| POST /intelligence/staffing_forecast | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). |
| POST /intelligence/permit_contracts | permit 3442956 $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. |
| POST /intelligence/market | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). |
| POST /intelligence/permit_entities | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). |
| POST /intelligence/activity + /intelligence/arch_signals + /intelligence/chat | all 200 |

The demo tells the story: "upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin." That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now.

Backend, verified live this session

| Surface | State |
| --- | --- |
| Gateway :3100 | up, 4 providers configured, /v1/health 200 with 500K workers loaded |
| MCP server :3700 (Co-Pilot demo) | up, all /intelligence/* endpoints respond |
| VCP UI :3950 | started this session, /data/* 200, real numbers |
| Observer :3800 | ring full (2,000/2,000); older events evicted, query Langfuse for 24h-ago state |
| Sidecar :3200 | up |
| Langfuse :3001 | recording, gw:/log + v1.chat:openrouter traces visible |
| LLM Team UI :5000 | up, only extract mode registered |
| OpenCode fleet | 40 models reachable through one sk-* key (verified live GET https://opencode.ai/zen/v1/models) |

OpenCode catalog (live):

  • Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5
  • GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5
  • Gemini: 3.1-pro, 3-flash
  • GLM: 5.1, 5
  • Minimax: m2.7, m2.5
  • Kimi: k2.6, k2.5
  • Qwen: 3.6-plus, 3.5-plus
  • Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something")
  • Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free

The substrate (frozen — do not re-architect)

  • Distillation v1.0.0 at tag e7636f2: 145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full
  • Output: data/_kb/distilled_{facts,procedures,config_hints}.jsonl + data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet
  • Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, per-PR cap=3 with auto-reset on new head SHA
  • Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed)
  • Mode runner: 5 native modes; codereview_isolation is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031)
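
The auditor routing rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the auditor's actual code: it assumes the cap counts audit runs per PR head and that the Kimi/Haiku alternation flips on every non-promoted run. The class, method, and model-id strings are hypothetical; only the 100k-char threshold, the cap of 3, and the model pairing come from this file.

```typescript
// Hypothetical sketch of the auditor routing rules; all names are illustrative.
type AuditorModel = "kimi-k2.6" | "haiku-4.5" | "opus";

interface PrState {
  headSha: string; // head SHA the run counter applies to
  runs: number;    // audit runs already spent on that head
}

class AuditorRouter {
  private readonly perPrCap = 3;               // per-PR cap named in this file
  private readonly opusDiffThreshold = 100000; // promote on diffs >100k chars
  private readonly prs = new Map<number, PrState>();
  private turn = 0; // flips between the two cross-lineage models

  // Returns the model to use, or null when the PR has exhausted its cap.
  pick(pr: number, headSha: string, diffChars: number): AuditorModel | null {
    let state = this.prs.get(pr);
    if (state === undefined || state.headSha !== headSha) {
      // A new head SHA auto-resets the counter.
      state = { headSha, runs: 0 };
      this.prs.set(pr, state);
    }
    if (state.runs >= this.perPrCap) return null;
    state.runs += 1;
    if (diffChars > this.opusDiffThreshold) return "opus";
    const model: AuditorModel = this.turn % 2 === 0 ? "kimi-k2.6" : "haiku-4.5";
    this.turn += 1;
    return model;
  }
}
```

Keying the counter on the head SHA means a force-push or new commit gets a fresh budget, while repeated runs against an unchanged head stop after three.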

Matrix indexer

30+ live corpora including:

  • 5 versions of workers_500k_v1..v9 (50K embedded chunks each)
  • 11 batched 2K-row shards w500k_b3..b17
  • chicago_permits_v1 (3,420), resumes_100k_v2 (100K candidates), ethereal_workers_v1 (10K)
  • lakehouse_arch_v1 (2,119), lakehouse_symbols_v1 (2,470), lakehouse_answers_v1 (1,269), scrum_findings_v1 (1,260)
  • kb_team_runs_v1 (12,693) + kb_team_runs_agent (4,407) — LLM-team play history embedded
  • distilled_factual_v20260423102507 (8) — distillation output

Code health

  • cargo check --workspace: 0 warnings, 0 errors
  • bun test auditor + tests/distillation: 145/145 pass
  • ui/server.ts + auditor.ts bundle clean

DO NOT RELITIGATE

  • PR #11 is merged into origin/main as ed57eda — do not "still need to merge PR #11."
  • Distillation tag distillation-v1.0.0 at e7636f2 is FROZEN — do not re-architect schemas, scorer rules, audit fixtures.
  • Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent — do not re-debate, see reports/kimi/audit-last-week-full.md.
  • candidates_safe vertical column bug — fixed at catalog metadata layer in commit c3c9c21. Do not "discover" it again.
  • Decisions A/B/C/D from synthetic-data-gap-report.md — all four scripts shipped today (d56f08e, 940737d, c3c9c21). Do not "ask J for approval."
  • workers_500k.phone type fixup — already string. The fixup script is idempotent; running it is a no-op.
  • client_workerskjkk typo dataset — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via DELETE /catalog/datasets/by-name/client_workerskjkk this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.
  • Python sidecar dropped from hot path 2026-05-02 (ba928b1) — AiClient calls Ollama directly. Do not "wire python embedding back in." lab_ui.py + pipeline_lab.py keep running as dev-only UIs (not on the runtime path).
  • Lance backend gauntlet (2026-05-02) — sanitizer over all 5 routes, 7 unit tests, 9-probe smoke, 10M re-bench. The doc_id btree auto-builds inside lance_migrate AND lance-bench. Do not "discover" the missing scalar index again or the leaked filesystem paths in error bodies.
  • Cross-runtime parity = 32/32 across 5 probes in golangLAKEHOUSE/scripts/cutover/parity/. Do not "build a parity probe for X" without checking — validator, extract_json, session_log, materializer, and embed are all already covered.
  • Decisions tracker is golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md: the single living source of truth for cross-runtime decisions. As of 2026-05-02 it has 0 open code work items; only 2 strategic items left (Lance vs Parquet+HNSW-with-spilling, Go-vs-Rust primary cutover).
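
The startup gate mentioned for the client_workerskjkk incident could look something like the sketch below. It is an assumption-heavy illustration: it checks the failure mode described above (a catalog entry whose backing file does not exist) rather than the exact "unrecognized parquet names" rule in now.md Step 2C, and the `CatalogEntry` shape and both function names are hypothetical.

```typescript
// Hypothetical startup gate; the entry shape and function names are assumptions.
interface CatalogEntry {
  name: string;        // dataset name as registered in the catalog
  parquetPath: string; // file the manifest points at
}

// Returns the entries whose backing file is absent from the files found on disk.
function findDeadManifests(
  catalog: CatalogEntry[],
  filesOnDisk: ReadonlySet<string>,
): CatalogEntry[] {
  return catalog.filter((entry) => !filesOnDisk.has(entry.parquetPath));
}

// Fails fast at startup instead of letting one dead entry break every SQL query.
function assertCatalogConsistent(
  catalog: CatalogEntry[],
  filesOnDisk: ReadonlySet<string>,
): void {
  const dead = findDeadManifests(catalog, filesOnDisk);
  if (dead.length > 0) {
    const names = dead.map((entry) => entry.name).join(", ");
    throw new Error(`catalog entries with no backing file: ${names}`);
  }
}
```

Throwing with the offending dataset names makes the root cause visible at boot, rather than surfacing later as zero row counts in /system/summary.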

FIXES MADE THIS SESSION (2026-04-27 evening)

  1. crates/gateway/src/v1/iterate.rs:93: state → _state (cleared the one cargo warning).
  2. lakehouse-ui.service (Dioxus) — disabled. Was failing 7,242 times against a missing target/dx/ui/debug/web/public build dir. systemctl stop && disable.
  3. VCP UI on :3950 — started bun run ui/server.ts (PID 1162212, log /tmp/lakehouse_ui.log). /data/* endpoints now 200 with real data.
  4. client_workerskjkk catalog entry: DELETE /catalog/datasets/by-name/client_workerskjkk removed the dead manifest. This was the actual root cause of /system/summary reporting workers_500k_rows: 0 and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → workers_500k_rows: 500000, candidates_rows: 1000, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse.

FIXES MADE THIS SESSION (2026-04-28 early — face pool)

  1. Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged. scripts/staffing/fetch_face_pool.py fetches from thispersondoesnotexist.com; scripts/staffing/tag_face_pool.py --min-age 22 runs deepface and excludes minors. data/headshots/manifest.jsonl now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces.
  2. mcp-server/index.ts:1308 /headshots/:key route — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly.
  3. /headshots/_thumbs/ pre-resized 384×384 webp (60× smaller: 587KB → ~11KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (xargs -P 8); .gitignored.
  4. mcp-server/search.html + console.html — dropped img.loading='lazy'. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced.
  5. ComfyUI on-demand uniqueness — serve_imagegen.py:32 added seed to _cache_key() (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s.
  6. mcp-server/index.ts:1234 /headshots/generate/:key — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. Worker prompt format: professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr. Cache at data/headshots_gen/ (gitignored, regeneratable).
  7. Confidence-default name resolution in search.html: genderFor() and guessEthnicityFromFirstName() lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket.

End-to-end verified: playwright run on https://devop.live/lakehouse/?q=forklift+operators+IL → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs.
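
The deterministic face selection described in items 2 and 6 can be sketched as below. This is a hedged illustration, not the code in mcp-server/index.ts: the `Face` shape, the `ageBand` field, and the function names are hypothetical, and the hash is a plain djb2 variant chosen to match the "djb2-style" description.

```typescript
// Hypothetical sketch; the Face shape and bucket fields are illustrative.
interface Face {
  file: string;
  gender: string;
  race: string;
  ageBand: string;
}

// djb2-style string hash, kept in unsigned 32-bit range.
function djb2(key: string): number {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash << 5) + hash + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Narrows to the tightest non-empty bucket (exact → gender-only → all),
// then indexes by hash so the same key always maps to the same face.
function pickFace(
  pool: Face[],
  key: string,
  want: { gender: string; race: string; ageBand: string },
): Face | undefined {
  const exact = pool.filter(
    (f) => f.gender === want.gender && f.race === want.race && f.ageBand === want.ageBand,
  );
  const byGender = pool.filter((f) => f.gender === want.gender);
  const bucket = exact.length > 0 ? exact : byGender.length > 0 ? byGender : pool;
  if (bucket.length === 0) return undefined;
  return bucket[djb2(key) % bucket.length];
}
```

Because the index is a pure function of the key and the bucket contents, the mapping stays stable only while the pool itself is unchanged; re-tagging or adding faces will reshuffle assignments.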


OPEN — but not blocking the demo

| Item | What | When to act |
| --- | --- | --- |
| modes.toml staffing_inference.matrix_corpus | still says workers_500k_v8. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new build_workers_v9.sh rebuilds from workers_safe. | Run when you have 30+ min for the rebuild. |
| Open PRs #6, #7, #10 | sitting since Apr 22-24, auditor verdicts on disk at data/_auditor/kimi_verdicts/{6,7,10}-*.json | Read verdicts, decide reconcile/close. |
| test/enrich-prd-pipeline branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive; see memory/project_unmerged_architecture_work.md. |
| federation-hnsw-trials stash | Lance + S3/MinIO prototype, aws-config crate added, 708 insertions | Phase B from EXECUTION_PLAN.md; revisit when Parquet vector ceiling actually hurts. |
| candidates manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. |

RUNTIME CHEATSHEET

# Verify the demo (public + local both work)
curl -sS https://devop.live/lakehouse/                   # Co-Pilot HTML
curl -sS https://devop.live/lakehouse/console            # staffers console
curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \
  -d '{}' -H 'content-type: application/json' \
  | jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}'

# Restart sequence (after Rust changes)
sudo systemctl restart lakehouse.service                 # gateway :3100
sudo systemctl restart lakehouse-auditor                 # auditor daemon
sudo systemctl restart lakehouse-observer                # observer :3800
# UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled).
# Restart manually:  kill <pid>; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 &

# Health checks
curl -sS http://localhost:3100/v1/health | jq            # workers_count, providers
curl -sS http://localhost:3100/vectors/pathway/stats | jq
curl -sS http://localhost:3100/v1/usage | jq             # since-restart cost
curl -sS http://localhost:3700/system/summary | jq       # dataset counts

VISION — what we're actually building (not what's done)

J's framing for the legacy staffing company:

  • Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope.
  • Hybrid + memory index → search large corpora cheaply.
  • Email comes in → verify against contract; SMS comes in → alert when index changes.
  • Real-time.
  • Invent metrics nobody else has using the hybrid index.
  • Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it.
  • Find people getting certificates (passive cert tracking).
  • Pull union data → bring contracts that work for employees, not just employers.
  • All metrics visible, nothing hidden, value-aligned with what each side actually needs.

If a future session is drifting away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.


CURRENT PLAN — fix the demo for the legacy staffing client

Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.

Demo state is anchored by git tag demo-2026-04-27 (commit ed57eda, the merge of PR #11). To restore code state: git checkout demo-2026-04-27. To restore runtime state: DELETE /catalog/datasets/by-name/client_workerskjkk (catalog hot-fix is not in git).

P1 — Search box that actually filters (highest visible impact)

Problem: typing in #sq and pressing Enter fires POST /intelligence/chat with body {"message":"<query>"}. The state (#sst) and role (#srl) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against workers_500k. That is the "cached/generic response" the client sees.

Fix: in mcp-server/search.html, change the search-submit handler to call the real worker search endpoint with {query, state, role, top_k}. The MCP search_workers tool surface already exists; route the form there. Render returned worker rows in the existing card grid.

Done when: typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.
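
A rough sketch of the request body the fixed submit handler might build. The `{query, state, role, top_k}` shape comes from the fix described above; the function name, the default of 20 for `top_k`, and the choice to omit empty filters are assumptions, and the worker search endpoint path is left out because this file does not name it.

```typescript
// Hypothetical request builder for the P1 fix; names and defaults are assumptions.
interface WorkerSearchRequest {
  query: string;
  state?: string;
  role?: string;
  top_k: number;
}

function buildWorkerSearch(
  query: string,
  state: string,
  role: string,
  topK = 20, // assumed default; this file does not specify one
): WorkerSearchRequest {
  const body: WorkerSearchRequest = { query: query.trim(), top_k: topK };
  // An empty select means "no filter", so omit it rather than sending "".
  if (state.trim() !== "") body.state = state.trim();
  if (role.trim() !== "") body.role = role.trim();
  return body;
}
```

In the handler, the three arguments would come from the #sq, #sst, and #srl inputs, and the result would be JSON-serialized as the POST body in place of the current `{"message": ...}` chat payload.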

P2 — Contractor-name click → /contractor profile page

Problem: clicking a contractor name in any rendered card stays on /lakehouse/. URL doesn't change.

Fix: wrap contractor names in <a href="/contractor?name=<encoded>">. The page mcp-server/contractor.html (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at /contractor and the data endpoint /intelligence/contractor_profile already returns rich data.

Then check contractor.html actually shows: full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.

Done when: clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.
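
The link wrapping in the fix above could be sketched like this. The helper names are hypothetical, and the `basePath` parameter is an assumption added because the public demo is served under /lakehouse/ while the file gives the route as /contractor.

```typescript
// Escapes text for safe placement inside HTML content or attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the profile link; basePath is an assumption for prefixed deployments.
function contractorLink(name: string, basePath = ""): string {
  const href = `${basePath}/contractor?name=${encodeURIComponent(name)}`;
  return `<a href="${escapeHtml(href)}">${escapeHtml(name)}</a>`;
}
```

For example, `contractorLink("KACPRZYNSKI, ANDY")` produces an href of `/contractor?name=KACPRZYNSKI%2C%20ANDY`; encoding the name matters because contractor names in the permit data contain commas and spaces.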

P3 — Substrate signal at the bottom shows the right numbers

Problem: J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: /api/vectors/playbook_memory/stats = 4,701 entries with embeddings; /vectors/pathway/stats = 88 traces, 11/11 replays.

Fix: find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.

Done when: bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.
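
One way to make a shape mismatch visible instead of rendering it as "empty" is sketched below. The helper names are hypothetical, and the candidate key names a caller passes in would be guesses until the real response shapes of the two stats endpoints are checked.

```typescript
// Hypothetical helper: looks for a numeric count under any of the candidate keys.
function readCount(payload: unknown, candidateKeys: string[]): number | null {
  if (typeof payload !== "object" || payload === null) return null;
  const record = payload as Record<string, unknown>;
  for (const key of candidateKeys) {
    const value = record[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return null; // missing key: surface it instead of silently rendering 0
}

// Renders a missing value differently from a genuine zero.
function renderCount(label: string, count: number | null): string {
  return count === null
    ? `${label}: unavailable (response shape mismatch?)`
    : `${label}: ${String(count)}`;
}
```

Separating "key not found" from "count is zero" would have distinguished the reported "playbook memory empty" panel from a store that is actually empty.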

P4 — Top nav reflects today's architecture

Problem: Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but content is from old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.

Fix: rewrite mcp-server/proof.html (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of demo-2026-04-27. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.

Done when: the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").

P5 — Caching for the project-index build_signal (J flagged unfinished)

Problem: "we never finished our caching for project index build signal it's not pulling new information." Need to find what build_signal refers to. Likely a scrum/auditor signal that should rebuild the lakehouse_arch_v1 corpus on commit but isn't wired to.

Fix: identify the build-signal pipeline (likely in auditor/ or crates/vectord/), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in /vectors/indexes for lakehouse_arch_v1.

Done when: committing a new file to crates/ causes lakehouse_arch_v1 chunk_count to increase within N minutes.

P0 — Anchor the demo state (DONE)

Tagged ed57eda as demo-2026-04-27. Future sessions: git checkout demo-2026-04-27 to land in this exact code state.


EXECUTION ORDER

  1. P1 first — biggest visible bug, ~30-60 min
  2. P2 next — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
  3. P3 — small fix, big "looks alive" win
  4. P4 — biggest scope; might split across sessions
  5. P5 — feature work, only after the visible bugs are fixed

Each item commits independently with the format demo: P<n> — <one-line> so the commit log doubles as a progress journal. After each merge to main, re-tag demo-latest to point at the new HEAD.

Stop here and let J pick which item to start with. Do not silently extend scope.