docs: record Python sidecar drop in architecture comparison

Companion to lakehouse commit ba928b1 (aibridge: drop Python sidecar
from hot path; AiClient → direct Ollama).

ARCHITECTURE_COMPARISON.md:
- Decisions tracker: "Drop Python sidecar" moves _open_ → DONE
- "Cross-cutting abstracts to address" item #3 marked complete
- "Python dependency (the load-bearing axis)" section reframed:
  pre/post-2026-05-02 diagram, lab UI / pipeline_lab clarified as
  dev-only Python (not on runtime hot path)
- Change log: new entry

STATE_OF_PLAY: timestamp refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:00:51 -05:00
commit afdeca80d9
parent 7d6636b33e
2 changed files with 18 additions and 10 deletions
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,6 +1,6 @@
 # STATE OF PLAY — Lakehouse-Go

-**Last verified:** 2026-05-02 ~05:50 CDT
+**Last verified:** 2026-05-02 ~06:30 CDT
 **Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.

 > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@@ -48,7 +48,7 @@ Don't:
 | 2026-05-01 | Multi-tier load test against 100k corpus | 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. **DONE** (multitier_100k.md). |
 | 2026-05-01 | **coder/hnsw v0.6.1 panic — REAL FIX landed** | Lifted source-of-truth out of coder/hnsw via `i.vectors map[string][]float32` side store + `safeGraphAdd`/`safeGraphDelete` recover wrappers + warm-path rebuild fallback. Re-run: 0 failures across 19,622 scenarios (was 96-98% on 2/6). **DONE.** Architecture invariant in STATE_OF_PLAY "DO NOT RELITIGATE". |
 | 2026-05-02 | **Substrate fix verified at original failure scale** | Re-ran multitier 5min @ conc=50 (the footprint that originally surfaced the bug at 96-98% fail). Result: 132,211 scenarios at 438/sec, **6/6 classes at 0% failure**. Throughput dropped 1,115/sec → 438/sec because broken scenarios now do real HNSW Add work. Tails healthy: surge_fill_validate p99=1.53s, playbook_record_replay p99=2.32s. **Fix scales — closing the open thread.** |
-| _open_ | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
+| 2026-05-02 | **Drop Python sidecar from Rust aibridge — DONE** | `crates/aibridge/src/client.rs` rewrite (commit `ba928b1` in lakehouse). AiClient now talks Ollama directly: per-text `/api/embed` loop, `/api/generate` for chat + rerank-loop + admin (unload/preload), `/api/ps` + `std::process::Command nvidia-smi` for vram_snapshot. Public API unchanged — 0 callers updated. Verification: cargo test -p aibridge 32/32 PASS, live smoke `/ai/embed` returns 768-dim vector + `/v1/chat` returns "OK". Sidecar's `lab_ui.py` + `pipeline_lab.py` (~888 LOC dev-only UIs) keep running; only the hot-path embed/generate/rerank/admin routes are retired. Process count drops from "mega-binary + sidecar" to "mega-binary alone". |
 | 2026-05-02 | **Port Rust materializer to Go (transforms.ts) — DONE** | `internal/materializer` + `cmd/materializer` + `materializer_smoke.sh`. Ports `transforms.ts` (12 transforms) + `build_evidence_index.ts`. Idempotency, day-partition, receipt. 14 tests green; on-wire JSON matches TS so both runtimes interoperate. |
 | 2026-05-02 | **Port Rust replay tool to Go — DONE** | `internal/replay` + `cmd/replay` + `replay_smoke.sh`. Ports `replay.ts` retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same `data/_kb/replay_runs.jsonl` shape (schema=replay_run.v1) as TS. |
 | 2026-05-02 | **`/v1/validate` + `/v1/iterate` HTTP surface — DONE** | `cmd/validatord` (port 3221) hosts both endpoints. `internal/validator` gains `PlaybookValidator` (3rd kind), JSONL roster loader, and the `Iterate` orchestrator + `ExtractJSON` helper. Gateway proxies `/v1/validate` + `/v1/iterate` to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + `validatord_smoke.sh` 5/5 PASS. |
@@ -97,17 +97,24 @@ up). Go also lets you tune per-daemon resource limits via systemd.

 ## Python dependency (the load-bearing axis)

-This is the architectural difference that drove the original perf gap.
-Both call Ollama at `:11434`, but the path differs:
+This was the architectural difference that drove the original perf gap.

 ```
-Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
-Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434
+Pre-2026-05-02:
+  Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
+  Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434
+
+Post-2026-05-02 (commit ba928b1):
+  Rust embed:  gateway → HTTP → Ollama :11434                (sidecar dropped)
+  Go embed:    gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
 ```

-The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
-FastAPI wrapper around Ollama. It does pydantic validation + request
-shaping; **no fundamental compute** that Ollama can't do directly.
+The hot-path Python sidecar (~120 LOC across embed.py / generate.py
+/ rerank.py / admin.py) was pure pass-through to Ollama and added no
+logic — just translation. AiClient was rewritten 2026-05-02 to call
+Ollama directly. `lab_ui.py` + `pipeline_lab.py` (~888 LOC of
+Streamlit-shape dev UIs) remain as ad-hoc tooling, not on the
+runtime hot path.

 ### Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)

@@ -270,7 +277,7 @@ The list below is a working backlog. Move items to "Decisions tracker"

 1. ✅ **Embed cache in Rust aibridge** — DONE 2026-05-01 (`150cc3b`).
 2. ✅ **FillValidator + EmailValidator in Go** — DONE 2026-05-01 (`b03521a`).
-3. **Drop Python sidecar from Rust** — Rewrite aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
+3. ✅ **Drop Python sidecar from Rust** — DONE 2026-05-02 (`ba928b1`).
 4. **Cross-runtime contract tests** — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` with Go-side validators consuming the same definitions.

 ### If keeping Go primary
@@ -320,6 +327,7 @@ Append entries here when this doc gets updated. One-line entries; link to commit
 - 2026-05-02 — Substrate fix verified at original failure-surfacing scale. Multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput drop (1,115 → 438/sec) is the honest cost of the formerly-broken scenarios doing real HNSW Add work. STATE_OF_PLAY refreshed to 2026-05-02.
 - 2026-05-02 — Materializer + replay tool ported from Rust legacy to Go (`internal/materializer` + `internal/replay`, both with CLI + smoke + tests). Both runtimes now produce the same `data/evidence/YYYY/MM/DD/*.jsonl` and `data/_kb/replay_runs.jsonl` shapes; Go side no longer needs Bun for these phases.
 - 2026-05-02 — `/v1/validate` + `/v1/iterate` HTTP surface ported as `cmd/validatord` on `:3221`. Closes the last "If keeping Go primary" backlog item — Go now owns the entire validator path end-to-end (no Rust dep for staffing safety net). 5/5 smoke probes via gateway :3110.
+- 2026-05-02 — Python sidecar dropped from Rust hot path (`ba928b1`). AiClient now talks Ollama directly. Process count: mega-binary + sidecar → mega-binary alone. Public API unchanged, 0 callers updated. Cargo tests 32/32 PASS, live `/ai/embed` + `/v1/chat` smokes green on test gateway. Lab UI Python remains as ad-hoc dev tooling.

 ---
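
Reviewer note: the tracker entry in this diff describes the new embed path as a per-text `/api/embed` loop against Ollama. The Go sketch below only illustrates that wire shape; it is not the Rust code in `crates/aibridge/src/client.rs`. The `model`/`input` request fields and the `embeddings` response field follow Ollama's public `/api/embed` API. The function names, the model name, and the stub server are hypothetical and exist only so the example runs without a live Ollama.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// embedRequest and embedResponse mirror the JSON shapes of Ollama's /api/embed.
type embedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// embedOne posts a single text to {baseURL}/api/embed and returns its vector.
func embedOne(c *http.Client, baseURL, model, text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Model: model, Input: text})
	if err != nil {
		return nil, err
	}
	resp, err := c.Post(baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embed: unexpected status %d", resp.StatusCode)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) != 1 {
		return nil, fmt.Errorf("embed: want 1 vector, got %d", len(out.Embeddings))
	}
	return out.Embeddings[0], nil
}

// embedAll is the per-text loop: one request per input, order preserved.
func embedAll(c *http.Client, baseURL, model string, texts []string) ([][]float32, error) {
	vecs := make([][]float32, 0, len(texts))
	for i, t := range texts {
		v, err := embedOne(c, baseURL, model, t)
		if err != nil {
			return nil, fmt.Errorf("text %d: %w", i, err)
		}
		vecs = append(vecs, v)
	}
	return vecs, nil
}

// newStubOllama is a hypothetical stand-in for Ollama (assumption, test-only).
// It answers each /api/embed call with a 3-dim vector whose first element
// is the byte length of the input text.
func newStubOllama() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/embed", func(w http.ResponseWriter, r *http.Request) {
		var req embedRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(embedResponse{
			Embeddings: [][]float32{{float32(len(req.Input)), 0, 0}},
		})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubOllama()
	defer srv.Close()

	// "hypothetical-embed-model" is a placeholder, not a model named in this commit.
	vecs, err := embedAll(srv.Client(), srv.URL, "hypothetical-embed-model", []string{"a", "bcd"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vecs), "vectors returned")
}
```

Pointing `baseURL` at `http://localhost:11434` instead of the stub would exercise a real Ollama instance, assuming one is running with an embedding model pulled. A sequential loop keeps response order trivially aligned with input order, at the cost of one round trip per text.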