From afdeca80d97e4a333d9a870f25f130976cdfe608 Mon Sep 17 00:00:00 2001
From: root <root@island37.com>
Date: Sat, 2 May 2026 05:00:51 -0500
Subject: [PATCH] docs: record Python sidecar drop in architecture comparison
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion to lakehouse commit ba928b1 (aibridge: drop Python sidecar
from hot path; AiClient → direct Ollama).

ARCHITECTURE_COMPARISON.md:
  - Decisions tracker: "Drop Python sidecar" moves _open_ → DONE
  - "Cross-cutting abstracts to address" item #3 marked complete
  - "Python dependency (the load-bearing axis)" section reframed:
    pre/post-2026-05-02 diagram, lab UI / pipeline_lab clarified
    as dev-only Python (not on runtime hot path)
  - Change log: new entry

STATE_OF_PLAY: timestamp refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 STATE_OF_PLAY.md                |  2 +-
 docs/ARCHITECTURE_COMPARISON.md | 26 +++++++++++++++++---------
 2 files changed, 18 insertions(+), 10 deletions(-)
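
The ARCHITECTURE_COMPARISON.md entry below describes which Ollama calls the rewritten AiClient makes. The following is a minimal std-only Rust sketch of those request shapes. It is not the code from `ba928b1`, and it only builds method/path/body triples without sending anything.

The `OllamaRequest` type, the helper names, the `http://localhost:11434` base URL and the model names are hypothetical. The endpoints and JSON fields follow Ollama's public REST API. The rerank loop and the `nvidia-smi` half of vram_snapshot are not sketched.

```rust
// Sketch only: builds (method, path, body) for the Ollama calls named in
// the ARCHITECTURE_COMPARISON.md entry. Not the aibridge code from ba928b1;
// the type, helper names, base URL and model names are hypothetical.

/// Hypothetical base URL; the doc only says Ollama listens on :11434.
const OLLAMA: &str = "http://localhost:11434";

/// One HTTP call to Ollama, described but not sent.
struct OllamaRequest {
    method: &'static str,
    path: &'static str,
    body: Option<String>,
}

/// Quote and escape a string as a JSON string literal.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn post(path: &'static str, body: String) -> OllamaRequest {
    OllamaRequest { method: "POST", path, body: Some(body) }
}

/// Per-text embed loop: one POST /api/embed per input text.
fn embed_requests(model: &str, texts: &[&str]) -> Vec<OllamaRequest> {
    texts
        .iter()
        .map(|&t| {
            post("/api/embed", format!("{{\"model\":{},\"input\":{}}}", json_str(model), json_str(t)))
        })
        .collect()
}

/// Non-streaming completion (the chat path in the entry).
fn generate_request(model: &str, prompt: &str) -> OllamaRequest {
    post(
        "/api/generate",
        format!("{{\"model\":{},\"prompt\":{},\"stream\":false}}", json_str(model), json_str(prompt)),
    )
}

/// Admin preload: /api/generate with no prompt loads the model into memory.
fn preload_request(model: &str) -> OllamaRequest {
    post("/api/generate", format!("{{\"model\":{}}}", json_str(model)))
}

/// Admin unload: the same call with keep_alive set to 0.
fn unload_request(model: &str) -> OllamaRequest {
    post("/api/generate", format!("{{\"model\":{},\"keep_alive\":0}}", json_str(model)))
}

/// Loaded-model listing used for a VRAM snapshot; GET with no body.
fn ps_request() -> OllamaRequest {
    OllamaRequest { method: "GET", path: "/api/ps", body: None }
}

fn main() {
    let mut plan = embed_requests("embed-model", &["first text", "second \"quoted\" text"]);
    plan.push(generate_request("chat-model", "Reply with OK"));
    plan.push(preload_request("chat-model"));
    plan.push(unload_request("chat-model"));
    plan.push(ps_request());
    for r in &plan {
        let body = r.body.as_deref().unwrap_or("(no body)");
        println!("{} {}{} {}", r.method, OLLAMA, r.path, body);
    }
}
```

Running it prints one line per planned request. A real client would send each body with an HTTP library and, for `/api/embed`, read the `embeddings` array from the response. Ollama's `/api/embed` also accepts an array `input`, so the per-text loop described in the entry is a design choice rather than a protocol requirement.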

diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index b6a7a81..7f43fb2 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,6 +1,6 @@
 # STATE OF PLAY — Lakehouse-Go
 
-**Last verified:** 2026-05-02 ~05:50 CDT
+**Last verified:** 2026-05-02 ~06:30 CDT
 **Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
diff --git a/docs/ARCHITECTURE_COMPARISON.md b/docs/ARCHITECTURE_COMPARISON.md
index 4e2c04f..e3545da 100644
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@@ -48,7 +48,7 @@ Don't:
 | 2026-05-01 | Multi-tier load test against 100k corpus | 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. **DONE** (multitier_100k.md). |
 | 2026-05-01 | **coder/hnsw v0.6.1 panic — REAL FIX landed** | Lifted source-of-truth out of coder/hnsw via `i.vectors map[string][]float32` side store + `safeGraphAdd`/`safeGraphDelete` recover wrappers + warm-path rebuild fallback. Re-run: 0 failures across 19,622 scenarios (was 96-98% on 2/6). **DONE.** Architecture invariant in STATE_OF_PLAY "DO NOT RELITIGATE". |
 | 2026-05-02 | **Substrate fix verified at original failure scale** | Re-ran multitier 5min @ conc=50 (the footprint that originally surfaced the bug at 96-98% fail). Result: 132,211 scenarios at 438/sec, **6/6 classes at 0% failure**. Throughput dropped 1,115/sec → 438/sec because broken scenarios now do real HNSW Add work. Tails healthy: surge_fill_validate p99=1.53s, playbook_record_replay p99=2.32s. **Fix scales — closing the open thread.** |
-| _open_ | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
+| 2026-05-02 | **Drop Python sidecar from Rust aibridge — DONE** | `crates/aibridge/src/client.rs` rewrite (commit `ba928b1` in lakehouse). AiClient now talks to Ollama directly: a per-text `/api/embed` loop; `/api/generate` for chat, the rerank loop, and admin (unload/preload); `/api/ps` plus an `nvidia-smi` call via `std::process::Command` for vram_snapshot. Public API unchanged, so no callers needed updating. Verification: `cargo test -p aibridge` 32/32 PASS; live smoke: `/ai/embed` returns a 768-dim vector and `/v1/chat` returns "OK". The sidecar's `lab_ui.py` + `pipeline_lab.py` (~888 LOC of dev-only UIs) keep running; only the hot-path embed/generate/rerank/admin routes are retired. Process count drops from "mega-binary + sidecar" to "mega-binary alone". |
 | 2026-05-02 | **Port Rust materializer to Go (transforms.ts) — DONE** | `internal/materializer` + `cmd/materializer` + `materializer_smoke.sh`. Ports `transforms.ts` (12 transforms) + `build_evidence_index.ts`. Idempotency, day-partition, receipt. 14 tests green; on-wire JSON matches TS so both runtimes interoperate. |
 | 2026-05-02 | **Port Rust replay tool to Go — DONE** | `internal/replay` + `cmd/replay` + `replay_smoke.sh`. Ports `replay.ts` retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same `data/_kb/replay_runs.jsonl` shape (schema=replay_run.v1) as TS. |
 | 2026-05-02 | **`/v1/validate` + `/v1/iterate` HTTP surface — DONE** | `cmd/validatord` (port 3221) hosts both endpoints. `internal/validator` gains `PlaybookValidator` (3rd kind), JSONL roster loader, and the `Iterate` orchestrator + `ExtractJSON` helper. Gateway proxies `/v1/validate` + `/v1/iterate` to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + `validatord_smoke.sh` 5/5 PASS. |
@@ -97,17 +97,24 @@ up). Go also lets you tune per-daemon resource limits via systemd.
 
 ## Python dependency (the load-bearing axis)
 
-This is the architectural difference that drove the original perf gap.
-Both call Ollama at `:11434`, but the path differs:
+This was the architectural difference that drove the original perf gap. Both runtimes call Ollama at `:11434`; what differs is the path in between:
 
 ```
-Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
-Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434
+Pre-2026-05-02:
+  Rust embed:  gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
+  Go embed:    gateway → HTTP → Go embedd      :4216 → HTTP → Ollama :11434
+
+Post-2026-05-02 (commit ba928b1):
+  Rust embed:  gateway → HTTP → Ollama :11434                (sidecar dropped)
+  Go embed:    gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
 ```
 
-The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
-FastAPI wrapper around Ollama. It does pydantic validation + request
-shaping; **no fundamental compute** that Ollama can't do directly.
+The hot-path Python sidecar (~120 LOC across embed.py / generate.py
+/ rerank.py / admin.py) was a pure pass-through to Ollama: it only
+translated request and response shapes and added no logic of its own.
+AiClient was rewritten on 2026-05-02 to call Ollama directly.
+`lab_ui.py` + `pipeline_lab.py` (~888 LOC of Streamlit-style dev UIs)
+remain as ad-hoc tooling and are not on the runtime hot path.
 
 ### Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)
 
@@ -270,7 +277,7 @@ The list below is a working backlog. Move items to "Decisions tracker"
 
 1. ✅ **Embed cache in Rust aibridge** — DONE 2026-05-01 (`150cc3b`).
 2. ✅ **FillValidator + EmailValidator in Go** — DONE 2026-05-01 (`b03521a`).
-3. **Drop Python sidecar from Rust** — Rewrite aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
+3. ✅ **Drop Python sidecar from Rust** — DONE 2026-05-02 (`ba928b1`).
 4. **Cross-runtime contract tests** — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` with Go-side validators consuming the same definitions.
 
 ### If keeping Go primary
@@ -320,6 +327,7 @@ Append entries here when this doc gets updated. One-line entries; link to commit
 - 2026-05-02 — Substrate fix verified at original failure-surfacing scale. Multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput drop (1,115 → 438/sec) is the honest cost of the formerly-broken scenarios doing real HNSW Add work. STATE_OF_PLAY refreshed to 2026-05-02.
 - 2026-05-02 — Materializer + replay tool ported from Rust legacy to Go (`internal/materializer` + `internal/replay`, both with CLI + smoke + tests). Both runtimes now produce the same `data/evidence/YYYY/MM/DD/*.jsonl` and `data/_kb/replay_runs.jsonl` shapes; Go side no longer needs Bun for these phases.
 - 2026-05-02 — `/v1/validate` + `/v1/iterate` HTTP surface ported as `cmd/validatord` on `:3221`. Closes the last "If keeping Go primary" backlog item — Go now owns the entire validator path end-to-end (no Rust dep for staffing safety net). 5/5 smoke probes via gateway :3110.
+- 2026-05-02 — Python sidecar dropped from Rust hot path (`ba928b1`). AiClient now talks to Ollama directly. Process count: mega-binary + sidecar → mega-binary alone. Public API unchanged; no callers needed updating. `cargo test -p aibridge` 32/32 PASS; live `/ai/embed` + `/v1/chat` smokes green on the test gateway. `lab_ui.py` + `pipeline_lab.py` remain as ad-hoc dev tooling.
 
 ---