From afdeca80d97e4a333d9a870f25f130976cdfe608 Mon Sep 17 00:00:00 2001
From: root
Date: Sat, 2 May 2026 05:00:51 -0500
Subject: [PATCH] docs: record Python sidecar drop in architecture comparison
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion to lakehouse commit ba928b1 (aibridge: drop Python sidecar
from hot path; AiClient → direct Ollama).

ARCHITECTURE_COMPARISON.md:
- Decisions tracker: "Drop Python sidecar" moves _open_ → DONE
- "Cross-cutting abstracts to address" item #3 marked complete
- "Python dependency (the load-bearing axis)" section reframed:
  pre/post-2026-05-02 diagram, lab UI / pipeline_lab clarified as
  dev-only Python (not on runtime hot path)
- Change log: new entry

STATE_OF_PLAY: timestamp refresh.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 STATE_OF_PLAY.md                |  2 +-
 docs/ARCHITECTURE_COMPARISON.md | 26 +++++++++++++++++---------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index b6a7a81..7f43fb2 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,6 +1,6 @@
 # STATE OF PLAY — Lakehouse-Go
 
-**Last verified:** 2026-05-02 ~05:50 CDT
+**Last verified:** 2026-05-02 ~06:30 CDT
 **Verified by:** **production-readiness gauntlet** — 21/21 smoke chain green, per-component scrum across 4 bundles, **3 cross-runtime parity probes all green post-fix** (validator: **6/6 match** after wire-format alignment shipped; materializer: 2/2 after omitempty fix; extract_json: 12/12). All findings surfaced by the parity probes have been actioned. Disposition: `reports/cutover/gauntlet_2026-05-02/disposition.md`.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
diff --git a/docs/ARCHITECTURE_COMPARISON.md b/docs/ARCHITECTURE_COMPARISON.md
index 4e2c04f..e3545da 100644
--- a/docs/ARCHITECTURE_COMPARISON.md
+++ b/docs/ARCHITECTURE_COMPARISON.md
@@ -48,7 +48,7 @@ Don't:
 | 2026-05-01 | Multi-tier load test against 100k corpus | 335k scenarios in 5min, 4/6 at 0% fail. Surfaced coder/hnsw v0.6.1 bug. Recover guard added. **DONE** (multitier_100k.md). |
 | 2026-05-01 | **coder/hnsw v0.6.1 panic — REAL FIX landed** | Lifted source-of-truth out of coder/hnsw via `i.vectors map[string][]float32` side store + `safeGraphAdd`/`safeGraphDelete` recover wrappers + warm-path rebuild fallback. Re-run: 0 failures across 19,622 scenarios (was 96-98% on 2/6). **DONE.** Architecture invariant in STATE_OF_PLAY "DO NOT RELITIGATE". |
 | 2026-05-02 | **Substrate fix verified at original failure scale** | Re-ran multitier 5min @ conc=50 (the footprint that originally surfaced the bug at 96-98% fail). Result: 132,211 scenarios at 438/sec, **6/6 classes at 0% failure**. Throughput dropped 1,115/sec → 438/sec because broken scenarios now do real HNSW Add work. Tails healthy: surge_fill_validate p99=1.53s, playbook_record_replay p99=2.32s. **Fix scales — closing the open thread.** |
-| _open_ | Drop Python sidecar from Rust aibridge | Universal-win architectural cleanup. ~200 LOC, removes 1 runtime + 1 process. |
+| 2026-05-02 | **Drop Python sidecar from Rust aibridge — DONE** | `crates/aibridge/src/client.rs` rewrite (commit `ba928b1` in lakehouse). AiClient now talks Ollama directly: per-text `/api/embed` loop, `/api/generate` for chat + rerank-loop + admin (unload/preload), `/api/ps` + `std::process::Command nvidia-smi` for vram_snapshot. Public API unchanged — 0 callers updated. Verification: cargo test -p aibridge 32/32 PASS, live smoke `/ai/embed` returns 768-dim vector + `/v1/chat` returns "OK". Sidecar's `lab_ui.py` + `pipeline_lab.py` (~888 LOC dev-only UIs) keep running; only the hot-path embed/generate/rerank/admin routes are retired. Process count drops from "mega-binary + sidecar" to "mega-binary alone". |
 | 2026-05-02 | **Port Rust materializer to Go (transforms.ts) — DONE** | `internal/materializer` + `cmd/materializer` + `materializer_smoke.sh`. Ports `transforms.ts` (12 transforms) + `build_evidence_index.ts`. Idempotency, day-partition, receipt. 14 tests green; on-wire JSON matches TS so both runtimes interoperate. |
 | 2026-05-02 | **Port Rust replay tool to Go — DONE** | `internal/replay` + `cmd/replay` + `replay_smoke.sh`. Ports `replay.ts` retrieve → bundle → /v1/chat → validate → log. Closes audit-FULL phase 7 live invocation on Go side. 14 tests green; same `data/_kb/replay_runs.jsonl` shape (schema=replay_run.v1) as TS. |
 | 2026-05-02 | **`/v1/validate` + `/v1/iterate` HTTP surface — DONE** | `cmd/validatord` (port 3221) hosts both endpoints. `internal/validator` gains `PlaybookValidator` (3rd kind), JSONL roster loader, and the `Iterate` orchestrator + `ExtractJSON` helper. Gateway proxies `/v1/validate` + `/v1/iterate` to validatord. Closes the last "Go-primary" backlog item (architecture_comparison.md item #7). 30+ tests + `validatord_smoke.sh` 5/5 PASS. |
@@ -97,17 +97,24 @@ up). Go also lets you tune per-daemon resource limits via systemd.
 
 ## Python dependency (the load-bearing axis)
 
-This is the architectural difference that drove the original perf gap.
-Both call Ollama at `:11434`, but the path differs:
+This was the architectural difference that drove the original perf gap.
 
 ```
-Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
-Go embed:   gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
+Pre-2026-05-02:
+  Rust embed: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama :11434
+  Go embed:   gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
+
+Post-2026-05-02 (commit ba928b1):
+  Rust embed: gateway → HTTP → Ollama :11434 (sidecar dropped)
+  Go embed:   gateway → HTTP → Go embedd :4216 → HTTP → Ollama :11434
 ```
 
-The Python sidecar (`sidecar/sidecar/main.py`, 1,237 lines) is a
-FastAPI wrapper around Ollama. It does pydantic validation + request
-shaping; **no fundamental compute** that Ollama can't do directly.
+The hot-path Python sidecar (~120 LOC across embed.py / generate.py
+/ rerank.py / admin.py) was pure pass-through to Ollama and added no
+logic — just translation. AiClient was rewritten 2026-05-02 to call
+Ollama directly. `lab_ui.py` + `pipeline_lab.py` (~888 LOC of
+Streamlit-shape dev UIs) remain as ad-hoc tooling, not on the
+runtime hot path.
 
 ### Performance impact (load-tested 2026-05-01, 6 rotating bodies, 10 concurrency, 30s)
 
@@ -270,7 +277,7 @@ The list below is a working backlog. Move items to "Decisions tracker"
 
 1. ✅ **Embed cache in Rust aibridge** — DONE 2026-05-01 (`150cc3b`).
 2. ✅ **FillValidator + EmailValidator in Go** — DONE 2026-05-01 (`b03521a`).
-3. **Drop Python sidecar from Rust** — Rewrite aibridge to call Ollama at `:11434/api/embed` and `/api/generate` directly. Removes 1 runtime + 1 process from ops. ~200 LOC.
+3. ✅ **Drop Python sidecar from Rust** — DONE 2026-05-02 (`ba928b1`).
 4. **Cross-runtime contract tests** — Pin shared JSONL schemas (audit_baselines, scored_run, sft_sample) as canonical specs in `auditor/schemas/` with Go-side validators consuming the same definitions.
 
 ### If keeping Go primary
@@ -320,6 +327,7 @@ Append entries here when this doc gets updated. One-line entries; link to commit
 - 2026-05-02 — Substrate fix verified at original failure-surfacing scale. Multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput drop (1,115 → 438/sec) is the honest cost of the formerly-broken scenarios doing real HNSW Add work. STATE_OF_PLAY refreshed to 2026-05-02.
 - 2026-05-02 — Materializer + replay tool ported from Rust legacy to Go (`internal/materializer` + `internal/replay`, both with CLI + smoke + tests). Both runtimes now produce the same `data/evidence/YYYY/MM/DD/*.jsonl` and `data/_kb/replay_runs.jsonl` shapes; Go side no longer needs Bun for these phases.
 - 2026-05-02 — `/v1/validate` + `/v1/iterate` HTTP surface ported as `cmd/validatord` on `:3221`. Closes the last "If keeping Go primary" backlog item — Go now owns the entire validator path end-to-end (no Rust dep for staffing safety net). 5/5 smoke probes via gateway :3110.
+- 2026-05-02 — Python sidecar dropped from Rust hot path (`ba928b1`). AiClient now talks Ollama directly. Process count: mega-binary + sidecar → mega-binary alone. Public API unchanged, 0 callers updated. Cargo tests 32/32 PASS, live `/ai/embed` + `/v1/chat` smokes green on test gateway. Lab UI Python remains as ad-hoc dev tooling.
 
 ---