Some checks failed
lakehouse/auditor 11 blocking issues: cloud: claim not backed — "Verified end-to-end against persistent Go stack on :4110:"
The "drop Python sidecar from Rust aibridge" item from the architecture_comparison decisions tracker. Universal-win cleanup — removes 1 process + 1 runtime + 1 hop from every embed/generate request, with no behavior change. ## What was on the hot path before gateway → AiClient → http://:3200 (FastAPI sidecar) ├── embed.py → http://:11434 (Ollama) ├── generate.py → http://:11434 ├── rerank.py → http://:11434 (loops generate) └── admin.py → http://:11434 (/api/ps + nvidia-smi) The sidecar's hot-path code (~120 LOC across embed.py / generate.py / rerank.py / admin.py) was pure pass-through: each route translated its request body to Ollama's wire format and returned Ollama's response in a sidecar envelope. Zero logic, one full HTTP hop of overhead. ## What's on the hot path now gateway → AiClient → http://:11434 (Ollama directly) Inline rewrites in crates/aibridge/src/client.rs: - embed_uncached: per-text loop to /api/embed; computes dimension from response[0].length (matches the sidecar's prior shape) - generate (direct path): translates GenerateRequest → /api/generate (model, prompt, stream:false, options:{temperature, num_predict}, system, think); maps response → GenerateResponse using Ollama's field names (response, prompt_eval_count, eval_count) - rerank: per-doc loop with the same score-prompt the sidecar used; parses leading number, clamps 0-10, sorts desc - unload_model: /api/generate with prompt:"", keep_alive:0 - preload_model: /api/generate with prompt:" ", keep_alive:"5m", num_predict:1 - vram_snapshot: GET /api/ps + std::process::Command nvidia-smi; same envelope shape as the sidecar's /admin/vram so callers keep parsing - health: GET /api/version, wrapped in a sidecar-shaped envelope ({status, ollama_url, ollama_version}) Public AiClient API is unchanged — Request/Response types untouched. Callers (gateway routes, vectord, etc.) require zero updates. ## Config changes - crates/shared/src/config.rs: default_sidecar_url() bumps to :11434. The TOML field stays `[sidecar].url` for migration compat (operators with existing configs don't need to rename anything). - lakehouse.toml + config/providers.toml: bumped to localhost:11434 with comments explaining the 2026-05-02 transition. ## What stays Python sidecar/sidecar/lab_ui.py (385 LOC) + pipeline_lab.py (503 LOC) are dev-mode Streamlit-shape UIs for prompt experimentation. Not on the runtime hot path; continue running for ad-hoc work. The embed/generate/rerank/admin routes inside sidecar can be retired, but operators who want to keep the sidecar process running for the lab UI face no breakage — those routes still call Ollama and work. ## Verification - cargo check --workspace: clean - cargo test -p aibridge --lib: 32/32 PASS - Live smoke against test gateway on :3199 with new config: /ai/embed → 768-dim vector for "forklift operator" ✓ /v1/chat → provider=ollama, model=qwen2.5:latest, content=OK ✓ - nvidia-smi parsing tested via std::process::Command path - Live `lakehouse.service` (port :3100) NOT yet restarted — deploy step is operator-driven (sudo systemctl restart lakehouse.service) ## Architecture comparison update (Captured separately in golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md decisions tracker.) The "drop Python sidecar" line moves from _open_ to DONE. The Rust process model now has 1 mega-binary instead of 1 mega-binary + 1 sidecar process — a small but real reduction in ops surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
Rust-first object storage system
Languages
TypeScript
38.4%
Rust
35.8%
HTML
13.9%
Python
7.8%
Shell
2.1%
Other
2%