The "drop Python sidecar from Rust aibridge" item from the architecture_comparison decisions tracker. Universal-win cleanup — removes 1 process + 1 runtime + 1 hop from every embed/generate request, with no behavior change.

## What was on the hot path before

gateway → AiClient → http://:3200 (FastAPI sidecar)
├── embed.py    → http://:11434 (Ollama)
├── generate.py → http://:11434
├── rerank.py   → http://:11434 (loops generate)
└── admin.py    → http://:11434 (/api/ps + nvidia-smi)

The sidecar's hot-path code (~120 LOC across embed.py / generate.py / rerank.py / admin.py) was pure pass-through: each route translated its request body to Ollama's wire format and returned Ollama's response in a sidecar envelope. Zero logic, one full HTTP hop of overhead.

## What's on the hot path now

gateway → AiClient → http://:11434 (Ollama directly)

Inline rewrites in crates/aibridge/src/client.rs:

- embed_uncached: per-text loop to /api/embed; computes the dimension from response[0].length (matches the sidecar's prior shape)
- generate (direct path): translates GenerateRequest → /api/generate (model, prompt, stream:false, options:{temperature, num_predict}, system, think); maps the response → GenerateResponse using Ollama's field names (response, prompt_eval_count, eval_count)
- rerank: per-doc loop with the same score prompt the sidecar used; parses the leading number, clamps to 0–10, sorts descending
- unload_model: /api/generate with prompt:"", keep_alive:0
- preload_model: /api/generate with prompt:" ", keep_alive:"5m", num_predict:1
- vram_snapshot: GET /api/ps + std::process::Command nvidia-smi; same envelope shape as the sidecar's /admin/vram so callers keep parsing
- health: GET /api/version, wrapped in a sidecar-shaped envelope ({status, ollama_url, ollama_version})

Public AiClient API is unchanged — Request/Response types untouched. Callers (gateway routes, vectord, etc.) require zero updates.

## Config changes

- crates/shared/src/config.rs: default_sidecar_url() bumps to :11434. The TOML field stays `[sidecar].url` for migration compat (operators with existing configs don't need to rename anything).
- lakehouse.toml + config/providers.toml: bumped to localhost:11434, with comments explaining the 2026-05-02 transition.

## What stays Python

sidecar/sidecar/lab_ui.py (385 LOC) + pipeline_lab.py (503 LOC) are dev-mode Streamlit-style UIs for prompt experimentation. Not on the runtime hot path; they keep running for ad-hoc work. The embed/generate/rerank/admin routes inside the sidecar can be retired, but operators who want to keep the sidecar process running for the lab UI face no breakage — those routes still call Ollama and work.

## Verification

- cargo check --workspace: clean
- cargo test -p aibridge --lib: 32/32 PASS
- Live smoke against the test gateway on :3199 with the new config:
  - /ai/embed → 768-dim vector for "forklift operator" ✓
  - /v1/chat → provider=ollama, model=qwen2.5:latest, content=OK ✓
- nvidia-smi parsing tested via the std::process::Command path
- Live `lakehouse.service` (port :3100) NOT yet restarted — the deploy step is operator-driven (sudo systemctl restart lakehouse.service)

## Architecture comparison update

(Captured separately in the golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md decisions tracker.) The "drop Python sidecar" line moves from _open_ to DONE. The Rust process model now has 1 mega-binary instead of 1 mega-binary + 1 sidecar process — a small but real reduction in ops surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
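Of the inline rewrites above, only rerank has nontrivial parsing logic (parse the leading number, clamp to 0–10, sort descending). A minimal sketch of just that pure part, with illustrative names — the real code lives in crates/aibridge/src/client.rs and is not reproduced here:

```rust
/// Parse the leading number of a model reply and clamp to 0.0..=10.0.
/// Treating a non-numeric reply as 0.0 is an assumption; the commit
/// message only says "parses leading number, clamps 0-10".
fn parse_score(reply: &str) -> f64 {
    let lead: String = reply
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit() || *c == '.')
        .collect();
    lead.parse::<f64>().unwrap_or(0.0).clamp(0.0, 10.0)
}

/// Pair each document with the score parsed from its model reply and
/// sort best-first (the "sorts descending" step).
fn rank(scored_replies: Vec<(String, String)>) -> Vec<(String, f64)> {
    let mut scored: Vec<(String, f64)> = scored_replies
        .into_iter()
        .map(|(doc, reply)| (doc, parse_score(&reply)))
        .collect();
    // Scores are clamped finite floats, so partial_cmp never fails here.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}

fn main() {
    let ranked = rank(vec![
        ("doc_a".into(), "7.5 relevant".into()),
        ("doc_b".into(), "42".into()),       // clamped to 10.0
        ("doc_c".into(), "no score".into()), // parses as 0.0
    ]);
    assert_eq!(ranked[0].0, "doc_b");
    println!("{:?}", ranked);
}
```

The per-doc HTTP loop to /api/generate wraps around this; keeping the parse/clamp/sort step pure is what makes it unit-testable without a live Ollama.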
# Phase 39: Provider Registry
#
# Per-provider base_url, auth scheme, and default model. The gateway's
# /v1/chat dispatcher reads this file at boot to populate its provider
# table. Secrets (API keys) come from /etc/lakehouse/secrets.toml or
# environment variables — NEVER inline a key here.
#
# Adding a new provider:
#   1. New [[provider]] block with name, base_url, auth, default_model
#   2. Matching adapter at crates/aibridge/src/providers/<name>.rs
#      implementing the ProviderAdapter trait (chat + embed + unload)
#   3. Route arm in crates/gateway/src/v1/mod.rs matching on `name`
#   4. Model-prefix routing hint in resolve_provider() if the provider
#      uses an "<name>/..." model prefix (e.g. "openrouter/...")

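Step 2 of the checklist above names a ProviderAdapter trait. A hedged sketch of what such a trait might look like — only the method names (chat, embed, unload) come from the comment; every type and signature here is an invented stand-in, not the real crates/aibridge API:

```rust
// Hypothetical shape of the ProviderAdapter trait referenced above.
// Request/response types are placeholders for illustration only.
struct ChatRequest { model: String, prompt: String }
struct ChatResponse { content: String }

trait ProviderAdapter {
    fn name(&self) -> &str;
    fn chat(&self, req: ChatRequest) -> Result<ChatResponse, String>;
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
    fn unload(&self, model: &str) -> Result<(), String>;
}

/// Stub standing in for crates/aibridge/src/providers/ollama.rs.
struct OllamaAdapter;

impl ProviderAdapter for OllamaAdapter {
    fn name(&self) -> &str { "ollama" }
    fn chat(&self, req: ChatRequest) -> Result<ChatResponse, String> {
        // A real adapter would POST to base_url + "/api/generate".
        Ok(ChatResponse { content: format!("echo: {}", req.prompt) })
    }
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        // 768 dims, matching the embed smoke test mentioned elsewhere.
        Ok(vec![0.0; 768])
    }
    fn unload(&self, _model: &str) -> Result<(), String> { Ok(()) }
}

fn main() {
    let a = OllamaAdapter;
    assert_eq!(a.name(), "ollama");
    assert_eq!(a.embed("x").unwrap().len(), 768);
}
```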
[[provider]]
name = "ollama"
base_url = "http://localhost:11434"
auth = "none"
default_model = "qwen3.5:latest"
# Hot-path local inference. No bearer needed — direct to Ollama as of
# 2026-05-02 (Python sidecar's pass-through wrapper retired). Model
# names are bare (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").

[[provider]]
name = "ollama_cloud"
base_url = "https://ollama.com"
auth = "bearer"
auth_env = "OLLAMA_CLOUD_KEY"
default_model = "deepseek-v3.2"
# Cloud-tier Ollama (Pro plan as of 2026-04-28). Key resolved from
# OLLAMA_CLOUD_KEY at gateway boot; the Pro tier upgraded the account,
# so rate limits + model access widen without a key change.
# Model-prefix routing: "cloud/<model>" auto-routes here.
# The 39-model fleet now includes deepseek-v3.2, deepseek-v4-{flash,pro},
# gemini-3-flash-preview, glm-{5,5.1}, kimi-k2.6, qwen3-coder-next.
# 2026-04-28: default upgraded gpt-oss:120b → deepseek-v3.2 (newest
# DeepSeek revision; kimi-k2:1t still upstream-broken with HTTP 500).

[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
default_model = "x-ai/grok-4.1-fast"
# Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
# Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
# resolve_openrouter_key() — env first, then fallback files.
# Model-prefix routing: "openrouter/<vendor>/<model>" auto-routes here,
# prefix stripped before the upstream call.
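Several provider comments in this file describe model-prefix routing via resolve_provider(). A sketch of the dispatch it implies — the prefix/provider pairs come from the comments, but the function signature and the fall-through to bare-name local Ollama are assumptions:

```rust
/// Split "prefix/rest" into (provider name, upstream model name).
/// Bare model names with no known prefix route to the local "ollama"
/// provider unchanged (assumption based on "model names are bare").
fn resolve_provider(model: &str) -> (&'static str, &str) {
    for (prefix, provider) in [
        ("cloud/", "ollama_cloud"),
        ("openrouter/", "openrouter"),
        ("opencode/", "opencode"),
        ("kimi/", "kimi"),
    ] {
        if let Some(rest) = model.strip_prefix(prefix) {
            // Prefix stripped before the upstream call, per the comments.
            return (provider, rest);
        }
    }
    ("ollama", model)
}

fn main() {
    assert_eq!(
        resolve_provider("cloud/deepseek-v3.2"),
        ("ollama_cloud", "deepseek-v3.2")
    );
    assert_eq!(
        resolve_provider("openrouter/x-ai/grok-4.1-fast"),
        ("openrouter", "x-ai/grok-4.1-fast")
    );
    assert_eq!(resolve_provider("qwen3.5:latest"), ("ollama", "qwen3.5:latest"));
}
```

Keeping the prefix table in one place means step 4 of the "Adding a new provider" checklist is a one-line change.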

[[provider]]
name = "opencode"
base_url = "https://opencode.ai/zen/v1"
# Unified endpoint — covers BOTH Zen (pay-per-token Anthropic/OpenAI/
# Gemini frontier) AND Go (flat-sub Kimi/GLM/DeepSeek/Qwen/Minimax).
# Upstream bills per-model: Zen models hit the Zen balance, Go models
# hit the Go subscription cap. /zen/go/v1 is the Go-only sub-path
# (rejects Zen models), kept for reference but not used by this provider.
auth = "bearer"
auth_env = "OPENCODE_API_KEY"
default_model = "claude-opus-4-7"
# OpenCode (Zen + Go unified endpoint). One sk-* key reaches Claude
# Opus 4.7, GPT-5.5-pro, Gemini 3.1-pro, Kimi K2.6, DeepSeek, GLM,
# Qwen, plus 4 free-tier models. OpenAI-compatible Chat Completions
# at /v1/chat/completions. Model-prefix routing: "opencode/<name>"
# auto-routes here, prefix stripped before the upstream call.
# Key file: /etc/lakehouse/opencode.env (loaded via systemd EnvironmentFile).
# Model catalog: curl -H "Authorization: Bearer ..." https://opencode.ai/zen/v1/models

[[provider]]
name = "kimi"
base_url = "https://api.kimi.com/coding/v1"
auth = "bearer"
auth_env = "KIMI_API_KEY"
default_model = "kimi-for-coding"
# Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account
# system from `api.moonshot.ai` and `api.moonshot.cn` — keys are NOT
# interchangeable. Used when Ollama Cloud's `kimi-k2:1t` is
# upstream-broken and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
# Model id: `kimi-for-coding` (kimi-k2.6 underneath).
# Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile).
# Model-prefix routing: "kimi/<model>" auto-routes here, prefix stripped.

# Planned (Phase 40 long-horizon — adapters not yet shipped):
#
# [[provider]]
# name = "gemini"
# base_url = "https://generativelanguage.googleapis.com/v1beta"
# auth = "api_key_query"
# auth_env = "GEMINI_API_KEY"
# default_model = "gemini-2.0-flash"
#
# [[provider]]
# name = "claude"
# base_url = "https://api.anthropic.com/v1"
# auth = "x_api_key"
# auth_env = "ANTHROPIC_API_KEY"
# default_model = "claude-3-5-sonnet-latest"