Four findings deferred from the 2026-05-02 scrum, all 1-5 line fixes:

W1 (kimi WARN @ scrum_master_pipeline.ts:1143): `gemini-3-flash-preview` hardcoded twice in MAP and REDUCE phases. Extracted TREE_SPLIT_MODEL + TREE_SPLIT_PROVIDER constants near the existing config block. Diverging the two would break tree-split coherence (per-shard digests must come from the same model the reducer collapses).

W2 (qwen WARN @ providers.toml:30): stale `kimi-k2:1t` reference in operator-facing comments after PR #13 noted it's upstream-broken. Reframed as historical context ("was X here pre-2026-05-03, that model is broken") so future operators don't paste-route from the comment.

W3 (opus WARN @ vectord-lance/src/lib.rs:622): temp_path() entropy was only pid+nanos, which collide under tokio scheduling when multiple tests in the same cargo process create temp dirs back-to-back. Added per-process AtomicU64 sequence counter, which guarantees uniqueness regardless of clock.

W4 (opus INFO @ scripts/lance_smoke.sh:38): `|| echo '{}'` swallowed curl transport failures (gateway down, network broken, timeout), surfacing as misleading "no method field" jq errors at the next probe. Now captures $? separately, gates a "curl reachable" probe, and only falls back to empty body for the dependent jq parse. Smoke went 9 → 10 probes.

Verified: vectord-lance 7/7 tests PASS, gateway cargo check clean, lance_smoke.sh 10/10 PASS against live gateway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
107 lines
4.6 KiB
TOML
# Phase 39: Provider Registry
#
# Per-provider base_url, auth scheme, and default model. The gateway's
# /v1/chat dispatcher reads this file at boot to populate its provider
# table. Secrets (API keys) come from /etc/lakehouse/secrets.toml or
# environment variables. NEVER inline a key here.
#
# Adding a new provider:
#   1. New [[provider]] block with name, base_url, auth, default_model
#   2. Matching adapter at crates/aibridge/src/providers/<name>.rs
#      implementing the ProviderAdapter trait (chat + embed + unload)
#   3. Route arm in crates/gateway/src/v1/mod.rs matching on `name`
#   4. Model-prefix routing hint in resolve_provider() if the provider
#      uses an "<name>/..." model prefix (e.g. "openrouter/...")
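#
# Sketch of step 1 for a hypothetical provider called "example". The
# name, base_url, env var, and model id below are illustrative
# placeholders, not a shipped provider; the field set simply mirrors the
# bearer-auth blocks in this file. As with the planned blocks at the end
# of this file, it presumably stays commented out until the adapter and
# route arm from steps 2-3 exist.
#
# [[provider]]
# name = "example"                          # hypothetical provider name
# base_url = "https://api.example.com/v1"   # placeholder URL
# auth = "bearer"
# auth_env = "EXAMPLE_API_KEY"              # assumed env var, key never inline
# default_model = "example-model"           # placeholder model id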

[[provider]]
name = "ollama"
base_url = "http://localhost:11434"
auth = "none"
default_model = "qwen3.5:latest"
# Hot-path local inference. No bearer needed: direct to Ollama as of
# 2026-05-02 (Python sidecar's pass-through wrapper retired). Model
# names are bare (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").

[[provider]]
name = "ollama_cloud"
base_url = "https://ollama.com"
auth = "bearer"
auth_env = "OLLAMA_CLOUD_KEY"
default_model = "deepseek-v3.2"
# Cloud-tier Ollama (Pro plan as of 2026-04-28). Key resolved from
# OLLAMA_CLOUD_KEY at gateway boot; Pro tier upgraded the account so
# rate limits + model access widen without a key change. Model-prefix
# routing: "cloud/<model>" auto-routes here. 39-model fleet now
# includes deepseek-v3.2, deepseek-v4-{flash,pro},
# gemini-3-flash-preview, glm-{5,5.1}, kimi-k2.6, qwen3-coder-next.
# 2026-04-28: default upgraded gpt-oss:120b → deepseek-v3.2 (newest
# DeepSeek revision). NOTE: kimi-k2:1t is upstream-broken (HTTP 500
# on Ollama Pro probe 2026-04-28); do not route to it. Use kimi-k2.6
# instead, which is what staffing_inference points at.

[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
default_model = "x-ai/grok-4.1-fast"
# Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
# Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
# resolve_openrouter_key(): env first, then fallback files.
# Model-prefix routing: "openrouter/<vendor>/<model>" auto-routes here,
# prefix stripped before upstream call.

[[provider]]
name = "opencode"
base_url = "https://opencode.ai/zen/v1"
# Unified endpoint covering BOTH Zen (pay-per-token Anthropic/OpenAI/
# Gemini frontier) AND Go (flat-sub Kimi/GLM/DeepSeek/Qwen/MiniMax).
# Upstream bills per-model: Zen models hit Zen balance, Go models hit
# Go subscription cap. /zen/go/v1 is the Go-only sub-path (rejects
# Zen models such as Anthropic), kept for reference but not used by
# this provider.
auth = "bearer"
auth_env = "OPENCODE_API_KEY"
default_model = "claude-opus-4-7"
# OpenCode (Zen + Go unified endpoint). One sk-* key reaches Claude
# Opus 4.7, GPT-5.5-pro, Gemini 3.1-pro, Kimi K2.6, DeepSeek, GLM,
# Qwen, plus 4 free-tier models. OpenAI-compatible Chat Completions
# at /v1/chat/completions. Model-prefix routing: "opencode/<name>"
# auto-routes here, prefix stripped before upstream call.
# Key file: /etc/lakehouse/opencode.env (loaded via systemd EnvironmentFile).
# Model catalog: curl -H "Authorization: Bearer ..." https://opencode.ai/zen/v1/models

[[provider]]
name = "kimi"
base_url = "https://api.kimi.com/coding/v1"
auth = "bearer"
auth_env = "KIMI_API_KEY"
default_model = "kimi-for-coding"
# Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account
# system from `api.moonshot.ai` and `api.moonshot.cn`; keys are NOT
# interchangeable. Used as a fallback when Ollama Cloud's kimi-k2.6 is
# unavailable and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
# (Was `kimi-k2:1t` here pre-2026-05-03; that model is upstream-broken
# and removed from operator guidance.)
# Model id: `kimi-for-coding` (kimi-k2.6 underneath).
# Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile).
# Model-prefix routing: "kimi/<model>" auto-routes here, prefix stripped.

# Planned (Phase 40 long-horizon; adapters not yet shipped):
#
# [[provider]]
# name = "gemini"
# base_url = "https://generativelanguage.googleapis.com/v1beta"
# auth = "api_key_query"
# auth_env = "GEMINI_API_KEY"
# default_model = "gemini-2.0-flash"
#
# [[provider]]
# name = "claude"
# base_url = "https://api.anthropic.com/v1"
# auth = "x_api_key"
# auth_env = "ANTHROPIC_API_KEY"
# default_model = "claude-3-5-sonnet-latest"