Wires kimi-for-coding (Kimi K2.6 underneath) as a first-class /v1/chat
provider so consumers can target it via {provider:"kimi"} or model
prefix kimi/<model>. Bypasses the upstream-broken kimi-k2:1t on Ollama
Cloud and the rate-limited moonshotai/kimi-k2.6 path through OpenRouter.
Adapter shape mirrors openrouter.rs (OpenAI-compatible Chat Completions).
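The model-prefix routing can be sketched as follows. This is a hypothetical stand-in for the gateway's actual `resolve_provider()`, with the prefix-to-provider pairs assumed from this commit's registry, not copied from the real code:

```rust
/// Hypothetical sketch of model-prefix routing (not the actual gateway
/// code): map "kimi/<model>" to the kimi provider, strip the prefix,
/// and fall back to local ollama for bare model names.
fn resolve_provider(model: &str) -> (&'static str, String) {
    // (prefix, provider) pairs; "cloud/" maps to ollama_cloud.
    let routes = [
        ("kimi/", "kimi"),
        ("cloud/", "ollama_cloud"),
        ("openrouter/", "openrouter"),
    ];
    for (prefix, provider) in routes {
        if let Some(rest) = model.strip_prefix(prefix) {
            return (provider, rest.to_string());
        }
    }
    // Bare names (e.g. "qwen3.5:latest") stay on local Ollama.
    ("ollama", model.to_string())
}
```

Note that for openrouter the vendor segment survives the strip ("openrouter/openai/gpt-oss-120b" forwards "openai/gpt-oss-120b" upstream), matching the registry comments.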
Differences from generic OpenAI providers:
- api.kimi.com is a SEPARATE account system from api.moonshot.ai and
api.moonshot.cn. sk-kimi-* keys are NOT interchangeable across them.
- Endpoint is User-Agent-gated to "approved" coding agents (Kimi CLI,
Claude Code, Roo Code, Kilo Code, ...). Requests from generic clients
return 403 access_terminated_error. Adapter sends User-Agent:
claude-code/1.0.0. Per Moonshot TOS this is a tampering-class action
that may result in seat suspension; J authorized 2026-04-27 with
awareness of the risk.
- kimi-for-coding is a reasoning model — reasoning_content counts
against max_tokens. Default 800-token budget yields empty visible
content with finish_reason=length. Code-review workloads need
max_tokens >= 1500.
- Default 600s upstream timeout (vs 180s for openrouter.rs) — code
audits with full file context legitimately take 3-5 minutes.
Override via KIMI_TIMEOUT_SECS env.
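A request shaped for those constraints might look like the following; field values are illustrative, and the body shape is the standard OpenAI-compatible Chat Completions form the adapter mirrors:

```json
{
  "provider": "kimi",
  "model": "kimi-for-coding",
  "messages": [
    {"role": "user", "content": "Audit this diff for unchecked unwraps."}
  ],
  "max_tokens": 2000
}
```

Anything under ~1500 `max_tokens` risks the reasoning budget swallowing the visible answer (finish_reason=length with empty content).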
Key handling:
- /etc/lakehouse/kimi.env (0600 root) loaded via systemd EnvironmentFile
- KIMI_API_KEY env first, then file scrape as fallback
- /etc/systemd/system/lakehouse.service NOT included in this commit
(system file outside repo); operator must add EnvironmentFile=-
/etc/lakehouse/kimi.env to the lakehouse.service unit
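The env-first, file-fallback order can be sketched like this. `resolve_key` is a hypothetical helper, not the adapter's actual code, and it assumes a dotenv-style `KEY=value` file:

```rust
use std::{env, fs};

/// Hypothetical sketch of the key lookup order described above:
/// the environment variable wins, then a dotenv-style file
/// (KEY=value lines) is scraped as a fallback.
fn resolve_key(var: &str, fallback_file: &str) -> Option<String> {
    if let Ok(v) = env::var(var) {
        if !v.trim().is_empty() {
            return Some(v);
        }
    }
    // Fallback: scan the file for a `VAR=...` line.
    let text = fs::read_to_string(fallback_file).ok()?;
    text.lines()
        .filter_map(|line| line.split_once('='))
        .find(|(k, _)| k.trim() == var)
        .map(|(_, v)| v.trim().trim_matches('"').to_string())
}
```

Once the operator adds `EnvironmentFile=-/etc/lakehouse/kimi.env` to the unit, the env path is the common case and the file scrape only matters on hosts where that unit edit has not landed.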
NOT in the scrum_master_pipeline LADDER. The 9-rung ladder is for
unattended automatic recovery; placing Kimi there would hammer a
TOS-gated endpoint, risking a hostility-policy response. Kimi is
addressable via /v1/chat for explicit invocations only; auditor
integration lands in a follow-up commit.
Verification:
- cargo check -p gateway --tests compiles
- curl /v1/chat provider=kimi: 200 OK, content="PONG"
- curl /v1/chat model="kimi/kimi-for-coding": 200 OK (prefix routing)
- Kimi audit of last week's distillation: 7/7 grounded findings
  (reports/kimi/audit-last-week-full.md)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
77 lines · 2.9 KiB · TOML
# Phase 39: Provider Registry
#
# Per-provider base_url, auth scheme, and default model. The gateway's
# /v1/chat dispatcher reads this file at boot to populate its provider
# table. Secrets (API keys) come from /etc/lakehouse/secrets.toml or
# environment variables — NEVER inline a key here.
#
# Adding a new provider:
#   1. New [[provider]] block with name, base_url, auth, default_model
#   2. Matching adapter at crates/aibridge/src/providers/<name>.rs
#      implementing the ProviderAdapter trait (chat + embed + unload)
#   3. Route arm in crates/gateway/src/v1/mod.rs matching on `name`
#   4. Model-prefix routing hint in resolve_provider() if the provider
#      uses an "<name>/..." model prefix (e.g. "openrouter/...")

[[provider]]
name = "ollama"
base_url = "http://localhost:3200"
auth = "none"
default_model = "qwen3.5:latest"
# Hot-path local inference. No bearer needed — Python sidecar on
# localhost handles the Ollama API. Model names are bare
# (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").

[[provider]]
name = "ollama_cloud"
base_url = "https://ollama.com"
auth = "bearer"
auth_env = "OLLAMA_CLOUD_KEY"
default_model = "gpt-oss:120b"
# Cloud-tier Ollama. Key resolved from OLLAMA_CLOUD_KEY env at gateway
# boot. Model-prefix routing: "cloud/<model>" auto-routes here
# (see gateway::v1::resolve_provider).

[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
default_model = "openai/gpt-oss-120b:free"
# Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
# Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
# resolve_openrouter_key() — env first, then fallback files.
# Model-prefix routing: "openrouter/<vendor>/<model>" auto-routes here,
# prefix stripped before upstream call.

[[provider]]
name = "kimi"
base_url = "https://api.kimi.com/coding/v1"
auth = "bearer"
auth_env = "KIMI_API_KEY"
default_model = "kimi-for-coding"
# Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account
# system from `api.moonshot.ai` and `api.moonshot.cn` — keys are NOT
# interchangeable. Used when Ollama Cloud's `kimi-k2:1t` is upstream-
# broken and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited.
# Model id: `kimi-for-coding` (kimi-k2.6 underneath).
# Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile).
# Model-prefix routing: "kimi/<model>" auto-routes here, prefix stripped.

# Planned (Phase 40 long-horizon — adapters not yet shipped):
#
# [[provider]]
# name = "gemini"
# base_url = "https://generativelanguage.googleapis.com/v1beta"
# auth = "api_key_query"
# auth_env = "GEMINI_API_KEY"
# default_model = "gemini-2.0-flash"
#
# [[provider]]
# name = "claude"
# base_url = "https://api.anthropic.com/v1"
# auth = "x_api_key"
# auth_env = "ANTHROPIC_API_KEY"
# default_model = "claude-3-5-sonnet-latest"