new cmd/chatd on :3220 routes /v1/chat to the right provider based
on model-name prefix or :cloud suffix. closes the gap left by
lakehouse.toml [models]: tiers map to model IDs, but until phase 4
there was no service that could actually CALL those models from Go.
routing rules (registry.Resolve), sketched below:
ollama/<m> → local Ollama (prefix stripped)
ollama_cloud/<m> → Ollama Cloud
<m>:cloud → Ollama Cloud (suffix variant — kimi-k2.6:cloud)
openrouter/<v>/<m> → OpenRouter (prefix stripped, OpenAI-compat)
opencode/<m> → OpenCode unified Zen+Go
kimi/<m> → Kimi For Coding (api.kimi.com/coding/v1)
bare names → local Ollama (default)
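
a minimal sketch of that order, assuming a providers map and a
(Provider, model, error) return shape (the real registry.Resolve may
differ):

import "strings"

type Registry struct{ providers map[string]Provider }

// Resolve sketches the dispatch order above: known prefixes win, then
// the :cloud suffix, then everything else falls through to the local
// default with the model name untouched (no prefix rewrite).
func (r *Registry) Resolve(model string) (Provider, string, error) {
	for _, name := range []string{"ollama", "ollama_cloud", "openrouter", "opencode", "kimi"} {
		if rest, ok := strings.CutPrefix(model, name+"/"); ok {
			p, registered := r.providers[name]
			if !registered {
				return nil, "", ErrProviderNotFound
			}
			return p, rest, nil // prefix stripped for the upstream call
		}
	}
	if strings.HasSuffix(model, ":cloud") {
		p, registered := r.providers["ollama_cloud"]
		if !registered {
			return nil, "", ErrProviderNotFound // no silent fall-through
		}
		return p, model, nil // suffix stays: it is part of the cloud model ID
	}
	p, registered := r.providers["ollama"]
	if !registered {
		return nil, "", ErrProviderNotFound
	}
	return p, model, nil // bare names (and unknown prefixes) → local default
}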
provider implementations:
- internal/chat/types.go Provider interface, Request/Response, errors
- internal/chat/registry.go prefix + :cloud suffix dispatch
- internal/chat/ollama.go local Ollama via /api/chat (think=false default)
- internal/chat/ollama_cloud.go Ollama Cloud via /api/generate (Bearer auth)
- internal/chat/openai_compat.go shared OpenAI Chat Completions for the
OpenRouter/OpenCode/Kimi family
- internal/chat/builder.go BuildRegistry from BuilderInput;
ResolveKey reads the env var first, then falls back to a .env file
(sketched below)
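
ResolveKey's lookup, roughly (names and signature are illustrative):

import (
	"os"
	"strings"
)

// resolveKey returns the value of envName from the process environment,
// falling back to an envName=value line in a .env-style file.
func resolveKey(envName, envFile string) string {
	if v := os.Getenv(envName); v != "" {
		return v
	}
	data, err := os.ReadFile(envFile) // e.g. /etc/lakehouse/openrouter.env
	if err != nil {
		return "" // provider stays disabled → ErrProviderDisabled at dispatch
	}
	for _, line := range strings.Split(string(data), "\n") {
		if v, ok := strings.CutPrefix(strings.TrimSpace(line), envName+"="); ok {
			return strings.Trim(v, `"`)
		}
	}
	return ""
}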
config:
- ChatdConfig in internal/shared/config.go with bind, ollama_url,
per-provider key env names + .env fallback paths, timeout (rough
shape sketched below)
- Gateway gains chatd_url + /v1/chat + /v1/chat/* routes
- lakehouse.toml [chatd] block with /etc/lakehouse/<provider>.env defaults
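
rough shape of ChatdConfig (field names and toml keys are guesses from
the description above, not the actual struct):

// ChatdConfig sketch; fields are illustrative, not the real definition.
type ChatdConfig struct {
	Bind      string `toml:"bind"`       // e.g. ":3220"
	OllamaURL string `toml:"ollama_url"` // local Ollama base URL
	Timeout   int    `toml:"timeout"`    // per-call timeout, seconds
	// per-provider auth: env var name + .env fallback path, keyed by
	// provider name ("openrouter", "opencode", "kimi", "ollama_cloud")
	Keys map[string]KeySource `toml:"keys"`
}

type KeySource struct {
	Env     string `toml:"env"`      // e.g. "OPENROUTER_API_KEY"
	EnvFile string `toml:"env_file"` // e.g. "/etc/lakehouse/openrouter.env"
}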
tests (19 in internal/chat):
- registry: prefix + :cloud + errors + telemetry + provider listing
(flavor sketched after this list)
- ollama: happy path + prefix strip + format=json + 500 mapping +
flatten_messages
- openai_compat: happy path + format=json + 429 mapping + zero-choices
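
flavor of the registry tests, compressed into one table-driven sketch
(NewRegistry / Register / Resolve spellings are assumptions):

import (
	"context"
	"errors"
	"testing"
)

type stub struct{ name string }

func (s stub) Name() string    { return s.name }
func (s stub) Available() bool { return true }
func (s stub) Chat(_ context.Context, req Request) (*Response, error) {
	return &Response{Model: req.Model, Provider: s.name}, nil
}

func TestResolveSketch(t *testing.T) {
	r := NewRegistry()
	r.Register(stub{name: "ollama"}) // dev mode: one provider
	for _, tc := range []struct {
		model, want string
		err         error
	}{
		{"qwen3.5:latest", "ollama", nil},            // bare name → default
		{"ollama/qwen3.5:latest", "ollama", nil},     // explicit prefix
		{"kimi-k2.6:cloud", "", ErrProviderNotFound}, // ollama_cloud absent → 404
	} {
		p, _, err := r.Resolve(tc.model)
		if !errors.Is(err, tc.err) {
			t.Fatalf("%s: err = %v, want %v", tc.model, err, tc.err)
		}
		if err == nil && p.Name() != tc.want {
			t.Fatalf("%s: provider = %s, want %s", tc.model, p.Name(), tc.want)
		}
	}
}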
think=false default in ollama + ollama_cloud — the local hot path
skips reasoning, so low-budget callers (the playbook_lift judge at
max_tokens=10) get direct answers instead of empty content +
done_reason=length. proven via chatd_smoke acceptance.
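
wire-level, that means the local /api/chat body carries an explicit
think flag; a trimmed sketch of the fields chatd populates (not the
full upstream schema):

// ollamaChatRequest mirrors the slice of Ollama's /api/chat body that
// chatd fills in; a sketch, not the complete upstream request type.
type ollamaChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`           // always false: chatd is synchronous
	Think    bool      `json:"think"`            // false by default: skip the reasoning pass
	Format   string    `json:"format,omitempty"` // "json" → constrained decoding
}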
acceptance gate: scripts/chatd_smoke.sh — 6/6 PASS:
1. /v1/chat/providers lists exactly registered providers (1 in dev mode)
2. bare model → ollama default with content + token counts + latency
(shown as a Go call below)
3. explicit ollama/<m> → prefix stripped at upstream
4. <m>:cloud without ollama_cloud registered → 404 (no silent fall-through)
5. unknown/<m> → falls through to default → upstream 502 (no prefix rewrite)
6. missing model field → 400
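
smoke step 2 as a Go call, for flavor (assumes the /v1/chat wire shape
matches the Request/Response types shown below):

import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
)

// chatOnce posts one bare-model request straight to chatd on :3220.
func chatOnce(ctx context.Context) (*Response, error) {
	body, _ := json.Marshal(Request{
		Model:    "qwen3.5:latest", // bare name → local ollama default
		Messages: []Message{{Role: "user", Content: "say ok"}},
	})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"http://localhost:3220/v1/chat", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out Response // expect content + token counts + latency_ms
	return &out, json.NewDecoder(resp.Body).Decode(&out)
}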
just verify: PASS (vet + 30 packages × short tests + 9 smokes).
chatd_smoke is a domain smoke (not part of just verify; mirrors the
matrix / observer / pathway pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
internal/chat/types.go (105 lines, 4.2 KiB, Go):
// Package chat provides the LLM chat abstraction over multiple
// providers (local Ollama, Ollama Cloud, OpenRouter, OpenCode, Kimi).
//
// Architecture per docs/SPEC.md §1 (chatd) + the small-model pipeline
// vision (project_small_model_pipeline_vision.md):
//
// - Hot path runs on local Ollama via cheap `qwen3.5:latest` repeats.
// - Oversight tier reaches Ollama Cloud (Pro plan) for tasks needing
//   more capacity (kimi-k2.6:cloud, qwen3-coder:480b, deepseek-v3.2).
// - Frontier escalation goes through OpenRouter (claude-opus-4-7,
//   gpt-5, kimi-k2-0905) or OpenCode (free-tier opus-4-7) for
//   blockers that need full-scope reasoning.
//
// Provider resolution is by model-name prefix (mirroring the Rust
// gateway's pattern from crates/gateway/src/v1/mod.rs):
//
//	ollama/<model>          → ollama (or bare names like "qwen3.5:latest")
//	ollama_cloud/<model>    → ollama_cloud
//	openrouter/<vendor>/<m> → openrouter
//	opencode/<model>        → opencode
//	kimi/<model>            → kimi
//
// The prefix is stripped before the upstream call.
package chat

import (
	"context"
	"errors"
)

// Sentinel errors for the upper layers (gateway response codes,
// observability, retry decisions).
var (
	// ErrProviderNotFound — model name didn't resolve to any registered
	// provider. Maps to 404 at the HTTP layer.
	ErrProviderNotFound = errors.New("provider not found for model")

	// ErrProviderDisabled — the provider was registered but its key
	// resolved empty. Maps to 503 at the HTTP layer (operator can fix
	// by setting the env var; not a client bug).
	ErrProviderDisabled = errors.New("provider disabled (no auth key)")

	// ErrUpstream — the provider returned a non-2xx. Body included in
	// the wrapped error message. Maps to 502.
	ErrUpstream = errors.New("provider upstream error")

	// ErrTimeout — provider call exceeded the configured timeout.
	// Maps to 504.
	ErrTimeout = errors.New("provider timeout")
)

// Message is one entry in a Chat request's conversation. Role is
// "system", "user", or "assistant" — matches OpenAI/Anthropic shapes.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Request is the provider-neutral chat request. Providers translate
// to their wire format (OpenAI Chat Completions, Ollama /api/chat,
// Ollama Cloud /api/generate, etc.). Format="json" asks the provider
// to return JSON-only output (constrained decoding when supported).
type Request struct {
	Model       string    `json:"model"`
	Messages    []Message `json:"messages"`
	Temperature float64   `json:"temperature"`
	MaxTokens   int       `json:"max_tokens,omitempty"`
	Format      string    `json:"format,omitempty"` // "" or "json"
	Stream      bool      `json:"stream,omitempty"` // ignored in G0 — chatd is synchronous
}

// Response is the provider-neutral chat response. LatencyMs is filled
// by the dispatcher (Provider implementations don't track it).
// Provider names the resolved provider for telemetry / debug.
type Response struct {
	Model        string `json:"model"`
	Content      string `json:"content"`
	InputTokens  int    `json:"input_tokens,omitempty"`
	OutputTokens int    `json:"output_tokens,omitempty"`
	FinishReason string `json:"finish_reason,omitempty"`
	Provider     string `json:"provider"`
	LatencyMs    int64  `json:"latency_ms"`
}

// Provider is the contract every backend must implement. Implementations
// must be safe for concurrent calls — the dispatcher shares one
// instance across all incoming requests.
//
// Chat receives the full Request including the prefixed model name
// (e.g. "openrouter/anthropic/claude-opus-4-7"). Implementations
// strip their prefix before the upstream call.
type Provider interface {
	// Name returns the provider's short identifier (matches the prefix
	// in model names). Used for telemetry and Response.Provider.
	Name() string

	// Chat performs one round-trip. Should respect ctx cancellation.
	Chat(ctx context.Context, req Request) (*Response, error)

	// Available reports whether the provider is ready to serve. False
	// means missing auth key or unreachable upstream — dispatcher
	// returns ErrProviderDisabled for callers using this provider.
	Available() bool
}
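
a toy Provider implementation against that contract, to make it
concrete (hypothetical; not part of the package):

// echoProvider answers with the last user message; enough to exercise
// the dispatcher and tests without a live upstream.
type echoProvider struct{}

func (echoProvider) Name() string    { return "echo" }
func (echoProvider) Available() bool { return true }

func (echoProvider) Chat(ctx context.Context, req Request) (*Response, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // respect cancellation, per the interface docs
	}
	last := ""
	if n := len(req.Messages); n > 0 {
		last = req.Messages[n-1].Content
	}
	return &Response{
		Model:        req.Model,
		Content:      last,
		Provider:     "echo",
		FinishReason: "stop",
	}, nil
}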