root 0efc7363c5 scrum 2026-04-30: 4 real fixes + 2 INFOs from cross-lineage review
3-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder) on today's wave
landed 4 real findings (2 BLOCK + 2 WARN) and 2 INFO touch-ups.
Verbatim verdicts + disposition table at:
  reports/scrum/_evidence/2026-04-30/

B-1 (BLOCK Opus + INFO Kimi convergent) — ResolveKey API:
  collapse from 3-arg (envVar, envFileName, envFilePath) to 2-arg
  (envVar, envFilePath). Pre-fix, every chatd caller passed the env
  var name twice; if an operator renamed *_key_env in lakehouse.toml
  while keeping the canonical KEY= line in the .env file, the
  fallback silently missed it.
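
  A minimal sketch of the post-fix shape, assuming the fallback file
  carries a canonical KEY=<value> line (per the finding above); the
  real internal/chat implementation may differ:

    import (
        "os"
        "strings"
    )

    // ResolveKey prefers the env var and falls back to the canonical
    // KEY= line in the .env file at envFilePath.
    func ResolveKey(envVar, envFilePath string) string {
        if v := os.Getenv(envVar); v != "" {
            return v
        }
        data, err := os.ReadFile(envFilePath)
        if err != nil {
            return "" // no fallback file: provider stays unregistered
        }
        for _, line := range strings.Split(string(data), "\n") {
            if k, v, ok := strings.Cut(line, "="); ok && strings.TrimSpace(k) == "KEY" {
                return strings.TrimSpace(v)
            }
        }
        return ""
    }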

B-2 (WARN Opus + WARN Kimi convergent) — handleProviders probe:
  drop the synthesize-then-Resolve probe; look up by name directly
  via Registry.Available(name). The prior probe synthesized
  "<name>/probe" model strings and routed them through Resolve,
  which made it fragile to any future routing rule (e.g. the
  cloud-suffix special case).
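
  The shape of the change, mirroring the handleProviders body further
  down this page (the "before" line is reconstructed from the finding,
  not quoted from the old code):

    for _, name := range h.registry.Names() {
        // before: _, err := h.registry.Resolve(name + "/probe")
        // after: a plain membership check, immune to routing rules
        statuses[name] = h.registry.Available(name)
    }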

B-3 (BLOCK Opus single — verified by trace + end-to-end probe) —
  OllamaCloud.Chat StripPrefix used "cloud" but registry routes
  "ollama_cloud/<m>". Result: upstream got the prefixed model name
  and 400'd. Smoke missed it because chatd_smoke runs without
  ollama_cloud registered. Now strips the right prefix; new
  TestOllamaCloud_StripsCorrectPrefix locks both prefix + suffix
  cases. Verified live: ollama_cloud/deepseek-v3.2 round-trips
  cleanly through the real ollama.com endpoint.
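
  The fix in sketch form; the helper name is illustrative, and the
  suffix handling is assumed from the new test covering both prefix
  and suffix cases:

    import "strings"

    // upstreamModel maps both registry spellings to the bare model
    // name the ollama.com endpoint expects.
    func upstreamModel(m string) string {
        m = strings.TrimPrefix(m, "ollama_cloud/") // B-3 bug: stripped "cloud" instead
        m = strings.TrimSuffix(m, ":cloud")        // suffix variant
        return m
    }

    // upstreamModel("ollama_cloud/deepseek-v3.2") == "deepseek-v3.2"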

B-4 (WARN Opus single) — Ollama finishReason: read the done_reason
  field instead of inferring from the done bool alone. Newer Ollama
  reports done=true with done_reason="length" on truncation; the
  prior code mapped that to "stop" and lost the truncation signal
  the playbook_lift judge needs to retry. New
  TestFinishReasonFromOllama_PrefersDoneReason covers the fallback
  ladder.
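
  The fallback ladder in sketch form (function and argument names are
  illustrative; the real mapping lives in the Ollama provider):

    // finishReason prefers the explicit done_reason and only infers
    // from the done bool when done_reason is absent (older Ollama).
    func finishReason(doneReason string, done bool) string {
        switch {
        case doneReason != "":
            return doneReason // "length" survives, so truncation stays visible
        case done:
            return "stop" // legacy servers: done=true with no reason
        default:
            return "" // not finished
        }
    }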

INFOs:
- B-5: replace hand-rolled insertion sort in Registry.Names with
  sort.Strings (Opus called the "avoid sort import" comment a
  false economy; correct). Sketch after this list.
- A-1: clarify the playbook_lift.sh comment around -judge "" arg
  passing (Opus noted the comment said "env priority" but didn't
  reflect that the empty arg also passes through the Go driver's
  resolution chain).
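
The B-5 sketch referenced above (assumes Registry wraps an internal
providers map; the field name is illustrative):

    import "sort"

    func (r *Registry) Names() []string {
        names := make([]string, 0, len(r.providers))
        for n := range r.providers {
            names = append(names, n)
        }
        sort.Strings(names) // replaces the hand-rolled insertion sort
        return names
    }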

False positives dismissed (3, documented in disposition.md):
- Kimi: TestMaybeDowngrade_WithConfigList wrong assertion (test IS
  correct per design: a model excluded from the weak list counts as
  strong and gets downgraded; sketch after this list)
- Qwen: nil-deref claim (defensive code already handles nil)
- Opus: qwen3.5:latest doesn't exist on Ollama hub (true on the
  public hub but local install has it)
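
The downgrade design Kimi misread, in sketch form (signature and
names are illustrative):

    // The weak list is an allow-list of models that already count as
    // weak. Anything excluded is treated as strong and gets downgraded,
    // which is exactly what the test asserts.
    func maybeDowngrade(model string, weak map[string]bool, weakDefault string) string {
        if weak[model] {
            return model // already weak: keep as-is
        }
        return weakDefault // strong: downgrade
    }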

just verify: PASS. chatd_smoke 6/6 PASS. New regression tests:
3 (B-2, B-3, B-4 each get a focused test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:28:08 -05:00

// chatd is the LLM chat dispatcher service (Phase 4 — small-model
// pipeline tier abstraction). Routes POST /chat to the right
// provider based on the model-name prefix or :cloud suffix:
//
//	ollama/<m>         → local Ollama (no auth)
//	ollama_cloud/<m>   → Ollama Cloud (Bearer auth)
//	<m>:cloud          → Ollama Cloud (suffix variant)
//	openrouter/<v>/<m> → OpenRouter (Bearer auth)
//	opencode/<m>       → OpenCode unified Zen+Go (Bearer auth)
//	kimi/<m>           → Kimi For Coding (Bearer auth)
//	bare names         → local Ollama (default)
//
// Provider keys come from env vars (or /etc/lakehouse/<provider>.env
// fallback files). Providers with empty keys stay unregistered, so
// requests for them 404 cleanly instead of 503-ing at call time.
//
// Routes:
//
//	POST /chat          — dispatch a chat request to the resolved provider
//	GET /chat/providers — list registered providers (telemetry / health)
//	GET /health         — readiness (always 200 — sub-providers are
//	                      independently checked via /chat/providers)
package main

import (
	"encoding/json"
	"errors"
	"flag"
	"log/slog"
	"net/http"
	"os"
	"time"

	"git.agentview.dev/profit/golangLAKEHOUSE/internal/chat"
	"git.agentview.dev/profit/golangLAKEHOUSE/internal/shared"
	"github.com/go-chi/chi/v5"
)

const maxRequestBytes = 4 << 20 // 4 MiB cap on /chat bodies

func main() {
	configPath := flag.String("config", "lakehouse.toml", "path to TOML config")
	flag.Parse()

	cfg, err := shared.LoadConfig(*configPath)
	if err != nil {
		slog.Error("config", "err", err)
		os.Exit(1)
	}

	timeout := time.Duration(cfg.Chatd.TimeoutSecs) * time.Second
	registry := chat.BuildRegistry(chat.BuilderInput{
		OllamaURL:      cfg.Chatd.OllamaURL,
		OllamaCloudKey: chat.ResolveKey(cfg.Chatd.OllamaCloudKeyEnv, cfg.Chatd.OllamaCloudKeyFile),
		OpenRouterKey:  chat.ResolveKey(cfg.Chatd.OpenRouterKeyEnv, cfg.Chatd.OpenRouterKeyFile),
		OpenCodeKey:    chat.ResolveKey(cfg.Chatd.OpenCodeKeyEnv, cfg.Chatd.OpenCodeKeyFile),
		KimiKey:        chat.ResolveKey(cfg.Chatd.KimiKeyEnv, cfg.Chatd.KimiKeyFile),
		Timeout:        timeout,
	})

	h := &handlers{registry: registry}
	if err := shared.Run("chatd", cfg.Chatd.Bind, h.register, cfg.Auth); err != nil {
		slog.Error("server", "err", err)
		os.Exit(1)
	}
}

type handlers struct {
	registry *chat.Registry
}

func (h *handlers) register(r chi.Router) {
	// Routes mirror what the gateway proxies: /v1/chat → /chat (POST)
	// and /v1/chat/providers → /chat/providers (GET). Keeping providers
	// under /chat/ avoids a separate /providers root route that would
	// need its own gateway proxy entry.
	r.Post("/chat", h.handleChat)
	r.Get("/chat/providers", h.handleProviders)
}

func (h *handlers) handleChat(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxRequestBytes)
	defer r.Body.Close()

	var req chat.Request
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON: "+err.Error(), http.StatusBadRequest)
		return
	}
	if req.Model == "" || len(req.Messages) == 0 {
		http.Error(w, "model and messages are required", http.StatusBadRequest)
		return
	}

	resp, err := h.registry.Chat(r.Context(), req)
	if err != nil {
		writeChatError(w, err)
		return
	}
	writeJSON(w, http.StatusOK, resp)
}

// handleProviders lists registered providers + per-provider Available()
// status. Phase 4 scrum fix B-2: looks up by name directly, skipping
// the prior synthetic-model-name route through Resolve. The endpoint
// reports registry membership, not routing rules — the latter is
// covered by /v1/chat itself.
func (h *handlers) handleProviders(w http.ResponseWriter, _ *http.Request) {
	names := h.registry.Names()
	statuses := make(map[string]bool, len(names))
	for _, n := range names {
		statuses[n] = h.registry.Available(n)
	}
	writeJSON(w, http.StatusOK, map[string]any{
		"providers": statuses,
	})
}

// writeChatError maps chat sentinel errors to HTTP status codes.
// Unknown errors map to 500.
func writeChatError(w http.ResponseWriter, err error) {
	switch {
	case errors.Is(err, chat.ErrProviderNotFound):
		http.Error(w, err.Error(), http.StatusNotFound)
	case errors.Is(err, chat.ErrProviderDisabled):
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
	case errors.Is(err, chat.ErrUpstream):
		http.Error(w, err.Error(), http.StatusBadGateway)
	case errors.Is(err, chat.ErrTimeout):
		http.Error(w, err.Error(), http.StatusGatewayTimeout)
	default:
		slog.Error("chat", "err", err)
		http.Error(w, "internal", http.StatusInternalServerError)
	}
}

func writeJSON(w http.ResponseWriter, status int, body any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	if err := json.NewEncoder(w).Encode(body); err != nil {
		slog.Error("encode", "err", err)
	}
}