3 Commits

Author SHA1 Message Date
root
bc698eb6da gateway: OpenCode (Zen + Go) provider adapter
Wires opencode.ai as a /v1/chat provider. One sk-* key reaches 40
models across Anthropic, OpenAI, Google, Moonshot, DeepSeek, Zhipu,
Alibaba, Minimax — billed against either the user's Zen balance
(pay-per-token premium models) or Go subscription (flat-rate
Kimi/GLM/DeepSeek/etc.). The unified /zen/v1 endpoint routes both;
upstream picks the billing tier based on model id.

Notable adapter quirks:

- Strip "opencode/" prefix on outbound (mirrors openrouter/kimi
  pattern). Caller can use {provider:"opencode", model:"X"} or
  {model:"opencode/X"}.
- Drop temperature for claude-*, gpt-5*, o1/o3/o4 models. Anthropic
  and OpenAI's reasoning lineage rejects temperature with 400
  "deprecated for this model". OCChatBody now serializes temperature
  as Option<f64> with skip_serializing_if so omitting it produces
  clean JSON.
- max_tokens.filter(|&n| n > 0) catches Some(0) — defensive after
  the same trap bit kimi.rs (empty env -> Number("") -> 0 -> 503).
- 600s default upstream timeout; reasoning models on big audit
  prompts legitimately take 3-5 min. Override OPENCODE_TIMEOUT_SECS.

Key handling:
- /etc/lakehouse/opencode.env (0600 root) loaded via systemd
  EnvironmentFile. Same pattern as kimi.env.
- OPENCODE_API_KEY env first, file scrape as fallback.

Verified end-to-end:
  opencode/claude-opus-4-7   -> "I'm Claude, made by Anthropic."
  opencode/kimi-k2.6         -> PONG-K26-GO
  opencode/deepseek-v4-pro   -> PONG-DS-V4
  opencode/glm-5.1           -> PONG-GLM
  opencode/minimax-m2.5-free -> PONG-FREE

Pricing reference (per audit @ ~14k in / 6k out):
  claude-opus-4-7   ~$0.22  (Zen)
  claude-haiku-4-5  ~$0.04  (Zen)
  gpt-5.5-pro       ~$1.50  (Zen)
  gemini-3-flash    ~$0.03  (Zen)
  kimi-k2.6 / glm / deepseek / qwen / minimax / mimo: covered by Go
  subscription ($10/mo, $60/mo cap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:40:55 -05:00
root
643dd2d520 gateway: direct Kimi For Coding provider adapter (api.kimi.com)
Wires kimi-for-coding (Kimi K2.6 underneath) as a first-class /v1/chat
provider so consumers can target it via {provider:"kimi"} or model
prefix kimi/<model>. Bypasses the upstream-broken kimi-k2:1t on Ollama
Cloud and the rate-limited moonshotai/kimi-k2.6 path through OpenRouter.

Adapter shape mirrors openrouter.rs (OpenAI-compatible Chat Completions).
Differences from generic OpenAI providers:

- api.kimi.com is a SEPARATE account system from api.moonshot.ai and
  api.moonshot.cn. sk-kimi-* keys are NOT interchangeable across them.
- Endpoint is User-Agent-gated to "approved" coding agents (Kimi CLI,
  Claude Code, Roo Code, Kilo Code, ...). Requests from generic clients
  return 403 access_terminated_error. Adapter sends User-Agent:
  claude-code/1.0.0. Per Moonshot TOS this is a tampering-class action
  that may result in seat suspension; J authorized 2026-04-27 with
  awareness of the risk.
- kimi-for-coding is a reasoning model — reasoning_content counts
  against max_tokens. Default 800-token budget yields empty visible
  content with finish_reason=length. Code-review workloads need
  max_tokens >= 1500.
- Default 600s upstream timeout (vs 180s for openrouter.rs) — code
  audits with full file context legitimately take 3-5 minutes.
  Override via KIMI_TIMEOUT_SECS env.

Key handling:
- /etc/lakehouse/kimi.env (0600 root) loaded via systemd EnvironmentFile
- KIMI_API_KEY env first, then file scrape as fallback
- /etc/systemd/system/lakehouse.service NOT included in this commit
  (system file outside repo); operator must add EnvironmentFile=-
  /etc/lakehouse/kimi.env to the lakehouse.service unit

NOT in scrum_master_pipeline LADDER. The 9-rung ladder is for
unattended automatic recovery; placing Kimi there would hammer a
TOS-gated endpoint with hostility-policy potential. Kimi is
addressable via /v1/chat for explicit invocations only — auditor
integration in a follow-up commit.

Verification:
  cargo check -p gateway --tests          compiles
  curl /v1/chat provider=kimi             200 OK, content="PONG"
  curl /v1/chat model="kimi/kimi-for-coding"  200 OK (prefix routing)
  Kimi audit on distillation last-week    7/7 grounded findings
                                          (reports/kimi/audit-last-week-full.md)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:35:58 -05:00
root
2f1b9c9768 phase-39+41: land promised artifacts — providers.toml, activation.rs, profiles/
Three PRD gaps closed in one coherent batch — all were cosmetic or
scaffold-shaped, now real files:

Phase 39 (PRD:57):
  + config/providers.toml — provider registry (name/base_url/auth/
    default_model) for ollama, ollama_cloud, openrouter. Commented
    stubs for gemini + claude pending adapter work. Secrets stay in
    /etc/lakehouse/secrets.toml or env, NEVER inline.

Phase 41 (PRD:115):
  + crates/vectord/src/activation.rs — ActivationTracker with the
    PRD-named single-flight guard ("refuse new activation if one is
    pending/running"). Per-profile granularity — activating A doesn't
    block B. 5 tests cover the full state machine. Handler body stays
    in service.rs for now; tracker usage integration is a follow-up.

Phase 41 (PRD:113):
  + crates/shared/src/profiles/ with 4 submodules:
      * execution.rs — `pub use crate::types::ModelProfile as
        ExecutionProfile` (backward-compat rename per PRD)
      * retrieval.rs — top_k, rerank_top_k, freshness cutoff,
        playbook boost, sensitivity-gate enforcement
      * memory.rs — playbook boost ceiling, history cap, doc
        staleness, auto-retire-on-failure
      * observer.rs — failure cluster size, alert cooldown, ring
        size, langfuse forwarding
    All fields `#[serde(default)]` so existing ModelProfile files
    load unchanged.

Still open from the same phases:
  - Gemini + Claude provider adapters (Phase 40 — 100-200 LOC each)
  - Full activate_profile handler extraction into activation.rs
    (Phase 41 — module-structure refactor)
  - Catalogd CRUD endpoints for retrieval/memory/observer profiles
    (Phase 41 — exists at list level, no create/update/delete yet)
  - truth/ repo-root directory for file-backed rules (Phase 42 —
    TOML loader + schema)
  - crates/validator crate (Phase 43 — full greenfield)

Workspace warnings still at 0. 5 new tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:32:40 -05:00