lakehouse

profit/lakehouse


Commit Graph

Author	SHA1	Message	Date


profit

42a11d35cd

Phase 39 (first slice): Ollama Cloud adapter on /v1/chat

Second provider wired. /v1/chat now routes by optional `provider`
field: default "ollama" hits local via sidecar, "ollama_cloud"
(or "cloud") hits ollama.com/api/generate directly with Bearer auth.
Key sourced at gateway startup from OLLAMA_CLOUD_KEY env, then
/root/llm_team_config.json (providers.ollama_cloud.api_key), then
OLLAMA_CLOUD_API_KEY env. Config source matches LLM Team convention.
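A minimal sketch of the key precedence described above. It is not the code in ollama_cloud.rs. Environment lookup is passed in as a closure so the order can be unit-tested. `config_key` stands for the already-extracted value of providers.ollama_cloud.api_key, so JSON parsing is omitted. The `KeySource` enum and the rule that blank values count as absent are assumptions for illustration.

```rust
/// Which source supplied the Ollama Cloud key (hypothetical enum; main.rs
/// logs the source, but its real representation is not shown in the commit).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeySource {
    EnvOllamaCloudKey,
    LlmTeamConfig,
    EnvOllamaCloudApiKey,
}

/// Resolve the key in order: OLLAMA_CLOUD_KEY env, then the LLM Team
/// config value, then OLLAMA_CLOUD_API_KEY env.
/// `env` abstracts std::env::var; `config_key` is the pre-parsed config field.
pub fn resolve_cloud_key<F>(env: F, config_key: Option<String>) -> Option<(String, KeySource)>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: empty or whitespace-only values are treated as missing,
    // so a blank export does not shadow a later source.
    let non_empty = |v: Option<String>| v.filter(|s| !s.trim().is_empty());

    if let Some(k) = non_empty(env("OLLAMA_CLOUD_KEY")) {
        return Some((k, KeySource::EnvOllamaCloudKey));
    }
    if let Some(k) = non_empty(config_key) {
        return Some((k, KeySource::LlmTeamConfig));
    }
    non_empty(env("OLLAMA_CLOUD_API_KEY")).map(|k| (k, KeySource::EnvOllamaCloudApiKey))
}
```

At startup this could be called as `resolve_cloud_key(|k| std::env::var(k).ok(), parsed_config_key)`, where `parsed_config_key` is a hypothetical name for whatever the caller extracted from the JSON file.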

Shape-identical to scenario.ts::generateCloud: same endpoint, same
body, same Bearer auth. The cloud path bypasses the sidecar entirely;
the sidecar is local-only by design, mirroring the TS agent.ts.
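The routing rule can be sketched as a match on the optional `provider` field. The `Route` enum and `route_for` are hypothetical names. Exact, case-sensitive matching and rejecting unknown providers with an error are assumptions; the commit only names the accepted values.

```rust
/// Backend a /v1/chat request is dispatched to (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    /// Local Ollama through the sidecar (the default).
    LocalSidecar,
    /// ollama.com/api/generate called directly with Bearer auth; no sidecar.
    OllamaCloud,
}

/// Map the optional `provider` field on ChatRequest to a route.
pub fn route_for(provider: Option<&str>) -> Result<Route, String> {
    match provider {
        None | Some("ollama") => Ok(Route::LocalSidecar),
        Some("ollama_cloud") | Some("cloud") => Ok(Route::OllamaCloud),
        // Assumption: unknown providers are rejected rather than silently
        // falling back to local.
        Some(other) => Err(format!("unknown provider: {other}")),
    }
}
```

In a handler this would be called as `route_for(req.provider.as_deref())`, assuming `provider` is an `Option<String>`.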

Changes:
- crates/gateway/src/v1/ollama_cloud.rs (new, 130 LOC) — reqwest
  client, resolve_cloud_key(), chat() adapter, CloudGenerateBody /
  CloudGenerateResponse wire shapes
- crates/gateway/src/v1/ollama.rs — flatten_messages_public()
  re-export so sibling adapters reuse the shape collapse
- crates/gateway/src/v1/mod.rs — provider field on ChatRequest,
  dispatch match in chat() handler, ollama_cloud_key on V1State
- crates/gateway/src/main.rs — resolves cloud key at startup,
  logs which source provided it
- crates/gateway/Cargo.toml — reqwest 0.12 with rustls-tls

Verified end-to-end after restart:
- provider=ollama → qwen3.5:latest local (~400ms, Phase 38 unchanged)
- provider=ollama_cloud + model=gpt-oss:120b → real 225-word
  technical response in 5.4s, 313 tokens

Tests: 9/9 green (7 from Phase 38 + 2 new for cloud body serialization
and key resolver shape).

Not in this slice: trait extraction. The full Phase 39 scope adds a
ProviderAdapter trait, an OpenRouter adapter, and fallback-chain logic;
these land next, with the Phase 40 routing engine on top.
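A purely hypothetical sketch of how a ProviderAdapter trait and a fallback chain could fit together. The trait is still unwritten, so every signature below is an assumption. The real adapters are async reqwest and aibridge calls; this synchronous version only shows the control flow: try providers in order, return the first success, and otherwise return every error.

```rust
/// Hypothetical provider interface (a real one would likely be async).
pub trait ProviderAdapter {
    fn name(&self) -> &str;
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

/// Try each adapter in order. On the first success, return
/// (provider name, response text). If all fail, return one error per
/// provider, prefixed with its name.
pub fn chat_with_fallback(
    chain: &[&dyn ProviderAdapter],
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for adapter in chain {
        match adapter.chat(prompt) {
            Ok(text) => return Ok((adapter.name().to_string(), text)),
            Err(e) => errors.push(format!("{}: {}", adapter.name(), e)),
        }
    }
    Err(errors)
}
```

Collecting every error, instead of keeping only the last, leaves the caller enough detail to log why each hop in the chain failed.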

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 02:57:42 -05:00

profit

8cbbd0ef70

Phase 38 fix: default think=false on /v1/chat

Live testing caught the Phase 21 thinking-model trap on the first call.
qwen3.5 with max_tokens=50 and default think behavior burned all 50
tokens on hidden reasoning, so the visible content was "".
completion_tokens exactly matching max_tokens was the tell.
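The tell described above can be turned into a simple check. This is a hypothetical helper, not something the commit adds. It flags a response only when the visible content is blank and completion_tokens equals the requested max_tokens.

```rust
/// Heuristic for the thinking-model trap: the whole token budget was
/// spent (completion_tokens == max_tokens) yet no visible text came back.
/// Without a max_tokens cap, the check cannot fire.
pub fn looks_like_thinking_trap(content: &str, completion_tokens: u32, max_tokens: Option<u32>) -> bool {
    max_tokens == Some(completion_tokens) && content.trim().is_empty()
}
```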

The adapter now defaults to think: Some(false), matching the scenario.ts
hot-path discipline. Callers that want reasoning (overseers, T3+) opt in
via a non-OpenAI `think: true` extension field on the request.
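A minimal sketch of the defaulting rule. `ChatRequestSubset` is a hypothetical stand-in that carries only the fields needed here; in the gateway, the real ChatRequest would receive `think` through deserialization.

```rust
/// Hypothetical subset of the /v1/chat request; `think` is the
/// non-OpenAI extension field and is absent in plain OpenAI payloads.
#[derive(Debug)]
pub struct ChatRequestSubset {
    pub model: String,
    pub think: Option<bool>,
}

/// Value forwarded as Ollama's `think` option: always explicit, and off
/// unless the caller opted in.
pub fn effective_think(req: &ChatRequestSubset) -> Option<bool> {
    Some(req.think.unwrap_or(false))
}
```

Sending an explicit `Some(false)`, rather than omitting the field, means the backend's own default never applies on the hot path.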

Verified end-to-end after restart:
- "Lakehouse supports ACID and raw data." (5 words, 516ms)
- "tokio\nasync-std\nsmol" (3 Rust crates, 391ms)
- /v1/usage accumulates across calls (2 req / 95 total tokens)
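One way a shared Usage counter on V1State could accumulate across calls is shown below. The field names and the atomic design are assumptions; the commits only say a Usage counter exists, starts at zero, and reports requests and total tokens.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counters behind /v1/usage (hypothetical layout).
/// Default gives all zeros, the property usage_counter_default_is_zero checks.
#[derive(Debug, Default)]
pub struct Usage {
    requests: AtomicU64,
    prompt_tokens: AtomicU64,
    completion_tokens: AtomicU64,
}

impl Usage {
    /// Record one completed chat call; callable through a shared reference.
    pub fn record(&self, prompt: u64, completion: u64) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        self.prompt_tokens.fetch_add(prompt, Ordering::Relaxed);
        self.completion_tokens.fetch_add(completion, Ordering::Relaxed);
    }

    /// (requests, total tokens). Each load is atomic, but the snapshot as a
    /// whole is not, which is acceptable for a reporting endpoint.
    pub fn snapshot(&self) -> (u64, u64) {
        let total = self.prompt_tokens.load(Ordering::Relaxed)
            + self.completion_tokens.load(Ordering::Relaxed);
        (self.requests.load(Ordering::Relaxed), total)
    }
}
```

Atomics allow handlers to share the counter through an `Arc` without a mutex. A `Mutex<struct>` would be the choice if a perfectly consistent snapshot mattered.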

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 02:50:09 -05:00

profit

4cb405bb42

Phase 38: Universal API skeleton — /v1/chat, /v1/usage, /v1/sessions

First slice of the control-plane pivot. OpenAI-compatible surface
over the existing aibridge → Ollama path. Additive — no existing
routes touched. All 7 unit tests green, release build clean.

What ships:
- crates/gateway/src/v1/mod.rs — router, V1State (ai_client + Usage
  counter), ChatRequest/ChatResponse/Message/UsageBlock types, handlers
  for /chat, /usage, /sessions. OpenAI-compatible field shapes:
  {model, messages[{role,content}], temperature?, max_tokens?, stream?}
- crates/gateway/src/v1/ollama.rs — shape adapter. Flattens messages
  into (system, prompt), calls aibridge.generate, unwraps response
  back into OpenAI /v1/chat shape. Prefers sidecar-reported tokens;
  falls back to chars/4 ceiling estimate matching Phase 21 convention.
- crates/gateway/src/main.rs — one new mod, one .nest("/v1", ...)
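The message flattening in ollama.rs can be sketched as follows. This is an illustration, not the adapter's code. The exact joining format is an assumption: system messages are joined by blank lines, and other turns are rendered as `role: content`, one per line. The behaviour mirrors the three flatten_* test names listed in the next section.

```rust
/// OpenAI-style chat message.
#[derive(Debug, Clone)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Collapse messages into the (system, prompt) pair that Ollama's
/// /api/generate takes. System text is concatenated; everything else
/// becomes the prompt. No system messages yields an empty system string.
pub fn flatten_messages(messages: &[Message]) -> (String, String) {
    let mut system_parts: Vec<&str> = Vec::new();
    let mut turns: Vec<String> = Vec::new();
    for m in messages {
        if m.role == "system" {
            system_parts.push(&m.content);
        } else {
            // Assumed rendering of a conversational turn.
            turns.push(format!("{}: {}", m.role, m.content));
        }
    }
    (system_parts.join("\n\n"), turns.join("\n"))
}
```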

Tests (7/7):
- chat_request_parses_openai_shape
- chat_request_accepts_minimal
- usage_counter_default_is_zero
- flatten_separates_system_from_turns
- flatten_concatenates_multiple_system_messages
- flatten_with_no_system_returns_empty_system
- estimate_tokens_chars_div_4_ceiling
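The last test name above covers the fallback estimate described for ollama.rs: ceil(chars / 4), used only when the sidecar reports no count. Below is a minimal sketch. Counting Unicode scalar values with `chars()` rather than bytes is an assumption, and `completion_tokens` is a hypothetical helper name.

```rust
/// Fallback token estimate: ceil(chars / 4), using integer arithmetic.
pub fn estimate_tokens(text: &str) -> u32 {
    ((text.chars().count() + 3) / 4) as u32
}

/// Prefer the sidecar-reported count; estimate only when it is missing.
pub fn completion_tokens(reported: Option<u32>, text: &str) -> u32 {
    reported.unwrap_or_else(|| estimate_tokens(text))
}
```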

Not in this phase (per CONTROL_PLANE_PRD.md): streaming, tool calls,
session state, multi-provider, fallback chain, cost gating. All
land in Phases 39-44.

Next: live-test POST /v1/chat after gateway restart, then migrate
bot/propose.ts off direct sidecar calls to prove the loop end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 02:47:15 -05:00

3 Commits