lakehouse

Author	SHA1	Message	Date
root	540a9a27ee	v1: accept OpenAI multimodal content shape (array-of-parts) Some checks failed lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts Modern OpenAI clients (pi-ai, openai SDK 6.x, langchain-js, the official agents) send `messages[].content` as an array of content parts: `[{type:"text", text:"..."}, {type:"image_url", ...}]`. Our gateway typed `content` as plain `String` and 422'd those calls. Fix: `Message.content` is now `serde_json::Value` so requests deserialize regardless of shape. `Message::text()` flattens content-parts arrays (concat'd `text` fields, non-text parts skipped) for places that need a plain string — Ollama prompt assembly, char counts, the assistant's own response synthesis. `Message::new_text()` constructs string-content messages without writing the wrapper at each call site. Forwarders (openrouter) clone content through verbatim so providers see exactly what the client sent. Verified end-to-end: Pi CLI (`pi --print --provider openrouter`) landed a clean 1902-token request through `/v1/chat/completions`, routed to OpenRouter as `openai/gpt-oss-120b:free`, response in 1.62s, Langfuse trace `v1.chat:openrouter` recorded with provider tag. Same path that any tool using the official openai SDK takes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:56:46 -05:00
profit	42a11d35cd	Phase 39 (first slice): Ollama Cloud adapter on /v1/chat Second provider wired. /v1/chat now routes by optional `provider` field: default "ollama" hits local via sidecar, "ollama_cloud" (or "cloud") hits ollama.com/api/generate directly with Bearer auth. Key sourced at gateway startup from OLLAMA_CLOUD_KEY env, then /root/llm_team_config.json (providers.ollama_cloud.api_key), then OLLAMA_CLOUD_API_KEY env. Config source matches LLM Team convention. Shape-identical to scenario.ts::generateCloud — same endpoint, same body, same Bearer auth. Cloud path bypasses sidecar entirely (sidecar is local-only by design, mirrors TS agent.ts). Changes: - crates/gateway/src/v1/ollama_cloud.rs (new, 130 LOC) — reqwest client, resolve_cloud_key(), chat() adapter, CloudGenerateBody / CloudGenerateResponse wire shapes - crates/gateway/src/v1/ollama.rs — flatten_messages_public() re-export so sibling adapters reuse the shape collapse - crates/gateway/src/v1/mod.rs — provider field on ChatRequest, dispatch match in chat() handler, ollama_cloud_key on V1State - crates/gateway/src/main.rs — resolves cloud key at startup, logs which source provided it - crates/gateway/Cargo.toml — reqwest 0.12 with rustls-tls Verified end-to-end after restart: - provider=ollama → qwen3.5:latest local (~400ms, Phase 38 unchanged) - provider=ollama_cloud + model=gpt-oss:120b → real 225-word technical response in 5.4s, 313 tokens Tests: 9/9 green (7 from Phase 38 + 2 new for cloud body serialization and key resolver shape). Not in this slice: trait extraction (full Phase 39 scope adds ProviderAdapter trait + OpenRouter adapter + fallback chain logic). These land next with Phase 40 routing engine on top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:57:42 -05:00
profit	8cbbd0ef70	Phase 38 fix: default think=false on /v1/chat Live-test caught the Phase 21 thinking-model trap on first call. qwen3.5 with max_tokens=50 and default think behavior burned all 50 tokens on hidden reasoning; visible content was "". completion_tokens exactly matching max_tokens was the tell. Adapter now defaults think: Some(false) matching scenario.ts hot-path discipline. Callers that want reasoning (overseers, T3+) opt in via a non-OpenAI `think: true` extension field on the request. Verified end-to-end after restart: - "Lakehouse supports ACID and raw data." (5 words, 516ms) - "tokio\nasync-std\nsmol" (3 Rust crates, 391ms) - /v1/usage accumulates across calls (2 req / 95 total tokens) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:50:09 -05:00
profit	4cb405bb42	Phase 38: Universal API skeleton — /v1/chat, /v1/usage, /v1/sessions First slice of the control-plane pivot. OpenAI-compatible surface over the existing aibridge → Ollama path. Additive — no existing routes touched. All 7 unit tests green, release build clean. What ships: - crates/gateway/src/v1/mod.rs — router, V1State (ai_client + Usage counter), ChatRequest/ChatResponse/Message/UsageBlock types, handlers for /chat, /usage, /sessions. OpenAI-compatible field shapes: {model, messages[{role,content}], temperature?, max_tokens?, stream?} - crates/gateway/src/v1/ollama.rs — shape adapter. Flattens messages into (system, prompt), calls aibridge.generate, unwraps response back into OpenAI /v1/chat shape. Prefers sidecar-reported tokens; falls back to chars/4 ceiling estimate matching Phase 21 convention. - crates/gateway/src/main.rs — one new mod, one .nest("/v1", ...) Tests (7/7): - chat_request_parses_openai_shape - chat_request_accepts_minimal - usage_counter_default_is_zero - flatten_separates_system_from_turns - flatten_concatenates_multiple_system_messages - flatten_with_no_system_returns_empty_system - estimate_tokens_chars_div_4_ceiling Not in this phase (per CONTROL_PLANE_PRD.md): streaming, tool calls, session state, multi-provider, fallback chain, cost gating. All land in Phases 39-44. Next: live-test POST /v1/chat after gateway restart, then migrate bot/propose.ts off direct sidecar calls to prove the loop end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:47:15 -05:00

4 Commits