The Phase 44 PRD's "AiClient becomes a thin /v1/chat client" was a
chicken-and-egg problem: the gateway's own /v1/chat ollama_arm calls
AiClient.generate() to reach the sidecar. If AiClient unconditionally
routed through /v1/chat, gateway → /v1/chat → ollama → AiClient →
/v1/chat would loop forever.
Solution: opt-in routing.
- `AiClient::new(base_url)`: talks directly to the sidecar; for
  gateway-internal use (the gateway's own /v1/chat handlers,
  ollama::chat in mod.rs)
- `AiClient::new_with_gateway(base_url, gateway_url)`: routes
  generate() through ${gateway_url}/v1/chat with provider="ollama",
  so the call shows up in /v1/usage and in Langfuse traces
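
A minimal sketch of the opt-in idea, not the actual aibridge code. The `Transport` enum, the `generate_url()` helper, the `&str` parameters, and the sidecar path `/generate` are assumptions made for illustration; only the two constructor names and the `/v1/chat` route come from this commit.

```rust
/// Where generate() sends its request (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Transport {
    /// Talk to the sidecar directly: safe inside the gateway.
    DirectSidecar,
    /// Route through `{gateway_url}/v1/chat` with provider="ollama".
    ViaGateway { gateway_url: String },
}

#[derive(Debug, Clone)]
struct AiClient {
    base_url: String,
    transport: Transport,
}

impl AiClient {
    /// Direct-sidecar client; the default, so existing callers are unchanged.
    fn new(base_url: &str) -> Self {
        Self {
            base_url: base_url.to_string(),
            transport: Transport::DirectSidecar,
        }
    }

    /// Opt-in: generate() goes through the gateway. base_url is kept
    /// because embed(), rerank(), and admin calls still hit the sidecar.
    fn new_with_gateway(base_url: &str, gateway_url: &str) -> Self {
        Self {
            base_url: base_url.to_string(),
            transport: Transport::ViaGateway {
                gateway_url: gateway_url.to_string(),
            },
        }
    }

    /// URL that generate() would POST to. "/generate" is a placeholder
    /// for whatever path the sidecar really exposes.
    fn generate_url(&self) -> String {
        match &self.transport {
            Transport::DirectSidecar => {
                format!("{}/generate", self.base_url.trim_end_matches('/'))
            }
            Transport::ViaGateway { gateway_url } => {
                format!("{}/v1/chat", gateway_url.trim_end_matches('/'))
            }
        }
    }
}
```

Because the routing decision is fixed at construction, the gateway can keep handing out a `new()` client internally and never re-enter its own /v1/chat handler.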
Shape translation in generate_via_gateway():
  GenerateRequest {prompt, system, model, temperature, max_tokens, think}
    → /v1/chat {messages: [system?, user], provider:"ollama", ...}
  /v1/chat response choices[0].message.content + usage.{prompt,completion}_tokens
    → GenerateResponse {text, model, tokens_evaluated, tokens_generated}
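
The mapping above could look roughly like the sketch below. Field types, the `to_chat_request` / `to_generate_response` helper names, and the presence of a `model` field on the /v1/chat response are assumptions; the fields the source elides with `...` are deliberately not modeled.

```rust
struct GenerateRequest {
    prompt: String,
    system: Option<String>,
    model: String,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    think: Option<bool>,
}

struct ChatMessage {
    role: String,
    content: String,
}

/// Only the fields the commit names; the rest of the body is elided there.
struct ChatRequest {
    messages: Vec<ChatMessage>,
    provider: String,
}

struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
}

struct Choice {
    message: ChatMessage,
}

struct ChatResponse {
    model: String, // assumed to be echoed back by /v1/chat
    choices: Vec<Choice>,
    usage: Usage,
}

struct GenerateResponse {
    text: String,
    model: String,
    tokens_evaluated: u64,
    tokens_generated: u64,
}

/// GenerateRequest → /v1/chat body: optional system message, then user.
fn to_chat_request(req: &GenerateRequest) -> ChatRequest {
    let mut messages = Vec::new();
    if let Some(system) = &req.system {
        messages.push(ChatMessage { role: "system".into(), content: system.clone() });
    }
    messages.push(ChatMessage { role: "user".into(), content: req.prompt.clone() });
    ChatRequest { messages, provider: "ollama".into() }
}

/// /v1/chat response → GenerateResponse; None if there are no choices.
fn to_generate_response(resp: &ChatResponse) -> Option<GenerateResponse> {
    let first = resp.choices.first()?;
    Some(GenerateResponse {
        text: first.message.content.clone(),
        model: resp.model.clone(),
        tokens_evaluated: resp.usage.prompt_tokens,
        tokens_generated: resp.usage.completion_tokens,
    })
}
```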
embed(), rerank(), and admin methods (health, unload_model, etc.) stay
direct-to-sidecar: there is no /v1/embed equivalent yet, so a round-trip
through the gateway would add nothing.
Transitive migration: aibridge::continuation::generate_continuable
goes through TextGenerator::generate_text() → AiClient.generate(), so
every caller of generate_continuable inherits the routing decision
made at AiClient construction. Phase 21's continuation loop, hot-
path JSON emitters, etc. all gain observability for free when the
construction site opts in.
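
To illustrate why callers inherit the routing without any change of their own, here is a toy sketch. The signatures are simplified assumptions (the real `TextGenerator` and `generate_continuable` are not shown in this commit), the `via_gateway` flag stands in for the construction-time choice, and `generate()` is stubbed to return a string naming the route it would take.

```rust
/// Simplified stand-in for the trait the continuation code depends on.
trait TextGenerator {
    fn generate_text(&self, prompt: &str) -> String;
}

/// Stub client: the flag is set once, at construction.
struct AiClient {
    via_gateway: bool,
}

impl AiClient {
    /// Stub: reports which route the real call would use.
    fn generate(&self, prompt: &str) -> String {
        let route = if self.via_gateway { "gateway" } else { "sidecar" };
        format!("[{}] {}", route, prompt)
    }
}

impl TextGenerator for AiClient {
    fn generate_text(&self, prompt: &str) -> String {
        self.generate(prompt)
    }
}

/// Knows nothing about transports; it only sees the trait, so the
/// route is whatever the injected client was built with.
fn generate_continuable(generator: &dyn TextGenerator, prompt: &str) -> String {
    generator.generate_text(prompt)
}
```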
Verified end-to-end:
  curl /v1/chat with the exact JSON shape AiClient sends
    → "PONG-AIBRIDGE", finish=stop, 27/7 tokens
  /v1/usage after the call
    → requests=1, by_provider.ollama.requests=1, tokens tracked
Phase 44 part 3 (next):
- Migrate vectord's AiClient construction site so vectord modules
(rag, autotune, harness, refresh, supervisor, playbook_memory)
flow through /v1/chat. Currently the gateway's main.rs constructs
one AiClient via `new()` and shares it via V1State; vectord
inherits direct-sidecar transport. Migration requires constructing
a SEPARATE AiClient with `new_with_gateway` for vectord's state
bag (V1State.ai_client must stay direct to avoid the self-loop).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>