4 Commits

Author SHA1 Message Date
profit
75a0f424ef Phase 40 (early): Langfuse tracing on /v1/chat — observability recovery
The lost stack J flagged was partly already present: the Langfuse
container has been running for 2 days with the staffing project, the
SDK is installed, and mcp-server already traces the gw:/* routes.
What was missing was Rust-side emission from /v1/chat; the new
Phase 38/39 code bypassed Langfuse entirely.

This commit bridges that gap: a fire-and-forget HTTP POST to
http://localhost:3001/api/public/ingestion (a batch of {trace-create +
generation-create} events) on every chat call. It is non-blocking,
running on a spawned tokio task, so response latency is unaffected.
Trace failures log a warning and are dropped; they never propagate.
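
For illustration, a minimal sketch of that pattern, assuming reqwest
(json feature), tokio, tracing, and serde_json; the function name and
payload shape here are placeholders, not the actual langfuse_trace.rs
code:

```rust
use serde_json::Value;

// Hypothetical emitter; the real one lives in langfuse_trace.rs.
fn emit_chat_trace(
    client: reqwest::Client,
    public_key: String,
    secret_key: String,
    payload: Value, // the {batch: [...]} ingestion body
) {
    // Fire-and-forget: the chat handler never awaits this task, so a
    // slow or unreachable Langfuse cannot add latency to /v1/chat.
    tokio::spawn(async move {
        let result = client
            .post("http://localhost:3001/api/public/ingestion")
            .basic_auth(&public_key, Some(&secret_key))
            .json(&payload)
            .send()
            .await
            .and_then(|resp| resp.error_for_status());
        // Failures log a warning and are dropped, never propagated.
        if let Err(err) = result {
            tracing::warn!("langfuse ingestion failed: {}", err);
        }
    });
}
```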

Verified end-to-end after restart:
- Log line "v1: Langfuse tracing enabled" at startup
- /v1/chat local (qwen3.5:latest) → v1.chat:ollama trace appears
  with lat=0.41s, 24+6 tokens
- /v1/chat cloud (gpt-oss:120b) → v1.chat:ollama_cloud trace appears
  with lat=1.87s, 73+87 tokens
- mcp-server's existing gw:/log + gw:/intelligence/* traces
  continue to flow into the same project unchanged

Files:
- crates/gateway/src/v1/langfuse_trace.rs (new, 195 LOC) — thin
  client, no SDK. reqwest Basic Auth. ChatTrace payload + event
  serializer (batch shape sketched after this list).
  from_env_or_defaults() resolver matches mcp-server/tracing.ts
  conventions (pk-lf-staffing / sk-lf-staffing-secret / localhost:3001)
- crates/gateway/src/v1/mod.rs — V1State.langfuse field, emission
  after successful provider call (post-dispatch, pre-usage-update)
- crates/gateway/src/main.rs — resolve + log at startup
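
The ingestion batch the serializer produces looks roughly like this
(field names follow Langfuse's public ingestion event format; the real
ChatTrace layout may differ):

```rust
use serde::Serialize;
use serde_json::Value;

// POST body for /api/public/ingestion: a batch of typed events. One
// chat call contributes a trace-create plus a generation-create.
#[derive(Serialize)]
struct IngestionBatch {
    batch: Vec<IngestionEvent>,
}

#[derive(Serialize)]
struct IngestionEvent {
    #[serde(rename = "type")]
    event_type: &'static str, // "trace-create" or "generation-create"
    id: String,               // unique per event (hence the uuid test below)
    timestamp: String,        // RFC 3339
    body: Value,              // trace/generation fields: name, model, tokens
}
```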

Tests: 12/12 green (9 prior + 3 for langfuse_trace: ingestion-batch
serialization, uuid generator uniqueness, env resolver shape).

Recovered piece #1 of 3 from the lost-stack narrative. Still open:
- Langfuse → observer :3800 pipe (Phase 40 mid-deliverable)
- Gitea MCP reconnect in mcp-server/index.ts (Phase 40 late)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 03:04:28 -05:00
profit
42a11d35cd Phase 39 (first slice): Ollama Cloud adapter on /v1/chat
Second provider wired. /v1/chat now routes on an optional `provider`
field: the default "ollama" hits the local model via the sidecar;
"ollama_cloud" (or "cloud") hits ollama.com/api/generate directly with
Bearer auth. The key is sourced at gateway startup from the
OLLAMA_CLOUD_KEY env var, then /root/llm_team_config.json
(providers.ollama_cloud.api_key), then the OLLAMA_CLOUD_API_KEY env
var. The config source matches the LLM Team convention.
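
That resolution order reads as a straightforward fallback chain; a
sketch, where returning the winning source alongside the key is an
assumption based on the startup log noted in the Changes list:

```rust
use std::fs;

fn resolve_cloud_key() -> Option<(&'static str, String)> {
    // 1. OLLAMA_CLOUD_KEY env var wins.
    if let Ok(key) = std::env::var("OLLAMA_CLOUD_KEY") {
        return Some(("env:OLLAMA_CLOUD_KEY", key));
    }
    // 2. LLM Team config: providers.ollama_cloud.api_key.
    if let Ok(raw) = fs::read_to_string("/root/llm_team_config.json") {
        if let Ok(cfg) = serde_json::from_str::<serde_json::Value>(&raw) {
            if let Some(key) = cfg["providers"]["ollama_cloud"]["api_key"].as_str() {
                return Some(("llm_team_config.json", key.to_string()));
            }
        }
    }
    // 3. Last resort: the alternate env var name.
    std::env::var("OLLAMA_CLOUD_API_KEY")
        .ok()
        .map(|key| ("env:OLLAMA_CLOUD_API_KEY", key))
}
```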

Shape-identical to scenario.ts::generateCloud: same endpoint, same
body, same Bearer auth. The cloud path bypasses the sidecar entirely
(the sidecar is local-only by design, mirroring the TS agent.ts).
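
A sketch of that call path, assuming reqwest's json feature; the
endpoint and Bearer auth are from this commit, while the exact wire
fields are assumptions modeled on Ollama's generate API:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct CloudGenerateBody {
    model: String,
    system: String,
    prompt: String,
    stream: bool, // false on this path; streaming is out of scope here
}

#[derive(Deserialize)]
struct CloudGenerateResponse {
    response: String,
}

async fn cloud_generate(
    client: &reqwest::Client,
    key: &str,
    body: &CloudGenerateBody,
) -> Result<CloudGenerateResponse, reqwest::Error> {
    client
        .post("https://ollama.com/api/generate")
        .bearer_auth(key) // same Bearer auth as scenario.ts::generateCloud
        .json(body)
        .send()
        .await?
        .error_for_status()?
        .json::<CloudGenerateResponse>()
        .await
}
```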

Changes:
- crates/gateway/src/v1/ollama_cloud.rs (new, 130 LOC) — reqwest
  client, resolve_cloud_key(), chat() adapter, CloudGenerateBody /
  CloudGenerateResponse wire shapes
- crates/gateway/src/v1/ollama.rs — flatten_messages_public()
  re-export so sibling adapters reuse the shape collapse
- crates/gateway/src/v1/mod.rs — provider field on ChatRequest,
  dispatch match in chat() handler (sketched after this list),
  ollama_cloud_key on V1State
- crates/gateway/src/main.rs — resolves cloud key at startup,
  logs which source provided it
- crates/gateway/Cargo.toml — reqwest 0.12 with rustls-tls
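
The dispatch itself is small; an illustrative reduction (the real
match calls the two adapters rather than returning labels):

```rust
struct ChatRequest {
    provider: Option<String>,
    // ... model, messages, temperature, max_tokens, stream
}

fn route(req: &ChatRequest) -> Result<&'static str, String> {
    match req.provider.as_deref().unwrap_or("ollama") {
        // Default: local model via the sidecar (Phase 38 path, unchanged).
        "ollama" => Ok("local via sidecar"),
        // Cloud: ollama.com/api/generate directly, sidecar bypassed.
        "ollama_cloud" | "cloud" => Ok("ollama cloud"),
        other => Err(format!("unknown provider: {other}")),
    }
}
```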

Verified end-to-end after restart:
- provider=ollama → qwen3.5:latest local (~400ms, Phase 38 unchanged)
- provider=ollama_cloud + model=gpt-oss:120b → real 225-word
  technical response in 5.4s, 313 tokens

Tests: 9/9 green (7 from Phase 38 + 2 new for cloud body serialization
and key resolver shape).

Not in this slice: trait extraction (full Phase 39 scope adds
ProviderAdapter trait + OpenRouter adapter + fallback chain logic).
These land next with Phase 40 routing engine on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:57:42 -05:00
profit
8cbbd0ef70 Phase 38 fix: default think=false on /v1/chat
The live test caught the Phase 21 thinking-model trap on the first
call: qwen3.5 with max_tokens=50 and the default think behavior burned
all 50 tokens on hidden reasoning, leaving the visible content "".
completion_tokens exactly matching max_tokens was the tell.

The adapter now defaults to think: Some(false), matching the
scenario.ts hot-path discipline. Callers that want reasoning
(overseers, T3+) opt in via a non-OpenAI `think: true` extension field
on the request.
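
In request/adapter terms the fix is small; a sketch where the helper
name and adapter wiring are illustrative:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct ChatRequest {
    model: String,
    // Non-OpenAI extension field; absent means no hidden reasoning.
    think: Option<bool>,
    // ... messages, temperature, max_tokens, stream
}

// The adapter never leaves `think` unset: with a small max_tokens
// budget, default thinking can consume every token invisibly.
fn effective_think(req: &ChatRequest) -> Option<bool> {
    Some(req.think.unwrap_or(false))
}
```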

Verified end-to-end after restart:
- "Lakehouse supports ACID and raw data." (5 words, 516ms)
- "tokio\nasync-std\nsmol" (3 Rust crates, 391ms)
- /v1/usage accumulates across calls (2 req / 95 total tokens)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:50:09 -05:00
profit
4cb405bb42 Phase 38: Universal API skeleton — /v1/chat, /v1/usage, /v1/sessions
First slice of the control-plane pivot: an OpenAI-compatible surface
over the existing aibridge → Ollama path. Purely additive; no existing
routes are touched. All 7 unit tests green, release build clean.

What ships:
- crates/gateway/src/v1/mod.rs — router, V1State (ai_client + Usage
  counter), ChatRequest/ChatResponse/Message/UsageBlock types, handlers
  for /chat, /usage, /sessions. OpenAI-compatible field shapes:
  {model, messages[{role,content}], temperature?, max_tokens?, stream?}
- crates/gateway/src/v1/ollama.rs — shape adapter. Flattens messages
  into (system, prompt), calls aibridge.generate, and unwraps the
  response back into the OpenAI /v1/chat shape. Prefers
  sidecar-reported tokens; falls back to a chars/4 ceiling estimate
  matching the Phase 21 convention (both sketched after this list).
- crates/gateway/src/main.rs — one new mod, one .nest("/v1", ...)
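
A sketch of both helpers, consistent with the test names below; how
turns and multiple system messages are joined is an assumption:

```rust
struct Message {
    role: String,
    content: String,
}

// Collapse OpenAI-style messages into the (system, prompt) pair the
// aibridge -> Ollama path expects.
fn flatten_messages(messages: &[Message]) -> (String, String) {
    let mut system_parts = Vec::new();
    let mut turns = Vec::new();
    for m in messages {
        if m.role == "system" {
            system_parts.push(m.content.as_str()); // systems concatenate
        } else {
            turns.push(format!("{}: {}", m.role, m.content));
        }
    }
    (system_parts.join("\n"), turns.join("\n"))
}

// Fallback when the sidecar reports no token counts: ceil(chars / 4),
// the Phase 21 convention.
fn estimate_tokens(text: &str) -> u64 {
    (text.chars().count() as u64 + 3) / 4
}
```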

Tests (7/7):
- chat_request_parses_openai_shape
- chat_request_accepts_minimal
- usage_counter_default_is_zero
- flatten_separates_system_from_turns
- flatten_concatenates_multiple_system_messages
- flatten_with_no_system_returns_empty_system
- estimate_tokens_chars_div_4_ceiling

Not in this phase (per CONTROL_PLANE_PRD.md): streaming, tool calls,
session state, multi-provider, fallback chain, cost gating. All
land in Phases 39-44.

Next: live-test POST /v1/chat after gateway restart, then migrate
bot/propose.ts off direct sidecar calls to prove the loop end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:47:15 -05:00