diff --git a/docs/CONTROL_PLANE_PRD.md b/docs/CONTROL_PLANE_PRD.md
index 28d3bd6..b30a5a4 100644
--- a/docs/CONTROL_PLANE_PRD.md
+++ b/docs/CONTROL_PLANE_PRD.md
@@ -72,11 +72,11 @@ Ship each phase before starting the next. Each ends with green tests + docs upda
 
 ---
 
-## Phase 40 — Routing & Policy Engine
+## Phase 40 — Routing & Policy Engine + Observability Recovery
 
-**Goal:** Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level.
+**Goal:** Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level. **Reinstate Langfuse + Gitea MCP** — recovery of the observability + repo-ops stack J built previously (see `project_lost_stack` memory).
 
-**Ships:**
+**Ships — routing:**
 - `crates/aibridge/src/routing.rs` — rules engine (match on: task type, token budget, previous attempt failures, profile ID)
 - `config/routing.toml` — rules in TOML (human-editable, hot-reloadable)
 - `crates/aibridge/src/providers/gemini.rs` — `generativelanguage.googleapis.com` adapter
@@ -84,15 +84,24 @@ Ship each phase before starting the next. Each ends with green tests + docs upda
 - Fallback chain support: if primary returns 5xx or times out, try next in chain
 - Cost gate: per-request budget + daily budget per-provider
 
+**Ships — observability (was lost, now restored):**
+- **Langfuse** self-hosted via Docker Compose. Single source of truth for every LLM call trace: prompt / response / tokens / cost / latency / provider / fallback chain / profile used. UI at `localhost:3000`. Keys in `/etc/lakehouse/secrets.toml`.
+- `crates/aibridge/src/langfuse.rs` — thin fire-and-forget trace emitter. Every `/v1/chat` call spawns a background task that POSTs to `langfuse/api/public/ingestion`. Non-blocking: trace failures never affect the response.
+- **Langfuse → observer pipe** — `mcp-server/langfuse_bridge.ts` or similar. Polls Langfuse's trace API on an interval and forwards completed traces to observer `:3800/event` with `source: "langfuse"`. The KB now sees cost/latency deltas per model, not just outcome deltas.
+- **Gitea MCP reconnect** — the MCP server binary, still installed at `/home/profit/.bun/install/cache/gitea-mcp@0.0.10/`, gets wired into the `mcp-server/index.ts` tool registry. Agents can open PRs, comment on issues, and list commits via named tools. Closes Phase 28's repo-ops gap.
+
 **Gate:**
 - Rule like "local models for simple JSON emitters, cloud for reasoning" fires correctly by task type
 - Primary fails → fallback provider hits, response still matches `/v1/chat` shape
 - Daily budget hit → subsequent requests return 429 with clear retry-at header
 - `/v1/usage` reports per-provider breakdown
+- **Every `/v1/chat` call appears in the Langfuse UI** with the correct prompt, response, latency, and token count within 2 seconds of the request completing
+- **Langfuse → observer pipe** delivers trace deltas to the KB: `GET :3800/stats?source=langfuse` shows a non-zero count after a few scenarios run
+- **Gitea MCP tools callable** — `list_prs`, `open_pr`, `comment_on_issue` exposed in `mcp-server/index.ts`, verifiable via a quick agent scenario
 
-**Non-goals:** Retrieval Profile split (Phase 41), Truth Layer (Phase 42).
+**Non-goals:** Retrieval Profile split (Phase 41), Truth Layer (Phase 42). Langfuse self-hosted UI customization / SSO.
 
-**Risk:** Medium. Multi-provider auth + cost tracking is cross-cutting. Mitigation: every provider call wrapped in a single `dispatch()` function, all observability flows through there.
+**Risk:** Medium. Multi-provider auth + cost tracking is cross-cutting; Langfuse adds 4-5 Docker containers (PostgreSQL, ClickHouse, Redis, web, worker). Mitigation: every provider call is wrapped in a single `dispatch()` function so all observability flows through one point, and Langfuse's Docker Compose setup is its supported, well-tested deployment path.
 
 ---
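To make the "rules engine (match on: task type, token budget, previous attempt failures, profile ID)" item in the patch concrete, here is a minimal sketch of first-match-wins rule evaluation. All names (`Rule`, `Request`, `route`, the field names) are illustrative assumptions about what a deserialized `config/routing.toml` rule might look like, not the real `aibridge` schema.

```rust
// Hypothetical shape of one routing rule after deserializing
// config/routing.toml; field names are assumptions, not the real schema.
#[derive(Debug)]
struct Rule {
    task_type: Option<String>,       // match this task type, if set
    max_tokens: Option<u32>,         // match if the token budget fits
    min_prior_failures: Option<u32>, // match only after N failed attempts
    chain: Vec<String>,              // provider fallback chain to use
}

struct Request<'a> {
    task_type: &'a str,
    token_budget: u32,
    prior_failures: u32,
}

/// First matching rule wins; unset fields in a rule match anything.
fn route<'r>(rules: &'r [Rule], req: &Request) -> Option<&'r [String]> {
    rules
        .iter()
        .find(|r| {
            r.task_type.as_deref().map_or(true, |t| t == req.task_type)
                && r.max_tokens.map_or(true, |m| req.token_budget <= m)
                && r.min_prior_failures.map_or(true, |f| req.prior_failures >= f)
        })
        .map(|r| r.chain.as_slice())
}

fn main() {
    // "Local models for simple JSON emitters, cloud for reasoning."
    let rules = vec![
        Rule { task_type: Some("json_emit".into()), max_tokens: Some(2048),
               min_prior_failures: None, chain: vec!["local".into()] },
        Rule { task_type: None, max_tokens: None, min_prior_failures: None,
               chain: vec!["claude".into(), "gemini".into()] },
    ];
    let simple = Request { task_type: "json_emit", token_budget: 512, prior_failures: 0 };
    let hard = Request { task_type: "reasoning", token_budget: 8192, prior_failures: 1 };
    assert_eq!(route(&rules, &simple), Some(&["local".to_string()][..]));
    assert_eq!(route(&rules, &hard).unwrap().len(), 2);
}
```

Ordered first-match evaluation keeps the TOML human-editable: the most specific rules sit at the top and a catch-all cloud chain sits at the bottom, which is also easy to hot-reload atomically.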
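The Risk mitigation above hinges on a single `dispatch()` choke point. A minimal sketch of how fallback chaining and the per-provider daily cost gate could compose inside it, with invented types (`CallResult`, `CostGate`, `DispatchError`) standing in for the real `aibridge` ones:

```rust
use std::collections::HashMap;

// Outcome of one provider call, as seen by the router (illustrative).
enum CallResult {
    Ok(String),       // response body
    ServerError(u16), // 5xx: advance the fallback chain
    Timeout,          // same treatment as 5xx
}

#[derive(Debug, PartialEq)]
enum DispatchError {
    BudgetExhausted,    // surfaces to the client as HTTP 429 + retry-at
    AllProvidersFailed, // every in-budget provider 5xx'd or timed out
}

struct CostGate {
    daily_budget_cents: HashMap<String, u64>,
    spent_today_cents: HashMap<String, u64>,
}

impl CostGate {
    fn allows(&self, provider: &str, est_cost_cents: u64) -> bool {
        let budget = self.daily_budget_cents.get(provider).copied().unwrap_or(0);
        let spent = self.spent_today_cents.get(provider).copied().unwrap_or(0);
        spent + est_cost_cents <= budget
    }
}

/// Walk the chain: skip over-budget providers, advance past 5xx and
/// timeouts, return the first success. Single point for observability.
fn dispatch(
    chain: &[String],
    gate: &CostGate,
    est_cost_cents: u64,
    call: impl Fn(&str) -> CallResult,
) -> Result<String, DispatchError> {
    let mut any_within_budget = false;
    for provider in chain {
        if !gate.allows(provider, est_cost_cents) {
            continue; // daily budget hit: never even call this provider
        }
        any_within_budget = true;
        match call(provider) {
            CallResult::Ok(body) => return Ok(body),
            CallResult::ServerError(_) | CallResult::Timeout => continue,
        }
    }
    if any_within_budget {
        Err(DispatchError::AllProvidersFailed)
    } else {
        Err(DispatchError::BudgetExhausted)
    }
}

fn main() {
    let gate = CostGate {
        daily_budget_cents: HashMap::from([("gemini".to_string(), 100), ("claude".to_string(), 0)]),
        spent_today_cents: HashMap::from([("gemini".to_string(), 10)]),
    };
    let chain = vec!["claude".to_string(), "gemini".to_string()];
    // claude is over budget (budget 0) so it is skipped; gemini answers.
    let result = dispatch(&chain, &gate, 5, |p| {
        if p == "gemini" { CallResult::Ok("hi".into()) } else { CallResult::ServerError(500) }
    });
    assert_eq!(result, Ok("hi".to_string()));
}
```

Because every provider call funnels through this one function, the trace emission, `/v1/usage` accounting, and the 429/retry-at behavior in the Gate section all hang off a single code path instead of being duplicated per adapter.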
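The "fire-and-forget, non-blocking, trace failures never affect the response" property of `crates/aibridge/src/langfuse.rs` can be sketched with a background worker and a channel. This is a shape sketch only: the real emitter would POST JSON to Langfuse's `/api/public/ingestion` endpoint, and `Trace`, `spawn_emitter`, and `post_to_langfuse` are hypothetical names, with the HTTP call simulated so the example runs offline.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Minimal trace record; the real one would carry tokens, cost,
// provider, fallback chain, and profile ID as well.
struct Trace {
    prompt: String,
    response: String,
    latency_ms: u64,
}

/// Spawn a background consumer. The request path only does a channel
/// send, so a slow or dead Langfuse never delays the chat response.
fn spawn_emitter() -> mpsc::Sender<Trace> {
    let (tx, rx) = mpsc::channel::<Trace>();
    thread::spawn(move || {
        for trace in rx {
            // Stand-in for the HTTP POST to the ingestion endpoint.
            // Errors are logged and dropped, never propagated back.
            if let Err(e) = post_to_langfuse(&trace) {
                eprintln!("trace dropped: {e}");
            }
        }
    });
    tx
}

fn post_to_langfuse(trace: &Trace) -> Result<(), String> {
    thread::sleep(Duration::from_millis(1)); // simulated network latency
    println!(
        "ingested trace: {} prompt bytes, {} response bytes, {}ms",
        trace.prompt.len(),
        trace.response.len(),
        trace.latency_ms
    );
    Ok(())
}

fn main() {
    let tx = spawn_emitter();
    // What the /v1/chat handler would do after answering the client;
    // a failed send (emitter gone) is ignored: fire and forget.
    let _ = tx.send(Trace {
        prompt: "hello".into(),
        response: "hi".into(),
        latency_ms: 42,
    });
    thread::sleep(Duration::from_millis(50)); // let the demo worker drain
}
```

The same decoupling is what makes the 2-second Gate criterion realistic: the emitter batches and retries on its own schedule while the chat path stays on the hot loop.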