PRD — Universal AI Control Plane
Status: Long-horizon architecture target as of 2026-04-22. Lakehouse Phases 0-37 (docs/PRD.md) are preserved as the reference implementation and first domain-specific consumer. Phases 38+ (control-plane layers) are sequenced below.
Current domain: staffing. The immediate proving ground is the staffing substrate already built — synthetic workers_500k, contracts, emails, SMS drafts, playbook memory. Everything Phase 38-44 ships is validated first against that domain. The DevOps / Terraform / Ansible framing from the original PRD draft stays as a long-horizon target — architecture-compatible but not in current scope. See §Long-horizon domains at the bottom.
Owner: J
Cross-read: docs/PRD.md for what's shipped (staffing + AI substrate, 13 crates, ~3M rows). This doc for the layered architecture those pieces now fit into.
Phase Sequencing (Phases 38-44)
Ship each phase before starting the next. Each ends with green tests + docs update.
| Phase | Layer | What ships | Est. LOC | Risk |
|---|---|---|---|---|
| 38 | Layer 1 skeleton | /v1/chat, /v1/usage, /v1/sessions routes forwarding to existing aibridge → Ollama. Bot migrates as first consumer. | ~400 | Low — additive, no existing routes touched |
| 39 | Layer 3 adapters | aibridge::ProviderAdapter trait; Ollama + one new (OpenRouter). /v1/chat routes by config. | ~500 | Low-medium |
| 40 | Layer 2 engine | Rules-based routing (config/routing.toml), fallback chains, cost gating. Add Gemini + Claude adapters. | ~600 | Medium |
| 41 | Profile split | Separate Retrieval / Memory / Execution / Observer profiles; Phase 17 backward-compat. Absorbs Phase 37 hot-swap-async. | ~300 | Medium |
| 42 | Truth Layer | New crates/truth; Terraform/Ansible schemas; /v1/context serves rules to router + observer. | ~700 | Medium |
| 43 | Validation pipeline | Syntax/lint/dry-run/policy gates per output type. Plugs into Layer 5 execution loop. | ~400 | Medium |
| 44 | Caller migration | All internal callers route through /v1/chat. Direct sidecar access deprecated. | ~200 | Low |
Total ≈3100 LOC. Phase 37 (hot-swap async) folds into Phase 41 — it's an Execution-Profile activation concern.
Phase 38 — Universal API Skeleton
Goal: OpenAI-compatible /v1/* surface exists and forwards to existing aibridge → Ollama. Nothing about multi-provider yet — just the SHAPE, so every downstream piece (adapters, routing, usage accounting) has a surface to plug into.
Ships:
- `crates/gateway/src/v1/mod.rs` — router + `/v1/chat`, `/v1/usage`, `/v1/sessions`
- `crates/gateway/src/v1/ollama.rs` — shape adapter (OpenAI chat ↔ existing aibridge `GenerateRequest`); see the sketch after this list
- One-line `nest("/v1", ...)` in `crates/gateway/src/main.rs`
- Unit test: `POST /v1/chat` roundtrips through mocked provider
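A minimal sketch of the skeleton, assuming an axum gateway; the type and field names are illustrative, not the shipped `crates/gateway` code, and the aibridge call is stubbed out.

```rust
// Sketch only: OpenAI-shape request/response types plus the /v1 router.
use axum::{routing::{get, post}, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct ChatMessage { role: String, content: String }

#[derive(Deserialize)]
struct ChatRequest { model: String, messages: Vec<ChatMessage> }

#[derive(Serialize)]
struct Usage { prompt_tokens: u64, completion_tokens: u64, total_tokens: u64 }

#[derive(Serialize)]
struct Choice { index: u32, message: ChatMessage, finish_reason: String }

#[derive(Serialize)]
struct ChatResponse { id: String, model: String, choices: Vec<Choice>, usage: Usage }

// Shape adapter: OpenAI chat request in, OpenAI chat response out; in between it
// would call the existing aibridge -> Ollama path (stubbed here).
async fn chat(Json(req): Json<ChatRequest>) -> Json<ChatResponse> {
    let prompt = req.messages.iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n");
    let completion = format!("(stubbed aibridge reply to: {prompt})");
    Json(ChatResponse {
        id: "chatcmpl-stub".into(),
        model: req.model,
        choices: vec![Choice {
            index: 0,
            message: ChatMessage { role: "assistant".into(), content: completion },
            finish_reason: "stop".into(),
        }],
        usage: Usage { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
    })
}

// Mounted in main.rs with `.nest("/v1", v1_router())`.
pub fn v1_router() -> Router {
    Router::new()
        .route("/chat", post(chat))
        .route("/usage", get(|| async { r#"{"requests":0,"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}"# }))
        .route("/sessions", get(|| async { r#"{"data":[]}"# }))
}
```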
Gate:
- `curl -X POST localhost:3100/v1/chat -d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"hi"}]}'` returns a valid OpenAI-shape response.
- `GET localhost:3100/v1/usage` returns `{requests, prompt_tokens, completion_tokens, total_tokens}`.
- `GET localhost:3100/v1/sessions` returns `{data:[]}` (stub; real impl Phase 41).
- `cargo test -p gateway` green.
Non-goals (explicit): streaming, tool calls, function calling, session state, multi-provider, fallback, cost gating.
Risk: Low — additive, doesn't touch existing routes. Worst case: /v1/* returns 502 and we fix the adapter. No existing caller affected.
Phase 39 — Provider Adapter Refactor
Goal: aibridge grows a ProviderAdapter trait. Ollama implementation wraps existing sidecar code. One new provider lands as proof: OpenRouter (simplest — it's OpenAI-compatible, so adapter is mostly passthrough).
Ships:
- `crates/aibridge/src/provider.rs` — `ProviderAdapter` trait with `chat()` + `embed()` + `unload()` methods; see the sketch after this list
- `crates/aibridge/src/providers/ollama.rs` — existing sidecar code moved behind the trait
- `crates/aibridge/src/providers/openrouter.rs` — new, HTTP client to `openrouter.ai/api/v1/chat/completions`
- `config/providers.toml` — provider registry (name, base_url, auth, default_models)
- `/v1/chat` routes by `model` field: prefix match (e.g. `openrouter/anthropic/claude-3.5-sonnet` → OpenRouter; bare names → Ollama)
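A sketch of what the trait and the prefix-match routing could look like, assuming the `async-trait` crate; `ChatTurn`, `ChatOutcome`, and `split_model` are placeholder names, not the shipped aibridge API.

```rust
use async_trait::async_trait;

pub struct ChatTurn { pub role: String, pub content: String }
pub struct ChatOutcome { pub text: String, pub prompt_tokens: u64, pub completion_tokens: u64 }

#[async_trait]
pub trait ProviderAdapter: Send + Sync {
    /// Provider id used for prefix-match routing, e.g. "ollama" or "openrouter".
    fn name(&self) -> &str;
    async fn chat(&self, model: &str, turns: &[ChatTurn]) -> anyhow::Result<ChatOutcome>;
    async fn embed(&self, model: &str, input: &str) -> anyhow::Result<Vec<f32>>;
    /// Free local resources (no-op for remote providers).
    async fn unload(&self, model: &str) -> anyhow::Result<()>;
}

/// Prefix match: "openrouter/openai/gpt-4o-mini" -> ("openrouter", "openai/gpt-4o-mini");
/// bare model names fall through to the local Ollama adapter.
pub fn split_model<'a>(model: &'a str, known: &[&str]) -> (&'a str, &'a str) {
    if let Some((prefix, rest)) = model.split_once('/') {
        if known.contains(&prefix) {
            return (prefix, rest);
        }
    }
    ("ollama", model)
}
```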
Gate:
- `/v1/chat` with `model: "qwen3.5:latest"` hits Ollama → green
- `/v1/chat` with `model: "openrouter/openai/gpt-4o-mini"` hits OpenRouter (key from secrets.toml) → green
- Neither call leaks provider-specific fields upward. Response is always the `/v1/chat` shape.
Non-goals: Fallback chain (Phase 40), cost gating (Phase 40), Gemini/Claude adapters (Phase 40).
Risk: Low-medium. The trait extraction is mostly a rearrange; OpenRouter is thin. Biggest risk is secret-loading conventions — SecretsProvider is already in place, so reuse that path.
Phase 40 — Routing & Policy Engine
Goal: Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level.
Ships:
- `crates/aibridge/src/routing.rs` — rules engine (match on: task type, token budget, previous attempt failures, profile ID); rule shape + fallback walk sketched after this list
- `config/routing.toml` — rules in TOML (human-editable, hot-reloadable)
- `crates/aibridge/src/providers/gemini.rs` — `generativelanguage.googleapis.com` adapter
- `crates/aibridge/src/providers/claude.rs` — `api.anthropic.com` adapter
- Fallback chain support: if primary returns 5xx or times out, try next in chain
- Cost gate: per-request budget + daily budget per-provider
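A sketch of the rule shape `config/routing.toml` might deserialize into, plus the fallback walk; field names and matching keys are illustrative, not the settled Phase 40 schema.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
pub struct RoutingConfig {
    pub rule: Vec<Rule>, // [[rule]] tables in TOML, first match wins
}

#[derive(Deserialize)]
pub struct Rule {
    pub task_type: Option<String>,      // e.g. "json_emit", "reasoning"
    pub max_prompt_tokens: Option<u64>, // keep small prompts on local models
    pub chain: Vec<String>,             // ordered fallback chain of provider/model ids
    pub max_cost_usd: Option<f64>,      // per-request ceiling checked before dispatch
}

/// Walk the chain: the first provider that answers without a 5xx/timeout wins.
pub async fn dispatch_with_fallback<F, Fut>(
    chain: &[String],
    mut call: F,
) -> anyhow::Result<String>
where
    F: FnMut(String) -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<String>>,
{
    let mut last_err = anyhow::anyhow!("empty fallback chain");
    for target in chain {
        match call(target.clone()).await {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e, // 5xx / timeout: try the next provider in the chain
        }
    }
    Err(last_err)
}
```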
Gate:
- Rule like "local models for simple JSON emitters, cloud for reasoning" fires correctly by task type
- Primary fails → fallback provider hits, response still matches `/v1/chat` shape
- Daily budget hit → subsequent requests return 429 with clear retry-at header
- `/v1/usage` reports per-provider breakdown
Non-goals: Retrieval Profile split (Phase 41), Truth Layer (Phase 42).
Risk: Medium. Multi-provider auth + cost tracking is cross-cutting. Mitigation: every provider call wrapped in a single dispatch() function, all observability flows through there.
Phase 41 — Profile System Expansion (+ Phase 37 hot-swap async folded in)
Goal: The existing ModelProfile (Phase 17) becomes ExecutionProfile. Three new profile types land alongside. Profile activation is async — returns job_id, work runs in background (Phase 37 deliverable).
Ships:
- `crates/shared/src/profiles/` — `ExecutionProfile`, `RetrievalProfile`, `MemoryProfile`, `ObserverProfile`
- `crates/catalogd` gains per-profile-type CRUD endpoints (`/catalog/profiles/retrieval`, etc.)
- `crates/vectord/src/activation.rs` — `ActivationTracker` with background-job pattern (Phase 37 content); sketched after this list
- `POST /vectors/profile/{id}/activate` returns 202 + job_id, polling at `GET /vectors/profile/jobs/{id}`
- Single-flight guard: refuse new activation if one is pending/running
- Backward compat: `ModelProfile` still loads, aliased to `ExecutionProfile`
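A sketch of the 202-and-poll pattern with a single-flight guard, assuming tokio and the `uuid` crate; the struct internals and the refusal mapping are placeholders, not the Phase 41 implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use uuid::Uuid;

#[derive(Clone, Debug)]
pub enum JobState { Pending, Running, Done(String), Failed(String) }

#[derive(Clone, Default)]
pub struct ActivationTracker {
    jobs: Arc<Mutex<HashMap<Uuid, JobState>>>,
    in_flight: Arc<Mutex<bool>>,
}

impl ActivationTracker {
    /// Returns Some(job_id) and spawns the work, or None when an activation is
    /// already pending/running (the handler maps None to a refusal response).
    pub fn start(&self, profile_id: String) -> Option<Uuid> {
        {
            let mut busy = self.in_flight.lock().unwrap();
            if *busy { return None; }
            *busy = true;
        }
        let id = Uuid::new_v4();
        self.jobs.lock().unwrap().insert(id, JobState::Pending);
        let jobs = self.jobs.clone();
        let in_flight = self.in_flight.clone();
        tokio::spawn(async move {
            jobs.lock().unwrap().insert(id, JobState::Running);
            // ... rebuild indexes / swap models for `profile_id` here ...
            jobs.lock().unwrap().insert(id, JobState::Done(format!("activated {profile_id}")));
            *in_flight.lock().unwrap() = false;
        });
        Some(id) // handler returns 202 + this id; client polls GET .../jobs/{id}
    }

    pub fn status(&self, id: &Uuid) -> Option<JobState> {
        self.jobs.lock().unwrap().get(id).cloned()
    }
}
```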
Gate:
- Activate a profile → returns 202 in <100ms → job completes in background → `/vectors/profile/jobs/{id}` shows progress + final report
- `tests/multi-agent/run_stress.ts` Phase 3 (hot-swap stress) passes (was SKIPPED)
- Retrieval + Memory + Observer profiles can be created independently of Execution profile
Non-goals: Truth Layer (Phase 42), validation (Phase 43), caller migration (Phase 44).
Risk: Medium. Schema change + async refactor. Mitigation: #[serde(default)] on all new fields; existing profiles load unchanged.
Phase 42 — Truth Layer (staffing rules first)
Goal: New crates/truth crate holds immutable task-class constraints. Served via /v1/context to router and observer. No layer can override truth. Staffing rules ship first; Terraform/Ansible rule shapes are scaffolded but unpopulated until the long-horizon phase.
Ships:
- `crates/truth/src/lib.rs` — `TruthStore` with schema loading (TOML/YAML rules)
- `crates/truth/src/staffing.rs` — staffing rule shapes:
  - Worker eligibility (active status, not blacklisted for client, geo match, role match, availability window)
  - Contract invariants (deadline present, role/count/city/state populated, budget_per_hour_max ≥ 0)
  - PII handling (redaction rules on fields tagged `PII` before any cloud call — covers existing Phase 10 sensitivity tags)
  - Client blacklist enforcement (auto-applied before any fill proposal)
  - Fill requirements (endorsed_names count matches target_count, no duplicate worker_ids within a single fill)
- `crates/truth/src/devops.rs` — scaffold only: empty rule struct for Terraform/Ansible, populated in the long-horizon phase. Keeps the dispatcher signature stable so no refactor needed later.
- `truth/` dir at repo root — rule files, versioned in git
- `/v1/context` endpoint — returns applicable rules for a task class (`staffing.fill`, `staffing.rescue`, `staffing.sms_draft`, etc.)
- Router consults truth before dispatching: if task violates a rule, hard-fail with structured error + rule citation (matches existing Phase 13 access-control pattern); see the sketch after this list
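A sketch of a router-gate truth check, assuming simplified rule ids and a cut-down `FillProposal`; the real `TruthStore` loads rules from the versioned `truth/` dir rather than hard-coding them.

```rust
pub struct FillProposal {
    pub contract_id: String,
    pub target_count: usize,
    pub endorsed_worker_ids: Vec<String>,
}

pub struct RuleViolation {
    pub rule_id: &'static str, // cited back to the caller in the structured error
    pub message: String,
}

pub struct TruthStore {
    pub client_blacklist: std::collections::HashSet<(String, String)>, // (contract_id, worker_id)
}

impl TruthStore {
    /// Hard-fail before dispatch: no cloud tokens are burned on a proposal that
    /// violates an immutable staffing rule.
    pub fn check_fill(&self, p: &FillProposal) -> Result<(), RuleViolation> {
        if p.endorsed_worker_ids.len() != p.target_count {
            return Err(RuleViolation {
                rule_id: "staffing.fill.count_match",
                message: format!("endorsed {} workers, contract requires {}",
                                 p.endorsed_worker_ids.len(), p.target_count),
            });
        }
        for w in &p.endorsed_worker_ids {
            if self.client_blacklist.contains(&(p.contract_id.clone(), w.clone())) {
                return Err(RuleViolation {
                    rule_id: "staffing.fill.client_blacklist",
                    message: format!("worker {w} is blacklisted for contract {}", p.contract_id),
                });
            }
        }
        Ok(())
    }
}
```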
Gate:
- Submit a fill proposal where a worker is client-blacklisted — router returns 422 + rule citation, no cloud tokens burned
- Submit a fill with `endorsed_names.length != target_count` — 422 before dispatch
- Observer cannot promote a correction that violates truth (rejected at router gate)
- PII redaction verified: SSN / salary fields stripped from prompts before cloud calls
- Truth reload is explicit (no file-watch hot reload in this phase)
Non-goals: Validation execution (Phase 43), policy learning / evolution (deferred), actual Terraform/Ansible rules (long-horizon phase).
Risk: Medium. Domain-specific rule enumeration takes discovery — start with a minimal rule set (5-10 staffing rules, derived from existing Phase 10-13 work) and grow organically as real fills surface edge cases.
Phase 43 — Validation Pipeline (staffing outputs first)
Goal: Staffing outputs run through schema / completeness / consistency / policy gates. Plug into Layer 5 execution loop — failure triggers observer-correction iteration. This is where the 0→85% pattern reproduces on real staffing tasks — the iteration loop with validation in place is what made small models successful.
Ships:
- `crates/validator/src/lib.rs` — `Validator` trait: `validate(artifact) -> Result<Report, ValidationError>` + `Artifact` enum over output types; see the sketch after this list
- `crates/validator/src/staffing/fill.rs` — fill-proposal validator:
  - Schema compliance (propose_done shape matches `{fills: [{candidate_id, name}]}`)
  - Completeness (endorsed count == target_count)
  - Worker existence (every candidate_id present in workers_500k via SQL lookup)
  - Status check (every worker has status=active, not_on_client_blacklist)
  - Geo/role match (worker city/state/role matches contract)
- `crates/validator/src/staffing/email.rs` — generated email/SMS drafts:
  - Schema (TO/BODY fields present)
  - Length (SMS ≤ 160 chars; email subject ≤ 78 chars)
  - PII absence (no SSN / salary leaked into outgoing text)
  - Worker-name consistency (name in message matches worker record)
- `crates/validator/src/staffing/playbook.rs` — sealed playbook:
  - Operation format (`fill: Role xN in City, ST`)
  - endorsed_names non-empty, ≤ target_count × 2
  - fingerprint populated (Phase 25 validity window requirement)
- `crates/validator/src/devops.rs` — scaffold only: stubbed Terraform/Ansible validators (`terraform validate`, `ansible-lint`) for the long-horizon phase
- Task execution loop in gateway: generate → validate → if fail, observer correction + retry (bounded by `max_iterations=3`)
- Validation results logged to observer (`data/_observer/ops.jsonl`) + KB (`data/_kb/outcomes.jsonl`)
Gate:
- Generate a fill proposal → validator catches a phantom worker_id → observer + cloud rescue propose correction → retry → green. This reproduces the 0→85% pattern on the live staffing pipeline.
- `/v1/usage` shows iteration count per task, provider fallback chain, and tokens-per-iteration. Cost attribution per task class visible.
- Reproduces the 14× citation-lift finding from Phase 19 refinement on similar geos after validation gates.
Non-goals: Caller migration (Phase 44), Terraform/Ansible wired validation (long-horizon).
Risk: Medium. Validation shapes have to match actual executor outputs; mitigation is using real scenario runs as test fixtures (we have ~100 of them in tests/multi-agent/playbooks/).
Phase 44 — Caller Migration + Direct-Provider Deprecation
Goal: Every internal LLM caller routes through /v1/chat. Direct sidecar / direct Ollama / direct OpenAI calls are removed or explicitly deprecated with a warning.
Ships:
- `aibridge::AiClient` becomes a thin `/v1/chat` client (was direct-to-sidecar); see the sketch after this list
- `crates/vectord::agent` (autotune): routes through `/v1`
- `crates/vectord::autotune`: routes through `/v1`
- `tests/multi-agent/agent.ts::generate()`: routes through `/v1`
- `bot/propose.ts`: routes through `/v1` (already proposed as Phase 38's test consumer, formalized here)
- Lint rule / grep pre-commit hook: no `fetch.*:3200/generate` outside the provider adapters
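A sketch of the thin client shape, assuming `reqwest` and `serde_json`; the port and payload mirror the Phase 38 gate example rather than the final `AiClient` API.

```rust
use serde_json::{json, Value};

pub struct AiClient {
    http: reqwest::Client,
    base_url: String, // e.g. "http://localhost:3100"
}

impl AiClient {
    pub fn new(base_url: impl Into<String>) -> Self {
        Self { http: reqwest::Client::new(), base_url: base_url.into() }
    }

    /// Every internal caller goes through here; no direct :3200/generate calls.
    pub async fn chat(&self, model: &str, user_content: &str) -> anyhow::Result<String> {
        let body = json!({
            "model": model,
            "messages": [{ "role": "user", "content": user_content }],
        });
        let resp: Value = self.http
            .post(format!("{}/v1/chat", self.base_url))
            .json(&body)
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;
        // OpenAI-shape response: first choice's message content.
        Ok(resp["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or_default()
            .to_string())
    }
}
```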
Gate:
grep -r "localhost:3200/generate\|/api/generate"returns only adapter files + deprecation shims/v1/usageaccounts for every LLM call in the system within a 1-minute window after hitting a fresh scenario- Full scenario passes end-to-end without any caller bypassing
/v1/*
Non-goals: New features. This phase is purely mechanical migration.
Risk: Low. Mechanical. Tests catch regressions.
Long-horizon domains (not in current phase sequence)
The architecture was drafted with DevOps execution (Terraform, Ansible) as the eventual target. That remains aspirational, not current scope — we don't start wiring terraform validate / ansible-lint until the staffing domain proves the six-layer architecture at scale.
What "proves at scale" means concretely:
- Phases 38-44 all shipped against staffing, green tests
- Live staffing pipeline handles multiple concurrent contracts with emails + SMS + indexed playbooks via `/v1/*`
- Observed iteration success lift (the 0→85% pattern) reproduced on varied staffing scenarios, not just the original proof-of-concept
- Token + cost accounting stable across provider fallback chains under real load
- Truth Layer rules prevent real fill errors before cloud burn (not just theoretical)
When staffing hits that bar, the DevOps domain lights up by:
- Populating `crates/truth/src/devops.rs` with real Terraform/Ansible rule shapes
- Populating `crates/validator/src/devops.rs` with `terraform validate` / `ansible-lint` shell-out
- Adding DevOps task classes to `/v1/context` rule lookup
- No architectural changes needed — the dispatcher, router, and execution loop stay identical.
Other candidate long-horizon domains (same pattern):
- Code generation tasks (validation via `cargo check` / `bun test`)
- SQL query generation (validation via EXPLAIN + schema compliance)
- Data pipeline definitions (validation via lineage check + schema compliance)
None of these are in the current roadmap. Staffing first, production-proven, then expand.
1. Purpose
Design and implement a universal AI control-plane API that enables:
- deterministic high-stakes task execution — the immediate domain is staffing fills (contracts, workers, emails, SMS) at scale; the same architecture extends later to DevOps (Terraform, Ansible) without redesign
- iterative capability amplification via observer loops
- hybrid local + cloud model orchestration
- structured knowledge + memory + playbook reuse
- controlled improvement over time through validated iteration
The system prioritizes validated pipeline success over raw model intelligence.
Current scope — staffing at scale
The architecture must make the already-built staffing substrate reliably answer millions of inputs: pull real data, graph it across contracts, handle multiple concurrent contracts, index emails + SMS + playbooks via the hybrid SQL+vector method, and get faster and better each iteration via the feedback loops (Phase 19 playbook boost, Phase 22 KB pathway recommender, Phase 24 observer, Phase 26 Mem0 upsert).
DevOps is an eventual domain — see §Long-horizon domains.
2. Core Objectives
2.1 Functional Goals
- Provide a single universal API for all AI interactions
- Support multi-provider routing (local, flat-rate, token-based)
- Enable iterative execution loops with observer correction
- Store and reuse successful execution playbooks
- Integrate: S3-based knowledge storage, LanceDB retrieval/indexing, Mem0 memory layer, MCP tool ecosystem
2.2 Non-Functional Goals
- Deterministic behavior under constrained execution
- Full observability and cost accounting
- Safe DevOps execution (no uncontrolled mutation)
- Profile-driven routing and execution
- Reproducibility of successful runs
3. System Architecture
3.1 Layer Overview
Layer 1 — Universal API
Single entry point for all applications. Endpoints:
`/v1/chat`, `/v1/respond`, `/v1/tools`, `/v1/context`, `/v1/usage`, `/v1/sessions`
All programs must use this layer. No direct provider calls allowed.
Layer 2 — Routing & Policy Engine
Responsibilities: provider selection, fallback logic, cost gating, premium access control, profile enforcement. Routing based on: task type, constraints, execution profile, system health.
Layer 3 — Provider Adapter Layer
Normalizes all providers: Ollama (local), OpenRouter, Gemini (direct), Claude (direct or routed), future providers. Guarantee: no provider-specific logic leaks upward.
Layer 4 — Knowledge & Memory Plane
- Knowledge (S3 + LanceDB): raw documents, processed chunks, embeddings, index profiles
- Memory (Mem0): extracted facts, entity-linked memory, session-aware retrieval
- Playbooks: successful execution traces, reusable patterns, correction strategies
Layer 5 — Execution Loop
Each task runs through: Retrieval → Planning → Generation → Validation → Observer feedback → Iteration (if needed).
Layer 6 — Observability & Accounting
Every request logs: tokens (input/output), cost, latency, provider, fallback chain, profile used, iteration delta.
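As a sketch, the per-request accounting record might look like the following; the field names are illustrative of what "every request logs", not a fixed schema.

```rust
use serde::Serialize;

#[derive(Serialize)]
pub struct RequestRecord {
    pub request_id: String,
    pub task_class: String,           // e.g. "staffing.fill"
    pub provider: String,             // provider that produced the final answer
    pub fallback_chain: Vec<String>,  // providers tried, in order
    pub profile_id: Option<String>,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub cost_usd: f64,
    pub latency_ms: u64,
    pub iteration: u32,               // which attempt in the execution loop
    pub iteration_delta: Option<f64>, // validation-score change vs. previous attempt
}
```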
4. Execution Model
4.1 Iterative Loop
Each task follows: Attempt → Validate → Observe → Adjust → Retry
Constraints:
- max iterations (default: 3)
- minimum improvement threshold
- cost ceiling per task
4.2 Observer Role
Observer can: analyze failure, suggest corrections, recommend profile changes. Observer cannot: modify truth layer, auto-promote changes, override constraints.
4.3 Cloud Escalation
Cloud models (Gemini, Claude) are used for: structural correction, reasoning gaps, complex decomposition. They are not used for: brute-force retries, bulk execution.
5. Profile System
5.1 Profile Types
- Retrieval Profile — chunking strategy, embedding method, reranking rules
- Memory Profile — memory weighting, context injection rules
- Execution Profile — allowed providers, tool access, risk level
- Observer Profile — mutation aggressiveness, iteration strategy
5.2 Profile Constraints
- only one major profile change per iteration
- profiles must produce measurable deltas
- promotion requires repeated success
6. Truth Layer (Critical)
Defines non-negotiable constraints:
- Terraform rules
- Ansible structure requirements
- security policies
- organization standards
Rules:
- immutable at runtime
- referenced by all layers
- cannot be overridden by observer
7. Playbook System
7.1 Playbook Definition
Each successful run produces: task class, context used, steps executed, tools used, output artifacts, validation results, cost/latency, success score.
7.2 Playbook Lifecycle
- created on success
- reused for similar tasks
- decayed over time
- pruned if ineffective
8. Validation System
All DevOps outputs must pass: syntax validation, linting, dry-run, policy compliance. Failure → iteration continues or task fails.
9. MCP Integration
MCP servers provide: tools, external data, execution capabilities. All MCP outputs must be: normalized, validated, schema-compliant. No direct MCP output reaches the model.
10. Token Accounting & Budget Control
Each request tracks: input tokens, output tokens, retries, fallback cost. Policies: premium providers gated, cost ceilings enforced, per-task budget limits.
11. Failure Handling
Recoverable failures: bad decomposition, missing steps, weak retrieval → observer + iteration.
Hard failures: missing truth data, invalid task classification, unsafe execution → termination + error report.
12. Success Criteria
A task is successful only if:
- output is valid
- all validators pass
- no policy violations
- result is reproducible
- cost within limits
13. Key Risks & Mitigations
- Observer drift → bounded authority, confidence tracking
- Memory poisoning → validation layer, memory weighting
- Cost explosion → token accounting, iteration caps
- Retrieval errors → post-retrieval validation, profile tuning