PRD — Universal AI Control Plane
Status: Long-horizon architecture target as of 2026-04-22. Lakehouse Phases 0-37 (docs/PRD.md) are preserved as the reference implementation and first domain-specific consumer. Phases 38+ (control-plane layers) are sequenced below.
Current domain: staffing. The immediate proving ground is the staffing substrate already built — synthetic workers_500k, contracts, emails, SMS drafts, playbook memory. Everything Phase 38-44 ships is validated first against that domain. The DevOps / Terraform / Ansible framing from the original PRD draft stays as a long-horizon target — architecture-compatible but not in current scope. See §Long-horizon domains at the bottom.
Owner: J
Cross-read: docs/PRD.md for what's shipped (staffing + AI substrate, 13 crates, ~3M rows). This doc for the layered architecture those pieces now fit into.
Phase Sequencing (Phases 38-44)
Ship each phase before starting the next. Each ends with green tests + docs update.
| Phase | Layer | What ships | Est. LOC | Risk |
|---|---|---|---|---|
| 38 | Layer 1 skeleton | /v1/chat, /v1/usage, /v1/sessions routes forwarding to existing aibridge → Ollama. Bot migrates as first consumer. | ~400 | Low — additive, no existing routes touched |
| 39 | Layer 3 adapters | aibridge::ProviderAdapter trait; Ollama + one new (OpenRouter). /v1/chat routes by config. | ~500 | Low-medium |
| 40 | Layer 2 engine | Rules-based routing (config/routing.toml), fallback chains, cost gating. Add Gemini + Claude adapters. | ~600 | Medium |
| 41 | Profile split | Separate Retrieval / Memory / Execution / Observer profiles; Phase 17 backward-compat. Absorbs Phase 37 hot-swap-async. | ~300 | Medium |
| 42 | Truth Layer | New crates/truth; Terraform/Ansible schemas; /v1/context serves rules to router + observer. | ~700 | Medium |
| 43 | Validation pipeline | Syntax/lint/dry-run/policy gates per output type. Plugs into Layer 5 execution loop. | ~400 | Medium |
| 44 | Caller migration | All internal callers route through /v1/chat. Direct sidecar access deprecated. | ~200 | Low |
Total ≈3100 LOC. Phase 37 (hot-swap async) folds into Phase 41 — it's an Execution-Profile activation concern.
Phase 38 — Universal API Skeleton
Goal: OpenAI-compatible /v1/* surface exists and forwards to existing aibridge → Ollama. Nothing about multi-provider yet — just the SHAPE, so every downstream piece (adapters, routing, usage accounting) has a surface to plug into.
Ships:
- `crates/gateway/src/v1/mod.rs` — router + `/v1/chat`, `/v1/usage`, `/v1/sessions`
- `crates/gateway/src/v1/ollama.rs` — shape adapter (OpenAI chat ↔ existing aibridge `GenerateRequest`); see the sketch after this list
- One-line `nest("/v1", ...)` in `crates/gateway/src/main.rs`
- Unit test: `POST /v1/chat` roundtrips through mocked provider
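A minimal sketch of the skeleton, assuming an axum gateway; the type and field names are illustrative, not the shipped `crates/gateway` code, and the aibridge call is stubbed out.

```rust
// Sketch only: OpenAI-shape request/response types plus the /v1 router.
use axum::{routing::{get, post}, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct ChatMessage { role: String, content: String }

#[derive(Deserialize)]
struct ChatRequest { model: String, messages: Vec<ChatMessage> }

#[derive(Serialize)]
struct Usage { prompt_tokens: u64, completion_tokens: u64, total_tokens: u64 }

#[derive(Serialize)]
struct Choice { index: u32, message: ChatMessage, finish_reason: String }

#[derive(Serialize)]
struct ChatResponse { id: String, model: String, choices: Vec<Choice>, usage: Usage }

// Shape adapter: OpenAI chat request in, OpenAI chat response out; in between it
// would call the existing aibridge -> Ollama path (stubbed here).
async fn chat(Json(req): Json<ChatRequest>) -> Json<ChatResponse> {
    let prompt = req.messages.iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n");
    let completion = format!("(stubbed aibridge reply to: {prompt})");
    Json(ChatResponse {
        id: "chatcmpl-stub".into(),
        model: req.model,
        choices: vec![Choice {
            index: 0,
            message: ChatMessage { role: "assistant".into(), content: completion },
            finish_reason: "stop".into(),
        }],
        usage: Usage { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
    })
}

// Mounted in main.rs with `.nest("/v1", v1_router())`.
pub fn v1_router() -> Router {
    Router::new()
        .route("/chat", post(chat))
        .route("/usage", get(|| async { r#"{"requests":0,"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}"# }))
        .route("/sessions", get(|| async { r#"{"data":[]}"# }))
}
```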
Gate:
- `curl -X POST localhost:3100/v1/chat -d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"hi"}]}'` returns a valid OpenAI-shape response.
- `GET localhost:3100/v1/usage` returns `{requests, prompt_tokens, completion_tokens, total_tokens}`.
- `GET localhost:3100/v1/sessions` returns `{data:[]}` (stub; real impl Phase 41).
- `cargo test -p gateway` green.
Non-goals (explicit): streaming, tool calls, function calling, session state, multi-provider, fallback, cost gating.
Risk: Low — additive, doesn't touch existing routes. Worst case: /v1/* returns 502 and we fix the adapter. No existing caller affected.
Phase 39 — Provider Adapter Refactor
Goal: aibridge grows a ProviderAdapter trait. Ollama implementation wraps existing sidecar code. One new provider lands as proof: OpenRouter (simplest — it's OpenAI-compatible, so adapter is mostly passthrough).
Ships:
- `crates/aibridge/src/provider.rs` — `ProviderAdapter` trait with `chat()` + `embed()` + `unload()` methods; see the sketch after this list
- `crates/aibridge/src/providers/ollama.rs` — existing sidecar code moved behind the trait
- `crates/aibridge/src/providers/openrouter.rs` — new, HTTP client to `openrouter.ai/api/v1/chat/completions`
- `config/providers.toml` — provider registry (name, base_url, auth, default_models)
- `/v1/chat` routes by `model` field: prefix match (e.g. `openrouter/anthropic/claude-3.5-sonnet` → OpenRouter; bare names → Ollama)
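A sketch of what the trait and the prefix-match routing could look like, assuming the `async-trait` crate; `ChatTurn`, `ChatOutcome`, and `split_model` are placeholder names, not the shipped aibridge API.

```rust
use async_trait::async_trait;

pub struct ChatTurn { pub role: String, pub content: String }
pub struct ChatOutcome { pub text: String, pub prompt_tokens: u64, pub completion_tokens: u64 }

#[async_trait]
pub trait ProviderAdapter: Send + Sync {
    /// Provider id used for prefix-match routing, e.g. "ollama" or "openrouter".
    fn name(&self) -> &str;
    async fn chat(&self, model: &str, turns: &[ChatTurn]) -> anyhow::Result<ChatOutcome>;
    async fn embed(&self, model: &str, input: &str) -> anyhow::Result<Vec<f32>>;
    /// Free local resources (no-op for remote providers).
    async fn unload(&self, model: &str) -> anyhow::Result<()>;
}

/// Prefix match: "openrouter/openai/gpt-4o-mini" -> ("openrouter", "openai/gpt-4o-mini");
/// bare model names fall through to the local Ollama adapter.
pub fn split_model<'a>(model: &'a str, known: &[&str]) -> (&'a str, &'a str) {
    if let Some((prefix, rest)) = model.split_once('/') {
        if known.contains(&prefix) {
            return (prefix, rest);
        }
    }
    ("ollama", model)
}
```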
Gate:
- `/v1/chat` with `model: "qwen3.5:latest"` hits Ollama → green
- `/v1/chat` with `model: "openrouter/openai/gpt-4o-mini"` hits OpenRouter (key from secrets.toml) → green
- Neither call leaks provider-specific fields upward. Response is always the `/v1/chat` shape.
Non-goals: Fallback chain (Phase 40), cost gating (Phase 40), Gemini/Claude adapters (Phase 40).
Risk: Low-medium. The trait extraction is mostly a rearrange; OpenRouter is thin. Biggest risk is secret-loading conventions — SecretsProvider is already in place, so reuse that path.
Phase 40 — Routing & Policy Engine
Goal: Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level.
Ships:
- `crates/aibridge/src/routing.rs` — rules engine (match on: task type, token budget, previous attempt failures, profile ID); rule shape + fallback walk sketched after this list
- `config/routing.toml` — rules in TOML (human-editable, hot-reloadable)
- `crates/aibridge/src/providers/gemini.rs` — `generativelanguage.googleapis.com` adapter
- `crates/aibridge/src/providers/claude.rs` — `api.anthropic.com` adapter
- Fallback chain support: if primary returns 5xx or times out, try next in chain
- Cost gate: per-request budget + daily budget per-provider
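A sketch of the rule shape `config/routing.toml` might deserialize into, plus the fallback walk; field names and matching keys are illustrative, not the settled Phase 40 schema.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
pub struct RoutingConfig {
    pub rule: Vec<Rule>, // [[rule]] tables in TOML, first match wins
}

#[derive(Deserialize)]
pub struct Rule {
    pub task_type: Option<String>,      // e.g. "json_emit", "reasoning"
    pub max_prompt_tokens: Option<u64>, // keep small prompts on local models
    pub chain: Vec<String>,             // ordered fallback chain of provider/model ids
    pub max_cost_usd: Option<f64>,      // per-request ceiling checked before dispatch
}

/// Walk the chain: the first provider that answers without a 5xx/timeout wins.
pub async fn dispatch_with_fallback<F, Fut>(
    chain: &[String],
    mut call: F,
) -> anyhow::Result<String>
where
    F: FnMut(String) -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<String>>,
{
    let mut last_err = anyhow::anyhow!("empty fallback chain");
    for target in chain {
        match call(target.clone()).await {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e, // 5xx / timeout: try the next provider in the chain
        }
    }
    Err(last_err)
}
```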
Gate:
- Rule like "local models for simple JSON emitters, cloud for reasoning" fires correctly by task type
- Primary fails → fallback provider hits, response still matches `/v1/chat` shape
- Daily budget hit → subsequent requests return 429 with clear retry-at header
- `/v1/usage` reports per-provider breakdown
Non-goals: Retrieval Profile split (Phase 41), Truth Layer (Phase 42).
Risk: Medium. Multi-provider auth + cost tracking is cross-cutting. Mitigation: every provider call wrapped in a single dispatch() function, all observability flows through there.
Phase 41 — Profile System Expansion (+ Phase 37 hot-swap async folded in)
Goal: The existing ModelProfile (Phase 17) becomes ExecutionProfile. Three new profile types land alongside. Profile activation is async — returns job_id, work runs in background (Phase 37 deliverable).
Ships:
- `crates/shared/src/profiles/` — `ExecutionProfile`, `RetrievalProfile`, `MemoryProfile`, `ObserverProfile`
- `crates/catalogd` gains per-profile-type CRUD endpoints (`/catalog/profiles/retrieval`, etc.)
- `crates/vectord/src/activation.rs` — `ActivationTracker` with background-job pattern (Phase 37 content); sketched after this list
- `POST /vectors/profile/{id}/activate` returns 202 + job_id, polling at `GET /vectors/profile/jobs/{id}`
- Single-flight guard: refuse new activation if one is pending/running
- Backward compat: `ModelProfile` still loads, aliased to `ExecutionProfile`
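A sketch of the 202-and-poll pattern with a single-flight guard, assuming tokio and the `uuid` crate; the struct internals and the refusal mapping are placeholders, not the Phase 41 implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use uuid::Uuid;

#[derive(Clone, Debug)]
pub enum JobState { Pending, Running, Done(String), Failed(String) }

#[derive(Clone, Default)]
pub struct ActivationTracker {
    jobs: Arc<Mutex<HashMap<Uuid, JobState>>>,
    in_flight: Arc<Mutex<bool>>,
}

impl ActivationTracker {
    /// Returns Some(job_id) and spawns the work, or None when an activation is
    /// already pending/running (the handler maps None to a refusal response).
    pub fn start(&self, profile_id: String) -> Option<Uuid> {
        {
            let mut busy = self.in_flight.lock().unwrap();
            if *busy { return None; }
            *busy = true;
        }
        let id = Uuid::new_v4();
        self.jobs.lock().unwrap().insert(id, JobState::Pending);
        let jobs = self.jobs.clone();
        let in_flight = self.in_flight.clone();
        tokio::spawn(async move {
            jobs.lock().unwrap().insert(id, JobState::Running);
            // ... rebuild indexes / swap models for `profile_id` here ...
            jobs.lock().unwrap().insert(id, JobState::Done(format!("activated {profile_id}")));
            *in_flight.lock().unwrap() = false;
        });
        Some(id) // handler returns 202 + this id; client polls GET .../jobs/{id}
    }

    pub fn status(&self, id: &Uuid) -> Option<JobState> {
        self.jobs.lock().unwrap().get(id).cloned()
    }
}
```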
Gate:
- Activate a profile → returns 202 in <100ms → job completes in background → `/vectors/profile/jobs/{id}` shows progress + final report
- `tests/multi-agent/run_stress.ts` Phase 3 (hot-swap stress) passes (was SKIPPED)
- Retrieval + Memory + Observer profiles can be created independently of Execution profile
Non-goals: Truth Layer (Phase 42), validation (Phase 43), caller migration (Phase 44).
Risk: Medium. Schema change + async refactor. Mitigation: #[serde(default)] on all new fields; existing profiles load unchanged.
Phase 42 — Truth Layer (staffing rules first)
Goal: New crates/truth crate holds immutable task-class constraints. Served via /v1/context to router and observer. No layer can override truth. Staffing rules ship first; Terraform/Ansible rule shapes are scaffolded but unpopulated until the long-horizon phase.
Ships:
- `crates/truth/src/lib.rs` — `TruthStore` with schema loading (TOML/YAML rules)
- `crates/truth/src/staffing.rs` — staffing rule shapes:
  - Worker eligibility (active status, not blacklisted for client, geo match, role match, availability window)
  - Contract invariants (deadline present, role/count/city/state populated, budget_per_hour_max ≥ 0)
  - PII handling (redaction rules on fields tagged `PII` before any cloud call — covers existing Phase 10 sensitivity tags)
  - Client blacklist enforcement (auto-applied before any fill proposal)
  - Fill requirements (endorsed_names count matches target_count, no duplicate worker_ids within a single fill)
- `crates/truth/src/devops.rs` — scaffold only: empty rule struct for Terraform/Ansible, populated in the long-horizon phase. Keeps the dispatcher signature stable so no refactor needed later.
- `truth/` dir at repo root — rule files, versioned in git
- `/v1/context` endpoint — returns applicable rules for a task class (`staffing.fill`, `staffing.rescue`, `staffing.sms_draft`, etc.)
- Router consults truth before dispatching: if task violates a rule, hard-fail with structured error + rule citation (matches existing Phase 13 access-control pattern); see the sketch after this list
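A sketch of a router-gate truth check, assuming simplified rule ids and a cut-down `FillProposal`; the real `TruthStore` loads rules from the versioned `truth/` dir rather than hard-coding them.

```rust
pub struct FillProposal {
    pub contract_id: String,
    pub target_count: usize,
    pub endorsed_worker_ids: Vec<String>,
}

pub struct RuleViolation {
    pub rule_id: &'static str, // cited back to the caller in the structured error
    pub message: String,
}

pub struct TruthStore {
    pub client_blacklist: std::collections::HashSet<(String, String)>, // (contract_id, worker_id)
}

impl TruthStore {
    /// Hard-fail before dispatch: no cloud tokens are burned on a proposal that
    /// violates an immutable staffing rule.
    pub fn check_fill(&self, p: &FillProposal) -> Result<(), RuleViolation> {
        if p.endorsed_worker_ids.len() != p.target_count {
            return Err(RuleViolation {
                rule_id: "staffing.fill.count_match",
                message: format!("endorsed {} workers, contract requires {}",
                                 p.endorsed_worker_ids.len(), p.target_count),
            });
        }
        for w in &p.endorsed_worker_ids {
            if self.client_blacklist.contains(&(p.contract_id.clone(), w.clone())) {
                return Err(RuleViolation {
                    rule_id: "staffing.fill.client_blacklist",
                    message: format!("worker {w} is blacklisted for contract {}", p.contract_id),
                });
            }
        }
        Ok(())
    }
}
```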
Gate:
- Submit a fill proposal where a worker is client-blacklisted — router returns 422 + rule citation, no cloud tokens burned
- Submit a fill with `endorsed_names.length != target_count` — 422 before dispatch
- Observer cannot promote a correction that violates truth (rejected at router gate)
- PII redaction verified: SSN / salary fields stripped from prompts before cloud calls
- Truth reload is explicit (no file-watch hot reload in this phase)
Non-goals: Validation execution (Phase 43), policy learning / evolution (deferred), actual Terraform/Ansible rules (long-horizon phase).
Risk: Medium. Domain-specific rule enumeration takes discovery — start with a minimal rule set (5-10 staffing rules, derived from existing Phase 10-13 work) and grow organically as real fills surface edge cases.
Phase 43 — Validation Pipeline (staffing outputs first)
Goal: Staffing outputs run through schema / completeness / consistency / policy gates. Plug into Layer 5 execution loop — failure triggers observer-correction iteration. This is where the 0→85% pattern reproduces on real staffing tasks — the iteration loop with validation in place is what made small models successful.
Ships:
- `crates/validator/src/lib.rs` — `Validator` trait: `validate(artifact) -> Result<Report, ValidationError>` + `Artifact` enum over output types; see the sketch after this list
- `crates/validator/src/staffing/fill.rs` — fill-proposal validator:
  - Schema compliance (propose_done shape matches `{fills: [{candidate_id, name}]}`)
  - Completeness (endorsed count == target_count)
  - Worker existence (every candidate_id present in workers_500k via SQL lookup)
  - Status check (every worker has status=active, not_on_client_blacklist)
  - Geo/role match (worker city/state/role matches contract)
- `crates/validator/src/staffing/email.rs` — generated email/SMS drafts:
  - Schema (TO/BODY fields present)
  - Length (SMS ≤ 160 chars; email subject ≤ 78 chars)
  - PII absence (no SSN / salary leaked into outgoing text)
  - Worker-name consistency (name in message matches worker record)
- `crates/validator/src/staffing/playbook.rs` — sealed playbook:
  - Operation format (`fill: Role xN in City, ST`)
  - endorsed_names non-empty, ≤ target_count × 2
  - fingerprint populated (Phase 25 validity window requirement)
- `crates/validator/src/devops.rs` — scaffold only: stubbed Terraform/Ansible validators (`terraform validate`, `ansible-lint`) for the long-horizon phase
- Task execution loop in gateway: generate → validate → if fail, observer correction + retry (bounded by `max_iterations=3`)
- Validation results logged to observer (`data/_observer/ops.jsonl`) + KB (`data/_kb/outcomes.jsonl`)
Gate:
- Generate a fill proposal → validator catches a phantom worker_id → observer + cloud rescue propose correction → retry → green. This reproduces the 0→85% pattern on the live staffing pipeline.
- `/v1/usage` shows iteration count per task, provider fallback chain, and tokens-per-iteration. Cost attribution per task class visible.
- Reproduces the 14× citation-lift finding from Phase 19 refinement on similar geos after validation gates.
Non-goals: Caller migration (Phase 44), Terraform/Ansible wired validation (long-horizon).
Risk: Medium. Validation shapes have to match actual executor outputs; mitigation is using real scenario runs as test fixtures (we have ~100 of them in tests/multi-agent/playbooks/).
Phase 44 — Caller Migration + Direct-Provider Deprecation
Goal: Every internal LLM caller routes through /v1/chat. Direct sidecar / direct Ollama / direct OpenAI calls are removed or explicitly deprecated with a warning.
Ships:
- `aibridge::AiClient` becomes a thin `/v1/chat` client (was direct-to-sidecar); see the sketch after this list
- `crates/vectord::agent` (autotune): routes through `/v1`
- `crates/vectord::autotune`: routes through `/v1`
- `tests/multi-agent/agent.ts::generate()`: routes through `/v1`
- `bot/propose.ts`: routes through `/v1` (already proposed as Phase 38's test consumer, formalized here)
- Lint rule / grep pre-commit hook: no `fetch.*:3200/generate` outside the provider adapters
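A sketch of the thin client shape, assuming `reqwest` and `serde_json`; the port and payload mirror the Phase 38 gate example rather than the final `AiClient` API.

```rust
use serde_json::{json, Value};

pub struct AiClient {
    http: reqwest::Client,
    base_url: String, // e.g. "http://localhost:3100"
}

impl AiClient {
    pub fn new(base_url: impl Into<String>) -> Self {
        Self { http: reqwest::Client::new(), base_url: base_url.into() }
    }

    /// Every internal caller goes through here; no direct :3200/generate calls.
    pub async fn chat(&self, model: &str, user_content: &str) -> anyhow::Result<String> {
        let body = json!({
            "model": model,
            "messages": [{ "role": "user", "content": user_content }],
        });
        let resp: Value = self.http
            .post(format!("{}/v1/chat", self.base_url))
            .json(&body)
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;
        // OpenAI-shape response: first choice's message content.
        Ok(resp["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or_default()
            .to_string())
    }
}
```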
Gate:
grep -r "localhost:3200/generate\|/api/generate"returns only adapter files + deprecation shims/v1/usageaccounts for every LLM call in the system within a 1-minute window after hitting a fresh scenario- Full scenario passes end-to-end without any caller bypassing
/v1/*
Non-goals: New features. This phase is purely mechanical migration.
Risk: Low. Mechanical. Tests catch regressions.
Long-horizon domains (not in current phase sequence)
The architecture was drafted with DevOps execution (Terraform, Ansible) as the eventual target. That remains aspirational, not current scope — we don't start wiring terraform validate / ansible-lint until the staffing domain proves the six-layer architecture at scale.
What "proves at scale" means concretely:
- Phases 38-44 all shipped against staffing, green tests
- Live staffing pipeline handles multiple concurrent contracts with emails + SMS + indexed playbooks via `/v1/*`
- Observed iteration success lift (the 0→85% pattern) reproduced on varied staffing scenarios, not just the original proof-of-concept
- Token + cost accounting stable across provider fallback chains under real load
- Truth Layer rules prevent real fill errors before cloud burn (not just theoretical)
When staffing hits that bar, the DevOps domain lights up by:
- Populating `crates/truth/src/devops.rs` with real Terraform/Ansible rule shapes
- Populating `crates/validator/src/devops.rs` with `terraform validate` / `ansible-lint` shell-out
- Adding DevOps task classes to `/v1/context` rule lookup
- No architectural changes needed — the dispatcher, router, and execution loop stay identical.
Other candidate long-horizon domains (same pattern):
- Code generation tasks (validation via `cargo check` / `bun test`)
- SQL query generation (validation via EXPLAIN + schema compliance)
- Data pipeline definitions (validation via lineage check + schema compliance)
None of these are in the current roadmap. Staffing first, production-proven, then expand.
1. Purpose
Design and implement a universal AI control-plane API that enables:
- deterministic high-stakes task execution — the immediate domain is staffing fills (contracts, workers, emails, SMS) at scale; the same architecture extends later to DevOps (Terraform, Ansible) without redesign
- iterative capability amplification via observer loops
- hybrid local + cloud model orchestration
- structured knowledge + memory + playbook reuse
- controlled improvement over time through validated iteration
The system prioritizes validated pipeline success over raw model intelligence.
Current scope — staffing at scale
The architecture must make the already-built staffing substrate reliably answer millions of inputs: pull real data, graph it across contracts, handle multiple concurrent contracts, index emails + SMS + playbooks via the hybrid SQL+vector method, and get faster and better each iteration via the feedback loops (Phase 19 playbook boost, Phase 22 KB pathway recommender, Phase 24 observer, Phase 26 Mem0 upsert).
DevOps is an eventual domain — see §Long-horizon domains.
2. Core Objectives
2.1 Functional Goals
- Provide a single universal API for all AI interactions
- Support multi-provider routing (local, flat-rate, token-based)
- Enable iterative execution loops with observer correction
- Store and reuse successful execution playbooks
- Integrate: S3-based knowledge storage, LanceDB retrieval/indexing, Mem0 memory layer, MCP tool ecosystem
2.2 Non-Functional Goals
- Deterministic behavior under constrained execution
- Full observability and cost accounting
- Safe DevOps execution (no uncontrolled mutation)
- Profile-driven routing and execution
- Reproducibility of successful runs
3. System Architecture
3.1 Layer Overview
Layer 1 — Universal API
Single entry point for all applications. Endpoints:
`/v1/chat`, `/v1/respond`, `/v1/tools`, `/v1/context`, `/v1/usage`, `/v1/sessions`
All programs must use this layer. No direct provider calls allowed.
Layer 2 — Routing & Policy Engine
Responsibilities: provider selection, fallback logic, cost gating, premium access control, profile enforcement. Routing based on: task type, constraints, execution profile, system health.
Layer 3 — Provider Adapter Layer
Normalizes all providers: Ollama (local), OpenRouter, Gemini (direct), Claude (direct or routed), future providers. Guarantee: no provider-specific logic leaks upward.
Layer 4 — Knowledge & Memory Plane
- Knowledge (S3 + LanceDB): raw documents, processed chunks, embeddings, index profiles
- Memory (Mem0): extracted facts, entity-linked memory, session-aware retrieval
- Playbooks: successful execution traces, reusable patterns, correction strategies
Layer 5 — Execution Loop
Each task runs through: Retrieval → Planning → Generation → Validation → Observer feedback → Iteration (if needed).
Layer 6 — Observability & Accounting
Every request logs: tokens (input/output), cost, latency, provider, fallback chain, profile used, iteration delta.
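As a sketch, the per-request accounting record might look like the following; the field names are illustrative of what "every request logs", not a fixed schema.

```rust
use serde::Serialize;

#[derive(Serialize)]
pub struct RequestRecord {
    pub request_id: String,
    pub task_class: String,           // e.g. "staffing.fill"
    pub provider: String,             // provider that produced the final answer
    pub fallback_chain: Vec<String>,  // providers tried, in order
    pub profile_id: Option<String>,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub cost_usd: f64,
    pub latency_ms: u64,
    pub iteration: u32,               // which attempt in the execution loop
    pub iteration_delta: Option<f64>, // validation-score change vs. previous attempt
}
```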
4. Execution Model
4.1 Iterative Loop
Each task follows: Attempt → Validate → Observe → Adjust → Retry
Constraints:
- max iterations (default: 3)
- minimum improvement threshold
- cost ceiling per task
4.2 Observer Role
Observer can: analyze failure, suggest corrections, recommend profile changes. Observer cannot: modify truth layer, auto-promote changes, override constraints.
4.3 Cloud Escalation
Cloud models (Gemini, Claude) are used for: structural correction, reasoning gaps, complex decomposition. They are not used for: brute-force retries, bulk execution.
5. Profile System
5.1 Profile Types
- Retrieval Profile — chunking strategy, embedding method, reranking rules
- Memory Profile — memory weighting, context injection rules
- Execution Profile — allowed providers, tool access, risk level
- Observer Profile — mutation aggressiveness, iteration strategy
5.2 Profile Constraints
- only one major profile change per iteration
- profiles must produce measurable deltas
- promotion requires repeated success
6. Truth Layer (Critical)
Defines non-negotiable constraints:
- Terraform rules
- Ansible structure requirements
- security policies
- organization standards
Rules:
- immutable at runtime
- referenced by all layers
- cannot be overridden by observer
7. Playbook System
7.1 Playbook Definition
Each successful run produces: task class, context used, steps executed, tools used, output artifacts, validation results, cost/latency, success score.
7.2 Playbook Lifecycle
- created on success
- reused for similar tasks
- decayed over time
- pruned if ineffective
8. Validation System
All DevOps outputs must pass: syntax validation, linting, dry-run, policy compliance. Failure → iteration continues or task fails.
9. MCP Integration
MCP servers provide: tools, external data, execution capabilities. All MCP outputs must be: normalized, validated, schema-compliant. No direct MCP output reaches the model.
10. Token Accounting & Budget Control
Each request tracks: input tokens, output tokens, retries, fallback cost. Policies: premium providers gated, cost ceilings enforced, per-task budget limits.
11. Failure Handling
Recoverable failures: bad decomposition, missing steps, weak retrieval → observer + iteration.
Hard failures: missing truth data, invalid task classification, unsafe execution → termination + error report.
12. Success Criteria
A task is successful only if:
- output is valid
- all validators pass
- no policy violations
- result is reproducible
- cost within limits
13. Key Risks & Mitigations
- Observer drift → bounded authority, confidence tracking
- Memory poisoning → validation layer, memory weighting
- Cost explosion → token accounting, iteration caps
- Retrieval errors → post-retrieval validation, profile tuning