Phase 45 delivers what J keeps asking for: playbooks know which
external docs they used and get flagged when those docs drift. This
commit ships the data model; the context7 bridge + drift-check
endpoints land in follow-ups.
Added to crates/vectord/src/playbook_memory.rs:
- pub struct DocRef { tool, version_seen, snippet_hash, source_url,
seen_at } — one external doc reference
- PlaybookEntry.doc_refs: Vec<DocRef> — empty on legacy entries,
serde default ensures pre-Phase-45 persisted state loads cleanly
- PlaybookEntry.doc_drift_flagged_at: Option<String> — set by the
(future) drift-check code when context7 reports newer version
- PlaybookEntry.doc_drift_reviewed_at: Option<String> — set by
human via /resolve endpoint after reviewing the diagnosis
- impl Default for PlaybookEntry — collapses most test-helper
constructors from 17 explicit fields to 6-9 fields +
..Default::default()
Updated SeedPlaybookRequest + RevisePlaybookRequest (service.rs) to
accept optional doc_refs: the seed/revise endpoints now take the
field; downstream drift detection (Phase 45.2) consumes it.
Docs: docs/CONTROL_PLANE_PRD.md gains full Phase 45 spec with gate
criteria, non-goals, and risk notes.
Tests: 51/51 vectord lib tests green (same count as before, field
additions are backward-compat).
Memory: project_doc_drift_vision.md written so this keeps coming
back to the front of mind.
Next slices (same phase): context7 HTTP bridge in mcp-server,
/vectors/playbook_memory/doc_drift/check/{id} endpoint, overview-
model drift synthesis writing to data/_kb/doc_drift_corrections.jsonl,
boost exclusion for flagged+unreviewed entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# PRD — Universal AI Control Plane

**Status:** Long-horizon architecture target as of 2026-04-22. Lakehouse Phases 0-37 (`docs/PRD.md`) are preserved as the reference implementation and first domain-specific consumer. Phases 38+ (control-plane layers) are sequenced below.

**Current domain: staffing.** The immediate proving ground is the staffing substrate already built — synthetic workers_500k, contracts, emails, SMS drafts, playbook memory. Everything Phase 38-44 ships is validated first against that domain. The DevOps / Terraform / Ansible framing from the original PRD draft stays as a **long-horizon target** — architecture-compatible but not in current scope. See §Long-horizon domains at the bottom.

**Owner:** J

**Cross-read:** `docs/PRD.md` for what's shipped (staffing + AI substrate, 13 crates, ~3M rows). This doc for the layered architecture those pieces now fit into.

---

## Phase Sequencing (Phases 38-44)

Ship each phase before starting the next. Each ends with green tests + docs update.

| Phase | Layer | What ships | Est. LOC | Risk |
|---|---|---|---|---|
| 38 | Layer 1 skeleton | `/v1/chat`, `/v1/usage`, `/v1/sessions` routes forwarding to existing `aibridge` → Ollama. Bot migrates as first consumer. | ~400 | Low — additive, no existing routes touched |
| 39 | Layer 3 adapters | `aibridge::ProviderAdapter` trait; Ollama + one new (OpenRouter). `/v1/chat` routes by config. | ~500 | Low-medium |
| 40 | Layer 2 engine | Rules-based routing (`config/routing.toml`), fallback chains, cost gating. Add Gemini + Claude adapters. | ~600 | Medium |
| 41 | Profile split | Separate Retrieval / Memory / Execution / Observer profiles; Phase 17 backward-compat. Absorbs Phase 37 hot-swap-async. | ~300 | Medium |
| 42 | Truth Layer | New `crates/truth`; Terraform/Ansible schemas; `/v1/context` serves rules to router + observer. | ~700 | Medium |
| 43 | Validation pipeline | Syntax/lint/dry-run/policy gates per output type. Plugs into Layer 5 execution loop. | ~400 | Medium |
| 44 | Caller migration | All internal callers route through `/v1/chat`. Direct sidecar access deprecated. | ~200 | Low |

**Total ≈3100 LOC.** Phase 37 (hot-swap async) folds into Phase 41 — it's an Execution-Profile activation concern.

---

## Phase 38 — Universal API Skeleton

**Goal:** OpenAI-compatible `/v1/*` surface exists and forwards to existing aibridge → Ollama. Nothing about multi-provider yet — just the SHAPE, so every downstream piece (adapters, routing, usage accounting) has a surface to plug into.

**Ships:**

- `crates/gateway/src/v1/mod.rs` — router + `/v1/chat`, `/v1/usage`, `/v1/sessions` (sketched below)
- `crates/gateway/src/v1/ollama.rs` — shape adapter (OpenAI chat ↔ existing aibridge `GenerateRequest`)
- One-line `nest("/v1", ...)` in `crates/gateway/src/main.rs`
- Unit test: `POST /v1/chat` roundtrips through mocked provider
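
A minimal sketch of that surface, assuming axum (implied by `nest("/v1", ...)`); the request/response types and handler bodies are illustrative placeholders, not the shipped gateway code:

```rust
use axum::{routing::{get, post}, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct ChatRequest { model: String, messages: Vec<Message> }

#[derive(Serialize, Deserialize)]
struct Message { role: String, content: String }

#[derive(Serialize)]
struct ChatResponse { model: String, choices: Vec<Choice> }

#[derive(Serialize)]
struct Choice { message: Message }

// Illustrative only: Phase 38 translates the OpenAI chat shape into the
// existing aibridge GenerateRequest and forwards to Ollama.
async fn chat(Json(req): Json<ChatRequest>) -> Json<ChatResponse> {
    Json(ChatResponse { model: req.model, choices: vec![] })
}

pub fn v1_router() -> Router {
    Router::new()
        .route("/chat", post(chat))
        .route("/usage", get(|| async { "{}" }))                 // stub
        .route("/sessions", get(|| async { r#"{"data":[]}"# }))  // stub until Phase 41
}

// In main.rs: app = app.nest("/v1", v1_router());
```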

**Gate:**

- `curl -X POST localhost:3100/v1/chat -d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"hi"}]}'` returns valid OpenAI-shape response.
- `GET localhost:3100/v1/usage` returns `{requests, prompt_tokens, completion_tokens, total_tokens}`.
- `GET localhost:3100/v1/sessions` returns `{data:[]}` (stub; real impl Phase 41).
- `cargo test -p gateway` green.

**Non-goals (explicit):** streaming, tool calls, function calling, session state, multi-provider, fallback, cost gating.

**Risk:** Low — additive, doesn't touch existing routes. Worst case: `/v1/*` returns 502 and we fix the adapter. No existing caller affected.

---

## Phase 39 — Provider Adapter Refactor

**Goal:** `aibridge` grows a `ProviderAdapter` trait. Ollama implementation wraps existing sidecar code. One new provider lands as proof: **OpenRouter** (simplest — it's OpenAI-compatible, so the adapter is mostly passthrough).

**Ships:**

- `crates/aibridge/src/provider.rs` — `ProviderAdapter` trait with `chat()` + `embed()` + `unload()` methods (sketched below)
- `crates/aibridge/src/providers/ollama.rs` — existing sidecar code moved behind the trait
- `crates/aibridge/src/providers/openrouter.rs` — new, HTTP client to `openrouter.ai/api/v1/chat/completions`
- `config/providers.toml` — provider registry (name, base_url, auth, default_models)
- `/v1/chat` routes by `model` field: prefix match (e.g. `openrouter/anthropic/claude-3.5-sonnet` → OpenRouter; bare names → Ollama)
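
A hedged sketch of the trait shape, assuming async-trait-style methods; only the three method names come from this spec, while the signatures and the request/response types are assumptions:

```rust
use async_trait::async_trait;

// Assumed shapes for illustration; the real types live in aibridge.
pub struct ChatRequest { pub model: String, pub messages: Vec<(String, String)> }
pub struct ChatResponse { pub content: String, pub prompt_tokens: u64, pub completion_tokens: u64 }
pub struct ProviderError(pub String);

#[async_trait]
pub trait ProviderAdapter: Send + Sync {
    /// Normalized chat call; no provider-specific fields leak upward.
    async fn chat(&self, req: ChatRequest) -> Result<ChatResponse, ProviderError>;
    /// Embedding call for providers that support it.
    async fn embed(&self, input: &str) -> Result<Vec<f32>, ProviderError>;
    /// Release local resources (e.g. ask Ollama to unload a model); no-op for cloud providers.
    async fn unload(&self, model: &str) -> Result<(), ProviderError>;
}
```

Prefix dispatch then reduces to a match on the `model` string before the `chat()` call.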

**Gate:**

- `/v1/chat` with `model: "qwen3.5:latest"` hits Ollama → green
- `/v1/chat` with `model: "openrouter/openai/gpt-4o-mini"` hits OpenRouter (key from secrets.toml) → green
- Neither call leaks provider-specific fields upward. Response is always the `/v1/chat` shape.

**Non-goals:** Fallback chain (Phase 40), cost gating (Phase 40), Gemini/Claude adapters (Phase 40).

**Risk:** Low-medium. The trait extraction is mostly a rearrange; OpenRouter is thin. Biggest risk is secret-loading conventions — `SecretsProvider` is already in place, so reuse that path.

---

## Phase 40 — Routing & Policy Engine + Observability Recovery

**Goal:** Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level. **Reinstate Langfuse + Gitea MCP** — recovery of the observability + repo-ops stack J built previously (see `project_lost_stack` memory).

**Ships — routing:**

- `crates/aibridge/src/routing.rs` — rules engine (match on: task type, token budget, previous attempt failures, profile ID); config shape sketched below
- `config/routing.toml` — rules in TOML (human-editable, hot-reloadable)
- `crates/aibridge/src/providers/gemini.rs` — `generativelanguage.googleapis.com` adapter
- `crates/aibridge/src/providers/claude.rs` — `api.anthropic.com` adapter
- Fallback chain support: if the primary returns 5xx or times out, try the next in chain
- Cost gate: per-request budget + daily budget per provider
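
A sketch of structs that `config/routing.toml` could deserialize into; every field name here is an assumed schema, not the shipped one:

```rust
use serde::Deserialize;

// Assumed schema for config/routing.toml; illustrative field names.
#[derive(Deserialize)]
pub struct RoutingConfig {
    pub rules: Vec<Rule>,
}

#[derive(Deserialize)]
pub struct Rule {
    /// Task class this rule matches, e.g. "staffing.fill".
    pub task_type: String,
    /// Skip this rule if the prompt exceeds the token budget.
    pub max_prompt_tokens: Option<u64>,
    /// Primary provider/model, e.g. "ollama/qwen3.5:latest".
    pub primary: String,
    /// Tried in order on 5xx or timeout.
    pub fallback: Vec<String>,
    /// Per-request cost ceiling in USD.
    pub max_cost_usd: Option<f64>,
}

fn load(path: &str) -> Result<RoutingConfig, Box<dyn std::error::Error>> {
    Ok(toml::from_str(&std::fs::read_to_string(path)?)?)
}
```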

**Ships — observability (was lost, now restored):**

- **Langfuse** self-hosted via Docker Compose. Single source of truth for every LLM call trace: prompt / response / tokens / cost / latency / provider / fallback chain / profile used. UI at `localhost:3000`. Keys in `/etc/lakehouse/secrets.toml`.
- `crates/aibridge/src/langfuse.rs` — thin fire-and-forget trace emitter (sketched below). Every `/v1/chat` call spawns a background task that POSTs to `langfuse/api/public/ingestion`. Non-blocking: trace failures never affect the response.
- **Langfuse → observer pipe** — `mcp-server/langfuse_bridge.ts` or similar. Polls Langfuse's trace API at an interval, forwards completed traces to observer `:3800/event` with `source: "langfuse"`. KB now sees cost/latency deltas per model, not just outcome deltas.
- **Gitea MCP reconnect** — the MCP server binary still installed at `/home/profit/.bun/install/cache/gitea-mcp@0.0.10/` gets wired into the `mcp-server/index.ts` tool registry. Agents can open PRs, comment on issues, and list commits via named tools. Closes Phase 28's repo-ops gap.
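
A minimal fire-and-forget shape for the emitter, assuming tokio + reqwest; the exact ingestion payload must be checked against Langfuse's API before use:

```rust
use serde_json::json;

/// Fire-and-forget: spawn, never await on the request path, never let a
/// trace failure touch the /v1/chat response.
pub fn emit_trace(client: reqwest::Client, base_url: String, trace: serde_json::Value) {
    tokio::spawn(async move {
        // Assumed endpoint shape; verify against Langfuse's ingestion API docs.
        let res = client
            .post(format!("{base_url}/api/public/ingestion"))
            .json(&json!({ "batch": [trace] }))
            .send()
            .await;
        if let Err(e) = res {
            // Log and drop: observability must never block serving.
            eprintln!("langfuse trace emit failed: {e}");
        }
    });
}
```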

**Gate:**

- A rule like "local models for simple JSON emitters, cloud for reasoning" fires correctly by task type
- Primary fails → fallback provider hits, response still matches `/v1/chat` shape
- Daily budget hit → subsequent requests return 429 with a clear retry-at header
- `/v1/usage` reports per-provider breakdown
- **Every `/v1/chat` call appears in the Langfuse UI** with correct prompt, response, latency, and token count within 2 seconds of the request completing
- **Langfuse → observer pipe** delivers trace deltas to KB: `GET :3800/stats?source=langfuse` shows a non-zero count after a few scenarios run
- **Gitea MCP tools callable** — `list_prs`, `open_pr`, `comment_on_issue` exposed in `mcp-server/index.ts`, verifiable via a quick agent scenario

**Non-goals:** Retrieval Profile split (Phase 41), Truth Layer (Phase 42), Langfuse self-hosted UI customization / SSO.

**Risk:** Medium. Multi-provider auth + cost tracking is cross-cutting; Langfuse adds 4-5 Docker containers (PostgreSQL, ClickHouse, Redis, web, worker). Mitigation: every provider call is wrapped in a single `dispatch()` function so observability flows through one point; Langfuse's Docker Compose is their supported, well-tested deployment path.

---

## Phase 41 — Profile System Expansion (+ Phase 37 hot-swap async folded in)

**Goal:** The existing `ModelProfile` (Phase 17) becomes **ExecutionProfile**. Three new profile types land alongside it. Profile activation is async — it returns a job_id and the work runs in the background (the Phase 37 deliverable).

**Ships:**

- `crates/shared/src/profiles/` — `ExecutionProfile`, `RetrievalProfile`, `MemoryProfile`, `ObserverProfile`
- `crates/catalogd` gains per-profile-type CRUD endpoints (`/catalog/profiles/retrieval`, etc.)
- `crates/vectord/src/activation.rs` — `ActivationTracker` with the background-job pattern (Phase 37 content)
- `POST /vectors/profile/{id}/activate` returns 202 + job_id; poll at `GET /vectors/profile/jobs/{id}` (sketched below)
- Single-flight guard: refuse a new activation if one is pending/running
- Backward compat: `ModelProfile` still loads, aliased to ExecutionProfile
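
A sketch of the 202 + job_id pattern with the single-flight guard, assuming axum, tokio, and the uuid crate; the `ActivationTracker` internals are illustrative:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

use axum::{extract::State, http::StatusCode, Json};
use serde_json::json;

#[derive(Clone)]
pub struct ActivationTracker {
    running: Arc<AtomicBool>,
}

pub async fn activate(
    State(tracker): State<ActivationTracker>,
) -> (StatusCode, Json<serde_json::Value>) {
    // Single-flight guard: only one activation may be pending/running.
    if tracker.running.swap(true, Ordering::SeqCst) {
        return (StatusCode::CONFLICT, Json(json!({"error": "activation already running"})));
    }
    let job_id = uuid::Uuid::new_v4().to_string();
    let running = tracker.running.clone();
    tokio::spawn(async move {
        // ... perform the actual profile activation here, recording progress
        // for GET /vectors/profile/jobs/{id} to report ...
        running.store(false, Ordering::SeqCst);
    });
    (StatusCode::ACCEPTED, Json(json!({"job_id": job_id})))
}
```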

**Gate:**

- Activate a profile → returns 202 in <100ms → job completes in background → `/vectors/profile/jobs/{id}` shows progress + final report
- `tests/multi-agent/run_stress.ts` Phase 3 (hot-swap stress) passes (was SKIPPED)
- Retrieval + Memory + Observer profiles can be created independently of Execution profile

**Non-goals:** Truth Layer (Phase 42), validation (Phase 43), caller migration (Phase 44).

**Risk:** Medium. Schema change + async refactor. Mitigation: `#[serde(default)]` on all new fields; existing profiles load unchanged.

---

## Phase 42 — Truth Layer (staffing rules first)

**Goal:** New `crates/truth` crate holds immutable task-class constraints, served via `/v1/context` to the router and observer. No layer can override truth. **Staffing rules ship first**; Terraform/Ansible rule shapes are scaffolded but unpopulated until the long-horizon phase.

**Ships:**

- `crates/truth/src/lib.rs` — `TruthStore` with schema loading (TOML/YAML rules); see the sketch after this list
- `crates/truth/src/staffing.rs` — staffing rule shapes:
  - Worker eligibility (active status, not blacklisted for client, geo match, role match, availability window)
  - Contract invariants (deadline present, role/count/city/state populated, budget_per_hour_max ≥ 0)
  - PII handling (redaction rules on fields tagged `PII` before any cloud call — covers existing Phase 10 sensitivity tags)
  - Client blacklist enforcement (auto-applied before any fill proposal)
  - Fill requirements (endorsed_names count matches target_count, no duplicate worker_ids within a single fill)
- `crates/truth/src/devops.rs` — **scaffold only**: empty rule struct for Terraform/Ansible, populated in the long-horizon phase. Keeps the dispatcher signature stable so no refactor is needed later.
- `truth/` dir at repo root — rule files, versioned in git
- `/v1/context` endpoint — returns applicable rules for a task class (`staffing.fill`, `staffing.rescue`, `staffing.sms_draft`, etc.)
- Router consults truth before dispatching: if a task violates a rule, hard-fail with a structured error + rule citation (matches the existing Phase 13 access-control pattern)
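
A hedged sketch of the router-side truth gate; `TruthStore` and the rule-citation refusal come from this spec, while the method names and `Violation` fields are assumptions:

```rust
use serde::Serialize;

/// Structured 422 body: the rule citation travels with the refusal.
#[derive(Serialize)]
pub struct Violation {
    pub rule_id: String,    // e.g. "staffing.fill.no_blacklisted_workers" (hypothetical id)
    pub message: String,
    pub task_class: String, // e.g. "staffing.fill"
}

pub struct Rule {
    pub id: String,
    pub task_class: String,
    pub check: fn(&serde_json::Value) -> Result<(), String>,
}

pub struct TruthStore {
    rules: Vec<Rule>,
}

impl TruthStore {
    /// Called by the router before any dispatch; hard-fails on the first violation,
    /// so no cloud tokens are burned on a doomed task.
    pub fn check(&self, task_class: &str, payload: &serde_json::Value) -> Result<(), Violation> {
        for rule in self.rules.iter().filter(|r| r.task_class == task_class) {
            if let Err(message) = (rule.check)(payload) {
                return Err(Violation {
                    rule_id: rule.id.clone(),
                    message,
                    task_class: task_class.to_string(),
                });
            }
        }
        Ok(())
    }
}
```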

**Gate:**

- Submit a fill proposal where a worker is client-blacklisted — router returns 422 + rule citation, no cloud tokens burned
- Submit a fill with `endorsed_names.length != target_count` — 422 before dispatch
- Observer cannot promote a correction that violates truth (rejected at router gate)
- PII redaction verified: SSN / salary fields stripped from prompts before cloud calls
- Truth reload is explicit (no file-watch hot reload in this phase)

**Non-goals:** Validation execution (Phase 43), policy learning / evolution (deferred), actual Terraform/Ansible rules (long-horizon phase).

**Risk:** Medium. Domain-specific rule enumeration takes discovery — start with a minimal rule set (5-10 staffing rules, derived from existing Phase 10-13 work) and grow organically as real fills surface edge cases.

---

## Phase 43 — Validation Pipeline (staffing outputs first)

**Goal:** Staffing outputs run through schema / completeness / consistency / policy gates. They plug into the Layer 5 execution loop — a failure triggers observer-correction iteration. This is where the **0→85% pattern reproduces on real staffing tasks** — the iteration loop with validation in place is what made small models successful.

**Ships:**

- `crates/validator/src/lib.rs` — `Validator` trait: `validate(artifact) -> Result<Report, ValidationError>` + `Artifact` enum over output types (sketched below)
- `crates/validator/src/staffing/fill.rs` — fill-proposal validator:
  - Schema compliance (propose_done shape matches `{fills: [{candidate_id, name}]}`)
  - Completeness (endorsed count == target_count)
  - Worker existence (every candidate_id present in workers_500k via SQL lookup)
  - Status check (every worker has status=active, not_on_client_blacklist)
  - Geo/role match (worker city/state/role matches contract)
- `crates/validator/src/staffing/email.rs` — generated email/SMS drafts:
  - Schema (TO/BODY fields present)
  - Length (SMS ≤ 160 chars; email subject ≤ 78 chars)
  - PII absence (no SSN / salary leaked into outgoing text)
  - Worker-name consistency (name in message matches worker record)
- `crates/validator/src/staffing/playbook.rs` — sealed playbook:
  - Operation format (`fill: Role xN in City, ST`)
  - endorsed_names non-empty, ≤ target_count × 2
  - fingerprint populated (Phase 25 validity-window requirement)
- `crates/validator/src/devops.rs` — **scaffold only**: stubbed Terraform/Ansible validators (`terraform validate`, `ansible-lint`) for the long-horizon phase
- Task execution loop in gateway: generate → validate → if fail, observer correction + retry (bounded by `max_iterations=3`)
- Validation results logged to observer (`data/_observer/ops.jsonl`) + KB (`data/_kb/outcomes.jsonl`)
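
The trait named above, fleshed out as a sketch; only the `validate(artifact) -> Result<Report, ValidationError>` signature and the `Artifact` enum are from the spec, while the variants and `Report` fields are assumptions:

```rust
// Assumed artifact and report shapes over the output types this phase lists.
pub enum Artifact {
    FillProposal(serde_json::Value),
    EmailDraft { to: String, body: String },
    SmsDraft { to: String, body: String },
    SealedPlaybook(serde_json::Value),
}

pub struct Report {
    pub passed: bool,
    pub failures: Vec<String>, // human-readable gate failures, fed to the observer
}

pub struct ValidationError(pub String); // the validator itself broke, not the artifact

pub trait Validator {
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError>;
}

/// Example gate: SMS length (≤ 160 chars per the email/SMS validator above).
pub struct SmsLength;

impl Validator for SmsLength {
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        let failures = match artifact {
            Artifact::SmsDraft { body, .. } if body.chars().count() > 160 => {
                vec![format!("SMS body is {} chars, max 160", body.chars().count())]
            }
            _ => vec![],
        };
        Ok(Report { passed: failures.is_empty(), failures })
    }
}
```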

**Gate:**

- Generate a fill proposal → validator catches a phantom worker_id → observer + cloud rescue propose correction → retry → green. This reproduces the 0→85% pattern on the live staffing pipeline.
- `/v1/usage` shows iteration count per task, provider fallback chain, and tokens-per-iteration. Cost attribution per task class visible.
- Reproduces the 14× citation-lift finding from Phase 19 refinement on similar geos after validation gates.

**Non-goals:** Caller migration (Phase 44), Terraform/Ansible wired validation (long-horizon).

**Risk:** Medium. Validation shapes have to match actual executor outputs; the mitigation is using real scenario runs as test fixtures (we have ~100 of them in `tests/multi-agent/playbooks/`).

---

## Phase 44 — Caller Migration + Direct-Provider Deprecation

**Goal:** Every internal LLM caller routes through `/v1/chat`. Direct sidecar / direct Ollama / direct OpenAI calls are removed or explicitly deprecated with a warning.

**Ships:**

- `aibridge::AiClient` becomes a thin `/v1/chat` client (was direct-to-sidecar; sketched below)
- `crates/vectord::agent` (autotune): routes through `/v1`
- `crates/vectord::autotune`: routes through `/v1`
- `tests/multi-agent/agent.ts::generate()`: routes through `/v1`
- `bot/propose.ts`: routes through `/v1` (already proposed as Phase 38's test consumer, formalized here)
- Lint rule / grep pre-commit hook: no `fetch.*:3200/generate` outside the provider adapters
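
One hedged reading of "thin `/v1/chat` client", assuming reqwest; the base URL, request shape, and method name are assumptions:

```rust
use serde_json::{json, Value};

pub struct AiClient {
    http: reqwest::Client,
    base_url: String, // e.g. "http://localhost:3100"
}

impl AiClient {
    /// All callers go through /v1/chat; no direct sidecar or provider URLs remain.
    pub async fn chat(&self, model: &str, prompt: &str) -> Result<Value, reqwest::Error> {
        self.http
            .post(format!("{}/v1/chat", self.base_url))
            .json(&json!({
                "model": model,
                "messages": [{ "role": "user", "content": prompt }],
            }))
            .send()
            .await?
            .json()
            .await
    }
}
```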

**Gate:**

- `grep -r "localhost:3200/generate\|/api/generate"` returns only adapter files + deprecation shims
- `/v1/usage` accounts for every LLM call in the system within a 1-minute window after hitting a fresh scenario
- Full scenario passes end-to-end without any caller bypassing `/v1/*`

**Non-goals:** New features. This phase is purely mechanical migration.

**Risk:** Low. Mechanical. Tests catch regressions.

---

## Phase 45 — Doc-drift detection + context7 integration

**Goal:** Playbooks know which external docs they were written against. When those docs change (Docker adds a feature, an npm lib goes major, Terraform renames a resource), the playbook is automatically flagged. Small models never run confidently-outdated procedures — the drift signal reaches them before the next execution does.

**Why this phase exists at all:** The 0→85% thesis depends on the hyperfocus lane staying valid. External doc drift invalidates the lane silently — popular playbooks can compound the wrong way, accumulating boost while growing more wrong. Phase 25 already retires playbooks on *internal* schema drift; Phase 45 is the same mechanism against *external* doc drift. This is the completion of the learning loop, not an optional add-on.

**Ships:**

- `shared::types::DocRef` — `{ tool: String, version_seen: String, snippet_hash: Option<String>, source_url: Option<String>, seen_at: DateTime<Utc> }` (sketched after this list)
- `PlaybookEntry.doc_refs: Vec<DocRef>` — `#[serde(default)]` so pre-Phase-45 entries load as an empty vec
- `/vectors/playbook_memory/seed` + `/revise` accept `doc_refs` in the request body
- `/vectors/playbook_memory/doc_drift/check/{id}` — manual drift check: looks up each `doc_refs[]` entry via the context7 bridge, returns per-tool `{version_seen, version_current, drifted: bool}` plus an overall verdict
- `/vectors/playbook_memory/doc_drift/scan` — batch scan across all active playbooks (scheduled path for Phase 45.2)
- `mcp-server/context7_bridge.ts` — Bun HTTP bridge. Exposes `GET /docs/:tool/version` + `GET /docs/:tool/:version/diff?since=X` against the installed context7 MCP plugin. The gateway calls this over localhost.
- `PlaybookMemory::compute_boost_for_filtered_with_role` — excludes entries where `doc_drift_flagged_at.is_some() && doc_drift_reviewed_at.is_none()` (same rule as retired + superseded)
- Overview model synthesis writes `data/_kb/doc_drift_corrections.jsonl` per detected drift: `{playbook_id, tool, version_seen, version_current, diff_summary, recommended_action, generated_at}`
- Human-in-the-loop re-seal path: `/vectors/playbook_memory/doc_drift/resolve/{id}` — marks the entry reviewed, optionally triggers `revise_entry` if the procedure changed
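
A minimal sketch of the data model and the boost-exclusion predicate, using the field names this phase specifies; `PlaybookEntry` is trimmed to the drift-relevant fields, and the derives assume the chrono/serde features are enabled:

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One external doc reference captured at seal time.
#[derive(Serialize, Deserialize)]
pub struct DocRef {
    pub tool: String,         // e.g. "docker"
    pub version_seen: String, // e.g. "24.0.7"
    pub snippet_hash: Option<String>,
    pub source_url: Option<String>,
    pub seen_at: DateTime<Utc>,
}

/// Trimmed to the drift-relevant fields for illustration.
#[derive(Serialize, Deserialize, Default)]
pub struct PlaybookEntry {
    #[serde(default)] // pre-Phase-45 persisted entries load as empty
    pub doc_refs: Vec<DocRef>,
    pub doc_drift_flagged_at: Option<String>,
    pub doc_drift_reviewed_at: Option<String>,
}

impl PlaybookEntry {
    /// Flagged but not yet human-reviewed: excluded from the boost pool,
    /// same rule as retired + superseded entries.
    pub fn excluded_from_boost(&self) -> bool {
        self.doc_drift_flagged_at.is_some() && self.doc_drift_reviewed_at.is_none()
    }
}
```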

**Gate:**

- Seal a playbook referencing Docker 24.x → doc_refs captured. Bump the Docker version behind the scenes → `/doc_drift/check/{id}` returns `drifted: true, from: 24.0.7, to: 25.0.1, summary: "..."`. The boosted playbook count on the next `/vectors/hybrid` query drops by 1 (drift-flagged skipped).
- `doc_drift_corrections.jsonl` contains the overview model's synthesis for the drift with at least: summary of change, recommended action, cost/impact estimate.
- Human calls `/doc_drift/resolve/{id}` after reviewing → playbook returns to the active boost pool (or supersedes via Phase 27 if the procedure materially changed).
- Unit tests: DocRef serde default (legacy entries load as empty), drift check against a mocked context7 bridge, boost exclusion when drifted + unreviewed.

**Non-goals (explicit):**

- Automatic re-seal without human review. Drift detection → flag, not silent rewrite.
- Cross-playbook propagation of one drift diagnosis. Each playbook is reviewed individually (aggregation later if warranted).
- Generating the updated procedure. T3 *suggests*; a human or separate bot (see `bot/`) *writes*.

**Risk:** Medium. The context7 bridge is new infrastructure (Bun ↔ context7 MCP plugin ↔ HTTP shape for gateway consumption). Mitigation: the context7 plugin is already installed; its MCP tools return structured JSON; the bridge is thin adapter code. Start with a single-tool drift check (Docker) before broadening.

---

## Long-horizon domains (not in current phase sequence)

The architecture was drafted with DevOps execution (Terraform, Ansible) as the eventual target. **That remains aspirational, not current scope** — we don't start wiring `terraform validate` / `ansible-lint` until the staffing domain proves the six-layer architecture at scale.

What "proves at scale" means concretely:

- Phases 38-44 all shipped against staffing, green tests
- Live staffing pipeline handles **multiple concurrent contracts** with emails + SMS + indexed playbooks via `/v1/*`
- Observed **iteration success lift** (the 0→85% pattern) reproduced on varied staffing scenarios, not just the original proof-of-concept
- Token + cost accounting stable across provider fallback chains under real load
- Truth Layer rules prevent real fill errors before cloud burn (not just theoretical)

When staffing hits that bar, the DevOps domain lights up by:

- Populating `crates/truth/src/devops.rs` with real Terraform/Ansible rule shapes
- Populating `crates/validator/src/devops.rs` with `terraform validate` / `ansible-lint` shell-outs
- Adding DevOps task classes to `/v1/context` rule lookup
- No architectural changes needed — the dispatcher, router, and execution loop stay identical

Other candidate long-horizon domains (same pattern):

- Code generation tasks (validation via `cargo check` / `bun test`)
- SQL query generation (validation via EXPLAIN + schema compliance)
- Data pipeline definitions (validation via lineage check + schema compliance)

None of these are in the current roadmap. **Staffing first, production-proven, then expand.**

---

## 1. Purpose

Design and implement a universal AI control-plane API that enables:

- **deterministic high-stakes task execution** — the immediate domain is staffing fills (contracts, workers, emails, SMS) at scale; the same architecture extends later to DevOps (Terraform, Ansible) without redesign
- iterative capability amplification via observer loops
- hybrid local + cloud model orchestration
- structured knowledge + memory + playbook reuse
- controlled improvement over time through validated iteration

The system prioritizes **validated pipeline success over raw model intelligence**.

### Current scope — staffing at scale

The architecture must make the already-built staffing substrate reliably answer millions of inputs: pull real data, graph it across contracts, handle multiple concurrent contracts, index emails + SMS + playbooks via the hybrid SQL+vector method, and get **faster and better each iteration** via the feedback loops (Phase 19 playbook boost, Phase 22 KB pathway recommender, Phase 24 observer, Phase 26 Mem0 upsert).

DevOps is an eventual domain — see §Long-horizon domains.

## 2. Core Objectives

### 2.1 Functional Goals

- Provide a single universal API for all AI interactions
- Support multi-provider routing (local, flat-rate, token-based)
- Enable iterative execution loops with observer correction
- Store and reuse successful execution playbooks
- Integrate: S3-based knowledge storage, LanceDB retrieval/indexing, Mem0 memory layer, MCP tool ecosystem

### 2.2 Non-Functional Goals

- Deterministic behavior under constrained execution
- Full observability and cost accounting
- Safe DevOps execution (no uncontrolled mutation)
- Profile-driven routing and execution
- Reproducibility of successful runs

## 3. System Architecture

### 3.1 Layer Overview

**Layer 1 — Universal API**

Single entry point for all applications. Endpoints:

- `/v1/chat`
- `/v1/respond`
- `/v1/tools`
- `/v1/context`
- `/v1/usage`
- `/v1/sessions`

All programs must use this layer. No direct provider calls allowed.

**Layer 2 — Routing & Policy Engine**

Responsibilities: provider selection, fallback logic, cost gating, premium access control, profile enforcement.

Routing based on: task type, constraints, execution profile, system health.

**Layer 3 — Provider Adapter Layer**

Normalizes all providers: Ollama (local), OpenRouter, Gemini (direct), Claude (direct or routed), future providers.

Guarantee: no provider-specific logic leaks upward.

**Layer 4 — Knowledge & Memory Plane**

- Knowledge (S3 + LanceDB): raw documents, processed chunks, embeddings, index profiles
- Memory (Mem0): extracted facts, entity-linked memory, session-aware retrieval
- Playbooks: successful execution traces, reusable patterns, correction strategies

**Layer 5 — Execution Loop**

Each task runs through: Retrieval → Planning → Generation → Validation → Observer feedback → Iteration (if needed).

**Layer 6 — Observability & Accounting**

Every request logs: tokens (input/output), cost, latency, provider, fallback chain, profile used, iteration delta.

## 4. Execution Model

### 4.1 Iterative Loop

Each task follows: **Attempt → Validate → Observe → Adjust → Retry**

Constraints (a bounded-loop sketch follows):

- max iterations (default: 3)
- minimum improvement threshold
- cost ceiling per task
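
A sketch of a loop honoring those three constraints; `generate`, `validate`, and `observe_and_adjust` stand in for the real Layer 5 pieces:

```rust
pub struct LoopLimits {
    pub max_iterations: u32,   // default: 3
    pub min_improvement: f64,  // stop when the score stops improving by at least this
    pub cost_ceiling_usd: f64, // hard stop on spend
}

pub enum Outcome { Success(String), Failed(String) }

// Placeholder hooks for the real retrieval/generation/validation/observer pieces.
pub fn run_task(
    limits: &LoopLimits,
    mut generate: impl FnMut() -> (String, f64),   // (artifact, cost_usd)
    mut validate: impl FnMut(&str) -> (bool, f64), // (passed, score)
    mut observe_and_adjust: impl FnMut(&str),
) -> Outcome {
    let (mut spent, mut best_score) = (0.0_f64, 0.0_f64);
    for _ in 0..limits.max_iterations {
        let (artifact, cost) = generate();
        spent += cost;
        let (passed, score) = validate(&artifact);
        if passed {
            return Outcome::Success(artifact);
        }
        if spent >= limits.cost_ceiling_usd {
            return Outcome::Failed("cost ceiling hit".into());
        }
        if best_score > 0.0 && score - best_score < limits.min_improvement {
            return Outcome::Failed("improvement below threshold".into());
        }
        best_score = best_score.max(score);
        observe_and_adjust(&artifact); // observer proposes a correction before retry
    }
    Outcome::Failed("max iterations exhausted".into())
}
```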

### 4.2 Observer Role

Observer can: analyze failure, suggest corrections, recommend profile changes.

Observer cannot: modify truth layer, auto-promote changes, override constraints.

### 4.3 Cloud Escalation

Cloud models (Gemini, Claude) are used for: structural correction, reasoning gaps, complex decomposition.

They are not used for: brute-force retries, bulk execution.
## 5. Profile System

### 5.1 Profile Types

- **Retrieval Profile** — chunking strategy, embedding method, reranking rules
- **Memory Profile** — memory weighting, context injection rules
- **Execution Profile** — allowed providers, tool access, risk level
- **Observer Profile** — mutation aggressiveness, iteration strategy

### 5.2 Profile Constraints

- only one major profile change per iteration
- profiles must produce measurable deltas
- promotion requires repeated success

## 6. Truth Layer (Critical)

Defines non-negotiable constraints:

- Terraform rules
- Ansible structure requirements
- security policies
- organization standards

Rules:

- immutable at runtime
- referenced by all layers
- cannot be overridden by observer
## 7. Playbook System

### 7.1 Playbook Definition

Each successful run produces: task class, context used, steps executed, tools used, output artifacts, validation results, cost/latency, success score.

### 7.2 Playbook Lifecycle

- created on success
- reused for similar tasks
- decayed over time
- pruned if ineffective

## 8. Validation System

All DevOps outputs must pass: syntax validation, linting, dry-run, policy compliance.

Failure → iteration continues or task fails.

## 9. MCP Integration

MCP servers provide: tools, external data, execution capabilities.

All MCP outputs must be: normalized, validated, schema-compliant.

No direct MCP output reaches the model.

## 10. Token Accounting & Budget Control

Each request tracks: input tokens, output tokens, retries, fallback cost.

Policies: premium providers gated, cost ceilings enforced, per-task budget limits (a minimal ledger sketch follows).
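
A minimal ledger sketch under those policies; all names, and the total `cost_usd` field, are assumptions:

```rust
use std::collections::HashMap;

/// Per-request fields this section names: input tokens, output tokens,
/// retries, fallback cost. The running `cost_usd` total is an added assumption.
#[derive(Default)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub retries: u32,
    pub fallback_cost_usd: f64,
    pub cost_usd: f64,
}

#[derive(Default)]
pub struct Ledger {
    per_task: HashMap<String, Usage>,
    per_task_budget_usd: f64,
}

impl Ledger {
    pub fn record(&mut self, task_id: &str, u: Usage) {
        let e = self.per_task.entry(task_id.to_string()).or_default();
        e.input_tokens += u.input_tokens;
        e.output_tokens += u.output_tokens;
        e.retries += u.retries;
        e.fallback_cost_usd += u.fallback_cost_usd;
        e.cost_usd += u.cost_usd;
    }

    /// Budget gate: refuse dispatch when projected spend exceeds the per-task ceiling.
    pub fn within_budget(&self, task_id: &str, projected_cost_usd: f64) -> bool {
        let spent = self.per_task.get(task_id).map(|u| u.cost_usd).unwrap_or(0.0);
        spent + projected_cost_usd <= self.per_task_budget_usd
    }
}
```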

## 11. Failure Handling

**Recoverable failures:** bad decomposition, missing steps, weak retrieval → observer + iteration.

**Hard failures:** missing truth data, invalid task classification, unsafe execution → termination + error report.

## 12. Success Criteria

A task is successful only if:

- output is valid
- all validators pass
- no policy violations
- result is reproducible
- cost within limits

## 13. Key Risks & Mitigations

- **Observer drift** → bounded authority, confidence tracking
- **Memory poisoning** → validation layer, memory weighting
- **Cost explosion** → token accounting, iteration caps
- **Retrieval errors** → post-retrieval validation, profile tuning