matrix-agent-validated/docs/CONTROL_PLANE_PRD.md
profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00


# PRD — Universal AI Control Plane
**Status:** Long-horizon architecture target as of 2026-04-22. Lakehouse Phases 0-37 (`docs/PRD.md`) are preserved as the reference implementation and first domain-specific consumer. Phases 38+ (control-plane layers) are sequenced below.
**Current domain: staffing.** The immediate proving ground is the staffing substrate already built — synthetic workers_500k, contracts, emails, SMS drafts, playbook memory. Everything Phase 38-44 ships is validated first against that domain. The DevOps / Terraform / Ansible framing from the original PRD draft stays as a **long-horizon target** — architecture-compatible but not in current scope. See §Long-horizon domains at the bottom.
**Owner:** J
**Cross-read:** `docs/PRD.md` for what's shipped (staffing + AI substrate, 13 crates, ~3M rows). This doc for the layered architecture those pieces now fit into.
---
## Phase Sequencing (Phases 38-44)
Ship each phase before starting the next. Each ends with green tests + docs update.
| Phase | Layer | What ships | Est. LOC | Risk |
|---|---|---|---|---|
| 38 | Layer 1 skeleton | `/v1/chat`, `/v1/usage`, `/v1/sessions` routes forwarding to existing `aibridge` → Ollama. Bot migrates as first consumer. | ~400 | Low — additive, no existing routes touched |
| 39 | Layer 3 adapters | `aibridge::ProviderAdapter` trait; Ollama + one new (OpenRouter). `/v1/chat` routes by config. | ~500 | Low-medium |
| 40 | Layer 2 engine | Rules-based routing (`config/routing.toml`), fallback chains, cost gating. Add Gemini + Claude adapters. | ~600 | Medium |
| 41 | Profile split | Separate Retrieval / Memory / Execution / Observer profiles; Phase 17 backward-compat. Absorbs Phase 37 hot-swap-async. | ~300 | Medium |
| 42 | Truth Layer | New `crates/truth`; Terraform/Ansible schemas; `/v1/context` serves rules to router + observer. | ~700 | Medium |
| 43 | Validation pipeline | Syntax/lint/dry-run/policy gates per output type. Plugs into Layer 5 execution loop. | ~400 | Medium |
| 44 | Caller migration | All internal callers route through `/v1/chat`. Direct sidecar access deprecated. | ~200 | Low |
**Total ≈3100 LOC.** Phase 37 (hot-swap async) folds into Phase 41 — it's an Execution-Profile activation concern.
---
## Phase 38 — Universal API Skeleton
**Goal:** OpenAI-compatible `/v1/*` surface exists and forwards to existing aibridge → Ollama. Nothing about multi-provider yet — just the SHAPE, so every downstream piece (adapters, routing, usage accounting) has a surface to plug into.
**Ships:**
- `crates/gateway/src/v1/mod.rs` — router + `/v1/chat`, `/v1/usage`, `/v1/sessions`
- `crates/gateway/src/v1/ollama.rs` — shape adapter (OpenAI chat ↔ existing aibridge `GenerateRequest`)
- One-line `nest("/v1", ...)` in `crates/gateway/src/main.rs`
- Unit test: `POST /v1/chat` roundtrips through mocked provider
**Gate:**
- `curl -X POST localhost:3100/v1/chat -d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"hi"}]}'` returns valid OpenAI-shape response.
- `GET localhost:3100/v1/usage` returns `{requests, prompt_tokens, completion_tokens, total_tokens}`.
- `GET localhost:3100/v1/sessions` returns `{data:[]}` (stub; real impl Phase 41).
- `cargo test -p gateway` green.
**Non-goals (explicit):** streaming, tool calls, function calling, session state, multi-provider, fallback, cost gating.
**Risk:** Low — additive, doesn't touch existing routes. Worst case: `/v1/*` returns 502 and we fix the adapter. No existing caller affected.
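The shape adaptation in `v1/ollama.rs` can be sketched as flattening the chat transcript into a single prompt for a non-streaming forward. This is a minimal sketch — the struct fields and function name here are assumptions for illustration, not the actual gateway/aibridge types:

```rust
// Hypothetical request shapes: an OpenAI-style /v1/chat body on one side and
// an aibridge-style GenerateRequest on the other (field names are assumptions).
struct ChatMessage { role: String, content: String }
struct ChatRequest { model: String, messages: Vec<ChatMessage> }
struct GenerateRequest { model: String, prompt: String }

// Flatten the chat transcript into one prompt string — the simplest possible
// shape adaptation for forwarding to the existing Ollama sidecar.
fn to_generate(req: &ChatRequest) -> GenerateRequest {
    let prompt = req.messages.iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n");
    GenerateRequest { model: req.model.clone(), prompt }
}
```

The real adapter also has to map the response back into the OpenAI `choices[]` shape; the request side above is the part Phase 38 gates on.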
---
## Phase 39 — Provider Adapter Refactor
**Goal:** `aibridge` grows a `ProviderAdapter` trait. Ollama implementation wraps existing sidecar code. One new provider lands as proof: **OpenRouter** (simplest — it's OpenAI-compatible, so adapter is mostly passthrough).
**Ships:**
- `crates/aibridge/src/provider.rs` — `ProviderAdapter` trait with `chat()` + `embed()` + `unload()` methods
- `crates/aibridge/src/providers/ollama.rs` — existing sidecar code moved behind the trait
- `crates/aibridge/src/providers/openrouter.rs` — new, HTTP client to `openrouter.ai/api/v1/chat/completions`
- `config/providers.toml` — provider registry (name, base_url, auth, default_models)
- `/v1/chat` routes by `model` field: prefix match (e.g. `openrouter/anthropic/claude-3.5-sonnet` → OpenRouter; bare names → Ollama)
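The prefix-match routing in the last bullet is small enough to sketch directly. Provider names follow the convention described above; the function name is hypothetical:

```rust
// Route by model-name prefix: "openrouter/..." goes to OpenRouter with the
// prefix stripped before forwarding; bare names fall through to Ollama.
fn route(model: &str) -> (&'static str, &str) {
    match model.strip_prefix("openrouter/") {
        Some(rest) => ("openrouter", rest),
        None => ("ollama", model),
    }
}
```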
**Gate:**
- `/v1/chat` with `model: "qwen3.5:latest"` hits Ollama → green
- `/v1/chat` with `model: "openrouter/openai/gpt-4o-mini"` hits OpenRouter (key from secrets.toml) → green
- Neither call leaks provider-specific fields upward. Response is always the `/v1/chat` shape.
**Non-goals:** Fallback chain (Phase 40), cost gating (Phase 40), Gemini/Claude adapters (Phase 40).
**Risk:** Low-medium. The trait extraction is mostly a rearrange; OpenRouter is thin. Biggest risk is secret-loading conventions — `SecretsProvider` is already in place, so reuse that path.
---
## Phase 40 — Routing & Policy Engine + Observability Recovery
**Goal:** Replace hardcoded T1-T5 routing with a rules engine. Add Gemini + Claude adapters. Cost gating enforced at router level. **Reinstate Langfuse + Gitea MCP** — recovery of the observability + repo-ops stack J built previously (see `project_lost_stack` memory).
**Ships — routing:**
- `crates/aibridge/src/routing.rs` — rules engine (match on: task type, token budget, previous attempt failures, profile ID)
- `config/routing.toml` — rules in TOML (human-editable, hot-reloadable)
- `crates/aibridge/src/providers/gemini.rs` — `generativelanguage.googleapis.com` adapter
- `crates/aibridge/src/providers/claude.rs` — `api.anthropic.com` adapter
- Fallback chain support: if primary returns 5xx or times out, try next in chain
- Cost gate: per-request budget + daily budget per-provider
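The fallback-chain behavior can be sketched as one loop over the configured chain: first success wins, any failure (5xx, timeout) falls through to the next entry. A minimal sketch — `call` stands in for the real per-provider dispatch, which is an assumption here:

```rust
// One attempt per provider in the chain; the first success wins, and an error
// (5xx, timeout) falls through to the next entry. The last error is surfaced
// if the whole chain is exhausted.
fn dispatch_with_fallback<F>(chain: &[&str], mut call: F) -> Result<String, String>
where F: FnMut(&str) -> Result<String, String> {
    let mut last_err = String::from("empty fallback chain");
    for provider in chain {
        match call(provider) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Keeping this in a single `dispatch()` point is also what makes the Langfuse tracing below cheap: every provider call, including fallback hops, flows through one function.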
**Ships — observability (was lost, now restored):**
- **Langfuse** self-hosted via Docker Compose. Single source of truth for every LLM call trace: prompt / response / tokens / cost / latency / provider / fallback chain / profile used. UI at `localhost:3000`. Keys in `/etc/lakehouse/secrets.toml`.
- `crates/aibridge/src/langfuse.rs` — thin fire-and-forget trace emitter. Every `/v1/chat` call spawns a background task that POSTs to `langfuse/api/public/ingestion`. Non-blocking: trace failures never affect response.
- **Langfuse → observer pipe** — `mcp-server/langfuse_bridge.ts` or similar. Polls Langfuse's trace API at interval, forwards completed traces to observer `:3800/event` with `source: "langfuse"`. KB now sees cost/latency deltas per model, not just outcome deltas.
- **Gitea MCP reconnect** — the MCP server binary still installed at `/home/profit/.bun/install/cache/gitea-mcp@0.0.10/` gets wired into `mcp-server/index.ts` tool registry. Agents can open PRs, comment on issues, list commits via named tools. Closes Phase 28's repo-ops gap.
**Gate:**
- Rule like "local models for simple JSON emitters, cloud for reasoning" fires correctly by task type
- Primary fails → fallback provider hits, response still matches `/v1/chat` shape
- Daily budget hit → subsequent requests return 429 with clear retry-at header
- `/v1/usage` reports per-provider breakdown
- **Every `/v1/chat` call appears in Langfuse UI** with correct prompt, response, latency, token count within 2 seconds of the request completing
- **Langfuse → observer pipe** delivers trace deltas to KB: `GET :3800/stats?source=langfuse` shows non-zero count after a few scenarios run
- **Gitea MCP tools callable** — `list_prs`, `open_pr`, `comment_on_issue` exposed in `mcp-server/index.ts`, verifiable via a quick agent scenario
**Non-goals:** Retrieval Profile split (Phase 41), Truth Layer (Phase 42). Langfuse self-hosted UI customization / SSO.
**Risk:** Medium. Multi-provider auth + cost tracking is cross-cutting; Langfuse adds 4-5 Docker containers (PostgreSQL, ClickHouse, Redis, web, worker). Mitigation: every provider call wrapped in a single `dispatch()` function so observability flows through one point; Langfuse Docker Compose is their supported deployment path, well-tested.
---
## Phase 41 — Profile System Expansion (+ Phase 37 hot-swap async folded in)
**Goal:** The existing `ModelProfile` (Phase 17) becomes **ExecutionProfile**. Three new profile types land alongside. Profile activation is async — returns job_id, work runs in background (Phase 37 deliverable).
**Ships:**
- `crates/shared/src/profiles/` — `ExecutionProfile`, `RetrievalProfile`, `MemoryProfile`, `ObserverProfile`
- `crates/catalogd` gains per-profile-type CRUD endpoints (`/catalog/profiles/retrieval`, etc.)
- `crates/vectord/src/activation.rs` — `ActivationTracker` with background-job pattern (Phase 37 content)
- `POST /vectors/profile/{id}/activate` returns 202 + job_id, polling at `GET /vectors/profile/jobs/{id}`
- Single-flight guard: refuse new activation if one is pending/running
- Backward compat: `ModelProfile` still loads, aliased to ExecutionProfile
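The single-flight guard is a compare-and-swap claim on an "activation running" flag — a second activation while one is in flight is refused rather than queued. A sketch under that assumption (type and method names are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Single-flight guard: a second activation request while one is pending or
// running is refused (409-style) instead of being queued behind the first.
struct ActivationGuard { busy: AtomicBool }

impl ActivationGuard {
    fn new() -> Self { Self { busy: AtomicBool::new(false) } }
    // Returns true if this caller won the slot and may start the job.
    fn try_claim(&self) -> bool {
        self.busy.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire).is_ok()
    }
    // Called by the background job on completion (success or failure).
    fn release(&self) { self.busy.store(false, Ordering::Release); }
}
```

The compare-exchange makes the claim atomic, so two concurrent `POST .../activate` calls cannot both start a job.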
**Gate:**
- Activate a profile → returns 202 in <100ms; job completes in background; `/vectors/profile/jobs/{id}` shows progress + final report
- `tests/multi-agent/run_stress.ts` Phase 3 (hot-swap stress) passes (was SKIPPED)
- Retrieval + Memory + Observer profiles can be created independently of Execution profile
**Non-goals:** Truth Layer (Phase 42), validation (Phase 43), caller migration (Phase 44).
**Risk:** Medium. Schema change + async refactor. Mitigation: `#[serde(default)]` on all new fields; existing profiles load unchanged.
---
## Phase 42 — Truth Layer (staffing rules first)
**Goal:** New `crates/truth` crate holds immutable task-class constraints. Served via `/v1/context` to router and observer. No layer can override truth. **Staffing rules ship first**; Terraform/Ansible rule shapes are scaffolded but unpopulated until the long-horizon phase.
**Ships:**
- `crates/truth/src/lib.rs` — `TruthStore` with schema loading (TOML/YAML rules)
- `crates/truth/src/staffing.rs` — staffing rule shapes:
  - Worker eligibility (active status, not blacklisted for client, geo match, role match, availability window)
  - Contract invariants (deadline present, role/count/city/state populated, budget_per_hour_max > 0)
  - PII handling (redaction rules on fields tagged `PII` before any cloud call; covers existing Phase 10 sensitivity tags)
  - Client blacklist enforcement (auto-applied before any fill proposal)
  - Fill requirements (endorsed_names count matches target_count, no duplicate worker_ids within a single fill)
- `crates/truth/src/devops.rs` — **scaffold only**: empty rule struct for Terraform/Ansible, populated in the long-horizon phase. Keeps the dispatcher signature stable so no refactor is needed later.
- `truth/` dir at repo root — rule files, versioned in git
- `/v1/context` endpoint returns applicable rules for a task class (`staffing.fill`, `staffing.rescue`, `staffing.sms_draft`, etc.)
- Router consults truth before dispatching: if a task violates a rule, hard-fail with a structured error + rule citation (matches the existing Phase 13 access-control pattern)
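The pre-dispatch check can be sketched as a pure function over the proposal: violation returns a structured error citing the rule, and no provider call happens. Shapes and rule IDs below are illustrative assumptions, not the actual `crates/truth` types:

```rust
// Hypothetical fill-proposal shape and structured rule violation.
struct FillProposal { worker_id: u64, target_count: usize, endorsed: Vec<String> }
struct RuleViolation { rule: &'static str, detail: String }

// Truth check run by the router before any dispatch: blacklist first, then
// the completeness invariant. An Err here maps to a 422 + rule citation.
fn check_fill(p: &FillProposal, blacklist: &[u64]) -> Result<(), RuleViolation> {
    if blacklist.contains(&p.worker_id) {
        return Err(RuleViolation {
            rule: "staffing.client_blacklist",
            detail: format!("worker {} is blacklisted for this client", p.worker_id),
        });
    }
    if p.endorsed.len() != p.target_count {
        return Err(RuleViolation {
            rule: "staffing.fill_completeness",
            detail: format!("endorsed {} != target {}", p.endorsed.len(), p.target_count),
        });
    }
    Ok(())
}
```

Because the check runs before dispatch, a blacklisted worker costs zero cloud tokens — the property the first gate below verifies.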
**Gate:**
- Submit a fill proposal where a worker is client-blacklisted → router returns 422 + rule citation, no cloud tokens burned
- Submit a fill with `endorsed_names.length != target_count` → 422 before dispatch
- Observer cannot promote a correction that violates truth (rejected at router gate)
- PII redaction verified: SSN / salary fields stripped from prompts before cloud calls
- Truth reload is explicit (no file-watch hot reload in this phase)
**Non-goals:** Validation execution (Phase 43), policy learning / evolution (deferred), actual Terraform/Ansible rules (long-horizon phase).
**Risk:** Medium. Domain-specific rule enumeration takes discovery — start with a minimal rule set (5-10 staffing rules, derived from existing Phase 10-13 work) and grow it organically as real fills surface edge cases.
---
## Phase 43 — Validation Pipeline (staffing outputs first)
**Goal:** Staffing outputs run through schema / completeness / consistency / policy gates. Plugs into the Layer 5 execution loop: failure triggers an observer-correction iteration. This is where the **0→85% pattern reproduces on real staffing tasks** — the iteration loop with validation in place is what made small models successful.
**Ships:**
- `crates/validator/src/lib.rs` — `Validator` trait: `validate(artifact) -> Result<Report, ValidationError>` + `Artifact` enum over output types
- `crates/validator/src/staffing/fill.rs` — fill-proposal validator:
  - Schema compliance (propose_done shape matches `{fills: [{candidate_id, name}]}`)
  - Completeness (endorsed count == target_count)
  - Worker existence (every candidate_id present in workers_500k via SQL lookup)
  - Status check (every worker has status=active, not_on_client_blacklist)
  - Geo/role match (worker city/state/role matches contract)
- `crates/validator/src/staffing/email.rs` — generated email/SMS drafts:
  - Schema (TO/BODY fields present)
  - Length (SMS ≤ 160 chars; email subject ≤ 78 chars)
  - PII absence (no SSN / salary leaked into outgoing text)
  - Worker-name consistency (name in message matches worker record)
- `crates/validator/src/staffing/playbook.rs` — sealed playbook:
  - Operation format (`fill: Role xN in City, ST`)
  - endorsed_names non-empty, ≤ target_count × 2
  - fingerprint populated (Phase 25 validity window requirement)
- `crates/validator/src/devops.rs` — **scaffold only**: stubbed Terraform/Ansible validators (`terraform validate`, `ansible-lint`) for the long-horizon phase
- Task execution loop in gateway: generate → validate → if fail, observer correction + retry (bounded by `max_iterations=3`)
- Validation results logged to observer (`data/_observer/ops.jsonl`) + KB (`data/_kb/outcomes.jsonl`)
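A cut-down version of the `Validator` trait over one artifact type shows the shape — here a single gate (duplicate + completeness) over a fill. Type and trait names are simplified assumptions, not the actual crate API:

```rust
// Simplified fill artifact: the real Artifact enum spans all output types.
struct Fill { target_count: usize, candidate_ids: Vec<u64> }

trait Validator {
    fn validate(&self, fill: &Fill) -> Result<(), String>;
}

// One gate: no duplicate worker_ids, and endorsed count matches the target.
struct Completeness;

impl Validator for Completeness {
    fn validate(&self, fill: &Fill) -> Result<(), String> {
        let mut ids = fill.candidate_ids.clone();
        ids.sort_unstable();
        ids.dedup();
        if ids.len() != fill.candidate_ids.len() {
            return Err("duplicate worker_ids in fill".into());
        }
        if fill.candidate_ids.len() != fill.target_count {
            return Err(format!("endorsed {} != target {}",
                fill.candidate_ids.len(), fill.target_count));
        }
        Ok(())
    }
}
```

In the execution loop, the `Err` string (or a structured report in the real crate) is what the observer sees when proposing a correction for the retry.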
**Gate:**
- Generate a fill proposal → validator catches a phantom worker_id → observer + cloud rescue propose a correction → retry → green. This reproduces the 0→85% pattern on the live staffing pipeline.
- `/v1/usage` shows iteration count per task, provider fallback chain, and tokens-per-iteration. Cost attribution per task class visible.
- Reproduces the 14× citation-lift finding from Phase 19 (refinement on similar geos) after validation gates.
**Non-goals:** Caller migration (Phase 44), Terraform/Ansible wired validation (long-horizon).
**Risk:** Medium. Validation shapes have to match actual executor outputs; mitigation is using real scenario runs as test fixtures (we have ~100 of them in `tests/multi-agent/playbooks/`).
---
## Phase 44 — Caller Migration + Direct-Provider Deprecation
**Goal:** Every internal LLM caller routes through `/v1/chat`. Direct sidecar / direct Ollama / direct OpenAI calls are removed or explicitly deprecated with a warning.
**Ships:**
- `aibridge::AiClient` becomes a thin `/v1/chat` client (was direct-to-sidecar)
- `crates/vectord::agent` (autotune): routes through `/v1`
- `crates/vectord::autotune`: routes through `/v1`
- `tests/multi-agent/agent.ts::generate()`: routes through `/v1`
- `bot/propose.ts`: routes through `/v1` (already proposed as Phase 38's test consumer, formalized here)
- Lint rule / grep pre-commit hook: no `fetch.*:3200/generate` outside the provider adapters
**Gate:**
- `grep -r "localhost:3200/generate\|/api/generate"` returns only adapter files + deprecation shims
- `/v1/usage` accounts for every LLM call in the system within a 1-minute window after hitting a fresh scenario
- Full scenario passes end-to-end without any caller bypassing `/v1/*`
**Non-goals:** New features. This phase is purely mechanical migration.
**Risk:** Low. Mechanical. Tests catch regressions.
---
## Phase 45 — Doc-drift detection + context7 integration
**Goal:** Playbooks know which external docs they were written against. When those docs change (Docker adds a feature, an npm lib goes major, Terraform renames a resource), the playbook is automatically flagged. Small models never run confidently-outdated procedures — the drift signal reaches them before the next execution does.
**Why this phase exists at all:** The 0→85% thesis depends on the hyperfocus lane staying valid. External doc drift invalidates the lane silently — popular playbooks can compound the wrong way, accumulating boost while growing more wrong. Phase 25 already retires playbooks on *internal* schema drift; Phase 45 applies the same mechanism to *external* doc drift. This is the completion of the learning loop, not an optional add-on.
**Ships:**
- `shared::types::DocRef` — `{ tool: String, version_seen: String, snippet_hash: Option<String>, source_url: Option<String>, seen_at: DateTime<Utc> }`
- `PlaybookEntry.doc_refs: Vec<DocRef>` — `#[serde(default)]` so pre-Phase-45 entries load as an empty vec
- `/vectors/playbook_memory/seed` + `/revise` accept `doc_refs` in the request body
- `/vectors/playbook_memory/doc_drift/check/{id}` — manual drift check: looks up each `doc_refs[]` entry via the context7 bridge, returns per-tool `{version_seen, version_current, drifted: bool}` plus an overall verdict
- `/vectors/playbook_memory/doc_drift/scan` — batch scan across all active playbooks (scheduled path for Phase 45.2)
- `mcp-server/context7_bridge.ts` — Bun HTTP bridge. Exposes `GET /docs/:tool/version` + `GET /docs/:tool/:version/diff?since=X` against the installed context7 MCP plugin. Gateway calls this over localhost.
- `PlaybookMemory::compute_boost_for_filtered_with_role` excludes entries where `doc_drift_flagged_at.is_some() && doc_drift_review.is_none()` (same rule as retired + superseded)
- Overview model synthesis writes `data/_kb/doc_drift_corrections.jsonl` per detected drift: `{playbook_id, tool, version_seen, version_current, diff_summary, recommended_action, generated_at}`
- Human-in-the-loop re-seal path: `/vectors/playbook_memory/doc_drift/resolve/{id}` marks reviewed, optionally triggers `revise_entry` if procedure changed
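The backward-compatibility requirement (pre-Phase-45 entries load with empty `doc_refs`) and the version comparison at the core of the drift check can be sketched with `Default` standing in for serde's `#[serde(default)]` behavior. Field names follow the `DocRef` shape above; everything else is illustrative:

```rust
// Sketch of the backward-compat rule: a missing doc_refs field defaults to an
// empty vec. In the real crate this is `#[serde(default)]` on the field;
// Default models the same observable behavior here without the serde dep.
#[derive(Default)]
struct DocRef { tool: String, version_seen: String }

#[derive(Default)]
struct PlaybookEntry { doc_refs: Vec<DocRef> }

// Core of the per-tool drift verdict: version seen at seal time vs. the
// version the context7 bridge reports now.
fn drifted(r: &DocRef, version_current: &str) -> bool {
    r.version_seen != version_current
}
```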
**Gate:**
- Seal a playbook referencing Docker 24.x → doc_refs captured. Bump the Docker version behind the scenes → `/doc_drift/check/{id}` returns `drifted: true, from: 24.0.7, to: 25.0.1, summary: "..."`. The boosted playbook count on the next `/vectors/hybrid` query drops by 1 (the drift-flagged entry is skipped).
- `doc_drift_corrections.jsonl` contains the overview model's synthesis for the drift with at least: summary of change, recommended action, cost/impact estimate.
- Human calls `/doc_drift/resolve/{id}` after reviewing → playbook returns to the active boost pool (or is superseded via Phase 27 if the procedure materially changed).
- Unit tests: DocRef serde default (legacy entries load as empty), drift check against mocked context7 bridge, boost exclusion when drifted+unreviewed.
**Non-goals (explicit):**
- Automatic re-seal without human review. Drift-detection flag, not silent rewrite.
- Cross-playbook propagation of one drift diagnosis. Each playbook reviewed individually (aggregation later if warranted).
- Generating the updated procedure. T3 *suggests*; human or separate bot (see `bot/`) *writes*.
**Risk:** Medium. The context7 bridge is new infrastructure (wrapping the Bun context7 MCP plugin in an HTTP shape for gateway consumption). Mitigation: the context7 plugin is already installed; its MCP tools return structured JSON; the bridge is thin adapter code. Start with a single-tool drift check (Docker) before broadening.
---
## Long-horizon domains (not in current phase sequence)
The architecture was drafted with DevOps execution (Terraform, Ansible) as the eventual target. **That remains aspirational, not current scope** — we don't start wiring `terraform validate` / `ansible-lint` until the staffing domain proves the six-layer architecture at scale.
What "proves at scale" means concretely:
- Phases 38-44 all shipped against staffing, green tests
- Live staffing pipeline handles **multiple concurrent contracts** with emails + SMS + indexed playbooks via `/v1/*`
- Observed **iteration success lift** (the 0→85% pattern) reproduced on varied staffing scenarios, not just the original proof-of-concept
- Token + cost accounting stable across provider fallback chains under real load
- Truth Layer rules prevent real fill errors before cloud burn (not just theoretical)
When staffing hits that bar, the DevOps domain lights up by:
- Populating `crates/truth/src/devops.rs` with real Terraform/Ansible rule shapes
- Populating `crates/validator/src/devops.rs` with `terraform validate` / `ansible-lint` shell-out
- Adding DevOps task classes to `/v1/context` rule lookup
- No architectural changes needed — the dispatcher, router, and execution loop stay identical.
Other candidate long-horizon domains (same pattern):
- Code generation tasks (validation via `cargo check` / `bun test`)
- SQL query generation (validation via EXPLAIN + schema compliance)
- Data pipeline definitions (validation via lineage check + schema compliance)
None of these are in the current roadmap. **Staffing first, production-proven, then expand.**
---
## 1. Purpose
Design and implement a universal AI control-plane API that enables:
- **deterministic high-stakes task execution** — the immediate domain is staffing fills (contracts, workers, emails, SMS) at scale; the same architecture extends later to DevOps (Terraform, Ansible) without redesign
- iterative capability amplification via observer loops
- hybrid local + cloud model orchestration
- structured knowledge + memory + playbook reuse
- controlled improvement over time through validated iteration
The system prioritizes **validated pipeline success over raw model intelligence**.
### Current scope — staffing at scale
The architecture must make the already-built staffing substrate reliably answer millions of inputs: pull real data, graph it across contracts, handle multiple concurrent contracts, index emails + SMS + playbooks via the hybrid SQL+vector method, and get **faster and better each iteration** via the feedback loops (Phase 19 playbook boost, Phase 22 KB pathway recommender, Phase 24 observer, Phase 26 Mem0 upsert).
DevOps is an eventual domain — see §Long-horizon domains.
## 2. Core Objectives
### 2.1 Functional Goals
- Provide a single universal API for all AI interactions
- Support multi-provider routing (local, flat-rate, token-based)
- Enable iterative execution loops with observer correction
- Store and reuse successful execution playbooks
- Integrate: S3-based knowledge storage, LanceDB retrieval/indexing, Mem0 memory layer, MCP tool ecosystem
### 2.2 Non-Functional Goals
- Deterministic behavior under constrained execution
- Full observability and cost accounting
- Safe DevOps execution (no uncontrolled mutation)
- Profile-driven routing and execution
- Reproducibility of successful runs
## 3. System Architecture
### 3.1 Layer Overview
**Layer 1 — Universal API**
Single entry point for all applications. Endpoints:
- `/v1/chat`
- `/v1/respond`
- `/v1/tools`
- `/v1/context`
- `/v1/usage`
- `/v1/sessions`
All programs must use this layer. No direct provider calls allowed.
**Layer 2 — Routing & Policy Engine**
Responsibilities: provider selection, fallback logic, cost gating, premium access control, profile enforcement.
Routing based on: task type, constraints, execution profile, system health.
**Layer 3 — Provider Adapter Layer**
Normalizes all providers: Ollama (local), OpenRouter, Gemini (direct), Claude (direct or routed), future providers.
Guarantee: no provider-specific logic leaks upward.
**Layer 4 — Knowledge & Memory Plane**
- Knowledge (S3 + LanceDB): raw documents, processed chunks, embeddings, index profiles
- Memory (Mem0): extracted facts, entity-linked memory, session-aware retrieval
- Playbooks: successful execution traces, reusable patterns, correction strategies
**Layer 5 — Execution Loop**
Each task runs through: Retrieval → Planning → Generation → Validation → Observer feedback → Iteration (if needed).
**Layer 6 — Observability & Accounting**
Every request logs: tokens (input/output), cost, latency, provider, fallback chain, profile used, iteration delta.
## 4. Execution Model
### 4.1 Iterative Loop
Each task follows: **Attempt → Validate → Observe → Adjust → Retry**
Constraints:
- max iterations (default: 3)
- minimum improvement threshold
- cost ceiling per task
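The loop and its constraints compose into one bounded driver: retry until valid, but stop at the iteration cap or the cost ceiling, whichever trips first. A sketch under those assumptions — `attempt` stands in for the real generate/validate round-trip:

```rust
// Attempt → Validate → Retry, bounded by max iterations and a per-task cost
// ceiling. `attempt(i)` returns (validated_ok, cost_of_this_iteration); on
// success the winning iteration number is returned for accounting.
fn run_task<F>(max_iterations: u32, cost_ceiling: u64, mut attempt: F) -> Result<u32, String>
where F: FnMut(u32) -> (bool, u64) {
    let mut spent = 0u64;
    for i in 1..=max_iterations {
        let (ok, cost) = attempt(i);
        spent += cost;
        if ok { return Ok(i); }
        if spent >= cost_ceiling {
            return Err(format!("cost ceiling hit after {i} iterations"));
        }
    }
    Err(format!("failed after {max_iterations} iterations"))
}
```

The minimum-improvement threshold would slot in as an extra early-exit check between iterations; it is omitted here to keep the sketch small.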
### 4.2 Observer Role
Observer can: analyze failure, suggest corrections, recommend profile changes.
Observer cannot: modify truth layer, auto-promote changes, override constraints.
### 4.3 Cloud Escalation
Cloud models (Gemini, Claude) are used for: structural correction, reasoning gaps, complex decomposition.
They are not used for: brute-force retries, bulk execution.
## 5. Profile System
### 5.1 Profile Types
- **Retrieval Profile** — chunking strategy, embedding method, reranking rules
- **Memory Profile** — memory weighting, context injection rules
- **Execution Profile** — allowed providers, tool access, risk level
- **Observer Profile** — mutation aggressiveness, iteration strategy
### 5.2 Profile Constraints
- only one major profile change per iteration
- profiles must produce measurable deltas
- promotion requires repeated success
## 6. Truth Layer (Critical)
Defines non-negotiable constraints:
- Terraform rules
- Ansible structure requirements
- security policies
- organization standards
Rules:
- immutable at runtime
- referenced by all layers
- cannot be overridden by observer
## 7. Playbook System
### 7.1 Playbook Definition
Each successful run produces: task class, context used, steps executed, tools used, output artifacts, validation results, cost/latency, success score.
### 7.2 Playbook Lifecycle
- created on success
- reused for similar tasks
- decayed over time
- pruned if ineffective
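Decay and pruning can be modeled as exponential falloff since the last successful reuse, with a floor below which the entry is pruned. This is a sketch of one plausible scheme — the half-life and floor values are illustrative, not the shipped tuning:

```rust
// Exponential playbook decay: the boost halves every `half_life_days` since
// the last successful reuse; entries below `floor` are candidates for pruning.
fn decayed_score(base: f64, days_since_success: f64, half_life_days: f64) -> f64 {
    base * 0.5_f64.powf(days_since_success / half_life_days)
}

fn should_prune(score: f64, floor: f64) -> bool {
    score < floor
}
```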
## 8. Validation System
All DevOps outputs must pass: syntax validation, linting, dry-run, policy compliance.
Failure → iteration continues or the task fails.
## 9. MCP Integration
MCP servers provide: tools, external data, execution capabilities.
All MCP outputs must be: normalized, validated, schema-compliant.
No direct MCP output reaches the model.
## 10. Token Accounting & Budget Control
Each request tracks: input tokens, output tokens, retries, fallback cost.
Policies: premium providers gated, cost ceilings enforced, per-task budget limits.
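Per-request accounting and the budget gate compose naturally: each attempt records its token counts, and the gate refuses further spend once the ceiling is reached (the 429 behavior described for the router). A minimal sketch with assumed field names matching the `/v1/usage` shape:

```rust
// Accumulator behind /v1/usage, feeding a per-task or per-day budget gate.
#[derive(Default)]
struct Usage { requests: u64, prompt_tokens: u64, completion_tokens: u64 }

impl Usage {
    // Record one completed request's token counts.
    fn record(&mut self, prompt: u64, completion: u64) {
        self.requests += 1;
        self.prompt_tokens += prompt;
        self.completion_tokens += completion;
    }
    fn total_tokens(&self) -> u64 { self.prompt_tokens + self.completion_tokens }
    // Budget gate: false once the ceiling is reached → caller returns 429.
    fn within_budget(&self, ceiling: u64) -> bool { self.total_tokens() < ceiling }
}
```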
## 11. Failure Handling
**Recoverable failures:** bad decomposition, missing steps, weak retrieval → observer + iteration.
**Hard failures:** missing truth data, invalid task classification, unsafe execution → termination + error report.
## 12. Success Criteria
A task is successful only if:
- output is valid
- all validators pass
- no policy violations
- result is reproducible
- cost within limits
## 13. Key Risks & Mitigations
- **Observer drift** — bounded authority, confidence tracking
- **Memory poisoning** — validation layer, memory weighting
- **Cost explosion** — token accounting, iteration caps
- **Retrieval errors** — post-retrieval validation, profile tuning