# AI Agent Governance System Architecture

**Version:** 0.2.0
**Status:** Active Development
**Last Updated:** 2026-01-23

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [System Architecture](#system-architecture)
3. [Core Components](#core-components)
4. [Agent Taxonomy](#agent-taxonomy)
5. [Runtime Governance](#runtime-governance)
6. [Multi-Agent Coordination](#multi-agent-coordination)
7. [Current Capabilities](#current-capabilities)
8. [Engineering Focus Areas](#engineering-focus-areas)
9. [Sample Implementations](#sample-implementations)
10. [Future Potential](#future-potential)
11. [File Structure](#file-structure)
12. [Running the System](#running-the-system)
13. [Contributing](#contributing)
14. [References](#references)

---

## Executive Summary

This system implements a **governed AI agent framework** designed for safe, auditable, and scalable automation. The architecture enforces:

- **Trust-tiered access control** via HashiCorp Vault
- **Real-time governance** via DragonflyDB
- **Structured agent lifecycles** with mandatory phases
- **Multi-agent coordination** with parallel execution and conditional spawning
- **Complete audit trails** via SQLite ledger

The system prioritizes **legibility over magic**: every agent action must be explainable, reproducible, and auditable.
---

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              GOVERNANCE LAYER                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │   HashiCorp     │  │   DragonflyDB   │  │      SQLite Ledger          │  │
│  │     Vault       │  │   (Runtime)     │  │        (Audit)              │  │
│  │                 │  │                 │  │                             │  │
│  │  • Policies     │  │  • State        │  │  • agent_actions            │  │
│  │  • Secrets      │  │  • Locks        │  │  • agent_metrics            │  │
│  │  • AppRole Auth │  │  • Heartbeats   │  │  • violations               │  │
│  │  • Token Leases │  │  • Errors       │  │  • promotions               │  │
│  └─────────────────┘  │  • Blackboard   │  └─────────────────────────────┘  │
│                       │  • Messages     │                                   │
│                       └─────────────────┘                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                             ORCHESTRATION LAYER                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       Multi-Agent Orchestrator                        │  │
│  │  • Parallel agent execution        • Spawn condition monitoring       │  │
│  │  • Performance metrics             • Consensus coordination           │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                 AGENT LAYER                                 │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  ┌─────────────┐   │
│  │  Agent ALPHA  │  │  Agent BETA   │  │  Agent GAMMA  │  │  Governed   │   │
│  │  (Research)   │  │  (Synthesis)  │  │  (Mediator)   │  │  LLM Agent  │   │
│  │               │  │               │  │               │  │             │   │
│  │  Parallel     │◄─┼─►  Direct    ─┼──┼─►  Spawned    │  │  Single     │   │
│  │  Execution    │  │    Messages   │  │    on         │  │  Pipeline   │   │
│  └───────┬───────┘  └───────┬───────┘  │    Condition  │  └──────┬──────┘   │
│          │                  │          └───────────────┘         │          │
│          └──────────┬───────┴────────────────────────────────────┘          │
│                     │                                                       │
│              ┌──────▼──────┐                                                │
│              │ Blackboard  │  (Shared Memory)                               │
│              │  • problem  │                                                │
│              │  • solutions│                                                │
│              │  • progress │                                                │
│              │  • consensus│                                                │
│              └─────────────┘                                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                            INFRASTRUCTURE LAYER                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │   OpenRouter    │  │   Bun Runtime   │  │      WireGuard VPN          │  │
│  │   (LLM API)     │  │  (TypeScript)   │  │       (Network)             │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Core Components

### 1. HashiCorp Vault (Policy Engine)

**Purpose:** Centralized secrets management and trust-tier enforcement.

```
Location: https://127.0.0.1:8200
Storage:  /opt/vault/data
Policies: /opt/vault/policies/t{0-4}-*.hcl
```

**Key Features:**

- AppRole authentication for agents
- Dynamic secret generation
- Token TTLs based on trust tier
- Immediate revocation capabilities

### 2. DragonflyDB (Runtime State)

**Purpose:** Real-time agent state, coordination, and governance signals.

```
Location:    redis://127.0.0.1:6379
Credentials: vault:secret/data/services/dragonfly
```

**Keyspace Design:**

```
agent:{id}:packet        → Instruction packet (JSON)
agent:{id}:state         → Runtime state (JSON)
agent:{id}:errors        → Error counters (Hash)
agent:{id}:heartbeat     → Last seen (String + TTL)
agent:{id}:lock          → Execution lock (String + TTL)
task:{id}:active_agent   → Current agent
task:{id}:artifacts      → Artifact references (List)
blackboard:{task}:*      → Shared memory sections
msg:{task}:*             → Direct message channels
revocations:ledger       → Revocation history (List)
handoff:{task}:latest    → Handoff objects (JSON)
```

### 3. SQLite Ledger (Audit Trail)

**Purpose:** Immutable record of all agent actions for compliance and replay.
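
To make "immutable" and "replay" concrete, here is a minimal in-memory sketch of an append-only log. It is an illustration under stated assumptions, not the project's implementation: `AppendOnlyLedger`, `LedgerEntry`, and their fields are hypothetical names, and the real ledger is the SQLite database rather than an array.

```typescript
// Hypothetical in-memory model of an append-only audit log.
// None of these names come from the codebase; the real ledger lives in SQLite.
interface LedgerEntry {
  readonly seq: number;
  readonly timestamp: string;
  readonly agentId: string;
  readonly action: string;
  readonly success: boolean;
}

class AppendOnlyLedger {
  private readonly entries: LedgerEntry[] = [];

  // Appending is the only write operation; there is no update or delete.
  append(agentId: string, action: string, success: boolean): LedgerEntry {
    const entry: LedgerEntry = Object.freeze({
      seq: this.entries.length + 1,
      timestamp: new Date().toISOString(),
      agentId,
      action,
      success,
    });
    this.entries.push(entry);
    return entry;
  }

  // Replay returns one agent's entries in the order they were recorded.
  replay(agentId: string): LedgerEntry[] {
    return this.entries.filter((e) => e.agentId === agentId);
  }
}

const ledger = new AppendOnlyLedger();
ledger.append("agent-001", "PLAN", true);
ledger.append("agent-002", "PLAN", true);
ledger.append("agent-001", "EXECUTE", false);

// Actions recorded for agent-001, oldest first.
const history = ledger.replay("agent-001").map((e) => e.action);
```

Freezing each entry and exposing no mutating method other than `append` is what keeps the log tamper-resistant in this sketch. A database-backed ledger would need an equivalent guarantee at the storage level.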
```
Location: /opt/agent-governance/ledger/governance.db
```

**Schema:**

```sql
CREATE TABLE agent_actions (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    agent_id TEXT,
    agent_version TEXT,
    tier INTEGER,
    action TEXT,
    decision TEXT,
    confidence REAL,
    success INTEGER,
    error_type TEXT,
    error_message TEXT
);

CREATE TABLE agent_metrics (
    agent_id TEXT PRIMARY KEY,
    current_tier INTEGER,
    total_runs INTEGER,
    compliant_runs INTEGER,
    consecutive_compliant INTEGER,
    last_active_at TEXT
);
```

---

## Agent Taxonomy

### Trust Tiers

| Tier | Name | Capabilities | Token TTL |
|------|------|-------------|-----------|
| 0 | Observer | Read docs, inventory, logs; Generate plans | 1h |
| 1 | Operator | Sandbox SSH, basic Proxmox, Ansible check-mode | 30m |
| 2 | Builder | Sandbox admin, create frameworks/modules | 30m |
| 3 | Executor | Staging access, limited prod read, root-controlled | 15m |
| 4 | Architect | Policy read, governance write, requires approval | 15m |

### Agent Lifecycle Phases

```
BOOTSTRAP → PREFLIGHT → PLAN → EXECUTE → VERIFY → PACKAGE → REPORT → EXIT
    │           │         │       │         │         │        │       │
    │           │         │       │         │         │        │       └─ Release lock
    │           │         │       │         │         │        └─ Generate report
    │           │         │       │         │         └─ Collect artifacts
    │           │         │       │         └─ Verify results
    │           │         │       └─ Execute plan (if approved)
    │           │         └─ Generate plan artifact
    │           └─ Scope/dependency checks
    └─ Read revocations, load packet, acquire lock
```

### Error Budget System

```json
{
  "max_total_errors": 8,
  "max_same_error_repeats": 2,
  "max_procedure_violations": 1
}
```

**Automatic Revocation Triggers:**

- `procedure_violations >= max_procedure_violations`
- `same_error >= max_same_error_repeats`
- `total_errors >= max_total_errors`
- Missing required artifact after EXECUTE
- Forbidden action detected

---

## Runtime Governance

### Instruction Packets

Every agent receives a structured instruction packet defining its mission:

```typescript
interface InstructionPacket {
  agent_id: string;
  task_id: string;
  created_for: string;
  objective: string;
  deliverables: string[];
  constraints: {
    scope: string[];
    forbidden: string[];
    required_steps: string[];
  };
  success_criteria: string[];
  error_budget: ErrorBudget;
  escalation_rules: string[];
  created_at: string;
}
```

### Handoff Objects

When an agent is revoked, it must create a handoff for the next agent:

```typescript
interface HandoffObject {
  task_id: string;
  previous_agent_id: string;
  revoked: boolean;
  revocation_reason: { type: string; details: string };
  last_known_state: { phase: string; step: string };
  what_was_tried: string[];
  blocking_issue: string;
  required_next_actions: string[];
  constraints_reminder: string[];
  artifacts: string[];
}
```

---

## Multi-Agent Coordination

### Communication Patterns

**1. Direct Messaging (Point-to-Point)**

```
Agent ALPHA ──PROPOSAL──► Agent BETA
Agent BETA  ──FEEDBACK──► Agent ALPHA
Agent GAMMA ──HANDOFF───► ALL
```

**2. Blackboard (Shared Memory)**

```
┌───────────────────────────────────────────────────────┐
│                      BLACKBOARD                       │
├─────────────┬─────────────┬─────────────┬─────────────┤
│ problem     │ solutions   │ progress    │ consensus   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ objective   │ proposal_1  │ eval_1      │ votes       │
│ analysis    │ proposal_2  │ eval_2      │ final       │
│ constraints │ synthesis   │ gamma_res   │             │
└─────────────┴─────────────┴─────────────┴─────────────┘
```

### Conditional Agent Spawning

Agent GAMMA is spawned when any of the following thresholds is reached:

| Condition | Threshold | Description |
|-----------|-----------|-------------|
| STUCK | 30s | Agents inactive for 30+ seconds |
| CONFLICT | 3 | 3+ unresolved proposal conflicts |
| COMPLEXITY | 0.8 | Task complexity score > 0.8 |
| SUCCESS | 1.0 | Task complete, validation needed |

### Consensus Mechanism

```typescript
interface ConsensusVote {
  agent: AgentRole;
  proposal_id: string;
  vote: "ACCEPT" | "REJECT" | "ABSTAIN";
  reasoning: string;
  timestamp: string;
}

// Consensus requires:
// 1. All required agents have voted
// 2. Accept votes > Reject votes
// 3. No rejects from required agents
```

---

## Current Capabilities

### Implemented Features

| Feature | Status | Location |
|---------|--------|----------|
| Vault policy engine | ✅ Complete | `/opt/vault/policies/` |
| Trust tier system (T0-T4) | ✅ Complete | Vault policies |
| DragonflyDB runtime | ✅ Complete | `runtime/governance.py` |
| SQLite audit ledger | ✅ Complete | `ledger/governance.db` |
| Single-agent pipeline | ✅ Complete | `llm-planner-ts/governed-agent.ts` |
| Multi-agent parallel execution | ✅ Complete | `multi-agent/orchestrator.ts` |
| Blackboard shared memory | ✅ Complete | `multi-agent/coordination.ts` |
| Direct messaging | ✅ Complete | `multi-agent/coordination.ts` |
| Conditional spawning | ✅ Complete | `multi-agent/orchestrator.ts` |
| Performance metrics | ✅ Complete | `multi-agent/coordination.ts` |
| Error budget tracking | ✅ Complete | `GovernanceManager` class |
| Revocation handling | ✅ Complete | `GovernanceManager` class |

### Performance Benchmarks

| Metric | Single Agent | Multi-Agent (3) |
|--------|-------------|-----------------|
| Avg. task duration | 60-120s | 45-90s |
| Messages per task | N/A | 20-30 |
| Blackboard ops | 5-10 | 40-60 |
| LLM calls | 2-4 | 6-12 |

---

## Engineering Focus Areas

### 1. Pipeline Programming

**Goal:** Create composable, reusable agent pipelines.

**Current Work:**

```typescript
// Pipeline stages as composable functions
type PipelineStage<T> = (input: T, context: GovernanceContext) => Promise<T>;

// Example pipeline composition
// (`package` is a reserved word in strict-mode code, so the stage is named `packageArtifacts`)
const agentPipeline = compose(
  bootstrap,
  preflight,
  plan,
  execute,
  verify,
  packageArtifacts,
  report
);
```

**Planned Features:**

- Stage-level error handling and retry
- Pipeline branching based on conditions
- Pipeline templates for common patterns
- Hot-swappable stages for testing

### 2. Bun Integration

**Goal:** Leverage Bun's performance for agent execution.
**Current Implementation:**

```typescript
// File: agents/llm-planner-ts/governed-agent.ts
import { $ } from "bun";
import { Database } from "bun:sqlite";

// Shell commands via Bun
const result = await $`curl -sk ...`.json();

// SQLite via Bun native
const db = new Database("/opt/agent-governance/ledger/governance.db");
```

**Advantages:**

- Roughly 4x faster startup than Node.js, per Bun's published benchmarks
- Native TypeScript execution
- Built-in SQLite support
- Shell command integration
- Excellent npm compatibility

**Planned Enhancements:**

- Bun's built-in test runner integration
- Bun's native WebSocket for real-time coordination
- Bun's worker threads for parallel LLM calls

### 3. Testing Framework

**Goal:** Enable long-term iteration with confidence.

**Architecture:**

```
┌─────────────────────────────────────────────────────────────────┐
│                        TESTING FRAMEWORK                        │
├─────────────────────────────────────────────────────────────────┤
│ Unit Tests           │ Integration Tests      │ E2E Tests       │
│ ──────────           │ ─────────────────      │ ─────────       │
│ • Agent methods      │ • Vault + Agent        │ • Full task     │
│ • Blackboard ops     │ • Redis + Agent        │ • Multi-agent   │
│ • Message parsing    │ • LLM + Governance     │ • Failure       │
│ • Error handling     │ • Pipeline stages      │   recovery      │
├─────────────────────────────────────────────────────────────────┤
│ Mock Infrastructure  │ Test Scenarios         │ Metrics         │
│ ───────────────────  │ ──────────────         │ ───────         │
│ • MockVault          │ • Happy path           │ • Duration      │
│ • MockDragonfly      │ • Error budget         │ • Coverage      │
│ • MockLLM            │ • Revocation           │ • Flakiness     │
│ • MockBlackboard     │ • Consensus fail       │ • Regression    │
└─────────────────────────────────────────────────────────────────┘
```

**Test Categories:**

```typescript
// 1. Unit Tests - Isolated component testing
describe("GovernanceManager", () => {
  it("should track error budget correctly", async () => {
    const gov = new MockGovernanceManager();
    await gov.recordError("agent-1", "LLM_ERROR", "timeout");
    const counts = await gov.getErrorCounts("agent-1");
    expect(counts.total_errors).toBe(1);
  });
});

// 2. Integration Tests - Component interaction
describe("Agent + Vault Integration", () => {
  it("should bootstrap with valid token", async () => {
    const agent = new GovernedAgent("test-agent");
    const [ok, msg] = await agent.bootstrap();
    expect(ok).toBe(true);
  });
});

// 3. Scenario Tests - Full workflow validation
describe("Multi-Agent Scenarios", () => {
  it("should spawn GAMMA on complexity threshold", async () => {
    const orchestrator = new TestOrchestrator();
    const metrics = await orchestrator.runTask(highComplexityTask);
    expect(metrics.gamma_spawned).toBe(true);
    expect(metrics.gamma_spawn_reason).toBe("COMPLEXITY");
  });
});
```

**Mock Infrastructure:**

```typescript
// MockLLM for deterministic testing
class MockLLM {
  private responses: Map<string, string> = new Map();

  setResponse(pattern: string, response: string) {
    this.responses.set(pattern, response);
  }

  // Returns the response for the first registered pattern found in the prompt
  async complete(prompt: string): Promise<string> {
    for (const [pattern, response] of this.responses) {
      if (prompt.includes(pattern)) return response;
    }
    // Default response when no pattern matches
    return '{"confidence": 0.5, "steps": []}';
  }
}

// MockDragonfly for state testing
class MockDragonfly {
  private store: Map<string, any> = new Map();

  async set(key: string, value: any) { this.store.set(key, value); }
  async get(key: string) { return this.store.get(key); }
  async hSet(key: string, field: string, value: any) { /* ... */ }
}
```

---

## Sample Implementations

### Single Governed Agent (TypeScript/Bun)

```typescript
// File: governed-agent.ts
import { GovernanceManager } from "./coordination";

class GovernedAgent {
  // Assigned in bootstrap(), hence the definite-assignment assertion
  private gov!: GovernanceManager;

  constructor(private agentId: string) {}

  async bootstrap(): Promise<[boolean, string]> {
    this.gov = new GovernanceManager();
    await this.gov.connect();

    // Read revocation ledger
    const revocations = await this.gov.getRecentRevocations(50);
    for (const rev of revocations) {
      if (rev.agent_id === this.agentId) {
        return [false, "AGENT_PREVIOUSLY_REVOKED"];
      }
    }

    // Load instruction packet
    const packet = await this.gov.getPacket(this.agentId);
    if (!packet) return [false, "NO_INSTRUCTION_PACKET"];

    // Acquire lock
    if (!await this.gov.acquireLock(this.agentId)) {
      return [false, "CANNOT_ACQUIRE_LOCK"];
    }

    return [true, "BOOTSTRAP_COMPLETE"];
  }

  // Called on every phase/step change; returns false if the agent was revoked
  async transition(phase: string, step: string): Promise<boolean> {
    await this.gov.heartbeat(this.agentId);
    await this.gov.refreshLock(this.agentId);

    const [ok, reason] = await this.gov.checkErrorBudget(this.agentId);
    if (!ok) {
      await this.gov.revokeAgent(this.agentId, "ERROR_BUDGET_EXCEEDED", reason);
      return false;
    }

    await this.gov.setState({ phase, step, /* ... */ });
    return true;
  }
}
```

### Multi-Agent Orchestrator

```typescript
// File: orchestrator.ts
class MultiAgentOrchestrator {
  private alphaAgent: AgentAlpha;
  private betaAgent: AgentBeta;
  private gammaAgent?: AgentGamma;
  private monitorInterval?: ReturnType<typeof setInterval>;
  // Other members (metrics, stateManager, spawnController) are omitted from this excerpt

  // Return type is inferred from this.metrics.finalize()
  async runTask(task: TaskDefinition) {
    // Launch ALPHA and BETA in parallel
    const alphaPromise = this.alphaAgent.run(task);
    const betaPromise = this.betaAgent.run(task);

    // Monitor spawn conditions every 2 seconds
    this.monitorInterval = setInterval(() => {
      void this.checkSpawnConditions();
    }, 2000);

    try {
      await Promise.all([alphaPromise, betaPromise]);
    } finally {
      // Stop monitoring once the parallel run finishes or fails
      clearInterval(this.monitorInterval);
    }

    // Spawn GAMMA if needed
    if (this.shouldSpawnGamma()) {
      // spawnReason is the condition that triggered the spawn (its derivation is omitted here)
      await this.spawnGamma(spawnReason);
      await this.gammaAgent?.run(task);
    }

    return this.metrics.finalize();
  }

  private async checkSpawnConditions() {
    // Check stuck, conflict, complexity thresholds
    const stuckAgents = await this.stateManager.detectStuckAgents(30);
    if (stuckAgents.length > 0) {
      await this.spawnController.updateCondition("STUCK", stuckAgents.length);
    }
  }
}
```

---

## Future Potential

### Short-Term (Q1 2026)

1. **Pipeline DSL**

   ```yaml
   pipeline:
     name: infrastructure-deploy
     stages:
       - plan:
           agent: planner
           artifacts: [terraform-plan]
       - review:
           type: human-gate
           timeout: 30m
       - execute:
           agent: executor
           requires: [plan]
   ```

2. **Agent Templates**
   - Pre-configured agents for common tasks
   - Terraform specialist
   - Ansible specialist
   - Code review specialist

3. **Enhanced Testing**
   - Chaos testing for agent resilience
   - Load testing for multi-agent scaling
   - Regression test suite

### Medium-Term (Q2-Q3 2026)

1. **Hierarchical Agent Teams**

   ```
   Team Lead Agent
   ├── Research Team (3 agents)
   ├── Implementation Team (2 agents)
   └── Review Team (2 agents)
   ```

2. **Learning from History**
   - Analyze past task completions
   - Suggest optimizations
   - Predict failure patterns

3. **External Integrations**
   - GitHub PR automation
   - Slack notifications
   - PagerDuty escalations

### Long-Term (2027+)

1. **Self-Optimizing Pipelines**
   - Agents propose pipeline improvements
   - A/B testing of agent strategies
   - Automatic tier promotion

2. **Cross-System Orchestration**
   - Multiple infrastructure targets
   - Hybrid cloud coordination
   - Edge deployment agents

---

## File Structure

```
/opt/agent-governance/
├── docs/
│   └── ARCHITECTURE.md          # This document
├── ledger/
│   └── governance.db            # SQLite audit trail
├── runtime/
│   ├── governance.py            # Python governance manager
│   └── monitors.py              # Monitor agents
└── agents/
    ├── llm-planner/
    │   ├── agent.py             # Python LLM agent
    │   └── governed_agent.py    # Python governed agent
    ├── llm-planner-ts/
    │   ├── index.ts             # Basic TypeScript agent
    │   └── governed-agent.ts    # Full governed agent (Bun)
    └── multi-agent/
        ├── types.ts             # Type definitions
        ├── coordination.ts      # Blackboard, messaging, metrics
        ├── agents.ts            # Alpha, Beta, Gamma agents
        └── orchestrator.ts      # Multi-agent orchestrator

/opt/vault/
├── policies/
│   ├── t0-observer.hcl
│   ├── t1-operator.hcl
│   ├── t2-builder.hcl
│   ├── t3-executor.hcl
│   └── t4-architect.hcl
└── init-keys.json               # Vault credentials (chmod 600)
```

---

## Running the System

### Prerequisites

```bash
# Vault must be running and unsealed (this check only confirms the container is up)
docker ps | grep vault

# DragonflyDB must be running
docker ps | grep dragonfly

# Bun must be installed
~/.bun/bin/bun --version
```

### Single Agent Test

```bash
cd /opt/agent-governance/agents/llm-planner-ts
~/.bun/bin/bun run governed-agent.ts \
  "agent-001" \
  "task-001" \
  "Design a microservices architecture"
```

### Multi-Agent Test

```bash
cd /opt/agent-governance/agents/multi-agent
~/.bun/bin/bun run orchestrator.ts \
  "Design a distributed event-driven analytics platform" \
  --timeout 120
```

---

## Contributing

When adding new features:

1. **Follow the agent lifecycle** - All agents must implement the standard phases
2. **Log to the ledger** - Every action must be auditable
3. **Respect error budgets** - Check and track errors properly
4. **Write tests** - Unit, integration, and scenario tests required
5. **Document changes** - Update this architecture document

---

## References

- [Agent Foundation Document](/root/agents_foundation.md)
- [Runtime Governance Spec](/root/agent_runtime_governance.md)
- [Implementation Plan](/root/agent-taxonomy-implementation-plan.md)
- [Vault Documentation](https://developer.hashicorp.com/vault/docs)
- [Bun Documentation](https://bun.sh/docs)