AI Agent Governance System Architecture
Version: 0.2.0
Status: Active Development
Last Updated: 2026-01-23
Table of Contents
- Executive Summary
- System Architecture
- Core Components
- Agent Taxonomy
- Runtime Governance
- Multi-Agent Coordination
- Current Capabilities
- Engineering Focus Areas
- Sample Implementations
- Future Potential
Executive Summary
This system implements a governed AI agent framework designed for safe, auditable, and scalable automation. The architecture enforces:
- Trust-tiered access control via HashiCorp Vault
- Real-time governance via DragonflyDB
- Structured agent lifecycles with mandatory phases
- Multi-agent coordination with parallel execution and conditional spawning
- Complete audit trails via SQLite ledger
The system prioritizes legibility over magic — every agent action must be explainable, reproducible, and auditable.
System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ GOVERNANCE LAYER │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ HashiCorp │ │ DragonflyDB │ │ SQLite Ledger │ │
│ │ Vault │ │ (Runtime) │ │ (Audit) │ │
│ │ │ │ │ │ │ │
│ │ • Policies │ │ • State │ │ • agent_actions │ │
│ │ • Secrets │ │ • Locks │ │ • agent_metrics │ │
│ │ • AppRole Auth │ │ • Heartbeats │ │ • violations │ │
│ │ • Token Leases │ │ • Errors │ │ • promotions │ │
│ └─────────────────┘ │ • Blackboard │ └─────────────────────────────┘ │
│ │ • Messages │ │
│ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────┤
│ ORCHESTRATION LAYER │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Multi-Agent Orchestrator │ │
│ │ • Parallel agent execution • Spawn condition monitoring │ │
│ │ • Performance metrics • Consensus coordination │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────┤
│ AGENT LAYER │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌─────────────┐ │
│ │ Agent ALPHA │ │ Agent BETA │ │ Agent GAMMA │ │ Governed │ │
│ │ (Research) │ │ (Synthesis) │ │ (Mediator) │ │ LLM Agent │ │
│ │ │ │ │ │ │ │ │ │
│ │ Parallel │◄─┼─► Direct ─┼──┼─► Spawned │ │ Single │ │
│ │ Execution │ │ Messages │ │ on │ │ Pipeline │ │
│ └───────┬───────┘ └───────┬───────┘ │ Condition │ └──────┬──────┘ │
│ │ │ └───────────────┘ │ │
│ └──────────┬───────┴────────────────────────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Blackboard │ (Shared Memory) │
│ │ • problem │ │
│ │ • solutions│ │
│ │ • progress │ │
│ │ • consensus│ │
│ └─────────────┘ │
├─────────────────────────────────────────────────────────────────────────────┤
│ INFRASTRUCTURE LAYER │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ OpenRouter │ │ Bun Runtime │ │ WireGuard VPN │ │
│ │ (LLM API) │ │ (TypeScript) │ │ (Network) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Core Components
1. HashiCorp Vault (Policy Engine)
Purpose: Centralized secrets management and trust-tier enforcement.
Location: https://127.0.0.1:8200
Storage: /opt/vault/data
Policies: /opt/vault/policies/t{0-4}-*.hcl
Key Features:
- AppRole authentication for agents
- Dynamic secret generation
- Token TTLs based on trust tier
- Immediate revocation capabilities
2. DragonflyDB (Runtime State)
Purpose: Real-time agent state, coordination, and governance signals.
Location: redis://127.0.0.1:6379
Credentials: vault:secret/data/services/dragonfly
Keyspace Design:
agent:{id}:packet → Instruction packet (JSON)
agent:{id}:state → Runtime state (JSON)
agent:{id}:errors → Error counters (Hash)
agent:{id}:heartbeat → Last seen (String + TTL)
agent:{id}:lock → Execution lock (String + TTL)
task:{id}:active_agent → Current agent
task:{id}:artifacts → Artifact references (List)
blackboard:{task}:* → Shared memory sections
msg:{task}:* → Direct message channels
revocations:ledger → Revocation history (List)
handoff:{task}:latest → Handoff objects (JSON)
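The key patterns above can be centralized in small builder functions so that call sites never assemble key strings by hand. The following is a minimal sketch; the helper names (`agentKey`, `taskKey`, `blackboardKey`) are illustrative assumptions and are not taken from the codebase.

```typescript
// Hypothetical helpers: the names are illustrative, the key patterns follow the keyspace design.
type AgentField = "packet" | "state" | "errors" | "heartbeat" | "lock";
type TaskField = "active_agent" | "artifacts";

// Builds agent:{id}:{field}
function agentKey(agentId: string, field: AgentField): string {
  return `agent:${agentId}:${field}`;
}

// Builds task:{id}:{field}
function taskKey(taskId: string, field: TaskField): string {
  return `task:${taskId}:${field}`;
}

// Builds blackboard:{task}:{section}
function blackboardKey(taskId: string, section: string): string {
  return `blackboard:${taskId}:${section}`;
}

console.log(agentKey("agent-001", "lock")); // agent:agent-001:lock
```

Restricting the field argument to a union type lets the compiler reject a misspelled key segment before it reaches DragonflyDB.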
3. SQLite Ledger (Audit Trail)
Purpose: Immutable record of all agent actions for compliance and replay.
Location: /opt/agent-governance/ledger/governance.db
Schema:
CREATE TABLE agent_actions (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    agent_id TEXT,
    agent_version TEXT,
    tier INTEGER,
    action TEXT,
    decision TEXT,
    confidence REAL,
    success INTEGER,
    error_type TEXT,
    error_message TEXT
);

CREATE TABLE agent_metrics (
    agent_id TEXT PRIMARY KEY,
    current_tier INTEGER,
    total_runs INTEGER,
    compliant_runs INTEGER,
    consecutive_compliant INTEGER,
    last_active_at TEXT
);
Agent Taxonomy
Trust Tiers
| Tier | Name | Capabilities | Token TTL |
|---|---|---|---|
| 0 | Observer | Read docs, inventory, logs; Generate plans | 1h |
| 1 | Operator | Sandbox SSH, basic Proxmox, Ansible check-mode | 30m |
| 2 | Builder | Sandbox admin, create frameworks/modules | 30m |
| 3 | Executor | Staging access, limited prod read, root-controlled | 15m |
| 4 | Architect | Policy read, governance write, requires approval | 15m |
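The tier table maps naturally onto a lookup structure. The sketch below copies the tier names and TTLs from the table; the `TRUST_TIERS` constant and the `tokenTtlSeconds` function are assumptions made for illustration, and in the real system the TTLs are presumably enforced by the Vault policies rather than by application code.

```typescript
// Tier names and TTLs come from the table above; the lookup itself is a hypothetical helper.
interface TierInfo {
  name: string;
  ttlSeconds: number;
}

const TRUST_TIERS: Record<number, TierInfo> = {
  0: { name: "Observer", ttlSeconds: 3600 }, // 1h
  1: { name: "Operator", ttlSeconds: 1800 }, // 30m
  2: { name: "Builder", ttlSeconds: 1800 }, // 30m
  3: { name: "Executor", ttlSeconds: 900 }, // 15m
  4: { name: "Architect", ttlSeconds: 900 }, // 15m
};

// Returns the token TTL for a tier, rejecting tiers outside T0-T4.
function tokenTtlSeconds(tier: number): number {
  const info = TRUST_TIERS[tier];
  if (!info) {
    throw new RangeError(`Unknown trust tier: ${tier}`);
  }
  return info.ttlSeconds;
}

console.log(tokenTtlSeconds(3)); // 900
```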
Agent Lifecycle Phases
BOOTSTRAP → PREFLIGHT → PLAN → EXECUTE → VERIFY → PACKAGE → REPORT → EXIT
    │           │        │        │        │         │        │       │
    │           │        │        │        │         │        │       └─ Release lock
    │           │        │        │        │         │        └─ Generate report
    │           │        │        │        │         └─ Collect artifacts
    │           │        │        │        └─ Verify results
    │           │        │        └─ Execute plan (if approved)
    │           │        └─ Generate plan artifact
    │           └─ Scope/dependency checks
    └─ Read revocations, load packet, acquire lock
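The phase order above can be enforced with a small state machine. This is a sketch under one stated assumption: besides advancing one phase at a time, an agent may jump straight to EXIT from any earlier phase (for example after revocation). The source does not specify that rule, and the function names are hypothetical.

```typescript
// Phase names follow the lifecycle diagram; the transition rules are an assumption.
const PHASES = [
  "BOOTSTRAP", "PREFLIGHT", "PLAN", "EXECUTE",
  "VERIFY", "PACKAGE", "REPORT", "EXIT",
] as const;

type Phase = (typeof PHASES)[number];

// Returns the phase that follows `current`, or null when `current` is EXIT.
function nextPhase(current: Phase): Phase | null {
  const index = PHASES.indexOf(current);
  return PHASES[index + 1] ?? null;
}

// Allows a single forward step, or an early jump to EXIT (assumed rule).
function canTransition(from: Phase, to: Phase): boolean {
  if (from === "EXIT") return false;
  return nextPhase(from) === to || to === "EXIT";
}

console.log(canTransition("PLAN", "EXECUTE")); // true
console.log(canTransition("PLAN", "VERIFY")); // false
```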
Error Budget System
{
  "max_total_errors": 8,
  "max_same_error_repeats": 2,
  "max_procedure_violations": 1
}
Automatic Revocation Triggers:
- procedure_violations >= 1
- same_error >= max_same_error_repeats
- total_errors >= max_total_errors
- Missing required artifact after EXECUTE
- Forbidden action detected
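The three counter-based triggers can be expressed as one pure check. The sketch below takes the budget fields from the JSON above; the `ErrorCounts` shape and the reason strings are assumptions chosen for illustration, and the real `GovernanceManager` may differ.

```typescript
// Budget fields match the JSON above; ErrorCounts and the reason strings are hypothetical.
interface ErrorBudget {
  max_total_errors: number;
  max_same_error_repeats: number;
  max_procedure_violations: number;
}

interface ErrorCounts {
  total_errors: number;
  same_error_repeats: number;
  procedure_violations: number;
}

// Returns [true, "OK"] while the agent is within budget, otherwise [false, reason].
function checkErrorBudget(counts: ErrorCounts, budget: ErrorBudget): [boolean, string] {
  if (counts.procedure_violations >= budget.max_procedure_violations) {
    return [false, "PROCEDURE_VIOLATION"];
  }
  if (counts.same_error_repeats >= budget.max_same_error_repeats) {
    return [false, "SAME_ERROR_REPEATED"];
  }
  if (counts.total_errors >= budget.max_total_errors) {
    return [false, "TOTAL_ERRORS_EXCEEDED"];
  }
  return [true, "OK"];
}

const defaultBudget: ErrorBudget = {
  max_total_errors: 8,
  max_same_error_repeats: 2,
  max_procedure_violations: 1,
};

console.log(checkErrorBudget(
  { total_errors: 3, same_error_repeats: 2, procedure_violations: 0 },
  defaultBudget,
)); // [false, "SAME_ERROR_REPEATED"]
```

Checking procedure violations first means the most serious reason is reported when several limits are hit at once; that ordering is a design choice of this sketch.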
Runtime Governance
Instruction Packets
Every agent receives a structured instruction packet defining its mission:
interface InstructionPacket {
  agent_id: string;
  task_id: string;
  created_for: string;
  objective: string;
  deliverables: string[];
  constraints: {
    scope: string[];
    forbidden: string[];
    required_steps: string[];
  };
  success_criteria: string[];
  error_budget: ErrorBudget;
  escalation_rules: string[];
  created_at: string;
}
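To show how the `constraints` block might be consulted at runtime, here is a minimal sketch. It assumes that `forbidden` and `required_steps` hold exact identifiers compared by string equality; the source does not define the matching rules, and both function names are hypothetical.

```typescript
// Field names follow InstructionPacket.constraints; the matching rules are assumptions.
interface PacketConstraints {
  scope: string[];
  forbidden: string[];
  required_steps: string[];
}

// Assumes `forbidden` lists exact action identifiers (no patterns or globbing).
function isForbidden(action: string, constraints: PacketConstraints): boolean {
  return constraints.forbidden.includes(action);
}

// Returns the required steps that have not been completed yet, in packet order.
function missingRequiredSteps(completed: string[], constraints: PacketConstraints): string[] {
  const done = new Set(completed);
  return constraints.required_steps.filter((step) => !done.has(step));
}

const demoConstraints: PacketConstraints = {
  scope: ["sandbox"],
  forbidden: ["delete_volume"],
  required_steps: ["plan", "verify"],
};

console.log(isForbidden("delete_volume", demoConstraints)); // true
console.log(missingRequiredSteps(["plan"], demoConstraints).join(",")); // verify
```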
Handoff Objects
When an agent is revoked, it must create a handoff for the next agent:
interface HandoffObject {
  task_id: string;
  previous_agent_id: string;
  revoked: boolean;
  revocation_reason: { type: string; details: string };
  last_known_state: { phase: string; step: string };
  what_was_tried: string[];
  blocking_issue: string;
  required_next_actions: string[];
  constraints_reminder: string[];
  artifacts: string[];
}
Multi-Agent Coordination
Communication Patterns
1. Direct Messaging (Point-to-Point)
Agent ALPHA ──PROPOSAL──► Agent BETA
Agent BETA ──FEEDBACK──► Agent ALPHA
Agent GAMMA ──HANDOFF───► ALL
2. Blackboard (Shared Memory)
┌───────────────────────────────────────────────┐
│                  BLACKBOARD                   │
├───────────┬───────────┬───────────┬───────────┤
│ problem   │ solutions │ progress  │ consensus │
├───────────┼───────────┼───────────┼───────────┤
│objective  │proposal_1 │eval_1     │votes      │
│analysis   │proposal_2 │eval_2     │final      │
│constraints│synthesis  │gamma_res  │           │
└───────────┴───────────┴───────────┴───────────┘
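The section layout above can be sketched as an in-memory structure, similar in spirit to the `MockBlackboard` listed under the testing framework. The class name and methods below are assumptions for illustration; the actual implementation in `multi-agent/coordination.ts` stores its sections in DragonflyDB and may expose a different interface.

```typescript
// Section names follow the diagram; the class and its methods are hypothetical.
type BoardSection = "problem" | "solutions" | "progress" | "consensus";

class InMemoryBlackboard {
  private readonly taskId: string;
  private readonly store = new Map<string, Map<string, unknown>>();

  constructor(taskId: string) {
    this.taskId = taskId;
  }

  // Mirrors the blackboard:{task}:* key pattern from the keyspace design.
  sectionKey(section: BoardSection): string {
    return `blackboard:${this.taskId}:${section}`;
  }

  // Stores a value under an entry name within one section.
  write(section: BoardSection, entry: string, value: unknown): void {
    const key = this.sectionKey(section);
    const bucket = this.store.get(key) ?? new Map<string, unknown>();
    bucket.set(entry, value);
    this.store.set(key, bucket);
  }

  // Returns the stored value, or undefined when the entry does not exist.
  read(section: BoardSection, entry: string): unknown {
    return this.store.get(this.sectionKey(section))?.get(entry);
  }

  // Lists entry names in insertion order.
  entries(section: BoardSection): string[] {
    const bucket = this.store.get(this.sectionKey(section));
    return bucket ? Array.from(bucket.keys()) : [];
  }
}

const board = new InMemoryBlackboard("task-001");
board.write("solutions", "proposal_1", { author: "ALPHA" });
board.write("solutions", "proposal_2", { author: "BETA" });
console.log(board.entries("solutions").join(",")); // proposal_1,proposal_2
```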
Conditional Agent Spawning
Agent GAMMA is spawned when thresholds are exceeded:
| Condition | Threshold | Description |
|---|---|---|
| STUCK | 30s | Agents inactive for 30+ seconds |
| CONFLICT | 3 | 3+ unresolved proposal conflicts |
| COMPLEXITY | 0.8 | Task complexity score > 0.8 |
| SUCCESS | 1.0 | Task complete, validation needed |
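The thresholds in the table can be evaluated with a single function. In this sketch the `SpawnSignals` shape is an assumption, and the SUCCESS threshold of 1.0 is interpreted as a task progress fraction reaching 1.0; the source does not say how these signals are measured.

```typescript
// Condition names and thresholds come from the table; the signal shape is hypothetical.
type SpawnCondition = "STUCK" | "CONFLICT" | "COMPLEXITY" | "SUCCESS";

interface SpawnSignals {
  idleSeconds: number; // seconds since the agents were last active
  unresolvedConflicts: number; // count of unresolved proposal conflicts
  complexityScore: number; // task complexity score
  taskProgress: number; // assumed to be a fraction where 1.0 means complete
}

// Returns every condition whose threshold is currently met.
function triggeredConditions(signals: SpawnSignals): SpawnCondition[] {
  const triggered: SpawnCondition[] = [];
  if (signals.idleSeconds >= 30) triggered.push("STUCK");
  if (signals.unresolvedConflicts >= 3) triggered.push("CONFLICT");
  if (signals.complexityScore > 0.8) triggered.push("COMPLEXITY");
  if (signals.taskProgress >= 1.0) triggered.push("SUCCESS");
  return triggered;
}

console.log(triggeredConditions({
  idleSeconds: 45,
  unresolvedConflicts: 1,
  complexityScore: 0.8,
  taskProgress: 0.4,
}).join(",")); // STUCK
```

Note that a complexity score of exactly 0.8 does not trigger COMPLEXITY, because the table specifies a strict "greater than" comparison.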
Consensus Mechanism
interface ConsensusVote {
  agent: AgentRole;
  proposal_id: string;
  vote: "ACCEPT" | "REJECT" | "ABSTAIN";
  reasoning: string;
  timestamp: string;
}

// Consensus requires:
// 1. All required agents have voted
// 2. Accept votes > Reject votes
// 3. No rejects from required agents
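The three consensus rules can be checked with a short tally function. This sketch uses plain strings for agent identifiers instead of the `AgentRole` type, and assumes that when an agent votes more than once its latest vote wins; neither detail is specified in the source.

```typescript
// Vote values follow ConsensusVote; BallotEntry and hasConsensus are hypothetical names.
type VoteValue = "ACCEPT" | "REJECT" | "ABSTAIN";

interface BallotEntry {
  agent: string;
  vote: VoteValue;
}

function hasConsensus(votes: BallotEntry[], requiredAgents: string[]): boolean {
  // Keep one vote per agent; a later entry overrides an earlier one (assumption).
  const latest = new Map<string, VoteValue>();
  votes.forEach((entry) => latest.set(entry.agent, entry.vote));

  // Rule 1: all required agents have voted.
  if (!requiredAgents.every((agent) => latest.has(agent))) return false;

  // Rule 3: no rejects from required agents.
  if (requiredAgents.some((agent) => latest.get(agent) === "REJECT")) return false;

  // Rule 2: accept votes must outnumber reject votes.
  let accepts = 0;
  let rejects = 0;
  latest.forEach((vote) => {
    if (vote === "ACCEPT") accepts += 1;
    if (vote === "REJECT") rejects += 1;
  });
  return accepts > rejects;
}

console.log(hasConsensus(
  [{ agent: "ALPHA", vote: "ACCEPT" }, { agent: "BETA", vote: "ACCEPT" }],
  ["ALPHA", "BETA"],
)); // true
```

Because abstentions count as neither accepts nor rejects, a ballot where every agent abstains fails rule 2 and does not reach consensus.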
Current Capabilities
Implemented Features
| Feature | Status | Location |
|---|---|---|
| Vault policy engine | ✅ Complete | /opt/vault/policies/ |
| Trust tier system (T0-T4) | ✅ Complete | Vault policies |
| DragonflyDB runtime | ✅ Complete | runtime/governance.py |
| SQLite audit ledger | ✅ Complete | ledger/governance.db |
| Single-agent pipeline | ✅ Complete | llm-planner-ts/governed-agent.ts |
| Multi-agent parallel execution | ✅ Complete | multi-agent/orchestrator.ts |
| Blackboard shared memory | ✅ Complete | multi-agent/coordination.ts |
| Direct messaging | ✅ Complete | multi-agent/coordination.ts |
| Conditional spawning | ✅ Complete | multi-agent/orchestrator.ts |
| Performance metrics | ✅ Complete | multi-agent/coordination.ts |
| Error budget tracking | ✅ Complete | GovernanceManager class |
| Revocation handling | ✅ Complete | GovernanceManager class |
Performance Benchmarks
| Metric | Single Agent | Multi-Agent (3) |
|---|---|---|
| Avg. task duration | 60-120s | 45-90s |
| Messages per task | N/A | 20-30 |
| Blackboard ops | 5-10 | 40-60 |
| LLM calls | 2-4 | 6-12 |
Engineering Focus Areas
1. Pipeline Programming
Goal: Create composable, reusable agent pipelines.
Current Work:
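// Pipeline stages as composable functions
type PipelineStage<T, U> = (input: T, context: GovernanceContext) => Promise<U>;

// Example pipeline composition
// The PACKAGE stage is named packageArtifacts because `package` is a
// reserved word in strict-mode code and cannot be used as an identifier.
const agentPipeline = compose(
  bootstrap,
  preflight,
  plan,
  execute,
  verify,
  packageArtifacts,
  report
);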
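One way a `compose` helper of this kind could work is sketched below. It is a simplified variant in which every stage shares one payload type, so it is not necessarily how the project's own `compose` is written; the `GovernanceContext` shape and the `composeStages` name are assumptions.

```typescript
// Assumed context shape; the real GovernanceContext is not shown in this document.
interface GovernanceContext {
  agentId: string;
  taskId: string;
}

type PipelineStage<T, U> = (input: T, context: GovernanceContext) => Promise<U>;

// Runs the stages in order, feeding each stage's output into the next one.
function composeStages<T>(...stages: PipelineStage<T, T>[]): PipelineStage<T, T> {
  return async (input, context) => {
    let current = input;
    for (const stage of stages) {
      current = await stage(current, context);
    }
    return current;
  };
}

// Demo stage factory: each stage appends its label to a trail of visited stages.
const labelStage = (label: string): PipelineStage<string[], string[]> =>
  async (trail) => [...trail, label];

const demoPipeline = composeStages(
  labelStage("bootstrap"),
  labelStage("preflight"),
  labelStage("plan"),
);

demoPipeline([], { agentId: "agent-001", taskId: "task-001" })
  .then((trail) => console.log(trail.join(" > "))); // bootstrap > preflight > plan
```

Because the stages run sequentially and a rejected promise stops the loop, a failing stage prevents every later stage from running, which matches the lifecycle's requirement that phases execute in order.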
Planned Features:
- Stage-level error handling and retry
- Pipeline branching based on conditions
- Pipeline templates for common patterns
- Hot-swappable stages for testing
2. Bun Integration
Goal: Leverage Bun's performance for agent execution.
Current Implementation:
// File: agents/llm-planner-ts/governed-agent.ts
import { $ } from "bun";
import { Database } from "bun:sqlite";
// Shell commands via Bun
const result = await $`curl -sk ...`.json();
// SQLite via Bun native
const db = new Database("/opt/agent-governance/ledger/governance.db");
Advantages:
- 4x faster startup than Node.js
- Native TypeScript execution
- Built-in SQLite support
- Shell command integration
- Excellent npm compatibility
Planned Enhancements:
- Bun's built-in test runner integration
- Bun's native WebSocket for real-time coordination
- Bun's worker threads for parallel LLM calls
3. Testing Framework
Goal: Enable long-term iteration with confidence.
Architecture:
┌─────────────────────────────────────────────────────────────┐
│                      TESTING FRAMEWORK                      │
├─────────────────────────────────────────────────────────────┤
│ Unit Tests          │ Integration Tests    │ E2E Tests      │
│ ─────────────       │ ─────────────────    │ ──────────     │
│ • Agent methods     │ • Vault + Agent      │ • Full task    │
│ • Blackboard ops    │ • Redis + Agent      │ • Multi-agent  │
│ • Message parsing   │ • LLM + Governance   │ • Failure      │
│ • Error handling    │ • Pipeline stages    │   recovery     │
├─────────────────────────────────────────────────────────────┤
│ Mock Infrastructure │ Test Scenarios       │ Metrics        │
│ ─────────────────   │ ──────────────       │ ───────        │
│ • MockVault         │ • Happy path         │ • Duration     │
│ • MockDragonfly     │ • Error budget       │ • Coverage     │
│ • MockLLM           │ • Revocation         │ • Flakiness    │
│ • MockBlackboard    │ • Consensus fail     │ • Regression   │
└─────────────────────────────────────────────────────────────┘
Test Categories:
// 1. Unit Tests - Isolated component testing
describe("GovernanceManager", () => {
  it("should track error budget correctly", async () => {
    const gov = new MockGovernanceManager();
    await gov.recordError("agent-1", "LLM_ERROR", "timeout");
    const counts = await gov.getErrorCounts("agent-1");
    expect(counts.total_errors).toBe(1);
  });
});

// 2. Integration Tests - Component interaction
describe("Agent + Vault Integration", () => {
  it("should bootstrap with valid token", async () => {
    const agent = new GovernedAgent("test-agent");
    const [ok, msg] = await agent.bootstrap();
    expect(ok).toBe(true);
  });
});

// 3. Scenario Tests - Full workflow validation
describe("Multi-Agent Scenarios", () => {
  it("should spawn GAMMA on complexity threshold", async () => {
    const orchestrator = new TestOrchestrator();
    const metrics = await orchestrator.runTask(highComplexityTask);
    expect(metrics.gamma_spawned).toBe(true);
    expect(metrics.gamma_spawn_reason).toBe("COMPLEXITY");
  });
});
Mock Infrastructure:
// MockLLM for deterministic testing
class MockLLM {
  private responses: Map<string, string> = new Map();

  setResponse(pattern: string, response: string) {
    this.responses.set(pattern, response);
  }

  async complete(prompt: string): Promise<string> {
    for (const [pattern, response] of this.responses) {
      if (prompt.includes(pattern)) return response;
    }
    return '{"confidence": 0.5, "steps": []}';
  }
}

// MockDragonfly for state testing
class MockDragonfly {
  private store: Map<string, any> = new Map();

  async set(key: string, value: any) { this.store.set(key, value); }
  async get(key: string) { return this.store.get(key); }
  async hSet(key: string, field: string, value: any) { /* ... */ }
}
Sample Implementations
Single Governed Agent (TypeScript/Bun)
// File: governed-agent.ts
import { GovernanceManager } from "./coordination";

class GovernedAgent {
  // Assigned in bootstrap(); the `!` tells the compiler it is set before use
  private gov!: GovernanceManager;
  private agentId: string;

  constructor(agentId: string) {
    this.agentId = agentId;
  }

  async bootstrap(): Promise<[boolean, string]> {
    this.gov = new GovernanceManager();
    await this.gov.connect();

    // Read revocation ledger and refuse to start if this agent was revoked
    const revocations = await this.gov.getRecentRevocations(50);
    for (const rev of revocations) {
      if (rev.agent_id === this.agentId) {
        return [false, "AGENT_PREVIOUSLY_REVOKED"];
      }
    }

    // Load instruction packet
    const packet = await this.gov.getPacket(this.agentId);
    if (!packet) return [false, "NO_INSTRUCTION_PACKET"];

    // Acquire lock
    if (!await this.gov.acquireLock(this.agentId)) {
      return [false, "CANNOT_ACQUIRE_LOCK"];
    }

    return [true, "BOOTSTRAP_COMPLETE"];
  }

  async transition(phase: string, step: string): Promise<boolean> {
    // Signal liveness and keep the execution lock from expiring
    await this.gov.heartbeat(this.agentId);
    await this.gov.refreshLock(this.agentId);

    // Revoke the agent as soon as its error budget is exhausted
    const [ok, reason] = await this.gov.checkErrorBudget(this.agentId);
    if (!ok) {
      await this.gov.revokeAgent(this.agentId, "ERROR_BUDGET_EXCEEDED", reason);
      return false;
    }

    await this.gov.setState({ phase, step, /* ... */ });
    return true;
  }
}
Multi-Agent Orchestrator
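// File: orchestrator.ts
// The stateManager, spawnController, and metrics members are omitted from this excerpt.
class MultiAgentOrchestrator {
  private alphaAgent: AgentAlpha;
  private betaAgent: AgentBeta;
  private gammaAgent?: AgentGamma;
  private monitorInterval?: ReturnType<typeof setInterval>;

  async runTask(task: TaskDefinition): Promise<Metrics> {
    // Launch ALPHA and BETA in parallel
    const alphaPromise = this.alphaAgent.run(task);
    const betaPromise = this.betaAgent.run(task);

    // Monitor spawn conditions every 2 seconds while the agents run
    this.monitorInterval = setInterval(() => {
      this.checkSpawnConditions().catch((err) => {
        console.error("Spawn condition check failed:", err);
      });
    }, 2000);

    try {
      await Promise.all([alphaPromise, betaPromise]);
    } finally {
      // Stop monitoring even if an agent throws, so the timer does not leak
      clearInterval(this.monitorInterval);
    }

    // Spawn GAMMA if needed
    if (this.shouldSpawnGamma()) {
      // spawnReason is not defined in this excerpt; it stands for whichever
      // condition (STUCK, CONFLICT, COMPLEXITY, SUCCESS) triggered the spawn
      await this.spawnGamma(spawnReason);
      await this.gammaAgent?.run(task);
    }

    return this.metrics.finalize();
  }

  private async checkSpawnConditions() {
    // Check stuck, conflict, complexity thresholds
    const stuckAgents = await this.stateManager.detectStuckAgents(30);
    if (stuckAgents.length > 0) {
      await this.spawnController.updateCondition("STUCK", stuckAgents.length);
    }
  }
}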
Future Potential
Short-Term (Q1 2026)
- Pipeline DSL
  pipeline:
    name: infrastructure-deploy
    stages:
      - plan:
          agent: planner
          artifacts: [terraform-plan]
      - review:
          type: human-gate
          timeout: 30m
      - execute:
          agent: executor
          requires: [plan]
- Agent Templates
- Pre-configured agents for common tasks
- Terraform specialist
- Ansible specialist
- Code review specialist
- Enhanced Testing
- Chaos testing for agent resilience
- Load testing for multi-agent scaling
- Regression test suite
Medium-Term (Q2-Q3 2026)
- Hierarchical Agent Teams
  Team Lead Agent
  ├── Research Team (3 agents)
  ├── Implementation Team (2 agents)
  └── Review Team (2 agents)
- Learning from History
- Analyze past task completions
- Suggest optimizations
- Predict failure patterns
- External Integrations
- GitHub PR automation
- Slack notifications
- PagerDuty escalations
Long-Term (2027+)
- Self-Optimizing Pipelines
- Agents propose pipeline improvements
- A/B testing of agent strategies
- Automatic tier promotion
- Cross-System Orchestration
- Multiple infrastructure targets
- Hybrid cloud coordination
- Edge deployment agents
File Structure
/opt/agent-governance/
├── docs/
│ └── ARCHITECTURE.md # This document
├── ledger/
│ └── governance.db # SQLite audit trail
├── runtime/
│ ├── governance.py # Python governance manager
│ └── monitors.py # Monitor agents
├── agents/
│ ├── llm-planner/
│ │ ├── agent.py # Python LLM agent
│ │ └── governed_agent.py # Python governed agent
│ ├── llm-planner-ts/
│ │ ├── index.ts # Basic TypeScript agent
│ │ └── governed-agent.ts # Full governed agent (Bun)
│ └── multi-agent/
│ ├── types.ts # Type definitions
│ ├── coordination.ts # Blackboard, messaging, metrics
│ ├── agents.ts # Alpha, Beta, Gamma agents
│ └── orchestrator.ts # Multi-agent orchestrator
└── /opt/vault/
├── policies/
│ ├── t0-observer.hcl
│ ├── t1-operator.hcl
│ ├── t2-builder.hcl
│ ├── t3-executor.hcl
│ └── t4-architect.hcl
└── init-keys.json # Vault credentials (chmod 600)
Running the System
Prerequisites
# Vault must be running and unsealed
docker ps | grep vault
# DragonflyDB must be running
docker ps | grep dragonfly
# Bun must be installed
~/.bun/bin/bun --version
Single Agent Test
cd /opt/agent-governance/agents/llm-planner-ts
~/.bun/bin/bun run governed-agent.ts \
"agent-001" \
"task-001" \
"Design a microservices architecture"
Multi-Agent Test
cd /opt/agent-governance/agents/multi-agent
~/.bun/bin/bun run orchestrator.ts \
"Design a distributed event-driven analytics platform" \
--timeout 120
Contributing
When adding new features:
- Follow the agent lifecycle - All agents must implement the standard phases
- Log to the ledger - Every action must be auditable
- Respect error budgets - Check and track errors properly
- Write tests - Unit, integration, and scenario tests required
- Document changes - Update this architecture document