profit 77655c298c Initial commit: Agent Governance System Phase 8

Phase 8 Production Hardening with complete governance infrastructure:

- Vault integration with tiered policies (T0-T4)
- DragonflyDB state management
- SQLite audit ledger
- Pipeline DSL and templates
- Promotion/revocation engine
- Checkpoint system for session persistence
- Health manager and circuit breaker for fault tolerance
- GitHub/Slack integrations
- Architectural test pipeline with bug watcher, suggestion engine, council review
- Multi-agent chaos testing framework

Test Results:
- Governance tests: 68/68 passing
- E2E workflow: 16/16 passing
- Phase 2 Vault: 14/14 passing
- Integration tests: 27/27 passing

Coverage: 57.6% average across 12 phases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-23 22:07:06 -05:00


AI Agent Governance System Architecture

Version: 0.2.0
Status: Active Development
Last Updated: 2026-01-23

Executive Summary
System Architecture
Core Components
Agent Taxonomy
Runtime Governance
Multi-Agent Coordination
Current Capabilities
Engineering Focus Areas
Sample Implementations
Future Potential
File Structure
Running the System
Contributing

Executive Summary

This system implements a governed AI agent framework designed for safe, auditable, and scalable automation. The architecture enforces:

Trust-tiered access control via HashiCorp Vault
Real-time governance via DragonflyDB
Structured agent lifecycles with mandatory phases
Multi-agent coordination with parallel execution and conditional spawning
Complete audit trails via SQLite ledger

The system prioritizes legibility over magic — every agent action must be explainable, reproducible, and auditable.

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           GOVERNANCE LAYER                                   │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │  HashiCorp      │  │   DragonflyDB   │  │      SQLite Ledger          │  │
│  │  Vault          │  │   (Runtime)     │  │      (Audit)                │  │
│  │                 │  │                 │  │                             │  │
│  │  • Policies     │  │  • State        │  │  • agent_actions            │  │
│  │  • Secrets      │  │  • Locks        │  │  • agent_metrics            │  │
│  │  • AppRole Auth │  │  • Heartbeats   │  │  • violations               │  │
│  │  • Token Leases │  │  • Errors       │  │  • promotions               │  │
│  └─────────────────┘  │  • Blackboard   │  └─────────────────────────────┘  │
│                       │  • Messages     │                                    │
│                       └─────────────────┘                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                           ORCHESTRATION LAYER                                │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    Multi-Agent Orchestrator                          │    │
│  │  • Parallel agent execution      • Spawn condition monitoring        │    │
│  │  • Performance metrics           • Consensus coordination            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────────────────┤
│                             AGENT LAYER                                      │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  ┌─────────────┐  │
│  │  Agent ALPHA  │  │  Agent BETA   │  │  Agent GAMMA  │  │  Governed   │  │
│  │  (Research)   │  │  (Synthesis)  │  │  (Mediator)   │  │  LLM Agent  │  │
│  │               │  │               │  │               │  │             │  │
│  │  Parallel     │◄─┼─► Direct     ─┼──┼─► Spawned    │  │  Single     │  │
│  │  Execution    │  │    Messages   │  │    on        │  │  Pipeline   │  │
│  └───────┬───────┘  └───────┬───────┘  │    Condition │  └──────┬──────┘  │
│          │                  │          └───────────────┘         │         │
│          └──────────┬───────┴────────────────────────────────────┘         │
│                     │                                                       │
│              ┌──────▼──────┐                                                │
│              │  Blackboard │  (Shared Memory)                               │
│              │  • problem  │                                                │
│              │  • solutions│                                                │
│              │  • progress │                                                │
│              │  • consensus│                                                │
│              └─────────────┘                                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                           INFRASTRUCTURE LAYER                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │  OpenRouter     │  │  Bun Runtime    │  │  WireGuard VPN              │  │
│  │  (LLM API)      │  │  (TypeScript)   │  │  (Network)                  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

Core Components

1. HashiCorp Vault (Policy Engine)

Purpose: Centralized secrets management and trust-tier enforcement.

Location: https://127.0.0.1:8200
Storage:  /opt/vault/data
Policies: /opt/vault/policies/t{0-4}-*.hcl

Key Features:

AppRole authentication for agents
Dynamic secret generation
Token TTLs based on trust tier
Immediate revocation capabilities
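
As a rough illustration of the AppRole flow, the sketch below builds a login request for Vault's `POST /v1/auth/approle/login` endpoint and reads the token and lease from the response. The helper names (`buildAppRoleLogin`, `parseAppRoleLogin`) and the `VaultToken` shape are hypothetical; only the endpoint path and the JSON field names (`role_id`, `secret_id`, `auth.client_token`, `auth.lease_duration`, `auth.policies`) come from Vault's HTTP API.

```typescript
// Hypothetical helpers, not part of this repository.
interface AppRoleCredentials {
  roleId: string;
  secretId: string;
}

interface VaultToken {
  clientToken: string;
  leaseSeconds: number;
  policies: string[];
}

// Build the HTTP request for Vault's AppRole login endpoint.
function buildAppRoleLogin(vaultAddr: string, creds: AppRoleCredentials) {
  return {
    url: `${vaultAddr.replace(/\/+$/, "")}/v1/auth/approle/login`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ role_id: creds.roleId, secret_id: creds.secretId }),
    },
  };
}

// Extract the client token and its lease duration from the login response body.
function parseAppRoleLogin(body: unknown): VaultToken {
  const auth = (body as {
    auth?: { client_token?: string; lease_duration?: number; policies?: string[] };
  } | null)?.auth;
  if (!auth?.client_token) {
    throw new Error("Vault login response has no auth.client_token");
  }
  return {
    clientToken: auth.client_token,
    leaseSeconds: auth.lease_duration ?? 0,
    policies: auth.policies ?? [],
  };
}
```

A caller would pass `url` and `init` to `fetch` and hand the parsed JSON body to `parseAppRoleLogin`; the returned `leaseSeconds` should then match the TTL configured for the agent's trust tier.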

2. DragonflyDB (Runtime State)

Purpose: Real-time agent state, coordination, and governance signals.

Location: redis://127.0.0.1:6379
Credentials: vault:secret/data/services/dragonfly

Keyspace Design:

agent:{id}:packet      → Instruction packet (JSON)
agent:{id}:state       → Runtime state (JSON)
agent:{id}:errors      → Error counters (Hash)
agent:{id}:heartbeat   → Last seen (String + TTL)
agent:{id}:lock        → Execution lock (String + TTL)
task:{id}:active_agent → Current agent
task:{id}:artifacts    → Artifact references (List)
blackboard:{task}:*    → Shared memory sections
msg:{task}:*           → Direct message channels
revocations:ledger     → Revocation history (List)
handoff:{task}:latest  → Handoff objects (JSON)
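
One way to keep this keyspace consistent across agents is to build every key through a single helper. The sketch below mirrors the patterns listed above; the `keys` object and its method names are hypothetical, and the final segment of the `blackboard:` and `msg:` keys (shown as `*` above) is assumed to be a section or channel name.

```typescript
// Hypothetical key builders mirroring the keyspace design above.
const keys = {
  packet: (agentId: string) => `agent:${agentId}:packet`,
  state: (agentId: string) => `agent:${agentId}:state`,
  errors: (agentId: string) => `agent:${agentId}:errors`,
  heartbeat: (agentId: string) => `agent:${agentId}:heartbeat`,
  lock: (agentId: string) => `agent:${agentId}:lock`,
  activeAgent: (taskId: string) => `task:${taskId}:active_agent`,
  artifacts: (taskId: string) => `task:${taskId}:artifacts`,
  // Assumption: the wildcard segment is a blackboard section name.
  blackboard: (taskId: string, section: string) => `blackboard:${taskId}:${section}`,
  // Assumption: the wildcard segment is a message channel name.
  messages: (taskId: string, channel: string) => `msg:${taskId}:${channel}`,
  handoff: (taskId: string) => `handoff:${taskId}:latest`,
  revocations: "revocations:ledger",
} as const;
```

Centralizing the templates this way means a typo in a key name becomes a compile error instead of a silent cache miss.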

3. SQLite Ledger (Audit Trail)

Purpose: Immutable record of all agent actions for compliance and replay.

Location: /opt/agent-governance/ledger/governance.db

Schema:

CREATE TABLE agent_actions (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    agent_id TEXT,
    agent_version TEXT,
    tier INTEGER,
    action TEXT,
    decision TEXT,
    confidence REAL,
    success INTEGER,
    error_type TEXT,
    error_message TEXT
);

CREATE TABLE agent_metrics (
    agent_id TEXT PRIMARY KEY,
    current_tier INTEGER,
    total_runs INTEGER,
    compliant_runs INTEGER,
    consecutive_compliant INTEGER,
    last_active_at TEXT
);
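
As an illustration of writing to this schema, the sketch below maps an action record onto a parameterized `INSERT` for `agent_actions`. The `AgentAction` interface, `INSERT_ACTION_SQL`, and `toActionParams` are hypothetical names; the column list is taken from the schema above.

```typescript
// Row shape for agent_actions; `id` is omitted because SQLite assigns it.
interface AgentAction {
  timestamp: string; // stored as TEXT, e.g. an ISO-8601 string
  agent_id: string;
  agent_version: string;
  tier: number;
  action: string;
  decision: string;
  confidence: number;
  success: boolean;
  error_type?: string;
  error_message?: string;
}

const INSERT_ACTION_SQL = `
  INSERT INTO agent_actions
    (timestamp, agent_id, agent_version, tier, action, decision,
     confidence, success, error_type, error_message)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`;

// Convert a record into positional parameters in the same order as the column list.
function toActionParams(a: AgentAction): (string | number | null)[] {
  return [
    a.timestamp,
    a.agent_id,
    a.agent_version,
    a.tier,
    a.action,
    a.decision,
    a.confidence,
    a.success ? 1 : 0, // the schema stores success as INTEGER
    a.error_type ?? null,
    a.error_message ?? null,
  ];
}
```

With Bun's built-in driver this could be executed as `db.prepare(INSERT_ACTION_SQL).run(...toActionParams(action))`. Binding parameters instead of interpolating strings keeps agent-supplied text out of the SQL itself.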

Agent Taxonomy

Trust Tiers

Tier	Name	Capabilities	Token TTL
0	Observer	Read docs, inventory, logs; Generate plans	1h
1	Operator	Sandbox SSH, basic Proxmox, Ansible check-mode	30m
2	Builder	Sandbox admin, create frameworks/modules	30m
3	Executor	Staging access, limited prod read, root-controlled	15m
4	Architect	Policy read, governance write, requires approval	15m
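
The tier table can also be expressed as data so that code requesting a token looks up the TTL rather than hard-coding it. This is a minimal sketch; `TRUST_TIERS` and `tierInfo` are hypothetical names, and the values are copied from the table above.

```typescript
interface TrustTier {
  tier: number;
  name: string;
  tokenTtlSeconds: number;
}

// Values copied from the trust tier table.
const TRUST_TIERS: readonly TrustTier[] = [
  { tier: 0, name: "Observer", tokenTtlSeconds: 60 * 60 },
  { tier: 1, name: "Operator", tokenTtlSeconds: 30 * 60 },
  { tier: 2, name: "Builder", tokenTtlSeconds: 30 * 60 },
  { tier: 3, name: "Executor", tokenTtlSeconds: 15 * 60 },
  { tier: 4, name: "Architect", tokenTtlSeconds: 15 * 60 },
];

// Look up a tier, failing loudly on anything outside T0-T4.
function tierInfo(tier: number): TrustTier {
  const found = TRUST_TIERS.find((t) => t.tier === tier);
  if (!found) throw new RangeError(`Unknown trust tier: ${tier}`);
  return found;
}
```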

Agent Lifecycle Phases

BOOTSTRAP → PREFLIGHT → PLAN → EXECUTE → VERIFY → PACKAGE → REPORT → EXIT
    │           │         │        │         │        │         │       │
    │           │         │        │         │        │         │       └─ Release lock
    │           │         │        │         │        │         └─ Generate report
    │           │         │        │         │        └─ Collect artifacts
    │           │         │        │         └─ Verify results
    │           │         │        └─ Execute plan (if approved)
    │           │         └─ Generate plan artifact
    │           └─ Scope/dependency checks
    └─ Read revocations, load packet, acquire lock
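
The phase order above can be enforced with a small transition check. The sketch below is one possible encoding, not the project's implementation: `PHASES`, `nextPhase`, and `canTransition` are hypothetical names, and allowing a jump to `EXIT` from any earlier phase is an assumption made here so that a revoked agent can still release its lock.

```typescript
const PHASES = [
  "BOOTSTRAP", "PREFLIGHT", "PLAN", "EXECUTE",
  "VERIFY", "PACKAGE", "REPORT", "EXIT",
] as const;
type Phase = (typeof PHASES)[number];

// Return the phase that follows `current`, or null once EXIT is reached.
function nextPhase(current: Phase): Phase | null {
  const i = PHASES.indexOf(current);
  return i < PHASES.length - 1 ? PHASES[i + 1] : null;
}

// Allow only the immediate successor, plus an early EXIT (assumption, see above).
function canTransition(from: Phase, to: Phase): boolean {
  if (from === "EXIT") return false;
  return to === "EXIT" || nextPhase(from) === to;
}
```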

Error Budget System

{
  "max_total_errors": 8,
  "max_same_error_repeats": 2,
  "max_procedure_violations": 1
}

Automatic Revocation Triggers:

procedure_violations >= max_procedure_violations
same_error >= max_same_error_repeats
total_errors >= max_total_errors
Missing required artifact after EXECUTE
Forbidden action detected
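
The three counter-based triggers can be sketched as a pure function over the error counts. The `[ok, reason]` return shape follows the `checkErrorBudget` call used later in this document; the `ErrorCounts` shape and the reason strings are assumptions for illustration, and the artifact and forbidden-action triggers are not covered here.

```typescript
interface ErrorBudget {
  max_total_errors: number;
  max_same_error_repeats: number;
  max_procedure_violations: number;
}

// Assumed shape of the per-agent counters.
interface ErrorCounts {
  total_errors: number;
  procedure_violations: number;
  by_type: Record<string, number>;
}

const DEFAULT_BUDGET: ErrorBudget = {
  max_total_errors: 8,
  max_same_error_repeats: 2,
  max_procedure_violations: 1,
};

// Returns [true, ...] while the agent is within budget, otherwise [false, reason].
function checkErrorBudget(
  counts: ErrorCounts,
  budget: ErrorBudget = DEFAULT_BUDGET,
): [boolean, string] {
  if (counts.procedure_violations >= budget.max_procedure_violations) {
    return [false, `PROCEDURE_VIOLATIONS (${counts.procedure_violations})`];
  }
  for (const [type, n] of Object.entries(counts.by_type)) {
    if (n >= budget.max_same_error_repeats) {
      return [false, `SAME_ERROR_REPEATED (${type} x${n})`];
    }
  }
  if (counts.total_errors >= budget.max_total_errors) {
    return [false, `TOTAL_ERRORS (${counts.total_errors})`];
  }
  return [true, "WITHIN_BUDGET"];
}
```

Note that with the default budget a single procedure violation, or the same error type occurring twice, is already enough to trigger revocation.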

Runtime Governance

Instruction Packets

Every agent receives a structured instruction packet defining its mission:

interface InstructionPacket {
  agent_id: string;
  task_id: string;
  created_for: string;
  objective: string;
  deliverables: string[];
  constraints: {
    scope: string[];
    forbidden: string[];
    required_steps: string[];
  };
  success_criteria: string[];
  error_budget: ErrorBudget;
  escalation_rules: string[];
  created_at: string;
}
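
To show how the `constraints` block might be consulted at runtime, the sketch below checks an action against `forbidden` and reports which `required_steps` are still outstanding. Both helper names are hypothetical, and exact string matching is an assumption: the document does not specify how constraint entries are compared.

```typescript
// Same shape as InstructionPacket["constraints"] above.
interface PacketConstraints {
  scope: string[];
  forbidden: string[];
  required_steps: string[];
}

// Assumption: forbidden entries are compared by exact string match.
function isForbidden(constraints: PacketConstraints, action: string): boolean {
  return constraints.forbidden.includes(action);
}

// List the required steps that have not been completed yet, in packet order.
function missingRequiredSteps(
  constraints: PacketConstraints,
  completedSteps: string[],
): string[] {
  const done = new Set(completedSteps);
  return constraints.required_steps.filter((step) => !done.has(step));
}
```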

Handoff Objects

When an agent is revoked, it must create a handoff for the next agent:

interface HandoffObject {
  task_id: string;
  previous_agent_id: string;
  revoked: boolean;
  revocation_reason: { type: string; details: string };
  last_known_state: { phase: string; step: string };
  what_was_tried: string[];
  blocking_issue: string;
  required_next_actions: string[];
  constraints_reminder: string[];
  artifacts: string[];
}

Multi-Agent Coordination

Communication Patterns

1. Direct Messaging (Point-to-Point)

Agent ALPHA ──PROPOSAL──► Agent BETA
Agent BETA  ──FEEDBACK──► Agent ALPHA
Agent GAMMA ──HANDOFF───► ALL

2. Blackboard (Shared Memory)

┌──────────────────────────────────────────────────┐
│                    BLACKBOARD                    │
├─────────────┬────────────┬───────────┬───────────┤
│ problem     │ solutions  │ progress  │ consensus │
├─────────────┼────────────┼───────────┼───────────┤
│ objective   │ proposal_1 │ eval_1    │ votes     │
│ analysis    │ proposal_2 │ eval_2    │ final     │
│ constraints │ synthesis  │ gamma_res │           │
└─────────────┴────────────┴───────────┴───────────┘

Conditional Agent Spawning

Agent GAMMA is spawned when any one of the following conditions is met:

Condition	Threshold	Description
STUCK	30s	Agents inactive for 30+ seconds
CONFLICT	3	3+ unresolved proposal conflicts
COMPLEXITY	0.8	Task complexity score > 0.8
SUCCESS	1.0	Task complete, validation needed
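
The thresholds in this table can be evaluated with a function like the one below. It is a sketch under stated assumptions: the `SpawnSignals` field names are hypothetical, and the comparison operators are read from the descriptions ("30+ seconds", "3+", "> 0.8"), with `SUCCESS` treated as a completion value reaching 1.0.

```typescript
type SpawnCondition = "STUCK" | "CONFLICT" | "COMPLEXITY" | "SUCCESS";

// Hypothetical snapshot of the signals the orchestrator monitors.
interface SpawnSignals {
  inactiveSeconds: number;
  unresolvedConflicts: number;
  complexityScore: number;
  completion: number; // 0.0 to 1.0
}

// Return every condition whose threshold has been reached.
function trippedConditions(s: SpawnSignals): SpawnCondition[] {
  const tripped: SpawnCondition[] = [];
  if (s.inactiveSeconds >= 30) tripped.push("STUCK");
  if (s.unresolvedConflicts >= 3) tripped.push("CONFLICT");
  if (s.complexityScore > 0.8) tripped.push("COMPLEXITY");
  if (s.completion >= 1.0) tripped.push("SUCCESS");
  return tripped;
}
```

Returning all tripped conditions, rather than only the first, leaves the choice of which one to report as the spawn reason to the caller.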

Consensus Mechanism

interface ConsensusVote {
  agent: AgentRole;
  proposal_id: string;
  vote: "ACCEPT" | "REJECT" | "ABSTAIN";
  reasoning: string;
  timestamp: string;
}

// Consensus requires:
// 1. All required agents have voted
// 2. Accept votes > Reject votes
// 3. No rejects from required agents
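
The three rules in the comment above translate directly into a check over the collected votes. In this sketch `AgentRole` is aliased to `string` because its definition is not shown here, `hasConsensus` is a hypothetical name, and treating an agent's latest vote as the one that counts is an assumption.

```typescript
type AgentRole = string; // assumption: the real AgentRole type is defined elsewhere

interface ConsensusVote {
  agent: AgentRole;
  proposal_id: string;
  vote: "ACCEPT" | "REJECT" | "ABSTAIN";
  reasoning: string;
  timestamp: string;
}

function hasConsensus(
  votes: ConsensusVote[],
  proposalId: string,
  requiredAgents: AgentRole[],
): boolean {
  // Keep each agent's latest vote on this proposal (assumption: later votes replace earlier ones).
  const latest = new Map<AgentRole, ConsensusVote["vote"]>();
  for (const v of votes) {
    if (v.proposal_id === proposalId) latest.set(v.agent, v.vote);
  }

  // Rule 1: all required agents have voted (ABSTAIN counts as having voted).
  const allVoted = requiredAgents.every((agent) => latest.has(agent));
  // Rule 3: no required agent rejected.
  const requiredReject = requiredAgents.some((agent) => latest.get(agent) === "REJECT");
  // Rule 2: accept votes outnumber reject votes.
  const ballots = Array.from(latest.values());
  const accepts = ballots.filter((b) => b === "ACCEPT").length;
  const rejects = ballots.filter((b) => b === "REJECT").length;

  return allVoted && !requiredReject && accepts > rejects;
}
```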

Current Capabilities

Implemented Features

Feature	Status	Location
Vault policy engine	✅ Complete	`/opt/vault/policies/`
Trust tier system (T0-T4)	✅ Complete	Vault policies
DragonflyDB runtime	✅ Complete	`runtime/governance.py`
SQLite audit ledger	✅ Complete	`ledger/governance.db`
Single-agent pipeline	✅ Complete	`llm-planner-ts/governed-agent.ts`
Multi-agent parallel execution	✅ Complete	`multi-agent/orchestrator.ts`
Blackboard shared memory	✅ Complete	`multi-agent/coordination.ts`
Direct messaging	✅ Complete	`multi-agent/coordination.ts`
Conditional spawning	✅ Complete	`multi-agent/orchestrator.ts`
Performance metrics	✅ Complete	`multi-agent/coordination.ts`
Error budget tracking	✅ Complete	`GovernanceManager` class
Revocation handling	✅ Complete	`GovernanceManager` class

Performance Benchmarks

Metric	Single Agent	Multi-Agent (3)
Avg. task duration	60-120s	45-90s
Messages per task	N/A	20-30
Blackboard ops	5-10	40-60
LLM calls	2-4	6-12

Engineering Focus Areas

1. Pipeline Programming

Goal: Create composable, reusable agent pipelines.

Current Work:

// Pipeline stages as composable functions
type PipelineStage<T, U> = (input: T, context: GovernanceContext) => Promise<U>;

// Example pipeline composition
const agentPipeline = compose(
  bootstrap,
  preflight,
  plan,
  execute,
  verify,
  packageArtifacts, // renamed: `package` is a reserved word in strict-mode TypeScript
  report
);
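
The `compose` helper used above is not shown in this document. A minimal sketch, assuming stages run left to right and each stage receives the previous stage's output, could look like this; the `GovernanceContext` shape is a placeholder, and the stage boundaries are typed as `any` for brevity.

```typescript
// Placeholder context; the real GovernanceContext is defined elsewhere.
interface GovernanceContext {
  agentId: string;
}

type PipelineStage<T, U> = (input: T, context: GovernanceContext) => Promise<U>;

// Chain stages left to right; a rejected stage aborts the rest of the pipeline.
function compose(...stages: PipelineStage<any, any>[]): PipelineStage<any, any> {
  return async (input, context) => {
    let value = input;
    for (const stage of stages) {
      value = await stage(value, context);
    }
    return value;
  };
}
```

Because every stage shares the same context argument, cross-cutting checks such as heartbeats or error-budget lookups can be added by wrapping a stage without changing the stage itself.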

Planned Features:

Stage-level error handling and retry
Pipeline branching based on conditions
Pipeline templates for common patterns
Hot-swappable stages for testing

2. Bun Integration

Goal: Leverage Bun's performance for agent execution.

Current Implementation:

// File: agents/llm-planner-ts/governed-agent.ts
import { $ } from "bun";
import { Database } from "bun:sqlite";

// Shell commands via Bun
const result = await $`curl -sk ...`.json();

// SQLite via Bun native
const db = new Database("/opt/agent-governance/ledger/governance.db");

Advantages:

Faster startup than Node.js (roughly 4x, per Bun's published benchmarks)
Native TypeScript execution
Built-in SQLite support
Shell command integration
Excellent npm compatibility

Planned Enhancements:

Bun's built-in test runner integration
Bun's native WebSocket for real-time coordination
Bun's worker threads for parallel LLM calls

3. Testing Framework

Goal: Enable long-term iteration with confidence.

Architecture:

┌─────────────────────────────────────────────────────────────┐
│                    TESTING FRAMEWORK                         │
├─────────────────────────────────────────────────────────────┤
│  Unit Tests          │  Integration Tests   │  E2E Tests    │
│  ─────────────       │  ─────────────────   │  ──────────   │
│  • Agent methods     │  • Vault + Agent     │  • Full task  │
│  • Blackboard ops    │  • Redis + Agent     │  • Multi-agent│
│  • Message parsing   │  • LLM + Governance  │  • Failure    │
│  • Error handling    │  • Pipeline stages   │    recovery   │
├─────────────────────────────────────────────────────────────┤
│  Mock Infrastructure │  Test Scenarios      │  Metrics      │
│  ─────────────────   │  ──────────────      │  ───────      │
│  • MockVault         │  • Happy path        │  • Duration   │
│  • MockDragonfly     │  • Error budget      │  • Coverage   │
│  • MockLLM           │  • Revocation        │  • Flakiness  │
│  • MockBlackboard    │  • Consensus fail    │  • Regression │
└─────────────────────────────────────────────────────────────┘

Test Categories:

import { describe, it, expect } from "bun:test";

// 1. Unit Tests - Isolated component testing
describe("GovernanceManager", () => {
  it("should track error budget correctly", async () => {
    const gov = new MockGovernanceManager();
    await gov.recordError("agent-1", "LLM_ERROR", "timeout");
    const counts = await gov.getErrorCounts("agent-1");
    expect(counts.total_errors).toBe(1);
  });
});

// 2. Integration Tests - Component interaction
describe("Agent + Vault Integration", () => {
  it("should bootstrap with valid token", async () => {
    const agent = new GovernedAgent("test-agent");
    const [ok, msg] = await agent.bootstrap();
    expect(ok).toBe(true);
  });
});

// 3. Scenario Tests - Full workflow validation
describe("Multi-Agent Scenarios", () => {
  it("should spawn GAMMA on complexity threshold", async () => {
    const orchestrator = new TestOrchestrator();
    const metrics = await orchestrator.runTask(highComplexityTask);
    expect(metrics.gamma_spawned).toBe(true);
    expect(metrics.gamma_spawn_reason).toBe("COMPLEXITY");
  });
});

Mock Infrastructure:

// MockLLM for deterministic testing
class MockLLM {
  private responses: Map<string, string> = new Map();

  setResponse(pattern: string, response: string) {
    this.responses.set(pattern, response);
  }

  async complete(prompt: string): Promise<string> {
    for (const [pattern, response] of this.responses) {
      if (prompt.includes(pattern)) return response;
    }
    return '{"confidence": 0.5, "steps": []}';
  }
}

// MockDragonfly for state testing
class MockDragonfly {
  private store: Map<string, any> = new Map();

  async set(key: string, value: any) { this.store.set(key, value); }
  async get(key: string) { return this.store.get(key); }
  async hSet(key: string, field: string, value: any) { /* ... */ }
}

Sample Implementations

Single Governed Agent (TypeScript/Bun)

// File: governed-agent.ts
import { GovernanceManager } from "./coordination";

class GovernedAgent {
  // Assigned in bootstrap(), hence the definite-assignment assertion
  private gov!: GovernanceManager;

  constructor(private readonly agentId: string) {}

  async bootstrap(): Promise<[boolean, string]> {
    this.gov = new GovernanceManager();
    await this.gov.connect();

    // Read revocation ledger
    const revocations = await this.gov.getRecentRevocations(50);
    for (const rev of revocations) {
      if (rev.agent_id === this.agentId) {
        return [false, "AGENT_PREVIOUSLY_REVOKED"];
      }
    }

    // Load instruction packet
    const packet = await this.gov.getPacket(this.agentId);
    if (!packet) return [false, "NO_INSTRUCTION_PACKET"];

    // Acquire lock
    if (!await this.gov.acquireLock(this.agentId)) {
      return [false, "CANNOT_ACQUIRE_LOCK"];
    }

    return [true, "BOOTSTRAP_COMPLETE"];
  }

  async transition(phase: string, step: string): Promise<boolean> {
    await this.gov.heartbeat(this.agentId);
    await this.gov.refreshLock(this.agentId);

    const [ok, reason] = await this.gov.checkErrorBudget(this.agentId);
    if (!ok) {
      await this.gov.revokeAgent(this.agentId, "ERROR_BUDGET_EXCEEDED", reason);
      return false;
    }

    await this.gov.setState({ phase, step, /* ... */ });
    return true;
  }
}

Multi-Agent Orchestrator

// File: orchestrator.ts
class MultiAgentOrchestrator {
  private alphaAgent: AgentAlpha;
  private betaAgent: AgentBeta;
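  private gammaAgent?: AgentGamma;
  private monitorInterval?: ReturnType<typeof setInterval>;

  async runTask(task: TaskDefinition): Promise<Metrics> {
    // Launch ALPHA and BETA in parallel
    const alphaPromise = this.alphaAgent.run(task);
    const betaPromise = this.betaAgent.run(task);

    // Poll spawn conditions every 2 seconds while the agents run
    this.monitorInterval = setInterval(() => {
      this.checkSpawnConditions().catch(console.error);
    }, 2000);

    try {
      await Promise.all([alphaPromise, betaPromise]);
    } finally {
      // Stop polling so the timer does not outlive the task
      clearInterval(this.monitorInterval);
    }

    // Spawn GAMMA if needed
    if (this.shouldSpawnGamma()) {
      // spawnReason names the tripped condition (e.g. "COMPLEXITY"); its lookup is not shown here
      await this.spawnGamma(spawnReason);
      await this.gammaAgent?.run(task);
    }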

    return this.metrics.finalize();
  }

  private async checkSpawnConditions() {
    // Check stuck, conflict, complexity thresholds
    const stuckAgents = await this.stateManager.detectStuckAgents(30);
    if (stuckAgents.length > 0) {
      await this.spawnController.updateCondition("STUCK", stuckAgents.length);
    }
  }
}

Future Potential

Short-Term (Q1 2026)

Pipeline DSL

pipeline:
  name: infrastructure-deploy
  stages:
    - plan:
        agent: planner
        artifacts: [terraform-plan]
    - review:
        type: human-gate
        timeout: 30m
    - execute:
        agent: executor
        requires: [plan]
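
Once a pipeline file like this has been parsed into an object, its `requires` entries can be checked before anything runs. The sketch below assumes each list item under `stages` is a single-key map (as in the YAML above) and that a stage may only require stages declared earlier; `StageSpec`, `PipelineSpec`, and `validateRequires` are hypothetical names, and YAML parsing itself is out of scope.

```typescript
interface StageSpec {
  agent?: string;
  type?: string;
  timeout?: string;
  artifacts?: string[];
  requires?: string[];
}

interface PipelineSpec {
  name: string;
  stages: Record<string, StageSpec>[];
}

// Return one message per `requires` entry that does not name an earlier stage.
function validateRequires(spec: PipelineSpec): string[] {
  const seen = new Set<string>();
  const errors: string[] = [];
  for (const entry of spec.stages) {
    for (const [stageName, stage] of Object.entries(entry)) {
      for (const dep of stage.requires ?? []) {
        if (!seen.has(dep)) {
          errors.push(`${stageName} requires unknown or later stage "${dep}"`);
        }
      }
      seen.add(stageName);
    }
  }
  return errors;
}

// The YAML example above, as it would look after parsing.
const infrastructureDeploy: PipelineSpec = {
  name: "infrastructure-deploy",
  stages: [
    { plan: { agent: "planner", artifacts: ["terraform-plan"] } },
    { review: { type: "human-gate", timeout: "30m" } },
    { execute: { agent: "executor", requires: ["plan"] } },
  ],
};
```

For the example pipeline `validateRequires(infrastructureDeploy)` returns an empty list, since `execute` only depends on the earlier `plan` stage.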

Agent Templates

- Pre-configured agents for common tasks
- Terraform specialist
- Ansible specialist
- Code review specialist

Enhanced Testing

- Chaos testing for agent resilience
- Load testing for multi-agent scaling
- Regression test suite

Medium-Term (Q2-Q3 2026)

Hierarchical Agent Teams

Team Lead Agent
├── Research Team (3 agents)
├── Implementation Team (2 agents)
└── Review Team (2 agents)

Learning from History

- Analyze past task completions
- Suggest optimizations
- Predict failure patterns

External Integrations

- GitHub PR automation
- Slack notifications
- PagerDuty escalations

Long-Term (2027+)

Self-Optimizing Pipelines

- Agents propose pipeline improvements
- A/B testing of agent strategies
- Automatic tier promotion

Cross-System Orchestration

- Multiple infrastructure targets
- Hybrid cloud coordination
- Edge deployment agents

File Structure

/opt/agent-governance/
├── docs/
│   └── ARCHITECTURE.md          # This document
├── ledger/
│   └── governance.db            # SQLite audit trail
├── runtime/
│   ├── governance.py            # Python governance manager
│   └── monitors.py              # Monitor agents
└── agents/
    ├── llm-planner/
    │   ├── agent.py             # Python LLM agent
    │   └── governed_agent.py    # Python governed agent
    ├── llm-planner-ts/
    │   ├── index.ts             # Basic TypeScript agent
    │   └── governed-agent.ts    # Full governed agent (Bun)
    └── multi-agent/
        ├── types.ts             # Type definitions
        ├── coordination.ts      # Blackboard, messaging, metrics
        ├── agents.ts            # Alpha, Beta, Gamma agents
        └── orchestrator.ts      # Multi-agent orchestrator

/opt/vault/
├── policies/
│   ├── t0-observer.hcl
│   ├── t1-operator.hcl
│   ├── t2-builder.hcl
│   ├── t3-executor.hcl
│   └── t4-architect.hcl
└── init-keys.json               # Vault credentials (chmod 600)

Running the System

Prerequisites

# Vault must be running and unsealed
docker ps | grep vault

# DragonflyDB must be running
docker ps | grep dragonfly

# Bun must be installed
~/.bun/bin/bun --version

Single Agent Test

cd /opt/agent-governance/agents/llm-planner-ts
~/.bun/bin/bun run governed-agent.ts \
  "agent-001" \
  "task-001" \
  "Design a microservices architecture"

Multi-Agent Test

cd /opt/agent-governance/agents/multi-agent
~/.bun/bin/bun run orchestrator.ts \
  "Design a distributed event-driven analytics platform" \
  --timeout 120

Contributing

When adding new features:

Follow the agent lifecycle - All agents must implement the standard phases
Log to the ledger - Every action must be auditable
Respect error budgets - Check and track errors properly
Write tests - Unit, integration, and scenario tests required
Document changes - Update this architecture document
