Agent Governance System
A comprehensive framework for governing AI agent execution with security, auditability, and coordination.
Overview
The Agent Governance System provides infrastructure for running AI agents with:
- Tiered permissions (T0 observer, T1 executor, T2 admin)
- Audit trails via SQLite ledger
- Secure credentials via HashiCorp Vault
- State coordination via DragonflyDB
- Pipeline orchestration for multi-agent workflows
- Context management for long-running sessions
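One way to picture the tiered permission model is as a minimum-tier check per action. The sketch below is hypothetical. The action names, the assumption that higher tiers inherit every lower-tier right, and the deny-by-default rule for unknown actions are illustrative choices, not the project's actual runtime code.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical encoding of the three permission tiers."""
    T0_OBSERVER = 0
    T1_EXECUTOR = 1
    T2_ADMIN = 2


# Hypothetical action names mapped to the minimum tier allowed to perform them.
MIN_TIER = {
    "read_state": Tier.T0_OBSERVER,
    "run_task": Tier.T1_EXECUTOR,
    "revoke_agent": Tier.T2_ADMIN,
}


def is_allowed(agent_tier: Tier, action: str) -> bool:
    """Return True if an agent at agent_tier may perform action.

    Assumes tiers are cumulative (T2 can do anything T1 can).
    Unknown actions are denied (fail closed).
    """
    required = MIN_TIER.get(action)
    if required is None:
        return False
    return agent_tier >= required
```

Failing closed on unknown actions means a newly added action stays unusable until someone explicitly assigns it a tier.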
Quick Start
# Check system status
checkpoint load # Load session state
status dashboard # View directory progress
memory stats # Check memory usage
# Create checkpoint after work
checkpoint now --notes "Description of completed work"
Key Components
| Directory | Purpose | Status |
|---|---|---|
| pipeline/ | Pipeline DSL and core definitions | ✅ Complete |
| runtime/ | Agent lifecycle and governance | ✅ Complete |
| checkpoint/ | Session state management | ✅ Complete |
| memory/ | External memory layer | ✅ Complete |
| teams/ | Hierarchical team framework | ✅ Complete |
| analytics/ | Learning and pattern detection | ✅ Complete |
| tests/ | Test suites including chaos tests | 🚧 In Progress |
CLI Tools
Context Management
# Checkpoints - session state snapshots
checkpoint now --notes "..." # Create checkpoint
checkpoint load # Load latest
checkpoint report # Combined status view
checkpoint timeline # History
# Status - per-directory tracking
status sweep # Check all directories
status update <dir> --phase <p> # Update status
status dashboard # Overview
# Memory - large content storage
memory log --stdin # Store from pipe
memory fetch <id> -s # Get summary
memory list # Browse entries
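Conceptually, the memory commands store large content once and hand back a short ID that can later be expanded in full or as a summary. The in-process sketch below is hypothetical. The `MemoryStore` class, the content-addressed IDs, and truncation-based summaries are assumptions; the real layer stores entries externally and may summarize differently.

```python
import hashlib


class MemoryStore:
    """Hypothetical in-process stand-in for the external memory layer."""

    def __init__(self, summary_chars: int = 200) -> None:
        self.summary_chars = summary_chars  # assumed summary length
        self._entries = {}  # entry ID -> full content

    def log(self, content: str) -> str:
        """Store content and return its ID (analogous to memory log)."""
        # Content-addressed ID: identical content always maps to the same entry.
        entry_id = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._entries[entry_id] = content
        return entry_id

    def fetch(self, entry_id: str, summary: bool = False) -> str:
        """Return full content, or a truncated summary (analogous to memory fetch -s)."""
        content = self._entries[entry_id]
        if summary and len(content) > self.summary_chars:
            return content[: self.summary_chars] + "..."
        return content

    def list_ids(self) -> list:
        """Return stored entry IDs in insertion order (analogous to memory list)."""
        return list(self._entries)
```

Passing around an ID instead of the content itself is presumably what keeps large outputs out of a session's working context until they are actually needed.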
Agent Operations
# Run chaos tests
python tests/multi-agent-chaos/orchestrator.py
# Validate pipelines
python pipeline/pipeline.py validate <file.yaml>
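The exact checks performed by the validate command are not documented here. As a hypothetical illustration, the sketch below checks an already-parsed pipeline (for example, a dict loaded from YAML) for unique stage names, known dependencies, and dependency cycles, using Kahn's topological sort. The `stages` / `name` / `depends_on` schema is an assumption, not the project's DSL.

```python
from collections import deque


def validate_pipeline(pipeline: dict) -> list:
    """Return error strings for a parsed pipeline; an empty list means valid.

    Hypothetical schema: {"stages": [{"name": str, "depends_on": [str, ...]}, ...]}
    """
    errors = []
    stages = pipeline.get("stages", [])

    # 1. Every stage needs a unique, non-empty name.
    seen = set()
    for stage in stages:
        name = stage.get("name")
        if not name:
            errors.append("stage without a name")
        elif name in seen:
            errors.append(f"duplicate stage name: {name}")
        else:
            seen.add(name)

    # 2. Dependencies must refer to declared stages.
    deps = {}
    for stage in stages:
        name = stage.get("name")
        if not name or name in deps:
            continue  # unnamed or duplicate stage, already reported
        deps[name] = []
        for dep in stage.get("depends_on", []):
            if dep in seen:
                deps[name].append(dep)
            else:
                errors.append(f"{name}: unknown dependency '{dep}'")

    # 3. Kahn's algorithm: if some stages can never be ordered, there is a cycle.
    indegree = {name: len(ds) for name, ds in deps.items()}
    dependents = {name: [] for name in deps}
    for name, ds in deps.items():
        for dep in ds:
            dependents[dep].append(name)
    ready = deque(name for name, n in indegree.items() if n == 0)
    ordered = 0
    while ready:
        current = ready.popleft()
        ordered += 1
        for child in dependents[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if ordered < len(deps):
        errors.append("dependency cycle detected")
    return errors
```

Collecting every error instead of stopping at the first one lets a single validation run report all problems in a pipeline file.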
Architecture
┌────────────────────────────────────────────────────────────┐
│                      Agent Governance                      │
├──────────────┬──────────────┬──────────────┬───────────────┤
│ Agents       │ Pipeline     │ Runtime      │ Context       │
│              │              │              │               │
│ • T0 Observer│ • DSL Parser │ • Lifecycle  │ • Checkpoints │
│ • T1 Executor│ • Stages     │ • Governance │ • STATUS      │
│ • T2 Admin   │ • Templates  │ • Revocation │ • Memory      │
├──────────────┴──────────────┴──────────────┴───────────────┤
│                       Infrastructure                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │ Vault    │  │ Dragonfly│  │ Ledger   │  │ Evidence   │  │
│  │ (secrets)│  │ (state)  │  │ (audit)  │  │ (artifacts)│  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │
└────────────────────────────────────────────────────────────┘
Documentation
| Document | Description |
|---|---|
| ARCHITECTURE.md | Full system design |
| CONTEXT_MANAGEMENT.md | Checkpoints, STATUS, Memory |
| MEMORY_LAYER.md | External memory details |
| STATUS_PROTOCOL.md | Directory status protocol |
Directory Structure
agent-governance/
├── agents/ # Agent implementations (T0, T1, T2)
├── analytics/ # Learning and pattern detection
├── bin/ # CLI tools (checkpoint, status, memory)
├── checkpoint/ # Session state management
├── docs/ # Documentation
├── evidence/ # Audit evidence packages
├── integrations/ # External integrations (GitHub, Slack)
├── ledger/ # SQLite audit ledger
├── memory/ # External memory layer
├── orchestrator/ # Multi-agent orchestration
├── pipeline/ # Pipeline DSL and templates
├── preflight/ # Pre-execution validation
├── runtime/ # Agent lifecycle governance
├── sandbox/ # Sandboxed execution (Terraform, Ansible)
├── schemas/ # JSON schemas
├── teams/ # Hierarchical team framework
├── tests/ # Test suites
└── wrappers/ # Tool wrappers
Current Status
Progress: ███████░░░░░░░░░░░░░░░░░░░░░░░ 23%
✅ Complete: 14 directories
🚧 In Progress: 5 directories
Run status dashboard for current details.
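The progress bar above can be reproduced with a simple renderer. This is a hypothetical sketch: it assumes a 30-cell bar, which is consistent with 7 filled cells at 23% (7/30 ≈ 23.3%). How the dashboard actually computes the percentage is not stated here.

```python
def render_progress(fraction: float, width: int = 30) -> str:
    """Render a text progress bar; width=30 is an assumed default."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    filled = round(fraction * width)
    return "█" * filled + "░" * (width - filled) + f" {round(fraction * 100)}%"


print("Progress: " + render_progress(0.23))
```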
Recovery After Reset
# 1. Load checkpoint
checkpoint load
# 2. View combined status
checkpoint report
# 3. Check memory
memory list --limit 5
# 4. Resume work
status update ./target-dir --task "Resuming work"
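Conceptually, checkpoint now and checkpoint load amount to writing a state snapshot and reading back the newest one. The sketch below is hypothetical: it assumes JSON files with sequence-numbered names in a single directory, and the function names are illustrative. The real storage location and format are not specified in this README.

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_checkpoint(directory: Path, state: dict, notes: str = "") -> Path:
    """Write a new JSON snapshot named ckpt-000001.json, ckpt-000002.json, ..."""
    directory.mkdir(parents=True, exist_ok=True)
    existing = sorted(directory.glob("ckpt-*.json"))
    # Next sequence number; zero-padding keeps lexical order == chronological order.
    seq = int(existing[-1].stem.split("-")[1]) + 1 if existing else 1
    path = directory / f"ckpt-{seq:06d}.json"
    payload = {"created": time.time(), "notes": notes, "state": state}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_latest(directory: Path) -> Optional[dict]:
    """Return the newest snapshot, or None if none exist."""
    snapshots = sorted(directory.glob("ckpt-*.json"))
    if not snapshots:
        return None
    return json.loads(snapshots[-1].read_text(encoding="utf-8"))
```

Keeping every snapshot as its own file, rather than overwriting one, is what makes a timeline view of past checkpoints possible.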
Dependencies
| Service | Purpose | Port |
|---|---|---|
| HashiCorp Vault | Secrets management | 8200 |
| DragonflyDB | State coordination | 6379 |
| SQLite | Audit ledger | N/A (local file) |
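The README states only that the audit ledger is SQLite. One common way to make such a ledger tamper-evident is hash chaining, where each row stores the hash of its predecessor. The table layout and functions below are a hypothetical sketch of that technique, not the project's schema.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # placeholder "previous hash" for the first row


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger database with a single append-only table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ledger (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               agent     TEXT NOT NULL,
               action    TEXT NOT NULL,
               detail    TEXT NOT NULL,
               prev_hash TEXT NOT NULL,
               hash      TEXT NOT NULL
           )"""
    )
    return conn


def _entry_hash(prev_hash: str, agent: str, action: str, detail: dict) -> str:
    # Canonical JSON so the same entry always hashes the same way.
    body = json.dumps({"agent": agent, "action": action, "detail": detail}, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(conn: sqlite3.Connection, agent: str, action: str, detail: dict) -> str:
    """Append one audit record chained to the previous one; return its hash."""
    row = conn.execute("SELECT hash FROM ledger ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else GENESIS
    digest = _entry_hash(prev_hash, agent, action, detail)
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO ledger (agent, action, detail, prev_hash, hash) VALUES (?, ?, ?, ?, ?)",
            (agent, action, json.dumps(detail, sort_keys=True), prev_hash, digest),
        )
    return digest


def verify_ledger(conn: sqlite3.Connection) -> bool:
    """Recompute the chain; an edited row, or a row removed from the middle, breaks it."""
    prev_hash = GENESIS
    rows = conn.execute("SELECT agent, action, detail, prev_hash, hash FROM ledger ORDER BY id")
    for agent, action, detail, stored_prev, stored_hash in rows:
        expected = _entry_hash(prev_hash, agent, action, json.loads(detail))
        if stored_prev != prev_hash or stored_hash != expected:
            return False
        prev_hash = stored_hash
    return True
```

Hash chaining only detects tampering; preventing it still depends on file permissions or on copying the latest hash somewhere the writer cannot modify.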
Phase 8: Production Hardening - In Progress
Completed Phases: 1-7 ✅ | Foundation, Vault, Pipeline, Promotion/Revocation, Agent Bootstrap, DSL/Templates/Testing, Teams/Learning