Phase 8 Production Hardening with complete governance infrastructure:

- Vault integration with tiered policies (T0-T4)
- DragonflyDB state management
- SQLite audit ledger
- Pipeline DSL and templates
- Promotion/revocation engine
- Checkpoint system for session persistence
- Health manager and circuit breaker for fault tolerance
- GitHub/Slack integrations
- Architectural test pipeline with bug watcher, suggestion engine, council review
- Multi-agent chaos testing framework

Test Results:

- Governance tests: 68/68 passing
- E2E workflow: 16/16 passing
- Phase 2 Vault: 14/14 passing
- Integration tests: 27/27 passing

Coverage: 57.6% average across 12 phases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# Agent Governance System

> A comprehensive framework for governing AI agent execution with security, auditability, and coordination.

## Overview

The Agent Governance System provides infrastructure for running AI agents with:

- **Tiered permissions** (T0 observer, T1 executor, T2 admin)
- **Audit trails** via SQLite ledger
- **Secure credentials** via HashiCorp Vault
- **State coordination** via DragonflyDB
- **Pipeline orchestration** for multi-agent workflows
- **Context management** for long-running sessions

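
The tier names in the list above suggest an ordered permission model. The following is a minimal sketch of such a check, assuming each higher tier inherits the rights of the tiers below it (the README does not state this). `Tier`, `REQUIRED_TIER`, `is_allowed`, and the action names are hypothetical and are not taken from the project's code.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Tier names come from the README; the numeric ordering is an assumption."""
    T0_OBSERVER = 0
    T1_EXECUTOR = 1
    T2_ADMIN = 2


# Hypothetical mapping from an action to the lowest tier allowed to perform it.
REQUIRED_TIER = {
    "read_status": Tier.T0_OBSERVER,
    "run_pipeline": Tier.T1_EXECUTOR,
    "revoke_agent": Tier.T2_ADMIN,
}


def is_allowed(agent_tier: Tier, action: str) -> bool:
    """Return True when the agent's tier meets the action's minimum tier.

    Actions missing from REQUIRED_TIER are denied by default.
    """
    required = REQUIRED_TIER.get(action)
    return required is not None and agent_tier >= required
```

Denying unknown actions by default keeps a misspelled or unregistered action from silently passing the check.
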
## Quick Start

```bash
# Check system status
checkpoint load                 # Load session state
status dashboard                # View directory progress
memory stats                    # Check memory usage

# Create checkpoint after work
checkpoint now --notes "Description of completed work"
```

## Key Components

| Directory | Purpose | Status |
|-----------|---------|--------|
| `pipeline/` | Pipeline DSL and core definitions | ✅ Complete |
| `runtime/` | Agent lifecycle and governance | ✅ Complete |
| `checkpoint/` | Session state management | ✅ Complete |
| `memory/` | External memory layer | ✅ Complete |
| `teams/` | Hierarchical team framework | ✅ Complete |
| `analytics/` | Learning and pattern detection | ✅ Complete |
| `tests/` | Test suites including chaos tests | 🚧 In Progress |

## CLI Tools

### Context Management

```bash
# Checkpoints - session state snapshots
checkpoint now --notes "..."      # Create checkpoint
checkpoint load                   # Load latest
checkpoint report                 # Combined status view
checkpoint timeline               # History

# Status - per-directory tracking
status sweep                      # Check all directories
status update <dir> --phase <p>   # Update status
status dashboard                  # Overview

# Memory - large content storage
memory log --stdin                # Store from pipe
memory fetch <id> -s              # Get summary
memory list                       # Browse entries
```

### Agent Operations

```bash
# Run chaos tests
python tests/multi-agent-chaos/orchestrator.py

# Validate pipelines
python pipeline/pipeline.py validate <file.yaml>
```

## Architecture

```
┌────────────────────────────────────────────────────────────┐
│                      Agent Governance                      │
├──────────────┬──────────────┬──────────────┬───────────────┤
│   Agents     │   Pipeline   │   Runtime    │   Context     │
│              │              │              │               │
│ • T0 Observer│ • DSL Parser │ • Lifecycle  │ • Checkpoints │
│ • T1 Executor│ • Stages     │ • Governance │ • STATUS      │
│ • T2 Admin   │ • Templates  │ • Revocation │ • Memory      │
├──────────────┴──────────────┴──────────────┴───────────────┤
│                       Infrastructure                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Vault   │  │ Dragonfly│  │  Ledger  │  │  Evidence  │  │
│  │ (secrets)│  │ (state)  │  │ (audit)  │  │ (artifacts)│  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │
└────────────────────────────────────────────────────────────┘
```

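
The Ledger box in the diagram is backed by SQLite. As an illustration of what an append-only audit table might look like, here is a small sketch using Python's standard `sqlite3` module. The table name `audit_log`, its columns, and the helpers `open_ledger`, `record`, and `history` are assumptions made for this example; the project's actual schema lives under `ledger/` and may differ.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger database with a hypothetical audit_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            ts       TEXT NOT NULL,
            agent_id TEXT NOT NULL,
            action   TEXT NOT NULL,
            detail   TEXT NOT NULL
        )
        """
    )
    return conn


def record(conn: sqlite3.Connection, agent_id: str, action: str,
           detail: Dict[str, Any]) -> int:
    """Append one audit entry and return its row id."""
    with conn:  # commits on success, rolls back if the insert raises
        cur = conn.execute(
            "INSERT INTO audit_log (ts, agent_id, action, detail) "
            "VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), agent_id, action,
             json.dumps(detail, sort_keys=True)),
        )
    return cur.lastrowid


def history(conn: sqlite3.Connection,
            agent_id: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (action, detail) pairs for one agent in insertion order."""
    rows = conn.execute(
        "SELECT action, detail FROM audit_log WHERE agent_id = ? ORDER BY id",
        (agent_id,),
    )
    return [(action, json.loads(detail)) for action, detail in rows]
```

The sketch only ever inserts rows, which is the property an audit trail usually needs; enforcing that at the database level (for example with triggers) is left out.
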
## Documentation

| Document | Description |
|----------|-------------|
| [ARCHITECTURE.md](docs/ARCHITECTURE.md) | Full system design |
| [CONTEXT_MANAGEMENT.md](docs/CONTEXT_MANAGEMENT.md) | Checkpoints, STATUS, Memory |
| [MEMORY_LAYER.md](docs/MEMORY_LAYER.md) | External memory details |
| [STATUS_PROTOCOL.md](docs/STATUS_PROTOCOL.md) | Directory status protocol |

## Directory Structure

```
agent-governance/
├── agents/           # Agent implementations (T0, T1, T2)
├── analytics/        # Learning and pattern detection
├── bin/              # CLI tools (checkpoint, status, memory)
├── checkpoint/       # Session state management
├── docs/             # Documentation
├── evidence/         # Audit evidence packages
├── integrations/     # External integrations (GitHub, Slack)
├── ledger/           # SQLite audit ledger
├── memory/           # External memory layer
├── orchestrator/     # Multi-agent orchestration
├── pipeline/         # Pipeline DSL and templates
├── preflight/        # Pre-execution validation
├── runtime/          # Agent lifecycle governance
├── sandbox/          # Sandboxed execution (Terraform, Ansible)
├── schemas/          # JSON schemas
├── teams/            # Hierarchical team framework
├── tests/            # Test suites
└── wrappers/         # Tool wrappers
```

## Current Status

```
Progress: ███████░░░░░░░░░░░░░░░░░░░░░░░ 23%

✅ Complete: 14 directories
🚧 In Progress: 5 directories
```

Run `status dashboard` for current details.

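
The progress line in the status block appears to be a fixed-width bar with a percentage label. As a rough sketch of how such a line can be rendered, assuming a 30-cell bar and simple rounding, the hypothetical helper `render_progress` below builds one. How `status dashboard` actually computes or draws the bar is not documented here.

```python
def render_progress(percent: int, width: int = 30) -> str:
    """Render a text progress bar; width and rounding are assumptions."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    filled = round(width * percent / 100)  # number of solid cells
    return f"Progress: {'█' * filled}{'░' * (width - filled)} {percent}%"


if __name__ == "__main__":
    print(render_progress(23))
```
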
## Recovery After Reset

```bash
# 1. Load checkpoint
checkpoint load

# 2. View combined status
checkpoint report

# 3. Check memory
memory list --limit 5

# 4. Resume work
status update ./target-dir --task "Resuming work"
```

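
The four recovery steps above can also be scripted. The sketch below assumes the `checkpoint`, `memory`, and `status` CLIs from `bin/` are on `PATH`, and that each returns a non-zero exit code on failure (an assumption). The helper names `recovery_commands`, `run_command`, and `recover` are hypothetical. The runner is injectable so the sequence can be exercised without the real tools installed.

```python
import subprocess
from typing import Callable, List, Sequence


def recovery_commands(target_dir: str) -> List[List[str]]:
    """Return the recovery steps as argument lists, in the documented order."""
    return [
        ["checkpoint", "load"],
        ["checkpoint", "report"],
        ["memory", "list", "--limit", "5"],
        ["status", "update", target_dir, "--task", "Resuming work"],
    ]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code.

    Raises FileNotFoundError if the executable is not on PATH.
    """
    return subprocess.run(list(cmd)).returncode


def recover(target_dir: str,
            runner: Callable[[Sequence[str]], int] = run_command
            ) -> List[List[str]]:
    """Run the steps in order and stop at the first failure.

    Returns the commands that completed successfully.
    """
    completed: List[List[str]] = []
    for cmd in recovery_commands(target_dir):
        if runner(cmd) != 0:
            break
        completed.append(cmd)
    return completed
```

Calling `recover("./target-dir")` runs the real commands; passing a stub as `runner` gives a dry run.
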
## Dependencies

| Service | Purpose | Port |
|---------|---------|------|
| HashiCorp Vault | Secrets management | 8200 |
| DragonflyDB | State coordination | 6379 |
| SQLite | Audit ledger | File |

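
Before starting agents it can help to confirm that the two network services in the table are reachable. The following is a minimal sketch using only the standard library. The ports come from the table; the host `127.0.0.1` and the helper names `port_open` and `check_services` are assumptions for a local setup. A successful TCP connection only shows that something is listening, not that the service is healthy or unsealed.

```python
import socket
from typing import Dict

# Ports are taken from the dependencies table; SQLite is file-based and has none.
SERVICES = {"HashiCorp Vault": 8200, "DragonflyDB": 6379}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, and similar errors
        return False


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Probe each service port and report reachability by service name."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```
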
---

*Phase 8: Production Hardening - In Progress*

**Completed Phases:** 1-7 ✅ | Foundation, Vault, Pipeline, Promotion/Revocation, Agent Bootstrap, DSL/Templates/Testing, Teams/Learning