
profit 09be7eff4b Add consensus failure handling with fallback options for multi-agent pipelines

Implements detection and recovery for when agents fail to reach consensus:
- Orchestrator exits with code 2 on consensus failure (distinct from error=1)
- Records failed run context (proposals, agent states, conflicts) to Dragonfly
- Provides fallback options: rerun same, rerun with GAMMA, escalate tier, accept partial
- Adds UI alert with action buttons for user-driven recovery
- Adds failure details modal and downloadable failure report
- Only marks pipeline complete when consensus achieved or user accepts fallback

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-24 18:24:19 -05:00

agents

Add consensus failure handling with fallback options for multi-agent pipelines

2026-01-24 18:24:19 -05:00

analytics

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

bin

Add bug status tracking with API and UI

2026-01-24 17:17:43 -05:00

checkpoint

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

docs

Add consensus failure handling with fallback options for multi-agent pipelines

2026-01-24 18:24:19 -05:00

evidence

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

integrations

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

inventory

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

ledger

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

lib

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

memory

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

orchestrator

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

pipeline

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

preflight

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

runtime

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

sandbox

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

schemas

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

teams

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

testing

Add bug status tracking with API and UI

2026-01-24 17:17:43 -05:00

tests

Add 17 missing governance tests - coverage 57.6% → 70.2%

2026-01-23 22:22:26 -05:00

Add consensus failure handling with fallback options for multi-agent pipelines

2026-01-24 18:24:19 -05:00

wrappers

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

.gitignore

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

README.md

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

STATUS.md

Initial commit: Agent Governance System Phase 8

2026-01-23 22:07:06 -05:00

README.md

Agent Governance System

A comprehensive framework for governing AI agent execution with security, auditability, and coordination.

Overview

The Agent Governance System provides infrastructure for running AI agents with:

- Tiered permissions (T0 observer, T1 executor, T2 admin)
- Audit trails via SQLite ledger
- Secure credentials via HashiCorp Vault
- State coordination via DragonflyDB
- Pipeline orchestration for multi-agent workflows
- Context management for long-running sessions
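
The tiered permission model above can be illustrated with a small check. This is a minimal sketch, not the project's implementation: it assumes the tiers are strictly ordered (T0 < T1 < T2), and the action names and the `REQUIRED_TIER` mapping are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Tiers named in the overview; the numeric ordering is an assumption."""
    T0_OBSERVER = 0
    T1_EXECUTOR = 1
    T2_ADMIN = 2


# Hypothetical action-to-tier mapping, for illustration only.
REQUIRED_TIER = {
    "read_status": Tier.T0_OBSERVER,
    "run_pipeline": Tier.T1_EXECUTOR,
    "revoke_agent": Tier.T2_ADMIN,
}


def is_allowed(agent_tier: Tier, action: str) -> bool:
    """Allow an action only if the agent's tier meets the required tier.

    Unknown actions are denied by default.
    """
    required = REQUIRED_TIER.get(action)
    return required is not None and agent_tier >= required
```

Denying unknown actions by default keeps the check fail-closed, which suits a governance layer better than silently allowing anything not listed.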

Quick Start

# Check system status
checkpoint load                    # Load session state
status dashboard                   # View directory progress
memory stats                       # Check memory usage

# Create checkpoint after work
checkpoint now --notes "Description of completed work"

Key Components

| Directory | Purpose | Status |
|-----------|---------|--------|
| `pipeline/` | Pipeline DSL and core definitions | ✅ Complete |
| `runtime/` | Agent lifecycle and governance | ✅ Complete |
| `checkpoint/` | Session state management | ✅ Complete |
| `memory/` | External memory layer | ✅ Complete |
| `teams/` | Hierarchical team framework | ✅ Complete |
| `analytics/` | Learning and pattern detection | ✅ Complete |
| `tests/` | Test suites including chaos tests | 🚧 In Progress |

CLI Tools

Context Management

# Checkpoints - session state snapshots
checkpoint now --notes "..."       # Create checkpoint
checkpoint load                    # Load latest
checkpoint report                  # Combined status view
checkpoint timeline                # History

# Status - per-directory tracking
status sweep                       # Check all directories
status update <dir> --phase <p>    # Update status
status dashboard                   # Overview

# Memory - large content storage
memory log --stdin                 # Store from pipe
memory fetch <id> -s               # Get summary
memory list                        # Browse entries

Agent Operations

# Run chaos tests
python tests/multi-agent-chaos/orchestrator.py

# Validate pipelines
python pipeline/pipeline.py validate <file.yaml>

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Agent Governance                         │
├──────────────┬──────────────┬──────────────┬───────────────┤
│   Agents     │   Pipeline   │   Runtime    │   Context     │
│              │              │              │               │
│ • T0 Observer│ • DSL Parser │ • Lifecycle  │ • Checkpoints │
│ • T1 Executor│ • Stages     │ • Governance │ • STATUS      │
│ • T2 Admin   │ • Templates  │ • Revocation │ • Memory      │
├──────────────┴──────────────┴──────────────┴───────────────┤
│                    Infrastructure                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Vault   │  │ Dragonfly│  │  Ledger  │  │  Evidence  │  │
│  │ (secrets)│  │  (state) │  │  (audit) │  │ (artifacts)│  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │
└─────────────────────────────────────────────────────────────┘
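
The diagram lists the Ledger as the audit store, and the overview describes it as a SQLite ledger. The sketch below shows what an append-only audit table could look like using Python's standard `sqlite3` module. The table name `audit_log`, its columns, and the function names are assumptions for illustration; the actual schema in `ledger/` may differ.

```python
import json
import sqlite3
import time


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger database with a hypothetical audit table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            ts       REAL NOT NULL,
            agent_id TEXT NOT NULL,
            action   TEXT NOT NULL,
            detail   TEXT NOT NULL
        )
        """
    )
    return conn


def record(conn: sqlite3.Connection, agent_id: str, action: str, detail: dict) -> int:
    """Append one audit entry and return its row id."""
    cur = conn.execute(
        "INSERT INTO audit_log (ts, agent_id, action, detail) VALUES (?, ?, ?, ?)",
        (time.time(), agent_id, action, json.dumps(detail, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid
```

The sketch only ever inserts rows; it never updates or deletes them, which is the property an audit trail usually needs.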

Documentation

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | Full system design |
| CONTEXT_MANAGEMENT.md | Checkpoints, STATUS, Memory |
| MEMORY_LAYER.md | External memory details |
| STATUS_PROTOCOL.md | Directory status protocol |

Directory Structure

agent-governance/
├── agents/           # Agent implementations (T0, T1, T2)
├── analytics/        # Learning and pattern detection
├── bin/              # CLI tools (checkpoint, status, memory)
├── checkpoint/       # Session state management
├── docs/             # Documentation
├── evidence/         # Audit evidence packages
├── integrations/     # External integrations (GitHub, Slack)
├── ledger/           # SQLite audit ledger
├── memory/           # External memory layer
├── orchestrator/     # Multi-agent orchestration
├── pipeline/         # Pipeline DSL and templates
├── preflight/        # Pre-execution validation
├── runtime/          # Agent lifecycle governance
├── sandbox/          # Sandboxed execution (Terraform, Ansible)
├── schemas/          # JSON schemas
├── teams/            # Hierarchical team framework
├── tests/            # Test suites
└── wrappers/         # Tool wrappers

Current Status

Progress: ███████░░░░░░░░░░░░░░░░░░░░░░░ 23%

✅ Complete:       14 directories
🚧 In Progress:     5 directories

Run `status dashboard` for current details.

Recovery After Reset

# 1. Load checkpoint
checkpoint load

# 2. View combined status
checkpoint report

# 3. Check memory
memory list --limit 5

# 4. Resume work
status update ./target-dir --task "Resuming work"
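
The first three recovery steps can also be scripted. The sketch below runs them in order and stops at the first failure. It assumes the `checkpoint` and `memory` CLIs from `bin/` are on `PATH` and return a non-zero exit code on failure; the function name `run_recovery` is hypothetical. Step 4 is left out because its target directory depends on the work being resumed.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands taken from the recovery steps above, in order.
RECOVERY_STEPS: List[List[str]] = [
    ["checkpoint", "load"],
    ["checkpoint", "report"],
    ["memory", "list", "--limit", "5"],
]


def run_recovery(run: Callable[[Sequence[str]], int] = subprocess.call) -> List[str]:
    """Run each step in order; stop at the first non-zero exit code.

    Returns the commands that completed successfully. The `run` parameter
    exists so the sequence can be exercised without the real CLIs.
    """
    completed: List[str] = []
    for cmd in RECOVERY_STEPS:
        if run(cmd) != 0:
            break
        completed.append(" ".join(cmd))
    return completed
```

With the default runner, a missing CLI raises `FileNotFoundError` rather than returning an exit code, so a wrapper may want to catch that separately.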

Dependencies

| Service | Purpose | Port |
|---------|---------|------|
| HashiCorp Vault | Secrets management | 8200 |
| DragonflyDB | State coordination | 6379 |
| SQLite | Audit ledger | File |
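
Before starting agents, it can help to confirm that the two network services are listening. The sketch below attempts a plain TCP connection to each port from the table. The host `127.0.0.1` and the names `SERVICES`, `is_reachable`, and `check_services` are assumptions; a successful connection only shows that something is listening, not that Vault is unsealed or that DragonflyDB accepts commands.

```python
import socket
from typing import Dict, Tuple

# Ports come from the dependency table; the host is an assumption.
SERVICES: Dict[str, Tuple[str, int]] = {
    "vault": ("127.0.0.1", 8200),
    "dragonfly": ("127.0.0.1", 6379),
}


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'down'}")
```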

Phase 8: Production Hardening - In Progress

Completed Phases: 1-7 ✅ | Foundation, Vault, Pipeline, Promotion/Revocation, Agent Bootstrap, DSL/Templates/Testing, Teams/Learning