profit 77655c298c Initial commit: Agent Governance System Phase 8
Phase 8 Production Hardening with complete governance infrastructure:

- Vault integration with tiered policies (T0-T4)
- DragonflyDB state management
- SQLite audit ledger
- Pipeline DSL and templates
- Promotion/revocation engine
- Checkpoint system for session persistence
- Health manager and circuit breaker for fault tolerance
- GitHub/Slack integrations
- Architectural test pipeline with bug watcher, suggestion engine, council review
- Multi-agent chaos testing framework

Test Results:
- Governance tests: 68/68 passing
- E2E workflow: 16/16 passing
- Phase 2 Vault: 14/14 passing
- Integration tests: 27/27 passing

Coverage: 57.6% average across 12 phases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 22:07:06 -05:00

# Architectural Test Pipeline
> Multi-layer oversight system ensuring no single hidden bug can compromise the Agent Governance System.
## Overview
The Architectural Test Pipeline provides continuous validation across all 12 phases through multiple oversight layers that monitor, analyze, review, and report on system health.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         ARCHITECTURAL TEST PIPELINE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐          │
│  │   Bug Window    │───▶│   Suggestion    │───▶│    Council      │          │
│  │    Watcher      │    │     Engine      │    │     Review      │          │
│  │                 │    │                 │    │                 │          │
│  │ • Real-time     │    │ • Context-aware │    │ • Safety        │          │
│  │ • All phases    │    │ • Risk-ranked   │    │ • Performance   │          │
│  │ • Anomalies     │    │ • Auto-fixable  │    │ • Architecture  │          │
│  └────────┬────────┘    └────────┬────────┘    │ • Compliance    │          │
│           │                      │             │ • Quality       │          │
│           │                      │             └────────┬────────┘          │
│           │                      │                      │                   │
│           ▼                      ▼                      ▼                   │
│  ┌───────────────────────────────────────────────────────────────┐          │
│  │                        Phase Validator                        │          │
│  │    Phase 1 ✅ │ Phase 2 ✅ │ Phase 3 ✅ │ Phase 4 ✅ │ ...    │          │
│  │    Phase 5 ⭐ │ Phase 6 ✅ │ Phase 7 ✅ │ Phase 8 🚧 │ ...    │          │
│  └───────────────────────────────────────────────────────────────┘          │
│                                  │                                          │
│                                  ▼                                          │
│  ┌─────────────────┐    ┌─────────────────┐                                 │
│  │ Error Injector  │    │    Reporter     │                                 │
│  │                 │    │                 │                                 │
│  │ • Safe mode     │    │ • Markdown      │                                 │
│  │ • Scenarios     │    │ • Per-phase     │                                 │
│  │ • Validation    │    │ • Actions       │                                 │
│  └─────────────────┘    └─────────────────┘                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Oversight Layers
### 1. Bug Window Watcher (`bug_watcher.py`)
Real-time monitoring of every pipeline stage.
**Features:**
- Monitors all 12 phases continuously
- Detects anomalies: errors, regressions, missing artifacts, state inconsistencies
- Links findings to phase, directory, STATUS.md, and checkpoint entries
- Persists to DragonflyDB for cross-session tracking
**Anomaly Types:**
| Type | Description | Severity Range |
|------|-------------|----------------|
| UNHANDLED_ERROR | Uncaught exceptions | Medium-Critical |
| REGRESSION | Behavior change from baseline | High |
| MISSING_ARTIFACT | Required file/config missing | Low-High |
| STATE_INCONSISTENCY | Status mismatch | Medium |
| DEPENDENCY_UNAVAILABLE | Vault/Dragonfly/Ledger down | Critical |
| SECURITY_VIOLATION | Unacknowledged violation | Critical |
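
As a rough illustration, the anomaly types in the table above could be modeled as an enum paired with an allowed severity range. `AnomalyType` is the enum name this README refers to in `bug_watcher.py`; the `Severity` scale, the `SEVERITY_RANGE` mapping, and `is_valid_severity()` are hypothetical names introduced for this sketch and may not match the module's actual API.

```python
from enum import Enum, IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale (an assumption for this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class AnomalyType(Enum):
    """Anomaly types transcribed from the table above."""
    UNHANDLED_ERROR = "unhandled_error"
    REGRESSION = "regression"
    MISSING_ARTIFACT = "missing_artifact"
    STATE_INCONSISTENCY = "state_inconsistency"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    SECURITY_VIOLATION = "security_violation"


# (minimum, maximum) severity per anomaly type, taken from the table.
SEVERITY_RANGE = {
    AnomalyType.UNHANDLED_ERROR: (Severity.MEDIUM, Severity.CRITICAL),
    AnomalyType.REGRESSION: (Severity.HIGH, Severity.HIGH),
    AnomalyType.MISSING_ARTIFACT: (Severity.LOW, Severity.HIGH),
    AnomalyType.STATE_INCONSISTENCY: (Severity.MEDIUM, Severity.MEDIUM),
    AnomalyType.DEPENDENCY_UNAVAILABLE: (Severity.CRITICAL, Severity.CRITICAL),
    AnomalyType.SECURITY_VIOLATION: (Severity.CRITICAL, Severity.CRITICAL),
}


def is_valid_severity(kind: AnomalyType, severity: Severity) -> bool:
    """Check that a reported severity falls inside the documented range."""
    lowest, highest = SEVERITY_RANGE[kind]
    return lowest <= severity <= highest
```

A check like this would let the watcher reject, for example, a `REGRESSION` reported at low severity.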
### 2. Suggestion Engine (`suggestion_engine.py`)
AI-driven analysis using historical context.
**Features:**
- Gathers context from checkpoints, memory, STATUS files
- Pattern-based suggestions from known fixes
- Context-aware suggestions from historical outcomes
- Risk/impact ranking for prioritization
**Suggestion Ranking:**
```
Priority Score = Impact × (1 - Risk)
Impact Levels: transformative (1.0) > high (0.8) > medium (0.6) > low (0.4)
Risk Factor (1 - Risk): critical (0.2) < high (0.4) < medium (0.6) < low (0.8)
```
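
The ranking can be sketched in a few lines. This is a minimal sketch, not the code in `suggestion_engine.py`: it assumes the listed risk numbers are the already-inverted `(1 - Risk)` multiplier (so critical risk scales a suggestion by 0.2 and low risk by 0.8), and the names `IMPACT`, `RISK_FACTOR`, `priority_score()`, and `rank()` are hypothetical.

```python
# Multipliers transcribed from the ranking block above.
IMPACT = {"transformative": 1.0, "high": 0.8, "medium": 0.6, "low": 0.4}

# ASSUMPTION: these values are treated as the (1 - Risk) factor, so riskier
# suggestions receive a smaller multiplier and rank lower.
RISK_FACTOR = {"critical": 0.2, "high": 0.4, "medium": 0.6, "low": 0.8}


def priority_score(impact: str, risk: str) -> float:
    """Priority Score = Impact x (1 - Risk), using the level names above."""
    return IMPACT[impact] * RISK_FACTOR[risk]


def rank(suggestions: list[dict]) -> list[dict]:
    """Return suggestions sorted from highest to lowest priority."""
    return sorted(
        suggestions,
        key=lambda s: priority_score(s["impact"], s["risk"]),
        reverse=True,
    )
```

Under these assumptions, a high-impact, low-risk suggestion (0.8 × 0.8) outranks a transformative but critical-risk one (1.0 × 0.2).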
### 3. Council Review (`council.py`)
Multi-perspective review with 5 specialized reviewers.
**Reviewers:**
| Role | Focus | Risk Tolerance |
|------|-------|----------------|
| Safety | Security, access control | Very Low (0.2) |
| Performance | Latency, throughput | Medium (0.6) |
| Architecture | Design, maintainability | Medium (0.5) |
| Compliance | Governance, policies | Low (0.3) |
| Quality | Testing, documentation | Low (0.4) |
**Decision Types:**
- `AUTO_APPROVE` - Safe to auto-implement
- `HUMAN_APPROVE` - Approved by the council, but needs human sign-off
- `DEFER` - Needs more discussion
- `REJECT` - Do not implement
- `ESCALATE` - Needs higher authority
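
The reviewer table and decision types could be represented roughly as follows. `ReviewerRole`, `ReviewerProfile`, and `REVIEWERS` are names this README mentions for `council.py`, but their fields here are guesses. `objecting_reviewers()` is a hypothetical helper, and the rule it applies (a reviewer objects when a suggestion's risk exceeds that reviewer's tolerance) is an assumption, since the actual voting logic is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewerRole(Enum):
    SAFETY = "safety"
    PERFORMANCE = "performance"
    ARCHITECTURE = "architecture"
    COMPLIANCE = "compliance"
    QUALITY = "quality"


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN_APPROVE = "human_approve"
    DEFER = "defer"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ReviewerProfile:
    role: ReviewerRole
    focus: str
    risk_tolerance: float  # values taken from the Reviewers table


REVIEWERS = {
    ReviewerRole.SAFETY: ReviewerProfile(ReviewerRole.SAFETY, "Security, access control", 0.2),
    ReviewerRole.PERFORMANCE: ReviewerProfile(ReviewerRole.PERFORMANCE, "Latency, throughput", 0.6),
    ReviewerRole.ARCHITECTURE: ReviewerProfile(ReviewerRole.ARCHITECTURE, "Design, maintainability", 0.5),
    ReviewerRole.COMPLIANCE: ReviewerProfile(ReviewerRole.COMPLIANCE, "Governance, policies", 0.3),
    ReviewerRole.QUALITY: ReviewerProfile(ReviewerRole.QUALITY, "Testing, documentation", 0.4),
}


def objecting_reviewers(risk: float) -> list[ReviewerRole]:
    """ASSUMED rule: a reviewer objects when risk exceeds their tolerance."""
    return [role for role, profile in REVIEWERS.items() if risk > profile.risk_tolerance]
```

With these tolerances, a suggestion with risk 0.45 would draw objections from Safety, Compliance, and Quality, while Performance and Architecture would accept it.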
### 4. Phase Validator (`phase_validator.py`)
Ensures all phases have required components.
**Validation Levels:**
| Level | Description |
|-------|-------------|
| BASIC | Existence checks only |
| STANDARD | + Functionality tests |
| THOROUGH | + Integration tests |
| COMPREHENSIVE | + Chaos/edge cases |
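
Because each level adds to the one before it, the levels map naturally onto an ordered enum. The sketch below is illustrative only: `ValidationLevel`, `CHECKS_ADDED`, and `checks_for()` are hypothetical names, not necessarily those used in `phase_validator.py`.

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    """Ordered so that a higher level includes every lower one."""
    BASIC = 1
    STANDARD = 2
    THOROUGH = 3
    COMPREHENSIVE = 4


# What each level adds on top of the previous one (from the table above).
CHECKS_ADDED = {
    ValidationLevel.BASIC: "existence checks",
    ValidationLevel.STANDARD: "functionality tests",
    ValidationLevel.THOROUGH: "integration tests",
    ValidationLevel.COMPREHENSIVE: "chaos/edge cases",
}


def checks_for(level: ValidationLevel) -> list[str]:
    """Return the cumulative list of check groups run at a given level."""
    return [CHECKS_ADDED[lvl] for lvl in ValidationLevel if lvl <= level]
```

For example, `checks_for(ValidationLevel.THOROUGH)` yields existence checks, functionality tests, and integration tests, but not chaos/edge cases.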
**Special Attention: Phase 5**
Phase 5 (Agent Bootstrapping) receives extra validation as the current focus.
### 5. Error Injector (`error_injector.py`)
Controlled fault injection for testing oversight.
**Predefined Scenarios:**
| Scenario | Type | Target |
|----------|------|--------|
| missing_config | MISSING_FILE | agents/tier0-agent/config/agent.json |
| corrupted_status | INVALID_STATUS | checkpoint/STATUS.md |
| stale_checkpoint | STATE_INCONSISTENCY | checkpoint/storage/ |
| redis_key_missing | DEPENDENCY_FAILURE | oversight:watcher |
| violation_unacked | SECURITY_VIOLATION | violations table |
| blocked_directory | STATE_INCONSISTENCY | preflight/ |
**Safe Mode:** By default, injections are simulated without modifying files.
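
The safe-mode behavior can be pictured with a small sketch. This is not the real `error_injector.py`: the `Scenario` dataclass, the `inject()` function, its return shape, and the `.injected-bak` backup suffix are all assumptions made for illustration. Only two scenarios from the table are included.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Scenario:
    name: str
    injection_type: str
    target: str


# Two entries transcribed from the scenario table above.
SCENARIOS = {
    "missing_config": Scenario(
        "missing_config", "MISSING_FILE", "agents/tier0-agent/config/agent.json"
    ),
    "corrupted_status": Scenario(
        "corrupted_status", "INVALID_STATUS", "checkpoint/STATUS.md"
    ),
}


def inject(name: str, root: str, safe_mode: bool = True) -> dict:
    """Simulate (default) or perform an injection for a named scenario."""
    scenario = SCENARIOS[name]
    target = Path(root) / scenario.target
    if safe_mode:
        # Safe mode: describe what would happen without touching the filesystem.
        return {"scenario": name, "target": str(target), "simulated": True, "modified": False}
    if scenario.injection_type == "MISSING_FILE":
        # Hypothetical real injection: move the file aside so cleanup can restore it.
        backup = target.with_name(target.name + ".injected-bak")
        target.rename(backup)
        return {"scenario": name, "target": str(target), "simulated": False,
                "modified": True, "backup": str(backup)}
    raise NotImplementedError(f"real injection not sketched for {scenario.injection_type}")
```

Defaulting `safe_mode` to `True` means a caller has to opt in explicitly before any file is changed.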
### 6. Reporter (`reporter.py`)
Comprehensive reporting with actionable insights.
**Report Sections:**
- Executive Summary
- Phase Status Matrix
- Bug Watcher Summary
- Suggestion Engine Summary
- Council Decisions
- Injection Test Results
- Pending Actions
- Critical Issues
## CLI Usage
```bash
# Full pipeline execution
oversight run
# With injection tests
oversight run --inject
# Focus on specific phase
oversight run --phase 5
# Quick validation
oversight quick
# Validate specific phase in detail
oversight validate --phase 5
# Generate report only
oversight report
# Show phase matrix
oversight matrix
# JSON output
oversight run --json
```
## Integration with Checkpoints/Status/Memory
### Checkpoints
- Watcher checks checkpoint consistency and staleness
- Suggestions reference recent checkpoint context
- Reports include checkpoint link
### STATUS.md Files
- Watcher scans STATUS.md for BLOCKED states
- Phase validator checks STATUS.md existence
- Reports track per-directory status
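
The STATUS.md scan can be approximated with a short directory walk. This is a minimal sketch: `find_blocked()` is a hypothetical helper, and it assumes a blocked directory can be recognized by the literal string `BLOCKED` appearing anywhere in its STATUS.md, which may be looser than the real parser.

```python
from pathlib import Path


def find_blocked(root: str) -> list[Path]:
    """Return directories under root whose STATUS.md mentions BLOCKED."""
    blocked = []
    for status_file in sorted(Path(root).rglob("STATUS.md")):
        text = status_file.read_text(encoding="utf-8", errors="replace")
        # ASSUMPTION: a plain substring match stands in for real status parsing.
        if "BLOCKED" in text:
            blocked.append(status_file.parent)
    return blocked
```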
### Memory Layer
- Suggestion engine queries memory for related entries
- Context gathered from summaries directory
- Report counts available memory entries
## Running Tests
### Injection Test Suite
```bash
# Run all injection scenarios
oversight run --inject
# Or use the injector directly (run from the repository root so the `testing` package resolves)
cd /opt/agent-governance
python -m testing.oversight.error_injector test-all
```
### Expected Results
A healthy system should:
1. Detect all injected errors (100% detection rate)
2. Generate accurate, relevant suggestions for each detected error
3. Produce council decisions for each suggestion
4. Pass all injection tests
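
The 100% detection target can be checked with a small helper. `detection_rate()` and `missed()` are hypothetical functions written for this sketch; they assume injected and detected scenarios are identified by the scenario names used in `SCENARIOS`.

```python
def detection_rate(injected: list[str], detected: list[str]) -> float:
    """Fraction of injected scenarios that the watcher reported."""
    injected_set = set(injected)
    if not injected_set:
        raise ValueError("no scenarios were injected")
    return len(injected_set & set(detected)) / len(injected_set)


def missed(injected: list[str], detected: list[str]) -> list[str]:
    """Scenario names that were injected but never detected."""
    return sorted(set(injected) - set(detected))
```

A healthy run would give a rate of 1.0 and an empty `missed()` list.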
## Extending the Pipeline
### Adding a New Anomaly Type
1. Add to `AnomalyType` enum in `bug_watcher.py`
2. Add detection logic in `_run_phase_specific_checks()`
3. Add fix patterns in `SuggestionEngine.FIX_PATTERNS`
### Adding a New Council Reviewer
1. Add role to `ReviewerRole` enum in `council.py`
2. Create `ReviewerProfile` in `REVIEWERS` dict
3. Implement `_<role>_review()` method
### Adding a New Injection Scenario
1. Add to `SCENARIOS` dict in `error_injector.py`
2. Implement injection/cleanup in `_perform_injection()`
## File Structure
```
testing/oversight/
├── __init__.py # Package exports
├── pipeline.py # Main orchestrator
├── bug_watcher.py # Real-time anomaly detection
├── suggestion_engine.py # Fix recommendations
├── council.py # Multi-agent review
├── phase_validator.py # Phase coverage
├── error_injector.py # Fault injection
├── reporter.py # Report generation
├── README.md # This file
└── reports/ # Generated reports
```
## Example Report
```
# Architectural Test Pipeline Report
**Generated:** 2026-01-23T12:00:00Z
**Report ID:** rpt-20260123-120000
## Executive Summary
- **Phases Validated:** 12
- **Average Coverage:** 75.3%
- **Total Anomalies:** 8
- **Critical Gaps:** 2
## Phase Status Matrix
| Phase | Name | Status | Coverage | Bugs |
|-------|------|--------|----------|------|
| 1 | Foundation | ✅ complete | 95.0% | 0 |
| 5 | Agent Bootstrapping | 🚧 in_progress | 80.0% | 2 |
| 8 | Production Hardening | ❌ blocked | 40.0% | 3 |
...
```
## Troubleshooting
### Pipeline Fails to Start
- Verify DragonflyDB is running: `redis-cli -p 6379 -a governance2026 PING`
- Check Vault status: `docker exec vault vault status`
### No Anomalies Detected
- Ensure STATUS.md files exist in directories
- Check checkpoint storage has recent entries
### Injection Tests Fail
- Verify safe mode is enabled (default)
- Check file permissions in target directories
## Related Documentation
- [CONTEXT_MANAGEMENT.md](../../docs/CONTEXT_MANAGEMENT.md) - Checkpoints and STATUS
- [MEMORY_LAYER.md](../../docs/MEMORY_LAYER.md) - External memory
- [STATUS_PROTOCOL.md](../../docs/STATUS_PROTOCOL.md) - Directory status protocol