# Architectural Test Pipeline

> Multi-layer oversight system ensuring no single hidden bug can compromise the Agent Governance System.

## Overview

The Architectural Test Pipeline provides continuous validation across all 12 phases through multiple oversight layers that monitor, analyze, review, and report on system health.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         ARCHITECTURAL TEST PIPELINE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐          │
│  │ Bug Window      │───▶│ Suggestion      │───▶│ Council         │          │
│  │ Watcher         │    │ Engine          │    │ Review          │          │
│  │                 │    │                 │    │                 │          │
│  │ • Real-time     │    │ • Context-aware │    │ • Safety        │          │
│  │ • All phases    │    │ • Risk-ranked   │    │ • Performance   │          │
│  │ • Anomalies     │    │ • Auto-fixable  │    │ • Architecture  │          │
│  └────────┬────────┘    └────────┬────────┘    │ • Compliance    │          │
│           │                      │             │ • Quality       │          │
│           │                      │             └────────┬────────┘          │
│           │                      │                      │                   │
│           ▼                      ▼                      ▼                   │
│  ┌───────────────────────────────────────────────────────────────┐          │
│  │                        Phase Validator                        │          │
│  │ Phase 1 ✅ │ Phase 2 ✅ │ Phase 3 ✅ │ Phase 4 ✅ │ ...       │          │
│  │ Phase 5 ⭐ │ Phase 6 ✅ │ Phase 7 ✅ │ Phase 8 🚧 │ ...       │          │
│  └───────────────────────────────────────────────────────────────┘          │
│                                  │                                          │
│                                  ▼                                          │
│  ┌─────────────────┐    ┌─────────────────┐                                 │
│  │ Error Injector  │    │ Reporter        │                                 │
│  │                 │    │                 │                                 │
│  │ • Safe mode     │    │ • Markdown      │                                 │
│  │ • Scenarios     │    │ • Per-phase     │                                 │
│  │ • Validation    │    │ • Actions       │                                 │
│  └─────────────────┘    └─────────────────┘                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Oversight Layers

### 1. Bug Window Watcher (`bug_watcher.py`)

Real-time monitoring of every pipeline stage.

**Features:**
- Monitors all 12 phases continuously
- Detects anomalies: errors, regressions, missing artifacts, state inconsistencies
- Links findings to phase, directory, STATUS.md, and checkpoint entries
- Persists to DragonflyDB for cross-session tracking

**Anomaly Types:**
| Type | Description | Severity Range |
|------|-------------|----------------|
| UNHANDLED_ERROR | Uncaught exceptions | Medium-Critical |
| REGRESSION | Behavior change from baseline | High |
| MISSING_ARTIFACT | Required file/config missing | Low-High |
| STATE_INCONSISTENCY | Status mismatch | Medium |
| DEPENDENCY_UNAVAILABLE | Vault/Dragonfly/Ledger down | Critical |
| SECURITY_VIOLATION | Unacknowledged violation | Critical |

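
The anomaly table above can be read as a mapping from each type to an allowed severity band. The following is a minimal sketch of that reading, not the code in `bug_watcher.py`: only the `AnomalyType` names come from this document, while the `Severity` enum, the `SEVERITY_RANGE` dict, and `clamp_severity()` are hypothetical names introduced for illustration.

```python
from enum import Enum, IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale (not taken from bug_watcher.py)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class AnomalyType(Enum):
    UNHANDLED_ERROR = "unhandled_error"
    REGRESSION = "regression"
    MISSING_ARTIFACT = "missing_artifact"
    STATE_INCONSISTENCY = "state_inconsistency"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    SECURITY_VIOLATION = "security_violation"


# (lowest, highest) severity per anomaly type, transcribed from the table.
SEVERITY_RANGE = {
    AnomalyType.UNHANDLED_ERROR: (Severity.MEDIUM, Severity.CRITICAL),
    AnomalyType.REGRESSION: (Severity.HIGH, Severity.HIGH),
    AnomalyType.MISSING_ARTIFACT: (Severity.LOW, Severity.HIGH),
    AnomalyType.STATE_INCONSISTENCY: (Severity.MEDIUM, Severity.MEDIUM),
    AnomalyType.DEPENDENCY_UNAVAILABLE: (Severity.CRITICAL, Severity.CRITICAL),
    AnomalyType.SECURITY_VIOLATION: (Severity.CRITICAL, Severity.CRITICAL),
}


def clamp_severity(kind: AnomalyType, proposed: Severity) -> Severity:
    """Force a proposed severity into the band allowed for this anomaly type."""
    low, high = SEVERITY_RANGE[kind]
    return Severity(max(low, min(high, proposed)))
```

Under this sketch, a `MISSING_ARTIFACT` reported as critical would be capped at high, and a `DEPENDENCY_UNAVAILABLE` reported as low would be raised to critical.
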
### 2. Suggestion Engine (`suggestion_engine.py`)

AI-driven analysis using historical context.

**Features:**
- Gathers context from checkpoints, memory, STATUS files
- Pattern-based suggestions from known fixes
- Context-aware suggestions from historical outcomes
- Risk/impact ranking for prioritization

**Suggestion Ranking:**
```
Priority Score = Impact × (1 - Risk)

Impact Levels: transformative (1.0) > high (0.8) > medium (0.6) > low (0.4)
Risk Levels:   critical (0.2) < high (0.4) < medium (0.6) < low (0.8)
```

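
As a worked example of the ranking formula, the sketch below applies it literally with the level values listed above. The dict and function names (`IMPACT`, `RISK`, `priority_score`, `rank`) and the shape of a suggestion record are assumptions made for illustration, not the API of `suggestion_engine.py`.

```python
# Level values copied from the ranking block above.
IMPACT = {"transformative": 1.0, "high": 0.8, "medium": 0.6, "low": 0.4}
RISK = {"critical": 0.2, "high": 0.4, "medium": 0.6, "low": 0.8}


def priority_score(impact: str, risk: str) -> float:
    """Priority Score = Impact x (1 - Risk), applied exactly as written."""
    return IMPACT[impact] * (1 - RISK[risk])


def rank(suggestions: list[dict]) -> list[dict]:
    """Return suggestions sorted by descending priority score.

    Each suggestion is assumed to be a dict with "impact" and "risk" keys.
    """
    return sorted(
        suggestions,
        key=lambda s: priority_score(s["impact"], s["risk"]),
        reverse=True,
    )
```

For a high-impact, medium-risk suggestion this gives 0.8 × (1 − 0.6) = 0.32. Note that with the values as listed, the `(1 - Risk)` factor is 0.8 for critical risk and only 0.2 for low risk, so a literal reading ranks riskier suggestions higher. If the listed risk values are intended as safety scores, the engine may use them differently; `suggestion_engine.py` is the place to confirm which convention applies.
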
### 3. Council Review (`council.py`)

Multi-perspective review by five specialized reviewers.

**Reviewers:**
| Role | Focus | Risk Tolerance |
|------|-------|----------------|
| Safety | Security, access control | Very Low (0.2) |
| Performance | Latency, throughput | Medium (0.6) |
| Architecture | Design, maintainability | Medium (0.5) |
| Compliance | Governance, policies | Low (0.3) |
| Quality | Testing, documentation | Low (0.4) |

**Decision Types:**
- `AUTO_APPROVE` - Safe to auto-implement
- `HUMAN_APPROVE` - Approved by the council, but needs human approval
- `DEFER` - Needs more discussion
- `REJECT` - Do not implement
- `ESCALATE` - Needs higher authority

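
One way to picture the reviewer table is as a registry keyed by role. The sketch below is illustrative only: `ReviewerRole`, `ReviewerProfile`, and `REVIEWERS` are names this document mentions, but the profile fields, the 0-to-1 `risk_score` scale (higher meaning riskier), and the `objecting_reviewers()` helper are assumptions, and the real council may combine votes quite differently.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewerRole(Enum):
    SAFETY = "safety"
    PERFORMANCE = "performance"
    ARCHITECTURE = "architecture"
    COMPLIANCE = "compliance"
    QUALITY = "quality"


@dataclass(frozen=True)
class ReviewerProfile:
    # Field names are hypothetical; values come from the reviewer table.
    focus: str
    risk_tolerance: float


REVIEWERS = {
    ReviewerRole.SAFETY: ReviewerProfile("Security, access control", 0.2),
    ReviewerRole.PERFORMANCE: ReviewerProfile("Latency, throughput", 0.6),
    ReviewerRole.ARCHITECTURE: ReviewerProfile("Design, maintainability", 0.5),
    ReviewerRole.COMPLIANCE: ReviewerProfile("Governance, policies", 0.3),
    ReviewerRole.QUALITY: ReviewerProfile("Testing, documentation", 0.4),
}


def objecting_reviewers(risk_score: float) -> list[ReviewerRole]:
    """Roles whose tolerance is exceeded by a change with this risk score."""
    return [
        role
        for role, profile in REVIEWERS.items()
        if risk_score > profile.risk_tolerance
    ]
```

With these assumptions, a change scored at 0.45 would exceed the tolerance of the Safety, Compliance, and Quality reviewers but not Architecture or Performance. How such objections map onto the decision types above is left to `council.py`.
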
### 4. Phase Validator (`phase_validator.py`)

Ensures all phases have required components.

**Validation Levels:**
| Level | Description |
|-------|-------------|
| BASIC | Existence checks only |
| STANDARD | + Functionality tests |
| THOROUGH | + Integration tests |
| COMPREHENSIVE | + Chaos/edge cases |

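
The `+` prefixes in the table indicate that each level includes everything from the levels before it. A small sketch of that cumulative behavior follows; the `ValidationLevel` enum, the `ADDED_CHECKS` dict, and `checks_for()` are hypothetical names, not necessarily those used in `phase_validator.py`.

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    BASIC = 1
    STANDARD = 2
    THOROUGH = 3
    COMPREHENSIVE = 4


# What each level adds on top of the previous one, per the table.
ADDED_CHECKS = {
    ValidationLevel.BASIC: "existence checks",
    ValidationLevel.STANDARD: "functionality tests",
    ValidationLevel.THOROUGH: "integration tests",
    ValidationLevel.COMPREHENSIVE: "chaos/edge cases",
}


def checks_for(level: ValidationLevel) -> list[str]:
    """All check categories that run at `level`, lowest level first."""
    return [ADDED_CHECKS[lvl] for lvl in ValidationLevel if lvl <= level]
```

For instance, `checks_for(ValidationLevel.THOROUGH)` would list existence checks, functionality tests, and integration tests, but not chaos/edge cases.
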
**Special Attention: Phase 5**
Phase 5 (Agent Bootstrapping) receives extra validation because it is the current development focus.

### 5. Error Injector (`error_injector.py`)

Controlled fault injection for testing oversight.

**Predefined Scenarios:**
| Scenario | Type | Target |
|----------|------|--------|
| missing_config | MISSING_FILE | agents/tier0-agent/config/agent.json |
| corrupted_status | INVALID_STATUS | checkpoint/STATUS.md |
| stale_checkpoint | STATE_INCONSISTENCY | checkpoint/storage/ |
| redis_key_missing | DEPENDENCY_FAILURE | oversight:watcher |
| violation_unacked | SECURITY_VIOLATION | violations table |
| blocked_directory | STATE_INCONSISTENCY | preflight/ |

**Safe Mode:** By default, injections are simulated without modifying files.

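
To illustrate the difference between a simulated and a real injection, here is a minimal sketch for a `MISSING_FILE` style scenario. The `missing_file()` context manager, its return value, and the `.injected-bak` backup suffix are hypothetical; the actual `error_injector.py` may implement safe mode and cleanup differently.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def missing_file(target: Path, safe_mode: bool = True):
    """Make `target` appear missing for the duration of the `with` block.

    In safe mode nothing on disk changes; the injection is only described.
    Otherwise the file is renamed aside and always restored on exit.
    """
    if safe_mode:
        # Simulated injection: report what would happen, touch nothing.
        yield {"target": str(target), "simulated": True}
        return

    backup = target.with_name(target.name + ".injected-bak")
    target.rename(backup)  # real injection: the file disappears
    try:
        yield {"target": str(target), "simulated": False}
    finally:
        backup.rename(target)  # cleanup runs even if the block raises
```

Wrapping the real injection in `try`/`finally` is the important design point: a failed detection test should never leave the target file missing.
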
### 6. Reporter (`reporter.py`)

Comprehensive reporting with actionable insights.

**Report Sections:**
- Executive Summary
- Phase Status Matrix
- Bug Watcher Summary
- Suggestion Engine Summary
- Council Decisions
- Injection Test Results
- Pending Actions
- Critical Issues

## CLI Usage

```bash
# Full pipeline execution
oversight run

# With injection tests
oversight run --inject

# Focus on specific phase
oversight run --phase 5

# Quick validation
oversight quick

# Validate specific phase in detail
oversight validate --phase 5

# Generate report only
oversight report

# Show phase matrix
oversight matrix

# JSON output
oversight run --json
```

## Integration with Checkpoints/Status/Memory

### Checkpoints
- Watcher checks checkpoint consistency and staleness
- Suggestions reference recent checkpoint context
- Reports include a link to the checkpoint

### STATUS.md Files
- Watcher scans STATUS.md for BLOCKED states
- Phase validator checks STATUS.md existence
- Reports track per-directory status

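
The STATUS.md scan described above could look roughly like the sketch below. It assumes, without confirmation from this document, that a blocked directory is marked by the literal word `BLOCKED` somewhere in its STATUS.md; the `find_blocked()` helper is hypothetical and the real watcher may parse the file more strictly.

```python
from pathlib import Path


def find_blocked(root: Path) -> list[Path]:
    """Return directories under `root` whose STATUS.md mentions BLOCKED."""
    blocked = []
    for status_file in sorted(root.rglob("STATUS.md")):
        text = status_file.read_text(encoding="utf-8", errors="replace")
        if "BLOCKED" in text:  # assumed marker, see the note above
            blocked.append(status_file.parent)
    return blocked
```

Calling `find_blocked(Path("/opt/agent-governance"))` would then list the directories to investigate first.
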
### Memory Layer
- Suggestion engine queries memory for related entries
- Context gathered from summaries directory
- Report counts available memory entries

## Running Tests

### Injection Test Suite

```bash
# Run all injection scenarios
oversight run --inject

# Or use the injector directly (run from the repository root so the `testing` package is importable)
cd /opt/agent-governance
python -m testing.oversight.error_injector test-all
```

### Expected Results

A healthy system should:
1. Detect all injected errors (100% detection rate)
2. Generate a relevant suggestion for each detected anomaly
3. Produce council decisions for each suggestion
4. Pass all injection tests

## Extending the Pipeline

### Adding a New Anomaly Type

1. Add to `AnomalyType` enum in `bug_watcher.py`
2. Add detection logic in `_run_phase_specific_checks()`
3. Add fix patterns in `SuggestionEngine.FIX_PATTERNS`

### Adding a New Council Reviewer

1. Add role to `ReviewerRole` enum in `council.py`
2. Create `ReviewerProfile` in `REVIEWERS` dict
3. Implement `_<role>_review()` method

### Adding a New Injection Scenario

1. Add to `SCENARIOS` dict in `error_injector.py`
2. Implement injection/cleanup in `_perform_injection()`

## File Structure

```
testing/oversight/
├── __init__.py           # Package exports
├── pipeline.py           # Main orchestrator
├── bug_watcher.py        # Real-time anomaly detection
├── suggestion_engine.py  # Fix recommendations
├── council.py            # Multi-agent review
├── phase_validator.py    # Phase coverage
├── error_injector.py     # Fault injection
├── reporter.py           # Report generation
├── README.md             # This file
└── reports/              # Generated reports
```

## Example Report

```
# Architectural Test Pipeline Report

**Generated:** 2026-01-23T12:00:00Z
**Report ID:** rpt-20260123-120000

## Executive Summary

- **Phases Validated:** 12
- **Average Coverage:** 75.3%
- **Total Anomalies:** 8
- **Critical Gaps:** 2

## Phase Status Matrix

| Phase | Name | Status | Coverage | Bugs |
|-------|------|--------|----------|------|
| 1 | Foundation | ✅ complete | 95.0% | 0 |
| 5 | Agent Bootstrapping | 🚧 in_progress | 80.0% | 2 |
| 8 | Production Hardening | ❌ blocked | 40.0% | 3 |

...
```

## Troubleshooting

### Pipeline Fails to Start
- Verify DragonflyDB is running: `redis-cli -p 6379 -a governance2026 PING`
- Check Vault status: `docker exec vault vault status`

### No Anomalies Detected
- Ensure STATUS.md files exist in directories
- Check checkpoint storage has recent entries

### Injection Tests Fail
- Verify safe mode is enabled (default)
- Check file permissions in target directories

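
When the pipeline fails to start, a quick scripted pre-check can confirm that something is listening on the DragonflyDB port before running the commands above. The sketch below uses only the standard library; the `port_open()` helper and the `127.0.0.1` host are assumptions, while port 6379 is taken from the `redis-cli` command in this section.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host is an assumption; adjust to wherever DragonflyDB actually runs.
    print("DragonflyDB port reachable:", port_open("127.0.0.1", 6379))
```

An open port only shows that a process is listening, so the authenticated `PING` above remains the stronger check.
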
## Related Documentation

- [CONTEXT_MANAGEMENT.md](../../docs/CONTEXT_MANAGEMENT.md) - Checkpoints and STATUS
- [MEMORY_LAYER.md](../../docs/MEMORY_LAYER.md) - External memory
- [STATUS_PROTOCOL.md](../../docs/STATUS_PROTOCOL.md) - Directory status protocol