agent-governance/testing/oversight/README.md

# Architectural Test Pipeline

> Multi-layer oversight system ensuring no single hidden bug can compromise the Agent Governance System.

## Overview

The Architectural Test Pipeline provides continuous validation across all 12 phases through multiple oversight layers that monitor, analyze, review, and report on system health.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    ARCHITECTURAL TEST PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ Bug Window      │───▶│ Suggestion      │───▶│ Council         │         │
│  │ Watcher         │    │ Engine          │    │ Review          │         │
│  │                 │    │                 │    │                 │         │
│  │ • Real-time     │    │ • Context-aware │    │ • Safety        │         │
│  │ • All phases    │    │ • Risk-ranked   │    │ • Performance   │         │
│  │ • Anomalies     │    │ • Auto-fixable  │    │ • Architecture  │         │
│  └────────┬────────┘    └────────┬────────┘    │ • Compliance    │         │
│           │                      │             │ • Quality       │         │
│           │                      │             └────────┬────────┘         │
│           │                      │                      │                  │
│           ▼                      ▼                      ▼                  │
│  ┌─────────────────────────────────────────────────────────────┐           │
│  │                    Phase Validator                          │           │
│  │  Phase 1 ✅ │ Phase 2 ✅ │ Phase 3 ✅ │ Phase 4 ✅ │ ... │           │
│  │  Phase 5 ⭐ │ Phase 6 ✅ │ Phase 7 ✅ │ Phase 8 🚧 │ ... │           │
│  └─────────────────────────────────────────────────────────────┘           │
│                              │                                              │
│                              ▼                                              │
│  ┌─────────────────┐    ┌─────────────────┐                               │
│  │ Error Injector  │    │ Reporter        │                               │
│  │                 │    │                 │                               │
│  │ • Safe mode     │    │ • Markdown      │                               │
│  │ • Scenarios     │    │ • Per-phase     │                               │
│  │ • Validation    │    │ • Actions       │                               │
│  └─────────────────┘    └─────────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Oversight Layers

### 1. Bug Window Watcher (`bug_watcher.py`)

Real-time monitoring of every pipeline stage.

**Features:**
- Monitors all 12 phases continuously
- Detects anomalies: errors, regressions, missing artifacts, state inconsistencies
- Links findings to phase, directory, STATUS.md, and checkpoint entries
- Persists to DragonflyDB for cross-session tracking

**Anomaly Types:**
| Type | Description | Severity Range |
|------|-------------|----------------|
| UNHANDLED_ERROR | Uncaught exceptions | Medium-Critical |
| REGRESSION | Behavior change from baseline | High |
| MISSING_ARTIFACT | Required file/config missing | Low-High |
| STATE_INCONSISTENCY | Status mismatch | Medium |
| DEPENDENCY_UNAVAILABLE | Vault/Dragonfly/Ledger down | Critical |
| SECURITY_VIOLATION | Unacknowledged violation | Critical |
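
As a rough illustration, the table above could be modelled as an enum plus a severity-range lookup. `AnomalyType` is the enum name this README refers to elsewhere; `Severity`, `SEVERITY_RANGE`, and `is_valid_severity` are hypothetical names introduced for this sketch and are not taken from `bug_watcher.py`.

```python
from enum import Enum, IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale (assumption, not from the source)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class AnomalyType(Enum):
    UNHANDLED_ERROR = "unhandled_error"
    REGRESSION = "regression"
    MISSING_ARTIFACT = "missing_artifact"
    STATE_INCONSISTENCY = "state_inconsistency"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    SECURITY_VIOLATION = "security_violation"


# (lowest, highest) severity each type may carry, copied from the table.
SEVERITY_RANGE = {
    AnomalyType.UNHANDLED_ERROR: (Severity.MEDIUM, Severity.CRITICAL),
    AnomalyType.REGRESSION: (Severity.HIGH, Severity.HIGH),
    AnomalyType.MISSING_ARTIFACT: (Severity.LOW, Severity.HIGH),
    AnomalyType.STATE_INCONSISTENCY: (Severity.MEDIUM, Severity.MEDIUM),
    AnomalyType.DEPENDENCY_UNAVAILABLE: (Severity.CRITICAL, Severity.CRITICAL),
    AnomalyType.SECURITY_VIOLATION: (Severity.CRITICAL, Severity.CRITICAL),
}


def is_valid_severity(anomaly_type: AnomalyType, severity: Severity) -> bool:
    """Return True if the severity falls inside the type's documented range."""
    lowest, highest = SEVERITY_RANGE[anomaly_type]
    return lowest <= severity <= highest
```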

### 2. Suggestion Engine (`suggestion_engine.py`)

AI-driven analysis using historical context.

**Features:**
- Gathers context from checkpoints, memory, STATUS files
- Pattern-based suggestions from known fixes
- Context-aware suggestions from historical outcomes
- Risk/impact ranking for prioritization

**Suggestion Ranking:**
```
Priority Score = Impact × (1 - Risk)

Impact Levels: transformative (1.0) > high (0.8) > medium (0.6) > low (0.4)
Risk Levels: critical (0.2) < high (0.4) < medium (0.6) < low (0.8)
```
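
The sketch below applies the formula above literally, using the listed values verbatim. The function names `priority_score` and `rank` are hypothetical. Note that under this literal reading a `critical` risk (0.2) yields a multiplier of 0.8, so it scores higher than a `low` risk (0.8) at equal impact. If the listed risk values are already inverted weights, the engine may multiply by them directly; check `suggestion_engine.py` before relying on either reading.

```python
# Values copied from the ranking description above.
IMPACT = {"transformative": 1.0, "high": 0.8, "medium": 0.6, "low": 0.4}
RISK = {"critical": 0.2, "high": 0.4, "medium": 0.6, "low": 0.8}


def priority_score(impact: str, risk: str) -> float:
    """Compute Impact x (1 - Risk) exactly as the formula is written."""
    return IMPACT[impact] * (1 - RISK[risk])


def rank(suggestions):
    """Sort (name, impact, risk) tuples by descending priority score."""
    return sorted(
        suggestions,
        key=lambda s: priority_score(s[1], s[2]),
        reverse=True,
    )
```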

### 3. Council Review (`council.py`)

Multi-perspective review with 5 specialized reviewers.

**Reviewers:**
| Role | Focus | Risk Tolerance |
|------|-------|----------------|
| Safety | Security, access control | Very Low (0.2) |
| Performance | Latency, throughput | Medium (0.6) |
| Architecture | Design, maintainability | Medium (0.5) |
| Compliance | Governance, policies | Low (0.3) |
| Quality | Testing, documentation | Low (0.4) |

**Decision Types:**
- `AUTO_APPROVE` - Safe to auto-implement
- `HUMAN_APPROVE` - Approved by the council, but requires human sign-off
- `DEFER` - Needs more discussion
- `REJECT` - Do not implement
- `ESCALATE` - Needs higher authority
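
To make the relationship between reviewer tolerances and decision types concrete, here is a deliberately simplified sketch. The tolerances are copied from the reviewer table, but the voting rule itself is an assumption made for illustration: `council_decision`, the `risk_score` scale (0 to 1, higher meaning riskier), and the thresholds that map vote counts to decisions are all hypothetical. The logic in `council.py` may differ.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN_APPROVE = "human_approve"
    DEFER = "defer"
    REJECT = "reject"
    ESCALATE = "escalate"


# Risk tolerances copied from the reviewer table above.
RISK_TOLERANCE = {
    "safety": 0.2,
    "performance": 0.6,
    "architecture": 0.5,
    "compliance": 0.3,
    "quality": 0.4,
}


def council_decision(risk_score: float, auto_fixable: bool) -> Decision:
    """Hypothetical vote: a reviewer accepts when risk_score <= its tolerance."""
    accepts = sum(risk_score <= tol for tol in RISK_TOLERANCE.values())
    if accepts == len(RISK_TOLERANCE):
        # Unanimous: auto-implement only if the fix is marked auto-fixable.
        return Decision.AUTO_APPROVE if auto_fixable else Decision.HUMAN_APPROVE
    if accepts == 0:
        return Decision.REJECT
    if accepts * 2 > len(RISK_TOLERANCE):
        return Decision.DEFER  # majority accepts, but not unanimous
    return Decision.ESCALATE  # only a minority accepts
```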

### 4. Phase Validator (`phase_validator.py`)

Ensures all phases have required components.

**Validation Levels:**
| Level | Description |
|-------|-------------|
| BASIC | Existence checks only |
| STANDARD | + Functionality tests |
| THOROUGH | + Integration tests |
| COMPREHENSIVE | + Chaos/edge cases |
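
The `+` prefix in the table suggests that each level includes everything from the levels below it. A minimal sketch of that cumulative behaviour follows; `ValidationLevel`, `CHECKS_ADDED`, and `checks_for` are hypothetical names, and the check-group labels are shorthand for the table's descriptions.

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    BASIC = 1
    STANDARD = 2
    THOROUGH = 3
    COMPREHENSIVE = 4


# Check group introduced at each level (labels are illustrative).
CHECKS_ADDED = {
    ValidationLevel.BASIC: "existence",
    ValidationLevel.STANDARD: "functionality",
    ValidationLevel.THOROUGH: "integration",
    ValidationLevel.COMPREHENSIVE: "chaos_edge_cases",
}


def checks_for(level: ValidationLevel) -> list:
    """Return every check group that applies at or below the given level."""
    return [CHECKS_ADDED[lvl] for lvl in ValidationLevel if lvl <= level]
```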

**Special Attention: Phase 5**

Phase 5 (Agent Bootstrapping) receives extra validation as the current focus.

### 5. Error Injector (`error_injector.py`)

Controlled fault injection for testing oversight.

**Predefined Scenarios:**
| Scenario | Type | Target |
|----------|------|--------|
| missing_config | MISSING_FILE | agents/tier0-agent/config/agent.json |
| corrupted_status | INVALID_STATUS | checkpoint/STATUS.md |
| stale_checkpoint | STATE_INCONSISTENCY | checkpoint/storage/ |
| redis_key_missing | DEPENDENCY_FAILURE | oversight:watcher |
| violation_unacked | SECURITY_VIOLATION | violations table |
| blocked_directory | STATE_INCONSISTENCY | preflight/ |

**Safe Mode:** By default, injections are simulated without modifying files.
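
One way to picture safe mode is a function that only records the intended change unless it is explicitly told to apply it. The sketch below is an assumption about how such a switch could work, not the code in `error_injector.py`; `InjectionResult`, `inject_missing_file`, and the `.injected-bak` suffix are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class InjectionResult:
    scenario: str
    target: str
    simulated: bool  # True when nothing on disk was changed
    applied: bool    # True when the fault was really introduced


def inject_missing_file(target: Path, safe_mode: bool = True) -> InjectionResult:
    """Simulate, or actually perform, a MISSING_FILE fault on `target`."""
    if safe_mode:
        # Safe mode: report what would happen and leave the file alone.
        return InjectionResult("missing_config", str(target), True, False)
    # Unsafe mode: move the file aside so it can be restored during cleanup.
    backup = target.with_name(target.name + ".injected-bak")
    target.rename(backup)
    return InjectionResult("missing_config", str(target), False, True)
```

Moving the file aside rather than deleting it keeps cleanup to a single rename back to the original name.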

### 6. Reporter (`reporter.py`)

Comprehensive reporting with actionable insights.

**Report Sections:**
- Executive Summary
- Phase Status Matrix
- Bug Watcher Summary
- Suggestion Engine Summary
- Council Decisions
- Injection Test Results
- Pending Actions
- Critical Issues

## CLI Usage

```bash
# Full pipeline execution
oversight run

# With injection tests
oversight run --inject

# Focus on specific phase
oversight run --phase 5

# Quick validation
oversight quick

# Validate specific phase in detail
oversight validate --phase 5

# Generate report only
oversight report

# Show phase matrix
oversight matrix

# JSON output
oversight run --json
```

## Integration with Checkpoints/Status/Memory

### Checkpoints
- Watcher checks checkpoint consistency and staleness
- Suggestions reference recent checkpoint context
- Reports include a link to the relevant checkpoint

### STATUS.md Files
- Watcher scans STATUS.md for BLOCKED states
- Phase validator checks STATUS.md existence
- Reports track per-directory status
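
The BLOCKED scan can be approximated with a recursive search. This sketch assumes that a blocked directory's `STATUS.md` contains the literal word `BLOCKED`; the real file format is defined by the status protocol, and `find_blocked` is a hypothetical helper.

```python
from pathlib import Path


def find_blocked(root: Path) -> list:
    """Return directories under `root` whose STATUS.md mentions BLOCKED."""
    blocked = []
    for status_file in sorted(root.rglob("STATUS.md")):
        text = status_file.read_text(encoding="utf-8", errors="replace")
        # Assumption: the state appears as the literal token "BLOCKED".
        if "BLOCKED" in text:
            blocked.append(status_file.parent)
    return blocked
```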

### Memory Layer
- Suggestion engine queries memory for related entries
- Context gathered from summaries directory
- Report counts available memory entries

## Running Tests

### Injection Test Suite

```bash
# Run all injection scenarios
oversight run --inject

# Or invoke the injector module directly (run from the repository root
# so that the `testing` package is importable)
cd /opt/agent-governance
python -m testing.oversight.error_injector test-all
```

### Expected Results

A healthy system should:
1. Detect all injected errors (100% detection rate)
2. Generate suggestions that are relevant to each detected anomaly
3. Produce council decisions for each suggestion
4. Pass all injection tests
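
The 100% detection criterion can be checked with a small helper once per-scenario outcomes are known. The scenario names below come from the predefined scenarios table; the shape of the results (a mapping from scenario name to a detected flag) and the helper names `detection_rate` and `missed` are assumptions for this sketch.

```python
# Scenario names copied from the predefined scenarios table.
SCENARIOS = [
    "missing_config",
    "corrupted_status",
    "stale_checkpoint",
    "redis_key_missing",
    "violation_unacked",
    "blocked_directory",
]


def detection_rate(results: dict) -> float:
    """Fraction of injected scenarios that the watcher detected."""
    if not results:
        return 0.0
    return sum(1 for detected in results.values() if detected) / len(results)


def missed(results: dict) -> list:
    """Names of scenarios that were injected but not detected."""
    return [name for name, detected in results.items() if not detected]
```

A healthy run would give `detection_rate(results) == 1.0` and an empty `missed(results)` list.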

## Extending the Pipeline

### Adding a New Anomaly Type

1. Add to `AnomalyType` enum in `bug_watcher.py`
2. Add detection logic in `_run_phase_specific_checks()`
3. Add fix patterns in `SuggestionEngine.FIX_PATTERNS`

### Adding a New Council Reviewer

1. Add role to `ReviewerRole` enum in `council.py`
2. Create `ReviewerProfile` in `REVIEWERS` dict
3. Implement `_<role>_review()` method

### Adding a New Injection Scenario

1. Add to `SCENARIOS` dict in `error_injector.py`
2. Implement injection/cleanup in `_perform_injection()`

## File Structure

```
testing/oversight/
├── __init__.py           # Package exports
├── pipeline.py           # Main orchestrator
├── bug_watcher.py        # Real-time anomaly detection
├── suggestion_engine.py  # Fix recommendations
├── council.py            # Multi-agent review
├── phase_validator.py    # Phase coverage
├── error_injector.py     # Fault injection
├── reporter.py           # Report generation
├── README.md             # This file
└── reports/              # Generated reports
```

## Example Report

```
# Architectural Test Pipeline Report

**Generated:** 2026-01-23T12:00:00Z
**Report ID:** rpt-20260123-120000

## Executive Summary

- **Phases Validated:** 12
- **Average Coverage:** 75.3%
- **Total Anomalies:** 8
- **Critical Gaps:** 2

## Phase Status Matrix

| Phase | Name | Status | Coverage | Bugs |
|-------|------|--------|----------|------|
| 1 | Foundation | ✅ complete | 95.0% | 0 |
| 5 | Agent Bootstrapping | 🚧 in_progress | 80.0% | 2 |
| 8 | Production Hardening | ❌ blocked | 40.0% | 3 |

...
```

## Troubleshooting

### Pipeline Fails to Start
- Verify DragonflyDB is running: `redis-cli -p 6379 -a governance2026 PING`
- Check Vault status: `docker exec vault vault status`

### No Anomalies Detected
- Ensure STATUS.md files exist in the monitored directories
- Check checkpoint storage has recent entries

### Injection Tests Fail
- Verify safe mode is enabled (default)
- Check file permissions in target directories

## Related Documentation

- [CONTEXT_MANAGEMENT.md](../../docs/CONTEXT_MANAGEMENT.md) - Checkpoints and STATUS
- [MEMORY_LAYER.md](../../docs/MEMORY_LAYER.md) - External memory
- [STATUS_PROTOCOL.md](../../docs/STATUS_PROTOCOL.md) - Directory status protocol