
Architectural Test Pipeline

Multi-layer oversight system ensuring no single hidden bug can compromise the Agent Governance System.

Overview

The Architectural Test Pipeline provides continuous validation across all 12 phases through multiple oversight layers that monitor, analyze, review, and report on system health.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    ARCHITECTURAL TEST PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ Bug Window      │───▶│ Suggestion      │───▶│ Council         │         │
│  │ Watcher         │    │ Engine          │    │ Review          │         │
│  │                 │    │                 │    │                 │         │
│  │ • Real-time     │    │ • Context-aware │    │ • Safety        │         │
│  │ • All phases    │    │ • Risk-ranked   │    │ • Performance   │         │
│  │ • Anomalies     │    │ • Auto-fixable  │    │ • Architecture  │         │
│  └────────┬────────┘    └────────┬────────┘    │ • Compliance    │         │
│           │                      │             │ • Quality       │         │
│           │                      │             └────────┬────────┘         │
│           │                      │                      │                  │
│           ▼                      ▼                      ▼                  │
│  ┌─────────────────────────────────────────────────────────────┐           │
│  │                    Phase Validator                          │           │
│  │  Phase 1 ✅ │ Phase 2 ✅ │ Phase 3 ✅ │ Phase 4 ✅ │ ... │           │
│  │  Phase 5 ⭐ │ Phase 6 ✅ │ Phase 7 ✅ │ Phase 8 🚧 │ ... │           │
│  └─────────────────────────────────────────────────────────────┘           │
│                              │                                              │
│                              ▼                                              │
│  ┌─────────────────┐    ┌─────────────────┐                               │
│  │ Error Injector  │    │ Reporter        │                               │
│  │                 │    │                 │                               │
│  │ • Safe mode     │    │ • Markdown      │                               │
│  │ • Scenarios     │    │ • Per-phase     │                               │
│  │ • Validation    │    │ • Actions       │                               │
│  └─────────────────┘    └─────────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Oversight Layers

1. Bug Window Watcher (bug_watcher.py)

Real-time monitoring of every pipeline stage.

Features:

  • Monitors all 12 phases continuously
  • Detects anomalies: errors, regressions, missing artifacts, state inconsistencies
  • Links findings to phase, directory, STATUS.md, and checkpoint entries
  • Persists to DragonflyDB for cross-session tracking

Anomaly Types:

| Type | Description | Severity Range |
|------|-------------|----------------|
| UNHANDLED_ERROR | Uncaught exceptions | Medium-Critical |
| REGRESSION | Behavior change from baseline | High |
| MISSING_ARTIFACT | Required file/config missing | Low-High |
| STATE_INCONSISTENCY | Status mismatch | Medium |
| DEPENDENCY_UNAVAILABLE | Vault/Dragonfly/Ledger down | Critical |
| SECURITY_VIOLATION | Unacknowledged violation | Critical |
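
As a rough illustration of how an anomaly record could carry its type, severity, and phase/directory links, here is a minimal sketch. Only the `AnomalyType` name and the severity ranges come from this README; `Severity`, `SEVERITY_RANGE`, and the `Anomaly` fields are assumptions and may not match bug_watcher.py.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class AnomalyType(Enum):
    UNHANDLED_ERROR = "unhandled_error"
    REGRESSION = "regression"
    MISSING_ARTIFACT = "missing_artifact"
    STATE_INCONSISTENCY = "state_inconsistency"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    SECURITY_VIOLATION = "security_violation"


# (lowest, highest) severity allowed per type, copied from the Anomaly Types table.
SEVERITY_RANGE = {
    AnomalyType.UNHANDLED_ERROR: (Severity.MEDIUM, Severity.CRITICAL),
    AnomalyType.REGRESSION: (Severity.HIGH, Severity.HIGH),
    AnomalyType.MISSING_ARTIFACT: (Severity.LOW, Severity.HIGH),
    AnomalyType.STATE_INCONSISTENCY: (Severity.MEDIUM, Severity.MEDIUM),
    AnomalyType.DEPENDENCY_UNAVAILABLE: (Severity.CRITICAL, Severity.CRITICAL),
    AnomalyType.SECURITY_VIOLATION: (Severity.CRITICAL, Severity.CRITICAL),
}


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical record shape; the real watcher may store more fields."""
    type: AnomalyType
    severity: Severity
    phase: int
    directory: str
    message: str

    def __post_init__(self):
        # Reject severities outside the range documented for this type.
        low, high = SEVERITY_RANGE[self.type]
        if not low <= self.severity <= high:
            raise ValueError(
                f"{self.type.name} severity must be between {low.name} and {high.name}"
            )
```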

2. Suggestion Engine (suggestion_engine.py)

AI-driven analysis using historical context.

Features:

  • Gathers context from checkpoints, memory, STATUS files
  • Pattern-based suggestions from known fixes
  • Context-aware suggestions from historical outcomes
  • Risk/impact ranking for prioritization

Suggestion Ranking:

Priority Score = Impact × (1 - Risk)

Impact Levels: transformative (1.0) > high (0.8) > medium (0.6) > low (0.4)
Risk Levels: critical (0.2) < high (0.4) < medium (0.6) < low (0.8)
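
The ranking formula can be worked through with a small sketch that applies it literally to the values listed above. The function and dictionary names are assumptions, not the API of suggestion_engine.py. Taken literally, a risk value closer to 1 lowers the score, so how the risk labels map onto that scale is worth confirming against the implementation.

```python
# Values copied from the Impact Levels and Risk Levels lines above.
IMPACT = {"transformative": 1.0, "high": 0.8, "medium": 0.6, "low": 0.4}
RISK = {"critical": 0.2, "high": 0.4, "medium": 0.6, "low": 0.8}


def priority_score(impact: str, risk: str) -> float:
    """Priority Score = Impact x (1 - Risk), rounded to avoid float noise."""
    return round(IMPACT[impact] * (1 - RISK[risk]), 4)


def rank(suggestions: list[dict]) -> list[dict]:
    """Sort suggestions from highest to lowest priority score."""
    return sorted(
        suggestions,
        key=lambda s: priority_score(s["impact"], s["risk"]),
        reverse=True,
    )
```

For example, `priority_score("high", "medium")` evaluates 0.8 × (1 − 0.6) and returns 0.32.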

3. Council Review (council.py)

Multi-perspective review with 5 specialized reviewers.

Reviewers:

| Role | Focus | Risk Tolerance |
|------|-------|----------------|
| Safety | Security, access control | Very Low (0.2) |
| Performance | Latency, throughput | Medium (0.6) |
| Architecture | Design, maintainability | Medium (0.5) |
| Compliance | Governance, policies | Low (0.3) |
| Quality | Testing, documentation | Low (0.4) |

Decision Types:

  • AUTO_APPROVE - Safe to auto-implement
  • HUMAN_APPROVE - Approved, pending human sign-off
  • DEFER - Needs more discussion
  • REJECT - Do not implement
  • ESCALATE - Needs higher authority
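
The reviewers and decision types above could be modeled roughly as follows. `ReviewerRole`, `ReviewerProfile`, and `REVIEWERS` are named elsewhere in this README, but their fields here are guesses; `DecisionType` and `needs_follow_up` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewerRole(Enum):
    SAFETY = "safety"
    PERFORMANCE = "performance"
    ARCHITECTURE = "architecture"
    COMPLIANCE = "compliance"
    QUALITY = "quality"


class DecisionType(Enum):  # hypothetical enum name
    AUTO_APPROVE = "auto_approve"
    HUMAN_APPROVE = "human_approve"
    DEFER = "defer"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ReviewerProfile:
    role: ReviewerRole
    focus: str
    risk_tolerance: float  # values copied from the Reviewers table


REVIEWERS = {
    ReviewerRole.SAFETY: ReviewerProfile(ReviewerRole.SAFETY, "Security, access control", 0.2),
    ReviewerRole.PERFORMANCE: ReviewerProfile(ReviewerRole.PERFORMANCE, "Latency, throughput", 0.6),
    ReviewerRole.ARCHITECTURE: ReviewerProfile(ReviewerRole.ARCHITECTURE, "Design, maintainability", 0.5),
    ReviewerRole.COMPLIANCE: ReviewerProfile(ReviewerRole.COMPLIANCE, "Governance, policies", 0.3),
    ReviewerRole.QUALITY: ReviewerProfile(ReviewerRole.QUALITY, "Testing, documentation", 0.4),
}


def strictest_reviewer() -> ReviewerProfile:
    """Return the reviewer with the lowest risk tolerance."""
    return min(REVIEWERS.values(), key=lambda p: p.risk_tolerance)


def needs_follow_up(decision: DecisionType) -> bool:
    """Decisions described above as needing a human, discussion, or authority."""
    return decision in {
        DecisionType.HUMAN_APPROVE,
        DecisionType.DEFER,
        DecisionType.ESCALATE,
    }
```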

4. Phase Validator (phase_validator.py)

Ensures all phases have required components.

Validation Levels:

| Level | Description |
|-------|-------------|
| BASIC | Existence checks only |
| STANDARD | + Functionality tests |
| THOROUGH | + Integration tests |
| COMPREHENSIVE | + Chaos/edge cases |
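
Reading the "+" entries as cumulative, each level runs everything from the levels below it plus one additional kind of check. A minimal sketch of that interpretation follows; `ValidationLevel`, `CHECKS_ADDED`, and `checks_for` are assumed names, and phase_validator.py may organize this differently.

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    BASIC = 1
    STANDARD = 2
    THOROUGH = 3
    COMPREHENSIVE = 4


# What each level adds on top of the previous one (from the table above).
CHECKS_ADDED = {
    ValidationLevel.BASIC: "existence checks",
    ValidationLevel.STANDARD: "functionality tests",
    ValidationLevel.THOROUGH: "integration tests",
    ValidationLevel.COMPREHENSIVE: "chaos/edge cases",
}


def checks_for(level: ValidationLevel) -> list[str]:
    """Return every kind of check that runs at the given level, lowest first."""
    return [CHECKS_ADDED[lvl] for lvl in ValidationLevel if lvl <= level]
```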

Special Attention: Phase 5

Phase 5 (Agent Bootstrapping) receives extra validation as the current focus.

5. Error Injector (error_injector.py)

Controlled fault injection for testing oversight.

Predefined Scenarios:

| Scenario | Type | Target |
|----------|------|--------|
| missing_config | MISSING_FILE | agents/tier0-agent/config/agent.json |
| corrupted_status | INVALID_STATUS | checkpoint/STATUS.md |
| stale_checkpoint | STATE_INCONSISTENCY | checkpoint/storage/ |
| redis_key_missing | DEPENDENCY_FAILURE | oversight:watcher |
| violation_unacked | SECURITY_VIOLATION | violations table |
| blocked_directory | STATE_INCONSISTENCY | preflight/ |

Safe Mode: By default, injections are simulated without modifying files.
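
To make the safe-mode distinction concrete, here is a sketch under stated assumptions: the `Scenario` dataclass, the `inject` function, and the backup-by-rename strategy are all hypothetical, and only two of the scenarios above are reproduced. In safe mode the function reports what it would do and leaves the filesystem alone.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Scenario:
    name: str
    error_type: str
    target: str


# Two entries copied from the Predefined Scenarios table.
SCENARIOS = {
    "missing_config": Scenario(
        "missing_config", "MISSING_FILE", "agents/tier0-agent/config/agent.json"
    ),
    "corrupted_status": Scenario(
        "corrupted_status", "INVALID_STATUS", "checkpoint/STATUS.md"
    ),
}


def inject(name: str, root: Path, safe_mode: bool = True) -> dict:
    """Simulate (default) or actually perform a fault injection."""
    scenario = SCENARIOS[name]
    target = root / scenario.target
    if safe_mode:
        # Safe mode: describe the injection without modifying any file.
        return {"scenario": name, "simulated": True, "target": str(target)}
    if scenario.error_type == "MISSING_FILE":
        # Assumed strategy: move the file aside so cleanup can restore it.
        backup = target.with_suffix(target.suffix + ".bak")
        target.rename(backup)
        return {"scenario": name, "simulated": False, "backup": str(backup)}
    raise NotImplementedError(f"real injection not sketched for {scenario.error_type}")
```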

6. Reporter (reporter.py)

Comprehensive reporting with actionable insights.

Report Sections:

  • Executive Summary
  • Phase Status Matrix
  • Bug Watcher Summary
  • Suggestion Engine Summary
  • Council Decisions
  • Injection Test Results
  • Pending Actions
  • Critical Issues

CLI Usage

# Full pipeline execution
oversight run

# With injection tests
oversight run --inject

# Focus on specific phase
oversight run --phase 5

# Quick validation
oversight quick

# Validate specific phase in detail
oversight validate --phase 5

# Generate report only
oversight report

# Show phase matrix
oversight matrix

# JSON output
oversight run --json
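
The command surface above could be declared with `argparse` along these lines. This is a sketch that mirrors only the subcommands and flags shown in this section; the real `oversight` entry point may accept other options, and restricting `--phase` to 1 through 12 is an assumption based on the 12 phases mentioned in the overview.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="oversight")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Full pipeline execution")
    run.add_argument("--inject", action="store_true", help="Include injection tests")
    run.add_argument("--phase", type=int, choices=range(1, 13), help="Focus on one phase")
    run.add_argument("--json", action="store_true", help="Emit JSON output")

    sub.add_parser("quick", help="Quick validation")

    validate = sub.add_parser("validate", help="Validate a phase in detail")
    validate.add_argument("--phase", type=int, choices=range(1, 13))

    sub.add_parser("report", help="Generate report only")
    sub.add_parser("matrix", help="Show phase matrix")
    return parser
```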

Integration with Checkpoints/Status/Memory

Checkpoints

  • Watcher checks checkpoint consistency and staleness
  • Suggestions reference recent checkpoint context
  • Reports include checkpoint link

STATUS.md Files

  • Watcher scans STATUS.md for BLOCKED states
  • Phase validator checks STATUS.md existence
  • Reports track per-directory status
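
A sketch of the two STATUS.md checks described above, assuming a blocked directory is one whose STATUS.md contains the word BLOCKED (the actual file format is not specified here). Both function names are hypothetical.

```python
from pathlib import Path


def find_blocked(root: Path) -> list[Path]:
    """Return directories whose STATUS.md mentions BLOCKED (case-insensitive)."""
    blocked = []
    for status_file in sorted(root.rglob("STATUS.md")):
        text = status_file.read_text(encoding="utf-8", errors="replace")
        if "BLOCKED" in text.upper():
            blocked.append(status_file.parent)
    return blocked


def missing_status(root: Path, directories: list[str]) -> list[str]:
    """Return the listed directories that have no STATUS.md file."""
    return [d for d in directories if not (root / d / "STATUS.md").is_file()]
```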

Memory Layer

  • Suggestion engine queries memory for related entries
  • Context gathered from summaries directory
  • Report counts available memory entries

Running Tests

Injection Test Suite

# Run all injection scenarios
oversight run --inject

# Or use injector directly
cd /opt/agent-governance
python -m testing.oversight.error_injector test-all

Expected Results

A healthy system should:

  1. Detect all injected errors (100% detection rate)
  2. Generate suggestions relevant to each injected error
  3. Produce council decisions for each suggestion
  4. Pass all injection tests
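
The health criteria above can be expressed as a simple check over per-scenario results. The `InjectionResult` shape and both function names are assumptions for illustration; the pipeline's real result objects may differ.

```python
from dataclasses import dataclass


@dataclass
class InjectionResult:
    scenario: str
    detected: bool    # did the watcher flag the injected error?
    suggestions: int  # suggestions generated for this injection
    decisions: int    # council decisions produced for those suggestions


def detection_rate(results: list[InjectionResult]) -> float:
    """Fraction of injected errors that were detected (0.0 if nothing ran)."""
    if not results:
        return 0.0
    return sum(r.detected for r in results) / len(results)


def is_healthy(results: list[InjectionResult]) -> bool:
    """All injections detected, each with suggestions and one decision per suggestion."""
    return bool(results) and all(
        r.detected and r.suggestions > 0 and r.decisions == r.suggestions
        for r in results
    )
```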

Extending the Pipeline

Adding a New Anomaly Type

  1. Add to AnomalyType enum in bug_watcher.py
  2. Add detection logic in _run_phase_specific_checks()
  3. Add fix patterns in SuggestionEngine.FIX_PATTERNS
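
The three steps might look like this for a made-up `STALE_REPORT` anomaly. Everything except the names `AnomalyType`, `SuggestionEngine`, and `FIX_PATTERNS` is hypothetical, including the standalone check function, which in the real code would be called from `_run_phase_specific_checks()`.

```python
from enum import Enum


class AnomalyType(Enum):
    MISSING_ARTIFACT = "missing_artifact"
    STALE_REPORT = "stale_report"  # step 1: new (hypothetical) member


def check_stale_report(age_seconds: float, max_age_seconds: float = 86_400) -> list[AnomalyType]:
    """Step 2: detection logic; flags a report older than the allowed age."""
    if age_seconds > max_age_seconds:
        return [AnomalyType.STALE_REPORT]
    return []


class SuggestionEngine:
    # Step 3: known fixes keyed by anomaly type (assumed structure).
    FIX_PATTERNS = {
        AnomalyType.STALE_REPORT: ["Re-run `oversight report` to regenerate the report"],
    }

    def suggest(self, anomaly: AnomalyType) -> list[str]:
        """Return the known fix patterns for an anomaly, or an empty list."""
        return list(self.FIX_PATTERNS.get(anomaly, []))
```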

Adding a New Council Reviewer

  1. Add role to ReviewerRole enum in council.py
  2. Create ReviewerProfile in REVIEWERS dict
  3. Implement _<role>_review() method

Adding a New Injection Scenario

  1. Add to SCENARIOS dict in error_injector.py
  2. Implement injection/cleanup in _perform_injection()

File Structure

testing/oversight/
├── __init__.py           # Package exports
├── pipeline.py           # Main orchestrator
├── bug_watcher.py        # Real-time anomaly detection
├── suggestion_engine.py  # Fix recommendations
├── council.py            # Multi-agent review
├── phase_validator.py    # Phase coverage
├── error_injector.py     # Fault injection
├── reporter.py           # Report generation
├── README.md             # This file
└── reports/              # Generated reports

Example Report

# Architectural Test Pipeline Report

**Generated:** 2026-01-23T12:00:00Z
**Report ID:** rpt-20260123-120000

## Executive Summary

- **Phases Validated:** 12
- **Average Coverage:** 75.3%
- **Total Anomalies:** 8
- **Critical Gaps:** 2

## Phase Status Matrix

| Phase | Name | Status | Coverage | Bugs |
|-------|------|--------|----------|------|
| 1 | Foundation | ✅ complete | 95.0% | 0 |
| 5 | Agent Bootstrapping | 🚧 in_progress | 80.0% | 2 |
| 8 | Production Hardening | ❌ blocked | 40.0% | 3 |

...

Troubleshooting

Pipeline Fails to Start

  • Verify DragonflyDB is running: redis-cli -p 6379 -a governance2026 PING
  • Check Vault status: docker exec vault vault status

No Anomalies Detected

  • Ensure STATUS.md files exist in directories
  • Check checkpoint storage has recent entries

Injection Tests Fail

  • Verify safe mode is enabled (default)
  • Check file permissions in target directories