History

profit 92d3602852 Add 17 missing governance tests - coverage 57.6% → 70.2%

Phase 1 (Foundation): 62.5% → 100%
- test_ledger_connection.py
- test_vault_status.py
- test_audit_logging.py

Phase 3 (Execution): 70% → 100%
- test_preflight_gate.py
- test_wrapper_enforcement.py
- test_evidence_collection.py

Phase 4 (Promotion): 57.1% → 100%
- test_promotion_logic.py
- test_revocation_triggers.py
- test_monitor_daemon.py

Phase 5 (Bootstrapping): 60% → 100%
- test_checkpoint_create_load.py
- test_tier0_agent_constraints.py
- test_orchestrator_delegation.py
- test_context_preservation.py

All 8 critical gaps now resolved.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-23 22:22:26 -05:00

reports: Add 17 missing governance tests - coverage 57.6% → 70.2% (2026-01-23 22:22:26 -05:00)

__init__.py, bug_watcher.py, council.py, error_injector.py, phase_validator.py, pipeline.py, README.md, reporter.py, suggestion_engine.py: Initial commit: Agent Governance System Phase 8 (2026-01-23 22:07:06 -05:00)

README.md

Architectural Test Pipeline

Multi-layer oversight system designed so that no single hidden bug can compromise the Agent Governance System.

Overview

The Architectural Test Pipeline provides continuous validation across all 12 phases through multiple oversight layers that monitor, analyze, review, and report on system health.

Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    ARCHITECTURAL TEST PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ Bug Window      │───▶│ Suggestion      │───▶│ Council         │         │
│  │ Watcher         │    │ Engine          │    │ Review          │         │
│  │                 │    │                 │    │                 │         │
│  │ • Real-time     │    │ • Context-aware │    │ • Safety        │         │
│  │ • All phases    │    │ • Risk-ranked   │    │ • Performance   │         │
│  │ • Anomalies     │    │ • Auto-fixable  │    │ • Architecture  │         │
│  └────────┬────────┘    └────────┬────────┘    │ • Compliance    │         │
│           │                      │             │ • Quality       │         │
│           │                      │             └────────┬────────┘         │
│           │                      │                      │                  │
│           ▼                      ▼                      ▼                  │
│  ┌─────────────────────────────────────────────────────────────┐           │
│  │                    Phase Validator                          │           │
│  │  Phase 1 ✅ │ Phase 2 ✅ │ Phase 3 ✅ │ Phase 4 ✅ │ ... │           │
│  │  Phase 5 ⭐ │ Phase 6 ✅ │ Phase 7 ✅ │ Phase 8 🚧 │ ... │           │
│  └─────────────────────────────────────────────────────────────┘           │
│                              │                                              │
│                              ▼                                              │
│  ┌─────────────────┐    ┌─────────────────┐                               │
│  │ Error Injector  │    │ Reporter        │                               │
│  │                 │    │                 │                               │
│  │ • Safe mode     │    │ • Markdown      │                               │
│  │ • Scenarios     │    │ • Per-phase     │                               │
│  │ • Validation    │    │ • Actions       │                               │
│  └─────────────────┘    └─────────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

Oversight Layers

1. Bug Window Watcher (`bug_watcher.py`)

Real-time monitoring of every pipeline stage.

Features:

- Monitors all 12 phases continuously
- Detects anomalies: errors, regressions, missing artifacts, state inconsistencies
- Links findings to phase, directory, STATUS.md, and checkpoint entries
- Persists to DragonflyDB for cross-session tracking

Anomaly Types:

| Type | Description | Severity Range |
|------|-------------|----------------|
| `UNHANDLED_ERROR` | Uncaught exceptions | Medium-Critical |
| `REGRESSION` | Behavior change from baseline | High |
| `MISSING_ARTIFACT` | Required file/config missing | Low-High |
| `STATE_INCONSISTENCY` | Status mismatch | Medium |
| `DEPENDENCY_UNAVAILABLE` | Vault/Dragonfly/Ledger down | Critical |
| `SECURITY_VIOLATION` | Unacknowledged violation | Critical |
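The sketch below shows one way to represent the anomaly taxonomy. It is not the contents of `bug_watcher.py`. The four-level `Severity` scale, the `Anomaly` record, and its fields (`phase`, `directory`, `message`, `detected_at`) are hypothetical. The range check just encodes the table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum, IntEnum


class Severity(IntEnum):
    # IntEnum so severities compare numerically (LOW < CRITICAL).
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class AnomalyType(Enum):
    UNHANDLED_ERROR = "unhandled_error"
    REGRESSION = "regression"
    MISSING_ARTIFACT = "missing_artifact"
    STATE_INCONSISTENCY = "state_inconsistency"
    DEPENDENCY_UNAVAILABLE = "dependency_unavailable"
    SECURITY_VIOLATION = "security_violation"


# (min, max) severity allowed per type, mirroring the table above.
SEVERITY_RANGE = {
    AnomalyType.UNHANDLED_ERROR: (Severity.MEDIUM, Severity.CRITICAL),
    AnomalyType.REGRESSION: (Severity.HIGH, Severity.HIGH),
    AnomalyType.MISSING_ARTIFACT: (Severity.LOW, Severity.HIGH),
    AnomalyType.STATE_INCONSISTENCY: (Severity.MEDIUM, Severity.MEDIUM),
    AnomalyType.DEPENDENCY_UNAVAILABLE: (Severity.CRITICAL, Severity.CRITICAL),
    AnomalyType.SECURITY_VIOLATION: (Severity.CRITICAL, Severity.CRITICAL),
}


@dataclass
class Anomaly:
    type: AnomalyType
    severity: Severity
    phase: int
    directory: str
    message: str
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject a severity outside the documented range for this type.
        low, high = SEVERITY_RANGE[self.type]
        if not low <= self.severity <= high:
            raise ValueError(f"{self.type.name} severity must be {low.name}..{high.name}")
```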

2. Suggestion Engine (`suggestion_engine.py`)

AI-driven analysis using historical context.

Features:

- Gathers context from checkpoints, memory, and STATUS files
- Pattern-based suggestions from known fixes
- Context-aware suggestions from historical outcomes
- Risk/impact ranking for prioritization

Suggestion Ranking:

Priority Score = Impact × (1 - Risk)

Impact Levels: transformative (1.0) > high (0.8) > medium (0.6) > low (0.4)
Risk Levels: critical (0.2) < high (0.4) < medium (0.6) < low (0.8)

3. Council Review (`council.py`)

Multi-perspective review with 5 specialized reviewers.

Reviewers:

| Role | Focus | Risk Tolerance |
|------|-------|----------------|
| Safety | Security, access control | Very Low (0.2) |
| Performance | Latency, throughput | Medium (0.6) |
| Architecture | Design, maintainability | Medium (0.5) |
| Compliance | Governance, policies | Low (0.3) |
| Quality | Testing, documentation | Low (0.4) |

Decision Types:

- `AUTO_APPROVE` - Safe to auto-implement
- `HUMAN_APPROVE` - Approved, needs human sign-off
- `DEFER` - Needs more discussion
- `REJECT` - Do not implement
- `ESCALATE` - Needs higher authority
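This README does not say how the reviewer verdicts combine into a decision. The sketch below assumes one plausible rule, for illustration only. Each reviewer approves when a suggestion's risk score is at or below its tolerance, and the vote count picks the decision type. `decide()`, the numeric risk input, and the majority thresholds are hypothetical; this is not the logic in `council.py`.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN_APPROVE = "human_approve"
    DEFER = "defer"
    REJECT = "reject"
    ESCALATE = "escalate"


# Risk tolerance per reviewer, copied from the reviewer table.
RISK_TOLERANCE = {
    "safety": 0.2,
    "performance": 0.6,
    "architecture": 0.5,
    "compliance": 0.3,
    "quality": 0.4,
}


def decide(risk: float, auto_fixable: bool) -> Decision:
    """Hypothetical rule: a reviewer approves when risk <= its tolerance."""
    votes = {role: risk <= tol for role, tol in RISK_TOLERANCE.items()}
    approvals = sum(votes.values())
    if approvals == len(votes):
        # Unanimous: only auto-fixable suggestions skip the human step.
        return Decision.AUTO_APPROVE if auto_fixable else Decision.HUMAN_APPROVE
    if approvals == 0:
        return Decision.REJECT
    if approvals > len(votes) // 2:
        # A majority approves but at least one reviewer objects.
        return Decision.ESCALATE
    return Decision.DEFER
```

Under these tolerances, a suggestion at risk 0.35 gets three of five votes (Performance, Architecture, Quality). It is escalated because Safety and Compliance object.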

4. Phase Validator (`phase_validator.py`)

Ensures all phases have required components.

Validation Levels:

| Level | Description |
|-------|-------------|
| BASIC | Existence checks only |
| STANDARD | + Functionality tests |
| THOROUGH | + Integration tests |
| COMPREHENSIVE | + Chaos/edge cases |

Special Attention: Phase 5 (Agent Bootstrapping) receives extra validation as the current focus.
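The validation levels are cumulative. Each level runs everything from the levels below it, plus its own additions. A small sketch of that ordering follows; the check-group names are hypothetical placeholders, not identifiers from `phase_validator.py`.

```python
from __future__ import annotations

from enum import IntEnum


class ValidationLevel(IntEnum):
    BASIC = 1
    STANDARD = 2
    THOROUGH = 3
    COMPREHENSIVE = 4


# Check groups introduced at each level (names are illustrative).
CHECKS_ADDED = {
    ValidationLevel.BASIC: ["existence"],
    ValidationLevel.STANDARD: ["functionality"],
    ValidationLevel.THOROUGH: ["integration"],
    ValidationLevel.COMPREHENSIVE: ["chaos", "edge_cases"],
}


def checks_for(level: ValidationLevel) -> list[str]:
    # Include this level's checks plus those of every lower level, in order.
    return [c for lvl in ValidationLevel if lvl <= level for c in CHECKS_ADDED[lvl]]
```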

5. Error Injector (`error_injector.py`)

Controlled fault injection for testing oversight.

Predefined Scenarios:

| Scenario | Type | Target |
|----------|------|--------|
| missing_config | MISSING_FILE | `agents/tier0-agent/config/agent.json` |
| corrupted_status | INVALID_STATUS | `checkpoint/STATUS.md` |
| stale_checkpoint | STATE_INCONSISTENCY | `checkpoint/storage/` |
| redis_key_missing | DEPENDENCY_FAILURE | `oversight:watcher` |
| violation_unacked | SECURITY_VIOLATION | violations table |
| blocked_directory | STATE_INCONSISTENCY | `preflight/` |

Safe Mode: By default, injections are simulated without modifying files.
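The sketch below illustrates the safe-mode default with a `missing_config`-style injection. In safe mode it only simulates the fault. In live mode it hides the target file under a backup name so cleanup can restore it. `Scenario`, `inject()`, `cleanup()`, and the `.injected-bak` suffix are assumptions, not the `error_injector.py` API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

SUFFIX = ".injected-bak"  # hypothetical backup suffix


@dataclass(frozen=True)
class Scenario:
    name: str
    error_type: str
    target: str  # path relative to the repository root


@dataclass
class InjectionResult:
    scenario: str
    simulated: bool
    moved: list[Path] = field(default_factory=list)


def inject(scenario: Scenario, root: Path, safe_mode: bool = True) -> InjectionResult:
    """Inject a MISSING_FILE fault; in safe mode, only simulate it."""
    if scenario.error_type != "MISSING_FILE":
        raise NotImplementedError(scenario.error_type)
    if safe_mode:
        # Default path: report the injection without touching the filesystem.
        return InjectionResult(scenario.name, simulated=True)
    target = root / scenario.target
    backup = target.with_name(target.name + SUFFIX)
    target.rename(backup)  # hide the file so detectors see it as missing
    return InjectionResult(scenario.name, simulated=False, moved=[backup])


def cleanup(result: InjectionResult) -> None:
    """Undo a live injection by restoring every file that was moved aside."""
    for backup in result.moved:
        backup.rename(backup.with_name(backup.name[: -len(SUFFIX)]))
    result.moved.clear()
```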

6. Reporter (`reporter.py`)

Comprehensive reporting with actionable insights.

Report Sections:

- Executive Summary
- Phase Status Matrix
- Bug Watcher Summary
- Suggestion Engine Summary
- Council Decisions
- Injection Test Results
- Pending Actions
- Critical Issues
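The Phase Status Matrix section can be rendered as a Markdown table, in the same shape as the Example Report section. This is a sketch, not `reporter.py` itself. The `PhaseRow` record and the status-to-icon mapping are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Icons as they appear in the example report; unknown statuses get a fallback.
STATUS_ICONS = {"complete": "✅", "in_progress": "🚧", "blocked": "❌"}


@dataclass
class PhaseRow:
    phase: int
    name: str
    status: str
    coverage: float  # percent, 0-100
    bugs: int


def render_matrix(rows: list[PhaseRow]) -> str:
    lines = [
        "| Phase | Name | Status | Coverage | Bugs |",
        "|-------|------|--------|----------|------|",
    ]
    for r in sorted(rows, key=lambda row: row.phase):
        icon = STATUS_ICONS.get(r.status, "❔")
        lines.append(f"| {r.phase} | {r.name} | {icon} {r.status} | {r.coverage:.1f}% | {r.bugs} |")
    return "\n".join(lines)
```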

CLI Usage

```bash
# Full pipeline execution
oversight run

# With injection tests
oversight run --inject

# Focus on specific phase
oversight run --phase 5

# Quick validation
oversight quick

# Validate specific phase in detail
oversight validate --phase 5

# Generate report only
oversight report

# Show phase matrix
oversight matrix

# JSON output
oversight run --json
```

Integration with Checkpoints/Status/Memory

Checkpoints

- Watcher checks checkpoint consistency and staleness
- Suggestions reference recent checkpoint context
- Reports include a checkpoint link
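One way the watcher's staleness check could work is to compare the newest file's modification time in `checkpoint/storage/` against a threshold. The 24-hour `MAX_AGE_SECONDS` value and both function names are assumptions for illustration.

```python
from __future__ import annotations

import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 3600  # hypothetical staleness threshold


def latest_checkpoint_age(storage: Path, now: float | None = None) -> float | None:
    """Return the age in seconds of the newest file in storage, or None if empty."""
    files = [p for p in storage.iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() if now is None else now) - newest


def is_stale(storage: Path, max_age: float = MAX_AGE_SECONDS, now: float | None = None) -> bool:
    age = latest_checkpoint_age(storage, now)
    # An empty checkpoint store is treated as stale.
    return age is None or age > max_age
```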

STATUS.md Files

- Watcher scans STATUS.md files for BLOCKED states
- Phase validator checks that STATUS.md exists
- Reports track per-directory status
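The sketch below shows the STATUS.md checks. It assumes a blocked directory is marked by the word `BLOCKED` somewhere in its STATUS.md; the authoritative format is defined in STATUS_PROTOCOL.md and may differ. `find_blocked()` and `find_missing_status()` are hypothetical helpers that mirror the watcher and validator checks listed above.

```python
from __future__ import annotations

import re
from pathlib import Path

# Whole-word match, so e.g. "UNBLOCKED" is not counted.
BLOCKED_RE = re.compile(r"\bBLOCKED\b")


def find_blocked(root: Path) -> list[Path]:
    """Directories under root whose STATUS.md mentions BLOCKED."""
    return sorted(
        p.parent
        for p in root.rglob("STATUS.md")
        if BLOCKED_RE.search(p.read_text(encoding="utf-8"))
    )


def find_missing_status(root: Path, dirs: list[str]) -> list[str]:
    """Directories (relative to root) that lack a STATUS.md file."""
    return [d for d in dirs if not (root / d / "STATUS.md").is_file()]
```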

Memory Layer

- Suggestion engine queries memory for related entries
- Context is gathered from the summaries directory
- Reports count available memory entries

Running Tests

Injection Test Suite

```bash
# Run all injection scenarios
oversight run --inject

# Or run the injector module directly from the repository root
cd /opt/agent-governance
python -m testing.oversight.error_injector test-all
```

Expected Results

A healthy system should:

- Detect all injected errors (100% detection rate)
- Generate suggestions relevant to each injected error
- Produce a council decision for each suggestion
- Pass all injection tests
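These criteria can be checked mechanically once per-scenario outcomes are collected. The sketch uses a hypothetical `ScenarioOutcome` record. As a simplification, it treats "relevant suggestions" as at least one suggestion per scenario.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScenarioOutcome:
    scenario: str
    detected: bool
    suggestions: int
    council_decisions: int


def evaluate(outcomes: list[ScenarioOutcome]) -> dict:
    total = len(outcomes)
    detected = sum(o.detected for o in outcomes)
    rate = detected / total if total else 0.0
    # Every suggestion should receive exactly one council decision.
    decisions_ok = all(o.council_decisions == o.suggestions for o in outcomes)
    has_suggestions = all(o.suggestions > 0 for o in outcomes)
    healthy = total > 0 and rate == 1.0 and decisions_ok and has_suggestions
    return {"detection_rate": rate, "healthy": healthy}
```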

Extending the Pipeline

Adding a New Anomaly Type

1. Add the type to the `AnomalyType` enum in `bug_watcher.py`.
2. Add detection logic in `_run_phase_specific_checks()`.
3. Add fix patterns in `SuggestionEngine.FIX_PATTERNS`.

Adding a New Council Reviewer

1. Add the role to the `ReviewerRole` enum in `council.py`.
2. Create a `ReviewerProfile` in the `REVIEWERS` dict.
3. Implement a `_<role>_review()` method.
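The three steps above are sketched together below. `COST` is a made-up example role, and its profile values are illustrative. Resolving `_<role>_review()` with `getattr` is one way to honor the naming convention, not necessarily how `council.py` does it. Profiles and methods for the other roles are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ReviewerRole(Enum):
    SAFETY = "safety"
    PERFORMANCE = "performance"
    ARCHITECTURE = "architecture"
    COMPLIANCE = "compliance"
    QUALITY = "quality"
    COST = "cost"  # step 1: hypothetical new role


@dataclass(frozen=True)
class ReviewerProfile:
    role: ReviewerRole
    focus: str
    risk_tolerance: float


# Step 2: register a profile (other roles omitted; COST values are illustrative).
REVIEWERS = {
    ReviewerRole.SAFETY: ReviewerProfile(ReviewerRole.SAFETY, "Security, access control", 0.2),
    ReviewerRole.COST: ReviewerProfile(ReviewerRole.COST, "Infrastructure spend", 0.5),
}


class Council:
    def review(self, role: ReviewerRole, risk: float) -> bool:
        # Dispatch to _<role>_review() by naming convention.
        method = getattr(self, f"_{role.value}_review", None)
        if method is None:
            raise NotImplementedError(f"Council is missing _{role.value}_review()")
        return method(REVIEWERS[role], risk)

    def _safety_review(self, profile: ReviewerProfile, risk: float) -> bool:
        return risk <= profile.risk_tolerance

    # Step 3: the new reviewer's method.
    def _cost_review(self, profile: ReviewerProfile, risk: float) -> bool:
        return risk <= profile.risk_tolerance
```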

Adding a New Injection Scenario

1. Add the scenario to the `SCENARIOS` dict in `error_injector.py`.
2. Implement its injection and cleanup in `_perform_injection()`.

File Structure

```
testing/oversight/
├── __init__.py           # Package exports
├── pipeline.py           # Main orchestrator
├── bug_watcher.py        # Real-time anomaly detection
├── suggestion_engine.py  # Fix recommendations
├── council.py            # Multi-agent review
├── phase_validator.py    # Phase coverage
├── error_injector.py     # Fault injection
├── reporter.py           # Report generation
├── README.md             # This file
└── reports/              # Generated reports
```

Example Report

```markdown
# Architectural Test Pipeline Report

**Generated:** 2026-01-23T12:00:00Z
**Report ID:** rpt-20260123-120000

## Executive Summary

- **Phases Validated:** 12
- **Average Coverage:** 75.3%
- **Total Anomalies:** 8
- **Critical Gaps:** 2

## Phase Status Matrix

| Phase | Name | Status | Coverage | Bugs |
|-------|------|--------|----------|------|
| 1 | Foundation | ✅ complete | 95.0% | 0 |
| 5 | Agent Bootstrapping | 🚧 in_progress | 80.0% | 2 |
| 8 | Production Hardening | ❌ blocked | 40.0% | 3 |

...
```

Troubleshooting

Pipeline Fails to Start

- Verify DragonflyDB is running: `redis-cli -p 6379 -a governance2026 PING`
- Check Vault status: `docker exec vault vault status`

No Anomalies Detected

- Ensure STATUS.md files exist in the monitored directories
- Check that checkpoint storage has recent entries

Injection Tests Fail

- Verify safe mode is enabled (the default)
- Check file permissions in the target directories

Related Documentation

- CONTEXT_MANAGEMENT.md - Checkpoints and STATUS
- MEMORY_LAYER.md - External memory
- STATUS_PROTOCOL.md - Directory status protocol
