# Testing

> Test utilities, mocks, and architectural oversight for the Agent Governance System

## Overview

This directory provides two complementary testing systems:

1. **framework.ts** - TypeScript testing framework with mocks for agent development
2. **oversight/** - Python architectural test pipeline for continuous system validation

## Quick Start

### TypeScript Framework

```bash
cd /opt/agent-governance/testing

# Default: REAL mode - requires Vault and DragonflyDB
bun run framework.ts

# Explicit mock mode - uses mocks with clear warnings
bun run framework.ts --use-mocks

# Validate services without running tests
bun run framework.ts --validate-only

# Hybrid mode - real where available, mocks otherwise
bun run framework.ts --hybrid
```

**Important:** Tests fail by default if real services are unavailable. Use `--use-mocks` to explicitly enable mock mode.

### Python Oversight Pipeline

```bash
# Run full validation
cd /opt/agent-governance
python3 -c "from testing.oversight import ArchitecturalTestPipeline; print(ArchitecturalTestPipeline().run())"

# Quick validation
python3 -c "from testing.oversight import ArchitecturalTestPipeline; print(ArchitecturalTestPipeline().run_quick_validation())"
```

## Components

### framework.ts (1158 lines)

TypeScript testing framework with Bun-native test support and explicit mock control.

| Class | Description |
|-------|-------------|
| `MockVault` | Simulates HashiCorp Vault (secrets, tokens, policies) |
| `MockDragonfly` | Simulates DragonflyDB (strings, hashes, lists, pub/sub) |
| `RealDragonfly` | Real DragonflyDB client for integration tests |
| `MockLLM` | Simulates LLM responses with latency/failure injection |
| `TestHarness` | Runs test scenarios with mode awareness |
| `TestContext` | Shared context tracking mock usage |

| Function | Description |
|----------|-------------|
| `validateServices()` | Check Vault, DragonflyDB, required files |
| `createTestContext()` | Create context, fails if REAL mode + services unavailable |

| Mode | Behavior |
|------|----------|
| `REAL` (default) | Fails if services unavailable |
| `MOCK` (`--use-mocks`) | Uses mocks with clear warnings |
| `HYBRID` (`--hybrid`) | Real where available, mocks otherwise |

**Pre-built Scenarios:**

- `happyPath` - Agent completes successfully
- `errorBudgetExceeded` - Agent revoked on errors
- `stuckDetection` - GAMMA spawn when stuck
- `conflictResolution` - Multi-proposal conflict

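
The three modes in the table above amount to a small decision rule per service. The following is a language-neutral sketch of that rule, written in Python to match the other Python examples in this README; it is not taken from `framework.ts`. The names `Mode`, `resolve_mode`, and `choose_backend` are hypothetical, and the precedence when both flags are passed is an assumption.

```python
from enum import Enum


class Mode(Enum):
    REAL = "real"
    MOCK = "mock"
    HYBRID = "hybrid"


def resolve_mode(argv: list[str]) -> Mode:
    """Map the documented CLI flags to a mode; REAL is the default."""
    # Assumption: --use-mocks wins if both flags are given.
    if "--use-mocks" in argv:
        return Mode.MOCK
    if "--hybrid" in argv:
        return Mode.HYBRID
    return Mode.REAL


def choose_backend(mode: Mode, service_available: bool) -> str:
    """Return "real" or "mock" for one service, or raise in REAL mode."""
    if mode is Mode.MOCK:
        return "mock"
    if service_available:
        return "real"
    if mode is Mode.HYBRID:
        return "mock"  # fall back to a mock only when hybrid was requested
    raise RuntimeError("REAL mode requires the service; pass --use-mocks to opt in to mocks")


if __name__ == "__main__":
    print(choose_backend(resolve_mode(["--hybrid"]), service_available=False))
```

The point of the sketch is that mocks are never chosen silently: outside `MOCK` and `HYBRID`, an unavailable service is an error rather than a fallback.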
### oversight/ (~4000 lines)

Python architectural test pipeline for multi-layer oversight.

| Module | Lines | Description |
|--------|-------|-------------|
| `pipeline.py` | 476 | Main orchestrator |
| `bug_watcher.py` | 713 | Real-time anomaly detection |
| `suggestion_engine.py` | 656 | AI-driven fix recommendations |
| `council.py` | 648 | Multi-agent decision making |
| `phase_validator.py` | 640 | Phase coverage validation |
| `error_injector.py` | 576 | Controlled fault injection |
| `reporter.py` | 455 | Comprehensive reporting |

See [oversight/README.md](./oversight/README.md) for detailed documentation.

## Usage Examples

### Creating Test Context

```typescript
import { createTestContext, generateInstructionPacket } from './framework';

const ctx = createTestContext();
const packet = generateInstructionPacket('task-1', 'agent-1', 'Test objective');

// Set up mock responses
ctx.mockLLM.setResponse('plan', '{"confidence": 0.9}');
ctx.mockVault.setSecret('test/key', { value: 'secret' });
await ctx.mockDragonfly.set('key', 'value');
```

### Running Oversight Pipeline

```python
from testing.oversight import ArchitecturalTestPipeline

pipeline = ArchitecturalTestPipeline()

# Full validation
report = pipeline.run()

# Validate specific phase
result = pipeline.validate_phase(5)  # Phase 5: Agent Bootstrapping

# Quick status check
status = pipeline.get_status()
```

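
To validate several phases in one pass, the single `validate_phase` call shown above can be wrapped in a loop. This is a minimal sketch under two assumptions: that `validate_phase` accepts an integer phase number (as in the example above), and that it may raise on failure. Its return type is not documented here, so results are stored as-is. `validate_phases` and `StubPipeline` are hypothetical names used only for illustration.

```python
def validate_phases(pipeline, phases):
    """Call pipeline.validate_phase for each phase; collect results and errors separately."""
    results, errors = {}, {}
    for phase in phases:
        try:
            results[phase] = pipeline.validate_phase(phase)
        except Exception as exc:  # keep going so one broken phase does not hide the others
            errors[phase] = exc
    return results, errors


class StubPipeline:
    """Hypothetical stand-in for ArchitecturalTestPipeline, for demonstration only."""

    def validate_phase(self, phase):
        if phase == 3:
            raise RuntimeError("phase 3 validator crashed")
        return f"phase {phase} ok"


if __name__ == "__main__":
    results, errors = validate_phases(StubPipeline(), range(1, 6))
    print(sorted(results), sorted(errors))
```

With the real pipeline, the call would presumably look like `validate_phases(ArchitecturalTestPipeline(), range(1, 13))`, depending on how many phases are defined.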
### Error Injection Testing

```python
from testing.oversight import ErrorInjector

injector = ErrorInjector(safe_mode=True)  # Won't modify files
injector.inject('missing_config')
# ... run tests ...
injector.cleanup()
```

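
In the snippet above, `cleanup()` is skipped if the tests between `inject()` and `cleanup()` raise. One way to guard against that is a small context manager, sketched below. It relies only on the `inject` and `cleanup` methods shown above; `injected_fault` and `RecordingInjector` are hypothetical helpers, not part of `testing.oversight`.

```python
from contextlib import contextmanager


@contextmanager
def injected_fault(injector, fault_name):
    """Inject a fault for the duration of a with-block, then always clean up."""
    injector.inject(fault_name)
    try:
        yield injector
    finally:
        injector.cleanup()  # runs even if the body of the with-block raises


class RecordingInjector:
    """Hypothetical stand-in for ErrorInjector that only records calls."""

    def __init__(self):
        self.calls = []

    def inject(self, fault_name):
        self.calls.append(("inject", fault_name))

    def cleanup(self):
        self.calls.append(("cleanup",))


if __name__ == "__main__":
    stub = RecordingInjector()
    try:
        with injected_fault(stub, "missing_config"):
            raise ValueError("simulated test failure")
    except ValueError:
        pass
    print(stub.calls)
```

With the real class, usage would be `with injected_fault(ErrorInjector(safe_mode=True), 'missing_config'): ...`, assuming `cleanup()` is safe to call after a failed test run.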
## Test Results

| Suite | Passed | Failed | Coverage |
|-------|--------|--------|----------|
| framework.ts | 4 | 0 | 100% |
| oversight imports | 7 | 0 | 100% |

## Status

**COMPLETE**

See [STATUS.md](./STATUS.md) for detailed progress tracking.

## Architecture Reference

Part of the [Agent Governance System](/opt/agent-governance/docs/ARCHITECTURE.md).

Parent: [Project Root](/opt/agent-governance)

---
*Last updated: 2026-01-24*