profit 77655c298c Initial commit: Agent Governance System Phase 8

Phase 8 Production Hardening with complete governance infrastructure:

- Vault integration with tiered policies (T0-T4)
- DragonflyDB state management
- SQLite audit ledger
- Pipeline DSL and templates
- Promotion/revocation engine
- Checkpoint system for session persistence
- Health manager and circuit breaker for fault tolerance
- GitHub/Slack integrations
- Architectural test pipeline with bug watcher, suggestion engine, council review
- Multi-agent chaos testing framework

Test Results:
- Governance tests: 68/68 passing
- E2E workflow: 16/16 passing
- Phase 2 Vault: 14/14 passing
- Integration tests: 27/27 passing

Coverage: 57.6% average across 12 phases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-23 22:07:06 -05:00

7.4 KiB

Raw Blame History

Directory-Level README/STATUS Protocol

Guidelines for maintaining documentation and status tracking across the Agent Governance System.

Overview

Every significant subdirectory in the project should contain two files:

File	Purpose
`README.md`	Overview, purpose, key files, architecture references
`STATUS.md`	Ongoing status log with phase, tasks, timestamps, dependencies

This protocol ensures:

Consistent documentation across the codebase
Clear status visibility for humans and agents
Traceable progress through the project lifecycle

File Templates

README.md Structure

# Directory Name

> One-line purpose description

## Overview
Brief explanation of what this directory contains.

## Key Files
| File | Description |
|------|-------------|
| `main.py` | Primary module |
| `config.yaml` | Configuration |

## Interfaces / APIs
Document any CLI commands, APIs, or interfaces.

## Status
**✅ COMPLETE** (or 🚧 IN_PROGRESS, ❗ BLOCKED, etc.)

See [STATUS.md](./STATUS.md) for detailed progress.

## Architecture Reference
Part of the [Agent Governance System](/opt/agent-governance/docs/ARCHITECTURE.md).
Parent: [Parent Directory](../)

STATUS.md Structure

# Status: Directory Name

## Current Phase
**🚧 IN_PROGRESS**

## Tasks
| Status | Task | Updated |
|--------|------|---------|
| ✓ | Initial implementation | 2026-01-20 |
| ☐ | Add tests | - |

## Dependencies
- Requires `pipeline/core.py`
- Depends on DragonflyDB service

## Issues / Blockers
- Waiting on schema finalization

## Activity Log
### 2026-01-23 10:30:00 UTC
- **Phase**: IN_PROGRESS
- **Action**: Added chaos testing
- **Details**: Implemented error injection framework

Status Phases

Phase	Icon	Description
`complete`	✅	Work is finished and verified
`in_progress`	🚧	Active development underway
`blocked`	❗	Waiting on external dependencies
`needs_review`	⚠️	Requires attention or review
`not_started`	⬜	No work has begun

CLI Commands

The status command provides management tools:

# Check all directories for missing/outdated files
status sweep

# Auto-create missing README/STATUS files
status sweep --fix

# Update a directory's status
status update ./pipeline --phase complete
status update ./tests --phase in_progress --task "Add integration tests"

# Initialize files for a new directory
status init ./new-module

# Show project-wide dashboard
status dashboard

# View templates
status template readme
status template status

Agent Workflow Integration

When Entering a Directory

Agents should:

Read README.md to understand the directory's purpose
Read STATUS.md to see current progress and any blockers
Check dependencies before starting work

While Working

Agents should:

Update STATUS.md when starting a significant task
Log activity entries for notable changes
Update task checkboxes as work completes

Before Leaving a Directory

Agents should:

Update STATUS.md with final state
Mark completed tasks as done
Add any new issues discovered
Update the timestamp

Example Agent Workflow

# At start of work
async def enter_directory(dir_path: str):
    # Read context
    readme = await read_file(f"{dir_path}/README.md")
    status = await read_file(f"{dir_path}/STATUS.md")

    # Log entry
    logger.info(f"Entering {dir_path}, phase: {parse_phase(status)}")

# During work
async def log_progress(dir_path: str, task: str):
    await run_command(f"status update {dir_path} --task '{task}'")

# At end of work
async def exit_directory(dir_path: str, phase: str):
    await run_command(f"status update {dir_path} --phase {phase}")

Diagnostics

Finding Incomplete Directories

# Show all non-complete directories
status dashboard

# Example output:
# ❗ BLOCKED:
#     integrations/slack (updated 3d ago)
# 🚧 IN_PROGRESS:
#     tests/multi-agent-chaos (updated today)
#     pipeline (updated 1d ago)

Automated Sweeps

Run regular sweeps to catch missing documentation:

# Quick check
status sweep

# Auto-fix missing files
status sweep --fix

Checkpoint Integration

The status system is integrated with the checkpoint system for unified state management.

How They Work Together

When checkpoint now runs: The checkpoint captures an aggregate snapshot of all directory statuses, including:
- Status summary (counts by phase)
- List of all directories with their current phase
- Timestamps and task counts
When status update runs: A lightweight checkpoint delta is automatically created to record the change (unless --no-checkpoint is specified).
Unified reporting: Use checkpoint report to see both checkpoint metadata and directory statuses side-by-side.

Key Commands

# Create checkpoint with directory status snapshot
checkpoint now --notes "Completed pipeline work"

# View combined checkpoint + status report
checkpoint report

# View timeline of checkpoints with status changes
checkpoint timeline

# Update status (auto-creates checkpoint delta)
status update ./pipeline --phase complete

# Update status without checkpoint
status update ./tests --phase in_progress --no-checkpoint

Recovery After Reset

After a context reset or session restart:

Load the latest checkpoint:
```
checkpoint load
```

Inspect per-directory statuses:

checkpoint report
# or
status dashboard

Resume work by finding active directories:
```
checkpoint report --phase in_progress
```
Read local STATUS.md in the target directory for task-level context.

Example: Full Recovery Workflow

# 1. Load checkpoint to get context
checkpoint load
# Shows: Phase, Dependencies, Status Summary

# 2. See what's active
checkpoint report --phase in_progress
# Lists: Active directories with their progress

# 3. Pick up work in a directory
cd /opt/agent-governance/pipeline
cat STATUS.md
# Read activity log, pending tasks

# 4. Continue work and update status
status update . --task "Resumed: adding validation" --phase in_progress
# Auto-creates checkpoint delta

# 5. When done, mark complete
status update . --phase complete
# Creates final checkpoint

Directory Exclusions

The following directories are excluded from status tracking:

__pycache__ - Python cache
node_modules - Node.js dependencies
.git - Version control
logs - Runtime logs
storage - Generated data
credentials - Sensitive data
workspace - Agent workspaces
dragonfly-data - Database files

Best Practices

Keep README.md Stable: Update only when structure changes
Update STATUS.md Frequently: Log all significant activity
Use Clear Task Descriptions: Be specific about what needs doing
Document Blockers: Help others understand dependencies
Timestamp Everything: Enable tracking of staleness
Link to Architecture: Reference parent docs for context

Integration with Checkpoints

The status system integrates with the checkpoint system:

# After major changes, create checkpoint
checkpoint now --notes "Updated pipeline status to complete"

# Status can inform checkpoint decisions
status dashboard  # See what's in progress
checkpoint now    # Save current state

Part of the Agent Governance System

7.4 KiB Raw Blame History

Directory-Level README/STATUS Protocol

Overview

File Templates

README.md Structure

STATUS.md Structure

Status Phases

CLI Commands

Agent Workflow Integration

When Entering a Directory

While Working

Before Leaving a Directory

Example Agent Workflow

Diagnostics

Finding Incomplete Directories

Automated Sweeps

Checkpoint Integration

How They Work Together

Key Commands

Recovery After Reset

Example: Full Recovery Workflow

Directory Exclusions

Best Practices

Integration with Checkpoints

7.4 KiB

Raw Blame History