# External Memory Layer

Token-efficient persistent storage for large outputs, transcripts, and context.
## Overview

The External Memory Layer stores and retrieves large content outside the token window. Instead of including full outputs in prompts, agents store content in memory and work with summaries plus retrieval references.
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                          Token Window                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Checkpoint  │  │   STATUS    │  │   Memory References     │  │
│  │  Summary    │  │  Summaries  │  │  [ID] summary (tokens)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      External Memory Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   SQLite    │  │   Chunks    │  │      DragonflyDB        │  │
│  │ (metadata)  │  │   (files)   │  │    (hot cache, opt)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
## Token Thresholds

| Content Size | Storage Strategy |
|---|---|
| < 500 tokens | Stored inline in database |
| 500-4000 tokens | Stored in compressed file + summary |
| > 4000 tokens | Auto-chunked (multiple files) + parent summary |
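The thresholds above can be sketched as a simple routing function. This is a hypothetical illustration of the table's logic, not the actual implementation:

```python
def storage_strategy(token_count: int) -> str:
    """Pick a storage strategy from the token-count thresholds.

    Sketch only; the real layer may use different cutoffs or names.
    """
    if token_count < 500:
        return "inline"            # small: store directly in SQLite
    if token_count <= 4000:
        return "compressed-file"   # medium: gzip file + summary
    return "chunked"               # large: multiple files + parent summary
```

So `storage_strategy(120)` routes to inline storage, while a 9000-token transcript would be auto-chunked.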
## CLI Commands

### Store Content

```bash
# Store inline content
memory log "Test results: all 42 tests passed"

# Store from file
memory log --file /path/to/large-output.txt --tag "test-results"

# Store from stdin (common pattern)
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --checkpoint ckpt-xxx

# Store with directory linkage
memory log --file output.txt --directory ./pipeline --tag "validation"
```
### Retrieve Content

```bash
# Get full entry (includes content if small, or loads from file)
memory fetch mem-20260123-123456-abcd1234

# Get just the summary (token-efficient)
memory fetch mem-20260123-123456-abcd1234 --summary-only

# Get a specific chunk (for large entries)
memory fetch mem-20260123-123456-abcd1234 --chunk 2
```
### List and Search

```bash
# List recent entries
memory list --limit 10

# Filter by type
memory list --type output --limit 20

# Filter by directory
memory list --directory ./tests

# Search content
memory search "error" --limit 5
```
### Memory References

```bash
# Get references linked to a checkpoint
memory refs --checkpoint ckpt-20260123-123456

# Get references for a directory
memory refs --directory ./pipeline
```
### Maintenance

```bash
# Show statistics
memory stats

# Prune old entries
memory prune --keep-days 7 --keep-entries 500
```
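One plausible reading of the prune flags is: keep entries newer than `--keep-days` or among the `--keep-entries` most recent, and prune the rest. The sketch below illustrates that assumed semantics; the real command may combine the flags differently:

```python
from datetime import datetime, timedelta

def select_prunable(entries, keep_days=7, keep_entries=500, now=None):
    """Return IDs of entries eligible for pruning.

    `entries` is a list of (entry_id, created_at) tuples. An entry is
    kept if it is newer than `keep_days` OR among the `keep_entries`
    most recent. Hypothetical semantics, not the actual implementation.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    by_age = sorted(entries, key=lambda e: e[1], reverse=True)
    protected = {eid for eid, _ in by_age[:keep_entries]}
    return [eid for eid, ts in entries if ts < cutoff and eid not in protected]
```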
## Integration with Checkpoint

When `checkpoint now` runs, it:

- Collects references to recent memory entries
- Includes a memory summary (counts, total tokens)
- Stores lightweight refs instead of full content

```bash
# Checkpoint includes memory refs
checkpoint now --notes "After test run"

# View memory info in the checkpoint report
checkpoint report
# Shows:
#   [MEMORY REFERENCES]
#   mem-xxx: pytest results (12000 tokens)
#   mem-yyy: build output (3200 tokens)
```
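The "lightweight refs instead of full content" idea amounts to storing only IDs, summaries, and token counts in the checkpoint. A minimal sketch, with field names assumed:

```python
def collect_memory_refs(entries):
    """Build the lightweight memory section of a checkpoint.

    `entries` is a list of dicts with "id", "summary", and "tokens"
    (assumed field names). Only refs and totals are stored, never the
    full content, so the checkpoint itself stays small.
    """
    refs = [{"id": e["id"], "summary": e["summary"], "tokens": e["tokens"]}
            for e in entries]
    return {
        "refs": refs,
        "entry_count": len(refs),
        "total_tokens": sum(r["tokens"] for r in refs),
    }
```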
## Integration with STATUS

STATUS.md files can include memory pointers for detailed context:

```markdown
## Context References
- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)
```

Use `memory fetch <id>` to retrieve full content.
## Agent Guidelines

### When to Use Memory

- **Large outputs**: if an output would exceed ~500 tokens, store it
- **Test results**: store the full test output, reference the summary
- **Build logs**: store the full log, include only errors inline
- **Generated code**: store in memory, reference in the plan
### Pattern: Store and Reference

Instead of including large output in a response ("Here are all 500 lines of test output: ..."), store it and reference the summary:

```bash
# 1. Store the output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest"

# 2. Reference it in the response:
# "Test completed. Full output stored in mem-xxx (2400 tokens).
#  Summary: 42 passed, 3 failed. Failed tests: test_auth, test_db, test_cache"
```
### Pattern: Chunk Retrieval

For very large content (> 4000 tokens), memory auto-chunks:

```bash
# Store a 50 KB log file
memory log --file build.log
# Output: ID: mem-xxx, Chunks: 12

# Retrieve a specific chunk
memory fetch mem-xxx --chunk 5

# Or get just the summary
memory fetch mem-xxx --summary-only
```
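At its core, auto-chunking is fixed-size splitting with a parent entry that keeps only a summary and the chunk count. A sketch, with the chunk size as an assumed default:

```python
def chunk_content(text: str, chunk_chars: int = 4096):
    """Split large content into fixed-size chunks.

    Returns a list of chunk strings; the parent entry would store only
    the summary plus the chunk count, and each chunk would be written
    to its own compressed file. Chunk size is hypothetical.
    """
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

`memory fetch <id> --chunk N` then maps directly to indexing into this list.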
## Reset/Recovery Workflow

After a context reset or session restart:

### Step 1: Load Checkpoint

```bash
checkpoint load
# Shows: phase, dependencies, memory refs, status summary
```
### Step 2: Check Memory References

```bash
# See what's in memory
memory refs --checkpoint ckpt-latest
# Output:
#   mem-abc: pytest results (12000 tokens)
#   mem-def: deployment log (8000 tokens)
```
### Step 3: Fetch Needed Context

```bash
# Get a summary of the test results
memory fetch mem-abc --summary-only
# "42 tests: 40 passed, 2 failed (test_auth, test_db)"

# If needed, get a specific chunk
memory fetch mem-abc --chunk 0  # First chunk, containing the failures
```
### Step 4: Resume Work

```bash
# Check directory status
cat ./tests/STATUS.md
# Shows current phase, pending tasks, memory refs

# Continue where you left off
status update ./tests --task "Fixing test_auth failure"
```
## Memory Entry Types

| Type | Purpose | Example |
|---|---|---|
| `transcript` | Full conversation logs | Chat history |
| `output` | Command/tool outputs | Test results, build logs |
| `summary` | Generated summaries | Checkpoint summaries |
| `context` | Saved context state | Variables, environment |
| `chunk` | Part of a larger entry | Auto-generated |
## Storage Details

### SQLite Database (`memory/memory.db`)

- Entry metadata (ID, type, timestamps)
- Content for small entries
- Summaries
- Links to checkpoints/directories
- Tags for searching
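A plausible shape for the metadata table, mirroring the fields listed above. Column names here are assumptions for illustration, not the actual schema:

```python
import sqlite3

# Hypothetical schema; the real memory.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id          TEXT PRIMARY KEY,   -- e.g. mem-20260123-123456-abcd1234
    type        TEXT NOT NULL,      -- transcript/output/summary/context/chunk
    created_at  TEXT NOT NULL,
    content     TEXT,               -- inline content for small entries only
    summary     TEXT,
    checkpoint  TEXT,               -- linked checkpoint ID, if any
    directory   TEXT,               -- linked directory, if any
    tags        TEXT                -- comma-separated tags for search
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO entries (id, type, created_at, summary, tags) VALUES (?,?,?,?,?)",
    ("mem-1", "output", "2026-01-23T10:00:00", "42 tests passed", "pytest"),
)
# A tag filter like `memory list --type output` or a tag search could
# reduce to a simple WHERE clause:
rows = conn.execute(
    "SELECT id FROM entries WHERE tags LIKE ?", ("%pytest%",)
).fetchall()
```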
### Chunk Files (`memory/chunks/`)

- Gzip-compressed content
- Named by entry ID
- Auto-pruned after 30 days
### DragonflyDB (optional)

- Hot cache for recent entries
- 1-hour TTL
- Faster retrieval for active work
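The hot cache is a classic cache-aside pattern: try the cache, fall back to SQLite/chunk files on a miss, and write back with a TTL. A sketch, with a plain dict standing in for the Dragonfly/Redis client:

```python
import time

TTL_SECONDS = 3600  # matches the 1-hour TTL noted above

class HotCache:
    """Cache-aside lookup; a dict stands in for a Dragonfly client."""

    def __init__(self):
        self._store = {}  # entry_id -> (content, expires_at)

    def get(self, entry_id, load_from_cold):
        hit = self._store.get(entry_id)
        if hit and hit[1] > time.time():
            return hit[0]                    # fresh hit: skip cold storage
        content = load_from_cold(entry_id)   # miss/expired: SQLite + files
        self._store[entry_id] = (content, time.time() + TTL_SECONDS)
        return content
```

With a real client, the dict operations would become `GET`/`SET` with an expiry, which is why the cache can stay optional: a miss simply falls through to the durable layers.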
## Best Practices

- **Store proactively**: don't wait for context overflow
- **Tag consistently**: use meaningful tags for search
- **Link to context**: connect entries to checkpoints and directories
- **Use summaries**: fetch the summary first, full content only if needed
- **Prune regularly**: keep memory lean with periodic pruning
## Example Session

```bash
# 1. Run tests, store output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --tag "integration"
# Stored: mem-20260123-100000-abcd (8500 tokens, 3 chunks)

# 2. Create checkpoint with memory ref
checkpoint now --notes "Integration tests complete"

# 3. Later, after a context reset, recover
checkpoint load
# Phase: Testing, Memory: 1 entry (8500 tokens)
memory fetch mem-20260123-100000-abcd --summary-only
# "Integration tests: 156 passed, 2 failed
#  Failures: test_oauth_flow (line 234), test_rate_limit (line 567)"

# 4. Get specific failure details
memory fetch mem-20260123-100000-abcd --chunk 1
# (Shows the chunk containing the failures)

# 5. Continue work
status update ./tests --task "Fixing test_oauth_flow"
```
---

*Part of the Agent Governance System.*