# External Memory Layer

Token-efficient persistent storage for large outputs, transcripts, and context.
## Overview

The External Memory Layer stores and retrieves large content outside the token window. Instead of including full outputs in prompts, agents store content in memory and work with summaries plus retrieval references.
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                          Token Window                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Checkpoint  │  │   STATUS    │  │   Memory References     │  │
│  │  Summary    │  │  Summaries  │  │  [ID] summary (tokens)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      External Memory Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   SQLite    │  │   Chunks    │  │      DragonflyDB        │  │
│  │ (metadata)  │  │   (files)   │  │    (hot cache, opt)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
## Token Thresholds

| Content Size | Storage Strategy |
|---|---|
| < 500 tokens | Stored inline in database |
| 500-4000 tokens | Stored in compressed file + summary |
| > 4000 tokens | Auto-chunked (multiple files) + parent summary |
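The thresholds above can be sketched as a simple routing function. This is a hypothetical illustration of the table's logic, not the actual implementation:

```python
def storage_strategy(token_count: int) -> str:
    """Pick a storage strategy from the token-count thresholds.

    Sketch only; the real layer may use different cutoffs or names.
    """
    if token_count < 500:
        return "inline"            # small: store directly in SQLite
    if token_count <= 4000:
        return "compressed-file"   # medium: gzip file + summary
    return "chunked"               # large: multiple files + parent summary
```

So `storage_strategy(120)` routes to inline storage, while a 9000-token transcript would be auto-chunked.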
## CLI Commands

### Store Content

```bash
# Store inline content
memory log "Test results: all 42 tests passed"

# Store from file
memory log --file /path/to/large-output.txt --tag "test-results"

# Store from stdin (common pattern)
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --checkpoint ckpt-xxx

# Store with directory linkage
memory log --file output.txt --directory ./pipeline --tag "validation"
```
### Retrieve Content

```bash
# Get full entry (includes content if small, or loads from file)
memory fetch mem-20260123-123456-abcd1234

# Get just the summary (token-efficient)
memory fetch mem-20260123-123456-abcd1234 --summary-only

# Get a specific chunk (for large entries)
memory fetch mem-20260123-123456-abcd1234 --chunk 2
```
### List and Search

```bash
# List recent entries
memory list --limit 10

# Filter by type
memory list --type output --limit 20

# Filter by directory
memory list --directory ./tests

# Search content
memory search "error" --limit 5
```
### Memory References

```bash
# Get references linked to a checkpoint
memory refs --checkpoint ckpt-20260123-123456

# Get references for a directory
memory refs --directory ./pipeline
```
### Maintenance

```bash
# Show statistics
memory stats

# Prune old entries
memory prune --keep-days 7 --keep-entries 500
```
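One plausible reading of the prune flags is: keep entries newer than `--keep-days` or among the `--keep-entries` most recent, and prune the rest. The sketch below illustrates that assumed semantics; the real command may combine the flags differently:

```python
from datetime import datetime, timedelta

def select_prunable(entries, keep_days=7, keep_entries=500, now=None):
    """Return IDs of entries eligible for pruning.

    `entries` is a list of (entry_id, created_at) tuples. An entry is
    kept if it is newer than `keep_days` OR among the `keep_entries`
    most recent. Hypothetical semantics, not the actual implementation.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    by_age = sorted(entries, key=lambda e: e[1], reverse=True)
    protected = {eid for eid, _ in by_age[:keep_entries]}
    return [eid for eid, ts in entries if ts < cutoff and eid not in protected]
```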
## Integration with Checkpoint

When `checkpoint now` runs, it:

- Collects references to recent memory entries
- Includes a memory summary (counts, total tokens)
- Stores lightweight refs instead of full content

```bash
# Checkpoint includes memory refs
checkpoint now --notes "After test run"

# View memory info in the checkpoint report
checkpoint report
# Shows:
#   [MEMORY REFERENCES]
#   mem-xxx: pytest results (12000 tokens)
#   mem-yyy: build output (3200 tokens)
```
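The "lightweight refs instead of full content" idea amounts to storing only IDs, summaries, and token counts in the checkpoint. A minimal sketch, with field names assumed:

```python
def collect_memory_refs(entries):
    """Build the lightweight memory section of a checkpoint.

    `entries` is a list of dicts with "id", "summary", and "tokens"
    (assumed field names). Only refs and totals are stored, never the
    full content, so the checkpoint itself stays small.
    """
    refs = [{"id": e["id"], "summary": e["summary"], "tokens": e["tokens"]}
            for e in entries]
    return {
        "refs": refs,
        "entry_count": len(refs),
        "total_tokens": sum(r["tokens"] for r in refs),
    }
```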
## Integration with STATUS

STATUS.md files can include memory pointers for detailed context:

```markdown
## Context References
- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)
```

Use `memory fetch <id>` to retrieve full content.
## Agent Guidelines

### When to Use Memory

- **Large outputs**: if an output would exceed ~500 tokens, store it
- **Test results**: store the full test output, reference the summary
- **Build logs**: store the full log, include only errors inline
- **Generated code**: store in memory, reference in the plan
### Pattern: Store and Reference

Instead of including large output in a response ("Here are all 500 lines of test output: ..."), store it and reference the summary:

```bash
# 1. Store the output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest"

# 2. Reference it in the response:
# "Test completed. Full output stored in mem-xxx (2400 tokens).
#  Summary: 42 passed, 3 failed. Failed tests: test_auth, test_db, test_cache"
```
### Pattern: Chunk Retrieval

For very large content (> 4000 tokens), memory auto-chunks:

```bash
# Store a 50 KB log file
memory log --file build.log
# Output: ID: mem-xxx, Chunks: 12

# Retrieve a specific chunk
memory fetch mem-xxx --chunk 5

# Or get just the summary
memory fetch mem-xxx --summary-only
```
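At its core, auto-chunking is fixed-size splitting with a parent entry that keeps only a summary and the chunk count. A sketch, with the chunk size as an assumed default:

```python
def chunk_content(text: str, chunk_chars: int = 4096):
    """Split large content into fixed-size chunks.

    Returns a list of chunk strings; the parent entry would store only
    the summary plus the chunk count, and each chunk would be written
    to its own compressed file. Chunk size is hypothetical.
    """
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

`memory fetch <id> --chunk N` then maps directly to indexing into this list.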
## Reset/Recovery Workflow

After a context reset or session restart:

### Step 1: Load Checkpoint

```bash
checkpoint load
# Shows: phase, dependencies, memory refs, status summary
```
### Step 2: Check Memory References

```bash
# See what's in memory
memory refs --checkpoint ckpt-latest
# Output:
#   mem-abc: pytest results (12000 tokens)
#   mem-def: deployment log (8000 tokens)
```
### Step 3: Fetch Needed Context

```bash
# Get a summary of the test results
memory fetch mem-abc --summary-only
# "42 tests: 40 passed, 2 failed (test_auth, test_db)"

# If needed, get a specific chunk
memory fetch mem-abc --chunk 0  # First chunk, containing the failures
```
### Step 4: Resume Work

```bash
# Check directory status
cat ./tests/STATUS.md
# Shows current phase, pending tasks, memory refs

# Continue where you left off
status update ./tests --task "Fixing test_auth failure"
```
## Memory Entry Types

| Type | Purpose | Example |
|---|---|---|
| `transcript` | Full conversation logs | Chat history |
| `output` | Command/tool outputs | Test results, build logs |
| `summary` | Generated summaries | Checkpoint summaries |
| `context` | Saved context state | Variables, environment |
| `chunk` | Part of a larger entry | Auto-generated |
## Storage Details

### SQLite Database (`memory/memory.db`)

- Entry metadata (ID, type, timestamps)
- Content for small entries
- Summaries
- Links to checkpoints/directories
- Tags for searching
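A plausible shape for the metadata table, mirroring the fields listed above. Column names here are assumptions for illustration, not the actual schema:

```python
import sqlite3

# Hypothetical schema; the real memory.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id          TEXT PRIMARY KEY,   -- e.g. mem-20260123-123456-abcd1234
    type        TEXT NOT NULL,      -- transcript/output/summary/context/chunk
    created_at  TEXT NOT NULL,
    content     TEXT,               -- inline content for small entries only
    summary     TEXT,
    checkpoint  TEXT,               -- linked checkpoint ID, if any
    directory   TEXT,               -- linked directory, if any
    tags        TEXT                -- comma-separated tags for search
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO entries (id, type, created_at, summary, tags) VALUES (?,?,?,?,?)",
    ("mem-1", "output", "2026-01-23T10:00:00", "42 tests passed", "pytest"),
)
# A tag filter like `memory list --type output` or a tag search could
# reduce to a simple WHERE clause:
rows = conn.execute(
    "SELECT id FROM entries WHERE tags LIKE ?", ("%pytest%",)
).fetchall()
```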
### Chunk Files (`memory/chunks/`)

- Gzip-compressed content
- Named by entry ID
- Auto-pruned after 30 days
### DragonflyDB (optional)

- Hot cache for recent entries
- 1-hour TTL
- Faster retrieval for active work
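The hot cache is a classic cache-aside pattern: try the cache, fall back to SQLite/chunk files on a miss, and write back with a TTL. A sketch, with a plain dict standing in for the Dragonfly/Redis client:

```python
import time

TTL_SECONDS = 3600  # matches the 1-hour TTL noted above

class HotCache:
    """Cache-aside lookup; a dict stands in for a Dragonfly client."""

    def __init__(self):
        self._store = {}  # entry_id -> (content, expires_at)

    def get(self, entry_id, load_from_cold):
        hit = self._store.get(entry_id)
        if hit and hit[1] > time.time():
            return hit[0]                    # fresh hit: skip cold storage
        content = load_from_cold(entry_id)   # miss/expired: SQLite + files
        self._store[entry_id] = (content, time.time() + TTL_SECONDS)
        return content
```

With a real client, the dict operations would become `GET`/`SET` with an expiry, which is why the cache can stay optional: a miss simply falls through to the durable layers.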
## Best Practices

- **Store proactively**: don't wait for context overflow
- **Tag consistently**: use meaningful tags for search
- **Link to context**: connect entries to checkpoints and directories
- **Use summaries**: fetch the summary first, full content only if needed
- **Prune regularly**: keep memory lean with periodic pruning
## Example Session

```bash
# 1. Run tests, store output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --tag "integration"
# Stored: mem-20260123-100000-abcd (8500 tokens, 3 chunks)

# 2. Create checkpoint with memory ref
checkpoint now --notes "Integration tests complete"

# 3. Later, after a context reset, recover
checkpoint load
# Phase: Testing, Memory: 1 entry (8500 tokens)
memory fetch mem-20260123-100000-abcd --summary-only
# "Integration tests: 156 passed, 2 failed
#  Failures: test_oauth_flow (line 234), test_rate_limit (line 567)"

# 4. Get specific failure details
memory fetch mem-20260123-100000-abcd --chunk 1
# (Shows the chunk containing the failures)

# 5. Continue work
status update ./tests --task "Fixing test_oauth_flow"
```
---

*Part of the Agent Governance System.*