# External Memory Layer

> Token-efficient persistent storage for large outputs, transcripts, and context.

## Overview

The External Memory Layer stores and retrieves large content outside the token window. Instead of including full outputs in prompts, agents store content in memory and work with summaries and retrieval references.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                          Token Window                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Checkpoint  │  │   STATUS    │  │   Memory References     │  │
│  │  Summary    │  │  Summaries  │  │  [ID] summary (tokens)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      External Memory Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   SQLite    │  │   Chunks    │  │      DragonflyDB        │  │
│  │ (metadata)  │  │   (files)   │  │    (hot cache, opt)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

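
The "Memory References" box in the diagram holds one short line per stored entry rather than the content itself. As a minimal sketch of such a reference, assuming a hypothetical `MemoryRef` dataclass (the class, its field names, and `render` are illustrative and not taken from the actual implementation):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRef:
    """Hypothetical lightweight pointer to an entry in external memory."""

    entry_id: str  # e.g. "mem-20260123-123456-abcd1234"
    summary: str   # short human-readable summary
    tokens: int    # size of the full stored content, in tokens

    def render(self) -> str:
        # Mirrors the "[ID] summary (tokens)" layout shown in the diagram.
        return f"[{self.entry_id}] {self.summary} ({self.tokens} tokens)"


ref = MemoryRef("mem-20260123-123456-abcd1234", "pytest results", 12000)
print(ref.render())
```

Only the rendered line needs to live in the token window; the full content stays in the external layer until it is fetched.
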
## Token Thresholds

| Content Size | Storage Strategy |
|-------------|------------------|
| < 500 tokens | Stored inline in database |
| 500-4000 tokens | Stored in compressed file + summary |
| > 4000 tokens | Auto-chunked (multiple files) + parent summary |

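
The threshold table can be read as a simple dispatch on token count. The sketch below is one way to express it, assuming hypothetical strategy names (`inline`, `compressed_file`, `chunked`) and assuming that the boundary values 500 and 4000 both fall in the middle tier, as the "500-4000" row suggests:

```python
INLINE_LIMIT = 500   # below this, content is stored inline in the database
CHUNK_LIMIT = 4000   # above this, content is auto-chunked


def choose_strategy(token_count: int) -> str:
    """Map a token count to one of the three storage tiers in the table."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    if token_count < INLINE_LIMIT:
        return "inline"
    if token_count <= CHUNK_LIMIT:
        return "compressed_file"
    return "chunked"


for size in (120, 2400, 12000):
    print(size, choose_strategy(size))
```
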
## CLI Commands

### Store Content

```bash
# Store inline content
memory log "Test results: all 42 tests passed"

# Store from file
memory log --file /path/to/large-output.txt --tag "test-results"

# Store from stdin (common pattern)
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --checkpoint ckpt-xxx

# Store with directory linkage
memory log --file output.txt --directory ./pipeline --tag "validation"
```

### Retrieve Content

```bash
# Get full entry (includes content if small, or loads from file)
memory fetch mem-20260123-123456-abcd1234

# Get just the summary (token-efficient)
memory fetch mem-20260123-123456-abcd1234 --summary-only

# Get specific chunk (for large entries)
memory fetch mem-20260123-123456-abcd1234 --chunk 2
```

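
When retrieval is driven from a script rather than typed by hand, the three `memory fetch` forms above can be assembled programmatically. The helper below is a sketch; `build_fetch_command` is a hypothetical name, and it only uses the `--summary-only` and `--chunk` flags shown in this section:

```python
from typing import List, Optional


def build_fetch_command(
    entry_id: str,
    summary_only: bool = False,
    chunk: Optional[int] = None,
) -> List[str]:
    """Build the argv list for `memory fetch` from the documented flags."""
    cmd = ["memory", "fetch", entry_id]
    if summary_only:
        cmd.append("--summary-only")
    if chunk is not None:
        cmd += ["--chunk", str(chunk)]
    return cmd


print(build_fetch_command("mem-20260123-123456-abcd1234", summary_only=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Using a list instead of a shell string avoids quoting problems with entry IDs.
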
### List and Search

```bash
# List recent entries
memory list --limit 10

# Filter by type
memory list --type output --limit 20

# Filter by directory
memory list --directory ./tests

# Search content
memory search "error" --limit 5
```

### Memory References

```bash
# Get references linked to a checkpoint
memory refs --checkpoint ckpt-20260123-123456

# Get references for a directory
memory refs --directory ./pipeline
```

### Maintenance

```bash
# Show statistics
memory stats

# Prune old entries
memory prune --keep-days 7 --keep-entries 500
```

## Integration with Checkpoint

When `checkpoint now` runs, it:
1. Collects references to recent memory entries
2. Includes a memory summary (counts, total tokens)
3. Stores lightweight refs instead of full content

```bash
# Checkpoint includes memory refs
checkpoint now --notes "After test run"

# View memory info in checkpoint report
checkpoint report
# Shows:
# [MEMORY REFERENCES]
# mem-xxx: pytest results (12000 tokens)
# mem-yyy: build output (3200 tokens)
```

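
The memory summary a checkpoint records (entry count and total tokens) is a small aggregation over the collected references. As a sketch of that step, assuming each reference is a hypothetical `(entry_id, summary, tokens)` tuple and that the report layout matches the sample output above:

```python
from typing import Dict, List, Tuple

# Hypothetical shape of a memory reference: (entry_id, summary, tokens)
Ref = Tuple[str, str, int]


def summarize_refs(refs: List[Ref]) -> Dict[str, int]:
    """Compute the entry count and total tokens for a set of references."""
    return {
        "entries": len(refs),
        "total_tokens": sum(tokens for _, _, tokens in refs),
    }


def format_report(refs: List[Ref]) -> str:
    """Render references in the layout shown in the sample report."""
    lines = ["[MEMORY REFERENCES]"]
    lines += [f"{entry_id}: {summary} ({tokens} tokens)" for entry_id, summary, tokens in refs]
    return "\n".join(lines)


refs = [("mem-xxx", "pytest results", 12000), ("mem-yyy", "build output", 3200)]
print(summarize_refs(refs))
print(format_report(refs))
```
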
## Integration with STATUS

STATUS.md files can include memory pointers for detailed context:

```markdown
## Context References

- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)

Use `memory fetch <id>` to retrieve full content.
```

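
A script that resumes work can pull these pointers back out of a STATUS.md file. The sketch below assumes the pointer lines follow exactly the bullet format shown above; `extract_pointers` and the regular expression are illustrative, not part of the tooling:

```python
import re
from typing import List, Tuple

# Matches lines such as: - Test Results: `mem-20260123-123456-abcd` (12000 tokens)
POINTER_RE = re.compile(
    r"^- (?P<label>[^:]+): `(?P<id>mem-[\w-]+)` \((?P<tokens>\d+) tokens\)$"
)


def extract_pointers(status_md: str) -> List[Tuple[str, str, int]]:
    """Return (label, entry_id, tokens) for each memory pointer line."""
    pointers = []
    for line in status_md.splitlines():
        match = POINTER_RE.match(line.strip())
        if match:
            pointers.append((match["label"], match["id"], int(match["tokens"])))
    return pointers


sample = """## Context References

- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)
"""
print(extract_pointers(sample))
```
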
## Agent Guidelines

### When to Use Memory

1. **Large outputs** - If output would exceed ~500 tokens, store it
2. **Test results** - Store full test output, reference summary
3. **Build logs** - Store full log, include just errors inline
4. **Generated code** - Store in memory, reference in plan

### Pattern: Store and Reference

```python
import subprocess

# Instead of including large output in the response:
# "Here are all 500 lines of test output: ..."

# Do this:
# 1. Capture the output, then store it via `memory log --stdin`
result = subprocess.run(["pytest"], capture_output=True, text=True)
subprocess.run(["memory", "log", "--stdin", "--tag", "pytest"], input=result.stdout, text=True)

# 2. Reference it
# "Test completed. Full output stored in mem-xxx (2400 tokens).
# Summary: 42 passed, 3 failed. Failed tests: test_auth, test_db, test_cache"
```

### Pattern: Chunk Retrieval

For very large content (>4000 tokens), memory auto-chunks:

```bash
# Store 50KB log file
memory log --file build.log
# Output: ID: mem-xxx, Chunks: 12

# Retrieve specific chunk
memory fetch mem-xxx --chunk 5

# Or get just the summary
memory fetch mem-xxx --summary-only
```

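
To make the idea of auto-chunking concrete, here is a sketch of splitting text into chunks on line boundaries. The real chunking rule is not documented here, so both constants are assumptions: a rough 4-characters-per-token estimate and a hypothetical 1000-token budget per chunk.

```python
from typing import List

CHARS_PER_TOKEN = 4   # assumption: rough heuristic, not the tool's tokenizer
CHUNK_TOKENS = 1000   # assumption: hypothetical per-chunk token budget


def split_into_chunks(text: str, chunk_tokens: int = CHUNK_TOKENS) -> List[str]:
    """Split text on line boundaries into chunks of roughly chunk_tokens.

    A single line longer than the budget becomes its own oversized chunk.
    """
    budget = chunk_tokens * CHARS_PER_TOKEN
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Flush the current chunk before it would exceed the budget.
        if current and size + len(line) > budget:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


log_text = ("x" * 99 + "\n") * 100  # 100 lines of 100 characters each
chunks = split_into_chunks(log_text, chunk_tokens=250)
print(len(chunks))
```

Splitting on line boundaries keeps each log line intact, so a failure message is never cut across two chunks.
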
## Reset/Recovery Workflow

After a context reset or session restart:

### Step 1: Load Checkpoint
```bash
checkpoint load
# Shows: phase, dependencies, memory refs, status summary
```

### Step 2: Check Memory References
```bash
# See what's in memory
memory refs --checkpoint ckpt-latest
# Output:
# mem-abc: pytest results (12000 tokens)
# mem-def: deployment log (8000 tokens)
```

### Step 3: Fetch Needed Context
```bash
# Get summary of test results
memory fetch mem-abc --summary-only
# "42 tests: 40 passed, 2 failed (test_auth, test_db)"

# If needed, get specific chunk
memory fetch mem-abc --chunk 0  # First chunk with failures
```

### Step 4: Resume Work
```bash
# Check directory status
cat ./tests/STATUS.md
# Shows current phase, pending tasks, memory refs

# Continue where you left off
status update ./tests --task "Fixing test_auth failure"
```

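
Steps 2 and 3 can be chained in a script: parse the reference list, then request the summary of each entry before deciding whether any full content is needed. The sketch below assumes `memory refs` prints one `id: summary (N tokens)` line per entry, as in the sample output; `parse_refs` and `summary_commands` are hypothetical helpers.

```python
import re
from typing import List, Tuple

# Matches lines such as: mem-abc: pytest results (12000 tokens)
REF_LINE = re.compile(r"^(mem-[\w-]+): (.+) \((\d+) tokens\)$")


def parse_refs(output: str) -> List[Tuple[str, str, int]]:
    """Parse reference lines into (entry_id, summary, tokens) tuples."""
    refs = []
    for line in output.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            refs.append((match.group(1), match.group(2), int(match.group(3))))
    return refs


def summary_commands(refs: List[Tuple[str, str, int]]) -> List[List[str]]:
    """Build one summary-only fetch command per reference."""
    return [["memory", "fetch", entry_id, "--summary-only"] for entry_id, _, _ in refs]


sample_output = "mem-abc: pytest results (12000 tokens)\nmem-def: deployment log (8000 tokens)\n"
parsed = parse_refs(sample_output)
for cmd in summary_commands(parsed):
    print(" ".join(cmd))
```
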
## Memory Entry Types

| Type | Purpose | Example |
|------|---------|---------|
| `transcript` | Full conversation logs | Chat history |
| `output` | Command/tool outputs | Test results, build logs |
| `summary` | Generated summaries | Checkpoint summaries |
| `context` | Saved context state | Variables, environment |
| `chunk` | Part of larger entry | Auto-generated |

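
In calling code, the type names from this table can be modeled as an enumeration so that a typo in a `--type` value fails early. This is a sketch; the `EntryType` class is hypothetical, and only its string values come from the table:

```python
from enum import Enum


class EntryType(str, Enum):
    """Entry type names taken from the table; the class itself is illustrative."""

    TRANSCRIPT = "transcript"
    OUTPUT = "output"
    SUMMARY = "summary"
    CONTEXT = "context"
    CHUNK = "chunk"


def parse_entry_type(value: str) -> EntryType:
    """Convert a raw string to an EntryType, raising ValueError if unknown."""
    return EntryType(value.strip().lower())


print(parse_entry_type("Output").value)
```
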
## Storage Details

### SQLite Database (`memory/memory.db`)
- Entry metadata (ID, type, timestamps)
- Content for small entries
- Summaries
- Links to checkpoints/directories
- Tags for searching

### Chunk Files (`memory/chunks/`)
- Gzip-compressed content
- Named by entry ID
- Auto-pruned after 30 days

### DragonflyDB (optional)
- Hot cache for recent entries
- 1-hour TTL
- Faster retrieval for active work

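
The split between SQLite metadata and gzip-compressed chunk files can be illustrated with the standard library alone. The schema below is an assumption made for this sketch: the table name, column names, and the `.gz` file naming are hypothetical and do not describe the real `memory.db` layout. The DragonflyDB cache is omitted.

```python
import gzip
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema; the column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id         TEXT PRIMARY KEY,
    type       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    summary    TEXT,
    content    TEXT,  -- set only for small inline entries
    chunk_path TEXT   -- set only when content lives in a gzip file
)
"""


def store_entry(db, chunk_dir, entry_id, entry_type, content, summary, inline):
    """Store content inline, or as a gzip file named by entry ID."""
    if inline:
        db.execute(
            "INSERT INTO entries (id, type, summary, content) VALUES (?, ?, ?, ?)",
            (entry_id, entry_type, summary, content),
        )
    else:
        path = Path(chunk_dir) / f"{entry_id}.gz"
        path.write_bytes(gzip.compress(content.encode("utf-8")))
        db.execute(
            "INSERT INTO entries (id, type, summary, chunk_path) VALUES (?, ?, ?, ?)",
            (entry_id, entry_type, summary, str(path)),
        )
    db.commit()


def load_entry(db, entry_id):
    """Return the full content, reading the gzip file when needed."""
    row = db.execute(
        "SELECT content, chunk_path FROM entries WHERE id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        raise KeyError(entry_id)
    content, chunk_path = row
    if content is not None:
        return content
    return gzip.decompress(Path(chunk_path).read_bytes()).decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    store_entry(db, tmp, "mem-small", "output", "all 42 tests passed", "tests ok", inline=True)
    store_entry(db, tmp, "mem-large", "output", "line\n" * 5000, "build log", inline=False)
    small = load_entry(db, "mem-small")
    large = load_entry(db, "mem-large")
    db.close()
```

Keeping summaries in the database means a summary-only fetch never has to touch or decompress the chunk files.
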
## Best Practices

1. **Store proactively** - Don't wait for context overflow
2. **Tag consistently** - Use meaningful tags for search
3. **Link to context** - Connect to checkpoints and directories
4. **Use summaries** - Fetch summary first, full content only if needed
5. **Prune regularly** - Keep memory lean with periodic pruning

## Example Session

```bash
# 1. Run tests, store output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --tag "integration"
# Stored: mem-20260123-100000-abcd (8500 tokens, 3 chunks)

# 2. Create checkpoint with memory ref
checkpoint now --notes "Integration tests complete"

# 3. Later, after context reset, recover
checkpoint load
# Phase: Testing, Memory: 1 entry (8500 tokens)

memory fetch mem-20260123-100000-abcd --summary-only
# "Integration tests: 156 passed, 2 failed
# Failures: test_oauth_flow (line 234), test_rate_limit (line 567)"

# 4. Get specific failure details
memory fetch mem-20260123-100000-abcd --chunk 1
# (Shows chunk containing the failures)

# 5. Continue work
status update ./tests --task "Fixing test_oauth_flow"
```

---

*Part of the [Agent Governance System](/opt/agent-governance/docs/ARCHITECTURE.md)*