# External Memory Layer

> Token-efficient persistent storage for large outputs, transcripts, and context.

## Overview

The External Memory Layer stores and retrieves large content outside the token window. Instead of including full outputs in prompts, agents store content in memory and work with summaries plus retrieval references.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Token Window                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Checkpoint  │  │   STATUS    │  │     Memory References   │  │
│  │  Summary    │  │  Summaries  │  │  [ID] summary (tokens)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    External Memory Layer                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   SQLite    │  │   Chunks    │  │       DragonflyDB       │  │
│  │  (metadata) │  │   (files)   │  │    (hot cache, opt)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

## Token Thresholds

| Content Size | Storage Strategy |
|-------------|------------------|
| < 500 tokens | Stored inline in the SQLite database |
| 500-4000 tokens | Stored in a compressed file, with a summary |
| > 4000 tokens | Auto-chunked into multiple files, with a parent summary |
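
The thresholds amount to a three-way decision on token count. The sketch below expresses it in Python, assuming a rough estimate of about four characters per token; that heuristic, the function names, and the `StorageStrategy` enum are hypothetical and not necessarily what `memory` uses internally. Both 500 and 4000 fall into the file tier, following the table's ranges.

```python
from enum import Enum


class StorageStrategy(Enum):
    INLINE = "inline"    # content kept in the database row
    FILE = "file"        # one compressed file plus a summary
    CHUNKED = "chunked"  # several chunk files plus a parent summary


# Thresholds from the table above, in tokens.
INLINE_MAX = 500
FILE_MAX = 4000


def estimate_tokens(text: str) -> int:
    """Hypothetical estimator: roughly 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def choose_strategy(tokens: int) -> StorageStrategy:
    """Map an estimated token count to a storage strategy."""
    if tokens < INLINE_MAX:
        return StorageStrategy.INLINE
    if tokens <= FILE_MAX:
        return StorageStrategy.FILE
    return StorageStrategy.CHUNKED


if __name__ == "__main__":
    for n in (42, 500, 4000, 12000):
        print(n, choose_strategy(n).value)
```

Keeping the decision in one pure function makes the boundaries easy to test independently of any storage backend.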

## CLI Commands

### Store Content

```bash
# Store a short string passed directly on the command line
memory log "Test results: all 42 tests passed"

# Store from file
memory log --file /path/to/large-output.txt --tag "test-results"

# Store from stdin (common pattern)
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --checkpoint ckpt-xxx

# Store with directory linkage
memory log --file output.txt --directory ./pipeline --tag "validation"
```

### Retrieve Content

```bash
# Get the full entry (content comes from the database for small entries, otherwise from its file)
memory fetch mem-20260123-123456-abcd1234

# Get just the summary (token-efficient)
memory fetch mem-20260123-123456-abcd1234 --summary-only

# Get specific chunk (for large entries)
memory fetch mem-20260123-123456-abcd1234 --chunk 2
```
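
When an agent drives the CLI from Python, the fetch options above map directly onto an argument list. The helper below is hypothetical and not part of the `memory` tool: it only builds the argv and hands it to `subprocess.run`. The `mem-` prefix check and the non-negative chunk index (chunk 0 is the first chunk) are assumptions drawn from the examples in this document.

```python
import subprocess
from typing import List, Optional


def fetch_cmd(mem_id: str, summary_only: bool = False,
              chunk: Optional[int] = None) -> List[str]:
    """Build argv for `memory fetch` (hypothetical helper)."""
    if not mem_id.startswith("mem-"):  # assumption: all IDs use this prefix
        raise ValueError(f"not a memory ID: {mem_id!r}")
    cmd = ["memory", "fetch", mem_id]
    if summary_only:
        cmd.append("--summary-only")
    if chunk is not None:
        if chunk < 0:
            raise ValueError("chunk indices start at 0")
        cmd += ["--chunk", str(chunk)]
    return cmd


def fetch(mem_id: str, **opts) -> str:
    """Run `memory fetch` and return its stdout."""
    result = subprocess.run(fetch_cmd(mem_id, **opts),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Summary first; build a chunk request only if the summary is not enough.
    print(fetch_cmd("mem-20260123-123456-abcd1234", summary_only=True))
    print(fetch_cmd("mem-20260123-123456-abcd1234", chunk=2))
```

Separating argv construction from execution keeps the option logic testable without the CLI installed.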

### List and Search

```bash
# List recent entries
memory list --limit 10

# Filter by type
memory list --type output --limit 20

# Filter by directory
memory list --directory ./tests

# Search content
memory search "error" --limit 5
```

### Memory References

```bash
# Get references linked to a checkpoint
memory refs --checkpoint ckpt-20260123-123456

# Get references for a directory
memory refs --directory ./pipeline
```

### Maintenance

```bash
# Show statistics
memory stats

# Prune old entries
memory prune --keep-days 7 --keep-entries 500
```
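
This document does not say how `--keep-days` and `--keep-entries` combine. One plausible reading, assumed here and not confirmed by the tool, is that an entry is deleted only when it is both older than `keep_days` and outside the newest `keep_entries`. The sketch computes which IDs that rule would remove; the `Entry` type and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Entry:
    id: str
    created: datetime


def select_for_pruning(entries: List[Entry], keep_days: int,
                       keep_entries: int, now: datetime) -> List[str]:
    """Return IDs to delete under the assumed rule: older than keep_days
    AND not among the newest keep_entries entries."""
    cutoff = now - timedelta(days=keep_days)
    newest_first = sorted(entries, key=lambda e: e.created, reverse=True)
    protected = {e.id for e in newest_first[:keep_entries]}
    return [e.id for e in newest_first
            if e.created < cutoff and e.id not in protected]


if __name__ == "__main__":
    now = datetime(2026, 1, 23, 12, 0)
    entries = [Entry(f"mem-{i}", now - timedelta(days=i)) for i in range(10)]
    # keep_entries=3 protects mem-0..mem-2; keep_days=7 protects mem-0..mem-7
    print(select_for_pruning(entries, keep_days=7, keep_entries=3, now=now))
    # ['mem-8', 'mem-9']
```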

## Integration with Checkpoint

When `checkpoint now` runs, it:
1. Collects references to recent memory entries
2. Includes memory summary (counts, total tokens)
3. Stores lightweight refs instead of full content

```bash
# Checkpoint includes memory refs
checkpoint now --notes "After test run"

# View memory info in checkpoint report
checkpoint report
# Shows:
# [MEMORY REFERENCES]
#   mem-xxx: pytest results (12000 tokens)
#   mem-yyy: build output (3200 tokens)
```
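
The three steps above can be sketched as a function that turns recent memory entries into the lightweight block a checkpoint stores: a count, a token total, and one reference per entry, never the content itself. The `MemoryRef` type and field names are assumptions; only the report layout mirrors the `checkpoint report` output shown above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryRef:
    id: str
    summary: str
    tokens: int


def checkpoint_memory_block(refs: List[MemoryRef]) -> Dict[str, object]:
    """Build the memory section of a checkpoint (hypothetical layout)."""
    return {
        "count": len(refs),
        "total_tokens": sum(r.tokens for r in refs),
        # Only IDs and short summaries are kept; content stays in memory.
        "refs": [{"id": r.id, "summary": r.summary, "tokens": r.tokens}
                 for r in refs],
    }


def render_report(block: Dict[str, object]) -> str:
    """Format the block like the [MEMORY REFERENCES] report section."""
    lines = ["[MEMORY REFERENCES]"]
    for r in block["refs"]:
        lines.append(f"  {r['id']}: {r['summary']} ({r['tokens']} tokens)")
    return "\n".join(lines)


if __name__ == "__main__":
    block = checkpoint_memory_block([
        MemoryRef("mem-xxx", "pytest results", 12000),
        MemoryRef("mem-yyy", "build output", 3200),
    ])
    print(block["count"], block["total_tokens"])  # 2 15200
    print(render_report(block))
```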

## Integration with STATUS

STATUS.md files can include memory pointers for detailed context:

```markdown
## Context References

- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)

Use `memory fetch <id>` to retrieve full content.
```
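
Because these pointers follow a fixed pattern (a backticked `mem-` ID followed by a token count in parentheses), an agent can list them from a STATUS.md file without fetching anything. A minimal sketch with a regular expression, assuming IDs always have the `mem-YYYYMMDD-HHMMSS-<suffix>` shape used in this document; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# A backticked memory ID followed by "(<n> tokens)", as in the example above.
REF_PATTERN = re.compile(
    r"`(mem-[0-9]{8}-[0-9]{6}-[0-9A-Za-z]+)`\s*\((\d+) tokens\)"
)


def extract_memory_refs(status_md: str) -> List[Tuple[str, int]]:
    """Return (memory ID, token count) pairs found in a STATUS.md body."""
    return [(m.group(1), int(m.group(2)))
            for m in REF_PATTERN.finditer(status_md)]


if __name__ == "__main__":
    sample = """## Context References

- Test Results: `mem-20260123-123456-abcd` (12000 tokens)
- Build Log: `mem-20260123-123457-efgh` (3200 tokens)
"""
    for mem_id, tokens in extract_memory_refs(sample):
        print(mem_id, tokens)
```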

## Agent Guidelines

### When to Use Memory

1. **Large outputs** - If output would exceed ~500 tokens, store it
2. **Test results** - Store full test output, reference summary
3. **Build logs** - Store full log, include just errors inline
4. **Generated code** - Store in memory, reference in plan

### Pattern: Store and Reference

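```python
import subprocess

# Instead of pasting "all 500 lines of test output" into the response:
# 1. Run the tests, merging stderr into stdout (like `2>&1`)
result = subprocess.run(["pytest", "tests/"], stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)

# 2. Store the full output (same as piping into `memory log --stdin`)
stored = subprocess.run(["memory", "log", "--stdin", "--tag", "pytest"],
                        input=result.stdout, capture_output=True,
                        text=True, check=True)
print(stored.stdout)  # the CLI reports the new entry's ID

# 3. Reference it instead of the raw output, e.g.:
# "Test completed. Full output stored in mem-xxx (2400 tokens).
#  Summary: 42 passed, 3 failed. Failed tests: test_auth, test_db, test_cache"
```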

### Pattern: Chunk Retrieval

For very large content (>4000 tokens), `memory log` splits the entry into chunks automatically:

```bash
# Store 50KB log file
memory log --file build.log
# Output: ID: mem-xxx, Chunks: 12

# Retrieve specific chunk
memory fetch mem-xxx --chunk 5

# Or get just the summary
memory fetch mem-xxx --summary-only
```
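
The chunk count depends on a chunk size this document does not state, so the sketch below takes it as a parameter. It splits text on line boundaries into chunks of at most `max_tokens` estimated tokens (using a rough, assumed estimate of four characters per token) and gzip-compresses each one, with list positions serving as 0-based chunk indices to match `--chunk 0` for the first chunk. All names are hypothetical; a single line longer than the limit becomes its own oversize chunk rather than being split.

```python
import gzip
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token."""
    return (len(text) + 3) // 4


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split on line boundaries, keeping each chunk at or under max_tokens
    by the per-line estimate. An oversize line becomes its own chunk."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in text.splitlines(keepends=True):
        line_tokens = estimate_tokens(line)
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks


def compress_chunks(chunks: List[str]) -> List[bytes]:
    """Gzip each chunk; the list position is the 0-based chunk index."""
    return [gzip.compress(c.encode("utf-8")) for c in chunks]


if __name__ == "__main__":
    log = "".join(f"step {i}: ok\n" for i in range(1000))
    chunks = split_into_chunks(log, max_tokens=1000)
    blobs = compress_chunks(chunks)
    print(len(chunks), "chunks")
    # Fetching chunk 0 means decompressing the first blob.
    assert gzip.decompress(blobs[0]).decode("utf-8") == chunks[0]
    assert "".join(chunks) == log
```

Splitting on line boundaries keeps log lines intact, so a chunk fetched on its own stays readable.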

## Reset/Recovery Workflow

After a context reset or session restart:

### Step 1: Load Checkpoint
```bash
checkpoint load
# Shows: phase, dependencies, memory refs, status summary
```

### Step 2: Check Memory References
```bash
# See what's in memory
memory refs --checkpoint ckpt-latest
# Output:
#   mem-abc: pytest results (12000 tokens)
#   mem-def: deployment log (8000 tokens)
```

### Step 3: Fetch Needed Context
```bash
# Get summary of test results
memory fetch mem-abc --summary-only
# "42 tests: 40 passed, 2 failed (test_auth, test_db)"

# If needed, get specific chunk
memory fetch mem-abc --chunk 0  # Chunk 0 (the first chunk) holds the failures here
```

### Step 4: Resume Work
```bash
# Check directory status
cat ./tests/STATUS.md
# Shows current phase, pending tasks, memory refs

# Continue where you left off
status update ./tests --task "Fixing test_auth failure"
```

## Memory Entry Types

| Type | Purpose | Example |
|------|---------|---------|
| `transcript` | Full conversation logs | Chat history |
| `output` | Command/tool outputs | Test results, build logs |
| `summary` | Generated summaries | Checkpoint summaries |
| `context` | Saved context state | Variables, environment |
| `chunk` | Part of larger entry | Auto-generated |

## Storage Details

### SQLite Database (`memory/memory.db`)
- Entry metadata (ID, type, timestamps)
- Content for small entries
- Summaries
- Links to checkpoints/directories
- Tags for searching
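
The list above says what the database holds but not its schema. The sketch below is one hypothetical layout, built with Python's standard `sqlite3` module against an in-memory database, that covers each bullet: metadata columns, optional inline content, a summary, checkpoint and directory links, and a separate tag table for searching. Table and column names are assumptions, not the real `memory.db` schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entries (
    id          TEXT PRIMARY KEY,   -- e.g. mem-20260123-123456-abcd1234
    type        TEXT NOT NULL,      -- transcript, output, summary, ...
    created_at  TEXT NOT NULL,      -- ISO-8601 timestamp
    tokens      INTEGER NOT NULL,
    content     TEXT,               -- only for small entries
    summary     TEXT,
    checkpoint  TEXT,               -- linked checkpoint ID, if any
    directory   TEXT,               -- linked directory, if any
    parent_id   TEXT REFERENCES entries(id)  -- set on chunk entries
);
CREATE TABLE tags (
    entry_id TEXT NOT NULL REFERENCES entries(id),
    tag      TEXT NOT NULL,
    PRIMARY KEY (entry_id, tag)
);
CREATE INDEX idx_entries_checkpoint ON entries(checkpoint);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def ids_with_tag(conn: sqlite3.Connection, tag: str) -> list:
    """Newest-first IDs of entries carrying the given tag."""
    rows = conn.execute(
        "SELECT e.id FROM entries e JOIN tags t ON t.entry_id = e.id "
        "WHERE t.tag = ? ORDER BY e.created_at DESC",
        (tag,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = open_db()
    db.execute(
        "INSERT INTO entries (id, type, created_at, tokens, summary) "
        "VALUES (?, ?, ?, ?, ?)",
        ("mem-20260123-100000-abcd", "output", "2026-01-23T10:00:00",
         8500, "Integration tests: 156 passed, 2 failed"),
    )
    db.executemany("INSERT INTO tags VALUES (?, ?)",
                   [("mem-20260123-100000-abcd", "pytest"),
                    ("mem-20260123-100000-abcd", "integration")])
    print(ids_with_tag(db, "pytest"))  # ['mem-20260123-100000-abcd']
```

A separate tag table lets one entry carry several tags, as in `--tag "pytest" --tag "integration"`.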

### Chunk Files (`memory/chunks/`)
- Gzip-compressed content
- Named by entry ID
- Auto-pruned after 30 days
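
A minimal sketch of the chunk-file side, under stated assumptions: files are named `<entry-id>-<index>.gz` (the list above says only that files are named by entry ID, so the index suffix is an assumption), and the 30-day pruning is approximated by file modification time. Function names are hypothetical.

```python
import gzip
import tempfile
import time
from pathlib import Path
from typing import Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def chunk_path(chunks_dir: Path, entry_id: str, index: int) -> Path:
    # Assumed naming scheme: <entry-id>-<chunk index>.gz
    return chunks_dir / f"{entry_id}-{index}.gz"


def write_chunk(chunks_dir: Path, entry_id: str, index: int, text: str) -> Path:
    chunks_dir.mkdir(parents=True, exist_ok=True)
    path = chunk_path(chunks_dir, entry_id, index)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    return path


def read_chunk(chunks_dir: Path, entry_id: str, index: int) -> str:
    with gzip.open(chunk_path(chunks_dir, entry_id, index), "rt",
                   encoding="utf-8") as f:
        return f.read()


def prune_chunks(chunks_dir: Path, max_age: float = THIRTY_DAYS,
                 now: Optional[float] = None) -> int:
    """Delete chunk files whose mtime is older than max_age seconds."""
    now = time.time() if now is None else now
    removed = 0
    for path in chunks_dir.glob("*.gz"):
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp) / "memory" / "chunks"
        write_chunk(d, "mem-20260123-100000-abcd", 1, "FAILED test_oauth_flow\n")
        print(read_chunk(d, "mem-20260123-100000-abcd", 1), end="")
        # Nothing is 30 days old yet, so nothing is pruned.
        print(prune_chunks(d))  # 0
```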

### DragonflyDB (optional)
- Hot cache for recent entries
- 1-hour TTL
- Faster retrieval for active work
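
The hot cache behaves like a read-through cache with a one-hour expiry. In a real deployment this would be DragonflyDB, which speaks the Redis protocol, so a write of the form `SET key value EX 3600` applies. The sketch below swaps in a plain in-process dictionary so it runs without a server; the loader callback stands in for the SQLite and chunk-file lookup, and all names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ONE_HOUR = 3600.0  # seconds, matching the 1-hour TTL above


class HotCache:
    """In-process stand-in for the DragonflyDB hot cache."""

    def __init__(self, ttl: float = ONE_HOUR,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}  # id -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def fetch_entry(cache: HotCache, entry_id: str,
                load_from_store: Callable[[str], str]) -> str:
    """Serve from cache; on a miss, load from durable storage and cache it."""
    cached = cache.get(entry_id)
    if cached is not None:
        return cached
    value = load_from_store(entry_id)
    cache.set(entry_id, value)
    return value


if __name__ == "__main__":
    cache = HotCache()
    store = {"mem-abc": "42 tests: 40 passed, 2 failed"}
    print(fetch_entry(cache, "mem-abc", store.__getitem__))  # loads, then caches
```

The injectable clock exists only so expiry can be tested without waiting an hour.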

## Best Practices

1. **Store proactively** - Don't wait for context overflow
2. **Tag consistently** - Use meaningful tags for search
3. **Link to context** - Connect to checkpoints and directories
4. **Use summaries** - Fetch summary first, full content only if needed
5. **Prune regularly** - Keep memory lean with periodic pruning

## Example Session

```bash
# 1. Run tests, store output
pytest tests/ 2>&1 | memory log --stdin --tag "pytest" --tag "integration"
# Stored: mem-20260123-100000-abcd (8500 tokens, 3 chunks)

# 2. Create checkpoint with memory ref
checkpoint now --notes "Integration tests complete"

# 3. Later, after context reset, recover
checkpoint load
# Phase: Testing, Memory: 1 entry (8500 tokens)

memory fetch mem-20260123-100000-abcd --summary-only
# "Integration tests: 156 passed, 2 failed
#  Failures: test_oauth_flow (line 234), test_rate_limit (line 567)"

# 4. Get specific failure details
memory fetch mem-20260123-100000-abcd --chunk 1
# (Shows chunk containing the failures)

# 5. Continue work
status update ./tests --task "Fixing test_oauth_flow"
```

---

*Part of the [Agent Governance System](ARCHITECTURE.md)*