# Context Checkpoint Skill **Phase 5: Agent Bootstrapping** A context preservation system that helps maintain session state across token window resets, CLI restarts, and sub-agent orchestration. ## Overview The checkpoint skill provides: 1. **Periodic State Capture** - Captures phase, tasks, dependencies, variables, and outputs 2. **Token-Aware Summarization** - Creates minimal context summaries for sub-agent calls 3. **CLI Integration** - Manual and automatic checkpoint management 4. **Extensible Storage** - JSON files with DragonflyDB caching (future: remote sync) ## Quick Start ```bash # Add to PATH (optional) export PATH="/opt/agent-governance/bin:$PATH" # Create a checkpoint checkpoint now --notes "Starting task X" # Load latest checkpoint checkpoint load # Compare with previous checkpoint diff # Get compact context summary checkpoint summary # List all checkpoints checkpoint list ``` ## Commands ### `checkpoint now` Create a new checkpoint capturing current state. ```bash checkpoint now # Basic checkpoint checkpoint now --notes "Phase 5 start" # With notes checkpoint now --var key value # With variables checkpoint now --var a 1 --var b 2 # Multiple variables checkpoint now --json # Output JSON ``` **Captures:** - Current phase (from implementation plan) - Task states (from governance DB) - Dependency status (Vault, DragonflyDB, Ledger) - Custom variables - Recent evidence outputs - Agent ID and tier ### `checkpoint load` Load a checkpoint for review or restoration. ```bash checkpoint load # Load latest checkpoint load ckpt-20260123-120000-abc # Load specific checkpoint load --json # Output JSON ``` ### `checkpoint diff` Compare two checkpoints to see what changed. ```bash checkpoint diff # Latest vs previous checkpoint diff --from ckpt-A --to ckpt-B # Specific comparison checkpoint diff --json # Output JSON ``` **Detects:** - Phase changes - Task additions/removals/status changes - Dependency status changes - Variable additions/changes/removals ### `checkpoint list` List available checkpoints. ```bash checkpoint list # Last 20 checkpoint list --limit 50 # Custom limit checkpoint list --json # Output JSON ``` ### `checkpoint summary` Generate context summary for review or sub-agent injection. ```bash checkpoint summary # Compact (default) checkpoint summary --level minimal # ~500 tokens checkpoint summary --level compact # ~1000 tokens checkpoint summary --level standard # ~2000 tokens checkpoint summary --level full # ~4000 tokens checkpoint summary --for terraform # Task-specific checkpoint summary --for governance # Governance context ``` ### `checkpoint prune` Remove old checkpoints. ```bash checkpoint prune # Keep default (50) checkpoint prune --keep 10 # Keep only 10 ``` ## Token-Aware Sub-Agent Context When orchestrating sub-agents, use the summarizer to minimize tokens while preserving essential context: ```python from checkpoint import CheckpointManager, ContextSummarizer # Get latest checkpoint manager = CheckpointManager() checkpoint = manager.get_latest_checkpoint() # Create task-specific summary summarizer = ContextSummarizer(checkpoint) # For infrastructure tasks (~1000 tokens) context = summarizer.for_subagent("terraform", max_tokens=1000) # For governance tasks with specific variables context = summarizer.for_subagent( "promotion", relevant_keys=["agent_id", "current_tier"], max_tokens=500 ) # Pass context to sub-agent subagent.run(context + "\n\n" + task_prompt) ``` ### Summary Levels | Level | Tokens | Contents | |-------|--------|----------| | `minimal` | ~500 | Phase, active task, agent, available deps | | `compact` | ~1000 | + In-progress tasks, pending count, key vars | | `standard` | ~2000 | + All tasks, all deps, recent outputs | | `full` | ~4000 | + All variables, completed phases, metadata | ### Task-Specific Contexts | Task Type | Included Context | |-----------|-----------------| | `terraform`, `ansible`, `infrastructure` | Infrastructure dependencies, service status | | `database`, `query`, `ledger` | Database connections, endpoints | | `promotion`, `revocation`, `governance` | Agent tier, governance variables | ## Auto-Checkpoint Events Checkpoints can be automatically created by integrating with governance hooks: ```python # In your agent code from checkpoint import CheckpointManager manager = CheckpointManager() # Auto-checkpoint on phase transitions def on_phase_complete(phase_num): manager.create_checkpoint( notes=f"Phase {phase_num} complete", variables={"completed_phase": phase_num} ) # Auto-checkpoint on task completion def on_task_complete(task_id, result): manager.create_checkpoint( variables={"last_task": task_id, "result": result} ) # Auto-checkpoint on error def on_error(error_type, message): manager.create_checkpoint( notes=f"Error: {error_type}", variables={"error_type": error_type, "error_msg": message} ) ``` ## Restoring After Restart After a CLI restart or token window reset: ```bash # 1. Load the latest checkpoint checkpoint load # 2. Review what was happening checkpoint summary --level full # 3. If needed, compare with earlier state checkpoint diff # 4. Resume work with context checkpoint summary --level compact > /tmp/context.txt # Use context.txt as preamble for new session ``` ### Programmatic Restoration ```python from checkpoint import CheckpointManager, ContextSummarizer manager = CheckpointManager() checkpoint = manager.get_latest_checkpoint() if checkpoint: # Restore environment variables for key, value in checkpoint.variables.items(): os.environ[f"CKPT_{key.upper()}"] = str(value) # Get context for continuation summarizer = ContextSummarizer(checkpoint) context = summarizer.standard_summary() print(f"Restored from: {checkpoint.checkpoint_id}") print(f"Phase: {checkpoint.phase.name if checkpoint.phase else 'Unknown'}") print(f"Tasks: {len(checkpoint.tasks)}") ``` ## Storage Format ### Checkpoint JSON Structure ```json { "checkpoint_id": "ckpt-20260123-120000-abc12345", "created_at": "2026-01-23T12:00:00.000000+00:00", "session_id": "optional-session-id", "phase": { "name": "Phase 5: Agent Bootstrapping", "number": 5, "status": "in_progress", "started_at": "2026-01-23T10:00:00+00:00", "notes": "Working on checkpoint skill" }, "phases_completed": [1, 2, 3, 4], "tasks": [ { "id": "1", "subject": "Create checkpoint module", "status": "completed", "owner": null, "blocks": [], "blocked_by": [] } ], "active_task_id": "2", "dependencies": [ { "name": "vault", "type": "service", "status": "available", "endpoint": "https://127.0.0.1:8200", "last_checked": "2026-01-23T12:00:00+00:00" } ], "variables": { "custom_key": "custom_value" }, "recent_outputs": [ { "type": "evidence", "id": "evd-20260123-...", "action": "terraform", "success": true, "timestamp": "2026-01-23T11:30:00+00:00" } ], "agent_id": "my-agent", "agent_tier": 1, "content_hash": "abc123...", "parent_checkpoint_id": "ckpt-20260123-110000-...", "estimated_tokens": 450 } ``` ### File Locations ``` /opt/agent-governance/checkpoint/ ├── checkpoint.py # Core module ├── README.md # This documentation ├── storage/ # Checkpoint JSON files │ ├── ckpt-20260123-120000-abc.json │ ├── ckpt-20260123-110000-def.json │ └── ... └── templates/ # Future: checkpoint templates /opt/agent-governance/bin/ └── checkpoint # CLI wrapper ``` ## Extensibility ### Adding Custom State Collectors ```python from checkpoint import CheckpointManager class MyCheckpointManager(CheckpointManager): def collect_my_state(self) -> dict: # Custom state collection return {"my_data": "..."} def create_checkpoint(self, **kwargs): # Add custom state to variables custom_vars = kwargs.get("variables", {}) custom_vars.update(self.collect_my_state()) kwargs["variables"] = custom_vars return super().create_checkpoint(**kwargs) ``` ### Remote Storage (Future) ```python # Planned: S3/remote sync class RemoteCheckpointManager(CheckpointManager): def __init__(self, s3_bucket: str): super().__init__() self.s3_bucket = s3_bucket def save_checkpoint(self, checkpoint): # Save locally local_path = super().save_checkpoint(checkpoint) # Sync to S3 self._upload_to_s3(local_path) return local_path def sync_from_remote(self): # Download checkpoints from S3 pass ``` ## Integration with Agent Governance The checkpoint skill integrates with the existing governance system: ``` ┌─────────────────────────────────────────────────────────────┐ │ Agent Governance │ ├─────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ Preflight │ │ Wrappers │ │ Evidence │ │Checkpoint│ │ │ │ Gate │ │tf/ansible│ │ Package │ │ Skill │ │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │ │ │ │ │ │ │ v v v v │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ DragonflyDB │ │ │ │ agent:* states | checkpoint:* | revocations:ledger │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ v v v v │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ SQLite Ledger │ │ │ │ agent_actions | violations | promotions | tasks │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ ``` ## Best Practices 1. **Checkpoint Before Risky Operations** ```bash checkpoint now --notes "Before production deployment" ``` 2. **Include Relevant Variables** ```bash checkpoint now --var target_env production --var rollback_id v1.2.3 ``` 3. **Use Task-Specific Summaries for Sub-Agents** ```bash checkpoint summary --for terraform > context.txt ``` 4. **Review Diffs After Long Operations** ```bash checkpoint diff # What changed? ``` 5. **Prune Regularly in Long-Running Sessions** ```bash checkpoint prune --keep 20 ``` ## Troubleshooting ### "No checkpoint found" Create one first: ```bash checkpoint now ``` ### High token estimates Use more aggressive summarization: ```bash checkpoint summary --level minimal ``` ### Missing dependencies Check services: ```bash docker exec vault vault status redis-cli -p 6379 -a governance2026 PING ``` ### Stale checkpoints Prune and recreate: ```bash checkpoint prune --keep 5 checkpoint now --notes "Fresh start" ```