Orchestrator changes:
- Add dumpAgentHandoff() to dump proposals/analysis before abort
- Add loadRecoveryContext() to load inherited context on recovery runs
- Add preseedBlackboard() to pre-seed inherited proposals
- Force-spawn GAMMA immediately on recovery runs
- Track isRecoveryRun, recoveryAttempt, inheritedContext, forceGamma
Server changes:
- Update recordConsensusFailure() to read orchestrator handoff JSON
- Add collectFromBlackboard() helper as fallback
- Update triggerAutoRecovery() with comprehensive context passing
- Store inherited_handoff reference for recovery pipelines
- Track retry_count, abort_reason, handoff_ref in recovery:* keys
- Add recovery badge and prior pipeline link in UI
Test coverage:
- test_auto_recovery.py: 6 unit tests
- test_e2e_auto_recovery.py: 5 E2E tests (handoff dump, recovery
pipeline creation, inherited context, retry tracking, status update)
Redis tracking keys:
- handoff:{pipeline_id}:agents - orchestrator dumps proposals here
- handoff:{recovery_id}:inherited - recovery pipeline inherits from
- recovery:{pipeline_id} - retry_count, abort_reason, handoff_ref
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-Agent Coordination System
Orchestrator for parallel agent execution and coordination
Overview
The Multi-Agent Coordination System manages parallel execution of multiple agents, providing shared state via a blackboard pattern, message passing, dynamic agent spawning, and comprehensive metrics collection.
Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│                        Multi-Agent Orchestrator                         │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                        Coordination Layer                        │   │
│  │  ┌───────────┐  ┌────────────┐  ┌───────────┐  ┌──────────────┐  │   │
│  │  │Blackboard │  │AgentState  │  │  Spawn    │  │   Metrics    │  │   │
│  │  │ (Shared)  │  │  Manager   │  │Controller │  │  Collector   │  │   │
│  │  └───────────┘  └────────────┘  └───────────┘  └──────────────┘  │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                              │                                          │
│              ┌───────────────┼───────────────┐                          │
│              ▼               ▼               ▼                          │
│   ┌─────────────────┐ ┌─────────────┐ ┌─────────────┐                   │
│   │   Agent Alpha   │ │ Agent Beta  │ │ Agent Gamma │                   │
│   │    (Planner)    │ │ (Executor)  │ │ (Validator) │                   │
│   │                 │ │             │ │  (Dynamic)  │                   │
│   │   MessageBus    │ │ MessageBus  │ │ MessageBus  │                   │
│   └─────────────────┘ └─────────────┘ └─────────────┘                   │
│              │               │               │                          │
└──────────────┼───────────────┼───────────────┼──────────────────────────┘
               │               │               │
               ▼               ▼               ▼
         ┌───────────────────────────────────────────┐
         │                DragonflyDB                │
         │     (State, Messages, Locks, Metrics)     │
         └───────────────────────────────────────────┘
Components
Orchestrator (orchestrator.ts - 410 lines)
Main coordination entry point:
- Task initialization
- Agent lifecycle management
- Parallel execution control
- Spawn condition monitoring
- Results aggregation
const orchestrator = new MultiAgentOrchestrator("anthropic/claude-sonnet-4");
await orchestrator.initialize();
const results = await orchestrator.execute(taskDefinition);
Agents (agents.ts - 850 lines)
Three agent types with distinct roles:
| Agent | Role | Capabilities |
|---|---|---|
| Alpha | Planner | Analyzes tasks, creates execution plans |
| Beta | Executor | Executes plan steps, reports progress |
| Gamma | Validator | Validates results, spawned conditionally |
Coordination (coordination.ts - 450 lines)
Shared infrastructure classes:
| Class | Purpose |
|---|---|
| Blackboard | Shared state storage (key-value) |
| MessageBus | Inter-agent message passing |
| AgentStateManager | Agent lifecycle and phase tracking |
| SpawnController | Dynamic agent spawning |
| MetricsCollector | Performance and compliance metrics |
Types (types.ts - 65 lines)
TypeScript type definitions for:
- TaskDefinition
- CoordinationMetrics
- SpawnCondition
- AgentRole
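The definitions themselves live in types.ts and are not reproduced in this README. As a rough sketch inferred only from how the types are used in the examples further down, they might look something like the following. The literal union for AgentRole, the required/optional split of the fields, and the describeTask helper are all assumptions made for illustration.

```typescript
// Hypothetical shapes inferred from the examples in this README;
// the real definitions in types.ts may differ.
type AgentRole = "ALPHA" | "BETA" | "GAMMA";

interface TaskDefinition {
  id: string;
  type: string;
  description: string;
  constraints: string[];
  timeout: number; // milliseconds, as in the example task
}

interface SpawnCondition {
  trigger: string;
  threshold: number;
  agentType: AgentRole;
}

// Illustrative helper (not part of the project): one-line task summary.
function describeTask(task: TaskDefinition): string {
  return `${task.id} [${task.type}] timeout=${task.timeout / 1000}s`;
}
```

Typing agentType as a union rather than a plain string lets the compiler reject a spawn condition that names an agent that does not exist.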
Quick Start
# Enter directory
cd /opt/agent-governance/agents/multi-agent
# Install dependencies
bun install
# Run orchestrator
bun run orchestrator.ts
# Run with custom model
bun run orchestrator.ts --model "anthropic/claude-sonnet-4"
Coordination Patterns
Blackboard Pattern
Shared state accessible by all agents:
// Write to blackboard
await blackboard.set("plan", planData);
// Read from blackboard
const plan = await blackboard.get("plan");
// Watch for changes
blackboard.watch("results", (key, value) => {
console.log(`Results updated: ${value}`);
});
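To make the set/get/watch contract concrete, here is a minimal in-memory stand-in. It is only a sketch: the class name InMemoryBlackboard is hypothetical, it is synchronous where the real Blackboard is awaited, and it keeps state in a Map instead of DragonflyDB.

```typescript
// Minimal in-memory stand-in for the shared blackboard (assumption:
// the real class persists to DragonflyDB and is asynchronous).
type Watcher = (key: string, value: unknown) => void;

class InMemoryBlackboard {
  private store = new Map<string, unknown>();
  private watchers = new Map<string, Watcher[]>();

  set(key: string, value: unknown): void {
    this.store.set(key, value);
    // Notify every watcher registered for this exact key.
    (this.watchers.get(key) ?? []).forEach((w) => w(key, value));
  }

  get(key: string): unknown {
    return this.store.get(key);
  }

  watch(key: string, watcher: Watcher): void {
    const list = this.watchers.get(key) ?? [];
    list.push(watcher);
    this.watchers.set(key, list);
  }
}
```

A stand-in like this can be handy in unit tests that exercise agent logic without a running database.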
Message Passing
Async communication between agents:
// Send message
await alphaBus.publish({
from: "ALPHA",
to: "BETA",
type: "TASK_READY",
payload: { stepId: "step-001" }
});
// Receive messages
betaBus.subscribe((message) => {
if (message.type === "TASK_READY") {
executeStep(message.payload.stepId);
}
});
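The publish/subscribe shape above can be mimicked in-process for experimentation. The sketch below routes messages by their `to` field; LocalRouter, busFor and the AgentMessage interface are hypothetical names, and the real MessageBus is assumed to deliver through DragonflyDB queues rather than direct function calls.

```typescript
interface AgentMessage {
  from: string;
  to: string;
  type: string;
  payload: Record<string, unknown>;
}
type Handler = (message: AgentMessage) => void;

// Hypothetical in-process router; not the project's MessageBus.
class LocalRouter {
  private handlers = new Map<string, Handler[]>();

  // Returns a per-agent bus with the publish/subscribe shape used above.
  busFor(agent: string) {
    return {
      publish: (message: AgentMessage): number => {
        const targets = this.handlers.get(message.to) ?? [];
        targets.forEach((h) => h(message));
        return targets.length; // number of handlers that received it
      },
      subscribe: (handler: Handler): void => {
        const list = this.handlers.get(agent) ?? [];
        list.push(handler);
        this.handlers.set(agent, list);
      },
    };
  }
}
```

Returning the delivery count from publish is a design choice of this sketch; it makes undelivered messages easy to spot in tests.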
Dynamic Spawning
Agents are spawned when a registered condition is met:
// Define spawn condition
const gammaCondition: SpawnCondition = {
trigger: "VALIDATION_NEEDED",
threshold: 0.8,
agentType: "GAMMA"
};
// Controller monitors and spawns
spawnController.registerCondition(gammaCondition);
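How the real SpawnController interprets `threshold` is not documented here. The sketch below assumes it is a minimum score attached to a trigger signal, and that each agent type is spawned at most once, subject to a maxAgents cap. SketchSpawnController and its signal method are hypothetical.

```typescript
interface SpawnCondition {
  trigger: string;
  threshold: number;
  agentType: string;
}

// Hypothetical controller: treats `threshold` as a minimum score.
class SketchSpawnController {
  private conditions: SpawnCondition[] = [];
  private running: string[] = [];
  private maxAgents: number;

  constructor(maxAgents: number) {
    this.maxAgents = maxAgents;
  }

  registerCondition(condition: SpawnCondition): void {
    this.conditions.push(condition);
  }

  // Report a signal; returns the agent types spawned in response.
  signal(trigger: string, score: number): string[] {
    const started: string[] = [];
    this.conditions.forEach((c) => {
      const matches = c.trigger === trigger && score >= c.threshold;
      const alreadyRunning = this.running.indexOf(c.agentType) !== -1;
      if (matches && !alreadyRunning && this.running.length < this.maxAgents) {
        this.running.push(c.agentType);
        started.push(c.agentType);
      }
    });
    return started;
  }
}
```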
Agent Lifecycle
INIT → READY → PLANNING → EXECUTING → VALIDATING → COMPLETE
│ │
└──── FAILED ←──────────┘
Phase Transitions
// Update agent phase
await stateManager.setPhase("ALPHA", AgentPhase.PLANNING);
// Check phase
const phase = await stateManager.getPhase("BETA");
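The lifecycle can be read as a small state machine. The following sketch checks whether a transition is legal; it uses a string union instead of the AgentPhase enum, and it assumes that any non-terminal phase may move to FAILED, which the diagram does not state precisely.

```typescript
// Phases in the order shown in the lifecycle diagram.
const ORDER = ["INIT", "READY", "PLANNING", "EXECUTING", "VALIDATING", "COMPLETE"] as const;
type Phase = (typeof ORDER)[number] | "FAILED";

// Assumption: every non-terminal phase may fail; COMPLETE and FAILED are terminal.
function canTransition(from: Phase, to: Phase): boolean {
  if (from === "COMPLETE" || from === "FAILED") return false; // terminal
  if (to === "FAILED") return true;
  const i = ORDER.indexOf(from);
  return ORDER[i + 1] === to; // only the next phase in order is allowed
}
```

A guard like this could sit in front of setPhase so that an agent cannot, for example, jump from READY straight to EXECUTING.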
Metrics Collection
Metrics tracked for each task:
interface CoordinationMetrics {
taskId: string;
startTime: number;
endTime?: number;
agentMetrics: {
[agentId: string]: {
phases: string[];
messagesSent: number;
messagesReceived: number;
errors: number;
}
};
blackboardWrites: number;
blackboardReads: number;
spawnEvents: number;
}
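As an illustration of how these counters might be consumed, the sketch below rolls per-agent values up into task-level totals. The summarizeMetrics helper is hypothetical, it uses only a subset of the fields above, and it assumes startTime and endTime are epoch milliseconds.

```typescript
// Subset of the CoordinationMetrics fields shown above.
interface AgentCounters {
  phases: string[];
  messagesSent: number;
  messagesReceived: number;
  errors: number;
}

interface MetricsSnapshot {
  startTime: number; // assumed epoch milliseconds
  endTime?: number;
  agentMetrics: { [agentId: string]: AgentCounters };
}

// Hypothetical helper: roll per-agent counters up into task-level totals.
function summarizeMetrics(m: MetricsSnapshot) {
  let messagesSent = 0;
  let errors = 0;
  Object.keys(m.agentMetrics).forEach((id) => {
    const agent = m.agentMetrics[id];
    if (!agent) return;
    messagesSent += agent.messagesSent;
    errors += agent.errors;
  });
  // Duration is only known once the task has finished.
  const durationMs = m.endTime === undefined ? undefined : m.endTime - m.startTime;
  return { durationMs, messagesSent, errors };
}
```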
Example Task Execution
import { MultiAgentOrchestrator } from "./orchestrator";
import type { TaskDefinition } from "./types";
const task: TaskDefinition = {
id: "deploy-001",
type: "deployment",
description: "Deploy web service to sandbox",
constraints: ["sandbox-only", "no-secrets"],
timeout: 300000 // 5 minutes
};
const orchestrator = new MultiAgentOrchestrator();
await orchestrator.initialize();
const results = await orchestrator.execute(task);
console.log(`Status: ${results.status}`);
console.log(`Duration: ${results.duration}ms`);
console.log(`Agents used: ${results.agentsUsed.join(", ")}`);
DragonflyDB Keys
| Key Pattern | Purpose |
|---|---|
| task:{id}:blackboard:* | Shared state |
| task:{id}:state:{agent} | Agent state |
| task:{id}:bus:{agent} | Message queue |
| task:{id}:metrics | Coordination metrics |
| task:{id}:locks:* | Distributed locks |
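Building these keys through one small helper keeps the patterns in a single place. The sketch below follows the patterns in the table; the taskKeys object and its method names are hypothetical and not part of the project.

```typescript
// Hypothetical key builders matching the patterns in the table above.
const taskKeys = {
  blackboard: (taskId: string, field: string): string => `task:${taskId}:blackboard:${field}`,
  state: (taskId: string, agent: string): string => `task:${taskId}:state:${agent}`,
  bus: (taskId: string, agent: string): string => `task:${taskId}:bus:${agent}`,
  metrics: (taskId: string): string => `task:${taskId}:metrics`,
  lock: (taskId: string, name: string): string => `task:${taskId}:locks:${name}`,
};
```

For example, `taskKeys.state("deploy-001", "ALPHA")` yields `task:deploy-001:state:ALPHA`.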
Error Handling
try {
await orchestrator.execute(task);
} catch (error) {
if (error instanceof AgentTimeoutError) {
// Agent exceeded timeout
} else if (error instanceof CoordinationError) {
// Infrastructure failure
} else if (error instanceof SpawnLimitError) {
// Too many agents spawned
}
}
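The branches above are left empty. One way to fill them is to map each error class to a follow-up action, as in the sketch below. The class definitions are stand-ins for the project's own error types, and the action names are purely illustrative, not the orchestrator's documented policy.

```typescript
// Stand-in base class; setPrototypeOf keeps instanceof working
// even when the code is compiled down to ES5.
class SketchError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class AgentTimeoutError extends SketchError {}
class CoordinationError extends SketchError {}
class SpawnLimitError extends SketchError {}

// Illustrative policy only.
type Action = "retry" | "abort" | "reduce-agents" | "rethrow";

function decideAction(error: unknown): Action {
  if (error instanceof AgentTimeoutError) return "retry";
  if (error instanceof CoordinationError) return "abort";
  if (error instanceof SpawnLimitError) return "reduce-agents";
  return "rethrow"; // unknown errors are not swallowed
}
```

Keeping the decision in a pure function separates the policy from the catch block, so it can be tested without running the orchestrator.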
Testing
# Type check
bun run tsc --noEmit
# Run coordination tests
bun test
# Run with mock infrastructure
bun run orchestrator.ts --mock
Dependencies
| Package | Purpose |
|---|---|
| typescript | Type system |
| redis | DragonflyDB client |
| openai | LLM integration |
Configuration
const config = {
maxAgents: 5, // Maximum concurrent agents
spawnTimeout: 10000, // Spawn timeout (ms)
messageTimeout: 5000, // Message delivery timeout
blackboardTTL: 3600, // Key expiration (seconds)
metricsInterval: 1000 // Metrics collection interval
};
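How this object is passed to the orchestrator is not shown here. As a sketch, the values above can be treated as defaults and merged with caller overrides, rejecting anything that is not a positive number. Treating them as defaults is an assumption, and OrchestratorConfig and resolveConfig are hypothetical names.

```typescript
interface OrchestratorConfig {
  maxAgents: number;
  spawnTimeout: number;
  messageTimeout: number;
  blackboardTTL: number;
  metricsInterval: number;
}

// Values copied from the configuration block above.
const DEFAULTS: OrchestratorConfig = {
  maxAgents: 5,
  spawnTimeout: 10000,
  messageTimeout: 5000,
  blackboardTTL: 3600,
  metricsInterval: 1000,
};

// Hypothetical helper: merge overrides onto defaults and validate.
function resolveConfig(overrides: Partial<OrchestratorConfig> = {}): OrchestratorConfig {
  const merged: OrchestratorConfig = { ...DEFAULTS, ...overrides };
  (Object.keys(merged) as (keyof OrchestratorConfig)[]).forEach((key) => {
    const value = merged[key];
    if (!Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive number`);
    }
  });
  return merged;
}
```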
Architecture Reference
Part of the Agent Governance System.
See also:
- LLM Planner - Single-agent planner
- Tier 1 Agent - Execution-capable agent
- Pipeline System - Pipeline orchestration
Last updated: 2026-01-24