# Multi-Agent Coordination System

> Orchestrator for parallel agent execution and coordination

## Overview

The Multi-Agent Coordination System manages parallel execution of multiple agents, providing shared state via a blackboard pattern, message passing, dynamic agent spawning, and comprehensive metrics collection.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        Multi-Agent Orchestrator                         │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                        Coordination Layer                         │  │
│  │  ┌───────────┐  ┌────────────┐  ┌───────────┐  ┌──────────────┐   │  │
│  │  │Blackboard │  │ AgentState │  │   Spawn   │  │   Metrics    │   │  │
│  │  │ (Shared)  │  │  Manager   │  │Controller │  │  Collector   │   │  │
│  │  └───────────┘  └────────────┘  └───────────┘  └──────────────┘   │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                      │                                  │
│                   ┌──────────────────┼────────────────┐                 │
│                   ▼                  ▼                ▼                 │
│          ┌─────────────────┐  ┌─────────────┐  ┌─────────────┐          │
│          │   Agent Alpha   │  │ Agent Beta  │  │ Agent Gamma │          │
│          │    (Planner)    │  │ (Executor)  │  │ (Validator) │          │
│          │                 │  │             │  │  (Dynamic)  │          │
│          │   MessageBus    │  │ MessageBus  │  │ MessageBus  │          │
│          └─────────────────┘  └─────────────┘  └─────────────┘          │
│                   │                  │                │                 │
└───────────────────┼──────────────────┼────────────────┼─────────────────┘
                    │                  │                │
                    ▼                  ▼                ▼
                 ┌───────────────────────────────────────────┐
                 │                DragonflyDB                │
                 │     (State, Messages, Locks, Metrics)     │
                 └───────────────────────────────────────────┘
```

## Components

### Orchestrator (`orchestrator.ts` - 410 lines)

Main coordination entry point:
- Task initialization
- Agent lifecycle management
- Parallel execution control
- Spawn condition monitoring
- Results aggregation

```typescript
const orchestrator = new MultiAgentOrchestrator("anthropic/claude-sonnet-4");
await orchestrator.initialize();
const results = await orchestrator.execute(taskDefinition);
```

### Agents (`agents.ts` - 850 lines)

Three agent types with distinct roles:

| Agent | Role | Capabilities |
|-------|------|--------------|
| Alpha | Planner | Analyzes tasks, creates execution plans |
| Beta | Executor | Executes plan steps, reports progress |
| Gamma | Validator | Validates results, spawned conditionally |

### Coordination (`coordination.ts` - 450 lines)

Shared infrastructure classes:

| Class | Purpose |
|-------|---------|
| `Blackboard` | Shared state storage (key-value) |
| `MessageBus` | Inter-agent message passing |
| `AgentStateManager` | Agent lifecycle and phase tracking |
| `SpawnController` | Dynamic agent spawning |
| `MetricsCollector` | Performance and compliance metrics |

### Types (`types.ts` - 65 lines)

TypeScript type definitions for:
- `TaskDefinition`
- `CoordinationMetrics`
- `SpawnCondition`
- `AgentRole`

## Quick Start

```bash
# Enter directory
cd /opt/agent-governance/agents/multi-agent

# Install dependencies
bun install

# Run orchestrator
bun run orchestrator.ts

# Run with custom model
bun run orchestrator.ts --model "anthropic/claude-sonnet-4"
```

## Coordination Patterns

### Blackboard Pattern

Shared state accessible by all agents:

```typescript
// Write to blackboard
await blackboard.set("plan", planData);

// Read from blackboard
const plan = await blackboard.get("plan");

// Watch for changes
blackboard.watch("results", (key, value) => {
  console.log(`Results updated: ${value}`);
});
```

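To make the set/get/watch contract concrete, here is a minimal in-memory sketch. It is not the `Blackboard` class from `coordination.ts`: the class name `InMemoryBlackboard`, the synchronous methods, and the unsubscribe function returned by `watch` are all assumptions made for illustration, whereas the real class is asynchronous and backed by DragonflyDB.

```typescript
// Hypothetical in-memory stand-in for the Blackboard class (illustration only).
type WatchCallback = (key: string, value: unknown) => void;

class InMemoryBlackboard {
  private readonly store = new Map<string, unknown>();
  private readonly watchers = new Map<string, Set<WatchCallback>>();

  // Store a value, then notify every watcher registered for that key.
  set(key: string, value: unknown): void {
    this.store.set(key, value);
    this.watchers.get(key)?.forEach((callback) => callback(key, value));
  }

  // Return the stored value, or undefined if the key was never written.
  get(key: string): unknown {
    return this.store.get(key);
  }

  // Register a watcher; the returned function removes it again (assumed API).
  watch(key: string, callback: WatchCallback): () => void {
    const callbacks = this.watchers.get(key) ?? new Set<WatchCallback>();
    callbacks.add(callback);
    this.watchers.set(key, callbacks);
    return () => {
      callbacks.delete(callback);
    };
  }
}

const board = new InMemoryBlackboard();
const seen: string[] = [];
const stopWatching = board.watch("results", (key, value) => {
  seen.push(`${key}=${JSON.stringify(value)}`);
});

board.set("results", { ok: true }); // observed by the watcher
stopWatching();
board.set("results", { ok: false }); // written, but no longer observed
```

Serializing the value with `JSON.stringify` inside the watcher avoids the `[object Object]` output that plain string interpolation produces for object values.
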
### Message Passing

Async communication between agents:

```typescript
// Send message
await alphaBus.publish({
  from: "ALPHA",
  to: "BETA",
  type: "TASK_READY",
  payload: { stepId: "step-001" }
});

// Receive messages
betaBus.subscribe((message) => {
  if (message.type === "TASK_READY") {
    executeStep(message.payload.stepId);
  }
});
```

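The routing idea behind per-agent buses can be sketched in memory as follows. Everything here is hypothetical: `AgentMessage`, `InMemoryRouter`, and `InMemoryMessageBus` are illustrative names, delivery is synchronous, and the shared router merely stands in for the DragonflyDB-backed queues that the real `MessageBus` uses.

```typescript
// Hypothetical message shape, inferred from the publish() call above.
interface AgentMessage {
  from: string;
  to: string;
  type: string;
  payload: Record<string, unknown>;
}

type MessageHandler = (message: AgentMessage) => void;

// Shared routing table standing in for the DragonflyDB-backed queues.
class InMemoryRouter {
  private readonly handlers = new Map<string, MessageHandler[]>();

  register(agentId: string, handler: MessageHandler): void {
    const list = this.handlers.get(agentId) ?? [];
    list.push(handler);
    this.handlers.set(agentId, list);
  }

  // Deliver to every handler of the addressed agent; returns the delivery count.
  deliver(message: AgentMessage): number {
    const list = this.handlers.get(message.to) ?? [];
    list.forEach((handler) => handler(message));
    return list.length;
  }
}

// One bus per agent, all sharing the same router.
class InMemoryMessageBus {
  private readonly agentId: string;
  private readonly router: InMemoryRouter;

  constructor(agentId: string, router: InMemoryRouter) {
    this.agentId = agentId;
    this.router = router;
  }

  publish(message: AgentMessage): number {
    return this.router.deliver(message);
  }

  subscribe(handler: MessageHandler): void {
    this.router.register(this.agentId, handler);
  }
}

const router = new InMemoryRouter();
const alphaBusSketch = new InMemoryMessageBus("ALPHA", router);
const betaBusSketch = new InMemoryMessageBus("BETA", router);

const executedSteps: string[] = [];
betaBusSketch.subscribe((message) => {
  if (message.type === "TASK_READY") {
    executedSteps.push(String(message.payload.stepId));
  }
});

const delivered = alphaBusSketch.publish({
  from: "ALPHA",
  to: "BETA",
  type: "TASK_READY",
  payload: { stepId: "step-001" },
});
const undelivered = alphaBusSketch.publish({
  from: "ALPHA",
  to: "GAMMA", // no subscriber registered, so nothing is delivered
  type: "TASK_READY",
  payload: {},
});
```
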
### Dynamic Spawning

Agents spawned based on conditions:

```typescript
// Define spawn condition
const gammaCondition: SpawnCondition = {
  trigger: "VALIDATION_NEEDED",
  threshold: 0.8,
  agentType: "GAMMA"
};

// Controller monitors and spawns
spawnController.registerCondition(gammaCondition);
```

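One way a controller could evaluate registered conditions is sketched below. This README does not define what `threshold` is compared against, so the sketch assumes each trigger arrives with a score between 0 and 1 and that a condition fires once the score reaches its threshold. `SpawnControllerSketch`, `evaluate`, and the one-instance-per-agent-type rule are hypothetical.

```typescript
// Shape taken from the gammaCondition example above.
interface SpawnConditionSketch {
  trigger: string;
  threshold: number;
  agentType: string;
}

// Hypothetical controller: the real SpawnController may evaluate conditions differently.
class SpawnControllerSketch {
  private readonly conditions: SpawnConditionSketch[] = [];
  private readonly active = new Set<string>();
  private readonly maxAgents: number;

  constructor(maxAgents: number) {
    this.maxAgents = maxAgents;
  }

  registerCondition(condition: SpawnConditionSketch): void {
    this.conditions.push(condition);
  }

  // Assumption: a condition fires when the trigger matches and score >= threshold.
  // Returns the agent types spawned by this call.
  evaluate(trigger: string, score: number): string[] {
    const spawned: string[] = [];
    for (const condition of this.conditions) {
      const matches = condition.trigger === trigger && score >= condition.threshold;
      const alreadyRunning = this.active.has(condition.agentType);
      if (!matches || alreadyRunning) continue;
      if (this.active.size >= this.maxAgents) {
        throw new Error("spawn limit reached");
      }
      this.active.add(condition.agentType);
      spawned.push(condition.agentType);
    }
    return spawned;
  }
}

const controller = new SpawnControllerSketch(5);
controller.registerCondition({
  trigger: "VALIDATION_NEEDED",
  threshold: 0.8,
  agentType: "GAMMA",
});

const belowThreshold = controller.evaluate("VALIDATION_NEEDED", 0.5); // nothing spawned
const atThreshold = controller.evaluate("VALIDATION_NEEDED", 0.8); // GAMMA spawned
const repeated = controller.evaluate("VALIDATION_NEEDED", 0.9); // GAMMA already running
```
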
## Agent Lifecycle

```
INIT → READY → PLANNING → EXECUTING → VALIDATING → COMPLETE
                  │                       │
                  └──── FAILED ←──────────┘
```

### Phase Transitions

```typescript
// Update agent phase
await stateManager.setPhase("ALPHA", AgentPhase.PLANNING);

// Check phase
const phase = await stateManager.getPhase("BETA");
```

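A transition table is one way to reject illegal phase changes before they are written. The sketch below is not taken from `AgentStateManager`. It assumes that `FAILED` is reachable from `PLANNING`, `EXECUTING`, and `VALIDATING`, and that `COMPLETE` and `FAILED` are terminal. The `Phase` union, `canTransition`, and `nextPhase` are hypothetical names.

```typescript
// Phase names from the lifecycle diagram; this union is a stand-in for AgentPhase.
type Phase =
  | "INIT"
  | "READY"
  | "PLANNING"
  | "EXECUTING"
  | "VALIDATING"
  | "COMPLETE"
  | "FAILED";

// Assumption: only the three working phases can fail; COMPLETE and FAILED are terminal.
const allowedTransitions: Record<Phase, Phase[]> = {
  INIT: ["READY"],
  READY: ["PLANNING"],
  PLANNING: ["EXECUTING", "FAILED"],
  EXECUTING: ["VALIDATING", "FAILED"],
  VALIDATING: ["COMPLETE", "FAILED"],
  COMPLETE: [],
  FAILED: [],
};

function canTransition(from: Phase, to: Phase): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Apply a transition, rejecting anything the table does not allow.
function nextPhase(current: Phase, requested: Phase): Phase {
  if (!canTransition(current, requested)) {
    throw new Error(`Illegal transition: ${current} -> ${requested}`);
  }
  return requested;
}

const afterInit = nextPhase("INIT", "READY");
const afterFailure = nextPhase("EXECUTING", "FAILED");
```
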
## Metrics Collection

Comprehensive metrics tracked:

```typescript
interface CoordinationMetrics {
  taskId: string;
  startTime: number;
  endTime?: number;
  agentMetrics: {
    [agentId: string]: {
      phases: string[];
      messagesSent: number;
      messagesReceived: number;
      errors: number;
    }
  };
  blackboardWrites: number;
  blackboardReads: number;
  spawnEvents: number;
}
```

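The per-agent counters can be rolled up into task-level totals. The helper below is a hypothetical sketch, not part of `MetricsCollector`. It redeclares the interface under the name `MetricsSketch` so the block is self-contained, and it assumes `startTime` and `endTime` are millisecond timestamps.

```typescript
// Fields mirror the CoordinationMetrics interface above.
interface AgentMetricsSketch {
  phases: string[];
  messagesSent: number;
  messagesReceived: number;
  errors: number;
}

interface MetricsSketch {
  taskId: string;
  startTime: number; // assumed to be milliseconds
  endTime?: number;
  agentMetrics: { [agentId: string]: AgentMetricsSketch };
  blackboardWrites: number;
  blackboardReads: number;
  spawnEvents: number;
}

// Hypothetical helper: roll per-agent counters up into task-level totals.
function summarizeMetrics(metrics: MetricsSketch) {
  const perAgent = Object.values(metrics.agentMetrics);
  let messagesSent = 0;
  let errors = 0;
  for (const agent of perAgent) {
    messagesSent += agent.messagesSent;
    errors += agent.errors;
  }
  return {
    agents: perAgent.length,
    messagesSent,
    errors,
    // Duration is only known once the task has an endTime.
    durationMs:
      metrics.endTime === undefined ? undefined : metrics.endTime - metrics.startTime,
  };
}

const sampleMetrics: MetricsSketch = {
  taskId: "deploy-001",
  startTime: 1000,
  endTime: 4500,
  agentMetrics: {
    ALPHA: { phases: ["PLANNING"], messagesSent: 3, messagesReceived: 1, errors: 0 },
    BETA: { phases: ["EXECUTING"], messagesSent: 5, messagesReceived: 3, errors: 1 },
  },
  blackboardWrites: 4,
  blackboardReads: 9,
  spawnEvents: 0,
};

const summary = summarizeMetrics(sampleMetrics);
```
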
## Example Task Execution

```typescript
import { MultiAgentOrchestrator } from "./orchestrator";
import type { TaskDefinition } from "./types";

const task: TaskDefinition = {
  id: "deploy-001",
  type: "deployment",
  description: "Deploy web service to sandbox",
  constraints: ["sandbox-only", "no-secrets"],
  timeout: 300000 // 5 minutes
};

const orchestrator = new MultiAgentOrchestrator();
await orchestrator.initialize();

const results = await orchestrator.execute(task);

console.log(`Status: ${results.status}`);
console.log(`Duration: ${results.duration}ms`);
console.log(`Agents used: ${results.agentsUsed.join(", ")}`);
```

## DragonflyDB Keys

| Key Pattern | Purpose |
|-------------|---------|
| `task:{id}:blackboard:*` | Shared state |
| `task:{id}:state:{agent}` | Agent state |
| `task:{id}:bus:{agent}` | Message queue |
| `task:{id}:metrics` | Coordination metrics |
| `task:{id}:locks:*` | Distributed locks |

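Building keys through small helpers keeps the patterns above in one place. The `taskKeys` object below is a hypothetical sketch that follows the table. It is not an export of this package, and the `field` and `resource` suffixes are assumed to be plain strings.

```typescript
// Hypothetical helpers that build keys following the patterns in the table above.
const taskKeys = {
  blackboard: (taskId: string, field: string): string =>
    `task:${taskId}:blackboard:${field}`,
  state: (taskId: string, agent: string): string => `task:${taskId}:state:${agent}`,
  bus: (taskId: string, agent: string): string => `task:${taskId}:bus:${agent}`,
  metrics: (taskId: string): string => `task:${taskId}:metrics`,
  lock: (taskId: string, resource: string): string =>
    `task:${taskId}:locks:${resource}`,
  // Glob pattern matching every key that belongs to one task.
  allForTask: (taskId: string): string => `task:${taskId}:*`,
};

const betaStateKey = taskKeys.state("deploy-001", "BETA");
const planKey = taskKeys.blackboard("deploy-001", "plan");
```

A pattern such as `taskKeys.allForTask("deploy-001")` could be passed to a `SCAN ... MATCH` call to enumerate a task's keys during cleanup.
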
## Error Handling

```typescript
try {
  await orchestrator.execute(task);
} catch (error) {
  if (error instanceof AgentTimeoutError) {
    // Agent exceeded timeout
  } else if (error instanceof CoordinationError) {
    // Infrastructure failure
  } else if (error instanceof SpawnLimitError) {
    // Too many agents spawned
  } else {
    // Unknown error: rethrow rather than swallow it
    throw error;
  }
}
```

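For `instanceof` checks like these to work, each error type has to be a real subclass of `Error`. The sketch below shows one way such classes could be defined. The class names, constructor arguments, and `describeFailure` helper are all hypothetical and do not reflect the actual definitions in this package.

```typescript
// Hypothetical error classes (illustration only; real fields may differ).
class TimeoutSketchError extends Error {
  readonly agentId: string;

  constructor(agentId: string, timeoutMs: number) {
    super(`Agent ${agentId} exceeded ${timeoutMs}ms`);
    this.name = "TimeoutSketchError";
    this.agentId = agentId;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, TimeoutSketchError.prototype);
  }
}

class SpawnLimitSketchError extends Error {
  readonly limit: number;

  constructor(limit: number) {
    super(`Spawn limit of ${limit} agents reached`);
    this.name = "SpawnLimitSketchError";
    this.limit = limit;
    Object.setPrototypeOf(this, SpawnLimitSketchError.prototype);
  }
}

// Map a caught value to a short label; anything unrecognized is "unknown".
function describeFailure(error: unknown): string {
  if (error instanceof TimeoutSketchError) return `timeout:${error.agentId}`;
  if (error instanceof SpawnLimitSketchError) return `spawn-limit:${error.limit}`;
  return "unknown";
}

const timeoutLabel = describeFailure(new TimeoutSketchError("BETA", 300000));
const limitLabel = describeFailure(new SpawnLimitSketchError(5));
const otherLabel = describeFailure(new Error("something else"));
```
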
## Testing

```bash
# Type check
bun run tsc --noEmit

# Run coordination tests
bun test

# Run with mock infrastructure
bun run orchestrator.ts --mock
```

## Dependencies

| Package | Purpose |
|---------|---------|
| typescript | Type system |
| redis | DragonflyDB client |
| openai | LLM integration |

## Configuration

```typescript
const config = {
  maxAgents: 5,           // Maximum concurrent agents
  spawnTimeout: 10000,    // Spawn timeout (ms)
  messageTimeout: 5000,   // Message delivery timeout
  blackboardTTL: 3600,    // Key expiration (seconds)
  metricsInterval: 1000   // Metrics collection interval
};
```

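Partial overrides can be merged into these defaults and validated before use. The sketch below assumes the values above are the defaults and that every setting must be a positive number. `CoordinationConfigSketch` and `resolveConfig` are hypothetical names, and how the orchestrator actually loads its configuration is not covered here.

```typescript
interface CoordinationConfigSketch {
  maxAgents: number;
  spawnTimeout: number;
  messageTimeout: number;
  blackboardTTL: number;
  metricsInterval: number;
}

// Values copied from the configuration block above; treating them as defaults is an assumption.
const defaultConfig: CoordinationConfigSketch = {
  maxAgents: 5,
  spawnTimeout: 10000,
  messageTimeout: 5000,
  blackboardTTL: 3600,
  metricsInterval: 1000,
};

// Hypothetical helper: merge overrides into the defaults and reject non-positive values.
function resolveConfig(
  overrides: Partial<CoordinationConfigSketch> = {},
): CoordinationConfigSketch {
  const resolved = { ...defaultConfig, ...overrides };
  for (const [field, value] of Object.entries(resolved)) {
    if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${field} must be a positive number, got ${value}`);
    }
  }
  return resolved;
}

const smallCluster = resolveConfig({ maxAgents: 2 });
```
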
## Architecture Reference

Part of the [Agent Governance System](../../docs/ARCHITECTURE.md).

See also:
- [LLM Planner](../llm-planner) - Single-agent planner
- [Tier 1 Agent](../tier1-agent) - Execution-capable agent
- [Pipeline System](../../pipeline) - Pipeline orchestration

---
*Last updated: 2026-01-24*