# Multi-Agent Coordination System
> Orchestrator for parallel agent execution and coordination
## Overview
The Multi-Agent Coordination System runs multiple agents in parallel. It provides shared state through a blackboard pattern, message passing between agents, dynamic agent spawning, and metrics collection.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│                        Multi-Agent Orchestrator                         │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                        Coordination Layer                        │   │
│  │  ┌───────────┐  ┌────────────┐  ┌───────────┐  ┌──────────────┐  │   │
│  │  │Blackboard │  │AgentState  │  │   Spawn   │  │   Metrics    │  │   │
│  │  │ (Shared)  │  │  Manager   │  │Controller │  │  Collector   │  │   │
│  │  └───────────┘  └────────────┘  └───────────┘  └──────────────┘  │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                              │                                          │
│              ┌───────────────┼───────────────┐                          │
│              ▼               ▼               ▼                          │
│  ┌─────────────────┐  ┌─────────────┐ ┌─────────────┐                   │
│  │   Agent Alpha   │  │ Agent Beta  │ │ Agent Gamma │                   │
│  │    (Planner)    │  │ (Executor)  │ │ (Validator) │                   │
│  │                 │  │             │ │  (Dynamic)  │                   │
│  │   MessageBus    │  │ MessageBus  │ │ MessageBus  │                   │
│  └─────────────────┘  └─────────────┘ └─────────────┘                   │
│              │               │               │                          │
└──────────────┼───────────────┼───────────────┼──────────────────────────┘
               │               │               │
               ▼               ▼               ▼
         ┌───────────────────────────────────────────┐
         │                DragonflyDB                │
         │     (State, Messages, Locks, Metrics)     │
         └───────────────────────────────────────────┘
```
## Components
### Orchestrator (`orchestrator.ts` - 410 lines)
Main coordination entry point:
- Task initialization
- Agent lifecycle management
- Parallel execution control
- Spawn condition monitoring
- Results aggregation
```typescript
const orchestrator = new MultiAgentOrchestrator("anthropic/claude-sonnet-4");
await orchestrator.initialize();
const results = await orchestrator.execute(taskDefinition);
```
### Agents (`agents.ts` - 850 lines)
Three agent types with distinct roles:
| Agent | Role | Capabilities |
|-------|------|--------------|
| Alpha | Planner | Analyzes tasks, creates execution plans |
| Beta | Executor | Executes plan steps, reports progress |
| Gamma | Validator | Validates results, spawned conditionally |
### Coordination (`coordination.ts` - 450 lines)
Shared infrastructure classes:
| Class | Purpose |
|-------|---------|
| `Blackboard` | Shared state storage (key-value) |
| `MessageBus` | Inter-agent message passing |
| `AgentStateManager` | Agent lifecycle and phase tracking |
| `SpawnController` | Dynamic agent spawning |
| `MetricsCollector` | Performance and compliance metrics |
### Types (`types.ts` - 65 lines)
TypeScript type definitions for:
- `TaskDefinition`
- `CoordinationMetrics`
- `SpawnCondition`
- `AgentRole`
## Quick Start
```bash
# Enter directory
cd /opt/agent-governance/agents/multi-agent
# Install dependencies
bun install
# Run orchestrator
bun run orchestrator.ts
# Run with custom model
bun run orchestrator.ts --model "anthropic/claude-sonnet-4"
```
## Coordination Patterns
### Blackboard Pattern
Shared state accessible by all agents:
```typescript
// Write to blackboard
await blackboard.set("plan", planData);
// Read from blackboard
const plan = await blackboard.get("plan");
// Watch for changes
blackboard.watch("results", (key, value) => {
  console.log(`Results updated: ${value}`);
});
```
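To make the set/get/watch contract concrete, here is a minimal in-memory sketch. It is not the `Blackboard` class from `coordination.ts`: the name `InMemoryBlackboard`, the synchronous methods, and the unsubscribe function returned by `watch` are all assumptions made for illustration, and the documented API is asynchronous and backed by DragonflyDB.

```typescript
// Hypothetical in-memory stand-in for the Blackboard class (illustration only).
type Watcher<T> = (key: string, value: T) => void;

class InMemoryBlackboard<T = unknown> {
  private store = new Map<string, T>();
  private watchers = new Map<string, Watcher<T>[]>();

  // Store a value, then notify every watcher registered for that key.
  set(key: string, value: T): void {
    this.store.set(key, value);
    for (const notify of this.watchers.get(key) ?? []) {
      notify(key, value);
    }
  }

  // Return the stored value, or undefined if the key was never written.
  get(key: string): T | undefined {
    return this.store.get(key);
  }

  // Register a callback; the returned function removes it again.
  watch(key: string, callback: Watcher<T>): () => void {
    const list = this.watchers.get(key) ?? [];
    list.push(callback);
    this.watchers.set(key, list);
    return () => {
      const current = this.watchers.get(key) ?? [];
      this.watchers.set(key, current.filter((w) => w !== callback));
    };
  }
}

const board = new InMemoryBlackboard<string>();
const seen: string[] = [];
const unwatch = board.watch("results", (key, value) => {
  seen.push(`${key}=${value}`);
});
board.set("results", "step-001 ok"); // watcher fires
unwatch();
board.set("results", "step-002 ok"); // watcher already removed
```

A stand-in like this can be handy in unit tests where a live DragonflyDB instance is not available.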
### Message Passing
Async communication between agents:
```typescript
// Send message
await alphaBus.publish({
  from: "ALPHA",
  to: "BETA",
  type: "TASK_READY",
  payload: { stepId: "step-001" }
});
// Receive messages
betaBus.subscribe((message) => {
  if (message.type === "TASK_READY") {
    executeStep(message.payload.stepId);
  }
});
```
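The routing behind the publish/subscribe calls above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the real `MessageBus`: the `AgentMessage` shape is inferred from the fields in the example, and `InMemoryRouter`, its single shared instance, and the queue for messages that arrive before a subscriber exists are hypothetical.

```typescript
// Hypothetical message shape, modelled on the fields used in the example above.
interface AgentMessage {
  from: string;
  to: string;
  type: string;
  payload: Record<string, unknown>;
}

type Handler = (message: AgentMessage) => void;

// In-memory router: one handler list per recipient, plus a queue for
// messages published before the recipient has subscribed.
class InMemoryRouter {
  private handlers = new Map<string, Handler[]>();
  private pending = new Map<string, AgentMessage[]>();

  publish(message: AgentMessage): void {
    const handlers = this.handlers.get(message.to) ?? [];
    if (handlers.length === 0) {
      const queue = this.pending.get(message.to) ?? [];
      queue.push(message);
      this.pending.set(message.to, queue);
      return;
    }
    for (const handle of handlers) {
      handle(message);
    }
  }

  subscribe(agent: string, handler: Handler): void {
    const handlers = this.handlers.get(agent) ?? [];
    handlers.push(handler);
    this.handlers.set(agent, handlers);
    // Deliver anything that was queued while nobody was listening.
    const queued = this.pending.get(agent) ?? [];
    this.pending.delete(agent);
    for (const message of queued) {
      handler(message);
    }
  }
}

const router = new InMemoryRouter();
const executedSteps: string[] = [];

// Published before BETA subscribes, so it is queued first.
router.publish({ from: "ALPHA", to: "BETA", type: "TASK_READY", payload: { stepId: "step-001" } });
router.subscribe("BETA", (message) => {
  if (message.type === "TASK_READY") {
    executedSteps.push(String(message.payload.stepId));
  }
});
// Delivered immediately, but ignored by the handler because of its type.
router.publish({ from: "ALPHA", to: "BETA", type: "STATUS", payload: {} });
```

Queueing early messages is one possible design choice; a bus backed by per-agent DragonflyDB keys would get similar behaviour from the stored queue itself.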
### Dynamic Spawning
Agents are spawned when a registered condition is met:
```typescript
// Define spawn condition
const gammaCondition: SpawnCondition = {
  trigger: "VALIDATION_NEEDED",
  threshold: 0.8,
  agentType: "GAMMA"
};
// Controller monitors and spawns
spawnController.registerCondition(gammaCondition);
```
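How a registered condition might be evaluated is sketched below. The source does not say what `threshold` is compared against, so this sketch assumes a hypothetical `Signal` carrying a score between 0 and 1, and assumes each agent type is spawned at most once. `SpawnControllerSketch` and `evaluate` are illustrative names, not the real `SpawnController` API.

```typescript
// Mirrors the fields used in the example above; the real type lives in types.ts.
interface SpawnCondition {
  trigger: string;
  threshold: number;
  agentType: string;
}

// Hypothetical signal: an event name plus a score in [0, 1].
interface Signal {
  trigger: string;
  score: number;
}

class SpawnControllerSketch {
  private conditions: SpawnCondition[] = [];
  private spawned = new Set<string>();

  registerCondition(condition: SpawnCondition): void {
    this.conditions.push(condition);
  }

  // Return the agent types to spawn for this signal.
  // Assumption: an agent type is spawned at most once per controller.
  evaluate(signal: Signal): string[] {
    const toSpawn: string[] = [];
    for (const condition of this.conditions) {
      const matches =
        condition.trigger === signal.trigger && signal.score >= condition.threshold;
      if (matches && !this.spawned.has(condition.agentType)) {
        this.spawned.add(condition.agentType);
        toSpawn.push(condition.agentType);
      }
    }
    return toSpawn;
  }
}

const controller = new SpawnControllerSketch();
controller.registerCondition({ trigger: "VALIDATION_NEEDED", threshold: 0.8, agentType: "GAMMA" });

const belowThreshold = controller.evaluate({ trigger: "VALIDATION_NEEDED", score: 0.5 });
const firstMatch = controller.evaluate({ trigger: "VALIDATION_NEEDED", score: 0.9 });
const repeatMatch = controller.evaluate({ trigger: "VALIDATION_NEEDED", score: 0.95 });
```

Here `belowThreshold` and `repeatMatch` are empty, while `firstMatch` contains `"GAMMA"`.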
## Agent Lifecycle
```
INIT → READY → PLANNING → EXECUTING → VALIDATING → COMPLETE
                  │                       │
                  └──── FAILED ←──────────┘
```
### Phase Transitions
```typescript
// Update agent phase
await stateManager.setPhase("ALPHA", AgentPhase.PLANNING);
// Check phase
const phase = await stateManager.getPhase("BETA");
```
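The lifecycle diagram can be encoded as a transition table so that illegal phase changes are rejected early. The sketch below is an assumption-laden illustration: the phase names come from the diagram, but the rule that every non-terminal phase may move to `FAILED`, and the helpers `canTransition` and `nextPhase`, are hypothetical and not part of `AgentStateManager`.

```typescript
// Phase names taken from the lifecycle diagram; the real AgentPhase enum may differ.
type Phase =
  | "INIT"
  | "READY"
  | "PLANNING"
  | "EXECUTING"
  | "VALIDATING"
  | "COMPLETE"
  | "FAILED";

// Assumption: every non-terminal phase may move to FAILED.
const TRANSITIONS: Record<Phase, Phase[]> = {
  INIT: ["READY", "FAILED"],
  READY: ["PLANNING", "FAILED"],
  PLANNING: ["EXECUTING", "FAILED"],
  EXECUTING: ["VALIDATING", "FAILED"],
  VALIDATING: ["COMPLETE", "FAILED"],
  COMPLETE: [],
  FAILED: [],
};

// True when the diagram allows moving directly from `from` to `to`.
function canTransition(from: Phase, to: Phase): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Return the requested phase, or throw if the move is not allowed.
function nextPhase(current: Phase, requested: Phase): Phase {
  if (!canTransition(current, requested)) {
    throw new Error(`Illegal phase transition: ${current} -> ${requested}`);
  }
  return requested;
}

const afterPlanning = nextPhase("PLANNING", "EXECUTING");
const canSkipAhead = canTransition("INIT", "EXECUTING");
```

A guard like this could sit in front of `setPhase` so that a bug in one agent cannot move another agent's state backwards.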
## Metrics Collection
Metrics tracked for each task:
```typescript
interface CoordinationMetrics {
  taskId: string;
  startTime: number;
  endTime?: number;
  agentMetrics: {
    [agentId: string]: {
      phases: string[];
      messagesSent: number;
      messagesReceived: number;
      errors: number;
    };
  };
  blackboardWrites: number;
  blackboardReads: number;
  spawnEvents: number;
}
```
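As an example of consuming this structure, the sketch below collapses the per-agent counters into task-level totals. `summarizeMetrics` is a hypothetical helper, not part of `MetricsCollector`, and it assumes `startTime` and `endTime` are epoch milliseconds, which the source does not state.

```typescript
// Shapes copied from the CoordinationMetrics interface above.
interface AgentMetrics {
  phases: string[];
  messagesSent: number;
  messagesReceived: number;
  errors: number;
}

interface CoordinationMetrics {
  taskId: string;
  startTime: number;
  endTime?: number;
  agentMetrics: { [agentId: string]: AgentMetrics };
  blackboardWrites: number;
  blackboardReads: number;
  spawnEvents: number;
}

// Hypothetical helper: sum per-agent counters into task-level totals.
// Assumes startTime and endTime are epoch milliseconds.
function summarizeMetrics(metrics: CoordinationMetrics, now: number = Date.now()) {
  let messagesSent = 0;
  let messagesReceived = 0;
  let errors = 0;
  for (const agentId of Object.keys(metrics.agentMetrics)) {
    const agent = metrics.agentMetrics[agentId];
    messagesSent += agent.messagesSent;
    messagesReceived += agent.messagesReceived;
    errors += agent.errors;
  }
  return {
    taskId: metrics.taskId,
    // A task that is still running has no endTime, so measure up to `now`.
    durationMs: (metrics.endTime ?? now) - metrics.startTime,
    messagesSent,
    messagesReceived,
    errors,
  };
}

const summary = summarizeMetrics({
  taskId: "deploy-001",
  startTime: 1000,
  endTime: 2500,
  agentMetrics: {
    ALPHA: { phases: ["PLANNING"], messagesSent: 3, messagesReceived: 1, errors: 0 },
    BETA: { phases: ["EXECUTING"], messagesSent: 1, messagesReceived: 3, errors: 1 },
  },
  blackboardWrites: 4,
  blackboardReads: 7,
  spawnEvents: 0,
});
```

For this sample input, `summary.durationMs` is 1500 and both message totals are 4.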
## Example Task Execution
```typescript
import { MultiAgentOrchestrator } from "./orchestrator";
import type { TaskDefinition } from "./types";
const task: TaskDefinition = {
  id: "deploy-001",
  type: "deployment",
  description: "Deploy web service to sandbox",
  constraints: ["sandbox-only", "no-secrets"],
  timeout: 300000 // 5 minutes
};
const orchestrator = new MultiAgentOrchestrator();
await orchestrator.initialize();
const results = await orchestrator.execute(task);
console.log(`Status: ${results.status}`);
console.log(`Duration: ${results.duration}ms`);
console.log(`Agents used: ${results.agentsUsed.join(", ")}`);
```
## DragonflyDB Keys
| Key Pattern | Purpose |
|-------------|---------|
| `task:{id}:blackboard:*` | Shared state |
| `task:{id}:state:{agent}` | Agent state |
| `task:{id}:bus:{agent}` | Message queue |
| `task:{id}:metrics` | Coordination metrics |
| `task:{id}:locks:*` | Distributed locks |
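
Building these keys in one place keeps the patterns consistent across components. The helper below is a sketch: the object name `keys`, its function names, and `taskPattern` are illustrative and do not come from the codebase. Only the key patterns themselves are taken from the table above.

```typescript
// Key builders for the patterns in the table above (names are illustrative).
const keys = {
  blackboard: (taskId: string, field: string): string => `task:${taskId}:blackboard:${field}`,
  state: (taskId: string, agent: string): string => `task:${taskId}:state:${agent}`,
  bus: (taskId: string, agent: string): string => `task:${taskId}:bus:${agent}`,
  metrics: (taskId: string): string => `task:${taskId}:metrics`,
  lock: (taskId: string, name: string): string => `task:${taskId}:locks:${name}`,
};

// Glob pattern matching every key that belongs to one task,
// for example as the MATCH argument of a SCAN during cleanup.
const taskPattern = (taskId: string): string => `task:${taskId}:*`;

const planKey = keys.blackboard("deploy-001", "plan");
const betaQueue = keys.bus("deploy-001", "BETA");
const cleanupPattern = taskPattern("deploy-001");
```

Because every key shares the `task:{id}:` prefix, all state for a finished task can be located with a single pattern.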
## Error Handling
```typescript
try {
  await orchestrator.execute(task);
} catch (error) {
  if (error instanceof AgentTimeoutError) {
    // Agent exceeded timeout
  } else if (error instanceof CoordinationError) {
    // Infrastructure failure
  } else if (error instanceof SpawnLimitError) {
    // Too many agents spawned
  }
}
```
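One way to act on these error types is to map each to a recovery action. The source names the three classes but does not show their definitions, so the class bodies below are hypothetical placeholders, and the `chooseRecovery` policy is purely illustrative rather than the orchestrator's actual behaviour.

```typescript
// Hypothetical definitions: the source names these classes but does not show them.
class AgentTimeoutError extends Error {}
class CoordinationError extends Error {}
class SpawnLimitError extends Error {}

type Recovery = "retry" | "wait-and-retry" | "abort";

// Illustrative policy only; the real orchestrator may react differently.
function chooseRecovery(error: unknown): Recovery {
  if (error instanceof AgentTimeoutError) {
    return "retry"; // a single slow agent may succeed on a second attempt
  }
  if (error instanceof SpawnLimitError) {
    return "wait-and-retry"; // capacity may free up once other agents finish
  }
  if (error instanceof CoordinationError) {
    return "abort"; // infrastructure failure: retrying blindly is unlikely to help
  }
  throw error; // unknown errors are rethrown, not swallowed
}

const onTimeout = chooseRecovery(new AgentTimeoutError("BETA exceeded timeout"));
const onSpawnLimit = chooseRecovery(new SpawnLimitError("too many agents"));
const onInfraFailure = chooseRecovery(new CoordinationError("state store unreachable"));
```

Rethrowing unrecognized errors keeps programming mistakes visible instead of hiding them behind a recovery path.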
## Testing
```bash
# Type check
bun run tsc --noEmit
# Run coordination tests
bun test
# Run with mock infrastructure
bun run orchestrator.ts --mock
```
## Dependencies
| Package | Purpose |
|---------|---------|
| typescript | Type system |
| redis | DragonflyDB client (DragonflyDB is Redis-compatible) |
| openai | LLM integration |
## Configuration
```typescript
const config = {
  maxAgents: 5,          // Maximum concurrent agents
  spawnTimeout: 10000,   // Spawn timeout (ms)
  messageTimeout: 5000,  // Message delivery timeout
  blackboardTTL: 3600,   // Key expiration (seconds)
  metricsInterval: 1000  // Metrics collection interval
};
```
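A small helper can merge partial overrides with these values and reject obviously invalid settings. This is a sketch under assumptions: it treats the values shown above as defaults, which the source does not confirm, and `CoordinationConfig`, `DEFAULT_CONFIG`, and `resolveConfig` are hypothetical names.

```typescript
interface CoordinationConfig {
  maxAgents: number;
  spawnTimeout: number;
  messageTimeout: number;
  blackboardTTL: number;
  metricsInterval: number;
}

// Assumption: the values from the configuration example act as defaults.
const DEFAULT_CONFIG: CoordinationConfig = {
  maxAgents: 5,
  spawnTimeout: 10000,
  messageTimeout: 5000,
  blackboardTTL: 3600,
  metricsInterval: 1000,
};

// Merge overrides onto the defaults and require every value to be a positive number.
function resolveConfig(overrides: Partial<CoordinationConfig> = {}): CoordinationConfig {
  const config: CoordinationConfig = { ...DEFAULT_CONFIG, ...overrides };
  for (const key of Object.keys(config) as (keyof CoordinationConfig)[]) {
    const value = config[key];
    if (!Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive number, got ${value}`);
    }
  }
  return config;
}

const resolved = resolveConfig({ maxAgents: 3 });
```

Validating at startup surfaces a bad value such as `maxAgents: 0` immediately rather than as a spawn failure later in a run.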
## Architecture Reference
Part of the [Agent Governance System](../../docs/ARCHITECTURE.md).
See also:
- [LLM Planner](../llm-planner) - Single-agent planner
- [Tier 1 Agent](../tier1-agent) - Execution-capable agent
- [Pipeline System](../../pipeline) - Pipeline orchestration
---
*Last updated: 2026-01-24*