# Production Pipeline: Report → OpenRouter Orchestration

## Overview

This document describes the automatic transition from the UI "view report" stage into the live multi-agent pipeline, including OpenRouter-driven parallel execution.

**Created:** 2026-01-24
**Status:** Implemented

---

## Architecture Flow

```
┌─────────────────────────────────────────────────────────────────────────┐
│                              UI DASHBOARD                               │
│                                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────────┐   │
│  │  SPAWN   │───▶│ RUNNING  │───▶│  REPORT  │───▶│ AUTO-ORCHESTRATE │   │
│  │ Pipeline │    │  Agents  │    │  Stage   │    │      (NEW)       │   │
│  └──────────┘    └──────────┘    └──────────┘    └────────┬─────────┘   │
│                                                           │             │
└───────────────────────────────────────────────────────────┼─────────────┘
                                                            │
                                                            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        OPENROUTER ORCHESTRATION                         │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                     MultiAgentOrchestrator                      │   │
│   │                                                                 │   │
│   │    ┌─────────────┐              ┌─────────────┐                 │   │
│   │    │    ALPHA    │◄────────────▶│    BETA     │                 │   │
│   │    │ (Research)  │   Messages   │ (Synthesis) │                 │   │
│   │    │   Python    │              │     Bun     │                 │   │
│   │    └──────┬──────┘              └──────┬──────┘                 │   │
│   │           │                            │                        │   │
│   │           └─────────┬──────────────────┘                        │   │
│   │                     │                                           │   │
│   │                     ▼                                           │   │
│   │              ┌─────────────┐                                    │   │
│   │              │    GAMMA    │  (Spawned on STUCK/CONFLICT)       │   │
│   │              │ (Mediator)  │                                    │   │
│   │              └─────────────┘                                    │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│  Shared Infrastructure:                                                 │
│  • Blackboard (DragonflyDB) - Proposals, solutions, consensus           │
│  • MessageBus (Redis PubSub) - Agent coordination                       │
│  • MetricsCollector - Performance tracking                              │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           COMPLETION & AUDIT                            │
│                                                                         │
│  • Results written to SQLite ledger                                     │
│  • Checkpoint created with final state                                  │
│  • WebSocket broadcast to UI                                            │
│  • Pipeline status → COMPLETED                                          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## Implementation Components

### 1. Auto-Orchestration Trigger

**Location:** `/opt/agent-governance/ui/server.ts`

**Trigger Conditions:**
- Pipeline reaches the REPORT phase
- All agents have completed or timed out
- No critical failures block continuation

**New Endpoint:** `POST /api/pipeline/continue`

```typescript
// Request body for POST /api/pipeline/continue
// (the interface name is illustrative)
interface PipelineContinueRequest {
  pipeline_id: string;
  mode: "openrouter" | "local"; // openrouter = full LLM, local = mock
  model?: string;               // Default: anthropic/claude-sonnet-4
  timeout?: number;             // Seconds; default: 120
}
```

### 2. Parallel Agent Execution

**Python Agent (ALPHA):**
- Path: `/opt/agent-governance/agents/llm-planner/governed_agent.py`
- Role: Research, analysis, proposal generation
- Runtime: Python 3.11 with venv

**Bun Agent (BETA):**
- Path: `/opt/agent-governance/agents/llm-planner-ts/governed-agent.ts`
- Role: Synthesis, evaluation, solution building
- Runtime: Bun (4x faster than Node.js)

**Coordination:**
- Both agents connect to the same DragonflyDB instance
- Shared Blackboard for structured data exchange
- MessageBus for real-time communication
- SpawnController monitors for GAMMA trigger conditions

### 3. OpenRouter Integration

**Credential Flow:**
```
Vault (secret/data/api-keys/openrouter)
        │
        ▼
getVaultSecret() in agent code
        │
        ▼
OpenAI client with baseURL: "https://openrouter.ai/api/v1"
        │
        ▼
Model: anthropic/claude-sonnet-4
```

**Rate Limiting:**
- Upstream rate limits are enforced by the OpenRouter API
- Circuit breaker in the governance layer (opens after 5 failures)
- Per-agent token budget tracking

### 4. Error Handling & Failover

**Level 1: Agent-Level Recovery**
```
Error budget per agent:
  - max_total_errors: 8
  - max_same_error_repeats: 2
  - max_procedure_violations: 1

On budget exceeded → agent revoked, handoff created
```

**Level 2: Pipeline-Level Recovery**
```
On agent failure:
1. Record failure in DragonflyDB
2. Check whether the partner agent can continue alone
3. If both fail → Pipeline status = FAILED
4. Create checkpoint with failure details
```

**Level 3: Orchestration-Level Recovery**
```
On orchestration timeout (120s default):
1. Force-stop running agents
2. Collect partial results from Blackboard
3. Generate partial report
4. Pipeline status = TIMEOUT
```

**GAMMA Spawn Conditions:**

| Condition | Threshold | Action |
|-----------|-----------|--------|
| STUCK | No progress for 30s | Spawn GAMMA mediator |
| CONFLICT | 3+ unresolved proposals | Spawn GAMMA to arbitrate |
| COMPLEXITY | Score > 0.8 | Spawn GAMMA for decomposition |

---

## API Endpoints

### Existing (Modified)
- `POST /api/spawn` - Creates a pipeline; now accepts an `auto_continue: boolean` field
- `GET /api/checkpoint/report` - Returns the report with continuation status

### New
- `POST /api/pipeline/continue` - Triggers OpenRouter orchestration
- `GET /api/pipeline/{id}/orchestration` - Gets orchestration status
- `POST /api/pipeline/{id}/stop` - Emergency stop

---

## WebSocket Events

### New Events

```typescript
// Orchestration started
{ type: "orchestration_started", data: { pipeline_id, model, agents: ["ALPHA", "BETA"] } }

// Agent spawned
{ type: "agent_spawned", data: { pipeline_id, agent_id, role, runtime } }

// Agent message
{ type: "agent_message", data: { pipeline_id, from, to, content } }

// GAMMA spawned (conditional)
{ type: "gamma_spawned", data: { pipeline_id, reason: "STUCK" | "CONFLICT" | "COMPLEXITY" } }

// Consensus reached
{ type: "consensus_reached", data: { pipeline_id, proposal_id, votes } }

// Orchestration complete
{ type: "orchestration_complete", data: { pipeline_id, status, results } }
```

---

## Configuration

### Environment Variables

```bash
# Enable auto-orchestration after report
AUTO_ORCHESTRATE=true

# Default model for OpenRouter
OPENROUTER_MODEL=anthropic/claude-sonnet-4

# Orchestration timeout (seconds)
ORCHESTRATION_TIMEOUT=120

# GAMMA spawn thresholds (stuck: seconds; conflict: unresolved proposals; complexity: score)
GAMMA_STUCK_THRESHOLD=30
GAMMA_CONFLICT_THRESHOLD=3
GAMMA_COMPLEXITY_THRESHOLD=0.8
```

### Vault Secrets Required

```
secret/data/api-keys/openrouter
└── api_key: "sk-or-..."

secret/data/services/dragonfly
└── password: "..."
```

---

## Implementation Steps

### Step 1: Add Auto-Continue Logic to UI Server
- [x] Add `triggerOrchestration()` function
- [x] Modify `checkPipelineCompletion()` to check for `auto_continue`
- [x] Add `/api/pipeline/continue` endpoint

### Step 2: Connect to Multi-Agent Orchestrator
- [x] Spawn orchestrator.ts from the UI via `Bun.spawn()`
- [x] Pass pipeline context (task_id, objective, model, timeout)
- [x] Wire up WebSocket events (orchestration_started, agent_message, consensus_event, orchestration_complete)

### Step 3: Add Orchestration Status Tracking
- [x] Track orchestration state in Redis (ORCHESTRATING status)
- [x] Add orchestration_started_at timestamp
- [x] Create checkpoint on completion

### Step 4: Implement Error Handling
- [x] Add timeout handling via the orchestrator `--timeout` flag
- [x] Capture exit codes and error messages
- [x] Set ORCHESTRATION_FAILED or ORCHESTRATION_ERROR status on failure

### Step 5: Test End-to-End
- [x] Spawn pipeline with objective
- [x] Verify report generation
- [x] Verify auto-trigger to orchestration
- [x] Verify parallel agent execution
- [x] Verify results collection

### Demonstration Results (2026-01-24)

Successfully tested with `pipeline-mksufe23`:
- Pipeline spawned → ALPHA/BETA ran → Report generated → Auto-orchestration triggered
- GAMMA spawned due to complexity (0.8 threshold)
- Total orchestration time: 51.4 seconds
- Final status: COMPLETED

---

## Testing

### Manual Test Command

```bash
# 1. Start UI server
cd /opt/agent-governance/ui && bun run server.ts

# 2. Spawn pipeline via API
curl -X POST http://localhost:3000/api/spawn \
  -H "Content-Type: application/json" \
  -d '{"objective": "Design a caching strategy", "auto_continue": true}'

# 3. Watch WebSocket for events
# Pipeline should: SPAWN → RUNNING → REPORT → ORCHESTRATE → COMPLETE
```

### Validation Criteria
- [x] Pipeline reaches ORCHESTRATION phase automatically
- [x] Both ALPHA and BETA agents spawn
- [x] Agents communicate via MessageBus
- [x] Results appear in Blackboard
- [x] Final checkpoint created
- [x] Audit trail in SQLite

---

## Rollback Plan

If orchestration fails repeatedly:
1. Set `AUTO_ORCHESTRATE=false`
2. Pipelines will stop at the REPORT phase
3. Orchestration can still be triggered manually via `POST /api/pipeline/continue`
4. Review logs via `/api/pipeline/logs`

---

*Document Version: 1.0*
*Last Updated: 2026-01-24*
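The GAMMA spawn conditions in the table under Error Handling & Failover, together with the `GAMMA_*_THRESHOLD` variables under Configuration, amount to a small predicate. The following is a minimal TypeScript sketch of that check, not the SpawnController's actual code: the names `GammaThresholds`, `SpawnInputs`, `loadThresholds`, and `gammaSpawnReason` are hypothetical, the check order (STUCK, then CONFLICT, then COMPLEXITY) is an assumption, and the comparisons follow the table literally (`>=` for "30s" and "3+", strict `>` for "Score > 0.8").

```typescript
// Minimal sketch of GAMMA spawn-condition evaluation.
// All names here are hypothetical; the real SpawnController may differ.

type GammaReason = "STUCK" | "CONFLICT" | "COMPLEXITY";
type Env = Record<string, string | undefined>;

interface GammaThresholds {
  stuckSeconds: number;    // GAMMA_STUCK_THRESHOLD (seconds without progress)
  conflictCount: number;   // GAMMA_CONFLICT_THRESHOLD (unresolved proposals)
  complexityScore: number; // GAMMA_COMPLEXITY_THRESHOLD (score)
}

interface SpawnInputs {
  secondsSinceProgress: number;
  unresolvedProposals: number;
  complexityScore: number;
}

// Read a numeric setting, falling back to the documented default when the
// variable is missing, blank, or not a finite number.
function numFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key]?.trim();
  if (!raw) return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadThresholds(env: Env): GammaThresholds {
  return {
    stuckSeconds: numFromEnv(env, "GAMMA_STUCK_THRESHOLD", 30),
    conflictCount: numFromEnv(env, "GAMMA_CONFLICT_THRESHOLD", 3),
    complexityScore: numFromEnv(env, "GAMMA_COMPLEXITY_THRESHOLD", 0.8),
  };
}

// Return the first condition that fires, or null if GAMMA is not needed.
function gammaSpawnReason(s: SpawnInputs, t: GammaThresholds): GammaReason | null {
  if (s.secondsSinceProgress >= t.stuckSeconds) return "STUCK";
  if (s.unresolvedProposals >= t.conflictCount) return "CONFLICT"; // "3+"
  if (s.complexityScore > t.complexityScore) return "COMPLEXITY";  // "> 0.8"
  return null;
}

// Usage: defaults apply because no GAMMA_* variables are set.
const thresholds = loadThresholds({});
console.log(
  gammaSpawnReason(
    { secondsSinceProgress: 5, unresolvedProposals: 1, complexityScore: 0.85 },
    thresholds,
  ),
); // prints: COMPLEXITY
```

Passing an explicit environment object, rather than reading a global such as `process.env` inside the predicate, keeps the check pure, so thresholds can be unit-tested or overridden per pipeline.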