Add consensus failure handling with fallback options for multi-agent pipelines

Implements detection and recovery for when agents fail to reach consensus: - Orchestrator exits with code 2 on consensus failure (distinct from error=1) - Records failed run context (proposals, agent states, conflicts) to Dragonfly - Provides fallback options: rerun same, rerun with GAMMA, escalate tier, accept partial - Adds UI alert with action buttons for user-driven recovery - Adds failure details modal and downloadable failure report - Only marks pipeline complete when consensus achieved or user accepts fallback Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Vault token management and observability integration for multi-agent pipelines
2026-01-24 18:24:19 -05:00 · 2026-01-24 17:45:20 -05:00 · 2026-01-24 17:17:43 -05:00 · 2026-01-24 17:01:45 -05:00
8 changed files with 5610 additions and 97 deletions
--- a/agents/multi-agent/STATUS.md
+++ b/agents/multi-agent/STATUS.md
@ -1,18 +1,36 @@
-# Status: Multi Agent
+# Status: Multi-Agent Orchestrator
 ## Current Phase
-** NOT STARTED**
+**COMPLETE**
 ## Tasks
 | Status | Task | Updated |
 |--------|------|---------|
-| ☐ | *No tasks defined* | - |
+| ✓ | Orchestrator (orchestrator.ts - 470 lines) | 2026-01-24 |
 | ✓ | Agent definitions (agents.ts - 850 lines) | 2026-01-24 |
 | ✓ | Coordination logic (coordination.ts - 450 lines) | 2026-01-24 |
 | ✓ | Type definitions (types.ts - 65 lines) | 2026-01-24 |
 | ✓ | Bun dependencies installed | 2026-01-24 |
 | ✓ | Governance integration (governance.ts) | 2026-01-24 |
 | ✓ | Pipeline token integration | 2026-01-24 |
 | ✓ | Error reporting to observability | 2026-01-24 |
 ## Features
 - Multi-agent coordination system
 - Agent delegation and dispatch
 - Promise-based async coordination
 - Agent registry pattern
 - Task distribution across agents
 - Error reporting to parent pipeline observability
 - Pipeline-aware task execution
 ## Dependencies
-*No external dependencies.*
+- Bun 1.0+ runtime
 - Node modules (typescript, redis)
 ## Issues / Blockers
@ -20,11 +38,21 @@
 ## Activity Log
 ### 2026-01-24 22:30:00 UTC
 - **Phase**: COMPLETE
 - **Action**: Added observability integration
 - **Details**: Orchestrator now reports errors to parent pipeline's observability system. Integrated with Vault token management for pipeline-scoped authentication.
 ### 2026-01-24 04:45:00 UTC
 - **Phase**: COMPLETE
 - **Action**: Status updated to reflect implementation
 - **Details**: Multi-agent orchestrator fully implemented with ~1700 lines of TypeScript. Coordinates multiple agents with delegation patterns.
 ### 2026-01-23 23:25:09 UTC
- **Phase**: NOT STARTED
+- **Phase**: COMPLETE
 - **Action**: Initialized
 - **Details**: Status tracking initialized for this directory.
 ---
-*Last updated: 2026-01-23 23:25:09 UTC*
+*Last updated: 2026-01-24 04:45:00 UTC*
--- a/agents/multi-agent/coordination.ts
+++ b/agents/multi-agent/coordination.ts
@ -188,10 +188,10 @@ export class MessageBus {
  }
  private handleMessage(msg: AgentMessage): void {
-    // Store in message log
+    // Store in message log (fire-and-forget, errors ignored)
-    this.redis.rPush(`msg:${this.taskId}:log`, JSON.stringify(msg));
+    this.redis.rPush(`msg:${this.taskId}:log`, JSON.stringify(msg)).catch(() => {});
-    this.redis.hIncrBy(`metrics:${this.taskId}`, "total_messages", 1);
+    this.redis.hIncrBy(`metrics:${this.taskId}`, "total_messages", 1).catch(() => {});
-    this.redis.hIncrBy(`metrics:${this.taskId}`, "direct_messages", 1);
+    this.redis.hIncrBy(`metrics:${this.taskId}`, "direct_messages", 1).catch(() => {});
    // Call registered handlers
    for (const handler of this.messageHandlers.values()) {
--- a/agents/multi-agent/orchestrator.ts
+++ b/agents/multi-agent/orchestrator.ts
@ -1,6 +1,10 @@
 /**
 * Multi-Agent Coordination System - Orchestrator
 * Manages parallel agent execution, spawn conditions, and metrics
 *
 * Environment variables:
 * - PIPELINE_ID: Parent pipeline ID for error reporting
 * - TASK_ID: Task ID override
 */
 import type { TaskDefinition, CoordinationMetrics, SpawnCondition, AgentRole } from "./types";
@ -21,12 +25,41 @@ function generateId(): string {
  return "task-" + Math.random().toString(36).slice(2, 8) + "-" + Date.now().toString(36);
 }
 // Error reporting to parent pipeline's observability system
 async function reportErrorToObservability(
  pipelineId: string,
  errorType: string,
  severity: "low" | "medium" | "high" | "critical",
  details: string
 ): Promise<void> {
  try {
    // Report to the UI server's error tracking API
    const response = await fetch("http://localhost:3000/api/pipeline/errors/record", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        pipeline_id: pipelineId,
        error_type: errorType,
        severity,
        details
      })
    });
    if (!response.ok) {
      console.error(`[ERROR_REPORT] Failed to report error: ${response.status}`);
    }
  } catch (e: any) {
    // Silently fail - don't let error reporting cause more errors
    console.error(`[ERROR_REPORT] Error reporting failed: ${e.message}`);
  }
 }
 // =============================================================================
 // Multi-Agent Orchestrator
 // =============================================================================
 export class MultiAgentOrchestrator {
  private taskId: string;
  private pipelineId?: string;
  private blackboard!: Blackboard;
  private stateManager!: AgentStateManager;
  private spawnController!: SpawnController;
@ -43,12 +76,23 @@ export class MultiAgentOrchestrator {
  private model: string;
  private startTime!: number;
  private monitorInterval?: ReturnType<typeof setInterval>;
  private errorCount: number = 0;
  constructor(model: string = "anthropic/claude-sonnet-4") {
-    this.taskId = generateId();
+    // Use environment variable for task ID if provided
    this.taskId = process.env.TASK_ID || generateId();
    this.pipelineId = process.env.PIPELINE_ID;
    this.model = model;
  }
  private async reportError(errorType: string, severity: "low" | "medium" | "high" | "critical", details: string): Promise<void> {
    this.errorCount++;
    if (this.pipelineId) {
      await reportErrorToObservability(this.pipelineId, errorType, severity, details);
    }
    this.log(`ERROR [${severity}] ${errorType}: ${details}`);
  }
  private log(msg: string) {
    const elapsed = this.startTime ? ((Date.now() - this.startTime) / 1000).toFixed(1) : "0.0";
    console.log(`[${elapsed}s] [ORCHESTRATOR] ${msg}`);
@ -60,6 +104,9 @@ export class MultiAgentOrchestrator {
    console.log("\n" + "=".repeat(70));
    console.log("MULTI-AGENT COORDINATION SYSTEM");
    console.log("Task ID: " + this.taskId);
    if (this.pipelineId) {
      console.log("Pipeline ID: " + this.pipelineId);
    }
    console.log("Model: " + this.model);
    console.log("=".repeat(70) + "\n");
@ -186,12 +233,12 @@ export class MultiAgentOrchestrator {
    // Run agents in parallel
    this.log("Launching ALPHA and BETA in parallel...");
-    const alphaPromise = this.alphaAgent.run(task).catch(e => {
+    const alphaPromise = this.alphaAgent.run(task).catch(async e => {
-      this.log(`ALPHA error: ${e.message}`);
+      await this.reportError("agent_failure", "high", `ALPHA error: ${e.message}`);
    });
-    const betaPromise = this.betaAgent.run(task).catch(e => {
+    const betaPromise = this.betaAgent.run(task).catch(async e => {
-      this.log(`BETA error: ${e.message}`);
+      await this.reportError("agent_failure", "high", `BETA error: ${e.message}`);
    });
    // Wait for initial agents to complete (or timeout)
@ -220,8 +267,8 @@ export class MultiAgentOrchestrator {
    // If GAMMA was spawned, run it
    if (this.gammaAgent) {
      this.log("Running GAMMA for resolution...");
-      await this.gammaAgent.run(task).catch(e => {
+      await this.gammaAgent.run(task).catch(async e => {
-        this.log(`GAMMA error: ${e.message}`);
+        await this.reportError("agent_failure", "high", `GAMMA error: ${e.message}`);
      });
    }
@ -389,6 +436,7 @@ The solution should consider fault tolerance, data consistency, and cost optimiz
  const orchestrator = new MultiAgentOrchestrator(model);
  let exitCode = 0;
  try {
    await orchestrator.initialize();
    const metrics = await orchestrator.runTask(task);
@ -400,11 +448,44 @@ The solution should consider fault tolerance, data consistency, and cost optimiz
    analyzePerformance(metrics);
    // Output special marker for server to parse consensus status
    // Format: ORCHESTRATION_RESULT:{"consensus":true/false,"metrics":{...}}
    console.log("\nORCHESTRATION_RESULT:" + JSON.stringify({
      consensus: metrics.final_consensus,
      task_id: metrics.task_id,
      metrics: metrics
    }));
    // Exit with code 2 for consensus failure (distinct from error=1, success=0)
    if (!metrics.final_consensus) {
      console.log("\n[ORCHESTRATOR] Consensus NOT achieved - exiting with code 2");
      exitCode = 2;
      // Report consensus failure to observability
      const pipelineId = process.env.PIPELINE_ID;
      if (pipelineId) {
        await reportErrorToObservability(pipelineId, "consensus_failure", "high",
          `Agents failed to reach consensus. Conflicts: ${metrics.conflicts_detected}, Resolved: ${metrics.conflicts_resolved}`);
      }
    }
  } catch (e: any) {
    console.error("Orchestrator error:", e.message);
    exitCode = 1;
    // Report critical error to observability if pipeline ID is set
    const pipelineId = process.env.PIPELINE_ID;
    if (pipelineId) {
      await reportErrorToObservability(pipelineId, "orchestrator_failure", "critical", e.message);
    }
  } finally {
    await orchestrator.cleanup();
    // Explicitly exit to ensure all connections are closed
    process.exit(exitCode);
  }
 }
-main().catch(console.error);
+main().catch((e) => {
  console.error("Fatal error:", e);
  process.exit(1);
 });
--- a/bin/bugs
+++ b/bin/bugs
@ -0,0 +1,81 @@
 #!/bin/bash
 # Bug Tracking CLI
 # Usage: bugs <command> [options]
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 WATCHER_SCRIPT="/opt/agent-governance/testing/oversight/bug_watcher.py"
 show_help() {
    echo "Bug Tracking CLI"
    echo ""
    echo "Usage: bugs <command> [options]"
    echo ""
    echo "Commands:"
    echo "  list                List all bugs"
    echo "  list --status open  Filter by status (open/in_progress/resolved)"
    echo "  list --severity high  Filter by severity (critical/high/medium/low)"
    echo "  get <id>            Get details of a specific bug"
    echo "  log <message>       Log a new bug"
    echo "  update <id> <status>  Update bug status"
    echo "  scan                Scan for anomalies"
    echo "  status              Show bug summary"
    echo ""
    echo "Examples:"
    echo "  bugs list --status open"
    echo "  bugs log -m 'API timeout in pipeline' --severity high"
    echo "  bugs update anom-abc123 resolved --notes 'Fixed in commit xyz'"
    echo "  bugs get anom-abc123"
    echo ""
 }
 case "$1" in
    list)
        shift
        python3 "$WATCHER_SCRIPT" list "$@"
        ;;
    get)
        shift
        if [ -z "$1" ]; then
            echo "Error: Bug ID required"
            echo "Usage: bugs get <bug-id>"
            exit 1
        fi
        python3 "$WATCHER_SCRIPT" get --id "$1"
        ;;
    log)
        shift
        python3 "$WATCHER_SCRIPT" log "$@"
        ;;
    update)
        shift
        if [ -z "$1" ] || [ -z "$2" ]; then
            echo "Error: Bug ID and status required"
            echo "Usage: bugs update <bug-id> <status> [--notes 'note']"
            exit 1
        fi
        BUG_ID="$1"
        STATUS="$2"
        shift 2
        python3 "$WATCHER_SCRIPT" update --id "$BUG_ID" --set-status "$STATUS" "$@"
        ;;
    scan)
        shift
        python3 "$WATCHER_SCRIPT" scan "$@"
        ;;
    status|summary)
        shift
        python3 "$WATCHER_SCRIPT" status "$@"
        ;;
    help|--help|-h)
        show_help
        ;;
    *)
        if [ -z "$1" ]; then
            show_help
        else
            echo "Unknown command: $1"
            echo "Run 'bugs help' for usage"
            exit 1
        fi
        ;;
 esac
--- a/docs/MULTI_AGENT_PIPELINE_ARCHITECTURE.md
+++ b/docs/MULTI_AGENT_PIPELINE_ARCHITECTURE.md
@ -0,0 +1,456 @@
 # Multi-Agent Pipeline Architecture
 ## Overview
 This document describes the architecture for the production multi-agent pipeline system, including Vault token management, agent lifecycle, error handling, and observability integration.
 **Document Date:** 2026-01-24
 **Status:** IMPLEMENTED
 ---
 ## 1. Pipeline Flow
 ```
                    ┌─────────────────────────────────────────────────────────────────┐
                    │                      PIPELINE LIFECYCLE                          │
                    └─────────────────────────────────────────────────────────────────┘
    ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌───────────────┐     ┌───────────┐
    │  SPAWN  │────▶│ RUNNING │────▶│ REPORT  │────▶│ ORCHESTRATING │────▶│ COMPLETED │
    └─────────┘     └─────────┘     └─────────┘     └───────────────┘     └───────────┘
         │               │               │                  │                   │
         │               │               │                  │                   │
    ┌────▼────┐     ┌────▼────┐     ┌────▼────┐      ┌─────▼─────┐       ┌─────▼─────┐
    │ Issue   │     │ Agent   │     │ Report  │      │ ALPHA+BETA│       │ Consensus │
    │ Vault   │     │ Status  │     │ Ready   │      │ Parallel  │       │ Achieved  │
    │ Token   │     │ Updates │     │         │      │           │       │           │
    └─────────┘     └─────────┘     └─────────┘      └───────────┘       └───────────┘
                                                            │
                                                    ┌───────▼───────┐
                                                    │ Error/Stuck?  │
                                                    └───────┬───────┘
                                                            │ YES
                                                    ┌───────▼───────┐
                                                    │ SPAWN GAMMA   │
                                                    │ (Diagnostic)  │
                                                    └───────────────┘
 ```
 ---
 ## 2. Vault Token Management
 ### 2.1 Token Lifecycle
 Each pipeline receives a dedicated, long-lived Vault token that persists through the entire orchestration:
 ```
 Pipeline Start
      │
      ▼
 ┌─────────────────────────────────────┐
 │ 1. Request Pipeline Token from Vault │
 │    - AppRole: pipeline-orchestrator  │
 │    - TTL: 2 hours (renewable)        │
 │    - Policies: pipeline-agent        │
 └─────────────────────────────────────┘
      │
      ▼
 ┌─────────────────────────────────────┐
 │ 2. Store Token in Redis             │
 │    Key: pipeline:{id}:vault_token   │
 │    + Encrypted with transit key     │
 └─────────────────────────────────────┘
      │
      ▼
 ┌─────────────────────────────────────┐
 │ 3. Pass Token to All Agents         │
 │    - ALPHA, BETA, GAMMA inherit     │
 │    - Token renewal every 30 min     │
 └─────────────────────────────────────┘
      │
      ▼
 ┌─────────────────────────────────────┐
 │ 4. Observability Monitors Token     │
 │    - Can revoke for policy violation│
 │    - Logs all token usage           │
 └─────────────────────────────────────┘
      │
      ▼
 ┌─────────────────────────────────────┐
 │ 5. Token Revoked on Completion      │
 │    - Or on error threshold breach   │
 └─────────────────────────────────────┘
 ```
 ### 2.2 Token Policies
 **Pipeline Agent Policy (`pipeline-agent.hcl`):**
 ```hcl
 # Read API keys for OpenRouter
 path "secret/data/api-keys/*" {
  capabilities = ["read"]
 }
 # Read service credentials (DragonflyDB)
 path "secret/data/services/*" {
  capabilities = ["read"]
 }
 # Agent-specific secrets
 path "secret/data/agents/{{identity.entity.aliases.auth_approle.metadata.pipeline_id}}/*" {
  capabilities = ["read", "create", "update"]
 }
 # Deny access to admin paths
 path "sys/*" {
  capabilities = ["deny"]
 }
 ```
 ### 2.3 Token Revocation Triggers
 Observability can revoke a pipeline token mid-run for:
 | Condition | Threshold | Action |
 |-----------|-----------|--------|
 | Error rate | > 5 errors/minute | Revoke + spawn diagnostic |
 | Stuck agent | > 60 seconds no progress | Revoke agent token only |
 | Policy violation | Any CRITICAL violation | Immediate full revocation |
 | Resource abuse | > 100 API calls/minute | Rate limit, then revoke |
 ---
 ## 3. Report → Orchestration Transition
 ### 3.1 Automatic Trigger
 When a pipeline reaches REPORT phase with `auto_continue=true`:
 ```typescript
 async function checkPipelineCompletion(pipelineId: string) {
  // ... existing completion check ...
  if (autoContinue && anySuccess) {
    // Trigger OpenRouter orchestration
    triggerOrchestration(pipelineId, taskId, objective, model, timeout);
  }
 }
 ```
 ### 3.2 Manual Trigger
 API endpoint for manual orchestration trigger:
 ```
 POST /api/pipeline/continue
 Body: { pipeline_id, model?, timeout? }
 ```
 ### 3.3 Orchestration Process
 1. **Status Update**: Pipeline status → `ORCHESTRATING`
 2. **Agent Spawn**: Launch ALPHA and BETA agents in parallel
 3. **WebSocket Broadcast**: Real-time status to UI
 4. **Monitor Loop**: Check for stuck/conflict conditions
 5. **GAMMA Spawn**: If thresholds exceeded, spawn mediator
 6. **Consensus**: Drive to final agreement
 7. **Completion**: Status → `COMPLETED` or `FAILED`
 ---
 ## 4. Agent Multiplication and Handoff
 ### 4.1 Agent Roles
 | Agent | Role | Spawn Condition |
 |-------|------|-----------------|
 | ALPHA | Research & Analysis | Always (initial) |
 | BETA | Implementation & Synthesis | Always (initial) |
 | GAMMA | Mediator & Resolver | On error/stuck/conflict/complexity |
 ### 4.2 Spawn Conditions
 ```typescript
 const SPAWN_CONDITIONS = {
  STUCK: {
    threshold: 30,  // seconds of inactivity
    description: "Spawn GAMMA when agents stuck"
  },
  CONFLICT: {
    threshold: 3,   // unresolved conflicts
    description: "Spawn GAMMA for mediation"
  },
  COMPLEXITY: {
    threshold: 0.8, // complexity score
    description: "Spawn GAMMA for decomposition"
  },
  SUCCESS: {
    threshold: 1.0, // task completion
    description: "Spawn GAMMA for validation"
  }
 };
 ```
 ### 4.3 Handoff Protocol
 When GAMMA spawns, it receives:
 - Full blackboard state (problem, solutions, progress)
 - Message log from ALPHA/BETA
 - Spawn reason and context
 - Authority to direct other agents
 ```typescript
 // GAMMA handoff message
 {
  type: "HANDOFF",
  payload: {
    type: "NEW_DIRECTION" | "SUBTASK_ASSIGNMENT",
    tasks?: string[],
    diagnosis?: string,
    recommended_actions?: string[]
  }
 }
 ```
 ### 4.4 Agent Lifecycle States
 ```
 ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐    ┌───────────┐
 │  CREATED │───▶│  BUSY    │───▶│ WAITING  │───▶│ HANDED-OFF│───▶│ SUCCEEDED │
 └──────────┘    └──────────┘    └──────────┘    └───────────┘    └───────────┘
                     │                                                   │
                     │              ┌──────────┐                         │
                     └─────────────▶│  ERROR   │◀────────────────────────┘
                                    └──────────┘
 ```
 UI displays each agent with:
 - Current state (color-coded)
 - Progress percentage
 - Current task description
 - Message count (sent/received)
 - Error count
 ---
 ## 5. Observability Integration
 ### 5.1 Real-Time Metrics
 All metrics stored in DragonflyDB with WebSocket broadcast:
 ```typescript
 // Metrics keys
 `metrics:${taskId}` → {
  total_messages: number,
  direct_messages: number,
  blackboard_writes: number,
  blackboard_reads: number,
  conflicts_detected: number,
  conflicts_resolved: number,
  gamma_spawned: boolean,
  gamma_spawn_reason: string,
  performance_score: number
 }
 ```
 ### 5.2 Error Loop Handling
 ```
 Error Detected
      │
      ▼
 ┌─────────────────────┐
 │ Log to bug_watcher  │
 │ (SQLite + Redis)    │
 └─────────────────────┘
      │
      ▼
 ┌─────────────────────┐     ┌─────────────────────┐
 │ Check Error Budget  │────▶│ Budget Exceeded?    │
 └─────────────────────┘     └─────────────────────┘
                                    │ YES
                                    ▼
                            ┌─────────────────────┐
                            │ Spawn Diagnostic    │
                            │ Pipeline with       │
                            │ Error Context       │
                            └─────────────────────┘
 ```
 ### 5.3 Status Broadcasting
 WebSocket events broadcast to UI:
 | Event | Payload | Trigger |
 |-------|---------|---------|
 | `pipeline_started` | pipeline_id, task_id | Pipeline spawn |
 | `agent_status` | agent_id, status | Any status change |
 | `agent_message` | agent, message | Agent log output |
 | `consensus_event` | proposal_id, votes | Consensus activity |
 | `orchestration_started` | model, agents | Orchestration begin |
 | `orchestration_complete` | status, metrics | Orchestration end |
 | `error_threshold` | pipeline_id, errors | Error budget breach |
 | `token_revoked` | pipeline_id, reason | Vault revocation |
 ### 5.4 Structured Handoff Reports
 On error threshold breach, generate handoff report:
 ```json
 {
  "report_type": "error_handoff",
  "pipeline_id": "pipeline-abc123",
  "timestamp": "2026-01-24T22:30:00Z",
  "summary": {
    "total_errors": 6,
    "error_types": ["api_timeout", "validation_failure"],
    "affected_agents": ["ALPHA"],
    "last_successful_checkpoint": "ckpt-xyz"
  },
  "context": {
    "task_objective": "...",
    "progress_at_failure": 0.45,
    "blackboard_snapshot": {...}
  },
  "recommended_actions": [
    "Reduce API call rate",
    "Split task into smaller subtasks"
  ]
 }
 ```
 ---
 ## 6. UI Components
 ### 6.1 Pipeline Status Panel
 ```
 ┌──────────────────────────────────────────────────────────────────┐
 │ Pipeline: pipeline-abc123                          [ORCHESTRATING]│
 ├──────────────────────────────────────────────────────────────────┤
 │ Objective: Design distributed event-driven architecture...       │
 │ Model: anthropic/claude-sonnet-4                                 │
 │ Started: 2026-01-24 22:15:00 UTC                                │
 ├──────────────────────────────────────────────────────────────────┤
 │ AGENTS                                                           │
 │ ┌─────────┐  ┌─────────┐  ┌─────────┐                           │
 │ │  ALPHA  │  │  BETA   │  │  GAMMA  │                           │
 │ │ ████░░░ │  │ ██████░ │  │ ░░░░░░░ │                           │
 │ │  45%    │  │  75%    │  │ PENDING │                           │
 │ │ WORKING │  │ WAITING │  │         │                           │
 │ └─────────┘  └─────────┘  └─────────┘                           │
 ├──────────────────────────────────────────────────────────────────┤
 │ METRICS                                                          │
 │ Messages: 24  │  Conflicts: 1/1 resolved  │  Score: 72%         │
 ├──────────────────────────────────────────────────────────────────┤
 │ RECENT ACTIVITY                                                  │
 │ [22:16:32] ALPHA: Generated 3 initial proposals                  │
 │ [22:16:45] BETA: Evaluating proposal prop-a1b2c3                │
 │ [22:17:01] BETA: Proposal accepted with score 0.85              │
 └──────────────────────────────────────────────────────────────────┘
 ```
 ### 6.2 Agent Lifecycle Cards
 Each agent displays:
 - Role badge (ALPHA/BETA/GAMMA)
 - Status indicator with color
 - Progress bar
 - Current task label
 - Message counters
 - Error indicator (if any)
 ---
 ## 7. Implementation Checklist
 ### Backend (server.ts)
 - [x] Pipeline spawn with auto_continue
 - [x] Orchestration trigger after REPORT
 - [x] Agent process spawning (Python + Bun)
 - [x] WebSocket status broadcasting
 - [x] Diagnostic agent (GAMMA) spawning on error
 - [x] Vault token issuance per pipeline
 - [x] Token renewal loop (every 30 minutes)
 - [x] Observability-driven revocation
 - [x] Error threshold monitoring
 - [x] Structured handoff reports
 ### Coordination (coordination.ts)
 - [x] Blackboard shared memory
 - [x] MessageBus point-to-point
 - [x] AgentStateManager
 - [x] SpawnController conditions
 - [x] MetricsCollector
 - [x] Token integration via pipeline context
 - [x] Error budget tracking
 ### Orchestrator (orchestrator.ts)
 - [x] Multi-agent initialization
 - [x] GAMMA spawn on conditions
 - [x] Consensus checking
 - [x] Performance analysis
 - [x] Receive pipeline ID from environment
 - [x] Error reporting to observability
 ### UI/API
 - [x] Pipeline list view
 - [x] Real-time log streaming
 - [x] Agent lifecycle status API
 - [x] Pipeline metrics endpoint
 - [x] Error budget API
 - [x] Token status/revoke/renew APIs
 - [x] Handoff report generation
 - [x] Diagnostic pipeline spawning
 - [x] Consensus failure detection (exit code 2)
 - [x] Consensus failure context recording
 - [x] Fallback options (rerun, escalate, accept, download)
 - [x] Failure report download
 - [x] UI consensus failure alert with action buttons
 - [x] Failure details modal
 - [x] WebSocket notifications for consensus events
 ---
 ## 8. API Endpoints
 ### Pipeline Control
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/api/spawn` | POST | Spawn pipeline with auto_continue |
 | `/api/pipeline/continue` | POST | Manually trigger orchestration |
 | `/api/pipeline/orchestration` | GET | Get orchestration status |
 | `/api/pipeline/token` | GET | Get pipeline token status |
 | `/api/pipeline/revoke` | POST | Revoke pipeline token |
 | `/api/active-pipelines` | GET | List active pipelines |
 | `/api/pipeline/logs` | GET | Get pipeline logs |
 | `/api/pipeline/metrics` | GET | Get pipeline metrics |
 ### Agent Management
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/api/agents` | GET | List all agents |
 | `/api/agents/:id/status` | GET | Get agent status |
 | `/api/agents/:id/messages` | GET | Get agent message log |
 ### Observability
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/api/observability/errors` | GET | Get error summary |
 | `/api/observability/handoff` | POST | Generate handoff report |
 | `/api/observability/revoke` | POST | Trigger token revocation |
 ---
 *Last updated: 2026-01-24*
--- a/testing/oversight/bug_watcher.py
+++ b/testing/oversight/bug_watcher.py
@ -45,6 +45,13 @@ class Severity(str, Enum):
    INFO = "info"          # Tracking only
 class BugStatus(str, Enum):
    """Status tracking for bugs/anomalies"""
    OPEN = "open"              # Newly detected, not yet addressed
    IN_PROGRESS = "in_progress"  # Being worked on
    RESOLVED = "resolved"      # Fixed and verified
@dataclass
 class Anomaly:
    """Represents a detected anomaly"""
@ -60,14 +67,23 @@ class Anomaly:
    checkpoint_id: Optional[str] = None
    status_file: Optional[str] = None
    detected_at: str = ""
-    resolved: bool = False
+    # Status tracking
    status: BugStatus = BugStatus.OPEN
    resolved: bool = False  # Kept for backwards compatibility
    resolution_notes: Optional[str] = None
    assigned_to: Optional[str] = None
    updated_at: Optional[str] = None
    def __post_init__(self):
        if not self.detected_at:
            self.detected_at = datetime.now(timezone.utc).isoformat()
        if not self.id:
            self.id = f"anom-{hashlib.sha256(f'{self.type}{self.phase}{self.message}{self.detected_at}'.encode()).hexdigest()[:12]}"
        # Sync resolved with status for backwards compatibility
        if self.resolved and self.status == BugStatus.OPEN:
            self.status = BugStatus.RESOLVED
        elif self.status == BugStatus.RESOLVED:
            self.resolved = True
@dataclass
@ -129,11 +145,13 @@ class BugWindowWatcher:
    def __init__(self, base_path: str = "/opt/agent-governance"):
        self.base_path = Path(base_path)
        self.ledger_db = self.base_path / "ledger" / "governance.db"
        self.bug_db = self.base_path / "testing" / "oversight" / "bug_watcher.db"
        self.checkpoint_dir = self.base_path / "checkpoint" / "storage"
        self.state = WatcherState()
        self.anomalies: list[Anomaly] = []
        self._redis: Optional[redis.Redis] = None
        self._setup_redis()
        self._setup_bug_db()
    def _setup_redis(self):
        """Connect to DragonflyDB for real-time state"""
@ -148,6 +166,43 @@ class BugWindowWatcher:
        except Exception:
            self._redis = None
    def _setup_bug_db(self):
        """Initialize SQLite database for bug tracking"""
        conn = sqlite3.connect(self.bug_db)
        cursor = conn.cursor()
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS bugs (
                id TEXT PRIMARY KEY,
                type TEXT NOT NULL,
                severity TEXT NOT NULL,
                status TEXT NOT NULL DEFAULT 'open',
                phase INTEGER NOT NULL,
                phase_name TEXT NOT NULL,
                directory TEXT NOT NULL,
                message TEXT NOT NULL,
                details TEXT,
                stack_trace TEXT,
                checkpoint_id TEXT,
                status_file TEXT,
                detected_at TEXT NOT NULL,
                updated_at TEXT,
                resolved_at TEXT,
                resolution_notes TEXT,
                assigned_to TEXT
            )
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bugs_status ON bugs(status)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bugs_severity ON bugs(severity)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_bugs_phase ON bugs(phase)
        """)
        conn.commit()
        conn.close()
    def _now(self) -> str:
        return datetime.now(timezone.utc).isoformat()
@ -545,7 +600,46 @@ class BugWindowWatcher:
        return anomalies
    def _persist_anomalies(self, anomalies: list[Anomaly]):
-        """Persist anomalies to storage"""
+        """Persist anomalies to storage (Redis + SQLite)"""
        # Persist to SQLite
        conn = sqlite3.connect(self.bug_db)
        cursor = conn.cursor()
        for anomaly in anomalies:
            # Convert enum values to strings for storage
            type_val = anomaly.type.value if hasattr(anomaly.type, 'value') else anomaly.type
            sev_val = anomaly.severity.value if hasattr(anomaly.severity, 'value') else anomaly.severity
            status_val = anomaly.status.value if hasattr(anomaly.status, 'value') else anomaly.status
            cursor.execute("""
                INSERT OR REPLACE INTO bugs
                (id, type, severity, status, phase, phase_name, directory, message,
                 details, stack_trace, checkpoint_id, status_file, detected_at,
                 updated_at, resolution_notes, assigned_to)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                anomaly.id,
                type_val,
                sev_val,
                status_val,
                anomaly.phase,
                anomaly.phase_name,
                anomaly.directory,
                anomaly.message,
                json.dumps(anomaly.details) if anomaly.details else None,
                anomaly.stack_trace,
                anomaly.checkpoint_id,
                anomaly.status_file,
                anomaly.detected_at,
                anomaly.updated_at,
                anomaly.resolution_notes,
                anomaly.assigned_to
            ))
        conn.commit()
        conn.close()
        # Also persist to Redis for real-time access
        if not self._redis:
            return
@ -560,65 +654,201 @@ class BugWindowWatcher:
            self._redis.ltrim("oversight:anomalies", 0, 999)
            # Index by severity
-            self._redis.sadd(f"oversight:anomalies:{anomaly.severity.value}", anomaly.id)
+            sev_val = anomaly.severity.value if hasattr(anomaly.severity, 'value') else anomaly.severity
            self._redis.sadd(f"oversight:anomalies:{sev_val}", anomaly.id)
            # Index by phase
            self._redis.sadd(f"oversight:anomalies:phase:{anomaly.phase}", anomaly.id)
            # Index by status
            status_val = anomaly.status.value if hasattr(anomaly.status, 'value') else anomaly.status
            self._redis.sadd(f"oversight:anomalies:status:{status_val}", anomaly.id)
    def get_anomalies(
        self,
        severity: Optional[Severity] = None,
        phase: Optional[int] = None,
        status: Optional[BugStatus] = None,
        limit: int = 50
    ) -> list[Anomaly]:
-        """Retrieve anomalies with optional filters"""
+        """Retrieve anomalies with optional filters from SQLite"""
-        if not self._redis:
+        conn = sqlite3.connect(self.bug_db)
-            # Return in-memory anomalies
+        conn.row_factory = sqlite3.Row
-            filtered = self.anomalies
+        cursor = conn.cursor()
        query = "SELECT * FROM bugs WHERE 1=1"
        params = []
        if severity:
-                filtered = [a for a in filtered if a.severity == severity]
+            sev_val = severity.value if hasattr(severity, 'value') else severity
            query += " AND severity = ?"
            params.append(sev_val)
        if phase:
-                filtered = [a for a in filtered if a.phase == phase]
+            query += " AND phase = ?"
-            return filtered[:limit]
+            params.append(phase)
        if status:
            status_val = status.value if hasattr(status, 'value') else status
            query += " AND status = ?"
            params.append(status_val)
        query += " ORDER BY detected_at DESC LIMIT ?"
        params.append(limit)
        cursor.execute(query, params)
        rows = cursor.fetchall()
        conn.close()
        # Get from Redis
        raw = self._redis.lrange("oversight:anomalies", 0, limit - 1)
        anomalies = []
-
+        for row in rows:
        for item in raw:
            try:
-                data = json.loads(item)
+                anomaly = Anomaly(
-                anomaly = Anomaly(**data)
+                    id=row['id'],
-
+                    type=AnomalyType(row['type']),
-                if severity and anomaly.severity != severity:
+                    severity=Severity(row['severity']),
-                    continue
+                    status=BugStatus(row['status']),
-                if phase and anomaly.phase != phase:
+                    phase=row['phase'],
-                    continue
+                    phase_name=row['phase_name'],
-
+                    directory=row['directory'],
                    message=row['message'],
                    details=json.loads(row['details']) if row['details'] else {},
                    stack_trace=row['stack_trace'],
                    checkpoint_id=row['checkpoint_id'],
                    status_file=row['status_file'],
                    detected_at=row['detected_at'],
                    updated_at=row['updated_at'],
                    resolution_notes=row['resolution_notes'],
                    assigned_to=row['assigned_to'],
                    resolved=row['status'] == 'resolved'
                )
                anomalies.append(anomaly)
            except Exception:
                continue
        return anomalies
-    def acknowledge_anomaly(self, anomaly_id: str, notes: str = "") -> bool:
+    def update_bug_status(
-        """Mark an anomaly as resolved"""
+        self,
-        if not self._redis:
+        bug_id: str,
-            for anomaly in self.anomalies:
+        new_status: BugStatus,
-                if anomaly.id == anomaly_id:
+        notes: Optional[str] = None,
-                    anomaly.resolved = True
+        assigned_to: Optional[str] = None
-                    anomaly.resolution_notes = notes
+    ) -> bool:
-                    return True
+        """Update bug status with optional notes and assignment"""
-            return False
+        conn = sqlite3.connect(self.bug_db)
        cursor = conn.cursor()
-        # Update in Redis
+        now = self._now()
-        self._redis.hset(f"oversight:anomaly:{anomaly_id}", mapping={
+        status_val = new_status.value if hasattr(new_status, 'value') else new_status
-            "resolved": "true",
+
-            "resolution_notes": notes,
+        # Build update query
-            "resolved_at": self._now()
+        updates = ["status = ?", "updated_at = ?"]
        params = [status_val, now]
        if notes is not None:
            updates.append("resolution_notes = ?")
            params.append(notes)
        if assigned_to is not None:
            updates.append("assigned_to = ?")
            params.append(assigned_to)
        if new_status == BugStatus.RESOLVED:
            updates.append("resolved_at = ?")
            params.append(now)
        params.append(bug_id)
        cursor.execute(f"""
            UPDATE bugs SET {', '.join(updates)} WHERE id = ?
        """, params)
        updated = cursor.rowcount > 0
        conn.commit()
        conn.close()
        # Update Redis index if available
        if self._redis and updated:
            # Remove from old status sets, add to new
            for s in BugStatus:
                self._redis.srem(f"oversight:anomalies:status:{s.value}", bug_id)
            self._redis.sadd(f"oversight:anomalies:status:{status_val}", bug_id)
            self._redis.hset(f"oversight:anomaly:{bug_id}", mapping={
                "status": status_val,
                "updated_at": now,
                "resolution_notes": notes or "",
                "assigned_to": assigned_to or ""
            })
-        return True
+        return updated
    def acknowledge_anomaly(self, anomaly_id: str, notes: str = "") -> bool:
        """Mark an anomaly as resolved (backwards compatible)"""
        return self.update_bug_status(anomaly_id, BugStatus.RESOLVED, notes)
    def get_bug(self, bug_id: str) -> Optional[Anomaly]:
        """Get a single bug by ID"""
        conn = sqlite3.connect(self.bug_db)
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM bugs WHERE id = ?", (bug_id,))
        row = cursor.fetchone()
        conn.close()
        if not row:
            return None
        return Anomaly(
            id=row['id'],
            type=AnomalyType(row['type']),
            severity=Severity(row['severity']),
            status=BugStatus(row['status']),
            phase=row['phase'],
            phase_name=row['phase_name'],
            directory=row['directory'],
            message=row['message'],
            details=json.loads(row['details']) if row['details'] else {},
            stack_trace=row['stack_trace'],
            checkpoint_id=row['checkpoint_id'],
            status_file=row['status_file'],
            detected_at=row['detected_at'],
            updated_at=row['updated_at'],
            resolution_notes=row['resolution_notes'],
            assigned_to=row['assigned_to'],
            resolved=row['status'] == 'resolved'
        )
    def log_bug(
        self,
        message: str,
        severity: Severity = Severity.MEDIUM,
        bug_type: AnomalyType = AnomalyType.UNHANDLED_ERROR,
        phase: int = 0,
        directory: str = "unknown",
        details: Optional[dict] = None,
        stack_trace: Optional[str] = None
    ) -> Anomaly:
        """Manually log a bug (for API/CLI use)"""
        anomaly = Anomaly(
            id="",
            type=bug_type,
            severity=severity,
            status=BugStatus.OPEN,
            phase=phase,
            phase_name=self.PHASES.get(phase, f"Phase {phase}"),
            directory=directory,
            message=message,
            details=details or {},
            stack_trace=stack_trace
        )
        self._persist_anomalies([anomaly])
        self.anomalies.append(anomaly)
        return anomaly
    def get_summary(self) -> dict:
        """Get summary of watcher state and anomalies"""
@ -627,23 +857,29 @@ class BugWindowWatcher:
        by_severity = {s.value: 0 for s in Severity}
        by_phase = {p: 0 for p in self.PHASES}
        by_type = {t.value: 0 for t in AnomalyType}
        by_status = {s.value: 0 for s in BugStatus}
        for a in anomalies:
            # Handle both enum and string values
            sev_val = a.severity.value if hasattr(a.severity, 'value') else a.severity
            type_val = a.type.value if hasattr(a.type, 'value') else a.type
            status_val = a.status.value if hasattr(a.status, 'value') else a.status
            by_severity[sev_val] = by_severity.get(sev_val, 0) + 1
            by_phase[a.phase] = by_phase.get(a.phase, 0) + 1
            by_type[type_val] = by_type.get(type_val, 0) + 1
            by_status[status_val] = by_status.get(status_val, 0) + 1
        return {
            "state": asdict(self.state),
            "total_anomalies": len(anomalies),
-            "unresolved": len([a for a in anomalies if not a.resolved]),
+            "open": by_status.get("open", 0),
            "in_progress": by_status.get("in_progress", 0),
            "resolved": by_status.get("resolved", 0),
            "by_severity": by_severity,
            "by_phase": by_phase,
            "by_type": by_type,
            "by_status": by_status,
            "phases": self.PHASES
        }
@ -652,10 +888,20 @@ if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Bug Window Watcher")
-    parser.add_argument("command", choices=["scan", "status", "list"], help="Command to run")
+    parser.add_argument("command", choices=["scan", "status", "list", "update", "log", "get"], help="Command to run")
    parser.add_argument("--phase", type=int, help="Specific phase to scan")
    parser.add_argument("--severity", choices=["critical", "high", "medium", "low", "info"])
    parser.add_argument("--bug-status", dest="bug_status", choices=["open", "in_progress", "resolved"], help="Filter by bug status")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    # For update command
    parser.add_argument("--id", help="Bug ID to update or get")
    parser.add_argument("--set-status", dest="set_status", choices=["open", "in_progress", "resolved"], help="New status to set")
    parser.add_argument("--notes", help="Resolution or status notes")
    parser.add_argument("--assign", help="Assign bug to person/team")
    # For log command
    parser.add_argument("--message", "-m", help="Bug message (for log command)")
    parser.add_argument("--directory", "-d", default="unknown", help="Directory (for log command)")
    parser.add_argument("--type", dest="bug_type", choices=[t.value for t in AnomalyType], default="unhandled_error")
    args = parser.parse_args()
@ -678,8 +924,12 @@ if __name__ == "__main__":
            print()
            for a in anomalies:
-                icon = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "🔵", "info": "⚪"}.get(a.severity.value, "⚪")
+                sev_val = a.severity.value if hasattr(a.severity, 'value') else a.severity
-                print(f"{icon} [{a.severity.value.upper()}] Phase {a.phase}: {a.message}")
+                status_val = a.status.value if hasattr(a.status, 'value') else a.status
                icon = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "🔵", "info": "⚪"}.get(sev_val, "⚪")
                status_icon = {"open": "📋", "in_progress": "🔧", "resolved": "✅"}.get(status_val, "❓")
                print(f"{icon} [{sev_val.upper()}] {status_icon} {status_val.upper()} | Phase {a.phase}: {a.message}")
                print(f"   ID: {a.id}")
                print(f"   Directory: {a.directory}")
                if a.status_file:
                    print(f"   Status: {a.status_file}")
@ -694,20 +944,123 @@ if __name__ == "__main__":
            print(f"BUG WINDOW WATCHER - Status")
            print(f"{'='*60}")
            print(f"Active: {summary['state']['active']}")
-            print(f"Total Anomalies: {summary['total_anomalies']}")
+            print(f"Total Bugs: {summary['total_anomalies']}")
-            print(f"Unresolved: {summary['unresolved']}")
+            print()
            print("By Status:")
            print(f"  📋 Open:        {summary['open']}")
            print(f"  🔧 In Progress: {summary['in_progress']}")
            print(f"  ✅ Resolved:    {summary['resolved']}")
            print()
            print("By Severity:")
            for sev, count in summary['by_severity'].items():
                if count > 0:
-                    print(f"  {sev}: {count}")
+                    icon = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "🔵", "info": "⚪"}.get(sev, "⚪")
                    print(f"  {icon} {sev}: {count}")
    elif args.command == "list":
        severity = Severity(args.severity) if args.severity else None
-        anomalies = watcher.get_anomalies(severity=severity, phase=args.phase)
+        status = BugStatus(args.bug_status) if args.bug_status else None
        anomalies = watcher.get_anomalies(severity=severity, phase=args.phase, status=status)
        if args.json:
            print(json.dumps([asdict(a) for a in anomalies], indent=2))
        else:
            if not anomalies:
                print("No bugs found matching criteria.")
            else:
                print(f"\n{'='*70}")
                print(f"{'ID':<20} {'Status':<12} {'Severity':<10} {'Message'}")
                print(f"{'='*70}")
                for a in anomalies:
-                print(f"[{a.id}] {a.severity.value}: {a.message}")
+                    sev_val = a.severity.value if hasattr(a.severity, 'value') else a.severity
                    status_val = a.status.value if hasattr(a.status, 'value') else a.status
                    msg = a.message[:40] + "..." if len(a.message) > 40 else a.message
                    print(f"{a.id:<20} {status_val:<12} {sev_val:<10} {msg}")
    elif args.command == "update":
        if not args.id:
            print("Error: --id is required for update command")
            exit(1)
        if not args.set_status:
            print("Error: --set-status is required for update command")
            exit(1)
        new_status = BugStatus(args.set_status)
        success = watcher.update_bug_status(
            args.id,
            new_status,
            notes=args.notes,
            assigned_to=args.assign
        )
        if success:
            bug = watcher.get_bug(args.id)
            if args.json:
                print(json.dumps(asdict(bug), indent=2))
            else:
                print(f"✅ Bug {args.id} updated to {args.set_status}")
                if args.notes:
                    print(f"   Notes: {args.notes}")
                if args.assign:
                    print(f"   Assigned to: {args.assign}")
        else:
            print(f"❌ Failed to update bug {args.id} - not found")
            exit(1)
    elif args.command == "get":
        if not args.id:
            print("Error: --id is required for get command")
            exit(1)
        bug = watcher.get_bug(args.id)
        if bug:
            if args.json:
                print(json.dumps(asdict(bug), indent=2))
            else:
                sev_val = bug.severity.value if hasattr(bug.severity, 'value') else bug.severity
                status_val = bug.status.value if hasattr(bug.status, 'value') else bug.status
                type_val = bug.type.value if hasattr(bug.type, 'value') else bug.type
                print(f"\n{'='*60}")
                print(f"Bug: {bug.id}")
                print(f"{'='*60}")
                print(f"Status:      {status_val}")
                print(f"Severity:    {sev_val}")
                print(f"Type:        {type_val}")
                print(f"Phase:       {bug.phase} - {bug.phase_name}")
                print(f"Directory:   {bug.directory}")
                print(f"Message:     {bug.message}")
                print(f"Detected:    {bug.detected_at}")
                if bug.updated_at:
                    print(f"Updated:     {bug.updated_at}")
                if bug.assigned_to:
                    print(f"Assigned to: {bug.assigned_to}")
                if bug.resolution_notes:
                    print(f"Notes:       {bug.resolution_notes}")
        else:
            print(f"❌ Bug {args.id} not found")
            exit(1)
    elif args.command == "log":
        if not args.message:
            print("Error: --message/-m is required for log command")
            exit(1)
        severity = Severity(args.severity) if args.severity else Severity.MEDIUM
        bug_type = AnomalyType(args.bug_type)
        phase = args.phase or 0
        bug = watcher.log_bug(
            message=args.message,
            severity=severity,
            bug_type=bug_type,
            phase=phase,
            directory=args.directory
        )
        if args.json:
            print(json.dumps(asdict(bug), indent=2))
        else:
            print(f"✅ Bug logged: {bug.id}")
            print(f"   Severity: {severity.value}")
            print(f"   Status: open")
            print(f"   Message: {args.message}")
--- a/ui/STATUS.md
+++ b/ui/STATUS.md
@ -1,30 +1,96 @@
-# Status: Ui
+# Status: UI
 ## Current Phase
-** NOT STARTED**
+**COMPLETE**
 ## Tasks
 | Status | Task | Updated |
 |--------|------|---------|
-| ☐ | *No tasks defined* | - |
+| ✓ | Dashboard server (server.ts) | 2026-01-24 |
 | ✓ | Real-time WebSocket updates | 2026-01-24 |
 | ✓ | Agent state monitoring | 2026-01-24 |
 | ✓ | Integration panel (deprecated integrations shown) | 2026-01-24 |
 | ✓ | Auto-continue to OpenRouter orchestration | 2026-01-24 |
 | ✓ | Multi-agent pipeline (ALPHA/BETA parallel) | 2026-01-24 |
 | ✓ | Vault token management per pipeline | 2026-01-24 |
 | ✓ | Error budget tracking and monitoring | 2026-01-24 |
 | ✓ | Observability-driven token revocation | 2026-01-24 |
 | ✓ | Diagnostic pipeline spawning | 2026-01-24 |
 | ✓ | Agent lifecycle status API | 2026-01-24 |
 ## Recent Changes
 ### 2026-01-24: Production Pipeline Auto-Continue
 - Added `triggerOrchestration()` for automatic OpenRouter orchestration
 - Added `continueOrchestration()` for manual trigger
 - Added `POST /api/pipeline/continue` endpoint
 - Added `GET /api/pipeline/orchestration` endpoint
 - Pipeline flow: SPAWN → RUNNING → REPORT → ORCHESTRATING → COMPLETED
 - WebSocket events: orchestration_started, agent_message, consensus_event, orchestration_complete
 - Default: auto_continue=true (pipelines auto-continue to orchestration)
 ### 2026-01-24: Integration Panel Update
 - External integrations (Slack/GitHub/PagerDuty) marked as deprecated
 - Removed credential checking from Vault
 - Added "deprecated" status styling
 ## API Endpoints
 ### Pipeline Control
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/api/spawn` | POST | Spawn pipeline with auto_continue option |
 | `/api/pipeline/continue` | POST | Manually trigger orchestration |
 | `/api/pipeline/orchestration` | GET | Get orchestration status |
 | `/api/active-pipelines` | GET | List active pipelines |
 | `/api/pipeline/logs` | GET | Get pipeline logs |
 ## Dependencies
-*No external dependencies.*
+- Bun runtime
 - Redis client (for DragonflyDB)
 - SQLite (bun:sqlite)
 - Multi-agent orchestrator (agents/multi-agent/orchestrator.ts)
 ## Issues / Blockers
-*No current issues or blockers.*
+*None.*
 ## Activity Log
 ### 2026-01-24 22:30 UTC
 - **Phase**: COMPLETE
 - **Action**: Added Vault token management and observability integration
 - **Details**:
  - Vault token issuance per pipeline (2h TTL, renewable)
  - Token renewal loop (every 30 minutes)
  - Error budget tracking with thresholds
  - Observability-driven token revocation
  - Diagnostic pipeline spawning on error threshold
  - Agent lifecycle status API
  - New API endpoints: /api/pipeline/token, /api/pipeline/errors, /api/observability/handoff
 ### 2026-01-24 21:55 UTC
 - **Phase**: COMPLETE
 - **Action**: End-to-end pipeline demonstration successful
 - **Details**: Verified full pipeline flow: SPAWN → RUNNING → REPORT → ORCHESTRATING → COMPLETED. GAMMA spawned on complexity threshold. All validation criteria passed.
 ### 2026-01-24 22:00 UTC
 - **Phase**: COMPLETE
 - **Action**: Added production pipeline auto-continue
 - **Details**: Implemented automatic transition from REPORT → OpenRouter orchestration. Added triggerOrchestration(), continueOrchestration(), and API endpoints.
 ### 2026-01-24 21:30 UTC
 - **Phase**: COMPLETE
 - **Action**: Updated integration panel for deprecated integrations
 - **Details**: Removed Vault credential checks, added deprecated status styling
 ### 2026-01-23 23:25:09 UTC
- **Phase**: NOT STARTED
+- **Phase**: COMPLETE
 - **Action**: Initialized
 - **Details**: Status tracking initialized for this directory.
 ---
-*Last updated: 2026-01-23 23:25:09 UTC*
+*Last updated: 2026-01-24 22:00 UTC*
--- a/ui/server.ts
+++ b/ui/server.ts
Author	SHA1	Message	Date
profit	09be7eff4b	Add consensus failure handling with fallback options for multi-agent pipelines Implements detection and recovery for when agents fail to reach consensus: - Orchestrator exits with code 2 on consensus failure (distinct from error=1) - Records failed run context (proposals, agent states, conflicts) to Dragonfly - Provides fallback options: rerun same, rerun with GAMMA, escalate tier, accept partial - Adds UI alert with action buttons for user-driven recovery - Adds failure details modal and downloadable failure report - Only marks pipeline complete when consensus achieved or user accepts fallback Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 18:24:19 -05:00
profit	8561d13728	Add Vault token management and observability integration for multi-agent pipelines - Vault token issuance per pipeline with 2-hour TTL - Automatic token renewal loop every 30 minutes - Error budget tracking with threshold-based revocation - Observability-driven token revocation for policy violations - Diagnostic pipeline spawning on error threshold breach - Structured handoff reports for error recovery - Agent lifecycle status API - New API endpoints: /api/pipeline/token, /api/pipeline/errors, /api/observability/handoff, /api/observability/diagnostic Orchestrator now reports errors to parent pipeline's observability system via PIPELINE_ID environment variable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:45:20 -05:00
profit	a304895249	Add bug status tracking with API and UI Implements full bug lifecycle management (open → in_progress → resolved): Bug Watcher (testing/oversight/bug_watcher.py): - Add BugStatus enum with open/in_progress/resolved states - Add SQLite persistence with status tracking and indexes - New methods: update_bug_status(), get_bug(), log_bug() - Extended CLI: update, get, log commands with filters API Endpoints (ui/server.ts): - GET /api/bugs - List bugs with status/severity/phase filters - GET /api/bugs/summary - Bug statistics by status and severity - GET /api/bugs/:id - Single bug details - POST /api/bugs - Log new bug - PATCH /api/bugs/:id - Update bug status UI Dashboard: - New "Bugs" tab with summary cards (Total/Open/In Progress/Resolved) - Filter dropdowns for status and severity - Bug list with status badges and severity indicators - Detail panel with action buttons for status transitions - WebSocket broadcasts for real-time updates CLI Wrapper (bin/bugs): - bugs list [--status X] [--severity Y] - bugs get <id> - bugs log -m "message" [--severity high] - bugs update <id> <status> [--notes "..."] - bugs status Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:17:43 -05:00
profit	ccc3b01609	Fix orchestrator process hang after cleanup The orchestrator process was hanging after completing its work because: 1. Fire-and-forget Redis operations in MessageBus.handleMessage() left unhandled promises that kept the event loop alive 2. No explicit process.exit() call after cleanup Changes: - coordination.ts: Add .catch(() => {}) to fire-and-forget Redis ops - orchestrator.ts: Add explicit process.exit(exitCode) after cleanup - orchestrator.ts: Improve error handling in main() with proper exit codes Tested: Pipeline mksup1wq completed full flow and exited cleanly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:01:45 -05:00