History

profit 8c6e7831e9 Add Phase 10-12 implementation: multi-tenant, marketplace, observability

Major additions:
- marketplace/: Agent template registry with FTS5 search, ratings, versioning
- observability/: Prometheus metrics, distributed tracing, structured logging
- ledger/migrations/: Database migration scripts for multi-tenant support
- tests/governance/: 15 new test files for phases 6-12 (295 total tests)
- bin/validate-phases: Full 12-phase validation script

New features:
- Multi-tenant support with tenant isolation and quota enforcement
- Agent marketplace with semantic versioning and search
- Observability with metrics, tracing, and log correlation
- Tier-1 agent bootstrap scripts

Updated components:
- ledger/api.py: Extended API for tenants, marketplace, observability
- ledger/schema.sql: Added tenant, project, marketplace tables
- testing/framework.ts: Enhanced test framework
- checkpoint/checkpoint.py: Improved checkpoint management

Archived:
- External integrations (Slack/GitHub/PagerDuty) moved to .archive/
- Old checkpoint files cleaned up

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-24 18:39:47 -05:00

Files (each last changed by the commit above, 2026-01-24 18:39:47 -05:00):

- __init__.py
- logging.py
- metrics.py
- README.md
- STATUS.md
- tracing.py

README.md

Observability

Metrics, tracing, and structured logging for the agent governance system

Overview

This module is the observability layer of the agent governance system. It provides Prometheus-format metrics, distributed tracing with span propagation, and structured JSON logging in which each log entry carries the trace and span IDs of the operation that produced it.

Key Files

| File | Description |
|------|-------------|
| `metrics.py` | Prometheus metrics (Counter, Gauge, Histogram) with FastAPI router |
| `tracing.py` | Distributed tracing with Span/Trace classes and context propagation |
| `logging.py` | Structured JSON logging with SQLite persistence |
| `__init__.py` | Module exports and unified API |

Components

Metrics (`metrics.py`)

Prometheus-format metrics with automatic collection:

| Metric | Type | Description |
|--------|------|-------------|
| `agent_executions_total` | Counter | Total executions by tier, action, status |
| `agent_execution_duration_seconds` | Histogram | Execution latency distribution |
| `agent_violations_total` | Counter | Violations by type and severity |
| `agent_promotions_total` | Counter | Tier promotions |
| `api_requests_total` | Counter | API requests by method, endpoint, status |
| `api_request_duration_seconds` | Histogram | Request latency |
| `component_health` | Gauge | Health status (1=healthy, 0=unhealthy) |
| `tenant_quota_usage_ratio` | Gauge | Quota usage per tenant |
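
To make "Prometheus format" concrete, the sketch below renders a labelled counter in the Prometheus text exposition format. The `Counter` class here is a hypothetical stand-in written for illustration, not the class defined in `metrics.py`; only the metric name and label names come from the table above. Label-value escaping is omitted for brevity.

```python
from collections import defaultdict


class Counter:
    """Hypothetical minimal counter; not the class from metrics.py."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        # Key each time series by its label values, in declared order.
        key = tuple(str(labels[name]) for name in self.label_names)
        self._values[key] += amount

    def render(self):
        """Return the counter in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            # Note: real exporters must escape quotes, backslashes, and
            # newlines in label values; this sketch assumes clean values.
            pairs = ",".join(
                f'{name}="{val}"' for name, val in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines) + "\n"


executions = Counter(
    "agent_executions_total",
    "Total executions by tier, action, status",
    ["tier", "action", "status"],
)
executions.inc(tier=1, action="update_config", status="success")
executions.inc(tier=1, action="update_config", status="success")
executions.inc(tier=2, action="deploy", status="failure")
print(executions.render(), end="")
```

Each distinct combination of label values becomes its own line in the export, which is why high-cardinality labels (for example raw user IDs) are usually avoided.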

Tracing (`tracing.py`)

Distributed tracing with automatic context propagation:

- Span: Individual operation with timing, attributes, events
- Trace: Collection of related spans
- Context Propagation: Thread-local storage + HTTP headers (X-Trace-ID, X-Span-ID)
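
The propagation scheme described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in `tracing.py`: the helper names (`set_context`, `get_context`, `inject`, `extract`) and the ID format are hypothetical, and only the header names and the use of thread-local storage come from the description.

```python
import threading
import uuid

# Thread-local storage: each thread sees its own trace/span IDs.
_context = threading.local()

TRACE_HEADER = "X-Trace-ID"
SPAN_HEADER = "X-Span-ID"


def set_context(trace_id, span_id):
    _context.trace_id = trace_id
    _context.span_id = span_id


def get_context():
    """Return (trace_id, span_id) for the current thread, or (None, None)."""
    return getattr(_context, "trace_id", None), getattr(_context, "span_id", None)


def inject(headers):
    """Copy the current thread's context into outgoing HTTP headers."""
    trace_id, span_id = get_context()
    if trace_id is not None:
        headers[TRACE_HEADER] = trace_id
    if span_id is not None:
        headers[SPAN_HEADER] = span_id
    return headers


def extract(headers):
    """Adopt the caller's trace (or start a new one) and open a new span.

    Returns (trace_id, parent_span_id).
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    trace_id = lowered.get(TRACE_HEADER.lower()) or uuid.uuid4().hex
    parent_span_id = lowered.get(SPAN_HEADER.lower())
    set_context(trace_id, uuid.uuid4().hex[:16])
    return trace_id, parent_span_id


# Caller side: attach the current context to an outgoing request.
set_context("trace-abc", "span-1")
outgoing = inject({})

# Callee side, simulated on another thread with its own thread-local state.
result = {}


def handle_request():
    result["extracted"] = extract(outgoing)
    result["local"] = get_context()


worker = threading.Thread(target=handle_request)
worker.start()
worker.join()
```

The callee keeps the caller's trace ID but generates its own span ID, recording the caller's span as the parent. Because the state is thread-local, the caller's context is unchanged by the callee's work.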

Logging (`logging.py`)

Structured JSON logging with:

- Automatic trace/span ID correlation
- SQLite persistence with full-text search
- Multi-tenant support
- Configurable retention (default: 7 days)
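
As an illustration of trace-correlated JSON logging, the sketch below builds it on Python's standard `logging` module. It is an assumption-laden stand-in, not the implementation in `logging.py`: the `JsonFormatter` class, the context variables, and the `fields` key passed via `extra` are all hypothetical, and SQLite persistence is left out.

```python
import contextvars
import io
import json
import logging

# Hypothetical holders for the active trace context.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)
current_span_id = contextvars.ContextVar("current_span_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with trace/span IDs attached."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": current_trace_id.get(),
            "span_id": current_span_id.get(),
        }
        # Structured fields supplied by the caller via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("my_agent_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

current_trace_id.set("trace-abc")
current_span_id.set("span-1")
logger.info("Starting execution", extra={"fields": {"agent_id": "agent-123"}})

entry = json.loads(stream.getvalue())
print(entry["trace_id"], entry["agent_id"])
```

The standard library logger does not accept arbitrary keyword arguments, so the sketch passes structured fields through `extra`; the module's own `get_logger` appears to accept them directly, as the Usage section shows.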

API Endpoints

Metrics

- GET /metrics - Prometheus format export

Tracing

- GET /traces - List traces with filters
- GET /traces/{trace_id} - Full trace details

Logging

- GET /logs - Search logs with filters
- GET /logs/trace/{trace_id} - Logs for a trace
- GET /logs/stats - Log statistics
- POST /logs/cleanup - Clean old logs
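
A cleanup endpoint like the one listed above typically deletes rows older than the retention window. The sketch below shows that idea against an in-memory SQLite database. The `logs` table, its columns, and the `cleanup_logs` function are assumptions made for illustration; the actual schema and handler are not shown in this README.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 7  # matches the documented default retention


def cleanup_logs(conn, retention_days=RETENTION_DAYS, now=None):
    """Delete log rows older than the retention window; return rows removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # ISO 8601 timestamps in one time zone sort lexicographically,
    # so a plain string comparison is enough here.
    with conn:
        cursor = conn.execute("DELETE FROM logs WHERE timestamp < ?", (cutoff,))
    return cursor.rowcount


# Hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (timestamp TEXT NOT NULL, level TEXT, message TEXT)")

now = datetime(2026, 1, 24, tzinfo=timezone.utc)
rows = [
    ((now - timedelta(days=10)).isoformat(), "INFO", "old entry"),
    ((now - timedelta(days=1)).isoformat(), "ERROR", "recent entry"),
]
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

deleted = cleanup_logs(conn, now=now)
remaining = [row[0] for row in conn.execute("SELECT message FROM logs")]
print(deleted, remaining)
```

Passing `now` explicitly keeps the function deterministic in tests; in production code the default of the current UTC time would apply.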

Health

- GET /health/detailed - Component health with details

Usage

from observability import (
    # Metrics
    registry,
    record_agent_execution,
    record_violation,
    record_promotion,
    MetricsMiddleware,

    # Tracing
    get_tracer,
    get_current_trace_id,

    # Logging
    get_logger
)

# Create a logger
logger = get_logger("my_agent")

# Get the tracer
tracer = get_tracer()

# Trace an operation
with tracer.trace("agent_execution", agent_id="agent-123") as span:
    logger.info("Starting execution", agent_id="agent-123")

    try:
        # Do work...
        with tracer.span("sub_operation") as child:
            # Child span automatically linked
            pass

        record_agent_execution(tier=1, action="update_config", success=True, duration=0.45)

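    except Exception as e:
        # Mark the span as failed, record the violation, and log the error,
        # then re-raise so the caller still sees the exception.
        span.set_error(e)
        record_violation("unauthorized_action", "high")
        logger.error("Execution failed", error=str(e))
        raise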

FastAPI Integration

from fastapi import FastAPI
from observability import metrics_router, tracing_router, logging_router, MetricsMiddleware

app = FastAPI()

# Add metrics middleware
app.add_middleware(MetricsMiddleware)

# Mount routers
app.include_router(metrics_router)
app.include_router(tracing_router)
app.include_router(logging_router)
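
The internals of `MetricsMiddleware` are not shown here. As a rough idea of what a request-metrics middleware does, the sketch below is a plain ASGI middleware that counts requests by method, path, and status and records their duration. The class name `RequestMetricsMiddleware` and its attributes are hypothetical, and it uses only the standard library so it can be run without FastAPI installed.

```python
import asyncio
import time
from collections import Counter


class RequestMetricsMiddleware:
    """Hypothetical ASGI middleware; not the module's MetricsMiddleware."""

    def __init__(self, app):
        self.app = app
        self.requests_total = Counter()   # keyed by (method, path, status)
        self.duration_seconds = []        # raw observations, one per request

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass through lifespan and websocket traffic untouched.
            await self.app(scope, receive, send)
            return

        status = {"code": 500}  # assume failure until a response starts

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed = time.perf_counter() - start
            key = (scope["method"], scope["path"], status["code"])
            self.requests_total[key] += 1
            self.duration_seconds.append(elapsed)


async def demo_app(scope, receive, send):
    """Tiny ASGI app that always answers 200 OK."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def _demo():
    middleware = RequestMetricsMiddleware(demo_app)
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/health/detailed"}
    await middleware(scope, receive, send)
    return middleware, sent


middleware, sent = asyncio.run(_demo())
print(dict(middleware.requests_total))
```

Recording in a `finally` block means requests that raise are still counted, with the status left at 500 if no response had started.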

Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| `DB_PATH` | `/opt/agent-governance/ledger/governance.db` | SQLite database |
| `LOG_LEVEL` | `INFO` | Minimum log level |
| `LOG_RETENTION_DAYS` | `7` | Days to retain logs |
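
This README does not say how these settings are supplied. Assuming they are read from environment variables (an assumption, not something the source confirms), loading them with the documented defaults might look like the sketch below. `ObservabilityConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservabilityConfig:
    db_path: str
    log_level: str
    log_retention_days: int


def load_config(env=None):
    """Build a config from a mapping (default: os.environ), applying defaults."""
    env = os.environ if env is None else env
    retention = int(env.get("LOG_RETENTION_DAYS", "7"))
    if retention < 0:
        raise ValueError("LOG_RETENTION_DAYS must not be negative")
    return ObservabilityConfig(
        db_path=env.get("DB_PATH", "/opt/agent-governance/ledger/governance.db"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        log_retention_days=retention,
    )


# An empty mapping yields the documented defaults.
defaults = load_config({})

# Overrides are parsed and normalised.
overridden = load_config({"LOG_LEVEL": "debug", "LOG_RETENTION_DAYS": "30"})
print(defaults)
print(overridden)
```

Accepting the environment as a parameter keeps the function easy to test without mutating `os.environ`.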

Status

Complete

See STATUS.md for detailed progress tracking.

Architecture Reference

Part of the Agent Governance System.

Parent: Project Root

Last updated: 2026-01-24 UTC
