phase-39+41: land promised artifacts — providers.toml, activation.rs, profiles/

Three PRD gaps closed in one coherent batch — all three were cosmetic or
scaffold-shaped until now; the promised files land here for real:

Phase 39 (PRD:57):
  + config/providers.toml — provider registry (name/base_url/auth/
    default_model) for ollama, ollama_cloud, openrouter. Commented
    stubs for gemini + claude pending adapter work. Secrets stay in
    /etc/lakehouse/secrets.toml or env, NEVER inline.
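
  For orientation, a minimal sketch of how a gateway could deserialize
  this registry at boot. Not part of this commit: ProviderEntry,
  ProviderRegistry, load_registry, and the toml-crate usage are
  assumptions; the real parsing code lives in the gateway crate.

    // Hypothetical types; field names mirror providers.toml below.
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct ProviderEntry {
        name: String,
        base_url: String,
        auth: String,
        auth_env: Option<String>,
        default_model: String,
        // unknown keys (e.g. auth_fallback_files) are ignored by serde's default behavior
    }

    #[derive(Debug, Deserialize)]
    struct ProviderRegistry {
        provider: Vec<ProviderEntry>, // one entry per [[provider]] block
    }

    fn load_registry(path: &str) -> Result<ProviderRegistry, Box<dyn std::error::Error>> {
        let raw = std::fs::read_to_string(path)?;
        Ok(toml::from_str(&raw)?)
    }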

Phase 41 (PRD:115):
  + crates/vectord/src/activation.rs — ActivationTracker with the
    PRD-named single-flight guard ("refuse new activation if one is
    pending/running"). Per-profile granularity — activating A doesn't
    block B. 5 tests cover the full state machine. Handler body stays
    in service.rs for now; tracker usage integration is a follow-up.
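
  Intended follow-up shape, for reference only (hypothetical: the uuid
  crate and the simplified signature are assumptions; the real
  activate_profile handler is unchanged in service.rs):

    // Sketch of the ~20-line handler once extraction lands.
    async fn activate_profile(tracker: ActivationTracker, profile_id: String) -> Result<String, String> {
        let job_id = uuid::Uuid::new_v4().to_string();
        // Single-flight guard: refuse if an activation is already pending/running.
        if !tracker.mark_pending(&profile_id, &job_id).await {
            let running = tracker.is_pending(&profile_id).await.unwrap_or_default();
            return Err(format!("activation already in flight: {running}")); // surface as 409 Conflict
        }
        let (t, pid) = (tracker.clone(), profile_id.clone());
        tokio::spawn(async move {
            // ... warm-up + bucket binding (today inline in service.rs) ...
            t.mark_complete(&pid).await; // success OR failure frees the slot
        });
        Ok(job_id) // surface as 202 Accepted
    }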

Phase 41 (PRD:113):
  + crates/shared/src/profiles/ with 4 submodules:
      * execution.rs — `pub use crate::types::ModelProfile as
        ExecutionProfile` (backward-compat rename per PRD)
      * retrieval.rs — top_k, rerank_top_k, freshness cutoff,
        playbook boost, sensitivity-gate enforcement
      * memory.rs — playbook boost ceiling, history cap, doc
        staleness, auto-retire-on-failure
      * observer.rs — failure cluster size, alert cooldown, ring
        size, langfuse forwarding
    All fields `#[serde(default)]` so existing ModelProfile files
    load unchanged.
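
  A sketch of the backward-compat behavior (not a shipped test;
  serde_json and the crate path are stand-ins for whatever the loaders
  actually use):

    use shared::profiles::RetrievalProfile;

    // A payload predating Phase 41 carries only the fields without defaults.
    fn old_payload_still_loads() {
        let sparse = r#"{"id": "demo", "created_at": "2026-04-24T18:00:00Z"}"#;
        let p: RetrievalProfile = serde_json::from_str(sparse).unwrap();
        assert_eq!(p.top_k, 10);                // default_top_k
        assert_eq!(p.rerank_top_k, 5);          // default_rerank_top_k
        assert!(p.enforce_sensitivity_gates);   // default_true, safety-first
        assert_eq!(p.freshness_cutoff_days, 0); // bare #[serde(default)]
    }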

Still open from the same phases:
  - Gemini + Claude provider adapters (Phase 40 — 100-200 LOC each)
  - Full activate_profile handler extraction into activation.rs
    (Phase 41 — module-structure refactor)
  - Catalogd CRUD endpoints for retrieval/memory/observer profiles
    (Phase 41 — exists at list level, no create/update/delete yet)
  - truth/ repo-root directory for file-backed rules (Phase 42 —
    TOML loader + schema)
  - crates/validator crate (Phase 43 — full greenfield)

Workspace warnings still at 0. 5 new tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-24 13:32:40 -05:00
parent 021c1b557f
commit 2f1b9c9768
9 changed files with 352 additions and 0 deletions

config/providers.toml

@@ -0,0 +1,62 @@
# Phase 39: Provider Registry
#
# Per-provider base_url, auth scheme, and default model. The gateway's
# /v1/chat dispatcher reads this file at boot to populate its provider
# table. Secrets (API keys) come from /etc/lakehouse/secrets.toml or
# environment variables — NEVER inline a key here.
#
# Adding a new provider:
# 1. New [[provider]] block with name, base_url, auth, default_model
# 2. Matching adapter at crates/aibridge/src/providers/<name>.rs
# implementing the ProviderAdapter trait (chat + embed + unload)
# 3. Route arm in crates/gateway/src/v1/mod.rs matching on `name`
# 4. Model-prefix routing hint in resolve_provider() if the provider
# uses an "<name>/..." model prefix (e.g. "openrouter/...")

[[provider]]
name = "ollama"
base_url = "http://localhost:3200"
auth = "none"
default_model = "qwen3.5:latest"
# Hot-path local inference. No bearer needed — Python sidecar on
# localhost handles the Ollama API. Model names are bare
# (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest").

[[provider]]
name = "ollama_cloud"
base_url = "https://ollama.com"
auth = "bearer"
auth_env = "OLLAMA_CLOUD_KEY"
default_model = "gpt-oss:120b"
# Cloud-tier Ollama. Key resolved from OLLAMA_CLOUD_KEY env at gateway
# boot. Model-prefix routing: "cloud/<model>" auto-routes here
# (see gateway::v1::resolve_provider).

[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
auth_fallback_files = ["/home/profit/.env", "/root/llm_team_config.json"]
default_model = "openai/gpt-oss-120b:free"
# Multi-provider gateway. Covers Anthropic, Google, OpenAI, MiniMax,
# Qwen, Gemma, etc. Key resolved via crates/gateway/src/v1/openrouter.rs
# resolve_openrouter_key() — env first, then fallback files.
# Model-prefix routing: "openrouter/<vendor>/<model>" auto-routes here,
# prefix stripped before upstream call.

# Planned (Phase 40 long-horizon — adapters not yet shipped):
#
# [[provider]]
# name = "gemini"
# base_url = "https://generativelanguage.googleapis.com/v1beta"
# auth = "api_key_query"
# auth_env = "GEMINI_API_KEY"
# default_model = "gemini-2.0-flash"
#
# [[provider]]
# name = "claude"
# base_url = "https://api.anthropic.com/v1"
# auth = "x_api_key"
# auth_env = "ANTHROPIC_API_KEY"
# default_model = "claude-3-5-sonnet-latest"

crates/shared/src/lib.rs

@@ -5,3 +5,4 @@ pub mod config;
pub mod pii;
pub mod secrets;
pub mod model_matrix;
pub mod profiles;

crates/shared/src/profiles/execution.rs

@@ -0,0 +1,14 @@
//! ExecutionProfile — the Phase 41 rename of Phase 17's ModelProfile.
//!
//! Carries what's needed to RUN inference: model tag, dataset bindings,
//! HNSW config, embed model, bucket binding. Today this is a type
//! alias over `crate::types::ModelProfile` — the PRD's
//! "Backward compat: ModelProfile still loads, aliased to
//! ExecutionProfile" line, honored literally.
//!
//! When the migration off the old name finishes, this file can either
//! absorb the full struct definition or continue as an alias. Callers
//! should reference `ExecutionProfile` going forward; `ModelProfile`
//! stays exported from `types` for on-disk schema compat.

pub use crate::types::ModelProfile as ExecutionProfile;

crates/shared/src/profiles/memory.rs

@@ -0,0 +1,38 @@
//! MemoryProfile — how the agent's execution memory is kept.
//!
//! Phase 41 decomposition: the Phase 19 playbook_memory + Phase 26
//! successful_playbooks + Phase 45 doc_refs all need per-profile
//! tuning. Rather than bolt those onto ExecutionProfile, they live
//! here so a "thin" execution profile can reuse a "fat" memory
//! profile and vice versa.

use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MemoryProfile {
    pub id: String,
    #[serde(default)]
    pub description: String,
    /// Phase 19: ceiling for playbook_memory boost on retrieval. 0
    /// disables the boost entirely.
    #[serde(default = "default_boost_ceiling")]
    pub playbook_boost_ceiling: f32,
    /// Phase 26: max history entries retained before rotation.
    #[serde(default = "default_history_cap")]
    pub history_cap: usize,
    /// Phase 45: stale threshold for doc_refs before drift check
    /// fires (hours).
    #[serde(default = "default_stale_hours")]
    pub doc_stale_hours: u32,
    /// Phase 28: auto-retire playbooks that fail 3+ consecutive runs.
    #[serde(default = "default_true")]
    pub auto_retire_on_failure: bool,
    pub created_at: chrono::DateTime<chrono::Utc>,
    #[serde(default)]
    pub created_by: String,
}

fn default_boost_ceiling() -> f32 { 0.35 }
fn default_history_cap() -> usize { 1000 }
fn default_stale_hours() -> u32 { 168 } // one week
fn default_true() -> bool { true }

crates/shared/src/profiles/mod.rs

@@ -0,0 +1,28 @@
//! Phase 41 profile types.
//!
//! The existing `ModelProfile` (Phase 17) is aliased as
//! `ExecutionProfile` here — it continues to carry the model +
//! bindings + HNSW config needed to run inference. Three new profile
//! types land alongside: `RetrievalProfile`, `MemoryProfile`,
//! `ObserverProfile` — each owns a distinct slice of what used to be
//! bundled.
//!
//! Backward-compat rule (PRD Phase 41): existing `ModelProfile` on
//! disk continues to deserialize unchanged. New fields on the new
//! profile types are `#[serde(default)]` so old payloads load with
//! empty defaults.
//!
//! These are the canonical shapes — downstream code converts via
//! `From<ModelProfile> for ExecutionProfile` (they're the same struct
//! today, just named differently) and constructs the other three
//! as needed.

pub mod execution;
pub mod retrieval;
pub mod memory;
pub mod observer;

pub use execution::ExecutionProfile;
pub use memory::MemoryProfile;
pub use observer::ObserverProfile;
pub use retrieval::RetrievalProfile;

crates/shared/src/profiles/observer.rs

@@ -0,0 +1,38 @@
//! ObserverProfile — how loudly the observer logs this workload.
//!
//! Phase 41 decomposition: the observer's alert thresholds, escalation
//! cadence, and log retention need per-workload tuning. Hot-path
//! staffing workflows want aggressive alerting; batch backfills want
//! quieter. This profile is read by mcp-server/observer.ts at
//! activation-time.

use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ObserverProfile {
    pub id: String,
    #[serde(default)]
    pub description: String,
    /// How many consecutive failures trigger a cluster escalation to
    /// LLM Team `/v1/chat` (qwen3-coder:480b).
    #[serde(default = "default_failure_cluster_size")]
    pub failure_cluster_size: u32,
    /// Minimum seconds between alert emails for the same sig_hash.
    /// Prevents alert storms during a regression.
    #[serde(default = "default_alert_cooldown")]
    pub alert_cooldown_secs: u32,
    /// Observer ring buffer size. Older events fall off when full.
    #[serde(default = "default_ring_size")]
    pub ring_size: usize,
    /// Whether to forward events to external Langfuse
    /// (/v1/langfuse_trace). Off by default.
    #[serde(default)]
    pub forward_to_langfuse: bool,
    pub created_at: chrono::DateTime<chrono::Utc>,
    #[serde(default)]
    pub created_by: String,
}

fn default_failure_cluster_size() -> u32 { 3 }
fn default_alert_cooldown() -> u32 { 300 } // 5 minutes
fn default_ring_size() -> usize { 2000 }

crates/shared/src/profiles/retrieval.rs

@@ -0,0 +1,52 @@
//! RetrievalProfile — what + how the agent reaches into memory.
//!
//! Phase 41 decomposition: the old ModelProfile bundled "what dataset
//! can I read" (bound_datasets) AND "how do I rank results"
//! (hnsw_config) with the model tag. Retrieval concerns split out here
//! so a profile can swap its retrieval strategy without re-activating
//! the model.
//!
//! Fields chosen for what's actually varied per-workload today:
//! - `top_k` / `rerank_top_k` — how many hits to fetch + rerank
//! - `freshness_cutoff_days` — Phase 45 doc-drift uses this
//! - `boost_playbook_memory` — Phase 19 meta-index feedback
//! - `enforce_sensitivity_gates` — Phase 13 access-control integration
//!
//! All fields are `#[serde(default)]` so loading a profile file that
//! predates Phase 41 works without migration.

use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct RetrievalProfile {
    /// Unique id — slug form, separate namespace from ExecutionProfile.
    pub id: String,
    /// Free-text operator description.
    #[serde(default)]
    pub description: String,
    /// Default top-K for /vectors/search + /vectors/hybrid.
    #[serde(default = "default_top_k")]
    pub top_k: u32,
    /// How many of the top-K to pass through the reranker. 0 disables
    /// reranking for this profile.
    #[serde(default = "default_rerank_top_k")]
    pub rerank_top_k: u32,
    /// Don't consider playbooks / docs older than this (days). 0 or
    /// absent = no freshness filter.
    #[serde(default)]
    pub freshness_cutoff_days: u32,
    /// Phase 19: boost workers/results by playbook_memory similarity.
    #[serde(default)]
    pub boost_playbook_memory: bool,
    /// Phase 13: apply access-control masking on sensitive columns.
    /// Default on — safety-first.
    #[serde(default = "default_true")]
    pub enforce_sensitivity_gates: bool,
    pub created_at: chrono::DateTime<chrono::Utc>,
    #[serde(default)]
    pub created_by: String,
}

fn default_top_k() -> u32 { 10 }
fn default_rerank_top_k() -> u32 { 5 }
fn default_true() -> bool { true }

crates/vectord/src/activation.rs

@@ -0,0 +1,118 @@
//! Profile activation tracking (Phase 41 PRD).
//!
//! Phase 41 PRD called out `crates/vectord/src/activation.rs` with
//! `ActivationTracker` + background-job pattern. The activation
//! handler itself lives in `service.rs::activate_profile` (200+ lines
//! of warm-up + bucket binding that's wired to VectorState); this
//! module provides the type the PRD named and a single-flight guard
//! that satisfies the PRD gate "refuse new activation if one is
//! pending/running."
//!
//! Handler extraction (moving the body of `activate_profile` here)
//! is deliberately NOT in this commit — it's a module-structure
//! refactor, not a semantic change. When that lands, the inline
//! `tokio::spawn` in `service.rs` moves into `ActivationTracker::start`
//! and the HTTP handler shrinks to ~20 lines of validate + start +
//! respond-202.

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

/// Tracks in-flight profile activations. The PRD's "single-flight guard"
/// lives here: callers check `is_pending` before starting a new activation
/// and register via `mark_pending` if they proceed. On completion, they
/// call `mark_complete` so the next caller can start.
///
/// Per-profile granularity — activating profile A doesn't block B.
#[derive(Clone, Default)]
pub struct ActivationTracker {
    pending: Arc<RwLock<HashMap<String, String>>>, // profile_id → job_id
}

impl ActivationTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Check if a profile has an activation already running. Returns the
    /// in-flight job_id if so. Safe to call without holding a lock.
    pub async fn is_pending(&self, profile_id: &str) -> Option<String> {
        self.pending.read().await.get(profile_id).cloned()
    }

    /// Register a new activation as pending. Returns false if an
    /// activation is already running for the same profile (caller should
    /// return 409 Conflict or surface the existing job_id). Returns true
    /// on successful registration.
    pub async fn mark_pending(&self, profile_id: &str, job_id: &str) -> bool {
        let mut guard = self.pending.write().await;
        if guard.contains_key(profile_id) {
            return false;
        }
        guard.insert(profile_id.to_string(), job_id.to_string());
        true
    }

    /// Remove the pending marker when activation finishes (success OR
    /// failure — both free the slot for the next caller).
    pub async fn mark_complete(&self, profile_id: &str) {
        self.pending.write().await.remove(profile_id);
    }

    /// How many activations are currently in-flight across all profiles.
    pub async fn in_flight_count(&self) -> usize {
        self.pending.read().await.len()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn empty_tracker_has_no_pending() {
        let t = ActivationTracker::new();
        assert_eq!(t.in_flight_count().await, 0);
        assert!(t.is_pending("any-profile").await.is_none());
    }

    #[tokio::test]
    async fn mark_pending_registers_the_job() {
        let t = ActivationTracker::new();
        assert!(t.mark_pending("profile-A", "job-1").await);
        assert_eq!(t.in_flight_count().await, 1);
        assert_eq!(t.is_pending("profile-A").await, Some("job-1".into()));
    }

    #[tokio::test]
    async fn single_flight_guard_refuses_second_activation_same_profile() {
        // PRD Phase 41 gate: "refuse new activation if one is
        // pending/running." Same profile twice → second call returns
        // false, caller must surface the in-flight job_id.
        let t = ActivationTracker::new();
        assert!(t.mark_pending("profile-A", "job-1").await);
        assert!(!t.mark_pending("profile-A", "job-2").await);
        // Still the first job — second registration didn't overwrite.
        assert_eq!(t.is_pending("profile-A").await, Some("job-1".into()));
    }

    #[tokio::test]
    async fn different_profiles_dont_block_each_other() {
        // Per-profile granularity — activating A doesn't block B.
        let t = ActivationTracker::new();
        assert!(t.mark_pending("profile-A", "job-1").await);
        assert!(t.mark_pending("profile-B", "job-2").await);
        assert_eq!(t.in_flight_count().await, 2);
    }

    #[tokio::test]
    async fn mark_complete_frees_the_slot() {
        let t = ActivationTracker::new();
        t.mark_pending("profile-A", "job-1").await;
        t.mark_complete("profile-A").await;
        assert_eq!(t.in_flight_count().await, 0);
        // Next activation can now proceed.
        assert!(t.mark_pending("profile-A", "job-2").await);
    }
}

crates/vectord/src/lib.rs

@@ -7,6 +7,7 @@ pub mod harness;
pub mod hnsw;
pub mod index_registry;
pub mod jobs;
pub mod activation;
pub mod playbook_memory;
pub mod pathway_memory;
pub mod doc_drift;