commit a6f12e2609 Phase 21 Rust port + Phase 27 playbook versioning + doc-sync
Phase 21 — Rust port of scratchpad + tree-split primitives (companion to
the 2026-04-21 TS shipment). New crates/aibridge modules:

  context.rs       — estimate_tokens (chars/4 ceil), context_window_for,
                     assert_context_budget returning a BudgetCheck with
                     numeric diagnostics on both success and overflow.
                     Windows table mirrors config/models.json.
  continuation.rs  — generate_continuable<G: TextGenerator>. Handles the
                     two failure modes: empty-response from thinking
                     models (geometric 2x budget backoff up to budget_cap)
                     and truncated-non-empty (continuation with partial
                     as scratchpad). is_structurally_complete balances
                     braces, then verifies with a JSON parse. Guards the
                     degenerate case "all retries empty, don't loop on an
                     empty partial".
  tree_split.rs    — generate_tree_split map->reduce with running
                     scratchpad. Per-shard + reduce-prompt go through
                     assert_context_budget first; fails loudly rather than
                     silently truncating. Oldest-digest-first scratchpad
                     truncation at scratchpad_budget (default 6000 tokens).
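The token math and truncation rule above can be sketched as follows; the function names and loop shape are illustrative assumptions, not the shipped crates/aibridge code:

```rust
/// chars/4, rounded up: a 9-char string estimates to 3 tokens.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Oldest-digest-first truncation: drop from the front of the running
/// scratchpad until the estimate fits the budget (default 6000 tokens
/// in the shipment).
fn truncate_scratchpad(digests: &mut Vec<String>, budget_tokens: usize) {
    while digests.len() > 1
        && digests.iter().map(|d| estimate_tokens(d)).sum::<usize>() > budget_tokens
    {
        digests.remove(0); // oldest first
    }
}

fn main() {
    assert_eq!(estimate_tokens("123456789"), 3);
    let mut pad = vec!["a".repeat(4000), "b".repeat(4000), "c".repeat(4000)];
    truncate_scratchpad(&mut pad, 2000); // ~3000 tokens > 2000 budget
    assert_eq!(pad.len(), 2); // the oldest digest ("a"*4000) was dropped
    println!("ok");
}
```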

TextGenerator trait (native async-fn-in-trait, edition 2024). AiClient
implements it; ScriptedGenerator test double lets tests inject canned
sequences without a live Ollama.
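A minimal sketch of the trait plus the scripted test double, assuming a simplified signature (the real trait presumably takes a GenerateRequest). The poll helper is an illustration of why no runtime is needed for canned responses, not part of the shipment:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, Waker};

// Native async-fn-in-trait, stable since Rust 1.75; edition 2024
// toolchains (1.85+) have it.
trait TextGenerator {
    async fn generate(&self, prompt: &str) -> Result<String, String>;
}

/// Test double: replays canned responses, no live Ollama required.
struct ScriptedGenerator {
    script: Mutex<VecDeque<String>>,
}

impl TextGenerator for ScriptedGenerator {
    async fn generate(&self, _prompt: &str) -> Result<String, String> {
        self.script
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "script exhausted".to_string())
    }
}

/// The scripted future never awaits, so one poll with a no-op waker
/// resolves it; tests need no async runtime.
fn poll_ready<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(Waker::noop())) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("scripted generator never pends"),
    }
}

fn main() {
    let g = ScriptedGenerator {
        script: Mutex::new(VecDeque::from(["".to_string(), "{\"ok\":true}".to_string()])),
    };
    // First call simulates the thinking-model empty-response failure mode;
    // the retry yields the real payload.
    assert_eq!(poll_ready(g.generate("p")).unwrap(), "");
    assert_eq!(poll_ready(g.generate("p")).unwrap(), "{\"ok\":true}");
}
```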

GenerateRequest gained think: Option<bool> — forwards to sidecar for
per-call hidden-reasoning opt-out on hot-path JSON emitters. Three
existing callsites updated (rag.rs x2, service.rs hybrid answer).

Phase 27 — Playbook versioning. PlaybookEntry gained four optional
fields (all #[serde(default)] so pre-Phase-27 state loads as roots):

  version           u32, default 1
  parent_id         Option<String>, previous version's playbook_id
  superseded_at     Option<String>, set when a newer version replaces it
  superseded_by     Option<String>, the playbook_id that replaced it

New methods:

  revise_entry(parent_id, new_entry) — appends new version, stamps
    superseded_at+superseded_by on parent, inherits parent_id and sets
    version = parent + 1 on the new entry. Rejects revising a retired
    or already-superseded parent (tip-of-chain is the only valid
    revise target).
  history(playbook_id) — returns full chain root->tip from any node.
    Walks parent_id back to root, then superseded_by forward to tip.
    Cycle-safe.
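The two-direction walk can be sketched over a map of simplified entries (the shape is an assumption; the real method works on PlaybookEntry); a `seen` set guards both directions against cycles:

```rust
use std::collections::{HashMap, HashSet};

struct Entry {
    parent_id: Option<String>,
    superseded_by: Option<String>,
}

/// From any node: follow parent_id back to the root, then superseded_by
/// forward to the tip, returning root -> tip order.
fn history(entries: &HashMap<String, Entry>, id: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    // Walk backward to the root.
    let mut root = id.to_string();
    while let Some(p) = entries.get(&root).and_then(|e| e.parent_id.clone()) {
        if !seen.insert(p.clone()) { break; } // cycle guard
        root = p;
    }
    // Walk forward to the tip, collecting the chain.
    let mut chain = vec![root.clone()];
    let mut cur = root;
    seen.clear();
    seen.insert(cur.clone());
    while let Some(n) = entries.get(&cur).and_then(|e| e.superseded_by.clone()) {
        if !seen.insert(n.clone()) { break; } // cycle guard
        chain.push(n.clone());
        cur = n;
    }
    chain
}

fn main() {
    let mut entries = HashMap::new();
    entries.insert("v1".to_string(), Entry { parent_id: None, superseded_by: Some("v2".into()) });
    entries.insert("v2".to_string(), Entry { parent_id: Some("v1".into()), superseded_by: Some("v3".into()) });
    entries.insert("v3".to_string(), Entry { parent_id: Some("v2".into()), superseded_by: None });
    // Same chain regardless of which node you start from.
    assert_eq!(history(&entries, "v2"), vec!["v1", "v2", "v3"]);
    assert_eq!(history(&entries, "v1"), history(&entries, "v3"));
}
```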

Superseded entries excluded from boost (same rule as retired): filter
in compute_boost_for_filtered_with_role (both active-entries prefilter
and geo-filtered path), rebuild_geo_index, and upsert_entry's existing-
idx search. status_counts returns (total, retired, superseded, failures);
/status JSON reports active = total - retired - superseded.

Endpoints:
  POST /vectors/playbook_memory/revise
  GET  /vectors/playbook_memory/history/{id}

Doc-sync — PHASES.md + PRD.md had drifted out of sync with git after
Phases 24-26 shipped. Fixes applied:

  - Phase 24 marked shipped (commit b95dd86) with detail of observer
    HTTP ingest + scenario outcome streaming. PRD "NOT YET WIRED"
    rewritten to reflect shipped state.
  - Phase 25 (validity windows, commit e0a843d) added to PHASES +
    PRD.
  - Phase 26 (Mem0 upsert + Letta hot cache, commit 640db8c) added.
  - Phase 27 entry added to both docs.
  - Phase 19.6 time decay corrected: was documented as "deferred",
    actually wired via BOOST_HALF_LIFE_DAYS = 30.0 in playbook_memory.rs.
  - Phase E/Phase 8 tombstone-at-compaction limit note updated —
    Phase E.2 closed it.

Tests: 8 new version_tests in vectord (chain-metadata stamping,
retired/superseded parent rejection, boost exclusion, history from
root/tip/middle, legacy default round-trip, status counts). 25 new
aibridge tests (context/continuation/tree_split). Workspace total
145 green (was 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:40:49 -05:00


use axum::{
    Json, Router,
    extract::{Path, Query, State},
    http::StatusCode,
    response::IntoResponse,
    routing::{get, post},
};
use object_store::ObjectStore;
use serde::{Deserialize, Serialize};
use std::sync::Arc;

use aibridge::client::{AiClient, EmbedRequest, GenerateRequest};
use catalogd::registry::Registry as CatalogRegistry;
use storaged::registry::BucketRegistry;

use crate::{
    agent, autotune, chunker, embedding_cache, harness, hnsw, index_registry, jobs,
    lance_backend, playbook_memory, promotion, rag, refresh, search, store, supervisor, trial,
};
#[derive(Clone)]
pub struct VectorState {
    pub store: Arc<dyn ObjectStore>,
    pub ai_client: AiClient,
    pub job_tracker: jobs::JobTracker,
    pub index_registry: index_registry::IndexRegistry,
    pub hnsw_store: hnsw::HnswStore,
    pub embedding_cache: embedding_cache::EmbeddingCache,
    pub trial_journal: trial::TrialJournal,
    /// Federation-aware harness store — resolves eval artifacts to each
    /// index's recorded bucket, falling back to primary for legacy evals.
    pub harness_store: harness::HarnessStore,
    /// Catalog registry — needed by the Phase C refresh path to mark/clear
    /// staleness and look up dataset manifests.
    pub catalog: CatalogRegistry,
    /// Phase 16: promoted HNSW configs. Activation + autotune read/write here.
    pub promotion_registry: promotion::PromotionRegistry,
    /// Phase 16.2: handle to the background autotune agent. Always
    /// present — if the agent is disabled in config, the handle drops
    /// incoming triggers silently.
    pub agent_handle: agent::AgentHandle,
    /// Phase B (federation layer 2): bucket registry for per-profile
    /// bucket auto-provisioning on activation.
    pub bucket_registry: Arc<BucketRegistry>,
    /// Phase C (two-profile VRAM gate): tracks which profile is currently
    /// "active" on the GPU. Singleton — one profile at a time holds its
    /// model in VRAM. Swapping profiles with different ollama_name unloads
    /// the previous one (keep_alive=0) before preloading the new one.
    ///
    /// `None` = no profile has been activated this session; any first
    /// activation just preloads and takes the slot.
    pub active_profile: Arc<tokio::sync::RwLock<Option<ActiveProfileSlot>>>,
    /// ADR-019 hybrid: handles to Lance datasets keyed by index name.
    /// Lazy-created on first /vectors/lance/* call.
    pub lance: lance_backend::LanceRegistry,
    /// Phase 19 — meta-index feedback. Embeds past successful_playbooks
    /// and, when `use_playbook_memory` is set on /vectors/hybrid, boosts
    /// workers that were actually filled in semantically-similar past ops.
    pub playbook_memory: playbook_memory::PlaybookMemory,
}
/// What the active-profile singleton records. Narrow — we don't need the
/// full ModelProfile here, just enough to know what to unload on swap.
#[derive(Debug, Clone, Serialize)]
pub struct ActiveProfileSlot {
    pub profile_id: String,
    pub ollama_name: String,
    pub activated_at: chrono::DateTime<chrono::Utc>,
}
pub fn router(state: VectorState) -> Router {
    Router::new()
        .route("/health", get(health))
        .route("/index", post(create_index))
        .route("/indexes", get(list_indexes))
        .route("/indexes/{name}", get(get_index_meta))
        .route("/indexes/{name}/bucket", axum::routing::patch(migrate_index_bucket))
        .route("/jobs", get(list_jobs))
        .route("/jobs/{id}", get(get_job))
        .route("/search", post(search_index))
        .route("/rag", post(rag_query))
        .route("/hybrid", post(hybrid_search))
        .route("/hnsw/build", post(build_hnsw))
        .route("/hnsw/search", post(search_hnsw))
        .route("/hnsw/list", get(list_hnsw))
        // Trial system — parameterized tuning loop
        .route("/hnsw/trial", post(run_trial))
        .route("/hnsw/trials/{index_name}", get(list_trials))
        .route("/hnsw/trials/{index_name}/best", get(best_trial))
        // Eval sets
        .route("/hnsw/evals", get(list_evals))
        .route("/hnsw/evals/{name}", get(get_eval).put(put_eval))
        .route("/hnsw/evals/{name}/autogen", post(autogen_eval))
        // Cache management
        .route("/hnsw/cache/stats", get(cache_stats))
        .route("/hnsw/cache/{index_name}", axum::routing::delete(cache_evict))
        // Phase C: embedding refresh
        .route("/refresh/{dataset_name}", post(refresh_dataset))
        .route("/stale", get(list_stale))
        // Phase 17: profile activation — pre-load caches + HNSW for this
        // model's bound data. First search after activate is warm.
        .route("/profile/{id}/activate", post(activate_profile))
        .route("/profile/{id}/deactivate", post(deactivate_profile))
        .route("/profile/{id}/search", post(profile_scoped_search))
        // Phase 17 VRAM gate: which profile currently owns the GPU?
        .route("/profile/active", get(get_active_profile))
        // Phase 16: promotion + autotune
        .route("/hnsw/promote/{index}/{trial_id}", post(promote_trial))
        .route("/hnsw/rollback/{index}", post(rollback_promotion))
        .route("/hnsw/promoted/{index}", get(get_promoted))
        .route("/hnsw/autotune", post(run_autotune_endpoint))
        // Phase 16.2: background autotune agent
        .route("/agent/status", get(agent_status))
        .route("/agent/stop", post(agent_stop))
        .route("/agent/enqueue/{index_name}", post(agent_enqueue))
        // ADR-019: Lance hybrid backend
        .route("/lance/migrate/{index_name}", post(lance_migrate))
        .route("/lance/index/{index_name}", post(lance_build_index))
        .route("/lance/search/{index_name}", post(lance_search))
        .route("/lance/doc/{index_name}/{doc_id}", get(lance_get_doc))
        .route("/lance/append/{index_name}", post(lance_append))
        .route("/lance/stats/{index_name}", get(lance_stats))
        .route("/lance/scalar-index/{index_name}/{column}", post(lance_build_scalar_index))
        .route("/lance/recall/{index_name}", post(lance_recall_harness))
        // Phase 19: playbook memory — the meta-index feedback loop
        .route("/playbook_memory/rebuild", post(rebuild_playbook_memory))
        .route("/playbook_memory/stats", get(playbook_memory_stats))
        .route("/playbook_memory/seed", post(seed_playbook_memory))
        .route("/playbook_memory/persist_sql", post(persist_playbook_memory_sql))
        .route("/playbook_memory/patterns", post(discover_playbook_patterns))
        .route("/playbook_memory/mark_failed", post(mark_playbook_failed))
        .route("/playbook_memory/retire", post(retire_playbook_memory))
        .route("/playbook_memory/revise", post(revise_playbook_memory))
        .route("/playbook_memory/history/{id}", get(playbook_memory_history))
        .route("/playbook_memory/status", get(playbook_memory_status))
        .with_state(state)
}
async fn health() -> &'static str {
    "vectord ok"
}
// --- Background Index Creation ---

#[derive(Deserialize)]
struct CreateIndexRequest {
    index_name: String,
    source: String,
    documents: Vec<DocInput>,
    chunk_size: Option<usize>,
    overlap: Option<usize>,
    /// Federation layer 2: optional bucket to hold this index's trial
    /// journal + promotion file. Defaults to "primary" — pre-existing
    /// clients that don't know about federation keep working unchanged.
    #[serde(default)]
    bucket: Option<String>,
}

#[derive(Deserialize)]
struct DocInput {
    id: String,
    text: String,
}

#[derive(Serialize)]
struct CreateIndexResponse {
    job_id: String,
    index_name: String,
    documents: usize,
    chunks: usize,
    message: String,
}
async fn create_index(
    State(state): State<VectorState>,
    Json(req): Json<CreateIndexRequest>,
) -> impl IntoResponse {
    let chunk_size = req.chunk_size.unwrap_or(500);
    let overlap = req.overlap.unwrap_or(50);
    // Chunk synchronously (fast)
    let doc_ids: Vec<String> = req.documents.iter().map(|d| d.id.clone()).collect();
    let texts: Vec<String> = req.documents.iter().map(|d| d.text.clone()).collect();
    let chunks = chunker::chunk_column(&req.source, &doc_ids, &texts, chunk_size, overlap);
    if chunks.is_empty() {
        return Err((StatusCode::BAD_REQUEST, "no text to index".to_string()));
    }
    let n_docs = req.documents.len();
    let n_chunks = chunks.len();
    let index_name = req.index_name.clone();
    let bucket = req.bucket.clone().unwrap_or_else(|| "primary".to_string());
    // Create job and return immediately
    let job_id = state.job_tracker.create(&index_name, n_chunks).await;
    tracing::info!("job {job_id}: indexing '{}' — {} docs → {} chunks (background)", index_name, n_docs, n_chunks);
    // Spawn supervised dual-pipeline embedding
    let tracker = state.job_tracker.clone();
    let ai_client = state.ai_client.clone();
    let obj_store = state.store.clone();
    let registry = state.index_registry.clone();
    let jid = job_id.clone();
    let source_name = req.source.clone();
    let idx_name = req.index_name.clone();
    tokio::spawn(async move {
        let start_time = std::time::Instant::now();
        let config = supervisor::SupervisorConfig::default();
        let result = supervisor::run_supervised(
            &jid, &idx_name, chunks, &ai_client, &obj_store, &tracker, config,
        ).await;
        match result {
            Ok(key) => {
                let elapsed = start_time.elapsed().as_secs_f32();
                let rate = if elapsed > 0.0 { n_chunks as f32 / elapsed } else { 0.0 };
                // Register index metadata with model version info
                let meta = index_registry::IndexMeta {
                    index_name: idx_name.clone(),
                    source: source_name,
                    model_name: "nomic-embed-text".to_string(), // from sidecar config
                    model_version: "latest".to_string(),
                    dimensions: 768,
                    chunk_count: n_chunks,
                    doc_count: n_docs,
                    chunk_size,
                    overlap,
                    storage_key: key.clone(),
                    created_at: chrono::Utc::now(),
                    build_time_secs: elapsed,
                    chunks_per_sec: rate,
                    bucket: bucket.clone(),
                    vector_backend: shared::types::VectorBackend::Parquet,
                    id_prefix: None,
                };
                let _ = registry.register(meta).await;
                tracker.complete(&jid, key).await;
                tracing::info!("job {jid}: completed — {n_chunks} chunks in {elapsed:.0}s ({rate:.0}/sec)");
            }
            Err(e) => {
                tracker.fail(&jid, e.clone()).await;
                tracing::error!("job {jid}: failed — {e}");
            }
        }
    });
    Ok((StatusCode::ACCEPTED, Json(CreateIndexResponse {
        job_id,
        index_name: req.index_name,
        documents: n_docs,
        chunks: n_chunks,
        message: format!("embedding {} chunks in background — poll /vectors/jobs/{{id}} for progress", n_chunks),
    })))
}
// --- Index Registry ---

#[derive(Deserialize)]
struct IndexListQuery {
    source: Option<String>,
    model: Option<String>,
}

async fn list_indexes(
    State(state): State<VectorState>,
    Query(q): Query<IndexListQuery>,
) -> impl IntoResponse {
    let indexes = state.index_registry.list(q.source.as_deref(), q.model.as_deref()).await;
    Json(indexes)
}

async fn get_index_meta(
    State(state): State<VectorState>,
    Path(name): Path<String>,
) -> impl IntoResponse {
    match state.index_registry.get(&name).await {
        Some(meta) => Ok(Json(meta)),
        None => Err((StatusCode::NOT_FOUND, format!("index not found: {name}"))),
    }
}
#[derive(Deserialize)]
struct MigrateBucketRequest {
dest_bucket: String,
/// If true, delete artifacts from the source bucket after the pointer
/// flip. Default false — keeping source copies means a failed migration
/// is recoverable by editing IndexMeta.bucket back, and a successful
/// migration leaves inspectable forensics until an operator sweeps.
#[serde(default)]
delete_source: bool,
}
#[derive(Serialize)]
struct MigrateBucketReport {
index_name: String,
source_bucket: String,
dest_bucket: String,
/// Artifact keys that were copied (or attempted). Order follows copy order.
copied: Vec<String>,
/// Artifact prefixes that had nothing to copy (optional files missing,
/// trial journal empty, etc).
skipped: Vec<String>,
/// Subset of `copied` that was subsequently deleted from the source.
deleted_source: Vec<String>,
duration_secs: f32,
}
/// Move an index's artifacts from its current bucket to `dest_bucket`.
/// Parquet-backed indexes only — Lance migration needs URI rewriting that
/// isn't in scope for this endpoint. Copies the vector data, trial journal,
/// promotion file, and auto-generated harness; updates `IndexMeta.bucket`
/// last so a mid-flight failure leaves the index still usable at its
/// original location. Evicts the `EmbeddingCache` entry so the next load
/// re-reads from the new bucket.
async fn migrate_index_bucket(
State(state): State<VectorState>,
Path(name): Path<String>,
Json(req): Json<MigrateBucketRequest>,
) -> Result<Json<MigrateBucketReport>, (StatusCode, String)> {
let t0 = std::time::Instant::now();
let mut meta = state
.index_registry
.get(&name)
.await
.ok_or_else(|| (StatusCode::NOT_FOUND, format!("index '{name}' not found")))?;
if meta.vector_backend == shared::types::VectorBackend::Lance {
return Err((
StatusCode::BAD_REQUEST,
"Lance-backed indexes cannot be migrated via this endpoint — \
Lance URIs are bucket-specific; a separate migrate_lance tool \
is needed".into(),
));
}
if !state.bucket_registry.contains(&req.dest_bucket) {
return Err((
StatusCode::BAD_REQUEST,
format!("dest bucket '{}' not registered", req.dest_bucket),
));
}
let source_bucket = meta.bucket.clone();
if source_bucket == req.dest_bucket {
return Err((
StatusCode::BAD_REQUEST,
format!("source and dest are both '{source_bucket}' — nothing to migrate"),
));
}
let src = state
.bucket_registry
.get(&source_bucket)
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
let dst = state
.bucket_registry
.get(&req.dest_bucket)
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
let mut copied: Vec<String> = Vec::new();
let mut skipped: Vec<String> = Vec::new();
// 1. Vector data (single parquet file for this backend).
copy_key(&src, &dst, &meta.storage_key)
.await
.map_err(|e| {
(StatusCode::INTERNAL_SERVER_ERROR,
format!("copy {}: {e}", meta.storage_key))
})?;
copied.push(meta.storage_key.clone());
// 2. Trial journal batches — per-index directory of JSONL files.
let trial_prefix = format!("_hnsw_trials/{name}/");
let trial_keys = storaged::ops::list(&src, Some(&trial_prefix))
.await
.unwrap_or_default();
if trial_keys.is_empty() {
skipped.push(trial_prefix);
}
for k in &trial_keys {
copy_key(&src, &dst, k)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("copy {k}: {e}")))?;
copied.push(k.clone());
}
// 3. Promotion file (optional — absent for never-promoted indexes).
let promo_key = format!("_hnsw_promotions/{name}.json");
match copy_key(&src, &dst, &promo_key).await {
Ok(()) => copied.push(promo_key),
Err(_) => skipped.push(promo_key),
}
// 4. Auto-generated harness (optional — absent if agent never ran).
let harness_key = format!("_hnsw_evals/{name}_auto.json");
match copy_key(&src, &dst, &harness_key).await {
Ok(()) => copied.push(harness_key),
Err(_) => skipped.push(harness_key),
}
// 5. Pointer flip — IndexMeta.bucket now points at destination. This
// is the commit point; earlier failures leave copies in dest but the
// index still usable at source.
meta.bucket = req.dest_bucket.clone();
state
.index_registry
.register(meta)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("update meta: {e}")))?;
// 6. Cache eviction — next load reads the new bucket's parquet.
state.embedding_cache.evict(&name).await;
// 7. Optional source cleanup.
let mut deleted_source: Vec<String> = Vec::new();
if req.delete_source {
for k in &copied {
if storaged::ops::delete(&src, k).await.is_ok() {
deleted_source.push(k.clone());
}
}
}
Ok(Json(MigrateBucketReport {
index_name: name,
source_bucket,
dest_bucket: req.dest_bucket,
copied,
skipped,
deleted_source,
duration_secs: t0.elapsed().as_secs_f32(),
}))
}
/// Stream a single object from one bucket to another. Uses the existing
/// `storaged::ops` get + put primitives — no native copy in object_store
/// across heterogeneous backends (local ↔ S3), so an in-memory hop is
/// unavoidable. Bounded by individual object size, which for our parquet
/// + jsonl artifacts tops out around a few hundred MB.
async fn copy_key(
    src: &Arc<dyn ObjectStore>,
    dst: &Arc<dyn ObjectStore>,
    key: &str,
) -> Result<(), String> {
    let data = storaged::ops::get(src, key).await?;
    storaged::ops::put(dst, key, data).await
}
// --- unused legacy function below, kept for reference ---
#[allow(dead_code)]
/// Legacy single-pipeline embedding (replaced by supervisor).
async fn _run_embedding_job_legacy(
job_id: &str,
index_name: &str,
chunks: &[chunker::TextChunk],
ai_client: &AiClient,
store: &Arc<dyn ObjectStore>,
tracker: &jobs::JobTracker,
) -> Result<String, String> {
let batch_size = 32;
let mut all_vectors: Vec<Vec<f64>> = Vec::new();
let start = std::time::Instant::now();
for (i, batch) in chunks.chunks(batch_size).enumerate() {
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
let embed_resp = ai_client.embed(EmbedRequest {
texts,
model: None,
}).await.map_err(|e| format!("embed batch {} error: {e}", i))?;
all_vectors.extend(embed_resp.embeddings);
// Update progress
let elapsed = start.elapsed().as_secs_f32();
let rate = if elapsed > 0.0 { all_vectors.len() as f32 / elapsed } else { 0.0 };
tracker.update_progress(job_id, all_vectors.len(), rate).await;
// Log every 100 batches
if (i + 1) % 100 == 0 {
let pct = (all_vectors.len() as f32 / chunks.len() as f32) * 100.0;
let eta = if rate > 0.0 { (chunks.len() - all_vectors.len()) as f32 / rate } else { 0.0 };
tracing::info!("job {job_id}: {}/{} chunks ({pct:.0}%), {rate:.0}/sec, ETA {eta:.0}s",
all_vectors.len(), chunks.len());
}
}
// Store
let key = store::store_embeddings(store, index_name, chunks, &all_vectors).await?;
Ok(key)
}
// --- Job Status ---
async fn list_jobs(State(state): State<VectorState>) -> impl IntoResponse {
let jobs = state.job_tracker.list().await;
Json(jobs)
}
async fn get_job(
State(state): State<VectorState>,
Path(id): Path<String>,
) -> impl IntoResponse {
match state.job_tracker.get(&id).await {
Some(job) => Ok(Json(job)),
None => Err((StatusCode::NOT_FOUND, format!("job not found: {id}"))),
}
}
// --- Search ---
#[derive(Deserialize)]
struct SearchRequest {
index_name: String,
query: String,
top_k: Option<usize>,
}
#[derive(Serialize)]
struct SearchResponse {
results: Vec<search::SearchResult>,
query: String,
}
async fn search_index(
State(state): State<VectorState>,
Json(req): Json<SearchRequest>,
) -> impl IntoResponse {
let top_k = req.top_k.unwrap_or(5);
let embed_resp = state.ai_client.embed(EmbedRequest {
texts: vec![req.query.clone()],
model: None,
}).await.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed error: {e}")))?;
if embed_resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "no embedding returned".to_string()));
}
let query_vec: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
let embeddings = store::load_embeddings(&state.store, &req.index_name)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("index not found: {e}")))?;
let results = search::search(&query_vec, &embeddings, top_k);
Ok(Json(SearchResponse {
results,
query: req.query,
}))
}
// --- RAG ---
#[derive(Deserialize)]
struct RagRequest {
index_name: String,
question: String,
top_k: Option<usize>,
}
async fn rag_query(
State(state): State<VectorState>,
Json(req): Json<RagRequest>,
) -> impl IntoResponse {
let top_k = req.top_k.unwrap_or(5);
match rag::query(&req.question, &req.index_name, top_k, &state.store, &state.ai_client).await {
Ok(resp) => Ok(Json(resp)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
// --- Hybrid SQL+Vector Search ---
//
// The fix for the core RAG gap: vector search alone can't do structured
// filtering (state, role, reliability threshold). SQL alone can't do
// semantic similarity ("who could handle this kind of work"). Hybrid
// does both: SQL narrows to structurally-valid candidates, vector
// ranks them by semantic relevance, LLM generates from verified context.
#[derive(Deserialize)]
struct HybridRequest {
/// Natural language question — used for embedding + LLM generation.
question: String,
/// Vector index to search against.
index_name: String,
/// SQL WHERE clause to pre-filter. Applied against the index's source
/// dataset. Example: "state = 'IL' AND reliability > 0.8"
/// Safety: runs through DataFusion's parser so injection is bounded
/// by what DataFusion accepts (no DDL, no writes).
#[serde(default)]
sql_filter: Option<String>,
/// Dataset to run the SQL filter against. Defaults to the index's
/// source if omitted.
#[serde(default)]
filter_dataset: Option<String>,
/// Column in the SQL result that maps to the vector index's doc_id.
/// Default: "worker_id" (for the Ethereal dataset) or "candidate_id".
#[serde(default)]
id_column: Option<String>,
#[serde(default = "default_top_k")]
top_k: usize,
/// If true, generate an LLM answer from the matched context.
/// If false, just return the ranked matches (faster, no Ollama gen).
#[serde(default = "default_true")]
generate: bool,
/// Phase 19: consult `playbook_memory` and boost workers that past
/// similar playbooks successfully filled. Off by default so current
/// callers keep deterministic ranking; opt-in unlocks the feedback.
#[serde(default)]
use_playbook_memory: bool,
/// Number of past playbooks to consider when `use_playbook_memory`
/// is on. Ignored otherwise. Defaults to 5.
#[serde(default)]
playbook_memory_k: Option<usize>,
}
fn default_true() -> bool { true }
#[derive(serde::Serialize)]
struct HybridResponse {
question: String,
sql_filter: Option<String>,
sql_matches: usize,
vector_reranked: usize,
method: String,
answer: Option<String>,
sources: Vec<HybridSource>,
duration_ms: u64,
}
#[derive(serde::Serialize)]
struct HybridSource {
doc_id: String,
chunk_text: String,
score: f32,
sql_verified: bool,
/// Phase 19: how much the playbook_memory boost lifted this hit's
/// score. 0.0 when `use_playbook_memory=false` or no past playbook
/// endorsed this worker.
#[serde(default, skip_serializing_if = "is_zero")]
playbook_boost: f32,
/// playbook_ids whose endorsement contributed to `playbook_boost`.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
playbook_citations: Vec<String>,
}
fn is_zero(x: &f32) -> bool { x.abs() < 1e-6 }
async fn hybrid_search(
    State(state): State<VectorState>,
    Json(req): Json<HybridRequest>,
) -> impl IntoResponse {
    let t0 = std::time::Instant::now();
    // Step 1: If SQL filter provided, run it to get the set of valid IDs.
    let valid_ids: Option<std::collections::HashSet<String>> = if let Some(ref filter) = req.sql_filter {
        let index_meta = state.index_registry.get(&req.index_name).await;
        let dataset = req.filter_dataset.clone()
            .or_else(|| index_meta.map(|m| m.source.clone()))
            .unwrap_or_else(|| req.index_name.clone());
        let id_col = req.id_column.clone().unwrap_or_else(|| "worker_id".into());
        let sql = format!("SELECT CAST({id_col} AS VARCHAR) AS id FROM {dataset} WHERE {filter}");
        tracing::info!("hybrid: SQL filter → {sql}");
        // Use queryd through the catalog — same engine as /query/sql.
        // Reading rows via the engine avoids Arrow type wrangling across
        // DataFusion's Utf8View/StringViewArray variants.
        let engine = queryd::context::QueryEngine::new(
            state.catalog.clone(),
            state.bucket_registry.clone(),
            queryd::cache::MemCache::new(0),
        );
        match engine.query(&sql).await {
            Ok(batches) => {
                use arrow::array::{Array, AsArray};
                let mut ids = std::collections::HashSet::new();
                for batch in &batches {
                    if let Some(col) = batch.column_by_name("id") {
                        // DataFusion CAST(x AS VARCHAR) → StringViewArray.
                        // Try StringView first, then String, then Int.
                        if let Some(arr) = col.as_string_view_opt() {
                            for i in 0..arr.len() {
                                if !arr.is_null(i) { ids.insert(arr.value(i).to_string()); }
                            }
                        } else if let Some(arr) = col.as_string_opt::<i32>() {
                            for i in 0..arr.len() {
                                if !arr.is_null(i) { ids.insert(arr.value(i).to_string()); }
                            }
                        } else {
                            // Fallback: try as Int32/Int64 (if CAST didn't happen)
                            if let Some(arr) = col.as_any().downcast_ref::<arrow::array::Int32Array>() {
                                for i in 0..arr.len() {
                                    if !arr.is_null(i) { ids.insert(arr.value(i).to_string()); }
                                }
                            } else if let Some(arr) = col.as_any().downcast_ref::<arrow::array::Int64Array>() {
                                for i in 0..arr.len() {
                                    if !arr.is_null(i) { ids.insert(arr.value(i).to_string()); }
                                }
                            }
                        }
                    }
                }
                tracing::info!("hybrid: SQL filter returned {} IDs", ids.len());
                if ids.is_empty() { None } else { Some(ids) }
            }
            Err(e) => {
                return Err((StatusCode::BAD_REQUEST, format!("SQL filter error: {e}")));
            }
        }
    } else {
        None
    };
// Step 2: Vector search — embed question, search index.
let embed_resp = state.ai_client
.embed(EmbedRequest { texts: vec![req.question.clone()], model: None })
.await
.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed: {e}")))?;
if embed_resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "no embedding".into()));
}
let qv: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
// When SQL-filtered: use brute-force cosine over all embeddings,
// then filter by SQL IDs, then take top_k. HNSW's ef_search caps
// results at ~30, which is too few to reliably intersect with
// narrow SQL filters. Brute-force on 10K vectors is ~50ms — fast
// enough for the hybrid path. Without SQL filter, use HNSW normally.
let all_results = if valid_ids.is_some() {
// Brute-force path: score ALL vectors, filter by SQL IDs later.
let embeddings = store::load_embeddings(&state.store, &req.index_name).await
.map_err(|e| (StatusCode::NOT_FOUND, format!("load embeddings: {e}")))?;
search::search(&qv, &embeddings, embeddings.len()) // score everything
} else if state.hnsw_store.has_index(&req.index_name).await {
state.hnsw_store.search(&req.index_name, &qv, req.top_k).await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?
.into_iter()
.map(|h| search::SearchResult {
doc_id: h.doc_id,
chunk_text: h.chunk_text,
score: h.score,
source: h.source,
chunk_idx: h.chunk_idx as u32,
})
.collect::<Vec<_>>()
} else {
let embeddings = store::load_embeddings(&state.store, &req.index_name).await
.map_err(|e| (StatusCode::NOT_FOUND, format!("load embeddings: {e}")))?;
search::search(&qv, &embeddings, req.top_k)
};
// Step 3: Filter vector results to only SQL-verified IDs.
// ADR-020: read the index's id_prefix from the catalog instead of
// hardcoding prefix stripping. Falls back to heuristic for legacy indexes.
let id_prefix: Option<String> = state.index_registry
.get(&req.index_name).await
.and_then(|m| m.id_prefix.clone());
let sql_count = valid_ids.as_ref().map(|s| s.len()).unwrap_or(0);
// Phase 19: when playbook_memory is consulted, pull a wider candidate
// pool so endorsed workers outside the vanilla top-K can still be
// boosted into visibility. 5× is a conservative multiplier — plenty
// for a +0.25 boost to flip rankings without dragging the cost up.
let fetch_k = if req.use_playbook_memory { req.top_k * 5 } else { req.top_k };
let filtered: Vec<search::SearchResult> = if let Some(ref ids) = valid_ids {
all_results.into_iter()
.filter(|r| {
let raw_id = if let Some(ref prefix) = id_prefix {
r.doc_id.strip_prefix(prefix.as_str()).unwrap_or(&r.doc_id)
} else {
// Legacy: heuristic strip for pre-ADR-020 indexes
r.doc_id.strip_prefix("W500K-")
.or_else(|| r.doc_id.strip_prefix("W500-"))
.or_else(|| r.doc_id.strip_prefix("W5K-"))
.or_else(|| r.doc_id.strip_prefix("W-"))
.or_else(|| r.doc_id.strip_prefix("CAND-"))
.unwrap_or(&r.doc_id)
};
ids.contains(raw_id)
})
.take(fetch_k)
.collect()
} else {
all_results.into_iter().take(fetch_k).collect()
};
// Step 4: Build sources with SQL-verified flag.
let mut sources: Vec<HybridSource> = filtered.iter().map(|r| HybridSource {
doc_id: r.doc_id.clone(),
chunk_text: r.chunk_text.clone(),
score: r.score,
sql_verified: valid_ids.is_some(),
playbook_boost: 0.0,
playbook_citations: Vec::new(),
}).collect();
// Step 4b (Phase 19): if use_playbook_memory, look up semantically
// similar past playbooks and boost workers they endorsed. Name-match
// is on the tuple (city, state, name) extracted from chunk_text —
// hybrid_search's SQL filter already narrowed to one city+state, so
// this just needs to check the name against each playbook's endorsed
// set. Additive boost on the existing vector score, then re-sort.
if req.use_playbook_memory {
let boost_k = req.playbook_memory_k.unwrap_or(playbook_memory::DEFAULT_TOP_K_PLAYBOOKS);
// Extract target (city, state, role) from the SQL filter so
// compute_boost_for can skip playbooks from other cities AND
// prioritize exact role matches via the multi-strategy path.
// The executor's filter shape is stable:
// `... role = 'Welder' AND city = 'Toledo' AND state = 'OH' ...`.
// Case-insensitive match, tolerant of single quotes.
let target_geo = req.sql_filter.as_deref().and_then(extract_target_geo);
let target_role = req.sql_filter.as_deref().and_then(extract_target_role);
// We embedded the question as `qv` above — reuse it for the
// playbook similarity lookup so we don't double-pay Ollama.
let boosts = state.playbook_memory
.compute_boost_for_filtered_with_role(
&qv,
boost_k,
0.5,
target_geo.as_ref().map(|(c, s)| (c.as_str(), s.as_str())),
target_role.as_deref(),
)
.await;
// Diagnostics for Phase 19 boost pipeline. Logged so item 3
// investigation has ground truth:
// - boosts.len(): how many (city,state,name) keys surfaced for
// this query (0 = playbook_memory found nothing semantically
// similar to the question).
// - parsed: how many candidate chunks parsed cleanly into
// (name,city,state) via parse_worker_chunk.
// - matched: how many parsed keys matched an entry in boosts.
// 2026-04-21 — 20-scenario batch showed 34/40 ok combos never
// got a citation. These counters pin whether the gap is on the
// SIMILARITY side (boosts empty) or the MATCH side (parsed vs
// boosted keys mismatch — e.g. name format drift).
let mut parsed_count = 0usize;
let mut matched_count = 0usize;
for src in sources.iter_mut() {
// Parse "{Name} — {Role} in {City}, {State}. …" chunk. Being
// defensive: chunks from other datasets may not follow this
// exact shape, so absent fields just skip the boost.
if let Some((name, city, state)) = parse_worker_chunk(&src.chunk_text) {
parsed_count += 1;
let key = (city, state, name);
if let Some(entry) = boosts.get(&key) {
src.score += entry.boost;
src.playbook_boost = entry.boost;
src.playbook_citations = entry.citations.clone();
matched_count += 1;
}
}
}
tracing::info!(
"playbook_boost: boosts={} sources={} parsed={} matched={} target_geo={:?} target_role={:?} (query='{}')",
boosts.len(),
sources.len(),
parsed_count,
matched_count,
target_geo,
target_role,
req.question.chars().take(60).collect::<String>(),
);
// Re-rank: boosted scores can flip ordering.
sources.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
// Finally trim to the caller's requested top_k — we pulled fetch_k
// (5× wider) above specifically so the boost could reach workers
// that would otherwise have been trimmed pre-boost.
sources.truncate(req.top_k);
}
// Step 5: Generate answer if requested.
let answer = if req.generate && !sources.is_empty() {
let context: String = sources.iter().enumerate().map(|(i, s)| {
format!("[{}] (id: {}, verified: {}) {}", i + 1, s.doc_id, s.sql_verified, s.chunk_text)
}).collect::<Vec<_>>().join("\n\n");
let gen_resp = state.ai_client.generate(GenerateRequest {
prompt: format!(
"You are a staffing intelligence assistant. Answer based ONLY on these \
verified worker records. Every record has been SQL-verified against the \
database — you can trust the facts in them. Be specific: cite names, \
skills, certifications, scores, and locations.\n\n\
Records:\n{context}\n\n\
Question: {}\n\nAnswer:", req.question,
),
model: None,
system: None,
temperature: Some(0.2),
max_tokens: Some(512),
think: None,
}).await;
gen_resp.ok().map(|r| r.text.trim().to_string())
} else {
None
};
let method = if valid_ids.is_some() { "hybrid_sql_vector" } else { "vector_only" };
Ok(Json(HybridResponse {
question: req.question,
sql_filter: req.sql_filter,
sql_matches: sql_count,
vector_reranked: sources.len(),
method: method.into(),
answer,
sources,
duration_ms: t0.elapsed().as_millis() as u64,
}))
}
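The widen-boost-rerank pattern above (fetch a 5× wider pool, apply additive playbook boosts, re-sort, then truncate to the caller's top_k) can be sketched in isolation. A minimal sketch with simplified stand-in types — `(doc_id, score)` pairs instead of the real `HybridSource`, and a plain `HashMap` instead of the playbook boost table:

```rust
use std::collections::HashMap;

/// Apply additive boosts to a wide candidate pool, re-rank, then trim.
/// `candidates` are (doc_id, score) pairs, already fetched fetch_k-wide.
fn boost_and_rerank(
    mut candidates: Vec<(String, f32)>,
    boosts: &HashMap<String, f32>,
    top_k: usize,
) -> Vec<(String, f32)> {
    // Additive boost on the existing vector score.
    for (id, score) in candidates.iter_mut() {
        if let Some(b) = boosts.get(id.as_str()) {
            *score += *b;
        }
    }
    // Re-rank: boosted scores can flip ordering. NaN-tolerant comparator,
    // same shape as the handler's sort_by + partial_cmp fallback.
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    // Only now trim to top_k — the wider pool exists so a boosted
    // candidate outside the vanilla top_k can climb into view.
    candidates.truncate(top_k);
    candidates
}
```

The key ordering constraint is that truncation happens last; truncating before boosting would discard exactly the candidates the boost was meant to rescue.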
// --- HNSW Fast Search ---
#[derive(Deserialize)]
struct BuildHnswRequest {
/// Name of the stored vector index to build HNSW from
index_name: String,
/// Optional config override. Omit to use the production default
/// (ef_construction=80, ef_search=30 — see HnswConfig::default docs for rationale).
#[serde(default)]
config: Option<trial::HnswConfig>,
}
/// Build an HNSW index from an existing stored vector index.
/// Uses the embedding cache so repeated builds don't reload from Parquet.
async fn build_hnsw(
State(state): State<VectorState>,
Json(req): Json<BuildHnswRequest>,
) -> impl IntoResponse {
let config = req.config.unwrap_or_default();
tracing::info!(
"building HNSW for '{}' ef_construction={} ef_search={}",
req.index_name, config.ef_construction, config.ef_search,
);
let embeddings = state
.embedding_cache
.get_or_load(&req.index_name)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("index not found: {e}")))?;
match state
.hnsw_store
.build_index_with_config(&req.index_name, (*embeddings).clone(), &config)
.await
{
Ok(stats) => Ok(Json(stats)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct HnswSearchRequest {
index_name: String,
query: String,
top_k: Option<usize>,
}
/// Search using HNSW — approximate nearest neighbors, much faster than brute-force.
async fn search_hnsw(
State(state): State<VectorState>,
Json(req): Json<HnswSearchRequest>,
) -> impl IntoResponse {
let top_k = req.top_k.unwrap_or(5);
// Embed query
let embed_resp = state.ai_client.embed(EmbedRequest {
texts: vec![req.query.clone()],
model: None,
}).await.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed error: {e}")))?;
if embed_resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "no embedding returned".to_string()));
}
let query_vec: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
// Search HNSW
match state.hnsw_store.search(&req.index_name, &query_vec, top_k).await {
Ok(results) => Ok(Json(serde_json::json!({
"results": results,
"query": req.query,
"method": "hnsw",
}))),
Err(e) => Err((StatusCode::NOT_FOUND, e)),
}
}
async fn list_hnsw(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.hnsw_store.list().await)
}
// --- Trial System: parameterized HNSW tuning loop ---
//
// Flow:
// 1. Agent picks an HnswConfig
// 2. POST /hnsw/trial builds HNSW with that config against cached embeddings,
// runs every query in the harness, measures latency + recall vs the
// harness's ground truth, appends a Trial record to _hnsw_trials/{idx}.jsonl
// 3. Agent reads GET /hnsw/trials/{index}, sees history, decides next config
// 4. Repeat until converged.
//
// The first trial triggers embedding load (slow). Every subsequent trial reuses
// the cache — so the agent iterates in seconds, not minutes.
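Steps 1-4 above are a generic propose-measure-decide cycle. A schematic sketch with the HTTP calls abstracted behind closures (every name here is illustrative, not part of the service's API):

```rust
/// Run a tuning loop: propose a config (optionally informed by the best
/// result so far), measure it, stop when the proposer declines to
/// continue. Returns the best (config, score) pair seen.
fn tune<C>(
    mut propose: impl FnMut(Option<(&C, f32)>) -> Option<C>, // steps 1 + 3
    mut measure: impl FnMut(&C) -> f32,                      // step 2: build + bench
) -> Option<(C, f32)> {
    let mut best: Option<(C, f32)> = None;
    while let Some(cfg) = propose(best.as_ref().map(|(c, s)| (c, *s))) {
        let score = measure(&cfg);
        // Keep the best-scoring config seen so far (the "decide" step).
        if best.as_ref().map_or(true, |(_, s)| score > *s) {
            best = Some((cfg, score));
        }
    }
    best
}
```

In the real loop, `measure` corresponds to a POST /hnsw/trial round trip and `propose` to the agent reading trial history; the embedding cache is what makes repeated `measure` calls cheap.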
#[derive(Deserialize)]
struct TrialRequest {
index_name: String,
harness: String,
#[serde(default)]
config: trial::HnswConfig,
#[serde(default)]
note: Option<String>,
}
async fn run_trial(
State(state): State<VectorState>,
Json(req): Json<TrialRequest>,
) -> Result<Json<trial::Trial>, (StatusCode, String)> {
let mut harness_set = state.harness_store.load_for_index(&req.index_name, &req.harness)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("harness not found: {e}")))?;
if harness_set.index_name != req.index_name {
return Err((
StatusCode::BAD_REQUEST,
format!(
"harness '{}' is for index '{}', not '{}'",
req.harness, harness_set.index_name, req.index_name
),
));
}
if harness_set.queries.is_empty() {
return Err((StatusCode::BAD_REQUEST, "harness has no queries".into()));
}
let embeddings = state
.embedding_cache
.get_or_load(&req.index_name)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("load embeddings: {e}")))?;
if !harness_set.ground_truth_built {
tracing::info!("trial: computing ground truth for harness '{}'", harness_set.name);
let t0 = std::time::Instant::now();
harness::compute_ground_truth(&mut harness_set, &embeddings, &state.ai_client)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("ground truth: {e}")))?;
tracing::info!("trial: ground truth built in {:.1}s", t0.elapsed().as_secs_f32());
state.harness_store
.save(&harness_set)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("save harness: {e}")))?;
}
let trial_id = trial::Trial::new_id();
let hnsw_slot = format!("{}__{}", req.index_name, trial_id);
let build_stats = state
.hnsw_store
.build_index_with_config(&hnsw_slot, (*embeddings).clone(), &req.config)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("build: {e}")))?;
let query_vectors: Vec<Vec<f32>> = harness_set
.queries
.iter()
.filter_map(|q| q.query_embedding.clone())
.collect();
let bench = state
.hnsw_store
.bench_search(&hnsw_slot, &query_vectors, harness_set.k)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("search: {e}")))?;
let mut recalls = Vec::with_capacity(harness_set.queries.len());
for (q, hits) in harness_set.queries.iter().zip(bench.retrieved.iter()) {
if let Some(gt) = &q.ground_truth {
recalls.push(harness::recall_at_k(hits, gt, harness_set.k));
}
}
let mean_recall = if recalls.is_empty() {
0.0
} else {
recalls.iter().sum::<f32>() / recalls.len() as f32
};
let mut lats = bench.latencies_us.clone();
lats.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let p = |pct: f32| -> f32 {
if lats.is_empty() { return 0.0; }
let idx = ((lats.len() as f32 - 1.0) * pct).round() as usize;
lats[idx.min(lats.len() - 1)]
};
// One brute-force reference latency, measured on a single query. One
// O(N) scan is enough for the speedup comparison and keeps per-trial
// cost low no matter how many harness queries the agent runs.
let brute_latency_us = if let Some(qv) = query_vectors.first() {
let t0 = std::time::Instant::now();
let _ = harness::brute_force_top_k(qv, &embeddings, harness_set.k);
t0.elapsed().as_micros() as f32
} else {
0.0
};
let dims = embeddings.first().map(|e| e.vector.len()).unwrap_or(0);
// Rough estimate: raw f32 vector payload plus ~128 bytes per entry for
// ids and struct overhead. An estimate for trend comparison, not a
// measured residency.
let memory_bytes =
(embeddings.len() * dims * std::mem::size_of::<f32>() + embeddings.len() * 128) as u64;
let trial_record = trial::Trial {
id: trial_id.clone(),
index_name: req.index_name.clone(),
eval_set: req.harness.clone(),
config: req.config.clone(),
metrics: trial::TrialMetrics {
build_time_secs: build_stats.build_time_secs,
search_latency_p50_us: p(0.50),
search_latency_p95_us: p(0.95),
search_latency_p99_us: p(0.99),
recall_at_k: mean_recall,
memory_bytes,
vectors: build_stats.vectors,
eval_queries: harness_set.queries.len(),
brute_force_latency_us: brute_latency_us,
},
created_at: chrono::Utc::now(),
note: req.note,
};
state
.trial_journal
.append(&trial_record)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("journal: {e}")))?;
state.hnsw_store.drop(&hnsw_slot).await;
Ok(Json(trial_record))
}
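The trial metrics reduce to two small helpers: recall@k as set overlap against ground truth, and a nearest-rank percentile over sorted latencies. A standalone sketch mirroring the inline closure `p` and the shape of `harness::recall_at_k` as used here (free functions for illustration, not the crate's actual signatures):

```rust
/// Fraction of the ground-truth top-k that the ANN search retrieved.
fn recall_at_k(hits: &[String], ground_truth: &[String], k: usize) -> f32 {
    let gt: std::collections::HashSet<&str> =
        ground_truth.iter().take(k).map(|s| s.as_str()).collect();
    if gt.is_empty() {
        return 0.0;
    }
    let found = hits.iter().take(k).filter(|h| gt.contains(h.as_str())).count();
    found as f32 / gt.len() as f32
}

/// Nearest-rank percentile over an ascending list (pct in 0.0..=1.0).
fn percentile(sorted: &[f32], pct: f32) -> f32 {
    if sorted.is_empty() {
        return 0.0;
    }
    let idx = ((sorted.len() as f32 - 1.0) * pct).round() as usize;
    sorted[idx.min(sorted.len() - 1)]
}
```

Nearest-rank is deliberately simple: with a few hundred harness queries, interpolated percentiles would add noise-level precision at the cost of a less obvious formula.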
async fn list_trials(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
match state.trial_journal.list(&index_name).await {
Ok(trials) => Ok(Json(trials)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct BestTrialQuery {
#[serde(default = "default_metric")]
metric: String,
}
fn default_metric() -> String {
"pareto".to_string()
}
async fn best_trial(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Query(q): Query<BestTrialQuery>,
) -> impl IntoResponse {
match state.trial_journal.best(&index_name, &q.metric).await {
Ok(Some(t)) => Ok(Json(t)),
Ok(None) => Err((StatusCode::NOT_FOUND, "no trials yet".to_string())),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
// --- Harness management ---
async fn list_evals(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.harness_store.list_all().await)
}
async fn get_eval(
State(state): State<VectorState>,
Path(name): Path<String>,
) -> impl IntoResponse {
match state.harness_store.get_any(&name).await {
Ok(e) => Ok(Json(e)),
Err(err) => Err((StatusCode::NOT_FOUND, err)),
}
}
async fn put_eval(
State(state): State<VectorState>,
Path(name): Path<String>,
Json(mut harness_set): Json<harness::EvalSet>,
) -> impl IntoResponse {
harness_set.name = name;
harness_set.ground_truth_built = harness_set
.queries
.iter()
.all(|q| q.ground_truth.is_some());
match state.harness_store.save(&harness_set).await {
Ok(()) => Ok(Json(harness_set)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct AutogenRequest {
index_name: String,
#[serde(default = "default_sample_count")]
sample_count: usize,
#[serde(default = "default_k")]
k: usize,
}
fn default_sample_count() -> usize { 100 }
fn default_k() -> usize { 10 }
async fn autogen_eval(
State(state): State<VectorState>,
Path(name): Path<String>,
Json(req): Json<AutogenRequest>,
) -> Result<Json<harness::EvalSet>, (StatusCode, String)> {
let embeddings = state
.embedding_cache
.get_or_load(&req.index_name)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("load embeddings: {e}")))?;
let mut harness_set = harness::synthetic_from_chunks(
&name,
&req.index_name,
&embeddings,
req.sample_count,
req.k,
);
harness::compute_ground_truth(&mut harness_set, &embeddings, &state.ai_client)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("ground truth: {e}")))?;
state.harness_store
.save(&harness_set)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("save: {e}")))?;
Ok(Json(harness_set))
}
// --- Embedding cache management ---
async fn cache_stats(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.embedding_cache.stats().await)
}
async fn cache_evict(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
let ok = state.embedding_cache.evict(&index_name).await;
Json(serde_json::json!({ "evicted": ok, "index_name": index_name }))
}
// --- Phase C: embedding refresh ---
//
// Decouples "new row data arrived" from "re-embed everything." Ingest marks
// a dataset's embeddings stale (see catalogd::registry::mark_embeddings_stale);
// `/vectors/refresh/{dataset}` diffs existing embeddings against current
// rows, embeds only the new ones, appends to the index, and clears the
// stale flag.
async fn refresh_dataset(
State(state): State<VectorState>,
Path(dataset_name): Path<String>,
Json(req): Json<refresh::RefreshRequest>,
) -> Result<Json<refresh::RefreshResult>, (StatusCode, String)> {
tracing::info!(
"refresh requested for dataset '{}' -> index '{}'",
dataset_name, req.index_name,
);
match refresh::refresh_index(
&dataset_name,
&req,
&state.store,
&state.catalog,
&state.ai_client,
&state.embedding_cache,
&state.index_registry,
)
.await
{
Ok(result) => Ok(Json(result)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
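At its core, the refresh flow is a set difference: embed only the row ids present in the current dataset but absent from the existing index. A sketch under assumed simplified types (the real `refresh::refresh_index` also appends the new embeddings and clears the stale flag):

```rust
use std::collections::HashSet;

/// Return the row ids that still need embedding: present in the current
/// dataset rows but missing from the already-embedded index.
fn ids_to_embed(current_rows: &[String], indexed_ids: &[String]) -> Vec<String> {
    let have: HashSet<&str> = indexed_ids.iter().map(|s| s.as_str()).collect();
    current_rows
        .iter()
        .filter(|id| !have.contains(id.as_str()))
        .cloned()
        .collect()
}
```

This is what makes refresh cheap relative to re-embedding: the embedding cost scales with the diff, not with the dataset.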
#[derive(Serialize)]
struct StaleEntry {
dataset_name: String,
last_embedded_at: Option<String>,
stale_since: String,
refresh_policy: Option<shared::types::RefreshPolicy>,
}
async fn list_stale(State(state): State<VectorState>) -> impl IntoResponse {
let datasets = state.catalog.stale_datasets().await;
let entries: Vec<StaleEntry> = datasets
.into_iter()
.map(|d| StaleEntry {
dataset_name: d.name,
last_embedded_at: d.last_embedded_at.map(|t| t.to_rfc3339()),
stale_since: d
.embedding_stale_since
.map(|t| t.to_rfc3339())
.unwrap_or_default(),
refresh_policy: d.embedding_refresh_policy,
})
.collect();
Json(entries)
}
// --- Phase 17: Model profile activation + scoped search ---
#[derive(Serialize)]
struct ActivateReport {
profile_id: String,
ollama_name: String,
indexes_warmed: Vec<WarmedIndex>,
failures: Vec<String>,
total_vectors: usize,
duration_secs: f32,
/// Phase C: did we successfully preload the Ollama model?
model_preloaded: bool,
/// Phase C: which profile previously held the GPU slot, if any.
/// Useful for observability of the swap.
previous_profile: Option<String>,
}
#[derive(Serialize)]
struct WarmedIndex {
index_name: String,
source: String,
vectors: usize,
hnsw_build_secs: f32,
}
/// Warm this profile's indexes. For every bound dataset, find the
/// matching vector index (any index whose `source` equals the dataset
/// or view name), load its embeddings into EmbeddingCache, build HNSW
/// with the profile's config. The next `/profile/{id}/search` call then
/// hits a warm index instead of paying a cold start — typically <1ms.
///
/// Failures on individual indexes don't stop the activation — they get
/// reported in the response. This matches the "substrate keeps working"
/// philosophy from ADR-017: one bad binding shouldn't take down the
/// whole profile.
async fn activate_profile(
State(state): State<VectorState>,
Path(profile_id): Path<String>,
) -> impl IntoResponse {
let t0 = std::time::Instant::now();
let profile = match state.catalog.get_profile(&profile_id).await {
Some(p) => p,
None => return Err((StatusCode::NOT_FOUND, format!("profile not found: {profile_id}"))),
};
let mut warmed = Vec::new();
let mut failures = Vec::new();
let mut total_vectors = 0usize;
// Phase 17 / C: VRAM-aware swap. If another profile currently holds
// the GPU and uses a DIFFERENT Ollama model than the one being
// activated, unload it first (keep_alive=0). Same-model activations
// skip the unload — no point churning a model that's already loaded.
let previous_slot = {
let guard = state.active_profile.read().await;
guard.clone()
};
if let Some(prev) = &previous_slot {
if prev.ollama_name != profile.ollama_name {
match state.ai_client.unload_model(&prev.ollama_name).await {
Ok(_) => tracing::info!(
"profile swap: unloaded '{}' ({} -> {})",
prev.ollama_name, prev.profile_id, profile.id,
),
Err(e) => failures.push(format!(
"unload previous model '{}': {e}", prev.ollama_name,
)),
}
}
}
// Federation layer 2: if this profile declares its own bucket and
// that bucket isn't registered yet, auto-provision it under the
// configured profile_root. This is the moment a "dormant" profile
// becomes live — its bucket exists and is readable/writable.
if let Some(bucket_name) = profile.bucket.clone() {
if !state.bucket_registry.contains(&bucket_name) {
let root = format!(
"{}/{}",
state.bucket_registry.profile_root().trim_end_matches('/'),
bucket_name.replace(':', "_"),
);
let bc = shared::config::BucketConfig {
name: bucket_name.clone(),
backend: "local".to_string(),
root: Some(root.clone()),
bucket: None,
region: None,
endpoint: None,
secret_ref: None,
};
match state.bucket_registry.add_bucket(bc).await {
Ok(info) => {
tracing::info!(
"profile '{}' activated bucket '{}' (root={}, reachable={})",
profile.id, bucket_name, root, info.reachable,
);
}
Err(e) => {
failures.push(format!(
"auto-provision bucket '{}': {}", bucket_name, e,
));
}
}
}
}
let all_indexes = state.index_registry.list(None, None).await;
let use_lance = profile.vector_backend == shared::types::VectorBackend::Lance;
for binding in &profile.bound_datasets {
let matched: Vec<_> = all_indexes
.iter()
.filter(|m| &m.source == binding)
.collect();
if matched.is_empty() {
failures.push(format!(
"no vector index found for binding '{}'", binding,
));
continue;
}
for meta in matched {
if use_lance {
// --- Lance activation path ---
// Ensure a Lance dataset exists for this index. If it
// doesn't, auto-migrate from the Parquet blob. Then
// ensure an IVF_PQ index is built.
let bucket = meta.bucket.clone();
let lance_store = match state.lance.store_for_new(&meta.index_name, &bucket).await {
Ok(s) => s,
Err(e) => {
failures.push(format!("{}: lance store init: {e}", meta.index_name));
continue;
}
};
let count = lance_store.count().await.unwrap_or(0);
if count == 0 {
// Auto-migrate from existing Parquet.
let pq_store = match state.bucket_registry.get(&bucket) {
Ok(s) => s,
Err(e) => { failures.push(format!("{}: bucket: {e}", meta.index_name)); continue; }
};
match storaged::ops::get(&pq_store, &meta.storage_key).await {
Ok(bytes) => {
let build_t = std::time::Instant::now();
match lance_store.migrate_from_parquet_bytes(&bytes).await {
Ok(ms) => {
total_vectors += ms.rows_written;
tracing::info!(
"lance auto-migrate '{}': {} rows in {:.2}s",
meta.index_name, ms.rows_written, ms.duration_secs,
);
warmed.push(WarmedIndex {
index_name: meta.index_name.clone(),
source: meta.source.clone(),
vectors: ms.rows_written,
hnsw_build_secs: build_t.elapsed().as_secs_f32(),
});
}
Err(e) => failures.push(format!("{}: lance migrate: {e}", meta.index_name)),
}
}
Err(e) => failures.push(format!("{}: read parquet: {e}", meta.index_name)),
}
} else {
total_vectors += count;
warmed.push(WarmedIndex {
index_name: meta.index_name.clone(),
source: meta.source.clone(),
vectors: count,
hnsw_build_secs: 0.0,
});
}
// Ensure IVF_PQ vector index exists. 316/8/48 mirrors the
// default_partitions/default_bits/default_subvectors route defaults below.
if !lance_store.has_vector_index().await.unwrap_or(false) {
match lance_store.build_index(316, 8, 48).await {
Ok(ix) => tracing::info!(
"lance auto-index '{}': IVF_PQ built in {:.1}s",
meta.index_name, ix.build_time_secs,
),
Err(e) => failures.push(format!("{}: lance IVF_PQ build: {e}", meta.index_name)),
}
}
// Ensure scalar btree on doc_id for O(log N) random fetch.
if !lance_store.has_scalar_index("doc_id").await.unwrap_or(false) {
match lance_store.build_scalar_index("doc_id").await {
Ok(ix) => tracing::info!(
"lance auto-index '{}': doc_id btree built in {:.2}s",
meta.index_name, ix.build_time_secs,
),
Err(e) => failures.push(format!("{}: lance doc_id btree: {e}", meta.index_name)),
}
}
} else {
// --- Parquet + HNSW activation path (existing) ---
let embeddings = match state.embedding_cache.get_or_load(&meta.index_name).await {
Ok(arc) => arc,
Err(e) => {
failures.push(format!("{}: load failed: {}", meta.index_name, e));
continue;
}
};
total_vectors += embeddings.len();
let profile_default = trial::HnswConfig {
ef_construction: profile.hnsw_config.ef_construction,
ef_search: profile.hnsw_config.ef_search,
seed: profile.hnsw_config.seed,
};
let cfg = state
.promotion_registry
.config_or(&meta.index_name, profile_default)
.await;
let build_t = std::time::Instant::now();
match state
.hnsw_store
.build_index_with_config(&meta.index_name, (*embeddings).clone(), &cfg)
.await
{
Ok(_) => {
warmed.push(WarmedIndex {
index_name: meta.index_name.clone(),
source: meta.source.clone(),
vectors: embeddings.len(),
hnsw_build_secs: build_t.elapsed().as_secs_f32(),
});
}
Err(e) => {
failures.push(format!("{}: HNSW build failed: {}", meta.index_name, e));
}
}
}
}
}
// Preload the new profile's Ollama model proactively. Same-model
// re-activations are cheap (Ollama no-ops if already loaded).
let mut model_preloaded = false;
match state.ai_client.preload_model(&profile.ollama_name).await {
Ok(_) => {
model_preloaded = true;
tracing::info!("profile '{}' preloaded ollama model '{}'",
profile.id, profile.ollama_name);
}
Err(e) => failures.push(format!(
"preload ollama model '{}': {e}", profile.ollama_name,
)),
}
// Take the GPU slot.
{
let mut guard = state.active_profile.write().await;
*guard = Some(ActiveProfileSlot {
profile_id: profile.id.clone(),
ollama_name: profile.ollama_name.clone(),
activated_at: chrono::Utc::now(),
});
}
Ok(Json(ActivateReport {
profile_id: profile.id,
ollama_name: profile.ollama_name,
indexes_warmed: warmed,
failures,
total_vectors,
duration_secs: t0.elapsed().as_secs_f32(),
model_preloaded,
previous_profile: previous_slot.map(|s| s.profile_id),
}))
}
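The VRAM swap rule in activate_profile is worth isolating: unload the previous model only when the slot is held by a different Ollama model, since same-model activations would just churn an already-loaded model. A minimal sketch of that decision (a hypothetical helper, not part of the service):

```rust
/// Returns the model name to unload before activating `next_model`,
/// if any. Same-model activations skip the unload entirely.
fn model_to_unload(previous: Option<&str>, next_model: &str) -> Option<String> {
    match previous {
        // A different model holds the GPU slot: evict it first.
        Some(prev) if prev != next_model => Some(prev.to_string()),
        // Same model, or no active profile: nothing to unload.
        _ => None,
    }
}
```

Keeping the comparison on the Ollama model name rather than the profile id is the important detail: two profiles can share one model, and swapping between them should not touch VRAM.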
/// Unload this profile's model and clear the active slot. The slot is
/// only cleared if this profile currently holds it; the model unload is
/// attempted either way.
async fn deactivate_profile(
State(state): State<VectorState>,
Path(profile_id): Path<String>,
) -> impl IntoResponse {
let profile = match state.catalog.get_profile(&profile_id).await {
Some(p) => p,
None => return Err((StatusCode::NOT_FOUND, format!("profile not found: {profile_id}"))),
};
let was_active = {
let mut guard = state.active_profile.write().await;
match guard.as_ref() {
Some(s) if s.profile_id == profile_id => {
let prev = s.clone();
*guard = None;
Some(prev)
}
_ => None,
}
};
// Regardless of whether it held the slot, we can still try to unload —
// the operator's intent is "get this model out of VRAM."
let unload_result = state.ai_client.unload_model(&profile.ollama_name).await;
Ok(Json(serde_json::json!({
"profile_id": profile.id,
"ollama_name": profile.ollama_name,
"was_active": was_active.is_some(),
"unloaded": unload_result.is_ok(),
"unload_error": unload_result.err(),
})))
}
async fn get_active_profile(State(state): State<VectorState>) -> impl IntoResponse {
let slot = state.active_profile.read().await.clone();
Json(slot)
}
#[derive(Deserialize)]
struct ProfileSearchRequest {
index_name: String,
query: String,
top_k: Option<usize>,
}
/// Search scoped to a profile — refuses if the requested index's source
/// isn't in the profile's bound_datasets. Reuses the existing HNSW
/// search path when the index is warm; falls back to brute-force cosine
/// if it's not (handled by the existing search code path).
async fn profile_scoped_search(
State(state): State<VectorState>,
Path(profile_id): Path<String>,
Json(req): Json<ProfileSearchRequest>,
) -> impl IntoResponse {
let profile = match state.catalog.get_profile(&profile_id).await {
Some(p) => p,
None => return Err((StatusCode::NOT_FOUND, format!("profile not found: {profile_id}"))),
};
// Verify the index is in scope for this profile.
let index_meta = match state.index_registry.get(&req.index_name).await {
Some(m) => m,
None => return Err((StatusCode::NOT_FOUND, format!("index not found: {}", req.index_name))),
};
if !profile.bound_datasets.contains(&index_meta.source) {
return Err((
StatusCode::FORBIDDEN,
format!(
"profile '{}' is not bound to '{}' — allowed bindings: {:?}",
profile.id, index_meta.source, profile.bound_datasets,
),
));
}
let top_k = req.top_k.unwrap_or(5);
let use_lance = profile.vector_backend == shared::types::VectorBackend::Lance;
// Embed the query.
let embed_resp = state
.ai_client
.embed(EmbedRequest { texts: vec![req.query.clone()], model: None })
.await
.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed: {e}")))?;
if embed_resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "no embedding returned".into()));
}
let query_vec: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
// ADR-019 hybrid: route to Lance or Parquet+HNSW based on the
// profile's declared backend. Callers don't need to know which
// storage tier they're hitting — the profile abstracts it.
if use_lance {
let lance_store = state.lance.store_for(&req.index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let t0 = std::time::Instant::now();
match lance_store.search(
&query_vec,
top_k,
Some(LANCE_DEFAULT_NPROBES),
Some(LANCE_DEFAULT_REFINE_FACTOR),
).await {
Ok(hits) => Ok(Json(serde_json::json!({
"profile": profile.id,
"source": index_meta.source,
"method": "lance_ivf_pq",
"latency_us": t0.elapsed().as_micros() as u64,
"results": hits,
}))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
} else if state.hnsw_store.has_index(&req.index_name).await {
match state.hnsw_store.search(&req.index_name, &query_vec, top_k).await {
Ok(hits) => Ok(Json(serde_json::json!({
"profile": profile.id,
"source": index_meta.source,
"method": "hnsw",
"results": hits,
}))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
} else {
let embeddings = state
.embedding_cache
.get_or_load(&req.index_name)
.await
.map_err(|e| (StatusCode::NOT_FOUND, format!("embeddings: {e}")))?;
let results = search::search(&query_vec, &embeddings, top_k);
Ok(Json(serde_json::json!({
"profile": profile.id,
"source": index_meta.source,
"method": "brute_force",
"results": results,
})))
}
}
// --- Phase 16: Promotion + autotune ---
#[derive(Deserialize)]
struct PromoteQuery {
#[serde(default)]
promoted_by: String,
#[serde(default)]
note: Option<String>,
}
async fn promote_trial(
State(state): State<VectorState>,
Path((index_name, trial_id)): Path<(String, String)>,
Query(q): Query<PromoteQuery>,
) -> impl IntoResponse {
// Pull the trial from the journal to get its config.
let trials = state
.trial_journal
.list(&index_name)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
let trial = trials
.iter()
.find(|t| t.id == trial_id)
.ok_or_else(|| (StatusCode::NOT_FOUND, format!("trial not found: {trial_id}")))?;
let entry = promotion::PromotionEntry {
config: trial.config.clone(),
trial_id: trial.id.clone(),
promoted_at: chrono::Utc::now(),
promoted_by: q.promoted_by,
note: q.note,
};
match state.promotion_registry.promote(&index_name, entry).await {
Ok(file) => Ok(Json(file)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn rollback_promotion(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
match state.promotion_registry.rollback(&index_name).await {
Ok(file) => Ok(Json(file)),
Err(e) => Err((StatusCode::NOT_FOUND, e)),
}
}
async fn get_promoted(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
match state.promotion_registry.load(&index_name).await {
Ok(file) => Ok(Json(file)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn run_autotune_endpoint(
State(state): State<VectorState>,
Json(req): Json<autotune::AutotuneRequest>,
) -> impl IntoResponse {
match autotune::run_autotune(
req,
&state.store,
&state.catalog,
&state.ai_client,
&state.embedding_cache,
&state.hnsw_store,
&state.index_registry,
&state.trial_journal,
&state.promotion_registry,
&state.harness_store,
&state.job_tracker,
).await {
Ok(result) => Ok(Json(result)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
// --- Phase 16.2: autotune agent endpoints ---
async fn agent_status(State(state): State<VectorState>) -> impl IntoResponse {
Json(state.agent_handle.status().await)
}
async fn agent_stop(State(state): State<VectorState>) -> impl IntoResponse {
let stopped = state.agent_handle.stop().await;
Json(serde_json::json!({ "stopped": stopped }))
}
async fn agent_enqueue(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
let event = agent::TriggerEvent::manual(index_name);
match state.agent_handle.enqueue(event).await {
Ok(()) => Ok(Json(serde_json::json!({ "enqueued": true }))),
Err(e) => Err((StatusCode::SERVICE_UNAVAILABLE, e)),
}
}
// --- ADR-019: Lance hybrid backend HTTP surface ---
//
// Lance routes operate on the same `index_name` as the Parquet/HNSW path,
// but materialize the data as a Lance dataset on disk under
// `{bucket_root}/lance/{index_name}/`. The two backends are independent:
// you can have an index in both formats simultaneously. `IndexMeta.vector_backend`
// records which one is canonical for that index.
#[derive(Deserialize)]
struct LanceMigrateRequest {
/// Optional bucket override. Defaults to the bucket recorded in the
/// index's IndexMeta (the route 404s if the index doesn't exist).
#[serde(default)]
bucket: Option<String>,
}
/// Read the existing Parquet vector file for `index_name` from object
/// storage, hand the bytes to vectord-lance, return migration stats.
/// The original Parquet file is left intact — both backends coexist
/// after migration.
async fn lance_migrate(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Json(req): Json<LanceMigrateRequest>,
) -> impl IntoResponse {
let meta = state.index_registry.get(&index_name).await
.ok_or((StatusCode::NOT_FOUND, format!("index not found: {index_name}")))?;
let bucket = req.bucket.unwrap_or_else(|| meta.bucket.clone());
// Pull the Parquet bytes via storaged::ops — same path as the
// existing embedding loader uses.
let store = state.bucket_registry.get(&bucket)
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let bytes = storaged::ops::get(&store, &meta.storage_key).await
.map_err(|e| (StatusCode::NOT_FOUND, format!("read parquet: {e}")))?;
let lance_store = state.lance.store_for_new(&index_name, &bucket).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let stats = lance_store.migrate_from_parquet_bytes(&bytes).await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
tracing::info!(
"lance migrate '{}': {} rows, {}d, {} bytes on disk, {:.2}s",
index_name, stats.rows_written, stats.dimensions,
stats.disk_bytes, stats.duration_secs,
);
Ok::<_, (StatusCode, String)>(Json(serde_json::json!({
"index_name": index_name,
"bucket": bucket,
"lance_path": lance_store.path(),
"stats": stats,
})))
}
#[derive(Deserialize)]
struct LanceIndexRequest {
#[serde(default = "default_partitions")]
num_partitions: u32,
#[serde(default = "default_bits")]
num_bits: u32,
#[serde(default = "default_subvectors")]
num_sub_vectors: u32,
}
fn default_partitions() -> u32 { 316 } // ≈√100K — sane for the reference dataset
fn default_bits() -> u32 { 8 }
fn default_subvectors() -> u32 { 48 } // 768/48 = 16 dims per subvector
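// Hedged sketch test: per the comments above, the PQ defaults assume the
// 768-d reference embedding model. Product quantization requires
// num_sub_vectors to divide the vector dimensionality evenly, so this
// pins the invariant that keeps default_subvectors valid for 768-d data.
#[cfg(test)]
mod lance_default_param_tests {
    use super::*;
    #[test]
    fn subvectors_evenly_divide_reference_dims() {
        assert_eq!(768 % default_subvectors(), 0);
        assert_eq!(768 / default_subvectors(), 16); // 16 dims per subvector
    }
}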
/// Build the IVF_PQ index on the Lance dataset.
async fn lance_build_index(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Json(req): Json<LanceIndexRequest>,
) -> impl IntoResponse {
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
match lance_store.build_index(req.num_partitions, req.num_bits, req.num_sub_vectors).await {
Ok(stats) => Ok(Json(stats)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct LanceSearchRequest {
/// Plain text query — embedded server-side for symmetry with the
/// existing /vectors/search path.
query: String,
#[serde(default = "default_top_k")]
top_k: usize,
/// IVF partitions to probe. Omitting it picks the server-side default
/// rather than Lance's built-in default of 1, which caps recall well
/// below the index's real capability. Recommended: 5-10% of
/// num_partitions (≈20 for a 316-partition index).
#[serde(default)]
nprobes: Option<usize>,
/// Refine factor — re-rank `top_k * factor` PQ-approximate candidates
/// with exact distances before returning `top_k`. Recovers recall
/// lost to product quantization.
#[serde(default)]
refine_factor: Option<u32>,
}
/// Server-side defaults when the caller doesn't pin nprobes / refine
/// themselves. Tuned for the ~100K × 768d reference workload; see
/// docs/ADR-019-vector-storage.md for the recall / latency trade-off.
const LANCE_DEFAULT_NPROBES: usize = 20;
const LANCE_DEFAULT_REFINE_FACTOR: u32 = 5;
fn default_top_k() -> usize { 5 }
/// Vector search against a Lance dataset. Embeds the query text via the
/// sidecar then calls Lance's nearest-neighbor scanner.
async fn lance_search(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Json(req): Json<LanceSearchRequest>,
) -> impl IntoResponse {
let embed_resp = state.ai_client
.embed(EmbedRequest { texts: vec![req.query.clone()], model: None })
.await
.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed: {e}")))?;
if embed_resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "no embedding returned".into()));
}
let qv: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect();
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let t0 = std::time::Instant::now();
let nprobes = req.nprobes.or(Some(LANCE_DEFAULT_NPROBES));
let refine = req.refine_factor.or(Some(LANCE_DEFAULT_REFINE_FACTOR));
let hits = lance_store.search(&qv, req.top_k, nprobes, refine).await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?;
Ok(Json(serde_json::json!({
"index_name": index_name,
"query": req.query,
"method": "lance_ivf_pq",
"latency_us": t0.elapsed().as_micros() as u64,
"results": hits,
})))
}
/// Random-access fetch by doc_id — the O(1) lookup that's basically
/// impossible in our Parquet path without scanning the whole file.
async fn lance_get_doc(
State(state): State<VectorState>,
Path((index_name, doc_id)): Path<(String, String)>,
) -> impl IntoResponse {
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let t0 = std::time::Instant::now();
match lance_store.get_by_doc_id(&doc_id).await {
Ok(Some(row)) => Ok(Json(serde_json::json!({
"index_name": index_name,
"doc_id": doc_id,
"latency_us": t0.elapsed().as_micros() as u64,
"row": row,
}))),
Ok(None) => Err((StatusCode::NOT_FOUND, format!("doc_id not found: {doc_id}"))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct LanceAppendRequest {
/// Optional source tag — set on every appended row.
#[serde(default)]
source: Option<String>,
rows: Vec<LanceAppendRow>,
}
#[derive(Deserialize)]
struct LanceAppendRow {
doc_id: String,
#[serde(default)]
chunk_idx: Option<i32>,
chunk_text: String,
/// Pre-computed embedding. Caller is responsible for ensuring it
/// matches the dataset's dimensions and embedding model.
vector: Vec<f32>,
}
async fn lance_append(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Json(req): Json<LanceAppendRequest>,
) -> impl IntoResponse {
if req.rows.is_empty() {
return Err((StatusCode::BAD_REQUEST, "rows array is empty".into()));
}
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let mut doc_ids = Vec::with_capacity(req.rows.len());
let mut chunk_idxs = Vec::with_capacity(req.rows.len());
let mut chunk_texts = Vec::with_capacity(req.rows.len());
let mut vectors = Vec::with_capacity(req.rows.len());
for r in req.rows {
doc_ids.push(r.doc_id);
chunk_idxs.push(r.chunk_idx.unwrap_or(0));
chunk_texts.push(r.chunk_text);
vectors.push(r.vector);
}
match lance_store.append(req.source, doc_ids, chunk_idxs, chunk_texts, vectors).await {
Ok(stats) => Ok(Json(stats)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn lance_stats(
State(state): State<VectorState>,
Path(index_name): Path<String>,
) -> impl IntoResponse {
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
match lance_store.stats().await {
Ok(s) => Ok(Json(s)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
/// Run an existing harness against Lance IVF_PQ and measure recall@k.
/// Uses the same ground truth computed by brute-force cosine (the HNSW
/// eval path). This closes ADR-019's explicit gap: "IVF_PQ recall not
/// measured."
#[derive(Deserialize)]
struct LanceRecallRequest {
harness: String,
#[serde(default = "default_top_k")]
top_k: usize,
/// Override server defaults so operators can sweep nprobes /
/// refine_factor to chart the recall-vs-latency curve.
#[serde(default)]
nprobes: Option<usize>,
#[serde(default)]
refine_factor: Option<u32>,
}
#[derive(serde::Serialize)]
struct LanceRecallResult {
index_name: String,
harness: String,
queries: usize,
top_k: usize,
mean_recall: f32,
per_query: Vec<LanceRecallQuery>,
latency_p50_us: f32,
latency_p95_us: f32,
total_duration_secs: f32,
}
#[derive(serde::Serialize)]
struct LanceRecallQuery {
query_id: String,
recall: f32,
latency_us: f32,
hits_returned: usize,
}
// --- Phase 19: playbook memory endpoints ---
/// Extract role from an SQL filter matching `role = 'Welder'` style.
/// Case-insensitive on the column name. Quoted value; quotes not
/// included in returned string.
fn extract_target_role(sql_filter: &str) -> Option<String> {
grab_eq_value(sql_filter, "role")
}
/// Shared equality-value extractor for (city, state, role) lookups.
fn grab_eq_value(src: &str, col: &str) -> Option<String> {
let lower = src.to_ascii_lowercase();
let col_lower = col.to_ascii_lowercase();
let mut search_from = 0usize;
while let Some(off) = lower[search_from..].find(&col_lower) {
let pos = search_from + off;
let prior_ok = pos == 0
    || (!lower.as_bytes()[pos - 1].is_ascii_alphanumeric()
        && lower.as_bytes()[pos - 1] != b'_');
let after = pos + col_lower.len();
if !prior_ok || after >= src.len() {
search_from = pos + col_lower.len();
continue;
}
let mut i = after;
while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
if i >= src.len() || src.as_bytes()[i] != b'=' { search_from = pos + col_lower.len(); continue; }
i += 1;
while i < src.len() && src.as_bytes()[i] == b' ' { i += 1; }
if i >= src.len() || src.as_bytes()[i] != b'\'' { search_from = pos + col_lower.len(); continue; }
i += 1;
let start = i;
while i < src.len() && src.as_bytes()[i] != b'\'' { i += 1; }
if i > start {
return Some(src[start..i].to_string());
}
search_from = pos + col_lower.len();
}
None
}
/// Pull (city, state) out of a SQL filter that uses
/// `city = 'Toledo' AND state = 'OH'` style equality. Returns None if
/// either is missing — the caller keeps the original global boost map
/// behavior (no geo narrowing). Case-insensitive on the column name
/// so `CITY=` or `City =` also work.
fn extract_target_geo(sql_filter: &str) -> Option<(String, String)> {
let city = grab_eq_value(sql_filter, "city")?;
let state = grab_eq_value(sql_filter, "state")?;
Some((city, state))
}
/// Extract (name, city, state) from a chunk formatted like
/// "{Name} — {Role} in {City}, {State}. Skills: …".
/// Returns None if the chunk doesn't match the shape; callers simply
/// skip the boost for that hit.
fn parse_worker_chunk(chunk: &str) -> Option<(String, String, String)> {
// "Name — Role in City, ST. …" → split on "—" then " in " then ","
let (name_part, rest) = chunk.split_once('—')?;
let rest = rest.trim();
let (_role, loc_part) = rest.split_once(" in ")?;
let loc_part = loc_part.trim();
let (city, state_plus) = loc_part.split_once(',')?;
let state: String = state_plus.trim()
.chars()
.take_while(|c| c.is_ascii_alphabetic())
.collect();
let name = name_part.trim().to_string();
let city = city.trim().to_string();
if name.is_empty() || city.is_empty() || state.is_empty() {
return None;
}
Some((name, city, state))
}
#[derive(Deserialize)]
struct SeedPlaybookRequest {
/// One playbook with {operation, approach, context, endorsed_names}.
/// City + state are parsed from the operation text.
operation: String,
#[serde(default)]
approach: String,
#[serde(default)]
context: String,
endorsed_names: Vec<String>,
/// Append to the existing memory rather than replacing. Default true —
/// seeding is a bootstrap/demo tool, not a rebuild substitute.
#[serde(default = "default_true")]
append: bool,
/// Phase 25 — optional schema_fingerprint captured at seed time.
/// When the underlying dataset's schema changes, any entry whose
/// fingerprint doesn't match the new one is auto-retired via
/// retire_on_schema_drift. Caller-provided so the producer (the
/// scenario driver, the orchestrator) can pass the live fingerprint
/// without the gateway needing a second catalogd round trip.
#[serde(default)]
schema_fingerprint: Option<String>,
/// Phase 25 — optional hard expiry. RFC3339 timestamp. After this
/// moment the entry is skipped during boost computation (not
/// retired, just inactive). Useful for seasonal/temp contracts.
#[serde(default)]
valid_until: Option<String>,
}
/// Bootstrap / test-only: inject a playbook entry directly into
/// `playbook_memory` without going through `successful_playbooks`. Useful
/// when the source dataset has stale or phantom entries (as the initial
/// staffing seed did — names that don't correspond to real workers), and
/// you want to demonstrate the feedback loop with a known-good fixture.
///
/// Production path is always `/rebuild` — this endpoint is for operators
/// who need to prime the memory before real playbooks accumulate.
async fn seed_playbook_memory(
State(state): State<VectorState>,
Json(req): Json<SeedPlaybookRequest>,
) -> impl IntoResponse {
// Embed the entry through the same text shape `rebuild` uses so
// similarity math is comparable across seed + real entries.
let tmp_entry = playbook_memory::PlaybookEntry {
playbook_id: String::new(),
operation: req.operation.clone(),
approach: req.approach.clone(),
context: req.context.clone(),
timestamp: chrono::Utc::now().to_rfc3339(),
endorsed_names: req.endorsed_names.clone(),
city: None, state: None, embedding: None,
schema_fingerprint: None,
valid_until: None,
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
let text = format!(
"{} | {} | {} | fills: {}",
tmp_entry.operation, tmp_entry.approach, tmp_entry.context,
tmp_entry.endorsed_names.join(", "),
);
let resp = match state.ai_client.embed(EmbedRequest { texts: vec![text], model: None }).await {
Ok(r) => r,
Err(e) => return Err((StatusCode::BAD_GATEWAY, format!("embed seed: {e}"))),
};
if resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "embed returned nothing".into()));
}
let emb: Vec<f32> = resp.embeddings[0].iter().map(|&x| x as f32).collect();
// Parse city/state from the operation ("fill: Role xN in City, ST").
// The same parser lives in playbook_memory::rebuild; the logic is
// duplicated inline here rather than exposed via a shared helper
// because this seed path is stable and infrequently called.
let (city, state_) = {
let after_in = req.operation.split(" in ").nth(1).unwrap_or("");
let mut parts = after_in.splitn(2, ',');
let city = parts.next().map(|s| s.trim().to_string()).filter(|s| !s.is_empty());
let state = parts.next().map(|s| s.trim().chars().take_while(|c| c.is_ascii_alphabetic()).collect::<String>()).filter(|s| !s.is_empty());
(city, state)
};
if city.is_none() || state_.is_none() {
return Err((StatusCode::BAD_REQUEST,
"operation must match 'fill: Role xN in City, ST' shape".into()));
}
// Stable id: hash of timestamp + operation. Callers get the id back
// so they can reference it in citations.
let ts = chrono::Utc::now().to_rfc3339();
use sha2::{Digest, Sha256};
let mut h = Sha256::new();
h.update(ts.as_bytes());
h.update(b"|");
h.update(req.operation.as_bytes());
let bytes = h.finalize();
let pid = format!("pb-seed-{}", bytes.iter().take(8).map(|b| format!("{b:02x}")).collect::<String>());
let new_entry = playbook_memory::PlaybookEntry {
playbook_id: pid.clone(),
operation: req.operation,
approach: req.approach,
context: req.context,
timestamp: ts,
endorsed_names: req.endorsed_names,
city, state: state_,
embedding: Some(emb),
// Phase 25 — seed request may carry a fingerprint; if not, we
// default to None and the entry degrades to "no expiry signal"
// (never auto-retired on drift, but manual retirement still
// works). valid_until + retired_at start None.
schema_fingerprint: req.schema_fingerprint.clone(),
valid_until: req.valid_until.clone(),
retired_at: None,
retirement_reason: None,
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
// Phase 26 — when append=true (default), route through upsert so
// same-day re-seeds of the same operation merge instead of
// appending duplicates. When append=false, retain the old
// replace-all semantics for callers that want a hard reset.
if req.append {
match state.playbook_memory.upsert_entry(new_entry).await {
Ok(outcome) => {
let entries_after = state.playbook_memory.entry_count().await;
Ok(Json(serde_json::json!({
"outcome": outcome,
"entries_after": entries_after,
})))
}
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, format!("upsert: {e}"))),
}
} else {
if let Err(e) = state.playbook_memory.set_entries(vec![new_entry]).await {
return Err((StatusCode::INTERNAL_SERVER_ERROR, format!("persist: {e}")));
}
Ok(Json(serde_json::json!({
"outcome": { "mode": "replaced", "playbook_id": pid },
"entries_after": state.playbook_memory.entry_count().await,
})))
}
}
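// Hedged sketch: the inline "fill: Role xN in City, ST" parser above is
// duplicated across the seed / mark-failed / revise handlers. This mirror
// copy exists only so the parse rules are pinned by a test; keep it in
// sync with the handler bodies by hand. Fixtures are illustrative, not
// rows from a real dataset.
#[cfg(test)]
mod operation_parse_tests {
    fn parse_op(op: &str) -> (Option<String>, Option<String>) {
        let after_in = op.split(" in ").nth(1).unwrap_or("");
        let mut parts = after_in.splitn(2, ',');
        let city = parts.next().map(|s| s.trim().to_string()).filter(|s| !s.is_empty());
        let state = parts.next()
            .map(|s| s.trim().chars().take_while(|c| c.is_ascii_alphabetic()).collect::<String>())
            .filter(|s| !s.is_empty());
        (city, state)
    }
    #[test]
    fn parses_city_and_state() {
        assert_eq!(
            parse_op("fill: Welder x3 in Toledo, OH"),
            (Some("Toledo".into()), Some("OH".into())),
        );
    }
    #[test]
    fn missing_state_yields_none() {
        assert_eq!(parse_op("fill: Welder x3 in Toledo"), (Some("Toledo".into()), None));
    }
}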
async fn rebuild_playbook_memory(
State(state): State<VectorState>,
) -> impl IntoResponse {
match playbook_memory::rebuild(
&state.playbook_memory,
&state.ai_client,
&state.catalog,
&state.bucket_registry,
).await {
Ok(report) => Ok(Json(report)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
// Path 2 foundation — dump in-memory playbook_memory state to a fresh
// `successful_playbooks_live` dataset. Cheap to call (writes one parquet,
// updates one manifest), so /log can call it after every seed to keep the
// SQL-queryable surface honest without the destructive REPLACE bug that
// /ingest/file has.
async fn persist_playbook_memory_sql(
State(state): State<VectorState>,
) -> impl IntoResponse {
match playbook_memory::persist_to_sql(&state.playbook_memory, &state.catalog).await {
Ok(report) => Ok(Json(report)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct PatternsRequest {
query: String,
#[serde(default = "default_pattern_k")]
top_k_playbooks: usize,
/// Minimum frequency (0.0-1.0) for a trait to make the report.
/// Default 0.4 — at least 40% of examined workers must share it.
#[serde(default = "default_pattern_min_freq")]
min_trait_frequency: f32,
}
fn default_pattern_k() -> usize { 10 }
fn default_pattern_min_freq() -> f32 { 0.4 }
// Path 2 — meta-index discovery surface. "What did past similar fills
// have in common that I didn't ask about?" — surfaces signals like
// recurring certifications, skill clusters, archetype tendencies.
async fn discover_playbook_patterns(
State(state): State<VectorState>,
Json(req): Json<PatternsRequest>,
) -> impl IntoResponse {
match playbook_memory::discover_patterns(
&state.playbook_memory,
&state.ai_client,
&state.catalog,
&state.bucket_registry,
&req.query,
req.top_k_playbooks,
req.min_trait_frequency,
).await {
Ok(report) => Ok(Json(report)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[derive(Deserialize)]
struct MarkFailedRequest {
/// Operation text, same shape as seed: "fill: Role xN in City, ST"
operation: String,
/// Names of workers who didn't deliver on the fill.
failed_names: Vec<String>,
/// Short reason (no-show, fired, unreliable). Stored verbatim.
#[serde(default)]
reason: String,
}
async fn mark_playbook_failed(
State(state): State<VectorState>,
Json(req): Json<MarkFailedRequest>,
) -> impl IntoResponse {
// Parse city + state from the operation — mirrors seed's parser.
let after_in = req.operation.split(" in ").nth(1).unwrap_or("");
let mut parts = after_in.splitn(2, ',');
let city = parts.next().map(|s| s.trim().to_string()).filter(|s| !s.is_empty());
let state_ = parts.next().map(|s|
s.trim().chars().take_while(|c| c.is_ascii_alphabetic()).collect::<String>()
).filter(|s| !s.is_empty());
let (Some(city), Some(state_code)) = (city, state_) else {
return Err((StatusCode::BAD_REQUEST,
"operation must match 'fill: Role xN in City, ST' shape".into()));
};
let ts = chrono::Utc::now().to_rfc3339();
let records: Vec<playbook_memory::FailureRecord> = req.failed_names.iter()
.map(|n| playbook_memory::FailureRecord {
city: city.clone(), state: state_code.clone(), name: n.clone(),
reason: req.reason.clone(), timestamp: ts.clone(),
})
.collect();
match state.playbook_memory.mark_failures(records).await {
Ok(added) => Ok(Json(serde_json::json!({ "added": added, "city": city, "state": state_code }))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
async fn playbook_memory_stats(
State(state): State<VectorState>,
) -> impl IntoResponse {
let entries = state.playbook_memory.snapshot().await;
Json(serde_json::json!({
"entries": entries.len(),
"total_names_endorsed": entries.iter().map(|e| e.endorsed_names.len()).sum::<usize>(),
"entries_with_embeddings": entries.iter().filter(|e| e.embedding.is_some()).count(),
"sample": entries.iter().take(3).map(|e| serde_json::json!({
"id": e.playbook_id,
"operation": e.operation,
"city": e.city,
"state": e.state,
"endorsed": e.endorsed_names,
})).collect::<Vec<_>>(),
}))
}
#[derive(Deserialize)]
struct RetirePlaybookRequest {
/// Retire by playbook_id — exact match, single entry. Used for
/// manual operator retirement via the UI.
#[serde(default)]
playbook_id: Option<String>,
/// Retire by scope — city + state required, with a fingerprint
/// that entries must match to survive. Fingerprint mismatch → retire.
/// Use when a schema migration produces a new fingerprint and
/// historical playbooks need to be auto-retired.
#[serde(default)]
city: Option<String>,
#[serde(default)]
state: Option<String>,
#[serde(default)]
current_schema_fingerprint: Option<String>,
/// Human-readable reason stored on the retired entry.
reason: String,
}
/// Phase 25 retirement endpoint. Two modes:
/// {playbook_id, reason} → retire one
/// {city, state, current_schema_fingerprint, reason} → retire all
/// entries in scope whose
/// fingerprint differs
async fn retire_playbook_memory(
State(state): State<VectorState>,
Json(req): Json<RetirePlaybookRequest>,
) -> impl IntoResponse {
if let Some(id) = &req.playbook_id {
return match state.playbook_memory.retire_one(id, &req.reason).await {
Ok(found) => Ok(Json(serde_json::json!({ "mode": "by_id", "retired": if found { 1 } else { 0 } }))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
};
}
if let (Some(city), Some(state_code), Some(fp)) = (&req.city, &req.state, &req.current_schema_fingerprint) {
return match state.playbook_memory.retire_on_schema_drift(city, state_code, fp, &req.reason).await {
Ok(n) => Ok(Json(serde_json::json!({ "mode": "schema_drift", "retired": n, "city": city, "state": state_code }))),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
};
}
Err((StatusCode::BAD_REQUEST,
"supply either {playbook_id, reason} or {city, state, current_schema_fingerprint, reason}".into()))
}
/// Phase 27 — request body for `POST /playbook_memory/revise`. Same
/// shape as a seed request minus `append` (revise is always
/// append-semantics for a specific parent) plus `parent_id`. The new
/// version's `playbook_id` is derived from a hash of the timestamp,
/// parent id, and operation, so each accepted revise mints a fresh id
/// and appends a new version; retries are not idempotent.
#[derive(Deserialize)]
struct RevisePlaybookRequest {
parent_id: String,
operation: String,
approach: String,
context: String,
endorsed_names: Vec<String>,
#[serde(default)]
schema_fingerprint: Option<String>,
#[serde(default)]
valid_until: Option<String>,
}
/// Phase 27 — create a new version of an existing playbook. The parent
/// is marked superseded; the new entry inherits the chain via
/// `parent_id` and carries `version = parent.version + 1`. Errors with
/// 400 on a retired or already-superseded parent (must revise the tip
/// of the chain). Embeds the new text through the same shape as
/// `/seed` so cosine similarity stays comparable across rebuild + seed
/// + revise entries.
async fn revise_playbook_memory(
State(state): State<VectorState>,
Json(req): Json<RevisePlaybookRequest>,
) -> Result<Json<serde_json::Value>, (StatusCode, String)> {
let text = format!(
"{} | {} | {} | fills: {}",
req.operation, req.approach, req.context,
req.endorsed_names.join(", "),
);
let resp = state.ai_client.embed(EmbedRequest { texts: vec![text], model: None })
.await
.map_err(|e| (StatusCode::BAD_GATEWAY, format!("embed revise: {e}")))?;
if resp.embeddings.is_empty() {
return Err((StatusCode::BAD_GATEWAY, "embed returned nothing".into()));
}
let emb: Vec<f32> = resp.embeddings[0].iter().map(|&x| x as f32).collect();
let (city, state_) = {
let after_in = req.operation.split(" in ").nth(1).unwrap_or("");
let mut parts = after_in.splitn(2, ',');
let city = parts.next().map(|s| s.trim().to_string()).filter(|s| !s.is_empty());
let state = parts.next()
.map(|s| s.trim().chars().take_while(|c| c.is_ascii_alphabetic()).collect::<String>())
.filter(|s| !s.is_empty());
(city, state)
};
if city.is_none() || state_.is_none() {
return Err((StatusCode::BAD_REQUEST,
"operation must match 'fill: Role xN in City, ST' shape".into()));
}
let ts = chrono::Utc::now().to_rfc3339();
use sha2::{Digest, Sha256};
let mut h = Sha256::new();
h.update(ts.as_bytes());
h.update(b"|");
h.update(req.parent_id.as_bytes());
h.update(b"|");
h.update(req.operation.as_bytes());
let bytes = h.finalize();
let pid = format!("pb-rev-{}", bytes.iter().take(8).map(|b| format!("{b:02x}")).collect::<String>());
let new_entry = playbook_memory::PlaybookEntry {
playbook_id: pid.clone(),
operation: req.operation,
approach: req.approach,
context: req.context,
timestamp: ts,
endorsed_names: req.endorsed_names,
city, state: state_,
embedding: Some(emb),
schema_fingerprint: req.schema_fingerprint,
valid_until: req.valid_until,
retired_at: None,
retirement_reason: None,
// revise_entry overwrites these from the parent — values here
// are just placeholders so the struct is well-formed.
version: 1,
parent_id: None,
superseded_at: None,
superseded_by: None,
};
let outcome = state.playbook_memory.revise_entry(&req.parent_id, new_entry)
.await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
Ok(Json(serde_json::json!({
"outcome": outcome,
"entries_after": state.playbook_memory.entry_count().await,
})))
}
/// Phase 27 — return the full version chain containing `playbook_id`,
/// ordered root → tip. 404 if the id isn't present. The walker is
/// cycle-safe by construction (visited set per direction).
async fn playbook_memory_history(
State(state): State<VectorState>,
Path(playbook_id): Path<String>,
) -> Result<Json<serde_json::Value>, (StatusCode, String)> {
let chain = state.playbook_memory.history(&playbook_id).await;
if chain.is_empty() {
return Err((StatusCode::NOT_FOUND, format!("no playbook with id '{playbook_id}'")));
}
Ok(Json(serde_json::json!({
"playbook_id": playbook_id,
"versions": chain.len(),
"chain": chain,
})))
}
/// Phase 25 status endpoint — reports retirement counts so dashboards
/// can show "N playbooks retired (12 from 2026-05 schema migration)".
/// Phase 27 added `superseded` as a distinct counter.
async fn playbook_memory_status(
State(state): State<VectorState>,
) -> impl IntoResponse {
let (total, retired, superseded, failures) = state.playbook_memory.status_counts().await;
// `active` = entries eligible for boost. Retired and superseded are
// distinct exclusion reasons; subtract both. An entry can in principle
// be both retired AND superseded (e.g. revised then retired) so
// saturating_sub guards against underflow if that pathological case
// ever lands.
let inactive = retired + superseded;
Json(serde_json::json!({
"total": total,
"retired": retired,
"superseded": superseded,
"active": total.saturating_sub(inactive),
"failures": failures,
}))
}
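// Hedged sketch test for the `active` arithmetic above: an entry counted
// under both `retired` and `superseded` pushes the inactive sum past
// `total`, which is exactly the underflow that saturating_sub absorbs.
#[cfg(test)]
mod status_count_tests {
    #[test]
    fn active_saturates_when_exclusion_sets_overlap() {
        // 3 entries, one of them both retired and superseded → double count
        let (total, retired, superseded): (usize, usize, usize) = (3, 2, 2);
        assert_eq!(total.saturating_sub(retired + superseded), 0);
        // disjoint exclusions subtract normally
        assert_eq!(5usize.saturating_sub(1 + 2), 2);
    }
}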
async fn lance_recall_harness(
State(state): State<VectorState>,
Path(index_name): Path<String>,
Json(req): Json<LanceRecallRequest>,
) -> impl IntoResponse {
let t0 = std::time::Instant::now();
let harness_set = state.harness_store.load_for_index(&index_name, &req.harness).await
.map_err(|e| (StatusCode::NOT_FOUND, format!("harness: {e}")))?;
if !harness_set.ground_truth_built {
return Err((StatusCode::BAD_REQUEST,
"harness has no ground truth — run a regular /hnsw/trial first to compute it".into()));
}
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let k = req.top_k;
let mut per_query = Vec::with_capacity(harness_set.queries.len());
let mut latencies: Vec<f32> = Vec::with_capacity(harness_set.queries.len());
let mut recalls: Vec<f32> = Vec::with_capacity(harness_set.queries.len());
for q in &harness_set.queries {
let qv = match &q.query_embedding {
Some(v) => v,
None => continue,
};
let gt = match &q.ground_truth {
Some(gt) => gt,
None => continue,
};
let qt0 = std::time::Instant::now();
let hits = lance_store.search(
qv,
k,
Some(req.nprobes.unwrap_or(LANCE_DEFAULT_NPROBES)),
Some(req.refine_factor.unwrap_or(LANCE_DEFAULT_REFINE_FACTOR)),
).await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, format!("search: {e}")))?;
let lat_us = qt0.elapsed().as_micros() as f32;
let predicted: Vec<String> = hits.iter().map(|h| h.doc_id.clone()).collect();
let recall = harness::recall_at_k(&predicted, gt, k);
per_query.push(LanceRecallQuery {
query_id: q.id.clone(),
recall,
latency_us: lat_us,
hits_returned: hits.len(),
});
latencies.push(lat_us);
recalls.push(recall);
}
let mean_recall = if recalls.is_empty() { 0.0 } else {
recalls.iter().sum::<f32>() / recalls.len() as f32
};
latencies.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let p = |pct: f32| -> f32 {
if latencies.is_empty() { return 0.0; }
let idx = ((latencies.len() as f32 - 1.0) * pct).round() as usize;
latencies[idx.min(latencies.len() - 1)]
};
Ok(Json(LanceRecallResult {
index_name,
harness: req.harness,
queries: per_query.len(),
top_k: k,
mean_recall,
per_query,
latency_p50_us: p(0.50),
latency_p95_us: p(0.95),
total_duration_secs: t0.elapsed().as_secs_f32(),
}))
}
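// Hedged sketch: mirrors the nearest-rank percentile closure inside
// lance_recall_harness (idx = round((n - 1) * pct) over the sorted
// latencies) so its rounding behavior is pinned by a test. Keep in sync
// with the closure by hand; the latency values are illustrative.
#[cfg(test)]
mod recall_percentile_tests {
    fn pctl(sorted: &[f32], pct: f32) -> f32 {
        if sorted.is_empty() { return 0.0; }
        let idx = ((sorted.len() as f32 - 1.0) * pct).round() as usize;
        sorted[idx.min(sorted.len() - 1)]
    }
    #[test]
    fn nearest_rank_on_five_points() {
        let l = [10.0, 20.0, 30.0, 40.0, 50.0];
        assert_eq!(pctl(&l, 0.50), 30.0); // (5-1)*0.5 = 2.0 → index 2
        assert_eq!(pctl(&l, 0.95), 50.0); // (5-1)*0.95 = 3.8 → index 4
        assert_eq!(pctl(&[], 0.50), 0.0);
    }
}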
/// Build a scalar btree index on a column (typically `doc_id`). Makes
/// filter-pushdown queries O(log N) instead of full-fragment scan.
async fn lance_build_scalar_index(
State(state): State<VectorState>,
Path((index_name, column)): Path<(String, String)>,
) -> impl IntoResponse {
let lance_store = state.lance.store_for(&index_name).await
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
match lance_store.build_scalar_index(&column).await {
Ok(stats) => Ok(Json(stats)),
Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)),
}
}
#[cfg(test)]
mod extractor_tests {
use super::*;
#[test]
fn extract_target_geo_basic() {
let f = "role = 'Welder' AND city = 'Toledo' AND state = 'OH' AND CAST(availability AS DOUBLE) > 0.5";
assert_eq!(extract_target_geo(f), Some(("Toledo".into(), "OH".into())));
}
#[test]
fn extract_target_geo_missing_state_returns_none() {
let f = "role = 'Welder' AND city = 'Toledo'";
assert_eq!(extract_target_geo(f), None);
}
#[test]
fn extract_target_geo_word_boundary() {
// "civilian" contains "city" as a substring — must not match.
let f = "civilian_rank = 1 AND city = 'Toledo' AND state = 'OH'";
assert_eq!(extract_target_geo(f), Some(("Toledo".into(), "OH".into())));
}
#[test]
fn extract_target_role_basic() {
let f = "role = 'Welder' AND city = 'Toledo'";
assert_eq!(extract_target_role(f), Some("Welder".into()));
}
#[test]
fn extract_target_role_none_when_absent() {
let f = "city = 'Toledo' AND state = 'OH'";
assert_eq!(extract_target_role(f), None);
}
#[test]
fn extract_target_role_multi_word() {
let f = "role = 'Warehouse Associate' AND city = 'Chicago'";
assert_eq!(extract_target_role(f), Some("Warehouse Associate".into()));
}
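// Hedged additions: pin the parse_worker_chunk and grab_eq_value edge
// cases documented on their definitions; the worker fixture below is
// illustrative, not a row from a real dataset.
#[test]
fn parse_worker_chunk_basic() {
    let c = "Maria Lopez — Welder in Toledo, OH. Skills: TIG, MIG";
    assert_eq!(
        parse_worker_chunk(c),
        Some(("Maria Lopez".into(), "Toledo".into(), "OH".into())),
    );
}
#[test]
fn parse_worker_chunk_rejects_unshaped_text() {
    assert_eq!(parse_worker_chunk("no separator or location here"), None);
}
#[test]
fn grab_eq_value_tolerates_spacing_and_case() {
    assert_eq!(grab_eq_value("CITY='Ann Arbor'", "city"), Some("Ann Arbor".into()));
    assert_eq!(grab_eq_value("city  =  'Toledo'", "city"), Some("Toledo".into()));
}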
}