Four shipped features and a PRD realignment, all measured end-to-end:
HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
/hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms → HNSW 873µs
at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
80/30 = 1.00 recall in 230s
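The locked defaults can be sketched as a plain config struct. This is an illustration only: the field names `ef_construction` / `ef_search` are assumed expansions of the "ec"/"es" knobs above, not the actual vectord definitions.

```rust
/// Hypothetical sketch of the parameterized HNSW config. The measured
/// sweet spot from the resumes_100k_v2 sweep (ec=80, es=30) becomes the
/// locked default.
#[derive(Debug, Clone, PartialEq)]
pub struct HnswConfig {
    pub ef_construction: usize, // "ec": candidate-list width during build
    pub ef_search: usize,       // "es": candidate-list width during query
}

impl Default for HnswConfig {
    fn default() -> Self {
        // ec=80/es=30: 1.00 recall@10 at 873µs per query in the trial sweep.
        Self { ef_construction: 80, ef_search: 30 }
    }
}
```

Lower `ef_construction` values are the cheap-build escape hatch: the 20/30 trial built in 8s at 0.96 recall versus 230s at 1.00 for the default.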
Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows
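The repair logic reduces to "footer wins over manifest". A minimal std-only model of that step, assuming a `FooterStats` struct standing in for the values read from the real parquet footers (the actual catalogd types are not shown here):

```rust
/// Drifted manifest fields that resync_from_parquet restores.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    row_count: u64,
    columns: Vec<String>,
}

/// Authoritative values as read from a parquet footer (hypothetical shape).
struct FooterStats {
    num_rows: u64,
    columns: Vec<String>,
}

/// Overwrite manifest fields with footer truth; returns true if anything
/// actually drifted, so callers can report which datasets were repaired.
fn resync_from_footer(manifest: &mut Manifest, footer: &FooterStats) -> bool {
    let drifted = manifest.row_count != footer.num_rows
        || manifest.columns != footer.columns;
    if drifted {
        manifest.row_count = footer.num_rows;
        manifest.columns = footer.columns.clone();
    }
    drifted
}
```

The footer is the right source of truth because parquet writes it last: a row count in a footer always describes data that was fully written.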
Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
/etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
/storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
X-Lakehouse-Rescue-Used observability headers on fallback
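The append-log pattern is the load-bearing piece here. A minimal in-memory sketch of the idea, with a `BTreeMap` standing in for the object store (the real storaged::append_log is not shown): every flush writes a fresh, uniquely-keyed segment rather than rewriting one journal object, so there is no read-modify-write window to lose.

```rust
use std::collections::BTreeMap;

/// Write-once batched append (ADR-018 sketch). Events buffer in memory;
/// each flush lands in a new segment key. Readers list and merge segments.
struct AppendLog {
    buffer: Vec<String>,
    seq: u64,
}

impl AppendLog {
    fn new() -> Self {
        Self { buffer: Vec::new(), seq: 0 }
    }

    fn append(&mut self, event: String) {
        self.buffer.push(event);
    }

    /// Flush the current batch to a fresh key — never rewrites an
    /// existing object, which is what kills the RMW anti-pattern.
    fn flush(&mut self, store: &mut BTreeMap<String, Vec<String>>) {
        if self.buffer.is_empty() {
            return;
        }
        let key = format!("journal/segment-{:08}.json", self.seq);
        store.insert(key, std::mem::take(&mut self.buffer));
        self.seq += 1;
    }
}
```

A compaction pass (the `/storage/errors/compact` endpoint above) can later merge segments into one, safely, because merging reads only immutable objects.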
Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
(586 rows × 13 cols, 6 batches, 196ms end-to-end)
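The lineage-redaction step can be sketched in a few lines of std-only Rust. This is an illustrative function, not the actual ingestd::pg_stream parser; it assumes the DSN is in the usual `scheme://user:password@host/db` URI form.

```rust
/// Redact the password component of a Postgres-style DSN before it is
/// recorded in lineage. Hypothetical sketch of the behavior described
/// above; the real parser handles more DSN shapes.
fn redact_dsn(dsn: &str) -> String {
    if let Some(scheme_end) = dsn.find("://") {
        let rest = &dsn[scheme_end + 3..];
        // userinfo is everything before the first '@'; password follows ':'.
        if let Some(at) = rest.find('@') {
            let userinfo = &rest[..at];
            if let Some(colon) = userinfo.find(':') {
                let user = &userinfo[..colon];
                return format!("{}://{}:***@{}", &dsn[..scheme_end], user, &rest[at + 1..]);
            }
        }
    }
    // No credentials present — nothing to redact.
    dsn.to_string()
}
```

DSNs without a password (or without userinfo at all) pass through unchanged, so lineage records stay byte-identical for already-safe connection strings.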
PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
decision rules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
284 lines
10 KiB
Rust
/// Multi-backend bucket registry — the federation foundation.
///
/// Federation rule: every `ObjectRef` belongs to exactly one named bucket.
/// The registry resolves bucket names to `object_store` backends, handles
/// rescue-bucket fallback on read failure, writes every failure to the
/// error journal, and exposes a health summary for operators.
///
/// Existing call sites can keep using `ops::*` with `registry.get(name)`.
/// New bucket-aware call sites use `registry.read_smart` / `write_smart`
/// which handle fallback + journaling automatically.

use object_store::ObjectStore;
use object_store::local::LocalFileSystem;
use serde::Serialize;
use shared::config::{BucketConfig, StorageConfig};
use shared::secrets::{BucketCredentials, SecretsProvider};
use std::collections::HashMap;
use std::sync::Arc;

use crate::error_journal::{BucketErrorEvent, ErrorJournal};

/// A registered bucket — the store handle + its configuration.
pub struct BucketEntry {
    pub name: String,
    pub backend: String,
    pub store: Arc<dyn ObjectStore>,
    pub config: BucketConfig,
}

/// Read outcome — may have been rescued.
#[derive(Debug, Clone)]
pub struct ReadOutcome {
    pub data: bytes::Bytes,
    pub rescued: bool,
    pub original_bucket: String,
    pub served_by: String,
}

/// Summary entry for GET /storage/buckets.
#[derive(Debug, Clone, Serialize)]
pub struct BucketInfo {
    pub name: String,
    pub backend: String,
    pub reachable: bool,
    pub role: BucketRole,
}

#[derive(Debug, Clone, Serialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum BucketRole {
    Primary,
    Rescue,
    Profile,
    Tenant,
}

pub struct BucketRegistry {
    buckets: HashMap<String, Arc<BucketEntry>>,
    default: String,
    rescue: Option<String>,
    profile_root: String,
    journal: ErrorJournal,
}

impl BucketRegistry {
    /// Build the registry from storage config + secrets provider.
    /// Back-compat: if `buckets` is empty, synthesize a `primary` bucket from
    /// the legacy `root` field so pre-federation configs keep working.
    pub async fn from_config(
        cfg: &StorageConfig,
        secrets: Arc<dyn SecretsProvider>,
    ) -> Result<Self, String> {
        let mut buckets: HashMap<String, Arc<BucketEntry>> = HashMap::new();

        let bucket_configs: Vec<BucketConfig> = if cfg.buckets.is_empty() {
            vec![BucketConfig {
                name: "primary".to_string(),
                backend: "local".to_string(),
                root: Some(cfg.root.clone()),
                bucket: None,
                region: None,
                endpoint: None,
                secret_ref: None,
            }]
        } else {
            cfg.buckets.clone()
        };

        for bc in bucket_configs {
            let store = build_store(&bc, secrets.as_ref()).await?;
            let entry = Arc::new(BucketEntry {
                name: bc.name.clone(),
                backend: bc.backend.clone(),
                store,
                config: bc.clone(),
            });
            buckets.insert(bc.name.clone(), entry);
        }

        // Ensure `primary` always exists — it's where error journals live.
        if !buckets.contains_key("primary") {
            return Err("no bucket named 'primary' configured — required as error-journal home".into());
        }

        // Rescue bucket is optional but, if named, must exist.
        if let Some(r) = &cfg.rescue_bucket {
            if !buckets.contains_key(r) {
                return Err(format!("rescue_bucket '{r}' not found among configured buckets"));
            }
        }

        let journal = ErrorJournal::new(buckets.get("primary").unwrap().store.clone());
        let _ = journal.load_recent().await;

        Ok(Self {
            buckets,
            default: "primary".to_string(),
            rescue: cfg.rescue_bucket.clone(),
            profile_root: cfg.profile_root.clone(),
            journal,
        })
    }

    pub fn default_name(&self) -> &str { &self.default }

    pub fn rescue_name(&self) -> Option<&str> { self.rescue.as_deref() }

    pub fn journal(&self) -> &ErrorJournal { &self.journal }

    /// Resolve a bucket name to its object store. Existing call sites use
    /// this as a drop-in replacement for the old single-store pattern.
    pub fn get(&self, bucket: &str) -> Result<Arc<dyn ObjectStore>, String> {
        self.buckets
            .get(bucket)
            .map(|e| e.store.clone())
            .ok_or_else(|| format!("unknown bucket: {bucket}"))
    }

    /// The default bucket's store — use for code paths that don't yet know
    /// about buckets.
    pub fn default_store(&self) -> Arc<dyn ObjectStore> {
        self.buckets.get(&self.default).unwrap().store.clone()
    }

    /// List all registered buckets. Checks reachability by pulling at most
    /// one item from a `list` on each.
    pub async fn list(&self) -> Vec<BucketInfo> {
        let mut out = Vec::with_capacity(self.buckets.len());
        for (name, entry) in &self.buckets {
            let reachable = probe(&entry.store).await;
            let role = self.classify(name);
            out.push(BucketInfo {
                name: name.clone(),
                backend: entry.backend.clone(),
                reachable,
                role,
            });
        }
        out.sort_by(|a, b| a.name.cmp(&b.name));
        out
    }

    fn classify(&self, name: &str) -> BucketRole {
        if name == self.default { BucketRole::Primary }
        else if Some(name) == self.rescue.as_deref() { BucketRole::Rescue }
        else if name.starts_with("profile:") { BucketRole::Profile }
        else { BucketRole::Tenant }
    }

    /// Read with rescue-bucket fallback. If the target bucket fails and a
    /// rescue is configured, retries against rescue. Records every failure
    /// in the error journal.
    pub async fn read_smart(&self, bucket: &str, key: &str) -> Result<ReadOutcome, String> {
        let target = self.buckets.get(bucket)
            .ok_or_else(|| format!("unknown bucket: {bucket}"))?;

        match crate::ops::get(&target.store, key).await {
            Ok(data) => Ok(ReadOutcome {
                data, rescued: false,
                original_bucket: bucket.to_string(),
                served_by: bucket.to_string(),
            }),
            Err(err) => {
                // Record failure regardless of what happens next.
                self.journal.append(BucketErrorEvent::new_read(bucket, key, &err)).await;

                // Try rescue, if any.
                if let Some(rescue_name) = &self.rescue {
                    if rescue_name != bucket {
                        if let Some(rescue) = self.buckets.get(rescue_name) {
                            match crate::ops::get(&rescue.store, key).await {
                                Ok(data) => {
                                    self.journal.mark_rescued_last(bucket, key).await;
                                    return Ok(ReadOutcome {
                                        data, rescued: true,
                                        original_bucket: bucket.to_string(),
                                        served_by: rescue_name.clone(),
                                    });
                                }
                                Err(rescue_err) => {
                                    return Err(format!(
                                        "read '{key}' failed in '{bucket}' ({err}); rescue '{rescue_name}' also failed ({rescue_err})"
                                    ));
                                }
                            }
                        }
                    }
                }
                Err(format!("read '{key}' failed in '{bucket}': {err}"))
            }
        }
    }

    /// Write always goes to target. No rescue fallback for writes — writes
    /// that silently vanish are the worst possible failure.
    pub async fn write_smart(
        &self,
        bucket: &str,
        key: &str,
        data: bytes::Bytes,
    ) -> Result<(), String> {
        let target = self.buckets.get(bucket)
            .ok_or_else(|| format!("unknown bucket: {bucket}"))?;
        match crate::ops::put(&target.store, key, data).await {
            Ok(()) => Ok(()),
            Err(err) => {
                self.journal.append(BucketErrorEvent::new_write(bucket, key, &err)).await;
                Err(format!("write '{key}' failed in '{bucket}': {err}"))
            }
        }
    }
}

/// Trivial reachability check — list and pull at most one item.
async fn probe(store: &Arc<dyn ObjectStore>) -> bool {
    use futures::StreamExt;
    let mut stream = store.list(None);
    // Pulling the first item confirms the store responds. Empty bucket = ok.
    match stream.next().await {
        Some(Ok(_)) => true,
        None => true, // empty but reachable
        Some(Err(_)) => false,
    }
}

/// Build a concrete ObjectStore from a BucketConfig.
async fn build_store(
    bc: &BucketConfig,
    secrets: &dyn SecretsProvider,
) -> Result<Arc<dyn ObjectStore>, String> {
    match bc.backend.as_str() {
        "local" => {
            let root = bc.root.as_deref()
                .ok_or_else(|| format!("bucket '{}' is backend=local but has no root", bc.name))?;
            std::fs::create_dir_all(root)
                .map_err(|e| format!("create bucket dir '{root}': {e}"))?;
            let fs = LocalFileSystem::new_with_prefix(root)
                .map_err(|e| format!("init local bucket '{}': {e}", bc.name))?;
            Ok(Arc::new(fs))
        }
        "s3" => {
            let handle = bc.secret_ref.as_deref()
                .ok_or_else(|| format!("s3 bucket '{}' has no secret_ref", bc.name))?;
            let creds: BucketCredentials = secrets.resolve(handle).await?;
            let s3_bucket = bc.bucket.as_deref()
                .ok_or_else(|| format!("s3 bucket '{}' has no `bucket` name", bc.name))?;
            let region = bc.region.as_deref().unwrap_or("us-east-1");

            let mut builder = object_store::aws::AmazonS3Builder::new()
                .with_bucket_name(s3_bucket)
                .with_region(region)
                .with_access_key_id(&creds.access_key)
                .with_secret_access_key(&creds.secret_key);
            if let Some(endpoint) = &bc.endpoint {
                builder = builder.with_endpoint(endpoint);
            }
            let s3 = builder.build()
                .map_err(|e| format!("init s3 bucket '{}': {e}", bc.name))?;
            Ok(Arc::new(s3))
        }
        other => Err(format!("unknown backend '{other}' for bucket '{}'", bc.name)),
    }
}