root dbe00d018f Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:

HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
  TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
  /hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44 ms → HNSW 873 µs
  at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
  80/30 = 1.00 recall in 230s
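
The locked-in default can be sketched as follows — a hypothetical reconstruction only, assuming `ec`/`es` in the trial results stand for the usual HNSW ef_construction / ef_search knobs (the real HnswConfig lives in vectord and may differ):

```rust
/// Hypothetical sketch of the parameterized HNSW config described above.
/// Field names are assumptions; only the 80/30 values come from the trials.
#[derive(Debug, Clone, PartialEq)]
pub struct HnswConfig {
    pub ef_construction: usize, // `ec` — candidate-list size at build time
    pub ef_search: usize,       // `es` — candidate-list size at query time
}

impl Default for HnswConfig {
    // ec=80 es=30: 1.00 recall@10 on resumes_100k_v2, per the trial journal.
    fn default() -> Self {
        Self { ef_construction: 80, ef_search: 30 }
    }
}
```
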

Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
  and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows

Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
  /etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
  rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
  anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
  /storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
  X-Lakehouse-Rescue-Used observability headers on fallback
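
The write-once append pattern (ADR-018) can be modeled in miniature — an illustrative toy only, with a `BTreeMap` standing in for the object store and all names invented for the sketch:

```rust
use std::collections::BTreeMap;

/// Toy model of the write-once append pattern described above: instead of
/// read-modify-write on a single journal object, each flush writes a new
/// immutable segment under a monotonically increasing key, and readers
/// merge segments in key order. A BTreeMap stands in for the object store.
struct AppendLog {
    prefix: String,
    next_seq: u64,
    buffer: Vec<String>,
    store: BTreeMap<String, Vec<String>>, // segment key -> segment contents
}

impl AppendLog {
    fn new(prefix: &str) -> Self {
        Self {
            prefix: prefix.to_string(),
            next_seq: 0,
            buffer: Vec::new(),
            store: BTreeMap::new(),
        }
    }

    fn append(&mut self, entry: &str) {
        self.buffer.push(entry.to_string());
    }

    /// Flush writes the buffered batch to a brand-new key — it never
    /// rewrites an existing object, so concurrent writers can't clobber
    /// each other's updates (the RMW anti-pattern this replaces).
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let key = format!("{}/segment-{:010}", self.prefix, self.next_seq);
        self.next_seq += 1;
        self.store.insert(key, std::mem::take(&mut self.buffer));
    }

    /// Readers merge all segments in key order.
    fn read_all(&self) -> Vec<String> {
        self.store.values().flatten().cloned().collect()
    }
}
```
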

Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
  into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
  (586 rows × 13 cols, 6 batches, 196ms end-to-end)
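
The batched pagination can be sketched as a query builder — function and parameter names here are assumptions for illustration, not the actual ingestd::pg_stream API:

```rust
/// Illustrative only: one way to build the batched ORDER BY + LIMIT/OFFSET
/// pages described above. A stable ORDER BY column makes OFFSET pagination
/// deterministic across batches.
fn page_query(table: &str, order_col: &str, batch_size: usize, page: usize) -> String {
    format!(
        "SELECT * FROM {table} ORDER BY {order_col} LIMIT {batch_size} OFFSET {offset}",
        offset = page * batch_size
    )
}
```
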

PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
  trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
  bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
  decision rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 01:50:05 -05:00

284 lines · 10 KiB · Rust

//! Multi-backend bucket registry — the federation foundation.
//!
//! Federation rule: every `ObjectRef` belongs to exactly one named bucket.
//! The registry resolves bucket names to `object_store` backends, handles
//! rescue-bucket fallback on read failure, writes every failure to the
//! error journal, and exposes a health summary for operators.
//!
//! Existing call sites can keep using `ops::*` with `registry.get(name)`.
//! New bucket-aware call sites use `registry.read_smart` / `write_smart`,
//! which handle fallback + journaling automatically.

use std::collections::HashMap;
use std::sync::Arc;

use object_store::ObjectStore;
use object_store::local::LocalFileSystem;
use serde::Serialize;
use shared::config::{BucketConfig, StorageConfig};
use shared::secrets::{BucketCredentials, SecretsProvider};

use crate::error_journal::{BucketErrorEvent, ErrorJournal};
/// A registered bucket — the store handle + its configuration.
pub struct BucketEntry {
    pub name: String,
    pub backend: String,
    pub store: Arc<dyn ObjectStore>,
    pub config: BucketConfig,
}

/// Read outcome — may have been rescued.
#[derive(Debug, Clone)]
pub struct ReadOutcome {
    pub data: bytes::Bytes,
    pub rescued: bool,
    pub original_bucket: String,
    pub served_by: String,
}

/// Summary entry for GET /storage/buckets.
#[derive(Debug, Clone, Serialize)]
pub struct BucketInfo {
    pub name: String,
    pub backend: String,
    pub reachable: bool,
    pub role: BucketRole,
}

#[derive(Debug, Clone, Serialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum BucketRole {
    Primary,
    Rescue,
    Profile,
    Tenant,
}

pub struct BucketRegistry {
    buckets: HashMap<String, Arc<BucketEntry>>,
    default: String,
    rescue: Option<String>,
    profile_root: String,
    journal: ErrorJournal,
}
impl BucketRegistry {
    /// Build the registry from storage config + secrets provider.
    /// Back-compat: if `buckets` is empty, synthesize a `primary` bucket from
    /// the legacy `root` field so pre-federation configs keep working.
    pub async fn from_config(
        cfg: &StorageConfig,
        secrets: Arc<dyn SecretsProvider>,
    ) -> Result<Self, String> {
        let mut buckets: HashMap<String, Arc<BucketEntry>> = HashMap::new();
        let bucket_configs: Vec<BucketConfig> = if cfg.buckets.is_empty() {
            vec![BucketConfig {
                name: "primary".to_string(),
                backend: "local".to_string(),
                root: Some(cfg.root.clone()),
                bucket: None,
                region: None,
                endpoint: None,
                secret_ref: None,
            }]
        } else {
            cfg.buckets.clone()
        };
        for bc in bucket_configs {
            let store = build_store(&bc, secrets.as_ref()).await?;
            let entry = Arc::new(BucketEntry {
                name: bc.name.clone(),
                backend: bc.backend.clone(),
                store,
                config: bc.clone(),
            });
            buckets.insert(bc.name.clone(), entry);
        }
        // Ensure `primary` always exists — it's where error journals live.
        if !buckets.contains_key("primary") {
            return Err("no bucket named 'primary' configured — required as error-journal home".into());
        }
        // Rescue bucket is optional but, if named, must exist.
        if let Some(r) = &cfg.rescue_bucket {
            if !buckets.contains_key(r) {
                return Err(format!("rescue_bucket '{r}' not found among configured buckets"));
            }
        }
        let journal = ErrorJournal::new(buckets.get("primary").unwrap().store.clone());
        let _ = journal.load_recent().await;
        Ok(Self {
            buckets,
            default: "primary".to_string(),
            rescue: cfg.rescue_bucket.clone(),
            profile_root: cfg.profile_root.clone(),
            journal,
        })
    }
    pub fn default_name(&self) -> &str { &self.default }
    pub fn rescue_name(&self) -> Option<&str> { self.rescue.as_deref() }
    pub fn journal(&self) -> &ErrorJournal { &self.journal }

    /// Resolve a bucket name to its object store. Existing call sites use
    /// this as a drop-in replacement for the old single-store pattern.
    pub fn get(&self, bucket: &str) -> Result<Arc<dyn ObjectStore>, String> {
        self.buckets
            .get(bucket)
            .map(|e| e.store.clone())
            .ok_or_else(|| format!("unknown bucket: {bucket}"))
    }

    /// The default bucket's store — use for code paths that don't yet know
    /// about buckets.
    pub fn default_store(&self) -> Arc<dyn ObjectStore> {
        self.buckets.get(&self.default).unwrap().store.clone()
    }

    /// List all registered buckets. Checks reachability by pulling the
    /// first entry of a `list` on each store.
    pub async fn list(&self) -> Vec<BucketInfo> {
        let mut out = Vec::with_capacity(self.buckets.len());
        for (name, entry) in &self.buckets {
            let reachable = probe(&entry.store).await;
            let role = self.classify(name);
            out.push(BucketInfo {
                name: name.clone(),
                backend: entry.backend.clone(),
                reachable,
                role,
            });
        }
        out.sort_by(|a, b| a.name.cmp(&b.name));
        out
    }

    fn classify(&self, name: &str) -> BucketRole {
        if name == self.default { BucketRole::Primary }
        else if Some(name) == self.rescue.as_deref() { BucketRole::Rescue }
        else if name.starts_with("profile:") { BucketRole::Profile }
        else { BucketRole::Tenant }
    }
    /// Read with rescue-bucket fallback. If the target bucket fails and a
    /// rescue is configured, retries against rescue. Records every failure
    /// in the error journal.
    pub async fn read_smart(&self, bucket: &str, key: &str) -> Result<ReadOutcome, String> {
        let target = self.buckets.get(bucket)
            .ok_or_else(|| format!("unknown bucket: {bucket}"))?;
        match crate::ops::get(&target.store, key).await {
            Ok(data) => Ok(ReadOutcome {
                data,
                rescued: false,
                original_bucket: bucket.to_string(),
                served_by: bucket.to_string(),
            }),
            Err(err) => {
                // Record failure regardless of what happens next.
                self.journal.append(BucketErrorEvent::new_read(bucket, key, &err)).await;
                // Try rescue, if any.
                if let Some(rescue_name) = &self.rescue {
                    if rescue_name != bucket {
                        if let Some(rescue) = self.buckets.get(rescue_name) {
                            match crate::ops::get(&rescue.store, key).await {
                                Ok(data) => {
                                    self.journal.mark_rescued_last(bucket, key).await;
                                    return Ok(ReadOutcome {
                                        data,
                                        rescued: true,
                                        original_bucket: bucket.to_string(),
                                        served_by: rescue_name.clone(),
                                    });
                                }
                                Err(rescue_err) => {
                                    return Err(format!(
                                        "read '{key}' failed in '{bucket}' ({err}); rescue '{rescue_name}' also failed ({rescue_err})"
                                    ));
                                }
                            }
                        }
                    }
                }
                Err(format!("read '{key}' failed in '{bucket}': {err}"))
            }
        }
    }
    /// Write always goes to target. No rescue fallback for writes — writes
    /// that silently vanish are the worst possible failure.
    pub async fn write_smart(
        &self,
        bucket: &str,
        key: &str,
        data: bytes::Bytes,
    ) -> Result<(), String> {
        let target = self.buckets.get(bucket)
            .ok_or_else(|| format!("unknown bucket: {bucket}"))?;
        match crate::ops::put(&target.store, key, data).await {
            Ok(()) => Ok(()),
            Err(err) => {
                self.journal.append(BucketErrorEvent::new_write(bucket, key, &err)).await;
                Err(format!("write '{key}' failed in '{bucket}': {err}"))
            }
        }
    }
}
/// Trivial reachability check — pull the first entry of an unbounded `list`.
async fn probe(store: &Arc<dyn ObjectStore>) -> bool {
    use futures::StreamExt;
    let mut stream = store.list(None);
    // Pulling the first item confirms the store responds. Empty bucket = ok.
    match stream.next().await {
        Some(Ok(_)) => true,
        None => true, // empty but reachable
        Some(Err(_)) => false,
    }
}
/// Build a concrete ObjectStore from a BucketConfig.
async fn build_store(
    bc: &BucketConfig,
    secrets: &dyn SecretsProvider,
) -> Result<Arc<dyn ObjectStore>, String> {
    match bc.backend.as_str() {
        "local" => {
            let root = bc.root.as_deref()
                .ok_or_else(|| format!("bucket '{}' is backend=local but has no root", bc.name))?;
            std::fs::create_dir_all(root)
                .map_err(|e| format!("create bucket dir '{root}': {e}"))?;
            let fs = LocalFileSystem::new_with_prefix(root)
                .map_err(|e| format!("init local bucket '{}': {e}", bc.name))?;
            Ok(Arc::new(fs))
        }
        "s3" => {
            let handle = bc.secret_ref.as_deref()
                .ok_or_else(|| format!("s3 bucket '{}' has no secret_ref", bc.name))?;
            let creds: BucketCredentials = secrets.resolve(handle).await?;
            let s3_bucket = bc.bucket.as_deref()
                .ok_or_else(|| format!("s3 bucket '{}' has no `bucket` name", bc.name))?;
            let region = bc.region.as_deref().unwrap_or("us-east-1");
            let mut builder = object_store::aws::AmazonS3Builder::new()
                .with_bucket_name(s3_bucket)
                .with_region(region)
                .with_access_key_id(&creds.access_key)
                .with_secret_access_key(&creds.secret_key);
            if let Some(endpoint) = &bc.endpoint {
                builder = builder.with_endpoint(endpoint);
            }
            let s3 = builder.build()
                .map_err(|e| format!("init s3 bucket '{}': {e}", bc.name))?;
            Ok(Arc::new(s3))
        }
        other => Err(format!("unknown backend '{other}' for bucket '{}'", bc.name)),
    }
}