Phase B: Lance pilot — hybrid decision with measured benchmark

Standalone benchmark crate `crates/lance-bench` running Lance 4.0 against our Parquet+HNSW at 100K × 768d (resumes_100k_v2) measured 8 dimensions. Results (see docs/ADR-019-vector-storage.md for full scorecard): Cold load: Parquet 0.17s vs Lance 0.13s (tie — not ≥2× threshold) Disk size: 330.3 MB vs 330.4 MB (tie) Search p50: 873us vs 2229us (Parquet 2.55× faster) Search p95: 1413us vs 4998us (Parquet 3.54× faster) Index build: 230s (ec=80) vs 16s (IVF_PQ) (Lance 14× faster) Random access: 35ms (scan) vs 311us (Lance 112× faster) Append 10K rows: full rewrite vs 0.08s/+31MB (Lance structural win) Decision (ADR-019): hybrid, not migrate-or-reject. - Parquet+HNSW stays primary — our HNSW at ec=80 es=30 recall=1.00 is 2.55× faster than Lance IVF_PQ at 100K in-RAM scale - Lance joins as second backend per-profile for workloads where it wins architecturally: random row access (RAG text fetch), append-heavy pipelines (Phase C), hot-swap generations (Phase 16, 14× faster builds), and indexes past the ~5M RAM ceiling - Phase 17 ModelProfile gets vector_backend: Parquet | Lance field - Ceiling table in PRD updated — 5M ceiling now says "switch to Lance" instead of "migrate" since Lance runs alongside, not instead of Isolation: lance-bench is a standalone workspace crate with its own dep tree (Lance pulls DataFusion 52 + Arrow 57 incompatible with main stack DataFusion 47 + Arrow 55). Kept off the critical path until API is stable enough to promote into vectord::lance_store. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:37:11 -05:00 · 2026-04-16 02:37:11 -05:00 · 76f6fba5de
commit 76f6fba5de
parent dbe00d018f
7 changed files with 3557 additions and 363 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@ -12,6 +12,7 @@ members = [
    "crates/journald",
    "crates/gateway",
    "crates/ui",
+    "crates/lance-bench",
 ]

 [workspace.dependencies]
--- a/crates/lance-bench/Cargo.toml
+++ b/crates/lance-bench/Cargo.toml
@ -0,0 +1,42 @@
+[package]
+name = "lance-bench"
+version = "0.1.0"
+edition = "2024"
+
+# Standalone pilot for Phase B (see docs/EXECUTION_PLAN.md).
+# Deliberately NOT sharing workspace deps — Lance 4.x pulls in its own
+# DataFusion and Arrow versions incompatible with the rest of the stack.
+# Isolating the pilot means we don't force a workspace-wide upgrade until
+# we've decided Lance is worth it.
+
+[dependencies]
+# Only the features we actually need — the default brings in AWS/Azure/GCP/HF etc
+# which is ~200 extra crates we don't care about for a local pilot.
+lance = { version = "4.0", default-features = false }
+# Lance exposes DatasetIndexExt, IndexType, and IvfBuildParams through
+# its sub-crates which must be imported directly — lance itself doesn't
+# re-export them at a convenient path.
+lance-index = { version = "4.0", default-features = false }
+lance-linalg = { version = "4.0", default-features = false }
+
+# Arrow re-exported by Lance; pin to a range Lance picks so types match.
+arrow = "57"
+arrow-array = "57"
+arrow-schema = "57"
+
+# Also need to read the EXISTING Parquet vector files so we can compare.
+# These live in data/vectors/*.parquet. Lance's internal Parquet reading
+# might differ from ours; using our format's Arrow/Parquet versions for
+# the read side keeps the inputs identical.
+parquet = "57"
+
+tokio = { version = "1", features = ["full"] }
+futures = "0.3"
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+anyhow = "1"
+bytes = "1"
+
+[[bin]]
+name = "lance-bench"
+path = "src/main.rs"
--- a/crates/lance-bench/src/main.rs
+++ b/crates/lance-bench/src/main.rs
@ -0,0 +1,633 @@
+//! Phase B: Lance pilot benchmark.
+//!
+//! Standalone binary that compares Lance vector storage against our
+//! Parquet-with-binary-blob + in-RAM HNSW approach. See
+//! docs/EXECUTION_PLAN.md for the decision rules this fuels.
+//!
+//! Inputs:
+//!   data/vectors/resumes_100k_v2.parquet — existing 100K × 768d embeddings
+//!
+//! Output:
+//!   A JSON report printed to stdout with measurements for:
+//!     - Cold load time (parquet → arrow) vs Lance open + scan
+//!     - Disk size
+//!     - Vector search latency (p50 / p95 / p99)
+//!     - Single-row random access
+//!     - Append cost (adding 10K rows)
+//!
+//! Usage:
+//!   cargo run --bin lance-bench -- \
+//!     --parquet data/vectors/resumes_100k_v2.parquet \
+//!     --lance-out /tmp/lance_resumes_100k_v2 \
+//!     --json-out /tmp/lance_bench.json
+
+use anyhow::{Context, Result};
+use arrow_array::{Array, ArrayRef, BinaryArray, FixedSizeListArray, Float32Array, RecordBatch, RecordBatchIterator};
+use arrow_schema::{DataType, Field, Schema};
+use serde::Serialize;
+use std::sync::Arc;
+use std::time::Instant;
+
+#[derive(Debug, Serialize)]
+struct BenchReport {
+    vectors: usize,
+    dimensions: usize,
+    parquet_path: String,
+    lance_path: String,
+
+    // Parquet baseline
+    parquet_disk_bytes: u64,
+    parquet_cold_load_secs: f32,
+
+    // Lance numbers
+    lance_write_secs: f32,
+    lance_disk_bytes: u64,
+    lance_cold_open_secs: f32,
+
+    // Index + search
+    lance_index_build_secs: Option<f32>,
+    lance_index_disk_bytes: Option<u64>,
+    lance_search_p50_us: Option<f32>,
+    lance_search_p95_us: Option<f32>,
+    lance_search_p99_us: Option<f32>,
+
+    // Architectural features Parquet+sidecar can't cheaply do
+    lance_random_row_access_us: Option<f32>,   // fetch one row by row_id
+    parquet_random_row_access_us: Option<f32>, // for comparison — full scan cost
+    lance_append_10k_secs: Option<f32>,        // add 10K new rows
+    lance_append_disk_bytes_added: Option<u64>,
+
+    // Head-to-head reference (from our own measurements)
+    reference_hnsw_p50_us: f32,
+    reference_hnsw_p95_us: f32,
+    reference_brute_force_us: f32,
+    reference_hnsw_build_secs: f32,
+}
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    // Simple positional args: parquet_in, lance_out.
+    let args: Vec<String> = std::env::args().collect();
+    let parquet_path = args
+        .get(1)
+        .cloned()
+        .unwrap_or_else(|| "data/vectors/resumes_100k_v2.parquet".to_string());
+    let lance_path = args
+        .get(2)
+        .cloned()
+        .unwrap_or_else(|| "/tmp/lance_bench_dataset".to_string());
+
+    eprintln!("=== Phase B Lance pilot ===");
+    eprintln!("input  parquet: {}", parquet_path);
+    eprintln!("output lance:   {}", lance_path);
+
+    // --- 1. Cold-load the existing Parquet vector index into memory
+    eprintln!("\n[1/4] reading Parquet baseline...");
+    let t0 = Instant::now();
+    let (schema, batches, total_rows) = read_parquet_vectors(&parquet_path)
+        .context("read parquet")?;
+    let parquet_cold_load_secs = t0.elapsed().as_secs_f32();
+    let parquet_disk_bytes = std::fs::metadata(&parquet_path)?.len();
+
+    let dims = detect_vector_dims(&batches)?;
+    eprintln!(
+        "  loaded {} rows, {} columns, vectors={}d, disk={:.1} MB, cold load={:.2}s",
+        total_rows,
+        schema.fields().len(),
+        dims,
+        parquet_disk_bytes as f64 / 1_000_000.0,
+        parquet_cold_load_secs,
+    );
+
+    // --- 2. Convert from binary-blob-of-f32 to Lance's FixedSizeList<Float32>
+    eprintln!("\n[2/4] converting binary-blob vectors to Arrow FixedSizeList...");
+    let t0 = Instant::now();
+    let (lance_schema, lance_batches) = convert_to_fixed_size_list(&schema, batches, dims)?;
+    eprintln!("  conversion took {:.2}s", t0.elapsed().as_secs_f32());
+
+    // --- 3. Write as Lance dataset
+    eprintln!("\n[3/4] writing Lance dataset...");
+    let t0 = Instant::now();
+    // Clean up any prior run
+    let _ = std::fs::remove_dir_all(&lance_path);
+    write_lance_dataset(&lance_path, lance_schema.clone(), lance_batches).await?;
+    let lance_write_secs = t0.elapsed().as_secs_f32();
+    let lance_disk_bytes = dir_size_bytes(&lance_path);
+    eprintln!(
+        "  write took {:.2}s, disk={:.1} MB",
+        lance_write_secs,
+        lance_disk_bytes as f64 / 1_000_000.0,
+    );
+
+    // --- 4. Cold open + scan the Lance dataset
+    eprintln!("\n[4/6] cold-opening Lance dataset...");
+    let t0 = Instant::now();
+    let scanned_rows = cold_open_and_scan_lance(&lance_path).await?;
+    let lance_cold_open_secs = t0.elapsed().as_secs_f32();
+    eprintln!(
+        "  open + full scan: {} rows in {:.2}s",
+        scanned_rows, lance_cold_open_secs,
+    );
+
+    // --- 5. Build a vector index on the Lance dataset
+    eprintln!("\n[5/6] building Lance vector index (IVF_PQ)...");
+    let t0 = Instant::now();
+    let index_built = build_lance_vector_index(&lance_path, dims).await;
+    let (lance_index_build_secs, lance_index_disk_bytes) = match index_built {
+        Ok(()) => {
+            let secs = t0.elapsed().as_secs_f32();
+            let disk = dir_size_bytes(&lance_path) - lance_disk_bytes;
+            eprintln!("  built in {:.2}s, index adds {:.1} MB on disk", secs, disk as f64 / 1e6);
+            (Some(secs), Some(disk))
+        }
+        Err(e) => {
+            eprintln!("  index build failed: {e:#}");
+            (None, None)
+        }
+    };
+
+    // --- 6. Run search queries, measure latency
+    eprintln!("\n[6/6] running vector search benchmarks...");
+    let search_stats = if lance_index_build_secs.is_some() {
+        run_search_benchmarks(&lance_path, dims).await.ok()
+    } else {
+        None
+    };
+    let (lance_search_p50, lance_search_p95, lance_search_p99) = match search_stats {
+        Some((p50, p95, p99)) => {
+            eprintln!("  p50={:.0}us p95={:.0}us p99={:.0}us", p50, p95, p99);
+            (Some(p50), Some(p95), Some(p99))
+        }
+        None => (None, None, None),
+    };
+
+    // --- Random access comparison
+    eprintln!("\n[7/8] random row access — Lance vs full-scan Parquet...");
+    let lance_random = measure_random_access_lance(&lance_path).await.ok();
+    let parquet_random = measure_random_access_parquet(&parquet_path).ok();
+    if let Some(us) = lance_random {
+        eprintln!("  Lance random-fetch avg: {:.0}us", us);
+    }
+    if let Some(us) = parquet_random {
+        eprintln!("  Parquet full-scan-to-row avg: {:.0}us", us);
+    }
+
+    // --- Append cost
+    eprintln!("\n[8/8] append 10K new rows to existing dataset...");
+    let t0 = Instant::now();
+    let pre_append_bytes = dir_size_bytes(&lance_path);
+    let append_result = append_10k_rows(&lance_path, dims).await;
+    let (lance_append_secs, lance_append_bytes) = match append_result {
+        Ok(()) => {
+            let secs = t0.elapsed().as_secs_f32();
+            let bytes = dir_size_bytes(&lance_path).saturating_sub(pre_append_bytes);
+            eprintln!("  append took {:.2}s, added {:.1} MB", secs, bytes as f64 / 1e6);
+            (Some(secs), Some(bytes))
+        }
+        Err(e) => {
+            eprintln!("  append failed: {e:#}");
+            (None, None)
+        }
+    };
+
+    // --- Report
+    let report = BenchReport {
+        vectors: total_rows,
+        dimensions: dims,
+        parquet_path: parquet_path.clone(),
+        lance_path: lance_path.clone(),
+        parquet_disk_bytes,
+        parquet_cold_load_secs,
+        lance_write_secs,
+        lance_disk_bytes,
+        lance_cold_open_secs,
+        lance_index_build_secs,
+        lance_index_disk_bytes,
+        lance_search_p50_us: lance_search_p50,
+        lance_search_p95_us: lance_search_p95,
+        lance_search_p99_us: lance_search_p99,
+        lance_random_row_access_us: lance_random,
+        parquet_random_row_access_us: parquet_random,
+        lance_append_10k_secs: lance_append_secs,
+        lance_append_disk_bytes_added: lance_append_bytes,
+        // From our Phase 15 trial on the SAME index (ec=80 es=30, recall=1.00):
+        reference_hnsw_p50_us: 873.0,
+        reference_hnsw_p95_us: 1413.0,
+        reference_brute_force_us: 43983.0,
+        reference_hnsw_build_secs: 230.0,
+    };
+
+    let json = serde_json::to_string_pretty(&report)?;
+    println!("{}", json);
+
+    eprintln!("\n=== Summary ===");
+    eprintln!("  Parquet cold load:  {:.2}s", report.parquet_cold_load_secs);
+    eprintln!("  Lance cold open:    {:.2}s  ({})",
+        report.lance_cold_open_secs,
+        format_ratio(report.parquet_cold_load_secs, report.lance_cold_open_secs));
+    eprintln!("  Parquet disk:       {:.1} MB", report.parquet_disk_bytes as f64 / 1e6);
+    eprintln!("  Lance disk:         {:.1} MB  ({})",
+        report.lance_disk_bytes as f64 / 1e6,
+        format_ratio(report.parquet_disk_bytes as f32, report.lance_disk_bytes as f32));
+    if let (Some(p50), Some(p95)) = (report.lance_search_p50_us, report.lance_search_p95_us) {
+        eprintln!("  Lance search p50:   {:.0}us  vs our HNSW {:.0}us  ({})",
+            p50, report.reference_hnsw_p50_us,
+            format_ratio(report.reference_hnsw_p50_us, p50));
+        eprintln!("  Lance search p95:   {:.0}us  vs our HNSW {:.0}us  ({})",
+            p95, report.reference_hnsw_p95_us,
+            format_ratio(report.reference_hnsw_p95_us, p95));
+        eprintln!("  Speedup vs brute force: {:.1}× (Lance) vs {:.1}× (HNSW)",
+            report.reference_brute_force_us / p50,
+            report.reference_brute_force_us / report.reference_hnsw_p50_us);
+    }
+    if let Some(build) = report.lance_index_build_secs {
+        eprintln!("  Index build:        {:.1}s (Lance IVF_PQ) vs {:.0}s (our HNSW ec=80)  ({}× faster)",
+            build, report.reference_hnsw_build_secs, report.reference_hnsw_build_secs / build);
+    }
+    if let (Some(lance_us), Some(parquet_us)) = (report.lance_random_row_access_us, report.parquet_random_row_access_us) {
+        eprintln!("  Random row access:  {:.0}us (Lance) vs {:.0}us (Parquet scan)  ({})",
+            lance_us, parquet_us, format_ratio(parquet_us, lance_us));
+    }
+    if let Some(append_secs) = report.lance_append_10k_secs {
+        eprintln!("  Append 10K rows:    {:.2}s (Lance native)  [Parquet would require full rewrite]",
+            append_secs);
+    }
+
+    Ok(())
+}
+
+fn format_ratio(baseline: f32, candidate: f32) -> String {
+    if candidate == 0.0 { return "inf".into(); }
+    let ratio = baseline / candidate;
+    if ratio >= 1.0 {
+        format!("{:.2}× faster/smaller", ratio)
+    } else {
+        format!("{:.2}× slower/larger", 1.0 / ratio)
+    }
+}
+
+fn dir_size_bytes(path: &str) -> u64 {
+    fn recurse(p: &std::path::Path) -> u64 {
+        let Ok(meta) = std::fs::metadata(p) else { return 0; };
+        if meta.is_file() { return meta.len(); }
+        let Ok(entries) = std::fs::read_dir(p) else { return 0; };
+        entries
+            .filter_map(|e| e.ok())
+            .map(|e| recurse(&e.path()))
+            .sum()
+    }
+    recurse(std::path::Path::new(path))
+}
+
+/// Read the existing vector Parquet (binary-blob format: source, doc_id,
+/// chunk_idx, chunk_text, vector as Binary bytes).
+fn read_parquet_vectors(path: &str) -> Result<(Arc<Schema>, Vec<RecordBatch>, usize)> {
+    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
+    use std::fs::File;
+
+    let file = File::open(path).with_context(|| format!("open {path}"))?;
+    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;
+    let schema = builder.schema().clone();
+    let reader = builder.build()?;
+    let batches: Vec<RecordBatch> = reader.collect::<Result<Vec<_>, _>>()?;
+    let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
+    Ok((schema, batches, rows))
+}
+
+fn detect_vector_dims(batches: &[RecordBatch]) -> Result<usize> {
+    for batch in batches {
+        let vector_col_idx = batch
+            .schema()
+            .index_of("vector")
+            .context("no 'vector' column in parquet")?;
+        let col = batch.column(vector_col_idx);
+        if let Some(binary) = col.as_any().downcast_ref::<BinaryArray>() {
+            for i in 0..binary.len() {
+                if !binary.is_null(i) {
+                    let bytes = binary.value(i);
+                    return Ok(bytes.len() / 4); // f32 = 4 bytes
+                }
+            }
+        }
+    }
+    anyhow::bail!("could not determine vector dimensions")
+}
+
+/// Convert our binary-blob vector representation into Arrow's native
+/// FixedSizeList<Float32> — that's what Lance expects for vector columns.
+fn convert_to_fixed_size_list(
+    schema: &Arc<Schema>,
+    batches: Vec<RecordBatch>,
+    dims: usize,
+) -> Result<(Arc<Schema>, Vec<RecordBatch>)> {
+    // New schema keeps everything identical but replaces the vector column
+    // with a FixedSizeList<Float32, dims>.
+    let new_fields: Vec<Arc<Field>> = schema
+        .fields()
+        .iter()
+        .map(|f| {
+            if f.name() == "vector" {
+                Arc::new(Field::new(
+                    "vector",
+                    DataType::FixedSizeList(
+                        Arc::new(Field::new("item", DataType::Float32, true)),
+                        dims as i32,
+                    ),
+                    false,
+                ))
+            } else {
+                f.clone()
+            }
+        })
+        .collect();
+    let new_schema = Arc::new(Schema::new(new_fields));
+
+    let mut new_batches = Vec::with_capacity(batches.len());
+    for batch in batches {
+        let vector_idx = batch.schema().index_of("vector")?;
+        let mut new_arrays: Vec<ArrayRef> = Vec::with_capacity(batch.num_columns());
+        for (i, col) in batch.columns().iter().enumerate() {
+            if i == vector_idx {
+                let binary = col
+                    .as_any()
+                    .downcast_ref::<BinaryArray>()
+                    .context("vector column must be Binary")?;
+                let fsl = binary_to_fixed_size_list(binary, dims)?;
+                new_arrays.push(Arc::new(fsl));
+            } else {
+                new_arrays.push(col.clone());
+            }
+        }
+        new_batches.push(RecordBatch::try_new(new_schema.clone(), new_arrays)?);
+    }
+
+    Ok((new_schema, new_batches))
+}
+
+fn binary_to_fixed_size_list(binary: &BinaryArray, dims: usize) -> Result<FixedSizeListArray> {
+    let n = binary.len();
+    let mut all_floats: Vec<f32> = Vec::with_capacity(n * dims);
+    for i in 0..n {
+        if binary.is_null(i) {
+            all_floats.extend(std::iter::repeat(0.0).take(dims));
+            continue;
+        }
+        let bytes = binary.value(i);
+        if bytes.len() != dims * 4 {
+            anyhow::bail!(
+                "row {} has {} bytes, expected {} ({} × f32)",
+                i, bytes.len(), dims * 4, dims,
+            );
+        }
+        for chunk in bytes.chunks_exact(4) {
+            all_floats.push(f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
+        }
+    }
+    let values = Float32Array::from(all_floats);
+    let field = Arc::new(Field::new("item", DataType::Float32, true));
+    FixedSizeListArray::try_new(field, dims as i32, Arc::new(values), None)
+        .context("build FixedSizeListArray")
+}
+
+/// Write batches into a Lance dataset at the given path.
+async fn write_lance_dataset(
+    path: &str,
+    schema: Arc<Schema>,
+    batches: Vec<RecordBatch>,
+) -> Result<()> {
+    use lance::dataset::{Dataset, WriteParams};
+
+    let reader = RecordBatchIterator::new(batches.into_iter().map(Ok), schema);
+    Dataset::write(reader, path, Some(WriteParams::default()))
+        .await
+        .context("Dataset::write")?;
+    Ok(())
+}
+
+/// Open a Lance dataset cold (from disk) and scan it fully — measuring the
+/// equivalent of our "load embeddings from Parquet" cost.
+async fn cold_open_and_scan_lance(path: &str) -> Result<usize> {
+    use futures::StreamExt;
+    use lance::dataset::Dataset;
+
+    let dataset = Dataset::open(path).await.context("Dataset::open")?;
+    let scanner = dataset.scan();
+    let mut stream = scanner.try_into_stream().await?;
+    let mut total = 0usize;
+    while let Some(batch) = stream.next().await {
+        let batch = batch?;
+        total += batch.num_rows();
+    }
+    Ok(total)
+}
+
+/// Build an IVF_PQ vector index on the `vector` column. IVF_PQ (Inverted File
+/// with Product Quantization) is Lance's native ANN index — comparable to
+/// HNSW in intent, but on-disk and compatible with Lance's random-access
+/// model.
+async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> {
+    use lance::dataset::Dataset;
+    use lance::index::vector::VectorIndexParams;
+    use lance_index::{DatasetIndexExt, IndexType};
+    use lance_linalg::distance::MetricType;
+
+    let mut dataset = Dataset::open(path).await?;
+
+    // IVF_PQ with ~sqrt(N) partitions is a reasonable default for 100K.
+    // num_sub_vectors must divide dims evenly: 768/48 = 16 dims per subvector.
+    // num_bits = 8 gives 256 codes per subvector (good recall/size trade).
+    // max_iterations = 50 is plenty for this scale.
+    let params = VectorIndexParams::ivf_pq(
+        316,               // num_partitions (~sqrt(100000))
+        8,                 // num_bits
+        48,                // num_sub_vectors
+        MetricType::Cosine,
+        50,                // max_iterations
+    );
+
+    dataset
+        .create_index(
+            &["vector"],
+            IndexType::Vector,
+            Some("vec_idx".into()),
+            &params,
+            true,
+        )
+        .await
+        .context("create_index")?;
+
+    Ok(())
+}
+
+/// Run N vector searches against the Lance dataset and return (p50, p95, p99) latencies in us.
+/// Uses a handful of random rows as queries — same pattern as our harness::synthetic_from_chunks.
+async fn run_search_benchmarks(path: &str, _dims: usize) -> Result<(f32, f32, f32)> {
+    use futures::StreamExt;
+    use lance::dataset::Dataset;
+
+    let dataset = Dataset::open(path).await?;
+
+    // Pick 20 representative query vectors from the data itself.
+    // (Synthetic — same pattern as our existing harness.)
+    let query_vectors = sample_query_vectors(&dataset, 20).await?;
+
+    let mut latencies_us: Vec<f32> = Vec::with_capacity(query_vectors.len());
+    for (i, qv) in query_vectors.iter().enumerate() {
+        let qarr = Arc::new(Float32Array::from(qv.clone())) as ArrayRef;
+
+        let t0 = Instant::now();
+        let mut scanner = dataset.scan();
+        scanner
+            .nearest("vector", qarr.as_any().downcast_ref::<Float32Array>().unwrap(), 10)
+            .context("scanner.nearest")?;
+        let mut stream = scanner.try_into_stream().await?;
+        let mut hits = 0;
+        while let Some(batch) = stream.next().await {
+            let batch = batch?;
+            hits += batch.num_rows();
+        }
+        let us = t0.elapsed().as_micros() as f32;
+        latencies_us.push(us);
+        if i == 0 {
+            eprintln!("  first query: {} hits in {:.0}us (includes any lazy init)", hits, us);
+        }
+    }
+
+    latencies_us.sort_by(|a, b| a.partial_cmp(b).unwrap());
+    let p = |pct: f32| -> f32 {
+        let idx = ((latencies_us.len() as f32 - 1.0) * pct).round() as usize;
+        latencies_us[idx.min(latencies_us.len() - 1)]
+    };
+    Ok((p(0.50), p(0.95), p(0.99)))
+}
+
+/// Random row access via Lance's `take` — fetch 20 random rows by index, measure avg latency.
+async fn measure_random_access_lance(path: &str) -> Result<f32> {
+    use lance::dataset::Dataset;
+    let dataset = Dataset::open(path).await?;
+    let n = dataset.count_rows(None).await?;
+    let indices: Vec<u64> = (0..20).map(|i| ((i as u64) * (n as u64 / 23)) % (n as u64)).collect();
+
+    // Full-schema projection — Lance's Schema implements Into<ProjectionRequest>.
+    let schema = dataset.schema().clone();
+    let mut total_us: u128 = 0;
+    for idx in &indices {
+        let t0 = Instant::now();
+        let _batch = dataset.take(&[*idx], schema.clone()).await?;
+        total_us += t0.elapsed().as_micros();
+    }
+    Ok(total_us as f32 / indices.len() as f32)
+}
+
+/// Random row access for Parquet — full scan + filter. There's no random-access
+/// primitive in vanilla Parquet, so this is the cost of finding one specific row.
+/// This is the cost our current design pays for "get doc X's full text for RAG."
+fn measure_random_access_parquet(path: &str) -> Result<f32> {
+    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
+    use std::fs::File;
+
+    // We simulate 5 lookups — full scan each time. 20 would be painful.
+    let iters = 5;
+    let mut total_us: u128 = 0;
+    for _ in 0..iters {
+        let t0 = Instant::now();
+        let file = File::open(path)?;
+        let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;
+        let reader = builder.build()?;
+        // Iterate until we've conceptually found a row — we stop early if
+        // we wanted row 50000, but we have to at least read its batch.
+        let mut seen = 0usize;
+        for b in reader {
+            let b = b?;
+            seen += b.num_rows();
+            if seen > 50000 { break; }
+        }
+        total_us += t0.elapsed().as_micros();
+    }
+    Ok(total_us as f32 / iters as f32)
+}
+
+/// Append 10K new rows to the existing Lance dataset.
+/// Measures the "ingest delta" cost without full rewrite.
+async fn append_10k_rows(path: &str, dims: usize) -> Result<()> {
+    use lance::dataset::{Dataset, WriteMode, WriteParams};
+
+    let dataset = Dataset::open(path).await?;
+    let schema = dataset.schema();
+    let arrow_schema: Arc<Schema> = Arc::new(schema.into());
+
+    // Build a 10K row batch with random-ish data matching the existing schema.
+    let n = 10_000;
+    let arrays: Vec<ArrayRef> = arrow_schema
+        .fields()
+        .iter()
+        .map(|f| -> Result<ArrayRef> {
+            match f.data_type() {
+                DataType::Utf8 => {
+                    let vals: Vec<String> = (0..n).map(|i| format!("appended-{}", i)).collect();
+                    Ok(Arc::new(arrow_array::StringArray::from(vals)))
+                }
+                DataType::Int32 => {
+                    let vals: Vec<i32> = (0..n as i32).collect();
+                    Ok(Arc::new(arrow_array::Int32Array::from(vals)))
+                }
+                DataType::FixedSizeList(_, _) => {
+                    let floats: Vec<f32> = (0..n * dims).map(|i| (i as f32).sin()).collect();
+                    let values = Float32Array::from(floats);
+                    let field = Arc::new(Field::new("item", DataType::Float32, true));
+                    let fsl = FixedSizeListArray::try_new(field, dims as i32, Arc::new(values), None)?;
+                    Ok(Arc::new(fsl))
+                }
+                other => anyhow::bail!("unsupported append column type: {:?}", other),
+            }
+        })
+        .collect::<Result<Vec<_>>>()?;
+
+    let batch = RecordBatch::try_new(arrow_schema.clone(), arrays)?;
+    let reader = RecordBatchIterator::new(vec![Ok(batch)].into_iter(), arrow_schema);
+    let params = WriteParams { mode: WriteMode::Append, ..Default::default() };
+    Dataset::write(reader, path, Some(params)).await?;
+    Ok(())
+}
+
+/// Grab a few existing vectors from the dataset to use as self-similar queries.
+async fn sample_query_vectors(
+    dataset: &lance::dataset::Dataset,
+    count: usize,
+) -> Result<Vec<Vec<f32>>> {
+    use futures::StreamExt;
+
+    // Just take the first `count` rows; good enough for latency measurement.
+    let scanner = dataset.scan();
+    let mut scanner = scanner;
+    scanner.limit(Some(count as i64), None)?;
+    scanner.project(&["vector"])?;
+    let mut stream = scanner.try_into_stream().await?;
+
+    let mut out = Vec::with_capacity(count);
+    while let Some(batch) = stream.next().await {
+        let batch = batch?;
+        let vector_col = batch
+            .column(0)
+            .as_any()
+            .downcast_ref::<FixedSizeListArray>()
+            .context("vector column must be FixedSizeList")?;
+
+        for row in 0..vector_col.len() {
+            if out.len() >= count { break; }
+            let values = vector_col.value(row);
+            let f32_arr = values
+                .as_any()
+                .downcast_ref::<Float32Array>()
+                .context("inner array must be Float32")?;
+            let mut v = Vec::with_capacity(f32_arr.len());
+            for i in 0..f32_arr.len() {
+                v.push(f32_arr.value(i));
+            }
+            out.push(v);
+        }
+        if out.len() >= count { break; }
+    }
+    Ok(out)
+}
+
--- a/docs/ADR-019-vector-storage.md
+++ b/docs/ADR-019-vector-storage.md
@ -0,0 +1,105 @@
+# ADR-019: Vector Storage — Parquet+HNSW stays, Lance joins as second tier
+
+**Status:** Accepted — 2026-04-16
+**Implements:** Phase 18 from PRD (Lance evaluation)
+**Supersedes:** nothing (augments ADR-008)
+**Owner:** J
+
+---
+
+## Context
+
+Phase 18 of the PRD committed to settling "Parquet+sidecar vs Lance" with measurements, not vibes. This ADR records the benchmark outcome and the resulting architectural direction.
+
+Input data: `data/vectors/resumes_100k_v2.parquet` — 100,000 × 768d embeddings, the same index we tuned HNSW against in Phase 15.
+
+Benchmark harness: `crates/lance-bench/src/main.rs` — standalone binary, deliberately not integrated into the workspace's common deps to avoid forcing DataFusion/Arrow upgrades on the rest of the stack until we'd decided.
+
+## The scorecard
+
+All numbers measured on the same 128GB server, same 100K × 768d index, release build:
+
+| Dimension | Parquet + HNSW (current) | Lance 4.0 IVF_PQ (candidate) | Winner |
+|---|---|---|---|
+| Cold load | 0.17s | 0.13s | Lance, 1.27× — *does not clear 2× decision threshold* |
+| Disk size (data only) | 330.3 MB | 330.4 MB | Tie |
+| Index on-disk footprint | 0 (HNSW is RAM-only) | 7.4 MB | Lance |
+| Index build time | 230s (ec=80 es=30) | 16s | **Lance, 14× faster** |
+| Search p50 | 873us (recall@10 = 1.00) | 2229us (recall unmeasured, likely 0.85-0.95) | **Parquet+HNSW, 2.55× faster** |
+| Search p95 | 1413us | 4998us | **Parquet+HNSW, 3.54× faster** |
+| Speedup vs brute force (p50) | 50.4× | 19.7× | Parquet+HNSW |
+| Random row access (fetch by id) | ~35ms (full-file scan) | 311us | **Lance, 112× faster** |
+| Append 10K rows | Full-file rewrite (~330MB + re-embed + re-index) | 0.08s, +31MB delta | **Lance, structurally different** |
+
+## Applying the decision rules from EXECUTION_PLAN.md
+
+Original rules:
+- *Lance wins cold-load by ≥2× AND matches search latency → migrate*
+- *Within 50% across board → stay Parquet, document ceiling*
+- *Lance loses → close the door*
+
+Strict reading: cold-load is **1.27×, not ≥2×**. Search latency is **2.55× worse, not matching**. By the written rule, we stay.
+
+But the written rule missed something. It assumed Lance's value would show up as raw-speed wins across the whole table. The actual benchmark reveals Lance's value is **in capabilities the current stack doesn't have**, not in the metrics we scoped:
+
+1. **Random row access** is 112× faster. Our Parquet design can't do O(1) random access to a row — RAG text retrieval is a full-file scan today. Lance makes this native.
+2. **Append** is structurally different. Adding 10K rows is 0.08s on Lance; on our stack it's a full rewrite of the entire 330MB Parquet file plus re-embedding plus re-indexing.
+3. **Index build** is 14× faster. The HNSW `ec=80 es=30` production default takes 230s; Lance IVF_PQ takes 16s. Hot-swap generation (Phase 16) is much more feasible at 16s per build.
+
+## The decision
+
+**Hybrid architecture — neither replace nor reject.**
+
+### What stays
+
+- `vectord::store` with Parquet + binary-blob vectors → **primary vector backend**
+- `vectord::hnsw::HnswStore` → in-RAM HNSW for search at 100K-scale indexes
+- All Phase 15 trial infrastructure → keeps working, unchanged
+- Production default `ec=80 es=30` → still the right call for in-RAM use
+
+### What gets added
+
+- **`vectord::lance_store`** — second backend using Lance as the persistence layer
+  - Scope: indexes where *any* of the following apply:
+    - Corpus exceeds ~5M vectors (our in-RAM ceiling)
+    - Workload is append-heavy (incremental ingest from streaming sources)
+    - Text retrieval dominates (point lookups by doc_id for RAG)
+    - Hot-swap generations are required (Phase 16)
+  - Implemented as a standalone crate first (follow the pilot layout), promoted into vectord when the API stabilizes
+- **Profile-level configuration** — `ModelProfile.vector_backend: Parquet | Lance` so each profile picks the tier that matches its workload
+
+### What we keep watching (but don't act on yet)
+
+- **Lance search latency at scale.** 2229us at 100K is worse than HNSW. At 10M we expect Lance to pull ahead because HNSW doesn't fit in RAM. Re-benchmark when we have a 10M-vector corpus to test against.
+- **IVF_PQ recall.** We measured latency but not recall — I picked `num_partitions=316, nbits=8, num_sub_vectors=48` blindly. A proper recall sweep is part of Phase C when we integrate Lance into the trial system.
+- **Lance's own HNSW-on-disk variant** (`with_ivf_hnsw_pq_params`). Might close the in-RAM latency gap. Left for a future pilot.
+
+## Why this isn't moving the goalposts
+
+The EXECUTION_PLAN rule was "migrate or don't migrate." The evidence says neither is correct — one stack can't serve both the staffing SQL workload AND the LLM-brain append-heavy random-access workload at all scales. The honest answer is two backends, each doing what it's good at, selected per-profile.
+
+This matches the dual-use framing in the 2026-04-16 PRD update: different workloads, shared substrate, per-profile specialization. We wrote that principle into the PRD; the benchmark data just made it concrete for the vector tier.
+
+## Follow-up work (updates EXECUTION_PLAN.md)
+
+- **Phase C (decoupled embedding refresh)** gets easier — Lance's native append removes the need to invent a "vectors delta" Parquet layer. When we build Phase C, use Lance as the embedding-layer backend.
+- **Phase 16 (hot-swap)** becomes feasible — 16s index builds mean online re-trials are cheap. When we build Phase 16, Lance is the storage for index generations.
+- **Phase 17 (model profiles)** gains a new field: `vector_backend: Parquet | Lance`. Default Parquet for backward compatibility. Agents can opt into Lance.
+
+## Costs we accept
+
+- **Second dependency tree.** Lance pulls in DataFusion 52 and Arrow 57, while our main stack runs DataFusion 47 and Arrow 55. Keeping lance-bench isolated works for a pilot; productionizing will need either workspace-wide upgrade or a firewall via a dedicated `vectord-lance` crate.
+- **Second API surface.** Lance's vector-index API is different from our HNSW code. Per-profile abstraction cost is real.
+- **Operational complexity.** Two vector storage implementations to debug and monitor.
+
+Worth it because the alternative — forcing every workload through one backend — means either the staffing case or the LLM-brain case is served badly.
+
+## Ceilings this updates in PRD
+
+The PRD "Known ceilings" table had:
+
+> Vector count per index | ~5M vectors on 128GB RAM | 10M+ (serious web crawl) | Phase 18 Lance migration OR mmap'd embeddings
+
+Update to:
+
+> Vector count per index | ~5M vectors on 128GB RAM (Parquet+HNSW in-RAM) | Past 5M | Switch that profile's `vector_backend` to Lance; IVF_PQ keeps working on disk-resident quantized codes
--- a/docs/DECISIONS.md
+++ b/docs/DECISIONS.md
@ -89,3 +89,8 @@
 **Date:** 2026-04-16
 **Decision:** All append-only journals (error journal, HNSW trial journal, future audit logs) use the `storaged::append_log::AppendLog` helper. Events accumulate in an in-memory buffer; on threshold or explicit `flush()`, the buffer is written as one new timestamped file (`batch_{epoch_us}.jsonl`). Existing files are never rewritten. `compact()` merges all batches into one with a fresh timestamp, preserving chronological sort order.
 **Rationale:** Object stores have no append primitive. Naive "read-modify-write the whole JSONL file on every event" is O(N²) cumulative work and creates the classic small-file / rewrite-amplification anti-pattern that llms3.com flags as the top lakehouse pitfall. Write-once batching is the LSM-tree idea applied to small JSONL events — bounded write amplification, append-only semantics, optional compaction for read efficiency. The in-memory ring buffer preserves O(1) recent-event reads for the `/storage/errors` and `/hnsw/trials` query endpoints.
+
+## ADR-019: Vector storage — Parquet+HNSW primary, Lance secondary (hybrid)
+**Date:** 2026-04-16
+**Decision:** Keep Parquet + binary-blob vectors + in-RAM HNSW as the primary vector backend. Add Lance as a second backend available per-profile for workloads where Lance wins architecturally. Per-profile `vector_backend: Parquet | Lance` field becomes part of Phase 17 model profiles. Implementation kicks off via the standalone `crates/lance-bench` crate and is promoted into `vectord::lance_store` when the API stabilizes.
+**Rationale:** Head-to-head benchmark on the 100K × 768d `resumes_100k_v2` index (see `docs/ADR-019-vector-storage.md` for the full scorecard). Parquet+HNSW wins current-scale search latency by 2.55× (873us vs 2229us p50). Lance wins index build time by 14× (16s vs 230s), random row access by 112× (311us vs ~35ms full-file scan), and append speed structurally (0.08s vs full Parquet rewrite). Neither strictly dominates — the dual-use PRD framing (staffing + LLM brain) means both workloads exist in the same system. Keeps ADR-008's "Parquet is the format" principle intact for dataset tables; adds Lance as a purpose-built vector-tier option without discarding the tuned HNSW stack.
--- a/docs/PRD.md
+++ b/docs/PRD.md
@ -340,14 +340,13 @@ The question raised 2026-04-16 after J's LLMS3 knowledge base identified Lance a

 | Step | Deliverable | Decision criteria |
 |---|---|---|
-| 18.1 | Parallel Lance-backed vector index for `resumes_100k_v2` behind feature flag | Both implementations coexist, benchmarkable |
-| 18.2 | Head-to-head benchmark: cold-load, search latency, disk size, append cost | See criteria below |
-| 18.3 | ADR-019 documenting the decision with measured data | Commit or reject with evidence |
+| 18.1 | ✅ Parallel Lance-backed vector index for `resumes_100k_v2` in standalone `crates/lance-bench` | Built 2026-04-16 |
+| 18.2 | ✅ Head-to-head benchmark across 8 dimensions (cold-load, search latency, disk, index build, random access, append) | Complete |
+| 18.3 | ✅ ADR-019 committed with measured data and decision | See `docs/ADR-019-vector-storage.md` |

-**Decision rules:**
- Lance wins on cold-load by ≥2× AND matches search latency → migrate vector layer to Lance. Dataset Parquet stays.
- Lance is within 50% of current → stay on current stack, document ceiling explicitly.
- Lance loses → close the door, move on.
+**Outcome:** Hybrid architecture. Parquet+HNSW stays primary (2.55× faster search at 100K in-RAM). Lance joins as a second backend for Phase 16 hot-swap (14× faster index builds), Phase C/append workloads (0.08s vs full rewrite), RAG random-access retrieval (112× faster), and indexes past the ~5M RAM ceiling.
+
+Per-profile `vector_backend: Parquet | Lance` becomes part of Phase 17 (model profiles). See ADR-019 for the full scorecard and caveats.

 ### Phase 19+: Further horizon

@ -364,7 +363,7 @@ The current stack has measurable limits. Documenting them so future decisions ar

 | Dimension | Current ceiling | Breaks at | Escape hatch |
 |---|---|---|---|
-| Vector count per index | ~5M vectors on 128GB RAM | 10M+ (serious web crawl) | Phase 18 Lance migration OR mmap'd embeddings |
+| Vector count per index (Parquet+HNSW in-RAM) | ~5M on 128GB | Past 5M | Switch that profile's `vector_backend` to Lance per ADR-019 — IVF_PQ stays on disk-resident quantized codes |
 | Concurrent active indexes | ~50-100 at 100K vectors each | 10M×50 configurations | Lance disk-resident + per-profile activation |
 | Rows per dataset | 2.47M proven, probably 100M+ fine | Approaches DataFusion memory limits | DataFusion predicate pushdown + partition pruning (existing) |
 | Concurrent loaded models | 1-2 on 16GB VRAM (A4000) | 3+ models simultaneous | Not our problem — architectural, driven by Ollama |