The bench's own measure_random_access_lance uses take(row_position), so it
doesn't need the btree. But datasets written by this bench are commonly
queried downstream via /vectors/lance/doc/<name>/<doc_id>, and without
the btree that path falls back to a full table scan. Building the index
inline keeps bench-produced datasets production-shaped from the start and
removes a footgun (the same one that made scale_test_10m's doc-fetch take
~100ms until commit 5d30b3d fixed it via the migrate handler path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The standalone benchmark crate `crates/lance-bench` ran Lance 4.0 against
our Parquet+HNSW stack at 100K vectors × 768d (resumes_100k_v2) and
measured 8 dimensions. Results (full scorecard in docs/ADR-019-vector-storage.md):
Metric           Parquet+HNSW   Lance          Verdict
Cold load        0.17s          0.13s          tie (below the ≥2× threshold)
Disk size        330.3 MB       330.4 MB       tie
Search p50       873us          2229us         Parquet 2.55× faster
Search p95       1413us         4998us         Parquet 3.54× faster
Index build      230s (ec=80)   16s (IVF_PQ)   Lance 14× faster
Random access    35ms (scan)    311us          Lance 112× faster
Append 10K rows  full rewrite   0.08s/+31MB    Lance structural win
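Each speedup above is the ratio of the slower measurement to the faster one; cold load is a tie because 0.17s / 0.13s ≈ 1.31, under the ≥2× bar. A small sketch that recomputes the ratios from the table, assuming lower is better for every metric and (as an assumption, since the scorecard states the threshold only for cold load) applying the same 2× bar to each row. `speedup` and `verdict` are hypothetical helpers.

```rust
/// Ratio of the slower measurement to the faster one (always >= 1.0).
fn speedup(a: f64, b: f64) -> f64 {
    a.max(b) / a.min(b)
}

/// Classify a Parquet-vs-Lance pair (lower is better) against a threshold.
fn verdict(parquet: f64, lance: f64, threshold: f64) -> String {
    let ratio = speedup(parquet, lance);
    if ratio < threshold {
        format!("tie ({ratio:.2}x < {threshold}x)")
    } else if parquet < lance {
        format!("Parquet {ratio:.2}x faster")
    } else {
        format!("Lance {ratio:.2}x faster")
    }
}

fn main() {
    // (metric, Parquet+HNSW, Lance), all normalized to microseconds.
    let rows = [
        ("cold load", 170_000.0, 130_000.0),
        ("search p50", 873.0, 2229.0),
        ("search p95", 1413.0, 4998.0),
        ("index build", 230_000_000.0, 16_000_000.0),
        ("random access", 35_000.0, 311.0),
    ];
    for (metric, parquet, lance) in rows {
        println!("{metric:<14} {}", verdict(parquet, lance, 2.0));
    }
}
```

Disk size and append are left out because they are a size tie and a structural difference rather than a latency ratio.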
Decision (ADR-019): hybrid, not migrate-or-reject.
- Parquet+HNSW stays primary: our HNSW at ec=80 es=30 (recall=1.00) is
  2.55× faster at p50 than Lance IVF_PQ at 100K in-RAM scale
- Lance joins as second backend per-profile for workloads where it wins
architecturally: random row access (RAG text fetch), append-heavy
pipelines (Phase C), hot-swap generations (Phase 16, 14× faster
builds), and indexes past the ~5M RAM ceiling
- Phase 17 ModelProfile gets a `vector_backend: Parquet | Lance` field
- Ceiling table in the PRD updated: the 5M ceiling now says "switch to
  Lance" instead of "migrate", since Lance runs alongside Parquet, not
  instead of it
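A minimal sketch of the per-profile switch. Only the `vector_backend: Parquet | Lance` shape comes from the decision above; the `VectorBackend` enum name, the `name` field, the profile names, and `fetch_path` are hypothetical, and routing by `match` at the call site is an assumed design.

```rust
/// Storage backend selected per model profile.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum VectorBackend {
    /// Primary: Parquet vectors + in-RAM HNSW (fastest search at 100K scale).
    #[default]
    Parquet,
    /// Second backend: Lance dataset (random access, appends, fast IVF_PQ builds).
    Lance,
}

/// Hypothetical subset of a model profile.
#[derive(Debug, Clone)]
struct ModelProfile {
    name: String,
    vector_backend: VectorBackend,
}

/// Hypothetical routing: which path serves a single-document fetch.
fn fetch_path(profile: &ModelProfile) -> &'static str {
    match profile.vector_backend {
        VectorBackend::Parquet => "parquet: scan for doc_id",
        VectorBackend::Lance => "lance: btree lookup + take(row_position)",
    }
}

fn main() {
    let profiles = [
        ModelProfile { name: "resumes".into(), vector_backend: VectorBackend::default() },
        ModelProfile { name: "rag-chunks".into(), vector_backend: VectorBackend::Lance },
    ];
    for p in &profiles {
        println!("{} -> {:?} ({})", p.name, p.vector_backend, fetch_path(p));
    }
}
```

Defaulting to `Parquet` keeps existing profiles on the primary path; a profile opts into Lance only when its workload matches one of the cases listed above.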
Isolation: lance-bench is a standalone workspace crate with its own
dependency tree, because Lance pulls in DataFusion 52 + Arrow 57, which are
incompatible with the main stack's DataFusion 47 + Arrow 55. It stays off
the critical path until the API is stable enough to promote into
vectord::lance_store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>