scale_test_10m doc-fetch p50 was ~100ms — full table scan over 35GB. Root cause: the auto-build at service.rs:1492-1503 only fires for IndexMeta-registered indexes during set_active_profile warming. lance-bench writes datasets through /vectors/lance/migrate/* directly, bypassing IndexMeta, so its datasets never get the doc_id btree that ADR-019 depends on. Fix: build the btree inline at the end of lance_migrate. Costs ~1.2s on 10M rows (+269MB on disk), drops doc-fetch from ~100ms to ~5ms (20x). Failure is non-fatal — logs a warning and the dataset stays queryable. Verified live (post-restart): scale_test_10m doc-fetch 4-15ms across 5 calls, smoke 9/9 PASS, vectord-lance 7/7 unit tests PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lance backend re-benchmark — 10M vectors (scale_test_10m)
Date: 2026-05-02
Dataset: data/lance/scale_test_10m (33 GB, ~10M vectors, 768d)
Driver: live HTTP gateway :3100/vectors/lance/* (post sanitizer-fix binary)
Method tag on every search response: lance_ivf_pq (confirms IVF_PQ, not brute-force)
ADR-019 deferred a 10M re-bench: "at 10M we expect Lance to pull ahead because HNSW doesn't fit in RAM. Re-benchmark when we have a 10M-vector corpus to test against." The corpus exists; this is that benchmark.
Search latency, 10 diverse queries, top_k=10 (cold)
| Query | Latency |
|---|---|
| warehouse forklift operator second shift | 50.5ms |
| senior software engineer kubernetes | 52.9ms |
| registered nurse pediatric | 37.6ms |
| welder TIG aluminum | 127.7ms |
| data scientist python | 41.6ms |
| electrician journeyman commercial | 31.4ms |
| accountant CPA tax | 28.6ms |
| machine learning research | 32.1ms |
| construction site supervisor | 31.8ms |
| biomedical engineer | 25.0ms |
Median ~35ms, mean ~46ms, one ~128ms outlier (TIG aluminum query — not investigated; could be query-specific IVF traversal pattern or transient I/O).
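As a quick sanity check on those summary figures, here is a short standalone Rust snippet over the latencies copied from the table above; it is not part of the benchmark harness, just arithmetic on the reported values.

```rust
// Summary stats recomputed from the cold-query table (values in ms).
fn main() {
    let mut ms = [50.5, 52.9, 37.6, 127.7, 41.6, 31.4, 28.6, 32.1, 31.8, 25.0_f64];
    ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mean = ms.iter().sum::<f64>() / ms.len() as f64;          // ~45.9 ms
    let median = (ms[ms.len() / 2 - 1] + ms[ms.len() / 2]) / 2.0; // ~34.9 ms
    println!("median {median:.1} ms, mean {mean:.1} ms, max {:.1} ms", ms[ms.len() - 1]);
}
```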
Search latency, repeated query (warm cache)
Same query (forklift operator) hit 5 times in a row:
| Call | Latency |
|---|---|
| 1 | 21.9ms |
| 2 | 20.2ms |
| 3 | 19.2ms |
| 4 | 22.4ms |
| 5 | 18.6ms |
Warm-cache p50 ~20ms. Stable across the 5 trials.
Doc-fetch by id, 5 calls (post-warmup) — BEFORE scalar-index fix
Fetched the same doc_id (VEC-2196862) repeatedly:
| Call | Latency |
|---|---|
| 1 | 68.2ms |
| 2 | 89.3ms |
| 3 | 153.9ms |
| 4 | 126.5ms |
| 5 | 140.7ms |
~100ms p50, climbing under repeat. Substantially slower than the 100K-corpus number from ADR-019 (311μs claimed; ~6ms measured today on workers_500k_v1).
Root cause (investigated post-bench)
/vectors/lance/stats/scale_test_10m returned has_doc_id_index: false. The scalar btree on doc_id was never built for this dataset. Doc-fetch was running a full table scan over 35GB.
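For re-checking this on other datasets, a minimal client-side probe of the same stats route (a hypothetical standalone tool, not gateway code; it assumes the reqwest and tokio crates, and only the route and the has_doc_id_index field come from the output above):

```rust
// Probe the stats route and print whether the doc_id btree exists.
// Assumed Cargo deps: tokio (macros, rt-multi-thread), reqwest (json feature), serde_json.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stats: serde_json::Value =
        reqwest::get("http://127.0.0.1:3100/vectors/lance/stats/scale_test_10m")
            .await?
            .json()
            .await?;
    println!("has_doc_id_index = {}", stats["has_doc_id_index"]);
    Ok(())
}
```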
Cause: the auto-build code in crates/vectord/src/service.rs:1492-1503 only fires for IndexMeta-registered indexes during set_active_profile warming. scale_test_10m was created by the lance-bench binary directly via the migrate HTTP route — it bypasses the IndexMeta registry, so warming never sees it, so neither the vector index nor the scalar index gets auto-built. (The vector index was built manually via /vectors/lance/index/scale_test_10m; the scalar index never was.)
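A compressed sketch of that gap (not the actual service.rs code; apart from set_active_profile and IndexMeta, every name below is an illustrative stand-in):

```rust
// Why warming never indexes migrate-created datasets: the warming loop only
// visits registry entries (roughly the shape of service.rs:1492-1503).
struct IndexMeta { dataset: String }

struct Service { registered: Vec<IndexMeta> } // the IndexMeta registry

impl Service {
    fn set_active_profile(&self, _profile: &str) {
        for meta in &self.registered {
            self.ensure_vector_index(&meta.dataset);
            self.ensure_scalar_index(&meta.dataset, "doc_id");
        }
        // scale_test_10m has no IndexMeta entry, so the loop never reaches it
        // and neither index is auto-built.
    }
    fn ensure_vector_index(&self, _dataset: &str) { /* build IVF_PQ if missing */ }
    fn ensure_scalar_index(&self, _dataset: &str, _column: &str) { /* build btree if missing */ }
}
```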
Doc-fetch by id, 5 calls — AFTER POST /vectors/lance/scalar-index/scale_test_10m/doc_id
Build took 1.22s for 10M rows, added 269MB of btree on disk.
| Call | Latency |
|---|---|
| 1 | 5.6ms |
| 2 | 5.0ms |
| 3 | 5.0ms |
| 4 | 4.9ms |
| 5 | 4.7ms |
~5ms p50, stable. ~20x improvement. Matches workers_500k_v1's ~6ms baseline.
ADR-019's "O(1) random access via btree" claim is structurally vindicated. The 311μs projection from the 100K bench was an in-process Rust call; the live HTTP/JSON round-trip floor is ~5ms regardless of dataset size.
Follow-up: close the IndexMeta-bypass gap
The `lance-bench` binary writes datasets that the gateway's IndexMeta registry, and therefore its warming pass, never sees. Two reasonable fixes:
- Auto-build the scalar index inside the `lance_migrate` HTTP handler — every dataset created via the migrate route gets the btree before returning. Costs 1-2 seconds at ingest time, saves ~100ms per doc-fetch forever after.
- Have `lance-bench` register an IndexMeta entry at the end of its run, so the existing warming code picks it up on the next gateway start.
Recommendation: do (1). It's a one-line addition next to the existing build_index call inside the handler, and it makes the migrate route self-sufficient — no caller needs to remember a follow-up build call.
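A minimal sketch of option (1), under stated assumptions: the handler, helper, and type names below are placeholders (the real code sits next to the existing build_index call in the handler), and only the intended behavior (build the doc_id btree inline, treat failure as non-fatal) is taken from this report.

```rust
// Option (1) sketched with stub types so it stands alone; swap the stubs for
// the real request/response/error types and helpers inside the handler.
struct MigrateRequest;
struct Dataset;
struct MigrateResponse;
#[derive(Debug)]
struct ServiceError;

async fn write_lance_dataset(_r: &MigrateRequest) -> Result<Dataset, ServiceError> { Ok(Dataset) }
async fn build_index(_d: &Dataset) -> Result<(), ServiceError> { Ok(()) }                    // existing vector-index build
async fn build_scalar_index(_d: &Dataset, _col: &str) -> Result<(), ServiceError> { Ok(()) } // doc_id btree build

async fn lance_migrate(req: MigrateRequest) -> Result<MigrateResponse, ServiceError> {
    let dataset = write_lance_dataset(&req).await?; // existing migrate path
    build_index(&dataset).await?;                   // existing vector-index build
    // New: build the doc_id btree before returning (~1.2s / +269MB at 10M rows),
    // so every migrate-created dataset is self-sufficient. Failure is non-fatal:
    // warn and leave the dataset queryable.
    if let Err(e) = build_scalar_index(&dataset, "doc_id").await {
        eprintln!("warning: doc_id btree build failed ({e:?}); dataset stays queryable");
    }
    Ok(MigrateResponse)
}
```

Datasets that were already migrated without the btree can still be fixed with the manual POST /vectors/lance/scalar-index/<name>/doc_id call used above.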
Compared to ADR-019 100K projections
| Op | 100K (ADR-019) | 10M (today) | Notes |
|---|---|---|---|
| Search (cold) | 2229μs | ~46ms | 21x slower at 100x scale → reasonable for IVF_PQ |
| Search (warm) | (not measured) | ~20ms | Warm cache converges nicely |
| Doc fetch (no btree) | — | ~100ms | full scan, 35GB |
| Doc fetch (post btree build) | 311μs | ~5ms | structural win confirmed; HTTP/JSON floor explains delta |
| Index method | lance_ivf_pq | lance_ivf_pq | confirmed via response tag |
What this means
ADR-019's claim that "at 10M, Lance pulls ahead because HNSW doesn't fit in RAM" remains unverified but not refuted. We can't directly compare against HNSW at 10M because its RAM footprint at 10M × 768d × 4 bytes is ~30 GB for the vectors alone, and roughly double that with the graph, which is well past the RAM this deployment has on a single node. So Lance "wins" at 10M by being the only contender that operationally exists.
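The footprint figure checks out; a back-of-envelope recomputation (the 2x graph overhead is the rough factor assumed above, not a measurement):

```rust
// HNSW RAM back-of-envelope at 10M x 768d, f32 vectors.
fn main() {
    let vector_bytes: u64 = 10_000_000 * 768 * 4; // 30_720_000_000 B ≈ 30.7 GB
    let with_graph = vector_bytes * 2;            // rough 2x for the graph structure
    println!(
        "vectors ≈ {:.1} GB, vectors + graph ≈ {:.1} GB",
        vector_bytes as f64 / 1e9,
        with_graph as f64 / 1e9
    );
}
```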
What the bench DID surface:
- Search at 10M works at production-shape latency (~20ms warm). Acceptable for batch / async / non-conversational workloads. Too slow for sub-10ms voice or recommendation paths.
- Doc-fetch at 10M is fast (~5ms) once the scalar btree is built. Pre-build was ~100ms (full scan). Built in 1.2s, +269MB on disk. ADR-019's structural claim holds.
- The auto-build only fires for IndexMeta-registered datasets. `lance-bench` bypasses IndexMeta, so its datasets need either a manual `POST /vectors/lance/scalar-index/<name>/doc_id` after migration, or a one-line fix to the `lance_migrate` handler that builds the btree inline. Recommend the inline fix.
- Sanitizer fix held under load — no 500-with-leak surfaced even on a rare query pattern (TIG aluminum). The fix is robust to long-tail queries.
Repro
# Search latency, single query
curl -sS -X POST http://127.0.0.1:3100/vectors/lance/search/scale_test_10m \
-H 'Content-Type: application/json' \
-d '{"query":"forklift operator","top_k":10}' | jq '.latency_us'
# Doc fetch by id
curl -sS http://127.0.0.1:3100/vectors/lance/doc/scale_test_10m/VEC-2196862 \
| jq '.latency_us'