lance-bench: also build doc_id btree post-IVF — match gateway's migrate behavior

The bench's own measure_random_access_lance uses take(row_position), which does not need the btree. But datasets written by this bench are commonly queried downstream via /vectors/lance/doc/<name>/<doc_id>, and without the btree that path falls back to a full table scan. Building the index inline makes bench-produced datasets production-shaped immediately and removes a footgun (the same one that kept scale_test_10m's doc-fetch at ~100ms until commit 5d30b3d fixed it via the migrate handler path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:19:16 -05:00 · 2026-05-02 22:19:16 -05:00 · 044650a1da
commit 044650a1da
parent 5d30b3da89
1 changed file with 20 additions and 0 deletions
--- a/crates/lance-bench/src/main.rs
+++ b/crates/lance-bench/src/main.rs
@@ -456,6 +456,26 @@ async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> {
        .await
        .context("create_index")?;

+    // Also build the scalar btree on doc_id. This bench's
+    // measure_random_access_lance uses take(row_position) which doesn't
+    // need the btree, but the dataset this bench writes is also queried
+    // downstream by /vectors/lance/doc/<name>/<doc_id> (the production
+    // lookup path) — without this index that path falls back to a full
+    // table scan. Cheap to build (~1.2s on 10M rows) and matches the
+    // gateway's lance_migrate handler behavior so bench-produced datasets
+    // are immediately production-shape.
+    use lance_index::scalar::ScalarIndexParams;
+    dataset
+        .create_index(
+            &["doc_id"],
+            IndexType::Scalar,
+            Some("doc_id_btree".into()),
+            &ScalarIndexParams::default(),
+            true,
+        )
+        .await
+        .context("create_index doc_id btree")?;
+
    Ok(())
 }
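
For reviewers who want the intuition behind the full-scan fallback described in the commit message, here is a minimal standalone sketch. It does not use Lance at all: `Row`, `find_by_scan`, `build_doc_id_index`, and `find_by_index` are hypothetical names, and a std `BTreeMap` stands in for the scalar btree. It assumes doc_id values are unique and only contrasts a linear lookup with an ordered-index lookup followed by a positional fetch.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a dataset row; not the bench's real schema.
struct Row {
    doc_id: String,
    vector: Vec<f32>,
}

/// Lookup without an index: inspect rows one by one until a match (O(n)).
fn find_by_scan<'a>(rows: &'a [Row], doc_id: &str) -> Option<&'a Row> {
    rows.iter().find(|r| r.doc_id == doc_id)
}

/// Build an ordered map from doc_id to row position. This plays the role
/// of the scalar btree in the sketch; duplicates would keep the last position.
fn build_doc_id_index(rows: &[Row]) -> BTreeMap<&str, usize> {
    rows.iter()
        .enumerate()
        .map(|(pos, r)| (r.doc_id.as_str(), pos))
        .collect()
}

/// Lookup with the index: O(log n) to find the position, then a direct
/// positional fetch (similar in spirit to take(row_position)).
fn find_by_index<'a>(
    rows: &'a [Row],
    index: &BTreeMap<&str, usize>,
    doc_id: &str,
) -> Option<&'a Row> {
    index.get(doc_id).map(|&pos| &rows[pos])
}

fn main() {
    // Synthetic rows with unique, zero-padded doc ids.
    let rows: Vec<Row> = (0..1000)
        .map(|i| Row {
            doc_id: format!("doc-{i:04}"),
            vector: vec![i as f32; 4],
        })
        .collect();
    let index = build_doc_id_index(&rows);

    let scanned = find_by_scan(&rows, "doc-0742").expect("row exists");
    let indexed = find_by_index(&rows, &index, "doc-0742").expect("row exists");

    // Both paths return the same row; only the amount of work differs.
    assert_eq!(scanned.doc_id, indexed.doc_id);
    assert_eq!(scanned.vector, indexed.vector);
    println!("found {} via scan and via index", indexed.doc_id);
}
```

The two lookups return the same row, so the missing index is easy to overlook in correctness tests. It only shows up as latency on large datasets.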