lance-bench: also build doc_id btree post-IVF — match gateway's migrate behavior
The bench's own measure_random_access_lance uses take(row_position), which doesn't need the btree. But datasets written by this bench are commonly queried downstream via /vectors/lance/doc/<name>/<doc_id>, and without the btree that path falls back to a full table scan.

Building the index inline keeps bench-produced datasets immediately production-shape and removes a footgun (the same one that made scale_test_10m's doc-fetch ~100ms until commit 5d30b3d fixed it via the migrate handler path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 5d30b3da89
commit 044650a1da
@@ -456,6 +456,26 @@ async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> {
         .await
         .context("create_index")?;
 
+    // Also build the scalar btree on doc_id. This bench's
+    // measure_random_access_lance uses take(row_position) which doesn't
+    // need the btree, but the dataset this bench writes is also queried
+    // downstream by /vectors/lance/doc/<name>/<doc_id> (the production
+    // lookup path) — without this index that path falls back to a full
+    // table scan. Cheap to build (~1.2s on 10M rows) and matches the
+    // gateway's lance_migrate handler behavior so bench-produced datasets
+    // are immediately production-shape.
+    use lance_index::scalar::ScalarIndexParams;
+    dataset
+        .create_index(
+            &["doc_id"],
+            IndexType::Scalar,
+            Some("doc_id_btree".into()),
+            &ScalarIndexParams::default(),
+            true,
+        )
+        .await
+        .context("create_index doc_id btree")?;
+
     Ok(())
 }
 
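The access-pattern difference behind this change can be illustrated with a toy model. The sketch below is not Lance code and does not reflect how Lance stores rows or indexes: `Row`, `take`, `find_by_scan`, `build_doc_id_index`, `find_by_index`, and `make_rows` are all hypothetical names, and a standard-library `BTreeMap` stands in for the scalar btree. It only shows why a positional take needs no index, while a lookup by doc_id without one has to examine rows until it finds a match.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a stored row (hypothetical; not Lance's representation).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    doc_id: String,
    payload: u64,
}

/// Positional access, similar in spirit to take(row_position): no index needed.
fn take(rows: &[Row], position: usize) -> Option<&Row> {
    rows.get(position)
}

/// Lookup by doc_id with no index: rows are examined one by one.
/// Returns the match (if any) and how many rows were examined.
fn find_by_scan<'a>(rows: &'a [Row], doc_id: &str) -> (Option<&'a Row>, usize) {
    let mut examined = 0;
    for row in rows {
        examined += 1;
        if row.doc_id == doc_id {
            return (Some(row), examined);
        }
    }
    (None, examined)
}

/// Build an ordered doc_id -> row position map (the toy "btree index").
fn build_doc_id_index(rows: &[Row]) -> BTreeMap<String, usize> {
    rows.iter()
        .enumerate()
        .map(|(pos, row)| (row.doc_id.clone(), pos))
        .collect()
}

/// Lookup by doc_id through the index, followed by a positional take.
fn find_by_index<'a>(
    rows: &'a [Row],
    index: &BTreeMap<String, usize>,
    doc_id: &str,
) -> Option<&'a Row> {
    index.get(doc_id).and_then(|&pos| take(rows, pos))
}

/// Generate `n` rows with zero-padded doc ids (illustrative data only).
fn make_rows(n: u64) -> Vec<Row> {
    (0..n)
        .map(|i| Row { doc_id: format!("doc-{:08}", i), payload: i })
        .collect()
}

fn main() {
    let rows = make_rows(100_000);
    let index = build_doc_id_index(&rows);

    // Worst case for the scan: the target is the last row.
    let target = "doc-00099999";
    let (scanned, examined) = find_by_scan(&rows, target);
    let indexed = find_by_index(&rows, &index, target);

    // Both paths return the same row; only the work done differs.
    assert_eq!(scanned, indexed);
    println!("scan examined {} rows for {}", examined, target);
}
```

In this toy model the scan's cost grows with the row's position in the table, while the indexed lookup is a logarithmic map search plus one positional take. That is the shape of the gap the commit message describes for the doc-fetch path, although the real numbers depend on Lance's storage and index implementation.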