root 9e6002c4d4 S3 backend for Lance — hybrid operates on real MinIO object storage
Enabled lance feature "aws" for S3-compatible storage via opendal.
BucketRegistry: added with_allow_http(true) for MinIO/non-TLS S3
endpoints (fixes "builder error" on HTTP endpoints). lakehouse.toml
gains [[storage.buckets]] name="s3:lakehouse" with S3 backend config.

lance_backend.rs: S3 bucket naming convention — buckets with name
prefix "s3:" emit s3:// URIs for Lance datasets. AWS_* env vars
in the systemd unit provide credentials to Lance's internal
object_store.
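
A minimal sketch of the naming convention, for illustration only. The
function name, the local path layout, and the ".lance" suffix are
assumptions, not the actual code in lance_backend.rs; only the rule that
an "s3:" name prefix yields an s3:// URI comes from this commit.

```rust
/// Hypothetical helper: turn a configured bucket name plus a dataset name
/// into the URI handed to Lance.
///
/// Assumption: buckets without the "s3:" prefix live under `local_root`
/// on the local filesystem, one directory per bucket.
fn lance_dataset_uri(bucket_name: &str, local_root: &str, dataset: &str) -> String {
    match bucket_name.strip_prefix("s3:") {
        // "s3:lakehouse" -> "s3://lakehouse/<dataset>.lance"
        Some(s3_bucket) => format!("s3://{}/{}.lance", s3_bucket, dataset),
        // Any other name is treated as a local bucket directory.
        None => format!(
            "{}/{}/{}.lance",
            local_root.trim_end_matches('/'),
            bucket_name,
            dataset
        ),
    }
}
```

With this sketch, lance_dataset_uri("s3:lakehouse", "/data", "docs")
yields "s3://lakehouse/docs.lance", while a bucket named "local" stays
on disk under /data/local/.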

Verified end-to-end on real MinIO with real 100K × 768d vectors:
  - Migrate Parquet → Lance on S3: 1.7s (vs 0.57s local)
  - Build IVF_PQ: 16.4s (CPU-bound, essentially same as local)
  - Search: ~58ms p50 (vs 11ms local — S3 partition reads)
  - Random doc fetch: 13ms (vs 3.5ms local)
  - Recall@10: 0.835 (vs 0.805 local; IVF_PQ training is randomized, so small run-to-run differences are expected)
  - Total S3 footprint: 637 MiB (vectors + index + lance metadata)
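
For reference, Recall@10 above is the usual recall@k metric. The commit
does not say how it was measured, so the following is a generic sketch
under assumptions: IDs are u64, each result list holds unique IDs, and
the exact neighbours come from a brute-force scan. The function name is
hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbours that the approximate search
/// also returned in its own top-k. Assumes IDs within each list are unique.
fn recall_at_k(exact: &[u64], approx: &[u64], k: usize) -> f64 {
    // Ground-truth set: the first k IDs from the exact (brute-force) result.
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    // Count approximate hits that appear in the ground-truth set.
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

A reported figure such as 0.835 would then be the mean of this value
over a set of query vectors.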

The "public storage" claim from the PRD is now proven: the hybrid
Parquet+HNSW ⊕ Lance architecture works on S3-compatible object
storage, not just local filesystem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:09:42 -05:00

[package]
name = "vectord-lance"
version = "0.1.0"
edition = "2024"
# Firewall crate: the Lance stack (Arrow 57, DataFusion 52) is isolated
# from the rest of vectord (Arrow 55, DataFusion 47). Public API uses
# only std types — no Arrow types cross the crate boundary — so no
# version conflict propagates outward.
#
# See docs/ADR-019-vector-storage.md for the rationale: "productionizing
# will need either workspace-wide upgrade or a firewall via a dedicated
# vectord-lance crate." This is that firewall.
[dependencies]
# S3 support: the Lance 4.0 "aws" feature enables S3-compatible storage
# via its internal object_store + opendal crates, which Lance uses when
# given s3:// URIs. Credentials are read from AWS_* env vars.
# The separate "dynamodb" feature adds DynamoDB-based commit locking for
# multi-writer S3; we are single-writer, so the base "aws" feature is
# enough.
lance = { version = "4.0", default-features = false, features = ["aws"] }
lance-index = { version = "4.0", default-features = false }
lance-linalg = { version = "4.0", default-features = false }
# These Arrow/Parquet versions MUST match Lance's expectations — Lance
# re-exports their types through its API so any mismatch is a hard
# compile error. Keep in sync with lance-bench.
arrow = "57"
arrow-array = "57"
arrow-schema = "57"
parquet = "57"
tokio = { version = "1", features = ["full"] }
futures = "0.3"
serde = { version = "1", features = ["derive"] }
bytes = "1"