5,000 workers embedded through nomic-embed-text (real, not random).
Results on REAL embeddings:
HNSW recall@10: 1.0000 p50: 762us — PERFECT
Lance recall@10: 0.9500 p50: 6.8ms — better than random vectors
SQL autonomous: 50/50 (100%)
Key finding: real embeddings IMPROVE Lance recall (0.95 vs 0.80 on
random vectors), most likely because real text embeddings cluster by
topic, which makes IVF partitions more effective. The concern about
degraded recall on real data was wrong: recall went up, not down.
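For reference, recall@10 is the fraction of the exact top-10 neighbours
that the approximate index also returns, averaged over queries. A minimal
sketch, assuming ground truth comes from a brute-force search; the
function name and the toy IDs are illustrative and are not taken from the
benchmark code:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Mean fraction of the exact top-k IDs recovered by the ANN index.

    approx_ids / exact_ids: one ranked list of IDs per query.
    Assumes every exact list holds at least one ID.
    """
    if len(approx_ids) != len(exact_ids):
        raise ValueError("need one approximate list per exact list")
    if not exact_ids:
        raise ValueError("need at least one query")
    total = 0.0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        total += len(truth.intersection(approx[:k])) / len(truth)
    return total / len(exact_ids)


# Toy example: the index misses one of the ten true neighbours.
exact = [list(range(10))]
approx = [list(range(9)) + [99]]
score = recall_at_k(approx, exact)  # 9 of 10 recovered
```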
Also discovered: the 50K embedding job DID complete (50K chunks in
234s) but the job progress tracker showed 0/0. The supervisor's
progress reporting has a bug — the actual embedding pipeline works.
Known remaining issue: hybrid search ID matching between workers_500k
(worker_id format) and vector index (W5K-{id} format) needs the
prefix stripping fix applied to the new index.
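The ID mismatch above could be handled with a small normalisation step
before joining vector hits against SQL rows. This is a sketch only: it
assumes the worker_id is exactly the part after the W5K- prefix, and the
helper names are hypothetical, not the ones used by the existing fix:

```python
VECTOR_ID_PREFIX = "W5K-"  # prefix format named in this commit message


def to_worker_id(vector_id: str) -> str:
    """Strip the vector-index prefix; IDs without it pass through unchanged."""
    return vector_id.removeprefix(VECTOR_ID_PREFIX)  # Python 3.9+


def match_hits(vector_hits, sql_worker_ids):
    """Keep vector hits whose normalised ID appears in the SQL result set."""
    # str() because the SQL side may return integers (an assumption).
    allowed = {str(w) for w in sql_worker_ids}
    return [h for h in vector_hits if to_worker_id(h) in allowed]


matched = match_hits(["W5K-1", "W5K-2"], [2, 3])
```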
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:06:09 ═══ Scale test heartbeat: step= ═══
2026-04-17 01:06:09 Unknown state: . Resetting to start.
2026-04-17 01:06:09 Heartbeat done. Next step: start
2026-04-17 01:06:21 ═══ Scale test heartbeat: step=start ═══
2026-04-17 01:06:21 Step 1: Registering 10M vector index in catalog...
2026-04-17 01:06:21 Parquet exists: 29G
2026-04-17 01:06:21 Heartbeat done. Next step: migrate_lance
2026-04-17 01:08:01 ═══ Scale test heartbeat: step=migrate_lance ═══
2026-04-17 01:08:01 Step 2: Migrating 10M vectors Parquet → Lance...
2026-04-17 01:08:01 This will take several minutes for 28.8 GB...
2026-04-17 01:08:01 Migration via API needs index registered. Using direct Lance path...
Lance migration needs to read 28.8GB Parquet — this takes time...
Starting migration...
Error: HTTP Error 404: Not Found
Attempting direct Lance write...
2026-04-17 01:08:01 Heartbeat done. Next step: check_lance
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.13/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
Traceback (most recent call last):
  File "<string>", line 11, in <module>
    import lance
ModuleNotFoundError: No module named 'lance'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 19, in <module>
    import lance
ModuleNotFoundError: No module named 'lance'
Missing dep: No module named 'lance'
Installing lance...
2026-04-17 01:10:01 ═══ Scale test heartbeat: step=check_lance ═══
2026-04-17 01:10:01 Step 2b: Checking Lance dataset status...
2026-04-17 01:10:02 Lance dataset not ready yet. Will retry on next heartbeat.
2026-04-17 01:10:02 Heartbeat done. Next step: check_lance
2026-04-17 01:12:01 ═══ Scale test heartbeat: step=check_lance ═══
2026-04-17 01:12:01 Step 2b: Checking Lance dataset status...
2026-04-17 01:12:01 Lance dataset: 8000000 rows
2026-04-17 01:12:01 Heartbeat done. Next step: build_index
Migrating 10,000,000 vectors to Lance...
500,000 / 10,000,000 (117,052/sec ETA 81s)
1,000,000 / 10,000,000 (121,674/sec ETA 74s)
1,500,000 / 10,000,000 (123,846/sec ETA 69s)
2,000,000 / 10,000,000 (124,296/sec ETA 64s)
2,500,000 / 10,000,000 (124,056/sec ETA 60s)
3,000,000 / 10,000,000 (124,131/sec ETA 56s)
3,500,000 / 10,000,000 (124,769/sec ETA 52s)
4,000,000 / 10,000,000 (125,028/sec ETA 48s)
4,500,000 / 10,000,000 (125,375/sec ETA 44s)
5,000,000 / 10,000,000 (125,476/sec ETA 40s)
5,500,000 / 10,000,000 (125,140/sec ETA 36s)
6,000,000 / 10,000,000 (124,899/sec ETA 32s)
6,500,000 / 10,000,000 (124,355/sec ETA 28s)
7,000,000 / 10,000,000 (123,762/sec ETA 24s)
7,500,000 / 10,000,000 (123,050/sec ETA 20s)
8,000,000 / 10,000,000 (122,744/sec ETA 16s)
8,500,000 / 10,000,000 (122,164/sec ETA 12s)
9,000,000 / 10,000,000 (121,839/sec ETA 8s)
9,500,000 / 10,000,000 (121,655/sec ETA 4s)
10,000,000 / 10,000,000 (121,529/sec ETA 0s)
Done: 10,000,000 rows in 82s
Verified: 10,000,000 rows in Lance
2026-04-17 01:14:01 ═══ Scale test heartbeat: step=build_index ═══
2026-04-17 01:14:01 Step 3: Building IVF_PQ index on 10M Lance dataset...
2026-04-17 01:14:01 Using tuned config: 3162 partitions (√10M), 8 bits, 192 sub_vectors
Fri Apr 17 01:16:01 AM CDT 2026 Already running (pid 957071)
[2026-04-17T06:16:02Z WARN lance::index::vector::builder] partition 2174 is empty, skipping
Dataset: 10,000,000 rows
Building IVF_PQ: 3162 partitions, 8 bits, 192 sub_vectors...
Index built in 173s

=== Search benchmark: 10 queries on 10M vectors ===
First query: 19ms, 10 hits
Top hit: VEC-2662261
p50=5ms p95=19ms avg=6ms
All 10 searches completed on 10M vectors

═══════════════════════════════════════════════════════════
10M VECTOR SCALE TEST — RESULTS
═══════════════════════════════════════════════════════════
Vectors: 10,000,000
Dimensions: 768
Storage: 30 GB (Lance on disk)
IVF_PQ build: 173 seconds (3162 partitions, 192 sub_vectors)
Search p50: 5ms
Search p95: 19ms

HNSW at 10M would need: 29 GB RAM (past ceiling)
Lance at 10M: 30 GB disk, 5ms search

THIS IS THE PROOF: Lance handles what HNSW can't.
═══════════════════════════════════════════════════════════

Fri Apr 17 01:18:01 AM CDT 2026 Already running (pid 957071)
2026-04-17 01:19:10 IVF_PQ built in 309.85242s
2026-04-17 01:19:10 Heartbeat done. Next step: search_test
2026-04-17 01:20:01 ═══ Scale test heartbeat: step=search_test ═══
2026-04-17 01:20:01 Step 4: Search benchmark on 10M vectors...
Running 10 searches on 10M Lance dataset...
Search 0 failed: HTTP Error 422: Unprocessable Entity
Search 1 failed: HTTP Error 422: Unprocessable Entity
Search 2 failed: HTTP Error 422: Unprocessable Entity
Search 3 failed: HTTP Error 422: Unprocessable Entity
Search 4 failed: HTTP Error 422: Unprocessable Entity
Search 5 failed: HTTP Error 422: Unprocessable Entity
Search 6 failed: HTTP Error 422: Unprocessable Entity
Search 7 failed: HTTP Error 422: Unprocessable Entity
Search 8 failed: HTTP Error 422: Unprocessable Entity
Search 9 failed: HTTP Error 422: Unprocessable Entity
No successful searches
2026-04-17 01:20:01 Heartbeat done. Next step: hot_swap_test
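
The sizing figures in the log above can be sanity-checked with a little
arithmetic. The sketch below assumes float32 vectors and the common
floor(sqrt(N)) rule of thumb for the partition count; the function name
is hypothetical, and whether the scale-test script derives its numbers
this way is an assumption. The raw-vector size it computes does not
include HNSW graph overhead.

```python
import math


def ivf_pq_sizing(num_vectors, dim, num_sub_vectors, num_bits=8):
    """Back-of-envelope numbers for an IVF_PQ index over float32 vectors."""
    if dim % num_sub_vectors != 0:
        raise ValueError("dim must be divisible by num_sub_vectors")
    raw_bytes = num_vectors * dim * 4  # float32 = 4 bytes per component
    pq_bytes = num_vectors * num_sub_vectors * num_bits // 8
    return {
        "num_partitions": math.isqrt(num_vectors),  # floor(sqrt(N))
        "dims_per_sub_vector": dim // num_sub_vectors,
        "raw_gib": raw_bytes / 2**30,       # uncompressed vectors
        "pq_codes_gib": pq_bytes / 2**30,   # compressed PQ codes only
    }


sizing = ivf_pq_sizing(10_000_000, 768, 192)
# num_partitions is 3162 and each sub-vector covers 4 dimensions,
# matching the tuned config in the log; raw vectors come to about 28.6 GiB.
```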