500K scale test: 2.9M rows, sub-120ms SQL, architecture holds

Bumped upload limit to 512MB for large CSV ingests. Generated and
ingested 500K staffing worker profiles (346MB CSV → 75MB Parquet
in 5.9s).

SQL at 500K: COUNT=35ms, filter+state=67ms, aggregation=80ms,
complex filter=117ms, 10 concurrent=84ms total (10/10 pass).
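The concurrency figure above is wall-clock, not a per-query sum. A minimal sketch of that check's shape, in std-only Rust (the `run_query` body is a stand-in; the real benchmark hit the service's SQL endpoint, which is not shown in this commit):

```rust
use std::thread;
use std::time::Instant;

// Placeholder for one SQL round trip against the service. A real benchmark
// would issue the COUNT/filter/aggregation statement here and validate the
// response; this stub only models the pass/fail result.
fn run_query(_i: usize) -> bool {
    true
}

// Spawn n queries in parallel and count how many pass.
fn run_all(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || run_query(i)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count()
}

fn main() {
    let start = Instant::now();
    let passed = run_all(10);
    // elapsed() is the total wall-clock time for all 10 concurrent queries
    println!("{}/10 pass in {:?}", passed, start.elapsed());
}
```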

HNSW memory projection: 500K vectors = 1.5GB RAM (comfortable on
128GB server). Ceiling at ~5M vectors (14.6GB) — Lance IVF_PQ
takes over beyond that as designed in ADR-019.
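The projection is simple arithmetic. A back-of-envelope sketch: the vector dimension (768) and per-node graph overhead (~128 B) below are assumptions chosen to reproduce the figures, not measured values; only the 1.5GB @ 500K and ~14.6GB @ 5M numbers come from the test run.

```rust
// Rough HNSW resident-memory model: f32 components plus per-node link storage.
// dim and graph_overhead are assumed, not taken from the actual index config.
fn hnsw_ram_bytes(n_vectors: u64, dim: u64, graph_overhead: u64) -> u64 {
    n_vectors * (dim * 4 + graph_overhead)
}

fn main() {
    const GIB: f64 = 1024.0 * 1024.0 * 1024.0;
    let at_500k = hnsw_ram_bytes(500_000, 768, 128) as f64 / GIB;
    let at_5m = hnsw_ram_bytes(5_000_000, 768, 128) as f64 / GIB;
    // ~1.5 GiB at 500K, ~15 GiB at 5M — consistent with the projection above
    println!("500K ≈ {:.1} GiB, 5M ≈ {:.1} GiB", at_500k, at_5m);
}
```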

Hybrid search 500K SQL → 10K vector: 131ms with 6,289 SQL matches
narrowed to 5 vector-ranked results.
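The hybrid flow is: SQL prefilter yields candidate row ids, then only those candidates are scored against the query vector and the top-k kept. A minimal std-only sketch (function and parameter names are illustrative, not the service's actual API; vectors are assumed pre-normalized so dot product equals cosine similarity):

```rust
// Score only the rows that survived the SQL prefilter and return the
// top-k row ids by similarity to the query vector.
fn top_k_by_similarity(
    candidates: &[(u64, Vec<f32>)], // (row_id, embedding) pairs from the SQL pass
    query: &[f32],
    k: usize,
) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|(id, v)| {
            // dot product == cosine similarity under the normalization assumption
            let dot: f32 = v.iter().zip(query).map(|(a, b)| a * b).sum();
            (*id, dot)
        })
        .collect();
    // highest similarity first
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    let cands = vec![
        (1u64, vec![1.0f32, 0.0]),
        (2, vec![0.0, 1.0]),
        (3, vec![0.6, 0.8]),
    ];
    println!("{:?}", top_k_by_similarity(&cands, &[1.0, 0.0], 2));
}
```

In the 500K run this is 6,289 candidates scored instead of the full 10K-vector index, which is why the end-to-end latency stays at 131ms.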

Total scale: 2.9M rows across all datasets (500K workers + 2.47M
staffing data).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
commit 40305da654
parent cd1fda3e21
root 2026-04-17 01:00:21 -05:00


@@ -190,7 +190,7 @@ async fn main() {
     }
     app = app
-        .layer(DefaultBodyLimit::max(256 * 1024 * 1024)) // 256MB
+        .layer(DefaultBodyLimit::max(512 * 1024 * 1024)) // 512MB — supports 500K worker CSV
         .layer(CorsLayer::new()
             .allow_origin(Any)
             .allow_methods(Any)