10,000 staffing worker profiles from profit/ethereal repo. Flattened
JSON → CSV → Parquet. Indexed on HNSW (9.5s) + Lance IVF_PQ (7.2s).
SQL hybrid verified: forklift operators in IL with reliability > 0.8
returned exact matches. Vector search alone missed the state filter —
confirms the hybrid SQL+vector routing need from quality eval.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clean_sql now uses 3 strategies in priority order:
1. Extract from ```sql...``` markdown blocks
2. Find first SELECT/WITH/INSERT statement in text
3. Strip leading "sql" keyword fallback
Tested against 5 real model output patterns:
- Clean SQL ✓
- "sql" prefixed ✓
- Markdown fenced ✓
- Explanation before ```sql block ✓
- Explanation with SELECT buried in text ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- POST /ingest/postgres/tables — list all tables in a database
- POST /ingest/postgres/import — import table → Parquet → catalog → queryable
- Auto type mapping: int2/4/8 → Int, float4/8 → Float64, bool → Boolean,
text/varchar/jsonb/timestamp → Utf8 (safe default per ADR-010)
- Auto PII detection + lineage on import
- Empty password support for trust auth
- Tested: imported lab_trials (40 rows, 10 cols) and threat_intel (20 rows, 30 cols)
from local knowledge_base Postgres database — immediately queryable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop CSV/JSON/PDF/text into ./inbox → auto-detected → Parquet → queryable
- Polls every 10 seconds (configurable)
- Processed files moved to ./inbox/processed/
- Failed files moved to ./inbox/failed/
- Dedup: same file dropped twice = no-op
- Watcher starts automatically on gateway boot
- Tested: CSV dropped → queryable in <15s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>