lakehouse


root f9f92706f3 RAG reranker + manifest bucket fix — quality improvements from eval

The RAG pipeline now includes a cross-encoder rerank step between
retrieval and generation: the LLM re-sorts the top-K results by
relevance before they become context. If the model output cannot be
parsed (about 5% of the time with 7B models), the step falls back to
the original retrieval order. The generation prompt is also now
domain-aware ("staffing database") and asks for specific citations.
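The commit does not show how the model's ranking is parsed or how the fallback works. The following is a minimal sketch that assumes the LLM is asked to reply with the 1-based positions of the top-K candidates, best first (for example `3, 1, 2`). The reply format and the `apply_rerank` name are hypothetical. Any reply that does not form a complete ranking keeps the original retrieval order.

```python
import re
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def apply_rerank(candidates: Sequence[T], model_output: str) -> List[T]:
    """Reorder retrieved candidates using an LLM-produced ranking.

    Hypothetical reply format: 1-based candidate positions, best first,
    e.g. "3, 1, 2".
    """
    # Pull every integer out of the reply, e.g. "[3, 1, 2]" -> [3, 1, 2].
    ranking = [int(tok) for tok in re.findall(r"\d+", model_output)]
    # Accept only a complete permutation of 1..n: no gaps, duplicates or out-of-range values.
    if sorted(ranking) != list(range(1, len(candidates) + 1)):
        # Unparseable or partial answer: fall back to the original retrieval order.
        return list(candidates)
    return [candidates[i - 1] for i in ranking]
```

This sketch treats a partial ranking as unparseable. A more lenient variant could keep the positions the model did return and append any omitted candidates in retrieval order. That would reduce fallbacks with small models, but it would trust incomplete answers.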

Fixed four catalog manifests that still had bucket="data" (a
pre-federation leftover), which poisoned the entire DataFusion query
context on startup. The "users", "lab_trials", "meta_runs", and
"new_candidates" datasets now correctly reference bucket="primary".
The quality evaluation pipeline surfaced this bug; structural tests
alone would not have caught it.
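This commit fixes the manifest data itself. A startup guard could also flag a stale bucket before it reaches the query context. The following is a minimal sketch of such a guard, not what the commit does. It assumes each manifest can be read as a mapping with a `bucket` key and that the set of configured buckets is known (here only `primary`). The `partition_manifests` helper is hypothetical and does not call any DataFusion API.

```python
from typing import AbstractSet, Dict, Mapping, Optional, Tuple

def partition_manifests(
    manifests: Mapping[str, Mapping[str, str]],
    known_buckets: AbstractSet[str],
) -> Tuple[Dict[str, Mapping[str, str]], Dict[str, Optional[str]]]:
    """Split manifests into those whose bucket is configured and those that are not."""
    ok: Dict[str, Mapping[str, str]] = {}
    bad: Dict[str, Optional[str]] = {}
    for name, manifest in manifests.items():
        bucket = manifest.get("bucket")
        if bucket in known_buckets:
            ok[name] = manifest
        else:
            # Record the offending value (None if the key is absent) for the startup log.
            bad[name] = bucket
    return ok, bad

# Illustrative input: one stale pre-federation value, one corrected entry.
manifests = {
    "users": {"bucket": "data"},
    "lab_trials": {"bucket": "primary"},
}
ok, bad = partition_manifests(manifests, {"primary"})
print(sorted(ok))  # ['lab_trials']
print(bad)         # {'users': 'data'}
```

Splitting the manifests, rather than failing on the first bad one, means only the misconfigured datasets need to be skipped or reported. Whether to skip them or abort startup is a policy choice this sketch leaves open.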

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-16 22:19:11 -05:00

_catalog/manifests

RAG reranker + manifest bucket fix — quality improvements from eval

2026-04-16 22:19:11 -05:00

datasets

Robust SQL extraction: handles explanations, markdown, prefixes

2026-03-27 20:42:11 -05:00
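The `datasets` entry above mentions SQL extraction that copes with explanations, markdown and prefixes. The code is not shown. The following is a minimal sketch under the assumption that the SQL arrives inside a free-form LLM reply. The regexes and the `extract_sql` name are hypothetical. It takes a fenced block if present, strips a leading `SQL:`/`Query:` label, skips prose before the first line that starts a `SELECT` or `WITH … AS` statement, and drops trailing explanation.

```python
import re
from typing import Optional

# Fenced markdown block, optionally tagged as sql: ```sql ... ```
_FENCE = re.compile(r"```(?:sql)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
# Leading label such as "SQL:", "SQL query:" or "Query:".
_PREFIX = re.compile(r"^\s*(?:sql(?:\s+query)?|query)\s*:\s*", re.IGNORECASE)
# A line that starts a statement: SELECT, or a CTE "WITH <name> [(cols)] AS".
_START = re.compile(
    r"^\s*(?:SELECT\b|WITH\s+(?:RECURSIVE\s+)?\w+\s*(?:\([^)]*\))?\s+AS\b)",
    re.IGNORECASE | re.MULTILINE,
)

def extract_sql(reply: str) -> Optional[str]:
    """Pull a single read-only SQL statement out of a free-form LLM reply."""
    fence = _FENCE.search(reply)
    text = fence.group(1) if fence else reply
    text = _PREFIX.sub("", text.strip(), count=1)
    start = _START.search(text)
    if start is None:
        return None  # no recognizable statement
    sql = text[start.start():].lstrip()
    # Drop explanation that follows a blank line...
    sql = re.split(r"\n\s*\n", sql, maxsplit=1)[0]
    # ...and anything after the first semicolon (naive: ignores ';' inside string literals).
    semi = sql.find(";")
    if semi != -1:
        sql = sql[: semi + 1]
    return sql.strip()
```

Anchoring the statement search at line starts keeps phrases like "the query below" from being mistaken for SQL. The `WITH` pattern requires a `name AS` shape so that prose lines beginning with "With" are usually skipped.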

journal

Stress test suite: 9/9 passed — architecture validated

2026-03-27 22:13:27 -05:00

vectors

100K embedding COMPLETE: 177/sec, 9.5 min, zero failures

2026-03-27 09:53:47 -05:00