lakehouse

Go to file

root f9f92706f3 RAG reranker + manifest bucket fix — quality improvements from eval

RAG pipeline now includes a cross-encoder rerank step between retrieval
and generation. The LLM re-sorts top-K results by relevance before
they become context. Falls back to original order if model output is
unparseable (~5% with 7B models). Also improved the generation prompt
to be domain-aware ("staffing database") and request specific citations.

Fixed 4 catalog manifests with bucket="data" (pre-federation leftover)
that poisoned the entire DataFusion query context on startup. The
"users", "lab_trials", "meta_runs", and "new_candidates" datasets
now correctly reference bucket="primary". This bug was surfaced by
the quality evaluation pipeline — wouldn't have been found by
structural tests alone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-16 22:19:11 -05:00

crates

RAG reranker + manifest bucket fix — quality improvements from eval

2026-04-16 22:19:11 -05:00

data

RAG reranker + manifest bucket fix — quality improvements from eval

2026-04-16 22:19:11 -05:00

docs

Update PRD + PHASES.md — reflect 8-commit 2026-04-17 push

2026-04-16 20:54:05 -05:00

inbox/processed

Scheduled ingest: file watcher auto-ingests from ./inbox