lakehouse


root 26fc98c885 Phase 7: Vector index + RAG pipeline

- vectord crate: chunk → embed → store → search → RAG
- chunker: configurable chunk size + overlap, sentence-boundary-aware splitting
- store: embeddings persisted as Parquet, each f32 vector stored as a binary blob (portable format)
- search: brute-force cosine similarity (practical up to ~100K vectors)
- rag: end-to-end pipeline (embed question → search index → retrieve context → LLM answer)
- Endpoints: POST /vectors/index, /vectors/search, /vectors/rag
- Gateway wired to the vectord service
- Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working
- 20 unit tests passing (chunker, search, ingestd, shared)
- LLM honestly answers "no match found" when the retrieved context doesn't support an answer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-27 08:12:28 -05:00
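The commit describes the chunker only as "configurable chunk size + overlap, sentence-boundary aware". The sketch below shows one way such a chunker could work and is not the vectord implementation. Assumptions: sizes are measured in bytes rather than tokens, a sentence ends at `.`, `!` or `?` followed by ASCII whitespace or end of text, and the names `split_sentences` and `chunk_text` are hypothetical.

```rust
/// Hypothetical sentence splitter: a boundary is `.`, `!` or `?` followed by
/// ASCII whitespace or end of text, so "1.5" is not split.
fn split_sentences(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut sentences = Vec::new();
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if matches!(c, '.' | '!' | '?') {
            let end = i + c.len_utf8();
            if end == text.len() || bytes[end].is_ascii_whitespace() {
                let s = text[start..end].trim();
                if !s.is_empty() {
                    sentences.push(s);
                }
                start = end;
            }
        }
    }
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail);
    }
    sentences
}

/// Hypothetical chunker: packs whole sentences into chunks of at most
/// `chunk_size` bytes (a single longer sentence becomes its own chunk) and
/// carries up to `overlap` bytes of trailing whole sentences into the next chunk.
fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_len = 0; // byte length of current.join(" ")
    for s in split_sentences(text) {
        let added = if current.is_empty() { s.len() } else { s.len() + 1 };
        if !current.is_empty() && current_len + added > chunk_size {
            chunks.push(current.join(" "));
            // Walk backwards, collecting whole sentences that fit in `overlap`.
            let mut carried: Vec<&str> = Vec::new();
            let mut carried_len = 0;
            for &prev in current.iter().rev() {
                let extra = if carried.is_empty() { prev.len() } else { prev.len() + 1 };
                if carried_len + extra > overlap {
                    break;
                }
                carried.insert(0, prev);
                carried_len += extra;
            }
            // Drop the overlap if it would push the next chunk past chunk_size.
            if !carried.is_empty() && carried_len + s.len() + 1 > chunk_size {
                carried.clear();
                carried_len = 0;
            }
            current = carried;
            current_len = carried_len;
        }
        current_len += if current.is_empty() { s.len() } else { s.len() + 1 };
        current.push(s);
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

For example, `chunk_text("One. Two. Three. Four.", 20, 6)` yields `["One. Two. Three.", "Three. Four."]`. The sentence "Three." is repeated because it fits within the 6-byte overlap. Carrying only whole sentences keeps every chunk readable on its own. The trade-off is that the actual overlap can be smaller than the configured value.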
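The store persists each embedding as a binary blob of f32 values inside Parquet. The commit does not say how the bytes are laid out. The sketch below assumes a packed little-endian encoding, which is one portable choice. It covers only the blob encoding and decoding, not the Parquet writing. The names `encode_vector` and `decode_vector` are hypothetical.

```rust
/// Hypothetical encoder: pack an f32 vector as little-endian bytes (4 per value).
fn encode_vector(v: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(v.len() * 4);
    for x in v {
        out.extend_from_slice(&x.to_le_bytes());
    }
    out
}

/// Hypothetical decoder: returns None if the blob length is not a multiple of 4.
fn decode_vector(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}
```

Fixing the byte order explicitly, rather than using the host's native order, makes the blobs readable on any platform. That is the point of choosing a portable format in the first place.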
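Brute-force search scores the query against every stored vector and keeps the best k. Each query therefore costs O(n·d) for n vectors of dimension d, which is why the approach is quoted as practical only up to roughly 100K vectors. The following is a minimal sketch of that technique. The function names are hypothetical, and zero-length vectors are assumed to score 0.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Hypothetical brute-force top-k: score every vector, sort descending, keep k.
fn top_k(query: &[f32], index: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A full sort is O(n log n). For large n with small k, `select_nth_unstable_by` followed by sorting only the first k entries would avoid sorting the whole list. Beyond the ~100K range, an approximate index (HNSW, IVF) is the usual next step.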
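The RAG bullet lists four stages: embed question, search index, retrieve context, LLM answer. The sketch below wires those stages together behind two hypothetical traits, `Embedder` and `Llm`, because the commit names neither the embedding model nor the LLM API. The prompt wording is also an assumption. It illustrates one way a pipeline could elicit the "no match found" reply: by instructing the model to answer only from the retrieved context.

```rust
/// Hypothetical embedding backend (model and API are not named in the commit).
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Hypothetical LLM backend.
trait Llm {
    fn complete(&self, prompt: &str) -> String;
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na.sqrt() * nb.sqrt()) }
}

/// Assumed prompt template: restrict the model to the retrieved context.
fn build_prompt(question: &str, context: &[&str]) -> String {
    let mut prompt = String::from(
        "Answer using only the context below. If the context does not support \
         an answer, reply exactly: no match found\n\nContext:\n",
    );
    for (i, chunk) in context.iter().enumerate() {
        prompt.push_str(&format!("[{}] {}\n", i + 1, chunk));
    }
    prompt.push_str(&format!("\nQuestion: {}\nAnswer:", question));
    prompt
}

/// embed question -> search index -> retrieve top-k context -> LLM answer
fn rag_answer(
    question: &str,
    chunks: &[String],
    vectors: &[Vec<f32>],
    k: usize,
    embedder: &dyn Embedder,
    llm: &dyn Llm,
) -> String {
    let q = embedder.embed(question);
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(&q, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    let context: Vec<&str> = scored.iter().take(k).map(|&(i, _)| chunks[i].as_str()).collect();
    llm.complete(&build_prompt(question, &context))
}
```

Keeping the embedder and LLM behind traits lets the pipeline be tested with stubs, for example an LLM that echoes its prompt, without network calls.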

crates

Phase 7: Vector index + RAG pipeline

2026-03-27 08:12:28 -05:00

data

Phase 7: Vector index + RAG pipeline

2026-03-27 08:12:28 -05:00

docs

PRD v2: production roadmap with ingest, vector search, hot cache phases

2026-03-27 07:54:24 -05:00

proto

Phase 5: hardening — gRPC, observability, auth, config