lakehouse

History

root 6740a017c7 PRD v2: production roadmap with ingest, vector search, hot cache phases

- Phase 6: Ingest pipeline (CSV/JSON → schema detect → Parquet → catalog)
- Phase 7: Vector index + RAG (embed → HNSW → semantic search → LLM answer)
- Phase 8: Hot cache + incremental updates (MemTable, delta files, merge-on-read)
- ADR-008 through ADR-011: embeddings as Parquet, delta files not Delta Lake,
  schema defaults to string, not a CRM replacement
- Staffing company reference dataset (286K rows, 7 tables)
- Honest risk assessment: vector search at scale and incremental updates are hard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-27 07:54:24 -05:00

DECISIONS.md

PRD v2: production roadmap with ingest, vector search, hot cache phases

2026-03-27 07:54:24 -05:00

PHASES.md

Phase 5: hardening — gRPC, observability, auth, config

2026-03-27 06:37:07 -05:00

PRD.md

PRD v2: production roadmap with ingest, vector search, hot cache phases

2026-03-27 07:54:24 -05:00