root
|
354c9c4a04
|
PRD v3: future-proofing roadmap — event journal, rich catalog, tool registry
Phases 9-15 designed based on "future regret" analysis:
- Phase 9: Event journal (append-only mutation history — can't retrofit)
- Phase 10: Rich catalog v2 (ownership, sensitivity, lineage, freshness)
- Phase 11: Embedding versioning (model-proof vector layer)
- Phase 12: Tool registry (governed agent actions via MCP)
- Phase 13: Security & access control (field-level, row-level, audit)
- Phase 14: Schema evolution with AI migration rules
- Phase 15+: Federated query, DB connectors, OCR, fine-tuned models
8 design principles: store truth openly, describe richly, never destroy
evidence, secure centrally, expose through tools, version everything,
unstructured first-class, separate storage/compute/intelligence.
ADR-012 through ADR-016 documenting key future-proofing decisions.
Updated benchmarks: 2.47M rows, hot cache 9.8x speedup.
Updated operating rules: cheap-now/expensive-later built first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-27 08:57:29 -05:00 |
|
root
|
6740a017c7
|
PRD v2: production roadmap with ingest, vector search, hot cache phases
- Phase 6: Ingest pipeline (CSV/JSON → schema detect → Parquet → catalog)
- Phase 7: Vector index + RAG (embed → HNSW → semantic search → LLM answer)
- Phase 8: Hot cache + incremental updates (MemTable, delta files, merge-on-read)
- ADR-008 through ADR-011: embeddings as Parquet, delta files not Delta Lake,
schema defaults to string, not a CRM replacement
- Staffing company reference dataset (286K rows, 7 tables)
- Honest risk assessment: vector search at scale and incremental updates are hard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-27 07:54:24 -05:00 |
|
root
|
01373c0e45
|
Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService
- proto crate: tonic-build codegen from proto definitions
- catalogd: gRPC CatalogService implementation
- gateway: dual HTTP (:3100) + gRPC (:3101) servers
- gateway: OpenTelemetry tracing with stdout exporter
- gateway: API key auth middleware (toggleable)
- shared: TOML config system with typed structs and defaults
- lakehouse.toml config file
- ADR-006 and ADR-007 documented
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-27 06:37:07 -05:00 |
|
root
|
a52ca841c6
|
Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway
- shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum
- gateway: Axum HTTP entrypoint with nested service routers + tracing
- All services expose /health stubs
- justfile with build/test/run recipes
- PRD, phase tracker, and ADR docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-27 04:59:05 -05:00 |
|