lakehouse/docs/PHASES.md
root 655b6c0b37 Phase 1: storage + catalog layer
- storaged: object_store backend (LocalFileSystem), PUT/GET/DELETE/LIST endpoints
- shared: arrow_helpers with Parquet roundtrip + schema fingerprinting (2 tests)
- catalogd: in-memory registry with write-ahead manifest persistence to object storage
- catalogd: POST/GET /datasets, GET /datasets/by-name/{name}
- gateway: wires storaged + catalogd with shared object_store state
- Phase tracker updated: Phase 0 + Phase 1 gates passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 05:15:27 -05:00

# Phase Tracker
## Phase 0: Bootstrap
- [x] 0.1 — Cargo workspace with all crate stubs compiling
- [x] 0.2 — `shared` crate: error types, ObjectRef, DatasetId
- [x] 0.3 — `gateway` with Axum: GET /health → 200
- [x] 0.4 — tracing + tracing-subscriber wired in gateway
- [x] 0.5 — justfile with build, test, run recipes
- [x] 0.6 — docs committed to git

**Gate: PASSED** — All crates compile. Gateway runs. Logs emit. Docs committed.
## Phase 1: Storage + Catalog
- [x] 1.1 — storaged: object_store backend init (LocalFileSystem)
- [x] 1.2 — storaged: Axum endpoints (PUT/GET/DELETE/LIST /objects/{key})
- [x] 1.3 — shared/arrow_helpers.rs: RecordBatch ↔ Parquet + schema fingerprinting
- [x] 1.4 — catalogd/registry.rs: in-memory index + manifest persistence to object storage
- [x] 1.5 — catalogd/schema.rs: schema fingerprinting (merged into shared/arrow_helpers.rs)
- [x] 1.6 — catalogd service: POST/GET /datasets + GET /datasets/by-name/{name}
- [x] 1.7 — gateway routes to storaged + catalogd with shared state

**Gate: PASSED** — PUT object → register dataset → list → get by name. All via gateway HTTP.
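
Items 1.3 to 1.6 can be illustrated with a minimal, std-only sketch. It is not the project's code: `ManifestStore`, `MemStore`, `DatasetEntry`, `Registry`, the manifest text layout, and the `_catalog/manifests/` key prefix are all hypothetical names chosen for illustration, and the FNV-1a hash stands in for whatever fingerprinting `shared/arrow_helpers.rs` actually uses. The point it shows is the ordering: the manifest is written to storage before the in-memory index changes, so a failed write leaves the index untouched.

```rust
use std::collections::HashMap;
use std::iter::once;

/// Hypothetical stand-in for the object storage behind storaged.
pub trait ManifestStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> Result<(), String>;
}

/// In-memory store used only for this sketch.
#[derive(Default)]
pub struct MemStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl ManifestStore for MemStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> Result<(), String> {
        self.objects.insert(key.to_string(), bytes);
        Ok(())
    }
}

/// Order-sensitive FNV-1a (64-bit) hash over (field name, type name) pairs.
/// A zero byte separates the parts so ("ab", "c") and ("a", "bc") differ.
pub fn schema_fingerprint(fields: &[(&str, &str)]) -> u64 {
    const OFFSET: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET;
    for (name, ty) in fields {
        let bytes = name.bytes().chain(once(0u8)).chain(ty.bytes()).chain(once(0u8));
        for b in bytes {
            hash ^= b as u64;
            hash = hash.wrapping_mul(PRIME);
        }
    }
    hash
}

/// Hypothetical catalog entry: a named dataset and the objects that back it.
#[derive(Clone, Debug, PartialEq)]
pub struct DatasetEntry {
    pub name: String,
    pub object_keys: Vec<String>,
    pub fingerprint: u64,
}

pub struct Registry<S: ManifestStore> {
    store: S,
    by_name: HashMap<String, DatasetEntry>,
}

impl<S: ManifestStore> Registry<S> {
    pub fn new(store: S) -> Self {
        Registry { store, by_name: HashMap::new() }
    }

    /// Persist the manifest first, then update the in-memory index.
    pub fn register(&mut self, entry: DatasetEntry) -> Result<(), String> {
        let manifest = format!(
            "{}\n{:016x}\n{}\n",
            entry.name,
            entry.fingerprint,
            entry.object_keys.join(",")
        );
        let key = format!("_catalog/manifests/{}.manifest", entry.name);
        // If this write fails, `?` returns early and the index is unchanged.
        self.store.put(&key, manifest.into_bytes())?;
        self.by_name.insert(entry.name.clone(), entry);
        Ok(())
    }

    pub fn get_by_name(&self, name: &str) -> Option<&DatasetEntry> {
        self.by_name.get(name)
    }

    pub fn store(&self) -> &S {
        &self.store
    }
}
```

Keeping the store behind a trait lets the same registry logic run against an in-memory map in tests and a real object store in the service.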
## Phase 2: Query Engine
- [ ] 2.1 — queryd: SessionContext + object_store config
- [ ] 2.2 — queryd: ListingTable from catalog ObjectRefs
- [ ] 2.3 — queryd service: POST /query → Arrow IPC or JSON
- [ ] 2.4 — queryd → catalogd wiring
- [ ] 2.5 — gateway routes /query

**Gate:** SQL over Parquet returns correct results via catalog resolution.
## Phase 3: AI Integration
- [ ] 3.1 — Python sidecar: FastAPI + Ollama (embed/generate/rerank)
- [ ] 3.2 — Dockerfile for sidecar
- [ ] 3.3 — aibridge/client.rs: HTTP client to sidecar
- [ ] 3.4 — aibridge service: Axum proxy endpoints
- [ ] 3.5 — Model config via env vars

**Gate:** Rust → Python → Ollama → real embeddings return.
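
Item 3.5 is not implemented yet, so the following is only a sketch of one way env-driven model config could look on the Rust side. Every name here is an assumption: the `ModelConfig` struct, the `AIBRIDGE_*` variable names, the default sidecar URL, and the placeholder model names are all hypothetical. The lookup is passed in as a closure so the fallback logic can be exercised without mutating the process environment.

```rust
use std::env;

/// Hypothetical settings the aibridge client might need.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub sidecar_url: String,
    pub embed_model: String,
    pub generate_model: String,
    pub rerank_model: String,
}

impl ModelConfig {
    /// Build the config from any key lookup. Missing or blank values
    /// fall back to the (placeholder) defaults below.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| {
            lookup(key)
                .filter(|v| !v.trim().is_empty())
                .unwrap_or_else(|| default.to_string())
        };
        ModelConfig {
            // All variable names and defaults are illustrative assumptions.
            sidecar_url: get("AIBRIDGE_SIDECAR_URL", "http://127.0.0.1:8000"),
            embed_model: get("AIBRIDGE_EMBED_MODEL", "example-embed-model"),
            generate_model: get("AIBRIDGE_GENERATE_MODEL", "example-generate-model"),
            rerank_model: get("AIBRIDGE_RERANK_MODEL", "example-rerank-model"),
        }
    }

    /// Read the same keys from the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

In a service, `ModelConfig::from_env()` would typically be called once at startup and the result shared with the request handlers.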
## Phase 4: Frontend
- [ ] 4.1 — Dioxus scaffold, WASM build
- [ ] 4.2 — Dataset browser
- [ ] 4.3 — Query editor + results table
- [ ] 4.4 — Error display + loading states

**Gate:** Browse datasets and query from browser.
## Phase 5: Hardening
- [ ] 5.1 — Proto definitions
- [ ] 5.2 — Internal gRPC migration
- [ ] 5.3 — OpenTelemetry tracing
- [ ] 5.4 — Auth middleware
- [ ] 5.5 — Config-driven startup

**Gate:** gRPC internals, traces, auth, restartable from repo + config.