lakehouse/docs/PHASES.md
root a52ca841c6 Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway
- shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum
- gateway: Axum HTTP entrypoint with nested service routers + tracing
- All services expose /health stubs
- justfile with build/test/run recipes
- PRD, phase tracker, and ADR docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 04:59:05 -05:00

Phase Tracker

Phase 0: Bootstrap

  • 0.1 — Cargo workspace with all crate stubs compiling
  • 0.2 — shared crate: error types, ObjectRef, DatasetId
  • 0.3 — gateway with Axum: GET /health → 200
  • 0.4 — tracing + tracing-subscriber wired in gateway
  • 0.5 — justfile with build, test, run recipes
  • 0.6 — docs committed to git

Gate: All crates compile. Gateway runs. Logs emit. Docs committed.
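
As an illustration of item 0.2, here is a minimal, std-only sketch of what the shared types could look like. The type names `DatasetId` and `ObjectRef` come from the tracker; the field layout (`bucket`/`key`), the `LakehouseError` name and its variants, and the `object_ref` helper are all assumptions made for this example, not the crate's actual definitions.

```rust
use std::fmt;

/// Opaque dataset identifier. Wrapping a String is an assumption of this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DatasetId(pub String);

/// Location of one stored object. The bucket/key split is hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ObjectRef {
    pub bucket: String,
    pub key: String,
}

impl fmt::Display for ObjectRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}", self.bucket, self.key)
    }
}

/// Workspace-wide error enum; the name and variants are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum LakehouseError {
    DatasetNotFound(DatasetId),
    InvalidKey(String),
}

impl fmt::Display for LakehouseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LakehouseError::DatasetNotFound(id) => write!(f, "dataset not found: {}", id.0),
            LakehouseError::InvalidKey(key) => write!(f, "invalid object key: {:?}", key),
        }
    }
}

impl std::error::Error for LakehouseError {}

/// Hypothetical constructor: rejects empty keys and keys with a leading slash.
pub fn object_ref(bucket: &str, key: &str) -> Result<ObjectRef, LakehouseError> {
    if key.is_empty() || key.starts_with('/') {
        return Err(LakehouseError::InvalidKey(key.to_string()));
    }
    Ok(ObjectRef {
        bucket: bucket.to_string(),
        key: key.to_string(),
    })
}
```

Keeping the error enum in the shared crate would let every service return the same error type across crate boundaries, which is presumably why 0.2 groups error types with the identifiers.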

Phase 1: Storage + Catalog

  • 1.1 — storaged: object_store backend init (LocalFileSystem → S3)
  • 1.2 — storaged: Axum endpoints (PUT/GET/DELETE /objects/{key})
  • 1.3 — shared/arrow.rs: RecordBatch ↔ Parquet helpers
  • 1.4 — catalogd/registry.rs: in-memory index + manifest persistence
  • 1.5 — catalogd/schema.rs: schema fingerprinting
  • 1.6 — catalogd service: POST/GET /datasets endpoints
  • 1.7 — gateway routes to storaged + catalogd

Gate: Upload Parquet → register → fetch metadata → read back. All via gateway.
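
Item 1.5 (schema fingerprinting) can be sketched without any dependencies. The version below is one possible approach under stated assumptions: it hashes each column's name, type, and nullability in order with FNV-1a (64-bit), chosen here only because it is simple and stable across builds. The `Field` struct uses plain strings in place of Arrow types, and the choice of hash and separator bytes is this example's, not necessarily what `catalogd/schema.rs` does.

```rust
/// One column of a schema. Plain strings stand in for Arrow types in this sketch.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
}

// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a 64-bit state.
fn fnv1a(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Order-sensitive fingerprint over (name, type, nullability) of every field.
pub fn schema_fingerprint(fields: &[Field]) -> u64 {
    let mut state = FNV_OFFSET;
    for f in fields {
        state = fnv1a(state, f.name.as_bytes());
        // Separator byte so that ("ab", "c") and ("a", "bc") hash differently.
        state = fnv1a(state, &[0x00]);
        state = fnv1a(state, f.data_type.as_bytes());
        // Nullability flag, framed by marker bytes to end the field record.
        state = fnv1a(state, &[0x00, u8::from(f.nullable), 0xff]);
    }
    state
}
```

Two datasets with the same columns in the same order get the same fingerprint; reordering columns or changing a type changes it. A real implementation would likely want a wider or cryptographic hash if fingerprints are used as identifiers.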

Phase 2: Query Engine

  • 2.1 — queryd: SessionContext + object_store config
  • 2.2 — queryd: ListingTable from catalog ObjectRefs
  • 2.3 — queryd service: POST /query → Arrow IPC or JSON
  • 2.4 — queryd → catalogd wiring
  • 2.5 — gateway routes /query

Gate: SQL over Parquet returns correct results via catalog resolution.
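
The "catalog resolution" step in this gate can be pictured as a lookup from a table name to the object keys backing it. The sketch below shows only that lookup, using the standard library; the `Catalog` type, its methods, and the string-based keys and errors are hypothetical stand-ins. In the real service, queryd would presumably hand the resolved ObjectRefs to a DataFusion ListingTable (item 2.2), which is not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory catalog: dataset name -> Parquet object keys.
#[derive(Debug, Default)]
pub struct Catalog {
    datasets: HashMap<String, Vec<String>>,
}

impl Catalog {
    /// Register (or replace) the object keys that make up a dataset.
    pub fn register(&mut self, name: &str, object_keys: Vec<String>) {
        self.datasets.insert(name.to_string(), object_keys);
    }

    /// Resolve a SQL table name to its object keys, or report it as unknown.
    pub fn resolve(&self, table: &str) -> Result<&[String], String> {
        self.datasets
            .get(table)
            .map(Vec::as_slice)
            .ok_or_else(|| format!("unknown dataset: {}", table))
    }
}
```

A query such as `SELECT count(*) FROM sales` would call `resolve("sales")` before planning, so an unregistered table fails early with a catalog error rather than a storage error.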

Phase 3: AI Integration

  • 3.1 — Python sidecar: FastAPI + Ollama (embed/generate/rerank)
  • 3.2 — Dockerfile for sidecar
  • 3.3 — aibridge/client.rs: HTTP client to sidecar
  • 3.4 — aibridge service: Axum proxy endpoints
  • 3.5 — Model config via env vars

Gate: Rust → Python → Ollama round trip returns real embeddings.
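
For item 3.5, a small sketch of env-driven model configuration follows. Everything named here is an assumption for illustration: the `ModelConfig` struct, the `AIBRIDGE_*` variable names, the sidecar URL, and the placeholder model names are not taken from the project. The lookup is passed in as a closure so the fallback logic can be exercised without touching the process environment.

```rust
use std::env;

/// Hypothetical aibridge settings; field and variable names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelConfig {
    pub sidecar_url: String,
    pub embed_model: String,
    pub generate_model: String,
}

impl ModelConfig {
    /// Build from any key lookup; unset or blank values fall back to defaults.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        let pick = |key: &str, default: &str| {
            get(key)
                .map(|v| v.trim().to_string())
                .filter(|v| !v.is_empty())
                .unwrap_or_else(|| default.to_string())
        };
        Self {
            // Defaults are placeholders, not the project's real values.
            sidecar_url: pick("AIBRIDGE_SIDECAR_URL", "http://127.0.0.1:8000"),
            embed_model: pick("AIBRIDGE_EMBED_MODEL", "example-embed-model"),
            generate_model: pick("AIBRIDGE_GENERATE_MODEL", "example-generate-model"),
        }
    }

    /// Read the same keys from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

At startup the service would call `ModelConfig::from_env()` once and pass the result to the sidecar client, so switching models needs only a changed variable and a restart.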

Phase 4: Frontend

  • 4.1 — Dioxus scaffold, WASM build
  • 4.2 — Dataset browser
  • 4.3 — Query editor + results table
  • 4.4 — Error display + loading states

Gate: Browse datasets and query from browser.

Phase 5: Hardening

  • 5.1 — Proto definitions
  • 5.2 — Internal gRPC migration
  • 5.3 — OpenTelemetry tracing
  • 5.4 — Auth middleware
  • 5.5 — Config-driven startup

Gate: gRPC internals, traces, and auth in place; services restartable from repo + config.