- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase Tracker
Phase 0: Bootstrap
- 0.1 — Cargo workspace with all crate stubs compiling
- 0.2 — shared crate: error types, ObjectRef, DatasetId
- 0.3 — gateway with Axum: GET /health → 200
- 0.4 — tracing + tracing-subscriber wired in gateway
- 0.5 — justfile with build, test, run recipes
- 0.6 — docs committed to git
Gate: PASSED — All crates compile. Gateway runs. Logs emit. Docs committed.
Phase 1: Storage + Catalog
- 1.1 — storaged: object_store backend init (LocalFileSystem)
- 1.2 — storaged: Axum endpoints (PUT/GET/DELETE/LIST /objects/{key})
- 1.3 — shared/arrow_helpers.rs: RecordBatch ↔ Parquet + schema fingerprinting
- 1.4 — catalogd/registry.rs: in-memory index + manifest persistence to object storage
- 1.5 — catalogd/schema.rs: schema fingerprinting (merged into shared/arrow_helpers.rs)
- 1.6 — catalogd service: POST/GET /datasets + GET /datasets/by-name/{name}
- 1.7 — gateway routes to storaged + catalogd with shared state
Gate: PASSED — PUT object → register dataset → list → get by name. All via gateway HTTP.
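The fingerprinting in 1.3 and 1.5 is not specified further in this tracker. A minimal sketch of the idea, assuming an order-sensitive FNV-1a hash over each field's name, type and nullability. `FieldDesc` is a hypothetical stand-in for an Arrow field, and the real shared/arrow_helpers.rs may hash differently:

```rust
/// Hypothetical, simplified stand-in for an Arrow schema field.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldDesc {
    pub name: String,
    pub data_type: String, // e.g. "Int64", "Utf8"
    pub nullable: bool,
}

// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Fold `bytes` into a running FNV-1a hash.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Fingerprint a schema as 16 hex characters.
/// Field order matters: reordering columns changes the result.
pub fn schema_fingerprint(fields: &[FieldDesc]) -> String {
    let mut hash = FNV_OFFSET;
    for f in fields {
        hash = fnv1a(hash, f.name.as_bytes());
        // Zero byte separates name from type so ("ab", "c") != ("a", "bc").
        hash = fnv1a(hash, &[0]);
        hash = fnv1a(hash, f.data_type.as_bytes());
        // Another separator, then nullability as a single 0/1 byte.
        hash = fnv1a(hash, &[0, u8::from(f.nullable)]);
    }
    format!("{hash:016x}")
}
```

Two datasets with identical columns get the same fingerprint, while flipping one field's nullability or type yields a different one, which is enough to detect schema drift at registration time.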
Phase 2: Query Engine
- 2.1 — queryd: SessionContext + object_store config (custom scheme to avoid path doubling)
- 2.2 — queryd: ListingTable from catalog ObjectRefs with schema inference
- 2.3 — queryd service: POST /query/sql → JSON (columns + rows + row_count)
- 2.4 — queryd → catalogd wiring (reads dataset list, registers as tables)
- 2.5 — gateway routes /query with QueryEngine state
Gate: PASSED — SELECT *, WHERE/ORDER BY, COUNT/AVG all return correct results via catalog.
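The path doubling in 2.1 is not spelled out here. One plausible reading, modelled with plain strings rather than the real object_store or DataFusion APIs: a store already rooted at a data directory treats every path as relative to that root, so a `file://` URL whose path already contains the root resolves to the root twice. A custom scheme keeps URL paths equal to object keys. The scheme `lake`, the root `/data/lake`, the authority `store` and every name below are hypothetical:

```rust
/// Toy model of a store rooted at a directory: every path it receives
/// is treated as relative to `root`. (Assumption: this mirrors how a
/// prefixed local store resolves object keys.)
pub struct RootedStore {
    root: String,
}

impl RootedStore {
    pub fn new(root: &str) -> Self {
        Self { root: root.trim_end_matches('/').to_string() }
    }

    /// Map a URL path onto the filesystem by prepending the root.
    pub fn resolve(&self, url_path: &str) -> String {
        format!("{}/{}", self.root, url_path.trim_start_matches('/'))
    }
}

/// Split "scheme://authority/path" into (scheme, path).
/// Returns None when there is no "://" or no path component.
pub fn split_url(url: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    // Skip the (possibly empty) authority up to the first '/'.
    let path = rest.find('/').map(|i| &rest[i..])?;
    Some((scheme, path))
}

/// Build a table URL under a custom scheme from an object key,
/// so the URL path is the key itself and never the absolute path.
pub fn table_url(scheme: &str, key: &str) -> String {
    format!("{scheme}://store/{}", key.trim_start_matches('/'))
}
```

With a store rooted at `/data/lake`, the path of `file:///data/lake/datasets/a.parquet` resolves to `/data/lake/data/lake/datasets/a.parquet`, whereas the path of `lake://store/datasets/a.parquet` resolves to `/data/lake/datasets/a.parquet`.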
Phase 3: AI Integration
- 3.1 — Python sidecar: FastAPI + Ollama (embed/generate/rerank)
- 3.2 — Dockerfile for sidecar
- 3.3 — aibridge/client.rs: HTTP client to sidecar
- 3.4 — aibridge service: Axum proxy endpoints
- 3.5 — Model config via env vars
Gate: Rust → Python → Ollama round trip returns real embeddings.
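For 3.5, a minimal sketch of env-driven model config with fallbacks. The tracker does not say which variables exist or which side (sidecar or aibridge) reads them, so the variable names and placeholder model names below are assumptions. Only the default Ollama port, 11434, is a known fact:

```rust
use std::env;

/// Hypothetical model settings for the embed/generate/rerank endpoints.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub ollama_url: String,
    pub embed_model: String,
    pub generate_model: String,
    pub rerank_model: String,
}

impl ModelConfig {
    /// Build the config from any key lookup. Unset or blank values
    /// fall back to defaults, so a bare environment still starts.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| {
            lookup(key)
                .filter(|v| !v.trim().is_empty())
                .unwrap_or_else(|| default.to_string())
        };
        Self {
            // Variable names are illustrative, not taken from the project.
            ollama_url: get("OLLAMA_URL", "http://localhost:11434"),
            embed_model: get("EMBED_MODEL", "embed-model-placeholder"),
            generate_model: get("GENERATE_MODEL", "generate-model-placeholder"),
            rerank_model: get("RERANK_MODEL", "rerank-model-placeholder"),
        }
    }

    /// Read the same keys from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing testable without mutating the process environment. Production code would call `ModelConfig::from_env()` once at startup.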
Phase 4: Frontend
- 4.1 — Dioxus scaffold, WASM build
- 4.2 — Dataset browser
- 4.3 — Query editor + results table
- 4.4 — Error display + loading states
Gate: Browse datasets and query from browser.
Phase 5: Hardening
- 5.1 — Proto definitions
- 5.2 — Internal gRPC migration
- 5.3 — OpenTelemetry tracing
- 5.4 — Auth middleware
- 5.5 — Config-driven startup
Gate: gRPC internals, traces, auth, restartable from repo + config.