4 Commits

Author SHA1 Message Date
root
bb05c4412e Phase 6: Ingest pipeline — CSV, JSON, PDF, text file support
- ingestd crate: detect file type → parse → schema detection → Parquet → catalog
- CSV: auto-detect column types (int, float, bool, string), handles $, %, commas
  Strips dollar signs from amounts, flexible row parsing, sanitized column names
- JSON: array or newline-delimited, nested object flattening (a.b.c → a_b_c)
- PDF: text extraction via lopdf, one row per page (source_file, page_number, text)
- Text/SMS: line-based ingestion with line numbers
- Dedup: SHA-256 content hash, re-ingest same file = no-op
- Gateway: POST /ingest/file multipart upload, 256MB body limit
- Schema detection per ADR-010: ambiguous types default to String
- 12 unit tests passing (CSV parsing, JSON flattening, type inference, dedup)
- Tested: messy CSV with missing data, dollar amounts, N/A values → queryable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:07:31 -05:00
root
387ce0074c UI: full-stack test coverage with tabs for Query, Storage, AI, Status
- Query tab: SQL editor with results table (existing)
- Storage tab: list objects, register datasets pointing at storage keys
- AI tab: embed (nomic-embed-text), generate (qwen2.5), rerank with scored results
- Status tab: health checks for all 5 services + functional tests (embed, generate, SQL)
- nginx: added /lakehouse/ and API proxy paths to devop.live config
- Loaded 3 sample datasets: employees, events, products
- Fixed Rust 2024 reserved keyword `gen`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:56:18 -05:00
root
01373c0e45 Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService
- proto crate: tonic-build codegen from proto definitions
- catalogd: gRPC CatalogService implementation
- gateway: dual HTTP (:3100) + gRPC (:3101) servers
- gateway: OpenTelemetry tracing with stdout exporter
- gateway: API key auth middleware (toggleable)
- shared: TOML config system with typed structs and defaults
- lakehouse.toml config file
- ADR-006 and ADR-007 documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:37:07 -05:00
root
50a8c8013f Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:24:15 -05:00