10 Commits

Author SHA1 Message Date
root
ed17216005 Fix browser crash: limit schema context + cap table rows at 200
- Schema context limited to 7 core staffing tables (was all 12+)
- Results table capped at 200 rows to prevent DOM explosion
- Shows "first 200 of N rows" when truncated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:48:32 -05:00
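The row cap described in this commit could be sketched like this (function and constant names are illustrative, not from the actual code):

```rust
/// Maximum number of result rows rendered into the DOM.
const MAX_DISPLAY_ROWS: usize = 200;

/// Truncate query results for display. Returns the visible slice plus an
/// optional "first 200 of N rows" notice when truncation occurred.
fn cap_rows(rows: &[Vec<String>]) -> (&[Vec<String>], Option<String>) {
    if rows.len() > MAX_DISPLAY_ROWS {
        let notice = format!("first {} of {} rows", MAX_DISPLAY_ROWS, rows.len());
        (&rows[..MAX_DISPLAY_ROWS], Some(notice))
    } else {
        (rows, None)
    }
}
```

Returning a slice rather than cloning keeps the full result set in memory for export while the UI only renders the capped prefix.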
root
0bd753294b Robust SQL extraction: handles explanations, markdown, prefixes
clean_sql now uses 3 strategies in priority order:
1. Extract from ```sql...``` markdown blocks
2. Find first SELECT/WITH/INSERT statement in text
3. Strip leading "sql" keyword fallback

Tested against 5 real model output patterns:
- Clean SQL ✓
- "sql" prefixed ✓
- Markdown fenced ✓
- Explanation before ```sql block ✓
- Explanation with SELECT buried in text ✓

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:42:11 -05:00
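The three strategies above, in priority order, could look roughly like this (a minimal sketch; the real `clean_sql` may differ, and a production version would want word-boundary checks so keywords inside prose don't match):

```rust
/// Extract a SQL statement from raw model output using three fallbacks:
/// fenced block, first statement keyword, bare "sql" prefix strip.
fn clean_sql(raw: &str) -> String {
    let text = raw.trim();

    // Strategy 1: take the body of a ```sql ... ``` markdown fence.
    if let Some(start) = text.find("```sql") {
        let rest = &text[start + 6..];
        if let Some(end) = rest.find("```") {
            return rest[..end].trim().to_string();
        }
    }

    // Strategy 2: take everything from the earliest SELECT/WITH/INSERT onward.
    let upper = text.to_uppercase();
    let first = ["SELECT", "WITH", "INSERT"]
        .into_iter()
        .filter_map(|kw| upper.find(kw))
        .min();
    if let Some(pos) = first {
        return text[pos..].trim().to_string();
    }

    // Strategy 3: fall back to stripping a leading "sql" keyword.
    text.strip_prefix("sql").unwrap_or(text).trim().to_string()
}
```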
root
34c03894ae Auto-retry on ALL SQL errors, not just schema errors
Previous: only retried on "Schema error" or "No field named"
Now: retries on any error (type mismatches, execution errors, etc.)
Model gets full error message + schema to write corrected SQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:38:27 -05:00
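A single-retry loop of the shape this commit describes might be sketched as follows, with `generate` and `execute` standing in for the real LLM and DataFusion calls (all names here are hypothetical):

```rust
/// Ask once; on ANY execution error, feed the failing SQL, the full error,
/// and the schema back to the model and run its corrected query.
fn ask_with_retry(
    question: &str,
    schema: &str,
    generate: impl Fn(&str) -> String,
    execute: impl Fn(&str) -> Result<String, String>,
) -> Result<String, String> {
    let sql = generate(&format!("Schema:\n{schema}\n\nQuestion: {question}"));
    match execute(&sql) {
        Ok(rows) => Ok(rows),
        // Previously only "Schema error" / "No field named" triggered this
        // path; now every error does.
        Err(err) => {
            let retry_prompt = format!(
                "Schema:\n{schema}\n\nThis SQL failed:\n{sql}\nError: {err}\n\nWrite a corrected query for: {question}"
            );
            execute(&generate(&retry_prompt))
        }
    }
}
```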
root
ddcbb9590c Fix SQL generation: clean_sql helper + relationship hints + verified results
- clean_sql() strips markdown fences, leading "sql" keyword, trailing explanations
- Schema context now includes table relationships (JOIN paths)
- Explicit note: "vertical only in candidates/clients/job_orders, JOIN for others"
- Full column paths (table.column) in schema to reduce ambiguity
- Auto-retry on schema errors feeds error + schema back to model
- TESTED: 4 questions all return correct results:
  "highest avg salary" → IT $2,213 ✓
  "top 5 earning over $50/hr" → correct candidates ✓
  "most placements by vertical" → Industrial 10,096 ✓
  "revenue by client" → 1,996 clients ✓

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:36:55 -05:00
root
2c5aeaeada Fix SQL generation: stricter prompt + auto-retry on schema errors
- Prompt now says "CRITICAL: ONLY use columns from schema, do NOT invent"
- Strips markdown backticks from model output
- Auto-retry: if SQL fails with "Schema error" or "No field named",
  feeds the error + schema back to the model for a corrected query
- Both button click and Enter key paths have retry logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:32:25 -05:00
root
399fc81ab5 UI: Dashboard + Ingest tabs showing full system progression
- Dashboard: live stats (datasets, rows, embeddings, HNSW, tools, cache)
  Architecture overview (6 capability areas)
  Build progression timeline (all 17 phases listed)
- Ingest tab: Postgres table browser + import, file upload info, inbox watcher
- System tab: existing health checks
- Starts on Dashboard for immediate overview
- No futures::executor in WASM — all async/await

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:22:36 -05:00
root
b37e171e10 UI redesign: Ask, Explore, SQL, System tabs
- Ask: natural language → AI generates SQL → DataFusion executes → results
  Shows the AI-over-data-lake story: schema introspection → LLM → query
- Explore: click dataset → schema + preview + AI-generated summary
- SQL: raw DataFusion SQL editor with Ctrl+Enter
- System: health grid testing all 5 services + embeddings + generation
- Example prompts for quick demo
- Dark theme with accent styling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:24:51 -05:00
root
b235ef9201 Fix nginx route collision — namespace lakehouse API under /lakehouse/api/
Previous regex routes for /catalog, /storage, /health intercepted main site.
Now all lakehouse API calls go through /lakehouse/api/ prefix, stripped by nginx rewrite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:57:58 -05:00
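The namespaced route might look like the following nginx fragment (the path prefix and gateway port :3100 come from these commits; the exact directives are an assumption, not the deployed config):

```nginx
# All lakehouse API traffic lives under one prefix so it can no longer
# shadow the main site's /catalog, /storage, and /health routes.
location /lakehouse/api/ {
    # Strip the /lakehouse/api prefix before proxying to the gateway.
    rewrite ^/lakehouse/api/(.*)$ /$1 break;
    proxy_pass http://127.0.0.1:3100;
}
```

With `break`, the rewritten URI is passed to `proxy_pass` without re-running location matching, so the bare `/catalog` route never exists on the public server.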
root
387ce0074c UI: full-stack test coverage with tabs for Query, Storage, AI, Status
- Query tab: SQL editor with results table (existing)
- Storage tab: list objects, register datasets pointing at storage keys
- AI tab: embed (nomic-embed-text), generate (qwen2.5), rerank with scored results
- Status tab: health checks for all 5 services + functional tests (embed, generate, SQL)
- nginx: added /lakehouse/ and API proxy paths to devop.live config
- Loaded 3 sample datasets: employees, events, products
- Renamed an identifier that collides with the Rust 2024 reserved keyword `gen`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:56:18 -05:00
root
50a8c8013f Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:24:15 -05:00
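The dynamic API base URL from this commit (same-origin behind nginx, port-based for local dev) could be selected roughly like this; the function name and the exact switching rule are illustrative, since the commit only states the two modes:

```rust
/// Choose the API base URL: same-origin path when served behind nginx,
/// explicit gateway port (:3100) when running locally.
fn api_base(origin: &str) -> String {
    if origin.contains("devop.live") {
        // Production: nginx proxies the API on the same origin as the UI.
        format!("{origin}/lakehouse/api")
    } else {
        // Local dev: talk to the gateway directly on its own port.
        "http://localhost:3100".to_string()
    }
}
```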