lakehouse

Author	SHA1	Message	Date
root	fdc5123f6d	cleanup: drop workspace warnings from 11 to 6 Three trivial cleanups that pull the workspace baseline down by five: - vectord/trial.rs: removed unused ObjectStore import (not referenced anywhere in the file; cargo's unused_imports lint was flagging it on every check). Net: -2 warnings (cascade effect from one import). - ui/main.rs:1241: `Err(e)` with unused binding → `Err(_)`. - ui/main.rs:1247: `let mut import_table` never mutated → `let`. Matters because the scrum_applier's hardened warning-count gate uses this baseline as its reject threshold. Lower baseline = lower floor = any future patch that adds a warning trips the gate earlier. Remaining 6 warnings are all aibridge context::estimate_tokens deprecation notices pointing at a planned-but-unbuilt shared::model_matrix::ModelMatrix::estimate_tokens. Fix requires creating that type (next commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:28:36 -05:00
root	25b7e6c3a7	Phase 19 wiring + Path 1/2 work + chain integrity fixes Backend: - crates/vectord/src/playbook_memory.rs (new): Phase 19 in-memory boost store with seed/rebuild/snapshot, plus temporal decay (e^-age/30 per playbook), persist_to_sql endpoint backing successful_playbooks_live, and discover_patterns endpoint for meta-index pattern aggregation (recurring certs/skills/archetype/reliability across similar past fills). - DEFAULT_TOP_K_PLAYBOOKS bumped 5 → 25; old default silently missed most boosts when memory had > 25 entries. - service.rs: new routes /vectors/playbook_memory/{seed,rebuild,stats,persist_sql,patterns}. Bun staffing co-pilot (mcp-server/): - /search, /match, /verify, /proof, /simulation/run, MCP tools all forward use_playbook_memory:true and playbook_memory_k:25 to the hybrid endpoint. Boost was previously dark across the entire app. - /log no longer POSTs to /ingest/file — that endpoint REPLACES the dataset's object list, so single-row CSV writes were wiping all prior rows in successful_playbooks (sp_rows went 33→1 in one /log call). /log now seeds playbook_memory with canonical short text and calls /persist_sql to keep successful_playbooks_live in sync. - /simulation/run cumulative end-of-week CSV write removed for the same reason. Per-day per-contract /seed (added in this session) is the accumulating feedback path now. - search.html addWorkerInsight renders a green "Endorsed · N playbooks" chip with playbook citations when boost > 0. Internal Dioxus UI (crates/ui/): - Dashboard phase list rewritten through Phase 19 (was stuck at "Phase 16: File Watcher" / "Phase 17: DB Connector" — both wrong). - Removed fabricated "27ms" stat label. - Ask tab examples + SQL default replaced with real staffing prompts against candidates/clients/job_orders (was referencing nonexistent employees/products/events). - New Playbook tab exposes /vectors/playbook_memory/{stats,rebuild} and side-by-side hybrid search (boost OFF vs ON) with citations. Tests (tests/multi-agent/): - run_e2e_rated.ts: parallel two-agent (mistral + qwen2.5) build phase + verifier rating (geo, auth, persist, boost, speed → /10). - network_proving.ts: continuous build → verify → repeat with staffing-recruiter profile hot-swap; geo-discrimination check. - chain_of_custody.ts: single recruiter operation traced through every layer (Bun /search, direct /vectors/hybrid parity, /log, SQL, playbook_memory growth, profile activation, post-op boost lift).	2026-04-20 06:21:13 -05:00
root	fdb2e9cda8	Fix browser crash: cache schema context, lazy Dashboard, default to Ask tab Root cause: Dashboard auto-fired 6+ API calls on load, then Ask tab fired 7 DESCRIBE queries per question — 15+ concurrent requests from WASM. Fixes: - Schema context cached after first build (7 DESCRIBE → 0 on subsequent questions) - Dashboard lazy-loads only when tab clicked (not on app mount) - Default tab changed back to Ask (no background API storm) - std::sync::Mutex for WASM compat (no tokio in browser) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 21:18:53 -05:00
root	238cb84d26	Server-side pagination for large result sets - ResultStore: execute query, store batches server-side, serve pages on demand - POST /query/paged → returns query_id + total_rows + page count (no rows) - GET /query/page/{id}/{page}?size=100 → returns one page of rows - RecordBatch slicing for efficient page extraction from Arrow batches - LRU eviction: keeps 50 most recent query results in memory - Tested: 100K rows → 1,000 pages of 100, any page fetchable by number - Supervisor pattern: chunk results, serve on demand, retry-safe (idempotent GET) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:54:44 -05:00
root	ed17216005	Fix browser crash: limit schema context + cap table rows at 200 - Schema context limited to 7 core staffing tables (was all 12+) - Results table capped at 200 rows to prevent DOM explosion - Shows "first 200 of N rows" when truncated Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:48:32 -05:00
root	0bd753294b	Robust SQL extraction: handles explanations, markdown, prefixes clean_sql now uses 3 strategies in priority order: 1. Extract from ```sql...``` markdown blocks 2. Find first SELECT/WITH/INSERT statement in text 3. Strip leading "sql" keyword fallback Tested against 5 real model output patterns: - Clean SQL ✓ - "sql" prefixed ✓ - Markdown fenced ✓ - Explanation before ```sql block ✓ - Explanation with SELECT buried in text ✓ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:42:11 -05:00
root	34c03894ae	Auto-retry on ALL SQL errors, not just schema errors Previous: only retried on "Schema error" or "No field named" Now: retries on any error (type mismatches, execution errors, etc.) Model gets full error message + schema to write corrected SQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:38:27 -05:00
root	ddcbb9590c	Fix SQL generation: clean_sql helper + relationship hints + verified - clean_sql() strips markdown fences, leading "sql" keyword, trailing explanations - Schema context now includes table relationships (JOIN paths) - Explicit note: "vertical only in candidates/clients/job_orders, JOIN for others" - Full column paths (table.column) in schema to reduce ambiguity - Auto-retry on schema errors feeds error + schema back to model - TESTED: 4 questions all return correct results: "highest avg salary" → IT $2,213 ✓ "top 5 earning over $50/hr" → correct candidates ✓ "most placements by vertical" → Industrial 10,096 ✓ "revenue by client" → 1,996 clients ✓ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:36:55 -05:00
root	2c5aeaeada	Fix SQL generation: stricter prompt + auto-retry on schema errors - Prompt now says "CRITICAL: ONLY use columns from schema, do NOT invent" - Strips markdown backticks from model output - Auto-retry: if SQL fails with "Schema error" or "No field named", feeds the error + schema back to the model for a corrected query - Both button click and Enter key paths have retry logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:32:25 -05:00
root	399fc81ab5	UI: Dashboard + Ingest tabs showing full system progression - Dashboard: live stats (datasets, rows, embeddings, HNSW, tools, cache) Architecture overview (6 capability areas) Build progression timeline (all 17 phases listed) - Ingest tab: Postgres table browser + import, file upload info, inbox watcher - System tab: existing health checks - Starts on Dashboard for immediate overview - No futures::executor in WASM — all async/await Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:22:36 -05:00
root	b37e171e10	UI redesign: Ask, Explore, SQL, System tabs - Ask: natural language → AI generates SQL → DataFusion executes → results Shows the AI-over-data-lake story: schema introspection → LLM → query - Explore: click dataset → schema + preview + AI-generated summary - SQL: raw DataFusion SQL editor with Ctrl+Enter - System: health grid testing all 5 services + embeddings + generation - Example prompts for quick demo - Dark theme with accent styling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:24:51 -05:00
root	b235ef9201	Fix nginx route collision — namespace lakehouse API under /lakehouse/api/ Previous regex routes for /catalog, /storage, /health intercepted main site. Now all lakehouse API calls go through /lakehouse/api/ prefix, stripped by nginx rewrite. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 06:57:58 -05:00
root	387ce0074c	UI: full-stack test coverage with tabs for Query, Storage, AI, Status - Query tab: SQL editor with results table (existing) - Storage tab: list objects, register datasets pointing at storage keys - AI tab: embed (nomic-embed-text), generate (qwen2.5), rerank with scored results - Status tab: health checks for all 5 services + functional tests (embed, generate, SQL) - nginx: added /lakehouse/ and API proxy paths to devop.live config - Loaded 3 sample datasets: employees, events, products - Fixed Rust 2024 reserved keyword `gen` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 06:56:18 -05:00
root	50a8c8013f	Phase 4: Dioxus frontend with dataset browser and SQL query editor - ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table - ui: dynamic API base URL (same-origin for nginx, port-based for local dev) - gateway: CORS enabled for cross-origin requests - nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin - justfile: ui-build, ui-serve, sidecar, up commands added Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 06:24:15 -05:00
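
The pagination commit (238cb84d26) reports 100K rows served as 1,000 pages of 100. The arithmetic behind that can be sketched as below. This is a minimal illustration, not the repository's ResultStore: the function names `page_count` and `page_bounds` are hypothetical, and zero-based page numbering is an assumption, since the commit message does not say how pages are indexed.

```rust
/// Number of pages needed to serve `total_rows` at `page_size` rows per page.
/// Hypothetical helper; returns 0 for a zero page size instead of dividing by zero.
fn page_count(total_rows: usize, page_size: usize) -> usize {
    if page_size == 0 {
        return 0;
    }
    // Ceiling division: a partially filled last page still counts as a page.
    (total_rows + page_size - 1) / page_size
}

/// Half-open row range `[start, end)` covered by a zero-based `page`
/// (zero-based indexing is an assumption), or `None` when the page is out of range.
fn page_bounds(total_rows: usize, page_size: usize, page: usize) -> Option<(usize, usize)> {
    if page_size == 0 {
        return None;
    }
    let start = page.checked_mul(page_size)?;
    if start >= total_rows {
        return None;
    }
    // The last page may be shorter than `page_size`.
    let end = start.saturating_add(page_size).min(total_rows);
    Some((start, end))
}
```

Because the range depends only on the stored row count, the page size, and the page number, fetching the same page twice yields the same slice, which is consistent with the retry-safe GET described in the commit.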

14 Commits
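
Commit 0bd753294b describes clean_sql as three strategies tried in priority order: a fenced sql block, then the first SELECT/WITH/INSERT statement, then stripping a leading "sql" keyword. A minimal sketch of that ordering, using only the standard library, might look like the following. It is a reconstruction from the commit message rather than the repository's code; the helper `find_keyword` and the word-boundary rule are assumptions.

```rust
/// Hypothetical helper: byte offset of the first occurrence of `kw` in `upper`
/// that is not part of a longer identifier (assumed word-boundary rule).
fn find_keyword(upper: &str, kw: &str) -> Option<usize> {
    let bytes = upper.as_bytes();
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let mut from = 0;
    while let Some(rel) = upper[from..].find(kw) {
        let start = from + rel;
        let end = start + kw.len();
        let before_ok = start == 0 || !is_word(bytes[start - 1]);
        let after_ok = end == bytes.len() || !is_word(bytes[end]);
        if before_ok && after_ok {
            return Some(start);
        }
        from = end;
    }
    None
}

/// Sketch of the three-strategy extraction described in the commit message.
fn clean_sql(raw: &str) -> String {
    let text = raw.trim();

    // Strategy 1: body of the first ```sql fenced block, if one exists.
    if let Some(start) = text.find("```sql") {
        let body = &text[start + "```sql".len()..];
        let end = body.find("```").unwrap_or(body.len());
        return body[..end].trim().to_string();
    }

    // Strategy 2: earliest SELECT / WITH / INSERT keyword, case-insensitive.
    // ASCII uppercasing keeps byte offsets identical to the original text.
    let upper = text.to_ascii_uppercase();
    let first = ["SELECT", "WITH", "INSERT"]
        .iter()
        .filter_map(|kw| find_keyword(&upper, kw))
        .min();
    if let Some(pos) = first {
        return text[pos..].trim().to_string();
    }

    // Strategy 3: strip a leading "sql" keyword followed by whitespace.
    if let Some(prefix) = text.get(..3) {
        if prefix.eq_ignore_ascii_case("sql") {
            let rest = &text[3..];
            if rest.is_empty() || rest.starts_with(char::is_whitespace) {
                return rest.trim().to_string();
            }
        }
    }
    text.to_string()
}
```

One known limitation of this sketch: an explanation that contains the ordinary word "with" before the query would be cut at that word, so a real implementation would likely need a stricter check in the second strategy.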