14 Commits

Author SHA1 Message Date
root
fdc5123f6d cleanup: drop workspace warnings from 11 to 6
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Three trivial cleanups that pull the workspace baseline down by five:

  - vectord/trial.rs: removed unused ObjectStore import (not referenced
    anywhere in the file; cargo's unused_imports lint was flagging it
    on every check). Net: -2 warnings (cascade effect from one import).
  - ui/main.rs:1241: `Err(e)` with unused binding → `Err(_)`.
  - ui/main.rs:1247: `let mut import_table` never mutated → `let`.

Matters because the scrum_applier's hardened warning-count gate uses
this baseline as its reject threshold. Lower baseline = lower floor
= any future patch that adds a warning trips the gate earlier.

Remaining 6 warnings are all aibridge context::estimate_tokens
deprecation notices pointing at a planned-but-unbuilt
shared::model_matrix::ModelMatrix::estimate_tokens. Fix requires
creating that type (next commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 06:28:36 -05:00
root
25b7e6c3a7 Phase 19 wiring + Path 1/2 work + chain integrity fixes
Backend:
- crates/vectord/src/playbook_memory.rs (new): Phase 19 in-memory boost
  store with seed/rebuild/snapshot, plus temporal decay (e^-age/30 per
  playbook), persist_to_sql endpoint backing successful_playbooks_live,
  and discover_patterns endpoint for meta-index pattern aggregation
  (recurring certs/skills/archetype/reliability across similar past fills).
- DEFAULT_TOP_K_PLAYBOOKS bumped 5 → 25; old default silently missed
  most boosts when memory had > 25 entries.
- service.rs: new routes /vectors/playbook_memory/{seed,rebuild,stats,
  persist_sql,patterns}.

Bun staffing co-pilot (mcp-server/):
- /search, /match, /verify, /proof, /simulation/run, MCP tools all
  forward use_playbook_memory:true and playbook_memory_k:25 to the
  hybrid endpoint. Boost was previously dark across the entire app.
- /log no longer POSTs to /ingest/file — that endpoint REPLACES the
  dataset's object list, so single-row CSV writes were wiping all prior
  rows in successful_playbooks (sp_rows went 33→1 in one /log call).
  /log now seeds playbook_memory with canonical short text and calls
  /persist_sql to keep successful_playbooks_live in sync.
- /simulation/run cumulative end-of-week CSV write removed for the same
  reason. Per-day per-contract /seed (added in this session) is the
  accumulating feedback path now.
- search.html addWorkerInsight renders a green "Endorsed · N playbooks"
  chip with playbook citations when boost > 0.

Internal Dioxus UI (crates/ui/):
- Dashboard phase list rewritten through Phase 19 (was stuck at "Phase
  16: File Watcher" / "Phase 17: DB Connector" — both wrong).
- Removed fabricated "27ms" stat label.
- Ask tab examples + SQL default replaced with real staffing prompts
  against candidates/clients/job_orders (was referencing nonexistent
  employees/products/events).
- New Playbook tab exposes /vectors/playbook_memory/{stats,rebuild} and
  side-by-side hybrid search (boost OFF vs ON) with citations.

Tests (tests/multi-agent/):
- run_e2e_rated.ts: parallel two-agent (mistral + qwen2.5) build phase
  + verifier rating (geo, auth, persist, boost, speed → /10).
- network_proving.ts: continuous build → verify → repeat with
  staffing-recruiter profile hot-swap; geo-discrimination check.
- chain_of_custody.ts: single recruiter operation traced through every
  layer (Bun /search, direct /vectors/hybrid parity, /log, SQL,
  playbook_memory growth, profile activation, post-op boost lift).
2026-04-20 06:21:13 -05:00
root
fdb2e9cda8 Fix browser crash: cache schema context, lazy Dashboard, default to Ask tab
Root cause: Dashboard auto-fired 6+ API calls on load, then Ask tab fired
7 DESCRIBE queries per question — 15+ concurrent requests from WASM.

Fixes:
- Schema context cached after first build (7 DESCRIBE → 0 on subsequent questions)
- Dashboard lazy-loads only when tab clicked (not on app mount)
- Default tab changed back to Ask (no background API storm)
- std::sync::Mutex for WASM compat (no tokio in browser)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:18:53 -05:00
root
238cb84d26 Server-side pagination for large result sets
- ResultStore: execute query, store batches server-side, serve pages on demand
- POST /query/paged → returns query_id + total_rows + page count (no rows)
- GET /query/page/{id}/{page}?size=100 → returns one page of rows
- RecordBatch slicing for efficient page extraction from Arrow batches
- LRU eviction: keeps 50 most recent query results in memory
- Tested: 100K rows → 1,000 pages of 100, any page fetchable by number
- Supervisor pattern: chunk results, serve on demand, retry-safe (idempotent GET)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:54:44 -05:00
root
ed17216005 Fix browser crash: limit schema context + cap table rows at 200
- Schema context limited to 7 core staffing tables (was all 12+)
- Results table capped at 200 rows to prevent DOM explosion
- Shows "first 200 of N rows" when truncated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:48:32 -05:00
root
0bd753294b Robust SQL extraction: handles explanations, markdown, prefixes
clean_sql now uses 3 strategies in priority order:
1. Extract from ```sql...``` markdown blocks
2. Find first SELECT/WITH/INSERT statement in text
3. Strip leading "sql" keyword fallback

Tested against 5 real model output patterns:
- Clean SQL ✓
- "sql" prefixed ✓
- Markdown fenced ✓
- Explanation before ```sql block ✓
- Explanation with SELECT buried in text ✓

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:42:11 -05:00
root
34c03894ae Auto-retry on ALL SQL errors, not just schema errors
Previous: only retried on "Schema error" or "No field named"
Now: retries on any error (type mismatches, execution errors, etc.)
Model gets full error message + schema to write corrected SQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:38:27 -05:00
root
ddcbb9590c Fix SQL generation: clean_sql helper + relationship hints + verified
- clean_sql() strips markdown fences, leading "sql" keyword, trailing explanations
- Schema context now includes table relationships (JOIN paths)
- Explicit note: "vertical only in candidates/clients/job_orders, JOIN for others"
- Full column paths (table.column) in schema to reduce ambiguity
- Auto-retry on schema errors feeds error + schema back to model
- TESTED: 4 questions all return correct results:
  "highest avg salary" → IT $2,213 ✓
  "top 5 earning over $50/hr" → correct candidates ✓
  "most placements by vertical" → Industrial 10,096 ✓
  "revenue by client" → 1,996 clients ✓

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:36:55 -05:00
root
2c5aeaeada Fix SQL generation: stricter prompt + auto-retry on schema errors
- Prompt now says "CRITICAL: ONLY use columns from schema, do NOT invent"
- Strips markdown backticks from model output
- Auto-retry: if SQL fails with "Schema error" or "No field named",
  feeds the error + schema back to the model for a corrected query
- Both button click and Enter key paths have retry logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:32:25 -05:00
root
399fc81ab5 UI: Dashboard + Ingest tabs showing full system progression
- Dashboard: live stats (datasets, rows, embeddings, HNSW, tools, cache)
  Architecture overview (6 capability areas)
  Build progression timeline (all 17 phases listed)
- Ingest tab: Postgres table browser + import, file upload info, inbox watcher
- System tab: existing health checks
- Starts on Dashboard for immediate overview
- No futures::executor in WASM — all async/await

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 20:22:36 -05:00
root
b37e171e10 UI redesign: Ask, Explore, SQL, System tabs
- Ask: natural language → AI generates SQL → DataFusion executes → results
  Shows the AI-over-data-lake story: schema introspection → LLM → query
- Explore: click dataset → schema + preview + AI-generated summary
- SQL: raw DataFusion SQL editor with Ctrl+Enter
- System: health grid testing all 5 services + embeddings + generation
- Example prompts for quick demo
- Dark theme with accent styling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:24:51 -05:00
root
b235ef9201 Fix nginx route collision — namespace lakehouse API under /lakehouse/api/
Previous regex routes for /catalog, /storage, /health intercepted main site.
Now all lakehouse API calls go through /lakehouse/api/ prefix, stripped by nginx rewrite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:57:58 -05:00
root
387ce0074c UI: full-stack test coverage with tabs for Query, Storage, AI, Status
- Query tab: SQL editor with results table (existing)
- Storage tab: list objects, register datasets pointing at storage keys
- AI tab: embed (nomic-embed-text), generate (qwen2.5), rerank with scored results
- Status tab: health checks for all 5 services + functional tests (embed, generate, SQL)
- nginx: added /lakehouse/ and API proxy paths to devop.live config
- Loaded 3 sample datasets: employees, events, products
- Fixed Rust 2024 reserved keyword `gen`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:56:18 -05:00
root
50a8c8013f Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table
- ui: dynamic API base URL (same-origin for nginx, port-based for local dev)
- gateway: CORS enabled for cross-origin requests
- nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin
- justfile: ui-build, ui-serve, sidecar, up commands added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 06:24:15 -05:00