lakehouse

Author	SHA1	Message	Date
root	a001a21902	MCP self-orientation: /context + /verify + architecture resources. Any agent (Claude Code via MCP stdio, or sub-agents via HTTP :3700) can now self-orient without human explanation. GET /context returns: system purpose and name; all datasets with row counts; all vector indexes with backends; available models and their strengths; the complete tool list with rules; and current VRAM state. POST /verify fact-checks any claim about a worker against the golden data: an agent says "worker 1313 is a Forklift Operator in IL with reliability 0.82" and the endpoint returns verified=true/false with exact discrepancies. MCP resources (stdio path for Claude Code): lakehouse://system (live system status); lakehouse://architecture (full PRD); lakehouse://instructions (agent operating manual); lakehouse://playbooks (successful operations database); lakehouse://datasets (dataset listing). This is the "command and control" layer J asked for: any agent connecting to this system gets the context it needs to operate independently. No human intermediary required. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:41:46 -05:00
root	67ab6e4bac	Langfuse observability: every LLM call traced and scored. Langfuse v2.95.11 running on :3001 (Docker + Postgres). Login: j@lakehouse.local / lakehouse2026. tracing.ts: startTrace → logGeneration/logRetrieval/logSpan → scoreTrace → flush. Every hybrid search, SQL generation, RAG pipeline, and co-pilot briefing gets a full trace: model, prompt, output, latency, tokens. The observer can now score traces based on verification results; Langfuse aggregates accuracy over time so we can see which models and approaches actually work in production, not just in tests. Services: lakehouse(:3100) + sidecar(:3200) + agent(:3700) + observer + langfuse(:3001) + minio(:9000) + mariadb(:3306). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:38:21 -05:00
root	b532ae61f1	Agent gateway + observer: autonomous internal operation. Three new systemd services: lakehouse-agent (:3700), a REST gateway wrapping all lakehouse tools with clean JSON in/out and no protocol complexity, exposing 9 endpoints (/search, /sql, /match, /worker/:id, /ask, /log, /playbooks, /profile/:id, /vram); lakehouse-observer, which watches operations, logs to lakehouse, asks a local model to diagnose failure patterns, and consolidates successful patterns into playbooks every 5 cycles; and the stdio MCP transport, preserved for Claude Code integration. AGENT_INSTRUCTIONS.md: complete operating manual for sub-agents. Rules: never hallucinate, SQL first for structured questions, hybrid for matching, log every success, check playbooks before complex tasks. Observer loop: observed() wrapper timestamps + persists every gateway call → error analyzer reads failures + asks LLM for diagnosis → playbook consolidator groups successes by endpoint pattern. All three designed for zero human intervention: agents operate, observer watches, playbooks accumulate, iteration happens internally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:00:08 -05:00
root	e1d48d3c8f	MCP server (Bun) + 100K worker generator + lakehouse integration. MCP server at mcp-server/index.ts, with 9 tools exposing the full lakehouse to any MCP-compatible model: search_workers (hybrid SQL+vector), query_sql, match_contract, get_worker, rag_question, log_success, get_playbooks, swap_profile, vram_status. The "successful playbooks" pattern: log_success writes outcomes back to the lakehouse as a queryable dataset. Small models call get_playbooks to learn what approaches worked for similar tasks; no retraining needed, just data. generate_workers.py scales to 100K+ with realistic distributions: 20 roles weighted by staffing industry frequency; 44 real Midwest/South cities across 12 states; per-role skill pools (warehouse/production/machine/maintenance); 13 certification types with realistic probability; 8 behavioral archetypes with score distributions; SMS communication templates (20 patterns). 100K worker dataset ingested: 70MB CSV → Parquet in 1.1s. Verified: 11K forklift ops, 27K in IL, archetype distribution matches weights. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:41:46 -05:00
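
Commit a001a21902 describes POST /verify as checking a claim about a worker against golden data and returning verified=true/false with exact discrepancies. The log does not show the handler, so the following is only a minimal sketch of that comparison. The `WorkerRecord`, `WorkerClaim`, `Discrepancy`, and `VerifyResult` shapes, the `verifyClaim` name, the numeric tolerance, and the sample golden record are all assumptions made for illustration; only the three fields (role, state, reliability) come from the example claim in the commit message.

```typescript
// Hypothetical record shape: the commit only names role, state, and
// reliability in its example claim.
interface WorkerRecord {
  id: number;
  role: string;
  state: string;
  reliability: number;
}

// A claim names a worker and asserts any subset of its fields.
type WorkerClaim = { id: number } & Partial<Omit<WorkerRecord, "id">>;

interface Discrepancy {
  field: string;
  claimed: string | number;
  actual: string | number;
}

interface VerifyResult {
  verified: boolean;
  discrepancies: Discrepancy[];
}

// Compare each claimed field with the golden record. Numeric fields use a
// small tolerance (an assumption); strings must match exactly.
function verifyClaim(
  claim: WorkerClaim,
  golden: WorkerRecord,
  tolerance = 0.005,
): VerifyResult {
  const discrepancies: Discrepancy[] = [];
  const fields = ["role", "state", "reliability"] as const;
  for (const field of fields) {
    const claimed = claim[field];
    if (claimed === undefined) continue; // fields not claimed are not checked
    const actual = golden[field];
    const matches =
      typeof claimed === "number" && typeof actual === "number"
        ? Math.abs(claimed - actual) <= tolerance
        : claimed === actual;
    if (!matches) discrepancies.push({ field, claimed, actual });
  }
  return { verified: discrepancies.length === 0, discrepancies };
}

// Made-up golden data, not taken from the real dataset.
const golden: WorkerRecord = {
  id: 1313,
  role: "Forklift Operator",
  state: "IL",
  reliability: 0.79,
};
const result = verifyClaim(
  { id: 1313, role: "Forklift Operator", state: "IL", reliability: 0.82 },
  golden,
);
console.log(JSON.stringify(result));
```

Returning the claimed and actual value for each mismatched field, rather than a bare boolean, is what would let a calling agent correct its statement instead of simply retrying.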

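The observer loop in commit b532ae61f1 (observed() wrapper → error analyzer → playbook consolidator) can be sketched roughly as below. This is an illustration under stated assumptions, not the repository's code: the `Observation` shape, the in-memory `observations` array (the commit says calls are persisted to the lakehouse), and the `consolidateSuccesses` and `failures` helpers are hypothetical, and the LLM diagnosis step is left out entirely.

```typescript
// Hypothetical record shape. The real observer persists to the lakehouse;
// this sketch keeps records in an array.
interface Observation {
  endpoint: string;
  startedAt: string; // ISO timestamp
  durationMs: number;
  ok: boolean;
  error?: string;
}

const observations: Observation[] = [];

// Wrap a gateway handler so every call is timestamped and recorded,
// whether it resolves or throws.
function observed<A extends unknown[], R>(
  endpoint: string,
  handler: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const started = Date.now();
    const base = { endpoint, startedAt: new Date(started).toISOString() };
    try {
      const result = await handler(...args);
      observations.push({ ...base, durationMs: Date.now() - started, ok: true });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      observations.push({ ...base, durationMs: Date.now() - started, ok: false, error });
      throw err; // the caller still sees the failure
    }
  };
}

// Count successful calls per endpoint: the grouping a playbook
// consolidator would start from.
function consolidateSuccesses(log: Observation[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const o of log) {
    if (o.ok) counts.set(o.endpoint, (counts.get(o.endpoint) ?? 0) + 1);
  }
  return counts;
}

// Failed calls are what an error analyzer would pass to a model.
function failures(log: Observation[]): Observation[] {
  return log.filter((o) => !o.ok);
}

async function demo(): Promise<void> {
  const search = observed("/search", async (q: string) => [`result for ${q}`]);
  const sql = observed("/sql", async (_query: string): Promise<number> => {
    throw new Error("syntax error");
  });
  await search("forklift IL");
  await search("welder OH");
  await sql("SELEC 1").catch(() => undefined); // failure is recorded, then swallowed here
  console.log(consolidateSuccesses(observations), failures(observations).length);
}

const done = demo();
```

Because the wrapper rethrows after recording, callers keep the handler's normal error behavior while the log still receives both outcomes.
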
4 Commits