Phase 40 PRD (docs/CONTROL_PLANE_PRD.md:91) claimed:
"Gitea MCP reconnect — the MCP server binary still installed at
/home/profit/.bun/install/cache/gitea-mcp@0.0.10/ gets wired into
mcp-server/index.ts tool registry."
The PHASES.md checkbox marked this done, but an audit found:
- gitea-mcp binary exists in bun cache (verified)
- Zero references to gitea/list_prs/open_pr in mcp-server/index.ts
- No entry for "gitea" in .mcp.json
The PRD's architectural description ("wired into mcp-server/index.ts
tool registry") is conceptually wrong — gitea-mcp is a peer MCP server
that the MCP host (Claude Code) connects to directly, not a library
to import. Correct wiring: register it in .mcp.json so Claude Code
spawns both the lakehouse's MCP server and gitea-mcp as separate
child processes, each exposing its own tools.
This commit adds the "gitea" entry to .mcp.json pointing at bunx
gitea-mcp with GITEA_HOST set to git.agentview.dev.
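Sketch of the added entry (the mcpServers/command/args/env key names
follow the common .mcp.json shape; the exact spelling in this repo's
file may differ, and the token value is the placeholder the operator
replaces):

```json
{
  "mcpServers": {
    "gitea": {
      "command": "bunx",
      "args": ["gitea-mcp"],
      "env": {
        "GITEA_HOST": "git.agentview.dev",
        "GITEA_ACCESS_TOKEN": "SET_ME_..."
      }
    }
  }
}
```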
OPERATOR STEP (one-time): before restarting Claude Code, generate a
personal access token at https://git.agentview.dev/user/settings/
applications and replace the SET_ME_... placeholder in
GITEA_ACCESS_TOKEN. The token needs at least the `read:repository`,
`write:issue`, and `read:user` scopes for list_prs/open_pr/
comment_on_issue.
Still open from Phase 40 (not in this commit, bigger scope):
- crates/aibridge/src/providers/gemini.rs (claimed, missing)
- crates/aibridge/src/providers/claude.rs (claimed, missing)
These are ~100-200 lines each (full HTTP adapter, auth, and request-
shape mapping). Flagged as follow-up commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Any agent (Claude Code via MCP stdio, or sub-agents via HTTP :3700)
can now self-orient without human explanation:
GET /context returns:
- System purpose and name
- All datasets with row counts
- All vector indexes with backends
- Available models and their strengths
- Complete tool list with rules
- Current VRAM state
POST /verify fact-checks any claim about a worker against the golden
data. Agent says "worker 1313 is a Forklift Operator in IL with
reliability 0.82" → endpoint returns verified=true/false with exact
discrepancies.
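A minimal sketch of the comparison /verify performs (field names,
request shape, and the golden values below are assumptions for
illustration, not the endpoint's actual contract):

```typescript
// Compare a claimed worker record against the golden row and report
// exact field-level discrepancies, mirroring what POST /verify returns.
type WorkerClaim = Record<string, string | number>;

interface VerifyResult {
  verified: boolean;
  discrepancies: { field: string; claimed: unknown; actual: unknown }[];
}

function verifyClaim(claim: WorkerClaim, golden: WorkerClaim): VerifyResult {
  const discrepancies = Object.entries(claim)
    .filter(([field, claimed]) => golden[field] !== claimed)
    .map(([field, claimed]) => ({ field, claimed, actual: golden[field] }));
  return { verified: discrepancies.length === 0, discrepancies };
}

// The claim from the example above, checked against an invented golden row.
const golden = { role: "Forklift Operator", state: "IL", reliability: 0.78 };
const result = verifyClaim(
  { role: "Forklift Operator", state: "IL", reliability: 0.82 },
  golden,
);
console.log(result.verified);               // false: reliability mismatch
console.log(result.discrepancies[0].field); // "reliability"
```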
MCP resources (stdio path for Claude Code):
- lakehouse://system — live system status
- lakehouse://architecture — full PRD
- lakehouse://instructions — agent operating manual
- lakehouse://playbooks — successful operations database
- lakehouse://datasets — dataset listing
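One way to picture the stdio side: a registry keyed by lakehouse://
URI, resolved when the MCP host reads a resource. A sketch only — the
resource bodies are stand-ins and the real mcp-server/index.ts wires
these through the MCP SDK, not a bare Map:

```typescript
// Map each lakehouse:// URI to a provider returning the resource body.
const resources = new Map<string, () => string>([
  ["lakehouse://system", () => "live system status"],
  ["lakehouse://architecture", () => "full PRD text"],
  ["lakehouse://instructions", () => "agent operating manual"],
  ["lakehouse://playbooks", () => "successful operations database"],
  ["lakehouse://datasets", () => "dataset listing"],
]);

// Resolve a resource read; unknown URIs fail loudly.
function readResource(uri: string): string {
  const provider = resources.get(uri);
  if (!provider) throw new Error(`unknown resource: ${uri}`);
  return provider();
}
```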
This is the "command and control" layer J asked for: any agent
connecting to this system gets the context it needs to operate
independently. No human intermediary required.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MCP server at mcp-server/index.ts — 9 tools exposing the full
lakehouse to any MCP-compatible model:
search_workers (hybrid SQL+vector), query_sql, match_contract,
get_worker, rag_question, log_success, get_playbooks,
swap_profile, vram_status
The "successful playbooks" pattern: log_success writes outcomes
back to the lakehouse as a queryable dataset. Small models call
get_playbooks to learn what approaches worked for similar tasks —
no retraining needed, just data.
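The pattern in miniature (in-memory here; the real tools persist to a
lakehouse dataset, and the record fields are assumptions):

```typescript
// A logged outcome that future agents can query.
interface Playbook {
  task: string;      // what kind of task this was
  approach: string;  // what was tried
  outcome: string;   // what happened
  loggedAt: number;
}

const playbooks: Playbook[] = []; // stand-in for the queryable dataset

// log_success: append an outcome so it becomes data, not lore.
function logSuccess(task: string, approach: string, outcome: string): void {
  playbooks.push({ task, approach, outcome, loggedAt: Date.now() });
}

// get_playbooks: return entries whose task matches the query substring.
function getPlaybooks(taskQuery: string): Playbook[] {
  const q = taskQuery.toLowerCase();
  return playbooks.filter((p) => p.task.toLowerCase().includes(q));
}

logSuccess("staff forklift contract IL", "hybrid search then rerank", "12 placed");
console.log(getPlaybooks("forklift").length); // 1
```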
generate_workers.py scales to 100K+ with realistic distributions:
- 20 roles weighted by staffing industry frequency
- 44 real Midwest/South cities across 12 states
- Per-role skill pools (warehouse/production/machine/maintenance)
- 13 certification types with realistic probability
- 8 behavioral archetypes with score distributions
- SMS communication templates (20 patterns)
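The core technique behind these distributions is weighted sampling.
The actual generator is generate_workers.py; this TypeScript sketch
shows the idea with invented weights:

```typescript
// Pick one item with probability proportional to its weight.
function weightedChoice<T>(items: [T, number][], rand = Math.random): T {
  const total = items.reduce((sum, [, w]) => sum + w, 0);
  let r = rand() * total;
  for (const [item, w] of items) {
    r -= w;
    if (r <= 0) return item;
  }
  return items[items.length - 1][0]; // float-rounding fallback
}

// Illustrative weights only; the real per-role frequencies live in the script.
const roles: [string, number][] = [
  ["Forklift Operator", 0.11],
  ["Warehouse Associate", 0.25],
  ["Production Worker", 0.18],
];
console.log(weightedChoice(roles)); // one of the three role names
```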
100K worker dataset ingested: 70MB CSV → Parquet in 1.1s. Verified:
11K forklift ops, 27K in IL, archetype distribution matches weights.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>