lakehouse

Author	SHA1	Message	Date
root	d277efbfd2	v1/mode: task_class → mode/model router (decision-only, phase 1) [Some checks failed: lakehouse/auditor, 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts] HANDOVER §queued (2026-04-25): "Mode router — port LLM Team multi-model patterns. Pick the right TOOL/MODE for each task class via the matrix, not cascade through models." Two-stage architecture: 1. Decision (POST /v1/mode) — pure recommendation, no execution. Returns {mode, model, decision: {source, fallbacks, matrix_corpus, notes}} so callers see WHY this mode was picked. 2. Execution (future POST /v1/mode/execute) — proxy to LLM Team /api/run for modes not yet ported to native Rust runners. Not wired in this phase. Splitting decision from execution lets us A/B-test the routing logic without committing to running every recommendation. The decision function is pure enough for exhaustive unit tests (3 added). config/modes.toml — initial map for 5 task_classes (scrum_review, contract_analysis, staffing_inference, fact_extract, doc_drift_check) + a default. matrix_corpus per task is reserved for the future matrix-informed routing pass. VALID_MODES list (24 modes) is kept in sync manually with LLM Team's /api/run handler at /root/llm_team_ui.py:10581. Adding a mode here without adding it upstream returns 400 from a future proxy. GET /v1/mode/list — operator introspection so a UI can render the registry table without re-parsing TOML. Live-tested: 5 task classes match, unknown classes fall through to default, force_mode override works + validates, bogus modes return 400 with the valid_modes list. Updates reference_llm_team_modes.md memory — earlier note claiming "only extract is registered" was wrong (all 25 are registered). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:16:32 -05:00
root	2f1b9c9768	phase-39+41: land promised artifacts — providers.toml, activation.rs, profiles/ Three PRD gaps closed in one coherent batch — all were cosmetic or scaffold-shaped, now real files: Phase 39 (PRD:57): + config/providers.toml — provider registry (name/base_url/auth/default_model) for ollama, ollama_cloud, openrouter. Commented stubs for gemini + claude pending adapter work. Secrets stay in /etc/lakehouse/secrets.toml or env, NEVER inline. Phase 41 (PRD:115): + crates/vectord/src/activation.rs — ActivationTracker with the PRD-named single-flight guard ("refuse new activation if one is pending/running"). Per-profile granularity — activating A doesn't block B. 5 tests cover the full state machine. Handler body stays in service.rs for now; tracker usage integration is a follow-up. Phase 41 (PRD:113): + crates/shared/src/profiles/ with 4 submodules: * execution.rs — `pub use crate::types::ModelProfile as ExecutionProfile` (backward-compat rename per PRD) * retrieval.rs — top_k, rerank_top_k, freshness cutoff, playbook boost, sensitivity-gate enforcement * memory.rs — playbook boost ceiling, history cap, doc staleness, auto-retire-on-failure * observer.rs — failure cluster size, alert cooldown, ring size, langfuse forwarding. All fields `#[serde(default)]` so existing ModelProfile files load unchanged. Still open from the same phases: - Gemini + Claude provider adapters (Phase 40 — 100-200 LOC each) - Full activate_profile handler extraction into activation.rs (Phase 41 — module-structure refactor) - Catalogd CRUD endpoints for retrieval/memory/observer profiles (Phase 41 — exists at list level, no create/update/delete yet) - truth/ repo-root directory for file-backed rules (Phase 42 — TOML loader + schema) - crates/validator crate (Phase 43 — full greenfield). Workspace warnings still at 0. 5 new tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:32:40 -05:00
root	55f8e0fe6e	Phase 40: Routing Engine + Policy - RoutingEngine with RouteDecision (model_pattern → provider) - config/routing.toml: rules, fallback chain, cost gating - Per-provider Usage tracking in /v1/usage response - 12 gateway tests green	2026-04-23 02:36:45 -05:00
root	0c4868c191	qwen3.5 executor + continuation primitive + think:false Three coupled fixes that together turned the Riverfront Steel scenario from 0/5 (mistral) to 4/5 (qwen3.5) with T3 flagging real staffing concerns rather than linter advice. MODEL SWAP - Executor: mistral → qwen3.5:latest (9.7B, 262K ctx, thinking). mistral's decoder emitted malformed JSON on complex SQL filters regardless of prompt; J called it — stop using mistral. - Reviewer: qwen2.5 → qwen3:latest (40K ctx) - Applied to scenario.ts, orchestrator.ts, network_proving.ts, run_e2e_rated.ts CONTINUATION PRIMITIVE (agent.ts) - generateContinuable(): empty-response → geometric backoff retry; truncated-JSON → continue from partial as scratchpad; bounded by budget cap + max_continuations. No more "bump max_tokens until it stops truncating" tourniquet. - generateTreeSplit(): map-reduce for oversized input corpora with running scratchpad digest, reduce pass for final synthesis. - Empty text no longer throws — it's a signal to continuable that thinking ate the budget. think:false FOR HOT PATH - qwen3.5 burned ~650 tokens of hidden thinking for trivial JSON emission. For executor/reviewer/draft: think:false. For T3/T4/T5 overseers: thinking stays on (that's the point). - Sidecar generate endpoint accepts `think` bool, passes through to Ollama's /api/generate. VERIFIED OUTCOMES Riverfront Steel 2026-04-21, qwen3.5+continuable+think:false: 08:00 baseline_fill 3/3, 4 turns; 10:30 recurring 2/2, 3 turns (1 playbook citation); 12:15 expansion 0/5, drift-aborted (5-fill orchestration problem, separate work); 14:00 emergency 4/4, 3 turns (1 citation); 15:45 misplacement 1/1, 3 turns. → T3 caught Patrick Ross double-booking across events → T3 flagged forklift cert drift on the event that failed → Cross-day lesson proposed "maintain buffer of ≥3 emergency candidates, pre-fetch certs for expansion, booking system cross-check" — real staffing advice, not generic linter output. PRD PHASE 21 rewritten to reflect the actual primitive shape (two-call map-reduce with scratchpad glue) instead of the tourniquet approach originally documented. Rust port queued for next sprint. scripts/ab_t3_test.sh: A/B harness that chains B→C→D runs and emits tests/multi-agent/playbooks/ab_scorecard.json.	2026-04-20 20:19:02 -05:00
root	6e7ca1830e	Phase 21 foundation — context stability + chunking pipeline PRD: add Phase 20 (model matrix, wired) and Phase 21 (context stability, partial). Phase 21 exists because LLM Team hit this exact wall — running multi-model ranking on large context silently truncated, rankings degraded, no pipeline caught it. The stable answer: every agent call goes through a budget check against the model's declared context_window minus safety_margin, with a declared overflow_policy when the check fails. config/models.json: - context_window + context_budget per tier - overflow_policies block: summarize_oldest_tool_results_via_t3, chunk_lessons_via_cosine_topk, two_pass_map_reduce, escalate_to_kimi_k2_1t_or_split_decision - chunking_cache spec (data/_chunk_cache/, corpus-hash keyed) agent.ts: - estimateTokens() chars/4 biased safe ~15% - CONTEXT_WINDOWS table (fallback; prod reads models.json) - assertContextBudget() — throws on overflow with exact numbers, can bypass with bypass_budget:true for callers with their own policy - Wired into generate() and generateCloud() so EVERY call is checked scenario.ts: - T3 lesson archive to data/_playbook_lessons/*.json (the old /vectors/playbook_memory/seed path was silently failing with HTTP 400 because it requires 'fill: Role xN in City, ST' operation shape) - loadPriorLessons() at scenario start — filters by city/state match, date-sorted, takes top-3 - prior_lessons.json archived per-run (honest signal for A/B) - guidanceFor() injects up to 2 prior lessons (≤500 chars each) into the executor's per-event context - Retrospective shows explicit "Prior lessons loaded: N" line Verified: mistral correctly rejects a 150K-char prompt (7532 tokens over), gpt-oss:120b accepts it with 90K headroom. The enforcement is in-band on every call now, not an afterthought. Full chunking service (Rust) remains deferred to the sprint this feeds: crates/aibridge/src/budget.rs + chunk.rs + storaged/chunk_cache.rs	2026-04-20 19:34:44 -05:00
root	03d723e7e6	Model matrix — 5 tiers, local hard workers + cloud overseers config/models.json is the authoritative catalog. Hot path (T1/T2) stays local; cloud is consulted only for overview (T3), strategic (T4), and gatekeeper (T5) calls. J named qwen3.5 + newer models (minimax-m2.7, glm-5, qwen3-next) specifically — all mapped with real reachable IDs verified against ollama.com/api/tags. Tier shape: - t1_hot mistral + qwen2.5 local — 50-200 calls/scenario - t2_review qwen2.5 + qwen3 local — 5-14 calls/event - t3_overview gpt-oss:120b cloud — 1-3 calls/scenario - t4_strategic qwen3.5:397b + glm-4.7 — 1-10 calls/day - t5_gatekeeper kimi-k2-thinking — 1-5 calls/day, audit-logged. Rate budgets are declared in-config — Ollama Cloud paid tier is generous but we cap overview/strategic/gatekeeper so no single rogue scenario can blow the day's quota. Experimental rotation list wired but disabled by default. When enabled, T4 randomly routes 10% of calls to a rotating minimax/GLM/qwen-next/deepseek/nemotron/cogito/mistral-large candidate, logs comparisons, and auto-promotes after 3 rotations of wins. Playbook versioning SPEC embedded under `playbook_versioning` key: every seed gets version + parent_id + retired_at + architecture_snapshot, so when a schema migration breaks a playbook we can pinpoint which change retired it. Implementation flagged for next sprint (touches gateway + catalogd + mcp-server) — not wired here. - scenario.ts now loads config/models.json at init, env vars still override - mcp-server exposes /models/matrix read-only so UI can render it	2026-04-20 19:24:41 -05:00
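
The in-band budget check described in commit 6e7ca1830e (estimateTokens(), assertContextBudget(), bypass_budget) could look roughly like the sketch below. It is a minimal illustration, not the code in agent.ts. The model names, window sizes, the SAFETY_MARGIN value, and the reading of "biased safe ~15%" as a 1.15 multiplier are all assumptions made for the example.

```typescript
// Hypothetical sketch of a per-call context budget check.
// Only the names estimateTokens, assertContextBudget, CONTEXT_WINDOWS and
// bypass_budget come from the commit message; every value is a placeholder.

interface BudgetOptions {
  bypass_budget?: boolean; // callers with their own overflow policy may opt out
}

// Illustrative fallback table (the commit says prod reads config/models.json).
const CONTEXT_WINDOWS: Record<string, number> = {
  "small-model": 8192,
  "large-model": 131072,
};

const SAFETY_MARGIN = 512; // assumed value, in tokens

// chars/4 heuristic scaled by 1.15 so the estimate errs on the high side.
// Integer arithmetic (115/400) avoids floating-point rounding surprises.
function estimateTokens(text: string): number {
  return Math.ceil((text.length * 115) / 400);
}

// Returns the estimated token count, or throws with exact numbers on overflow.
function assertContextBudget(
  model: string,
  prompt: string,
  opts: BudgetOptions = {},
): number {
  const estimated = estimateTokens(prompt);
  if (opts.bypass_budget) {
    return estimated; // caller takes responsibility for overflow handling
  }
  const window = CONTEXT_WINDOWS[model];
  if (window === undefined) {
    throw new Error(`no context window declared for model "${model}"`);
  }
  const budget = window - SAFETY_MARGIN;
  if (estimated > budget) {
    throw new Error(
      `${model}: ~${estimated} tokens exceeds budget ${budget} ` +
        `(${estimated - budget} tokens over)`,
    );
  }
  return estimated;
}
```

Throwing rather than truncating keeps the failure visible at the call site, which matches the stated motivation that silent truncation went unnoticed.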

6 Commits
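
The decision-only router from commit d277efbfd2 is described as a pure function, which makes it easy to sketch. The example below is a hedged TypeScript illustration of that shape, not the Rust implementation. The response fields (mode, model, decision.source, fallbacks, matrix_corpus, notes), the force_mode override, and the 400-with-valid_modes behaviour follow the commit message. The function name decideMode, the model names, the mode names other than "extract", and the table contents are hypothetical.

```typescript
// Hypothetical sketch of a pure task_class -> mode decision function.

type DecisionSource = "force_mode" | "task_class" | "default";

interface ModeEntry {
  mode: string;
  model: string;
  fallbacks: string[];
}

interface ModeDecision {
  mode: string;
  model: string;
  decision: {
    source: DecisionSource;
    fallbacks: string[];
    matrix_corpus: string | null; // reserved for later matrix-informed routing
    notes: string;
  };
}

type ModeResult =
  | { ok: true; value: ModeDecision }
  | { ok: false; status: 400; error: string; valid_modes: string[] };

// Placeholder subset; the real list is said to mirror the upstream handler.
const VALID_MODES: string[] = ["extract", "review", "summarize"];

const DEFAULT_ENTRY: ModeEntry = { mode: "summarize", model: "model-default", fallbacks: [] };

// Task class names come from the commit; the entries are invented.
const TASK_CLASSES = new Map<string, ModeEntry>([
  ["scrum_review", { mode: "review", model: "model-a", fallbacks: ["model-b"] }],
  ["fact_extract", { mode: "extract", model: "model-b", fallbacks: [] }],
]);

function decideMode(taskClass: string, forceMode?: string): ModeResult {
  const known = TASK_CLASSES.get(taskClass);
  const entry = known ?? DEFAULT_ENTRY; // unknown classes fall through to default
  let mode = entry.mode;
  let source: DecisionSource = known ? "task_class" : "default";

  if (forceMode !== undefined) {
    // The override is validated before it is honoured.
    if (VALID_MODES.indexOf(forceMode) === -1) {
      return { ok: false, status: 400, error: `unknown mode: ${forceMode}`, valid_modes: VALID_MODES };
    }
    mode = forceMode;
    source = "force_mode";
  }

  return {
    ok: true,
    value: {
      mode,
      model: entry.model,
      decision: { source, fallbacks: entry.fallbacks, matrix_corpus: null, notes: `task_class=${taskClass}` },
    },
  };
}
```

Because the function performs no I/O, each branch (class match, default fall-through, valid override, invalid override) can be covered by a plain unit test.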