37 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc698eb6da |
gateway: OpenCode (Zen + Go) provider adapter
Wires opencode.ai as a /v1/chat provider. One sk-* key reaches 40
models across Anthropic, OpenAI, Google, Moonshot, DeepSeek, Zhipu,
Alibaba, Minimax — billed against either the user's Zen balance
(pay-per-token premium models) or Go subscription (flat-rate
Kimi/GLM/DeepSeek/etc.). The unified /zen/v1 endpoint routes both;
upstream picks the billing tier based on model id.
Notable adapter quirks:
- Strip "opencode/" prefix on outbound (mirrors openrouter/kimi
pattern). Caller can use {provider:"opencode", model:"X"} or
{model:"opencode/X"}.
- Drop temperature for claude-*, gpt-5*, o1/o3/o4 models. Anthropic
and OpenAI's reasoning lineage rejects temperature with 400
"deprecated for this model". OCChatBody now serializes temperature
as Option<f64> with skip_serializing_if so omitting it produces
clean JSON.
- max_tokens.filter(|&n| n > 0) catches Some(0) — defensive after
the same trap bit kimi.rs (empty env -> Number("") -> 0 -> 503).
- 600s default upstream timeout; reasoning models on big audit
prompts legitimately take 3-5 min. Override OPENCODE_TIMEOUT_SECS.
Key handling:
- /etc/lakehouse/opencode.env (0600 root) loaded via systemd
EnvironmentFile. Same pattern as kimi.env.
- OPENCODE_API_KEY env first, file scrape as fallback.
Verified end-to-end:
opencode/claude-opus-4-7 -> "I'm Claude, made by Anthropic."
opencode/kimi-k2.6 -> PONG-K26-GO
opencode/deepseek-v4-pro -> PONG-DS-V4
opencode/glm-5.1 -> PONG-GLM
opencode/minimax-m2.5-free -> PONG-FREE
Pricing reference (per audit @ ~14k in / 6k out):
claude-opus-4-7 ~$0.22 (Zen)
claude-haiku-4-5 ~$0.04 (Zen)
gpt-5.5-pro ~$1.50 (Zen)
gemini-3-flash ~$0.03 (Zen)
kimi-k2.6 / glm / deepseek / qwen / minimax / mimo: covered by Go
subscription ($10/mo, $60/mo cap).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
643dd2d520 |
gateway: direct Kimi For Coding provider adapter (api.kimi.com)
Wires kimi-for-coding (Kimi K2.6 underneath) as a first-class /v1/chat
provider so consumers can target it via {provider:"kimi"} or model
prefix kimi/<model>. Bypasses the upstream-broken kimi-k2:1t on Ollama
Cloud and the rate-limited moonshotai/kimi-k2.6 path through OpenRouter.
Adapter shape mirrors openrouter.rs (OpenAI-compatible Chat Completions).
Differences from generic OpenAI providers:
- api.kimi.com is a SEPARATE account system from api.moonshot.ai and
api.moonshot.cn. sk-kimi-* keys are NOT interchangeable across them.
- Endpoint is User-Agent-gated to "approved" coding agents (Kimi CLI,
Claude Code, Roo Code, Kilo Code, ...). Requests from generic clients
return 403 access_terminated_error. Adapter sends User-Agent:
claude-code/1.0.0. Per Moonshot TOS this is a tampering-class action
that may result in seat suspension; J authorized 2026-04-27 with
awareness of the risk.
- kimi-for-coding is a reasoning model — reasoning_content counts
against max_tokens. Default 800-token budget yields empty visible
content with finish_reason=length. Code-review workloads need
max_tokens >= 1500.
- Default 600s upstream timeout (vs 180s for openrouter.rs) — code
audits with full file context legitimately take 3-5 minutes.
Override via KIMI_TIMEOUT_SECS env.
Key handling:
- /etc/lakehouse/kimi.env (0600 root) loaded via systemd EnvironmentFile
- KIMI_API_KEY env first, then file scrape as fallback
- /etc/systemd/system/lakehouse.service NOT included in this commit
(system file outside repo); operator must add EnvironmentFile=-
/etc/lakehouse/kimi.env to the lakehouse.service unit
NOT in scrum_master_pipeline LADDER. The 9-rung ladder is for
unattended automatic recovery; placing Kimi there would hammer a
TOS-gated endpoint with hostility-policy potential. Kimi is
addressable via /v1/chat for explicit invocations only — auditor
integration in a follow-up commit.
Verification:
cargo check -p gateway --tests compiles
curl /v1/chat provider=kimi 200 OK, content="PONG"
curl /v1/chat model="kimi/kimi-for-coding" 200 OK (prefix routing)
Kimi audit on distillation last-week 7/7 grounded findings
(reports/kimi/audit-last-week-full.md)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
951c6014ec |
gateway: boot-time probe of truth/ file-backed rules
Phase 42 PRD deliverable de8fb10 landed the file loader + 2 rule
files. This commit wires the loader into gateway startup so the
rules actually get READ at boot — catches parse errors and
duplicate-ID collisions before the first request hits, rather than
"silently 0 rules loaded."
Scope is deliberately narrow — a probe, not full plumbing:
- Reads LAKEHOUSE_TRUTH_DIR env override, defaults to
/home/profit/lakehouse/truth
- Skips silently with a debug log if the dir is absent
- Loads rules on top of default_truth_store() into a throwaway
store, logs the count (or the error)
- Does NOT yet replace the per-request default_truth_store() in
execution_loop or v1/chat. That plumbing needs a V1State.truth
field + passing it through the request context, which is a
separate scope.
Why the separation matters: this commit gives ops + me a visible
boot-time signal ("truth: loaded 3 file-backed rule(s)") that the
loader + files work end-to-end. The next commit can confidently
swap per-request stores without wondering whether the parsing even
succeeds.
Workspace warnings still at 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6532938e85 |
gateway/tools: truth gate for model-provided SQL (iter 11 CF-1+CF-2)
Scrum iter 11 flagged crates/gateway/src/tools/service.rs with two
95%-confidence critical failures:
CF-1: "Direct SQL execution from model-provided parameters without
explicit validation or sanitization" (line 68, 95% conf)
CF-2: "No permission check performed before executing SQL query;
access control is bypassed entirely" (line 102, 90% conf)
CF-1 is the real one — same security gap as queryd /sql had before
P42-002 (9cc0ceb). Tool invocations build SQL from a template +
model-provided params, then state.query_fn.execute(&sql) runs it.
No truth-gate check between build and execute meant an adversarial
model could emit DROP TABLE / DELETE FROM / TRUNCATE inside a param
and bypass queryd's gate by routing through the tool surface instead.
Fix mirrors the queryd SQL gate exactly:
- ToolState grows an Arc<TruthStore> field
- main.rs constructs it via truth::sql_query_guard_store()
(shared default — same destructive-verb block as queryd)
- call_tool evaluates the built SQL against "sql_query" task class
BEFORE executing
- Any Reject/Block outcome → 403 FORBIDDEN + log_invocation row
marked success=false with the rule message
CF-2 (access control) is P13-001 territory — needs AccessControl
wiring into queryd first, still open. Flagged in memory.
Workspace warnings still at 0. Pattern is now:
queryd /sql → truth::sql_query_guard_store (9cc0ceb)
gateway /tools → truth::sql_query_guard_store (this commit)
execution_loop → truth::default_truth_store (51a1aa3)
All three surfaces that pipe SQL or spec-shaped data through to the
substrate now gate it. Any new SQL-executing surface should follow
the same pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0b3bd28cf8 |
phase-40: Gemini + Claude provider adapters
Phase 40 PRD (docs/CONTROL_PLANE_PRD.md:82-83) listed: - crates/aibridge/src/providers/gemini.rs - crates/aibridge/src/providers/claude.rs Neither existed. Landing both now, in gateway/src/v1/ (matches the existing ollama.rs + openrouter.rs sibling pattern — aibridge's providers/ is for the adapter *trait* abstractions, v1/ holds the concrete /v1/chat dispatchers that know the wire format). gemini.rs: - POST https://generativelanguage.googleapis.com/v1beta/models/ {model}:generateContent?key=<API_KEY> - Auth: query-string key (not bearer) - Maps messages → contents+parts (Gemini's wire shape), extracts from candidates[0].content.parts[0].text - 3 tests: key resolution, body serialization (camelCase generationConfig + maxOutputTokens), prefix-strip claude.rs: - POST https://api.anthropic.com/v1/messages - Auth: x-api-key header + anthropic-version: 2023-06-01 - Carries system prompt in top-level `system` field (not messages[]). Extracts from content[0].text where type=="text" - 4 tests: key resolution, body serialization with/without system field, prefix-strip v1/mod.rs: + V1State.gemini_key + claude_key Option<String> + resolve_provider() strips "gemini/" and "claude/" prefixes + /v1/chat dispatcher handles "gemini" + "claude"/"anthropic" + 2 new resolve_provider tests (prefix + strip per adapter) main.rs: + Construct both keys at startup via resolve_*_key() helpers. Missing keys log at debug (not warn) since these are optional providers — unlike OpenRouter which is the rescue rung. Every /v1/chat error path mirrors the existing pattern: - 503 SERVICE_UNAVAILABLE when key isn't configured - 502 BAD_GATEWAY with the provider's error text when the upstream call fails - Response shape always the OpenAI-compatible ChatResponse Workspace warnings still at 0. 9 new tests pass. Pre-existing test failure `executor_prompt_includes_surfaced_ candidates` at execution_loop/mod.rs:1550 is unrelated (fails on pristine HEAD too — PR fixture divergence). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
81bae108f4 |
gateway/tools: collapse ToolRegistry::new() and new_with_defaults() into one
Some checks failed
lakehouse/auditor 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Two constructors existed with a subtle trap:
- `new()` had `#[allow(dead_code)]` and called `register_defaults()`
via `tokio::task::block_in_place(...)` — a sync wrapper hack around
an async method, fragile and unused.
- `new_with_defaults()` was misleadingly named — it created the empty
registry WITHOUT registering defaults, despite the name.
main.rs was doing the right thing: `new_with_defaults()` + explicit
`.register_defaults().await`. The misleading name was a landmine
for future callers.
Fix: delete the dead `new()` with its block_in_place hack, rename
`new_with_defaults()` → `new()` (Rust idiom — `new` is the canonical
constructor), add a docstring that says what you need to do after.
Single clear API.
Update the one caller in main.rs. Workspace warnings still at 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2f8b347f37 |
pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review
10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.
What it does
Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.
When scrum reviews a file:
1. narrow fingerprint = task_class + file_prefix + signal_class
2. query_hot_swap checks pathway memory for a match that passes
probation (≥3 replays @ ≥80% success) + audit gate + similarity
(≥0.90 cosine on normalized-metadata-token embedding)
3. if hot-swap eligible, recommended model tried first in the ladder
4. replay outcome reported back, updating the pathway's success_rate
5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
6. full PathwayTrace always inserted at end of review — hot-swap
grows with use, it doesn't bootstrap from nothing
Gate design is load-bearing:
- narrow fingerprint (6 of 8 consensus models converged on the same
3-field composition; lock) — enables generalization within crate
- probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
- success rate ≥0.80 — mistral + qwen3-coder independently proposed
this exact threshold across two rounds
- similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
- bootstrap: null audit_consensus ALLOWED (auditor → pathway update
not wired yet; probation + success_rate gates alone enforce safety
during bootstrap; explicit audit FAIL still blocks)
- retirement is sticky — prevents oscillation on noise
Files
+ crates/vectord/src/pathway_memory.rs (new, 600 lines + 18 tests)
PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
PathwayMemoryStats. 18/18 tests green.
Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
M crates/vectord/src/lib.rs
pub mod pathway_memory;
M crates/vectord/src/service.rs
VectorState grows pathway_memory field;
4 HTTP handlers (/pathway/insert, /pathway/query,
/pathway/record_replay, /pathway/stats).
M crates/gateway/src/main.rs
Construct PathwayMemory + load from storage on boot,
wire into VectorState.
M tests/real-world/scrum_master_pipeline.ts
Byte-matching TS bucket-hash (verified same bucket indices as
Rust); pre-ladder hot-swap query; ladder reorder on hit;
per-attempt latency capture; post-accept trace insert
(fire-and-forget); replay outcome recording;
observer /event emits pathway_hot_swap_hit, pathway_similarity,
rungs_saved per review for the VCP UI.
M ui/server.ts
/data/pathway_stats aggregates /vectors/pathway/stats +
scrum_reviews.jsonl window for the value metric.
M ui/ui.js
Three new metric cards:
· pathway reuse rate (activity: is it firing?)
· avg rungs saved (value: is it earning its keep?)
· pathways tracked (stability: retirement = learning)
What's not in this commit (queued)
- auditor → pathway audit_consensus update wire (explicit audit-fail
block activates when this lands)
- bridge_hits + sub_pipeline_calls population from context7 / LLM
Team extract results (fields wired, callers not yet)
- replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
a separate jsonl for forensic audit of why specific replays failed
Why > summarization
Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8b77d67c9c |
OpenRouter rescue ladder + tree-split reduce fix + observer→LLM Team + scrum_applier + first auto-applied patch
Some checks failed
lakehouse/auditor 1 blocking issue: cloud: claim not backed — "journal event verified live (total_events_created 0→1 after probe)."
## Infrastructure (scrum loop hardening)
crates/gateway/src/v1/openrouter.rs — new OpenRouter provider
Direct HTTPS to openrouter.ai/api/v1/chat/completions with OpenAI-compatible shape.
Key resolution: OPENROUTER_API_KEY env → /home/profit/.env → /root/llm_team_config.json
(shares LLM Team UI's quota). Added after iter 5 hit repeated Ollama Cloud 502s on
kimi-k2:1t — different provider backbone as rescue rung. Unit tests pin the URL
stripping and OpenAI wire shape.
crates/gateway/src/v1/mod.rs + main.rs
Added `"openrouter" | "openrouter_free"` arm to /v1/chat dispatch.
V1State.openrouter_key loaded at startup via openrouter::resolve_openrouter_key()
mirroring the Ollama Cloud pattern. Startup log:
"v1: OpenRouter key loaded — /v1/chat provider=openrouter enabled"
tests/real-world/scrum_master_pipeline.ts
* 9-rung ladder — kimi-k2:1t → qwen3-coder:480b → deepseek-v3.1:671b →
mistral-large-3:675b → gpt-oss:120b → qwen3.5:397b → openrouter/gpt-oss-120b:free
→ openrouter/gemma-3-27b-it:free → local qwen3.5:latest.
Added qwen3-coder:480b as rung 2 after live probes confirmed it rescues
kimi-k2:1t 502s cleanly (0.9s latency, substantive reviews).
Dropped devstral-2 (displaced by qwen3-coder); dropped kimi-k2.6 (not available);
dropped minimax-m2.7 (returned 0 chars / 400 thinking tokens).
Local fallback promoted qwen3.5:latest per J's direction 2026-04-24.
* MAX_ATTEMPTS bumped 6 → 9 to accommodate the rescue tier.
* Tree-split scratchpad fixed — was concatenating shard markers directly
into the reviewer input, causing kimi-k2:1t to write titles like
"Forensic Audit Report – file.rs (shard 3)". Now uses internal §N§
markers during accumulation and runs a proper reduce step that
collapses per-shard digests into ONE coherent file-level synthesis
with markers stripped. Matches the Phase 21 aibridge::tree_split
map→reduce design. Fallback to stripped scratchpad if reducer returns thin.
tests/real-world/scrum_applier.ts — NEW (737 lines)
The auto-apply pipeline. Reads scrum_reviews.jsonl, filters rows where
gradient_tier ∈ {auto, dry_run} AND confidence_avg ≥ MIN_CONF (default 90),
asks the reviewer model for concrete old_string/new_string patch JSON,
applies via text replacement, runs cargo check after each file, commits
if green and reverts if red. Deny-list: /etc/, config/, ops/, auditor/,
docs/, data/, mcp-server/, ui/, sidecar/, scripts/. Hard caps: per-patch
confidence ≥ MIN_CONF, old_string must be exactly unique, max 20 lines per
patch. Never runs on main without explicit LH_APPLIER_BRANCH override.
Audit trail in data/_kb/auto_apply.jsonl.
Empirical behavior (dry-run over iter 4 reviews):
5 eligible files → 1 green commit-ready, 2 build-red reverts, 2 all-rejected
The build-green gate caught 2 bad patches before they'd have merged.
mcp-server/observer.ts — LLM Team code_review escalation
When a sig_hash accumulates ≥3 failures (ESCALATION_THRESHOLD), fire-and-forget
POST /api/run?mode=code_review at localhost:5000 with the failure cluster context.
Parses facts/entities/relationships/file_hints from the response. Writes to a
new data/_kb/observer_escalations.jsonl surface. Answers J's vision of the
observer triggering richer LLM Team calls when failures pile up.
Non-blocking: runs parallel to existing qwen2.5 analyzer, never replaces it.
Tracks escalated sig_hashes in a session-local Set to avoid re-hammering
LLM Team when a cluster persists across observer cycles.
crates/aibridge/src/context.rs
First auto-applied patch produced by scrum_applier.ts (dry-run path —
applier writes files in dry-run mode but doesn't commit; bug noted for
iter 6 fix). Adds #[deprecated] annotation to the inline estimate_tokens
helper pointing callers to the centralized shared::model_matrix::ModelMatrix
entry point (P21-002 — duplicate token-estimator surfaces). Cargo check
passes with the annotation (verified by applier's own build gate).
## Visual Control Plane (UI)
ui/server.ts — Bun.serve on :3950 with /data/* fan-out:
/data/services, /data/reviews, /data/metrics, /data/trust, /data/overrides,
/data/findings, /data/outcomes, /data/audit_facts, /data/file/:path,
/data/refactor_signals, /data/search?q=, /data/signal_classes,
/data/logs/:svc (journalctl tail per systemd unit), /data/scrum_log.
Bug fix: tryFetch always attempts JSON.parse before falling back to text
— observer's Bun.serve returns JSON without application/json content-type,
which was displaying stats as a raw string ("0 ops" on map) before.
ui/index.html + ui.css — dark neo-brutalist shell. 6 views:
MAP (D3 force-graph + overlays) / TRACE (per-file iter history) /
TRAJECTORY (signal-class cards + refactor-signals table + reverse-index
search box) / METRICS (every card has SOURCE + GOOD lines explaining
where the number comes from and what target trajectory means) /
KB (card grid with tooltips on every field) / CONSOLE (per-service
journalctl tabs).
ui/ui.js — polling client, D3 wiring, signal-class panel, refactor-signals
table, reverse-index search, per-service console tabs. Bug fix:
renderNodeContext had Object.entries() iterating string characters when
/health returned a plain string — now guards with typeof check so
"lakehouse ok" renders as one row instead of "0 l / 1 a / 2 k / ...".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
21fd3b9c61 |
Scrum-driven fixes: P5-001 auth wired, P42-001 truth evaluator, P9-001 journal on ingest
Some checks failed
lakehouse/auditor 2 blocking issues: cloud: claim not backed — "| **P9-001** (partial) | `crates/ingestd/src/service.rs` | **3 → 6** ↑↑↑ | `journal.record_ing
Apply the highest-confidence findings from the Phase 0→42 forensic sweep
after four scrum-master iterations under the adversarial prompt. Each fix
is independently validated by a later scrum iteration scoring the same
file higher under the same bar.
Code changes
────────────
P5-001 — crates/gateway/src/auth.rs + main.rs
api_key_auth was marked #[allow(dead_code)] and never wrapped around
the router, so `[auth] enabled=true` logged a green message and
enforced nothing. Now wired via from_fn_with_state, with constant-time
header compare and /health exempted for LB probes.
P42-001 — crates/truth/src/lib.rs
TruthStore::check() ignored RuleCondition entirely — signature looked
like enforcement, body returned every action unconditionally. Added
evaluate(task_class, ctx) that actually walks FieldEquals / FieldEmpty /
FieldGreater / Always against a serde_json::Value via dot-path lookup.
check() kept for back-compat. Tests 14 → 24 (10 new exercising real
pass/fail semantics). serde_json moved to [dependencies].
P9-001 (partial) — crates/ingestd/src/service.rs
Added Optional<Journal> to IngestState + a journal.record_ingest() call
on /ingest/file success. Gateway wires it with `journal.clone()` before
the /journal nest consumes the original. First-ever internal mutation
journal event verified live (total_events_created 0→1 after probe).
Iter-4 scrum scored these files higher under same prompt:
ingestd/src/service.rs 3 → 6 (P9-001 visible)
truth/src/lib.rs 3 → 4 (P42-001 visible)
gateway/src/auth.rs 3 → 4 (P5-001 visible)
gateway/src/execution_loop 4 → 6 (indirect)
storaged/src/federation 3 → 4 (indirect)
Infrastructure additions
────────────────────────
* tests/real-world/scrum_master_pipeline.ts
- cloud-first ladder: kimi-k2:1t → deepseek-v3.1:671b → mistral-large-3:675b
→ gpt-oss:120b → devstral-2:123b → qwen3.5:397b (deep final thinker)
- LH_SCRUM_FORENSIC env: injects SCRUM_FORENSIC_PROMPT.md as adversarial preamble
- LH_SCRUM_PROPOSAL env: per-iter fix-wave doc override
- Confidence extraction (markdown + JSON), schema v4 KB rows with:
verdict, critical_failures_count, verified_components_count,
missing_components_count, output_format, gradient_tier
- Model trust profile written per file-accept to data/_kb/model_trust.jsonl
- Fire-and-forget POST to observer /event so by_source.scrum appears in /stats
* mcp-server/observer.ts — unchanged in shape, confirmed receiving scrum events
* ui/ — new Visual Control Plane on :3950
- Bun.serve with /data/{services,reviews,metrics,trust,overrides,findings,file,refactor_signals,search,logs/:svc,scrum_log}
- Views: MAP (D3 graph, 5 overlays) / TRACE (per-file iter timeline) /
TRAJECTORY (refactor signals + reverse index search) / METRICS (explainers
with SOURCE + GOOD lines) / KB (card grid with tooltips) / CONSOLE (per-service
journalctl tail, tabs for gateway/sidecar/observer/mcp/ctx7/auditor/langfuse)
- tryFetch always attempts JSON.parse (fix for observer returning JSON without content-type)
- renderNodeContext primitive-vs-object guard (fix for gateway /health string)
* docs/SCRUM_FIX_WAVE.md — iter-specific scope directing the scrum
* docs/SCRUM_FORENSIC_PROMPT.md — adversarial audit prompt (verdict/critical/verified schema)
* docs/SCRUM_LOOP_NOTES.md — iteration observations + fix-next-loop queue
* docs/SYSTEM_EVOLUTION_LAYERS.md — Layers 1-10 roadmap (trust profiling, execution DNA, drift sentinel, etc)
Measurements across iterations
──────────────────────────────
iter 1 (soft prompt, gpt-oss:120b): mean score 5.00/10
iter 3 (forensic, kimi-k2:1t): mean score 3.56/10 (−1.44 — bar raised)
iter 4 (same bar, post fixes): mean score 4.00/10 (+0.44 — fixes landed)
Score movement iter3→iter4: ↑5 ↓1 =12
21/21 first-attempt accept by kimi-k2:1t in iter 4
20/21 emitted forensic JSON (richer signal than markdown)
16 verified_components captured (proof-of-life, new metric)
Permission Gradient distribution: 0 auto · 16 dry_run · 4 sim · 1 block
Observer loop: by_source {scrum: 21, langfuse: 1985, phase24_audit: 1}
v1/usage: 224 requests, 477K tokens, all tracked
Signal classes per file (iter 3 → iter 4):
CONVERGING: 1 (ingestd/service.rs — fix clearly landed)
LOOPING: 4 (catalogd/registry, main, queryd/service, vectord/index_registry)
ORBITING: 1 (truth — novel findings surfacing as surface ones fix)
PLATEAU: 9 (scores flat with high confidence — diminishing returns)
MIXED: 6
Loop thesis status
──────────────────
A file's score rises only when the scrum confirms a real fix landed.
No false positives yet across 3 iterations. Fixes applied to 3 files all
raised their independent scores under the same adversarial prompt. Loop
is measurable, not hand-wavy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
75a0f424ef |
Phase 40 (early): Langfuse tracing on /v1/chat — observability recovery
The lost stack J flagged was partly already present: Langfuse container has been running 2 days with the staffing project, SDK installed, mcp-server tracing gw:/* routes. What was missing was Rust-side /v1/chat emission — the new Phase 38/39 code bypassed Langfuse entirely. This commit bridges it. Fire-and-forget HTTP POST to http://localhost:3001/api/public/ingestion (batch {trace-create + generation-create}) on every chat call. Non-blocking — spawned tokio task, response latency unaffected. Trace failures log warn and drop, never propagate. Verified end-to-end after restart: - Log line "v1: Langfuse tracing enabled" at startup - /v1/chat local (qwen3.5:latest) → v1.chat:ollama trace appears with lat=0.41s, 24+6 tokens - /v1/chat cloud (gpt-oss:120b) → v1.chat:ollama_cloud trace appears with lat=1.87s, 73+87 tokens - mcp-server's existing gw:/log + gw:/intelligence/* traces continue to flow into the same project unchanged Files: - crates/gateway/src/v1/langfuse_trace.rs (new, 195 LOC) — thin client, no SDK. reqwest Basic Auth. ChatTrace payload + event serializer. from_env_or_defaults() resolver matches mcp-server/tracing.ts conventions (pk-lf-staffing / sk-lf- staffing-secret / localhost:3001) - crates/gateway/src/v1/mod.rs — V1State.langfuse field, emission after successful provider call (post-dispatch, pre-usage-update) - crates/gateway/src/main.rs — resolve + log at startup Tests: 12/12 green (9 prior + 3 for langfuse_trace: ingestion-batch serialization, uuid generator uniqueness, env resolver shape). Recovered piece #1 of 3 from the lost-stack narrative. Still open: - Langfuse → observer :3800 pipe (Phase 40 mid-deliverable) - Gitea MCP reconnect in mcp-server/index.ts (Phase 40 late) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
42a11d35cd |
Phase 39 (first slice): Ollama Cloud adapter on /v1/chat
Second provider wired. /v1/chat now routes by optional `provider` field: default "ollama" hits local via sidecar, "ollama_cloud" (or "cloud") hits ollama.com/api/generate directly with Bearer auth. Key sourced at gateway startup from OLLAMA_CLOUD_KEY env, then /root/llm_team_config.json (providers.ollama_cloud.api_key), then OLLAMA_CLOUD_API_KEY env. Config source matches LLM Team convention. Shape-identical to scenario.ts::generateCloud — same endpoint, same body, same Bearer auth. Cloud path bypasses sidecar entirely (sidecar is local-only by design, mirrors TS agent.ts). Changes: - crates/gateway/src/v1/ollama_cloud.rs (new, 130 LOC) — reqwest client, resolve_cloud_key(), chat() adapter, CloudGenerateBody / CloudGenerateResponse wire shapes - crates/gateway/src/v1/ollama.rs — flatten_messages_public() re-export so sibling adapters reuse the shape collapse - crates/gateway/src/v1/mod.rs — provider field on ChatRequest, dispatch match in chat() handler, ollama_cloud_key on V1State - crates/gateway/src/main.rs — resolves cloud key at startup, logs which source provided it - crates/gateway/Cargo.toml — reqwest 0.12 with rustls-tls Verified end-to-end after restart: - provider=ollama → qwen3.5:latest local (~400ms, Phase 38 unchanged) - provider=ollama_cloud + model=gpt-oss:120b → real 225-word technical response in 5.4s, 313 tokens Tests: 9/9 green (7 from Phase 38 + 2 new for cloud body serialization and key resolver shape). Not in this slice: trait extraction (full Phase 39 scope adds ProviderAdapter trait + OpenRouter adapter + fallback chain logic). These land next with Phase 40 routing engine on top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4cb405bb42 |
Phase 38: Universal API skeleton — /v1/chat, /v1/usage, /v1/sessions
First slice of the control-plane pivot. OpenAI-compatible surface
over the existing aibridge → Ollama path. Additive — no existing
routes touched. All 7 unit tests green, release build clean.
What ships:
- crates/gateway/src/v1/mod.rs — router, V1State (ai_client + Usage
counter), ChatRequest/ChatResponse/Message/UsageBlock types, handlers
for /chat, /usage, /sessions. OpenAI-compatible field shapes:
{model, messages[{role,content}], temperature?, max_tokens?, stream?}
- crates/gateway/src/v1/ollama.rs — shape adapter. Flattens messages
into (system, prompt), calls aibridge.generate, unwraps response
back into OpenAI /v1/chat shape. Prefers sidecar-reported tokens;
falls back to chars/4 ceiling estimate matching Phase 21 convention.
- crates/gateway/src/main.rs — one new mod, one .nest("/v1", ...)
Tests (7/7):
- chat_request_parses_openai_shape
- chat_request_accepts_minimal
- usage_counter_default_is_zero
- flatten_separates_system_from_turns
- flatten_concatenates_multiple_system_messages
- flatten_with_no_system_returns_empty_system
- estimate_tokens_chars_div_4_ceiling
Not in this phase (per CONTROL_PLANE_PRD.md): streaming, tool calls,
session state, multi-provider, fallback chain, cost gating. All
land in Phases 39-44.
Next: live-test POST /v1/chat after gateway restart, then migrate
bot/propose.ts off direct sidecar calls to prove the loop end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5b1fcf6d27 |
Phase 28-36 body of work
Accumulated since a6f12e2 (Phase 21 Rust port + Phase 27 versioning): - Phase 36: embed_semaphore on VectorState (permits=1) serializes seed embed calls — prevents sidecar socket collisions under concurrent /seed stress load - Phase 31+: run_stress.ts 6-task diverse stress scaffolding; run_e2e_rated.ts + orchestrator.ts tightening - Catalog dedupe cleanup: 16 duplicate manifests removed; canonical candidates.parquet (10.5MB -> 76KB) + placements.parquet (1.2MB -> 11KB) regenerated post-dedupe; fresh manifests for active datasets - vectord: harness EvalSet refinements (+181), agent portfolio rotation + ingest triggers (+158), autotune + rag adjustments - catalogd/storaged/ingestd/mcp-server: misc tightening - docs: Phase 28-36 PRD entries + DECISIONS ADR additions; control-plane pivot banner added to top of docs/PRD.md (pointing at docs/CONTROL_PLANE_PRD.md which lands in next commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
40305da654 |
500K scale test: 2.9M rows, sub-120ms SQL, architecture holds
Bumped upload limit to 512MB for large CSV ingests. Generated and ingested 500K staffing worker profiles (346MB CSV → 75MB Parquet in 5.9s). SQL at 500K: COUNT=35ms, filter+state=67ms, aggregation=80ms, complex filter=117ms, 10 concurrent=84ms total (10/10 pass). HNSW memory projection: 500K vectors = 1.5GB RAM (comfortable on 128GB server). Ceiling at ~5M vectors (14.6GB) — Lance IVF_PQ takes over beyond that as designed in ADR-019. Hybrid search 500K SQL → 10K vector: 131ms with 6,289 SQL matches narrowed to 5 vector-ranked results. Total scale: 2.9M rows across all datasets (500K workers + 2.47M staffing data). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7c1222d240 |
Phase E: Scheduled ingest — the substrate runs itself
Background Scheduler task fires due ingests on interval, records
outcomes, reschedules. Single-flight per schedule_id so a slow run
can't pile up. 10s tick cadence, schedules' own intervals independent.
ScheduleDef persisted as JSON at primary://_schedules/{id}.json,
rebuilt on startup. ScheduleKind supports Mysql and Postgres (both
through existing streaming paths). ScheduleTrigger::Interval is
live; Cron variant defined in the enum but parsing stubbed with a
safe 1h fallback.
next_run_at set to "now" on creation so operators see success or
failure within one tick — no waiting for the first full interval.
run-now endpoint fires even when schedule is disabled (manual
override for testing). Full catalog integration: PII detection,
lineage with redacted DSN, mark-stale + autotune agent trigger.
Verified live: 20s MySQL schedule against MariaDB lh_demo.customers.
Source mutated between runs (added row + updated value). Second
auto-fire picked up both changes (10→11 rows). DataFusion SQL
confirmed mutations in the lakehouse. 6 unit tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
0d037cfac1 |
Phases 16.2 + L2 + 17 VRAM gate + MySQL + 18 Lance hybrid milestone
Five threads of work landing as one milestone — all individually
verified end-to-end against real data, full release build clean,
46 unit tests pass.
## Phase 16.2 / 16.5 — autotune agent + ingest triggers
`vectord::agent` is a long-running tokio task that watches the trial
journal and autonomously proposes + runs new HNSW configs. Distinct
from `autotune::run_autotune` (synchronous one-shot grid). Triggered
on POST /vectors/agent/enqueue/{idx} or by the periodic wake; ingest
paths now push DatasetAppended events when an index's source dataset
gets re-ingested. Rate-limited (max_trials_per_hour) and cooldown-
gated so it can't saturate Ollama under live load.
The proposer is ε-greedy around the current champion: with prob 0.25
sample random from full bounds, otherwise perturb champion ± small
delta on both axes. Dedup against history. Deterministic — RNG seeded
from history.len() so the same journal state proposes the same next
config (helps offline replay debugging).
`[agent]` config section in lakehouse.toml; opt-in via enabled=true.
## Federation Layer 2 — runtime bucket lifecycle + per-index scoping
`BucketRegistry.buckets` moved to `std::sync::RwLock<HashMap>` so
buckets can be added/removed after startup. POST /storage/buckets
provisions at runtime; DELETE /storage/buckets/{name} unregisters
(refuses primary/rescue with 403). Local-backend buckets get their
root directory auto-created.
`IndexMeta.bucket` (default "primary" via serde) records each index's
home bucket. `TrialJournal` and `PromotionRegistry` now hold
Arc<BucketRegistry> + IndexRegistry; they resolve target store per-
index via IndexMeta.bucket. PromotionRegistry::list_all scans every
bucket and dedups by index_name. Pre-federation indexes keep working
unchanged — they just default to primary.
`ModelProfile.bucket: Option<String>` declares per-profile artifact
home. POST /vectors/profile/{id}/activate auto-provisions the
profile's bucket under storage.profile_root if not yet registered.
EvalSets stay primary-only for now — noted gap, low-risk to extend
later with the same resolver pattern.
## Phase 17 — VRAM-aware two-profile gate
Sidecar gains POST /admin/unload (Ollama keep_alive=0 trick — forces
immediate VRAM release), POST /admin/preload (keep_alive=5m with
empty prompt, takes the slot warm), and GET /admin/vram (combines
nvidia-smi snapshot with Ollama /api/ps). Exposed via aibridge as
unload_model / preload_model / vram_snapshot.
`VectorState.active_profile` is the GPU-slot singleton —
Arc<RwLock<Option<ActiveProfileSlot>>>. activate_profile checks for
a previous profile with a different ollama_name and unloads it
before preloading the new one; same-model reactivations skip the
unload (Ollama no-ops). New routes: POST /vectors/profile/{id}/
deactivate (unload + clear slot), GET /vectors/profile/active.
Verified live: staffing-recruiter (qwen2.5) → docs-assistant
(mistral) swap freed qwen2.5 from VRAM and loaded mistral. nomic-
embed-text persists across swaps because both profiles use it —
free optimization that fell out of the design. Scoped search
correctly 403s cross-profile in both directions.
## MySQL streaming connector
`crates/ingestd/src/my_stream.rs` mirrors pg_stream.rs for MySQL.
Pure-rust `mysql_async` driver (default-features=false to avoid C
deps). Same OFFSET pagination, same Parquet-streaming write shape.
Type mapping per ADR-010: int/bigint → Int32/Int64, decimal/float
→ Float64, tinyint(1)/bool → Boolean, everything else → Utf8 with
fallback parsers for date/time/json/uuid via Display.
POST /ingest/mysql parallel to /ingest/db. Same PII auto-detection,
same lineage capture (source_system="mysql"), same agent-trigger
hook. `redact_dsn` generalized — was hardcoded to "postgresql://"
length, now works for any scheme://user:pass@host/path URL (latent
PII leak fix for MySQL DSNs).
Verified live against MariaDB on localhost: 10 rows × 9 columns of
test data round-tripped through datatypes int/varchar/decimal/
tinyint/datetime/text. PII detection auto-flagged name + email.
Aggregation queries through DataFusion match the source values
exactly.
## Phase 18 — Hybrid Parquet+HNSW ⊕ Lance backend (ADR-019)
`vectord-lance` is a new firewall crate. Lance pulls Arrow 57 and
DataFusion 52 — incompatible with the rest of the workspace's
Arrow 55 / DataFusion 47. The firewall isolates that dep tree:
public API uses only std types (Vec<f32>, Vec<String>, Hit, Row,
*Stats), so no Arrow types cross the crate boundary and nothing
propagates to vectord. The ADR-019 path that didn't ship until now.
`vectord::lance_backend::LanceRegistry` lazy-creates a
LanceVectorStore per index, resolving bucket → URI via the
conventional local-bucket layout. `IndexMeta.vector_backend` and
`ModelProfile.vector_backend` carry the choice (default Parquet so
existing indexes unchanged).
Six routes under /vectors/lance/*:
- migrate/{idx}: convert binary-blob Parquet → Lance FixedSizeList
- index/{idx}: build IVF_PQ
- search/{idx}: vector search (embed via sidecar)
- doc/{idx}/{doc_id}: random row fetch
- append/{idx}: native fragment append
- stats/{idx}: row count + index presence
Verified live on the real resumes_100k_v2 corpus (100K × 768d):
- Migrate: 0.57s
- Build IVF_PQ index: 16.2s (matches ADR-019 bench; 14× faster than
HNSW's 230s for the same data)
- Search end-to-end (Ollama embed + Lance scan): 23-53ms
- Random doc_id fetch: 5-7ms (filter scan; faster than Parquet's
~35ms full-file scan, slower than the bench's 311us positional
take — would close that gap with a scalar btree on doc_id)
- Append 100 rows: 3.3ms / +320KB on disk vs Parquet's required
full ~330MB rewrite — the structural win
- Index survives append; both backends coexist cleanly
## Known follow-ups not in this milestone
- ModelProfile.vector_backend doesn't yet auto-route /vectors/profile/
{id}/search to Lance; callers go through /vectors/lance/* directly
- Scalar btree on doc_id (closes the 5-7ms → ~300us gap)
- vectord-lance built default-features=false → no S3 yet
- IVF_PQ recall not measured (ADR-019 caveat) — needs a Lance-aware
variant of the eval harness
- Watcher-path ingest doesn't push agent triggers (HTTP paths do)
- EvalSets still primary-only (federation gap)
- No PATCH endpoint to move an existing index between buckets
- The pre-existing storaged::append_log doctest fails to compile
(malformed `{prefix}/` parses as code fence) — pre-existing bug,
left for a focused fix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
4d5c49090c |
Phase 16: Hot-swap generations + autotune agent loop
Closes the self-iteration loop from the PRD reframe: an agent can
tune HNSW configs autonomously and the winner flows through to the
next profile activation without human intervention.
Three primitives:
1. PromotionRegistry (vectord::promotion)
- Per-index current + history at _hnsw_promotions/{index}.json
- promote(index, entry) atomically swaps current, pushes prior
onto history (capped at 50)
- rollback() pops history back onto current; clears current if
history exhausted
- config_or(index, default) — the read side used at build time,
returns promoted config if set else caller's default
- Full cache + persistence; writes are durable on return
2. Autotune (vectord::autotune)
- run_autotune(request, ...) — synchronous agent loop
- Default grid: 5 configs covering the practical range
(ec=20/40/80/80/160, es=30/30/30/60/30) with seed=42 for
reproducibility
- Every trial goes through the existing trial-journal pipeline
so autotune runs land alongside manual trials in the
"trials are data" log
- Winner: max recall first, then min p50 latency; must clear
min_recall gate (default 0.9) or no promotion happens
- Config bounds (ec ∈ [10,400], es ∈ [10,200]) reject absurd
values from the request's optional custom grid
- On winner: promote with note "autotune winner: recall=X p50=Y"
3. Wiring
- VectorState gains promotion_registry
- activate_profile now calls promotion_registry.config_or(...)
so newly-promoted configs are picked up on next activation —
the "hot-swap" is: autotune promotes -> profile activates ->
HNSW rebuilt with new config
- New endpoints:
POST /vectors/hnsw/promote/{index}/{trial_id}
?promoted_by=...¬e=...
POST /vectors/hnsw/rollback/{index}
GET /vectors/hnsw/promoted/{index}
POST /vectors/hnsw/autotune { index_name, harness,
min_recall?, grid? }
End-to-end verified on threat_intel_v1 (54 vectors):
- autogen harness 'threat_intel_smoke' (10 queries)
- POST /autotune -> 5 trials in 620ms, winner ec=20 es=30
recall=1.00 p50=64us auto-promoted
- Manual promote of ec=80 es=30 -> history depth 1
- Rollback -> back to ec=20 es=30 autotune winner
- Second rollback -> current cleared
- Re-promote + restart -> persistence verified
- Profile activation after promotion logged:
"building HNSW ef_construction=80 ef_search=30 seed=Some(42)"
proving the hot-swap loop is closed.
Deferred:
- Bayesian optimization (random-grid is fine at this config-space size)
- Append-triggered autotune (Phase 17.5 — refresh OnAppend policy
can schedule autotune after appending sufficient new rows)
- Concurrent autotune per index guard (JobTracker integration)
PRD invariants satisfied: invariant 8 (hot-swappable indexes) is now
real code — promote is atomic, rollback is always available, the
active generation is a persistent pointer not a runtime convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
24f1249a62 |
Federation layer 2: header routing + cross-bucket SQL
Three pieces of the multi-bucket federation made real:
1. Catalog migration (POST /catalog/migrate-buckets)
- One-shot normalizer for ObjectRef.bucket field
- Empty -> "primary"; legacy "data"/"local" -> "primary"
- Idempotent; re-running on canonical state is no-op
- Ran on existing catalog: 12 refs renamed from "data", 2 already
"primary", all 14 now canonical
2. X-Lakehouse-Bucket header middleware on ingest
- resolve_bucket() helper extracts header, returns
(bucket_name, store) or 404 with valid bucket list
- ingest_file and ingest_db_stream now route writes per-request
- Defaults to "primary" when header absent
- pipeline::ingest_file_to_bucket records the actual bucket on the
ObjectRef so catalog stays the source of truth for "where does this
data live"
- Verified: ingest with X-Lakehouse-Bucket: testing lands in
data/_testing/, ingest without header lands in data/, bad header
returns 404 with hint
3. queryd registers every bucket with DataFusion
- QueryEngine now holds Arc<BucketRegistry> instead of single store
- build_context iterates all buckets, registers each as a separate
ObjectStore under URL scheme "lakehouse-{bucket}://"
- ListingTable URLs include the per-object bucket scheme so
DataFusion routes scans automatically based on ObjectRef.bucket
- Profile bucket names like "profile:user" sanitized to
"lakehouse-profile-user" since URL host segments can't contain ":"
- Tolerant of duplicate manifest entries (pre-existing
pipeline::ingest_file behavior creates a fresh dataset id per
ingest); duplicates skipped with debug log
- Backward compat: legacy "lakehouse://data/" URL still registered
pointing at primary
Success gate: cross-bucket CROSS JOIN
SELECT p.name, p.role, a.species
FROM people_test p (bucket: testing)
CROSS JOIN animals a (bucket: primary)
LIMIT 5
returns rows correctly. DataFusion routed each scan to its bucket's
ObjectStore based on the URL scheme.
No regressions: SELECT COUNT(*) FROM candidates still returns 100000
from the primary bucket.
Deferred to Phase 17:
- POST /profile/{user}/activate (HNSW hot-load on profile switch)
- vectord storage paths becoming bucket-scoped (trial journals,
eval sets per-profile)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
97a376482c |
Phase C: Decoupled embedding refresh
Implements the llms3.com-inspired pattern: embeddings refresh
asynchronously, decoupled from transactional row writes. New rows arrive,
ingest marks the vector index stale, a later refresh embeds only the
delta (doc_ids not already in the index).
Schema additions (DatasetManifest):
- last_embedded_at: Option<DateTime> - when the index was last refreshed
- embedding_stale_since: Option<DateTime> - set when data written, cleared on refresh
- embedding_refresh_policy: Option<RefreshPolicy> - Manual | OnAppend | Scheduled
Ingest paths (pipeline::ingest_file + pg_stream) call
registry.mark_embeddings_stale after writing. No-op if the dataset has
never been embedded — stale semantics only kick in once last_embedded_at
is set.
Refresh pipeline (vectord::refresh::refresh_index):
- Reads the dataset Parquet, extracts (doc_id, text) pairs
- Accepts Utf8 / Int32 / Int64 id columns (covers both CSV and pg schemas)
- Loads existing embeddings via EmbeddingCache (empty on first-time build)
- Filters to rows whose doc_id is NOT in the existing set
- Chunks (chunker::chunk_column), embeds via Ollama (batches of 32),
writes combined index, clears stale flag
Endpoints:
- POST /vectors/refresh/{dataset_name} - body {index_name, id_column,
text_column, chunk_size?, overlap?}
- GET /vectors/stale - lists datasets whose embedding_stale_since is set
End-to-end verified on threat_intel (knowledge_base.threat_intel):
- Initial refresh: 20 rows -> 20 chunks -> embedded in 2.1s,
last_embedded_at set
- Idempotent second refresh: 0 new docs -> 1.8ms (pure delta check)
- Re-ingest to 54 rows: mark_embeddings_stale fires -> stale_since set
- /vectors/stale surfaces threat_intel with timestamps + policy
- Delta refresh: 34 new docs embedded in 970ms (6x faster than full
re-embed); stale_cleared = true
Not in MVP scope:
- UPDATE semantics (same doc_id, different content) - would need
per-row content hashing
- OnAppend policy auto-trigger - just declares intent; actual scheduler
deferred
- Scheduler runtime - the Scheduled(cron) variant declares the intent so
operators can see which datasets expect what, but the cron itself is
separate
Per ADR-019: when a profile switches to vector_backend=Lance, this
refresh path benefits — Lance's native append replaces our "read all +
rewrite" Parquet rebuild pattern. Current MVP works well enough at
~500-5K rows to validate the architecture; Lance unblocks the 5M+ case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
dbe00d018f |
Federation foundation + HNSW trial system + Postgres streaming + PRD reframe
Four shipped features and a PRD realignment, all measured end-to-end:
HNSW trial system (Phase 15 horizon item → complete)
- vectord: EmbeddingCache, harness (eval sets + brute-force ground truth),
TrialJournal, parameterized HnswConfig on build_index_with_config
- /vectors/hnsw/trial, /hnsw/trials/{idx}, /hnsw/trials/{idx}/best,
/hnsw/evals/{name}/autogen, /hnsw/cache/stats
- Measured on resumes_100k_v2 (100K × 768d): brute-force 44ms -> HNSW 873us
at 100% recall@10. ec=80 es=30 locked as HnswConfig::default()
- Lower ec values trade recall for build time: 20/30 = 0.96 recall in 8s,
80/30 = 1.00 recall in 230s
Catalog manifest repair
- catalogd: resync_from_parquet reads parquet footers to restore row_count
and columns on drifted manifests
- POST /catalog/datasets/{name}/resync + POST /catalog/resync-missing
- All 7 staffing tables recovered to PRD-matching 2,469,278 rows
Federation foundation (ADR-017)
- shared::secrets: SecretsProvider trait + FileSecretsProvider (reads
/etc/lakehouse/secrets.toml, enforces 0600 perms)
- storaged::registry::BucketRegistry — multi-bucket resolution with
rescue_bucket read fallback and reachability probing
- storaged::error_journal — bucket op failures visible in one HTTP call
- storaged::append_log — write-once batched append pattern (fixes the RMW
anti-pattern llms3.com calls out; errors and trial journals both use it)
- /storage/buckets, /storage/errors, /storage/bucket-health,
/storage/errors/{flush,compact}
- Bucket-aware I/O at /storage/buckets/{bucket}/objects/{*key} with
X-Lakehouse-Rescue-Used observability headers on fallback
Postgres streaming ingest
- ingestd::pg_stream: DSN parser, batched ORDER BY + LIMIT/OFFSET pagination
into ArrowWriter, lineage redacts password
- POST /ingest/db — verified against live knowledge_base.team_runs
(586 rows × 13 cols, 6 batches, 196ms end-to-end)
PRD realignment (2026-04-16)
- Dual use case: staffing analytics + local LLM knowledge substrate
- Removed "multi-tenancy (single-owner system)" from non-goals
- Added invariants 8-11: indexes hot-swappable, per-reader profiles,
trials-as-data, operational failures findable in one HTTP call
- New phases 16 (hot-swap generations), 17 (model profiles + dataset
bindings), 18 (Lance vs Parquet+sidecar evaluation)
- Known ceilings table documents the 5M vector wall and escape hatches
- ADR-017 (federation), ADR-018 (append-log pattern) added
- EXECUTION_PLAN.md sequences phases B-E with success gates and
decision rules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
294f3f6a49 |
Scheduled ingest: file watcher auto-ingests from ./inbox
- Drop CSV/JSON/PDF/text into ./inbox → auto-detected → Parquet → queryable - Polls every 10 seconds (configurable) - Processed files moved to ./inbox/processed/ - Failed files moved to ./inbox/failed/ - Dedup: same file dropped twice = no-op - Watcher starts automatically on gateway boot - Tested: CSV dropped → queryable in <15s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
04770c97eb |
HNSW vector index: 100K search in 27ms (58x faster than brute-force)
- instant-distance HNSW implementation for approximate nearest neighbors - HnswStore: build from stored embeddings, in-memory index, thread-safe - POST /vectors/hnsw/build — build index from Parquet (100K in 35s release) - POST /vectors/hnsw/search — fast ANN search - GET /vectors/hnsw/list — list loaded indexes Benchmark (100K × 768d, release build): Brute-force: 1,567ms HNSW: 31ms (50x) HNSW warm: 27ms (58x) Build cost: 35s one-time for 100K vectors (release mode) ef_construction=40, ef_search=50 — good recall/speed balance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
e5b7663c20 |
Phase 13: Access control — role-based sensitivity enforcement
- AccessControl: agent roles with allowed sensitivity levels - 4 default roles: admin (all), recruiter (PII ok), analyst (financial ok), agent (internal only) - Field-level masking: determines which columns to mask per agent based on sensitivity - Query audit log: tracks every query with agent, datasets, PII fields accessed - Endpoints: GET/POST /access/roles, GET /access/audit, POST /access/check - Toggleable via config (auth.enabled) - 100K embedding: supervisor now sustained 125/sec (2.9x vs single pipeline) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
6f0f92a9e4 |
Phase 12: Tool registry — governed business actions for AI agents
- ToolRegistry: named tools with parameter validation and audit logging
- 6 built-in staffing tools:
search_candidates (skills, city, state, experience, availability)
get_candidate (by ID)
revenue_by_client (top N by billed revenue)
recruiter_performance (placements, revenue per recruiter)
cold_leads (called N+ times, never placed)
open_jobs (by vertical, city)
- Each tool: name, description, params, permission level (read/write/admin)
- SQL template with validated parameter substitution
- Full audit trail: every invocation logged with agent, params, result
- Endpoints: GET /tools (list), GET /tools/{name} (schema),
POST /tools/{name}/call (execute), GET /tools/audit (log)
- Per ADR-015: governed interface before raw SQL for agents
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
6cd1daeb51 |
Phase 11: Embedding versioning — model-proof vector layer
- IndexRegistry: tracks all vector indexes with model metadata
(model_name, model_version, dimensions, build stats)
- Index metadata persisted as JSON in vectors/meta/
- Rebuilt on startup for crash recovery
- GET /vectors/indexes — list all indexes (filter by source/model)
- GET /vectors/indexes/{name} — get index metadata
- Background jobs auto-register metadata on completion
- Multi-version support: same data, different models, coexist
- Per ADR-014: enables incremental re-embed on model upgrade
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
bf7cf96911 |
Phase 9: Event journal — append-only mutation history
- journald crate: immutable event log for every data mutation
- Events: entity_type, entity_id, field, action, old_value, new_value,
actor, source, workspace_id, timestamp
- In-memory buffer with configurable flush threshold (default 100 events)
- Flush writes events as Parquet to journal/ directory
- Query: GET /journal/history/{entity_id} — full history of any record
- Query: GET /journal/recent?limit=50 — latest events across all entities
- Convenience methods: record_insert, record_update, record_ingest
- Stats: GET /journal/stats — buffer size, persisted file count
- Manual flush: POST /journal/flush
- Per ADR-012: events are never modified or deleted
This is the single most important future-proofing decision.
Once history is lost, it's gone forever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
6a532cb248 |
Background job system for embedding — fixes 100K timeout
- JobTracker: create/update/complete/fail jobs with progress tracking
- POST /vectors/index now returns immediately with job_id (HTTP 202)
- Embedding runs in tokio::spawn background task
- GET /vectors/jobs/{id} returns live progress (chunks embedded, rate, ETA)
- GET /vectors/jobs lists all jobs
- Progress logged every 100 batches with chunks/sec and ETA
- 100K embedding job running successfully at 44 chunks/sec
- System stays responsive during embedding (queries in 23ms)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
0b9da45647 |
Agent workspaces: per-contract overlays with instant handoff
- WorkspaceManager: create/get/list workspaces with daily/weekly/monthly/pinned tiers
- Saved searches: agent stores SQL queries in workspace context
- Shortlist: tag candidates/records to a workspace with notes
- Activity log: track calls, emails, updates per workspace per agent
- Instant handoff: transfer workspace ownership with full history
Zero data copy — just a pointer swap, receiving agent sees everything
- Persistence: workspaces stored as JSON in object storage, rebuilt on startup
- Endpoints: /workspaces/create, /{id}, /{id}/handoff, /{id}/search,
/{id}/shortlist, /{id}/activity
- Tested: Sarah creates workspace, saves searches, shortlists 3 candidates,
logs activity, hands off to Mike who continues seamlessly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
6df904a03c |
Phase 8: Hot cache + incremental delta updates
- MemCache: LRU in-memory cache for hot datasets (configurable max, default 16GB) Pin/evict/stats endpoints: POST /query/cache/pin, /cache/evict, GET /cache/stats - Delta store: append-only delta Parquet files for row-level updates Write deltas without rewriting base files, merge at query time - Compaction: POST /query/compact merges deltas into base Parquet - Query engine: checks cache first, falls back to Parquet, merges deltas - Benchmarked on 2.47M rows: 1M row JOIN: 854ms cold → 96ms hot (8.9x speedup) 100K filter: 62ms cold → 21ms hot (3x speedup) 1.1M rows cached in 408MB RAM Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
26fc98c885 |
Phase 7: Vector index + RAG pipeline
- vectord crate: chunk → embed → store → search → RAG - chunker: configurable chunk size + overlap, sentence-boundary aware splitting - store: embeddings as Parquet (binary blob f32 vectors), portable format - search: brute-force cosine similarity (works up to ~100K vectors) - rag: full pipeline — embed question → search index → retrieve context → LLM answer - Endpoints: POST /vectors/index, /vectors/search, /vectors/rag - Gateway wired with vectord service - Tested: 200 candidate resumes indexed in 5.4s, semantic search + RAG working - 20 unit tests passing (chunker, search, ingestd, shared) - AI gives honest "no match found" when context doesn't support an answer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
bb05c4412e |
Phase 6: Ingest pipeline — CSV, JSON, PDF, text file support
- ingestd crate: detect file type → parse → schema detection → Parquet → catalog - CSV: auto-detect column types (int, float, bool, string), handles $, %, commas Strips dollar signs from amounts, flexible row parsing, sanitized column names - JSON: array or newline-delimited, nested object flattening (a.b.c → a_b_c) - PDF: text extraction via lopdf, one row per page (source_file, page_number, text) - Text/SMS: line-based ingestion with line numbers - Dedup: SHA-256 content hash, re-ingest same file = no-op - Gateway: POST /ingest/file multipart upload, 256MB body limit - Schema detection per ADR-010: ambiguous types default to String - 12 unit tests passing (CSV parsing, JSON flattening, type inference, dedup) - Tested: messy CSV with missing data, dollar amounts, N/A values → queryable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
01373c0e45 |
Phase 5: hardening — gRPC, observability, auth, config
- proto: lakehouse.proto with CatalogService, QueryService, StorageService, AiService - proto crate: tonic-build codegen from proto definitions - catalogd: gRPC CatalogService implementation - gateway: dual HTTP (:3100) + gRPC (:3101) servers - gateway: OpenTelemetry tracing with stdout exporter - gateway: API key auth middleware (toggleable) - shared: TOML config system with typed structs and defaults - lakehouse.toml config file - ADR-006 and ADR-007 documented Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
50a8c8013f |
Phase 4: Dioxus frontend with dataset browser and SQL query editor
- ui: Dioxus WASM app with dataset sidebar, SQL editor (Ctrl+Enter), results table - ui: dynamic API base URL (same-origin for nginx, port-based for local dev) - gateway: CORS enabled for cross-origin requests - nginx: lakehouse.devop.live proxies UI (:3300) + API (:3100) on same origin - justfile: ui-build, ui-serve, sidecar, up commands added Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
239e471223 |
Phase 3: AI integration with Ollama via Python sidecar
- sidecar: FastAPI app with /embed, /generate, /rerank hitting Ollama - sidecar: Dockerfile, env var config (EMBED_MODEL, GEN_MODEL, RERANK_MODEL) - aibridge: reqwest HTTP client with typed request/response structs - aibridge: Axum proxy endpoints (POST /ai/embed, /ai/generate, /ai/rerank) - gateway: wires AiClient with SIDECAR_URL env var - e2e verified: nomic-embed-text returns 768d vectors, qwen2.5 generates text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
19bdfab227 |
Phase 2: DataFusion query engine over Parquet
- queryd: SessionContext with custom URL scheme to avoid path doubling with LocalFileSystem
- queryd: ListingTable registration from catalog ObjectRefs with schema inference
- queryd: POST /query/sql returns JSON {columns, rows, row_count}
- queryd→catalogd wiring: reads all datasets, registers as named tables
- gateway: wires QueryEngine with shared store + registry
- e2e verified: SELECT *, WHERE/ORDER BY, COUNT/AVG all correct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
655b6c0b37 |
Phase 1: storage + catalog layer
- storaged: object_store backend (LocalFileSystem), PUT/GET/DELETE/LIST endpoints
- shared: arrow_helpers with Parquet roundtrip + schema fingerprinting (2 tests)
- catalogd: in-memory registry with write-ahead manifest persistence to object storage
- catalogd: POST/GET /datasets, GET /datasets/by-name/{name}
- gateway: wires storaged + catalogd with shared object_store state
- Phase tracker updated: Phase 0 + Phase 1 gates passed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
a52ca841c6 |
Phase 0: bootstrap Rust workspace
- Cargo workspace with 6 crates: shared, storaged, catalogd, queryd, aibridge, gateway - shared: types (DatasetId, ObjectRef, SchemaFingerprint, DatasetManifest) + error enum - gateway: Axum HTTP entrypoint with nested service routers + tracing - All services expose /health stubs - justfile with build/test/run recipes - PRD, phase tracker, and ADR docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |