Cross-lineage scrum on b2e45f7 produced 1 convergent + 3 single-reviewer
findings worth fixing. All apply.
1. (Opus WARN + Qwen INFO convergent) scripts/playbook_lift.sh: replace
sleep 2.5 in the SQL probe with active polling up to 5s.
refresh_every=1s is a lower bound; under load the manifest may not be
visible within a fixed sleep window, which would 4xx the probe and
abort the reality run.
2. (Opus INFO) scripts/playbook_lift.sh: report template glued
"env JUDGE_MODEL" + value as "env JUDGE_MODELqwen2.5:latest" with no
separator. Replaced two :+/:- substitution chains with a single
JUDGE_SOURCE variable computed once at the top of the harness.
3. (Opus INFO) scripts/staffing_workers/main.go: -id-prefix "" silently
allowed, defeating the flag's purpose (cross-corpus collision
prevention). Now log.Fatal at startup with an explicit hint.
4. (Opus WARN) cmd/{pathwayd,observerd}/main_test.go: newTestRouter
returned http.Handler then re-cast to chi.Router for chi.Walk.
Returning chi.Router directly satisfies http.Handler AND avoids an
assertion that would panic if future middleware wraps the router.
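A minimal sketch of the shape, with an illustrative route (the real
routers mount each service's full surface):

  package main

  import (
      "net/http"
      "net/http/httptest"
      "testing"

      "github.com/go-chi/chi/v5"
  )

  // Returning chi.Router instead of http.Handler: chi.Router embeds
  // http.Handler, so httptest still accepts it, and chi.Walk gets the
  // concrete type with no assertion to panic on a wrapped router.
  func newTestRouter() chi.Router {
      r := chi.NewRouter()
      r.Get("/healthz", func(w http.ResponseWriter, _ *http.Request) {
          w.WriteHeader(http.StatusOK)
      })
      return r
  }

  func TestRoutesMounted(t *testing.T) {
      r := newTestRouter()
      var routes []string
      _ = chi.Walk(r, func(method, route string, _ http.Handler,
          _ ...func(http.Handler) http.Handler) error {
          routes = append(routes, method+" "+route)
          return nil
      })
      if len(routes) == 0 {
          t.Fatal("no routes mounted")
      }
      srv := httptest.NewServer(r) // chi.Router IS an http.Handler
      defer srv.Close()
  }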
Dismissed (with rationale):
- Kimi INFO hardcoded MinIO endpoint: harness is local-by-design.
- Kimi WARN matrixd accepts 502/500: documented; real retriever needs
real upstreams the test doesn't spin up.
- Qwen INFO queryd string.Contains: brittle but very low risk; rerouting
it through a typed-error path would add coupling without adding signal.
go test ./cmd/{matrixd,queryd,pathwayd,observerd} all green.
Verdicts at reports/scrum/_evidence/2026-04-30/verdicts/lift_001_*.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 5-loop substrate's load-bearing gate is verified — playbook +
matrix indexer give the results we're looking for. Per the report's
rubric, lift ≥ 50% of discoveries means matrix is doing real work;
7/8 = 87.5% blew through that.
Harness was structurally hiding bugs behind a 5-daemon stripped boot.
Expanding to the full 10-daemon prod stack surfaced 7 fixes in cascade:
1. driver→matrixd: {"query": ...} → {"query_text": ...} field name
2. harness temp toml missing [s3] → wrong default bucket → catalogd
rehydrate 500 on first call
3. harness→queryd SQL probe: {"q": ...} → {"sql": ...} field name
4. expand boot from 5 → 10 daemons in dep-ordered launch
5. add SQL surface probe (3-row CSV ingest → COUNT(*)=3 assertion)
6. candidates corpus was synthetic SWE-tech (Swift/iOS, Scala/Spark) —
wrong domain for staffing queries; replaced with ethereal_workers
(10K rows, real staffing schema, "e-" id prefix to avoid collision
with workers' "w-"). staffing_workers driver gains -index-name +
-id-prefix flags so the same binary serves both corpora
7. local_judge qwen3.5:latest is a vision-SSM 256K-ctx build running
~30s per judge call against the lift loop; reverted to
qwen2.5:latest (~1s/call, 30× faster; lift results held)
Each contract drift (1, 3) is now locked into a cmd/<bin>/main_test.go
so future drift fires in `go test`, not in a reality run. R-005 closed:
- cmd/matrixd/main_test.go (new) — playbook record drift detector +
score bounds + 6 routes mounted
- cmd/queryd/main_test.go — wrong-field-name drift detector
- cmd/pathwayd/main_test.go (new) — 9 routes + add round-trip + retire
- cmd/observerd/main_test.go (new) — 4 routes + invalid-op + unknown-mode
`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green.
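The detector shape is cheap: decode the harness's body into the same
struct the handler uses and assert both directions. A sketch against
the queryd contract (struct name hypothetical; the real test lives in
cmd/queryd/main_test.go):

  package main

  import (
      "encoding/json"
      "testing"
  )

  // sqlRequest mirrors the wire contract queryd actually decodes. If
  // the field is ever renamed, these assertions fail in `go test`
  // instead of 4xx-ing a reality run.
  type sqlRequest struct {
      SQL string `json:"sql"`
  }

  func TestSQLRequest_FieldNameDrift(t *testing.T) {
      // the body the harness sends
      var req sqlRequest
      if err := json.Unmarshal([]byte(`{"sql": "SELECT 1"}`), &req); err != nil {
          t.Fatalf("decode: %v", err)
      }
      if req.SQL == "" {
          t.Fatal("sql field did not decode; wire contract drifted")
      }

      // the old, wrong field name must NOT satisfy the contract
      var stale sqlRequest
      _ = json.Unmarshal([]byte(`{"q": "SELECT 1"}`), &stale)
      if stale.SQL != "" {
          t.Fatal(`{"q": ...} unexpectedly satisfied the contract`)
      }
  }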
Reality test results (reports/reality-tests/playbook_lift_001.{json,md}):
Queries:           21 (staffing-domain, 7 categories)
Discoveries:       8 (judge ≠ cosine top-1)
Lifts:             7/8 (87.5%)
Boosts triggered:  9
Mean Δ distance:   -0.053 (warm closer than cold)
OOD honesty:       dental/RN/SWE rated 1, no fake matches
Cross-corpus:      boosts confirmed (e- ↔ w- swaps in lifts)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4-bundle review (128KB diff) hit "Argument list too long" when
curl --data was passed the body as a literal arg. Fix: pipe the body
via stdin with --data-binary @- instead. Lifts the practical bundle
size from ~30KB to whatever fits in process memory.
Caught while running the harness scrum on golangLAKEHOUSE today —
the bigger Phase A+B harness diff (4566 lines) tripped it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bash driver wrapping /v1/chat for Opus + Kimi + Qwen3-coder review
runs. Used today to scrum the 4-phase wave (1,624 LoC of chatd +
config-refactor + Rust cleanup) and caught 2 BLOCKs + 2 WARNs.
Usage:
./scripts/scrum_review.sh <bundle.diff> <bundle_label>
Output: reports/scrum/_evidence/<DATE>/verdicts/<bundle>_<reviewer>.md
verbatim, per the evidence-only convention. Per-reviewer latency +
token counts captured in the report header.
System prompt enforces the BLOCK/WARN/INFO + WHERE/WHAT/WHY shape
per feedback_cross_lineage_review.md — leads with verdict, no
preamble (Kimi tends to spend tokens thinking otherwise).
Reviewer fleet matches project_golang_lakehouse.md "Scrum routing":
- opencode/claude-opus-4-7
- openrouter/moonshotai/kimi-k2-0905
- openrouter/qwen/qwen3-coder
This is the first dogfood of chatd as the scrum vehicle — eats its
own /v1/chat dispatcher.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder) on today's wave
landed 4 real findings (2 BLOCK + 2 WARN) and 2 INFO touch-ups.
Verbatim verdicts + disposition table at:
reports/scrum/_evidence/2026-04-30/
B-1 (BLOCK Opus + INFO Kimi convergent) — ResolveKey API:
collapse from 3-arg (envVar, envFileName, envFilePath) to 2-arg
(envVar, envFilePath). Pre-fix, every chatd caller passed the env
var name twice; if an operator renamed *_key_env in lakehouse.toml
while keeping the canonical KEY= line in the .env file, the fallback
silently missed it.
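A sketch of the collapsed signature, assuming a plain KEY=value .env
format (the real implementation is in internal/chat/builder.go):

  package chat

  import (
      "fmt"
      "os"
      "strings"
  )

  // ResolveKey resolves envVar from the process env first, then from an
  // envVar=value line in the .env file at envFilePath. The name is
  // passed once and used for BOTH lookups, so renaming *_key_env in
  // lakehouse.toml can no longer desynchronize env and file.
  func ResolveKey(envVar, envFilePath string) (string, error) {
      if v := os.Getenv(envVar); v != "" {
          return v, nil
      }
      data, err := os.ReadFile(envFilePath)
      if err != nil {
          return "", fmt.Errorf("read %s: %w", envFilePath, err)
      }
      for _, line := range strings.Split(string(data), "\n") {
          if after, ok := strings.CutPrefix(strings.TrimSpace(line), envVar+"="); ok {
              return strings.Trim(after, `"`), nil
          }
      }
      return "", fmt.Errorf("%s: not in env and no line in %s", envVar, envFilePath)
  }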
B-2 (WARN Opus + WARN Kimi convergent) — handleProviders probe:
drop the synthesize-then-Resolve probe; look up by name directly
via Registry.Available(name). Prior probe synthesized "<name>/probe"
model strings and routed through Resolve, fragile to any future
routing rule (e.g. cloud-suffix special case).
B-3 (BLOCK Opus single — verified by trace + end-to-end probe) —
OllamaCloud.Chat StripPrefix used "cloud" but registry routes
"ollama_cloud/<m>". Result: upstream got the prefixed model name
and 400'd. Smoke missed it because chatd_smoke runs without
ollama_cloud registered. Now strips the right prefix; new
TestOllamaCloud_StripsCorrectPrefix locks both prefix + suffix
cases. Verified live: ollama_cloud/deepseek-v3.2 round-trips
cleanly through the real ollama.com endpoint.
B-4 (WARN Opus single) — Ollama finishReason: read done_reason
field instead of inferring from done bool alone. Newer Ollama
reports done=true with done_reason="length" on truncation; the
prior code mapped that to "stop" and lost the truncation signal
the playbook_lift judge needs to retry. New
TestFinishReasonFromOllama_PrefersDoneReason covers the fallback
ladder.
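The ladder is small enough to show whole. A sketch (field names per
Ollama's response; everything else is illustrative):

  package chat

  type ollamaDone struct {
      Done       bool   `json:"done"`
      DoneReason string `json:"done_reason"`
  }

  // finishReason prefers done_reason when present; the done bool alone
  // is only the legacy fallback for older Ollama builds that omit the
  // field. Truncation (done_reason="length") now survives the mapping.
  func finishReason(r ollamaDone) string {
      switch {
      case r.DoneReason != "":
          return r.DoneReason // "stop", "length", ... passed through
      case r.Done:
          return "stop" // legacy: done=true, no reason reported
      default:
          return "" // still streaming
      }
  }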
INFOs:
- B-5: replace hand-rolled insertion sort in Registry.Names with
sort.Strings (Opus called the "avoid sort import" comment a
false economy — correct).
- A-1: clarify the playbook_lift.sh comment around -judge "" arg
passing (Opus noted the comment said "env priority" but didn't
reflect that the empty arg also passes through the Go driver's
resolution chain).
False positives dismissed (3, documented in disposition.md):
- Kimi: TestMaybeDowngrade_WithConfigList wrong assertion (test IS
correct per design — model excluded from weak list = strong = downgrade)
- Qwen: nil-deref claim (defensive code already handles nil)
- Opus: qwen3.5:latest doesn't exist on Ollama hub (true on the
public hub but local install has it)
just verify: PASS. chatd_smoke 6/6 PASS. New regression tests:
3 (B-2, B-3, B-4 each get a focused test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new cmd/chatd on :3220 routes /v1/chat to the right provider based
on model-name prefix or :cloud suffix. closes the architectural gap
named in lakehouse.toml [models]: tiers map to model IDs, but until
phase 4 there was no service that could actually CALL those models
from Go.
routing rules (registry.Resolve):
ollama/<m> → local Ollama (prefix stripped)
ollama_cloud/<m> → Ollama Cloud
<m>:cloud → Ollama Cloud (suffix variant — kimi-k2.6:cloud)
openrouter/<v>/<m> → OpenRouter (prefix stripped, OpenAI-compat)
opencode/<m> → OpenCode unified Zen+Go
kimi/<m> → Kimi For Coding (api.kimi.com/coding/v1)
bare names → local Ollama (default)
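A sketch of the dispatch (prefix set per the table above; telemetry,
typed errors, and the exact suffix handling live in
internal/chat/registry.go):

  package chat

  import "strings"

  // Resolve maps a model string to (provider, upstream model name).
  func Resolve(model string) (provider, upstream string) {
      switch {
      case strings.HasPrefix(model, "ollama_cloud/"):
          return "ollama_cloud", strings.TrimPrefix(model, "ollama_cloud/")
      case strings.HasSuffix(model, ":cloud"):
          // suffix variant, e.g. kimi-k2.6:cloud; whether the suffix is
          // kept or stripped upstream is provider-specific (assumption)
          return "ollama_cloud", model
      case strings.HasPrefix(model, "openrouter/"):
          return "openrouter", strings.TrimPrefix(model, "openrouter/")
      case strings.HasPrefix(model, "opencode/"):
          return "opencode", strings.TrimPrefix(model, "opencode/")
      case strings.HasPrefix(model, "kimi/"):
          return "kimi", strings.TrimPrefix(model, "kimi/")
      case strings.HasPrefix(model, "ollama/"):
          return "ollama", strings.TrimPrefix(model, "ollama/")
      default:
          return "ollama", model // bare names hit local Ollama
      }
  }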
provider implementations:
- internal/chat/types.go Provider interface, Request/Response, errors
- internal/chat/registry.go prefix + :cloud suffix dispatch
- internal/chat/ollama.go local Ollama via /api/chat (think=false default)
- internal/chat/ollama_cloud.go Ollama Cloud via /api/generate (Bearer auth)
- internal/chat/openai_compat.go shared OpenAI Chat Completions for the
OpenRouter/OpenCode/Kimi family
- internal/chat/builder.go BuildRegistry from BuilderInput;
ResolveKey reads env then .env file fallback
config:
- ChatdConfig in internal/shared/config.go with bind, ollama_url,
per-provider key env names + .env fallback paths, timeout
- Gateway gains chatd_url + /v1/chat + /v1/chat/* routes
- lakehouse.toml [chatd] block with /etc/lakehouse/<provider>.env defaults
tests (19 in internal/chat):
- registry: prefix + :cloud + errors + telemetry + provider listing
- ollama: happy path + prefix strip + format=json + 500 mapping +
flatten_messages
- openai_compat: happy path + format=json + 429 mapping + zero-choices
think=false default in ollama + ollama_cloud — local hot path skips
reasoning, low-budget callers (the playbook_lift judge at max_tokens=10)
get direct answers instead of empty content + done_reason=length.
proven via chatd_smoke acceptance.
acceptance gate: scripts/chatd_smoke.sh — 6/6 PASS:
1. /v1/chat/providers lists exactly registered providers (1 in dev mode)
2. bare model → ollama default with content + token counts + latency
3. explicit ollama/<m> → prefix stripped at upstream
4. <m>:cloud without ollama_cloud registered → 404 (no silent fall-through)
5. unknown/<m> → falls through to default → upstream 502 (no prefix rewrite)
6. missing model field → 400
just verify: PASS (vet + 30 packages × short tests + 9 smokes).
chatd_smoke is a domain smoke (not in just verify, mirrors matrix /
observer / pathway pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
migrate the reality-test harness's judge-model default from a
hardcoded "qwen3.5:latest" string to cfg.Models.LocalJudge.
resolution priority: explicit -judge flag > $JUDGE_MODEL env >
cfg.Models.LocalJudge from lakehouse.toml > hardcoded fallback.
bumping the judge for run #N+1 now means editing one line in
lakehouse.toml [models].local_judge — no Go file or shell script
edits required.
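The whole chain fits in one function. A sketch (function name
hypothetical; the real resolution sits in scripts/playbook_lift/main.go):

  package main

  import "os"

  // resolveJudge walks the 4-step priority chain. cfgJudge carries
  // cfg.Models.LocalJudge ("" when lakehouse.toml is absent); the final
  // literal mirrors the DefaultConfig fallback.
  func resolveJudge(flagJudge, cfgJudge string) string {
      if flagJudge != "" {
          return flagJudge // 1. explicit -judge flag
      }
      if env := os.Getenv("JUDGE_MODEL"); env != "" {
          return env // 2. $JUDGE_MODEL
      }
      if cfgJudge != "" {
          return cfgJudge // 3. [models].local_judge
      }
      return "qwen3.5:latest" // 4. hardcoded fallback
  }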
changes:
- scripts/playbook_lift/main.go: -config flag added, judge default
flips to "" so resolution chain runs. Imports internal/shared for
config loader.
- scripts/playbook_lift.sh: JUDGE_MODEL no longer defaulted in bash;
EFFECTIVE_JUDGE resolved by a bash mirror of the Go chain (env > config
grep > qwen3.5:latest fallback). Used for the Ollama presence
check + report header. Pre-flight grep avoids requiring jq just
to read the toml.
- reports/reality-tests/README.md: documents the 4-step priority
chain.
verified all 4 paths produce the expected judge:
- config (no env): qwen3.5:latest (from lakehouse.toml)
- env override: env wins
- flag override: flag wins over env
- missing config: DefaultConfig fallback still gives qwen3.5:latest
just verify PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First reality test driver. Two-pass design:
- Pass 1 (cold): matrix.search use_playbook=false → small-model judge
rates top-K → record playbook entry pointing at the highest-rated
result (which may NOT be top-1 by distance — that's the discovery).
- Pass 2 (warm): same queries with use_playbook=true → measure
ranking shift. Lift = real if recorded answer becomes top-1.
Files:
- scripts/playbook_lift/main.go driver (391 LoC)
- scripts/playbook_lift.sh stack-bring-up + report gen
- tests/reality/playbook_lift_queries.txt query corpus (5 placeholders;
J writes real 20+)
- reports/reality-tests/README.md framework + interpretation
- .gitignore: track reports/reality-tests/ but ignore per-run JSON
evidence
This answers the gate from project_small_model_pipeline_vision.md:
"the playbook + matrix indexer must give the results we're looking
for." Without ground-truth labels, the LLM judge is the proxy — the
same small-model thesis applied to evaluation. Honest about that
limitation in the generated reports.
Driver compiles clean; full run requires Ollama + workers/candidates
ingest. Skips cleanly if Ollama absent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the workflow.Mode adapters for the §3.4 components + the
distillation scorer + drift quantifier. Workflows can now compose
real measurement capabilities; the substrate's parallel
capabilities become composable Lego bricks (per the prior commit's
closing insight).
Modes registered (in observerd's registerBuiltinModes):
Pure-function wrappers (no I/O):
- matrix.relevance → matrix.FilterChunks
- matrix.downgrade → matrix.MaybeDowngrade
- distillation.score → distillation.ScoreRecord
- drift.scorer → drift.ComputeScorerDrift
HTTP-backed:
- matrix.search → POST matrixd /matrix/search
(registered only when matrixd_url is set)
Fixture (kept from §3.8 first slice):
- fixture.echo, fixture.upper
internal/workflow/modes.go:
Each mode follows the same glue pattern: marshal generic input
through a typed struct (free schema validation + clear error
messages), call the underlying capability, return a generic
output map. Roundtrip-via-JSON gives us schema validation
without writing custom field-by-field coercion.
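A sketch of the glue pattern in generic form (helper names are
illustrative; modes.go expands this per mode):

  package workflow

  import (
      "encoding/json"
      "fmt"
  )

  // decodeInto round-trips the runner's generic input map through JSON
  // into the capability's typed struct: wrong shapes fail at the
  // boundary with a decode error instead of deep inside the capability.
  func decodeInto[T any](input map[string]any) (T, error) {
      var typed T
      raw, err := json.Marshal(input)
      if err != nil {
          return typed, fmt.Errorf("marshal input: %w", err)
      }
      if err := json.Unmarshal(raw, &typed); err != nil {
          return typed, fmt.Errorf("decode input: %w", err)
      }
      return typed, nil
  }

  // encodeOut converts the typed result back into the generic output
  // map the workflow runner records.
  func encodeOut(v any) (map[string]any, error) {
      raw, err := json.Marshal(v)
      if err != nil {
          return nil, err
      }
      var out map[string]any
      return out, json.Unmarshal(raw, &out)
  }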
internal/workflow/modes_test.go (10 tests, all PASS):
- matrix.relevance filters adjacency pollution (Connector kept,
catalogd::Registry dropped — same headline as the relevance
smoke, run through the workflow mode)
- matrix.downgrade flips lakehouse→isolation on strong model;
keeps lakehouse on weak (qwen3.5:latest); errors on missing
fields
- distillation.score rates scrum_review attempt_1 as accepted;
rejects empty record
- drift.scorer reports zero drift on matched inputs; errors on
empty inputs slice
- matrix.search HTTP flow round-trips through httptest fake
matrixd; non-OK status surfaces a clear error
scripts/workflow_smoke.sh (5 assertions PASS, was 4):
New assertion #5: real-mode chain
matrix.downgrade (lakehouse + grok-4.1-fast → isolation)
→ distillation.score (scrum_review attempt_1 → accepted)
Proves §3.4 components compose through the workflow runner with
no fixture intermediation. Both nodes ran successfully, runner
recorded provenance, status=succeeded.
Mode listing assertion now expects 7 modes (5 real + 2 fixture)
instead of just the fixtures.
17-smoke regression all green. SPEC §3.8 acceptance gate G3.8.D
("Mode catalog dispatches matrix.search invocation to the matrixd
backend without going through HTTP") still pending — current path
goes through HTTP for matrix.search, which is the cleaner service-
mesh shape but slower than direct in-process. In-process dispatch
when matrixd is co-resident is a future optimization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /v1/matrix/playbooks/bulk accepts an array of playbook entries
and records each independently — failures per-entry don't abort the
batch. Designed for two operational use cases:
1. Backfilling historical placement data into the playbook
substrate (the Rust system has 4,701 fill operations recorded
with embeddings; that data deserves to feed the Go learning
loop without a 4,701-call procedural script).
2. Batched click-tracking from a session's worth of coordinator
interactions, posted once at idle rather than per-click.
Per-entry response shape: {index, playbook_id} on success or
{index, error} on failure. Caller can inspect failures without
diffing.
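A sketch of the per-entry loop and result shape (field names per the
description above; generic over the entry type so the snippet stands
alone):

  package matrix

  // BulkResult: exactly one of PlaybookID or Error is set; Index always
  // points back at the caller's array position.
  type BulkResult struct {
      Index      int    `json:"index"`
      PlaybookID string `json:"playbook_id,omitempty"`
      Error      string `json:"error,omitempty"`
  }

  // recordBulk records each entry independently; a failing entry is
  // reported in-line and never aborts the rest of the batch.
  func recordBulk[E any](entries []E, record func(E) (string, error)) []BulkResult {
      out := make([]BulkResult, 0, len(entries))
      for i, e := range entries {
          id, err := record(e)
          if err != nil {
              out = append(out, BulkResult{Index: i, Error: err.Error()})
              continue
          }
          out = append(out, BulkResult{Index: i, PlaybookID: id})
      }
      return out
  }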
Smoke (scripts/playbook_smoke.sh, new assertion #4):
Bulk POST 3 entries: 2 valid (alpha→widget-a, bravo→widget-b) +
1 invalid (empty query_text). Verifies recorded=2, failed=1,
the 2 valid ones get playbook_ids back, and the invalid one
surfaces its validation error in-line.
Single-record /matrix/playbooks/record from 06e7152 still works
unchanged; bulk is additive. The corpus field can be set per-
entry or once at the batch level (entry-level wins on collision).
Per the small-model autonomous pipeline framing: this is the
"the playbook gets denser with each iteration" mechanism. Click
tracking → bulk POST → playbook entries → future similar queries
get those answers boosted via the existing /matrix/search
use_playbook path. The learning loop now has both inflows wired
(single + bulk) — what remains is the demo UI shim that calls
/feedback on result interaction (deferred — no Go demo UI yet).
15-smoke regression all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the reality-test gap surfaced by the candidates and
multi-corpus e2e runs (0d1553c, a97881d): semantic-only retrieval
can't gate by status / state / availability. SearchRequest now
takes an optional MetadataFilter map; results whose metadata
doesn't match every key are dropped before top-K truncation.
Filter value semantics:
string|number|bool → exact equality (JSON-canonical, so 1 ≡ 1.0)
[]any → OR within key (any element matching wins)
AND across keys: every filter key must match.
Missing key in metadata = drop. Malformed metadata = drop. Filter
absent or empty = pass through (zero overhead).
The response now reports MetadataFilterDropped so callers can see
how aggressive the filter was without re-querying.
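A sketch of the matching core (the real code lives in
internal/matrix/filter.go; the JSON round-trip comparison is how 1 and
1.0 end up equal):

  package matrix

  import "encoding/json"

  // matchesFilter: AND across keys, OR within a []any value, exact
  // JSON-canonical equality for scalars. metadata is the raw JSON
  // stored with the vector; malformed or missing metadata fails closed.
  func matchesFilter(metadata []byte, filter map[string]any) bool {
      if len(filter) == 0 {
          return true // absent/empty filter: pass through
      }
      var meta map[string]any
      if err := json.Unmarshal(metadata, &meta); err != nil {
          return false // malformed metadata = drop
      }
      for key, want := range filter {
          got, ok := meta[key]
          if !ok {
              return false // missing key = drop
          }
          if list, isList := want.([]any); isList {
              matched := false
              for _, candidate := range list {
                  if jsonEqual(got, candidate) {
                      matched = true
                      break
                  }
              }
              if !matched {
                  return false
              }
              continue
          }
          if !jsonEqual(got, want) {
              return false
          }
      }
      return true
  }

  // jsonEqual compares after a JSON round-trip, which canonicalizes
  // numbers to float64, so 1 matches 1.0.
  func jsonEqual(a, b any) bool {
      ja, _ := json.Marshal(a)
      jb, _ := json.Marshal(b)
      return string(ja) == string(jb)
  }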
Caveat (also captured in code comment): this is POST-retrieval, not
PRE-filtering via SQL. Aggressive filters can shrink the result set
below K; caller should bump PerCorpusK to compensate. A queryd-
backed pre-filter is a future commit; this lands the user-visible
fix today.
Tests:
- 7 unit tests (internal/matrix/filter_test.go) covering: nil/
empty filter pass-through, missing-metadata always-fails,
single-value exact match (incl. numeric 5 ≡ 5.0), AND across
keys, OR within list, bool match, malformed JSON metadata
- matrix_smoke.sh: new assertion #7 — filter
label∈{"a near","b near"} drops the 4 mid/far entries from the
6-entry pool, keeping exactly 2 (one per corpus, both with the
matching label). Dropped count surfaces in the response.
15-smoke regression all green. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workers driver embed text reverted to V0 after testing 3 variants
on the "Forklift operator with OSHA-30 certification, warehouse
experience" reality-test query against 5000 workers (which contains
569 actual Forklift Operators per the 31b4088 probe).
V0 (current, restored): "Worker role: <role>. Skills: ...
Certifications: ... <resume_text>"
→ 6 workers in top-8, 0 Forklift Ops,
top distance 0.327, top role
"Production Worker"
V4a (role-doubled): "<role>. <role> with <skills>. ..."
drop archetype + resume_text
→ 6 workers in top-8, 0 Forklift Ops,
top distance 0.254, top role
"Production Worker"
V4b (resume-only): just the resume_text natural-language
sentence, no structured prefix
→ 4 workers in top-8 (WORSE mix —
software-engineer candidates filled
the displaced slots), 0 Forklift Ops,
top distance 0.379
Conclusion: all three variants surface Production Workers / Machine
Operators / Line Leads ABOVE Forklift Operators for this query.
The 569 actual Forklift Operators in the 5000-row sample don't
appear in any top-8. Embed-text design isn't the bottleneck —
nomic-embed-text 137M's geometry doesn't separate "Forklift
Operator" from "Production Worker" / "Machine Operator" / "Line
Lead" in this query's neighborhood.
Real fixes belong elsewhere:
- Hybrid SQL+semantic (B): pre-filter by role/certs via queryd
before semantic ranking. Addresses the gap directly.
- Different embedding model: mxbai-embed-large or a staffing-
fine-tuned model. Costs an Ollama model swap + re-embedding.
- Playbook boost (component 5, already shipped): record
successful Forklift placements; future queries surface those
workers via similarity. Compounds with use.
V0 restored because it has the best worker/candidate mix in top-8
(6 vs 4 in V4b), preserving the multi-corpus reality-test signal
quality even if the role match is imperfect. Comments updated to
record the experiment so future sessions don't relitigate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-lineage scrum review on the 12 commits of this session
(afbb506..06e7152) via Rust gateway :3100 with Opus + Kimi +
Qwen3-coder. Results:
Real findings landed:
1. Opus BLOCK — vectord BatchAdd intra-batch duplicates trip
coder/hnsw's "node not added" length-invariant panic. Fixed with
last-write-wins dedup inside BatchAdd before the pre-pass.
Regression test TestBatchAdd_IntraBatchDedup added.
2. Opus + Kimi convergent WARN — strings.Contains(err.Error(),
"status 404") was brittle string-matching to detect cold-
start playbook state. Fixed: ErrCorpusNotFound sentinel
returned by searchCorpus on HTTP 404; fetchPlaybookHits
uses errors.Is.
3. Opus WARN — corpusingest.Run returned nil on total batch
failure, masking broken pipelines as "empty corpora." Fixed:
Stats.FailedBatches counter, ErrPartialFailure sentinel
returned when nonzero. New regression test
TestRun_NonzeroFailedBatchesReturnsError.
4. Opus WARN — dead var _ = io.EOF in staffing_500k/main.go
was justified by a fictional comment. Removed.
Drivers (staffing_500k, staffing_candidates, staffing_workers)
updated to handle ErrPartialFailure gracefully — print warn, keep
running queries — rather than fatal'ing on transient hiccups
while still surfacing the failure clearly in the output.
Documented (no code change):
- Opus WARN: matrixd /matrix/downgrade reads
LH_FORCE_FULL_ENRICHMENT from process env when body omits
it. Comment now explains the opinionated default and points
callers wanting deterministic behavior to pass the field
explicitly.
False positives dismissed (caught and verified, NOT acted on):
A. Kimi BLOCK on errors.Is + wrapped error in cmd/matrixd:223.
Verified false: Search wraps with %w (fmt.Errorf("%w: %v",
ErrEmbed, err)), so errors.Is matches the chain correctly.
B. Kimi INFO "BatchAdd has no unit tests." Verified false:
batch_bench_test.go has BenchmarkBatchAdd; the new dedup
test TestBatchAdd_IntraBatchDedup adds another.
C. Opus BLOCK on missing finite/zero-norm pre-validation in
cmd/vectord:280-291. Verified false: line 272 already calls
vectord.ValidateVector before BatchAdd, so finite + zero-
norm IS checked. Pre-validation is exhaustive.
D. Opus WARN on relevance.go tokenRe (Opus self-corrected
mid-finding when realizing leading char counts toward token
length).
Qwen3-coder returned NO FINDINGS — known issue with very long
diffs through the OpenRouter free tier; lineage rotation worked
as designed (Opus + Kimi between them caught everything Qwen
would have).
15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).
Unit tests all green (corpusingest +1, vectord +1).
Per feedback_cross_lineage_review.md: convergent finding #2 (404
detection) is the highest-signal one — both Opus and Kimi
flagged it independently. The other Opus findings stand on
single-reviewer signal, but each one was verified against the actual
code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes SPEC §3.4. The matrix indexer is now a learning meta-index per
feedback_meta_index_vision.md — every successful (query → answer)
pair recorded via /matrix/playbooks/record boosts that answer for
future similar queries.
This is the architectural piece that lifts vectord from "static
hybrid search" to the meta-index J originally framed in Phase 19 of
the Rust system.
What's new:
- internal/matrix/playbook.go — PlaybookEntry, PlaybookHit,
ApplyPlaybookBoost. Pure-function boost math:
distance' = distance * (1 - 0.5 * score)
Score 0 = no boost (factor 1.0); score 1 = halve distance
(factor 0.5). Capped at 0.5 deliberately so a single high-
confidence playbook can't dominate the base ranking forever
(runaway-feedback-loop guard); sketched after this list.
- Retriever.Record(entry, corpus) — embeds query_text, ensures
playbook corpus exists (idempotent), upserts via deterministic
sha256-derived ID (last score wins on re-record of same triple).
- Retriever.Search extended with UsePlaybook + PlaybookCorpus +
PlaybookTopK + PlaybookMaxDistance. Reuses the query vector —
no extra embed call. Missing-corpus 404 = no-op (cold-start
state before any Record call), not an error.
- POST /v1/matrix/playbooks/record (matrixd) — caller submits
{query_text, answer_id, answer_corpus, score, tags?}; gets
{playbook_id} back.
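A sketch of the boost math from the playbook.go item above:

  package matrix

  // BoostFactor maps score ∈ [0,1] to a distance multiplier ∈ [0.5,1.0]:
  // score 0 leaves the distance alone, score 1 halves it. The 0.5 floor
  // is the runaway-feedback-loop guard.
  func BoostFactor(score float64) float64 {
      if score < 0 {
          score = 0
      }
      if score > 1 {
          score = 1
      }
      return 1 - 0.5*score
  }

  // boosted: distance' = distance * BoostFactor(score)
  func boosted(distance, score float64) float64 {
      return distance * BoostFactor(score)
  }

At score=1.0 the factor is exactly 0.5, which is why the smoke below
sees a 0.3283/0.6566 ratio of precisely 0.5.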
Storage: a vectord index named "playbook_memory" (configurable per
request) with embed(query_text) as the vector and the
PlaybookEntry JSON as metadata. Just another corpus — observable
from /vectors/index, persistable through G1P, etc.
Match key for boost: (AnswerID, AnswerCorpus). Cross-corpus ID
collisions don't false-match — verified by
TestApplyPlaybookBoost_CorpusAttributionRespected.
End-to-end smoke (scripts/playbook_smoke.sh, all assertions PASS):
- Baseline search: widget-c at distance 0.6566 (rank 3)
- Record playbook: query → widget-c, score=1.0
- Re-search with use_playbook=true:
widget-c distance: 0.3283 (rank 2)
ratio: 0.5 EXACTLY (matches boost math precisely)
playbook_boosted: 1
- widget-c jumped from #3 to #2 — learning loop visible
Tests:
- 8 unit tests in internal/matrix/playbook_test.go covering
Validate, BoostFactor (5 cases), the no-boost identity, the
boost-moves-result-up scenario, highest-score wins on duplicate
matches, cross-corpus attribution, JSON round-trip, and
rejection of empty metadata
- scripts/playbook_smoke.sh integration test (3 assertions PASS)
15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).
SPEC §3.4 NOW COMPLETE: 5 of 5 components shipped. The matrix
indexer's port is done as a substrate; remaining work is operational
(rating signal sources, telemetry, eventual structured filtering for
staffing data — none in §3.4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds WORKERS_LIMIT env override (default 5000) so the e2e can be
re-run at different sample sizes. Tiny change; the interesting part
is the FINDING that motivated the run.
Investigation: a97881d's reality test put zero Forklift Operators in
the top-6 for "Forklift operator with OSHA-30 certification,
warehouse experience" — instead returned Production Worker / Machine
Operator / Assembler.
Hypothesis tested: maybe the 5000-row sample didn't contain
forklift operators in retrievable density.
Result: hypothesis falsified. Direct probe of workers_500k.parquet:
All 500K rows → 55,349 Forklift Operators (11.07%)
→ 150,328 with "forklift" in certs
→ 74,852 with OSHA-30 specifically
First 5K rows → 569 Forklift Operators (11.38%)
→ distribution matches global, no ordering bias
So 569 forklift operators were IN the corpus the matrix indexer
searched and STILL didn't surface in top-6. That means the bottleneck
isn't sample size — it's nomic-embed-text + our embed-text template
ranking "Production Worker" / "Machine Operator" / "Assembler" as
semantically nearer to the query than literal "Forklift Operator".
The reality test exposed this faithfully. Three real follow-ups, none
in scope of this commit:
1. Embed text design — front-loading role + certs (currently
"Worker role: <role>" then skills then certs) might dominate
retrieval better. Worth A/B-testing.
2. Hybrid SQL+semantic — pre-filter by role/certs via queryd
before semantic ranking. Not in SPEC §3.4 today; would address
the "available" / "Chicago" gap from the candidates reality
test (0d1553c) too.
3. Playbook-memory boost — SPEC §3.4 component 5. When a query
"Forklift OSHA-30" was answered with worker w-X in the past,
boost w-X's score for similar future queries. The retrieval
gap CAN be bridged by the learning loop without changing the
base embedder.
Commits the env knob; the finding lives in the commit body so future
sessions don't re-run the sample-size hypothesis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the second real-data corpus (workers_500k) and the first
multi-corpus reality test through /v1/matrix/search composing both
corpora live.
What's new:
- scripts/staffing_workers/main.go — parquet driver over
workers_500k.parquet, multi-chunk arrow handling (workers
parquet has multiple row groups vs candidates' one). Embed text:
role + skills + certifications + city + state + archetype +
resume_text. IDs prefixed "w-".
- scripts/multi_corpus_e2e.sh — first end-to-end test composing
both corpora through the matrix indexer.
Real-data multi-corpus result (this commit):
Query: "Forklift operator with OSHA-30 certification, warehouse
experience"
Corpora: workers (5000 rows) + candidates (1000 rows)
Merged top-8: workers=6, candidates=2
Top hits:
w d=0.327 w-4573 Production Worker
w d=0.353 w-1726 Machine Operator
w d=0.362 w-3806 Production Worker
w d=0.366 w-1000 Machine Operator
w d=0.374 w-1436 Assembler
w d=0.395 w-162 Machine Operator
c d=0.440 c-CAND-00727 C#,.NET,Azure
c d=0.446 c-CAND-00031 React,TypeScript,Node
The matrix indexer chose the right domain — manufacturing/
warehouse roles in workers (correct semantic match for the staffing
query) rank ABOVE software-engineer candidates from the candidates
corpus. 0.11 gap between the worst worker (0.395) and the best
candidate (0.440) — clean distance separation.
Compared to the candidates-only e2e run from 0d1553c:
candidates-only top: c-CAND-00727 at d=0.4404
multi-corpus top: w-4573 at d=0.3265 (a Production Worker)
That's the matrix indexer's whole point made visible: composing
domain-distinct corpora surfaces better matches than single-corpus
search. Without workers in the search space, the staffing query
returned software engineers (wrong domain). With workers, it
returns roles in the right ballpark.
What's still imperfect (signal for component 5 + future work):
- No top-6 worker actually has "Forklift" or "OSHA-30" visible in
metadata; "Production Worker" is semantically nearest in this
sample. Likely needs a larger workers ingest (5000 from 500K)
or skill-keyword boost.
- Status/availability still not gated. The staffing-side
structured filtering gap from 0d1553c persists; relevance filter
(CODE-aware) doesn't address it.
Pipeline timings:
workers ingest: 5000 rows / 19.2s = 260/sec end-to-end
candidates ingest: 1000 rows / 3.1s = 322/sec
multi-corpus query (text → embed → 2 parallel vectord → merge): 14ms
14-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Faithful port of mcp-server/relevance.ts (Rust observer's adjacency-
pollution filter). Same 5-signal scoring, same default threshold 0.3.
Adds POST /v1/matrix/relevance endpoint via matrixd.
Scoring signals (additive, can sign-flip):
path_match +1.0 chunk source/doc_id encodes focus.path
filename_match +0.6 chunk text mentions focus's filename
defined_match +0.6 chunk text mentions focus.defined_symbols
token_overlap +0.4 jaccard of non-stopword tokens
prefix_match +0.3 chunk source shares first-2-segment prefix
import_penalty -0.5 mentions ONLY imported symbols, no defined ones
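A sketch of how the signals combine (detection reduced to substring
checks here; the real relevance.go tokenizes, extracts symbols, and
jaccard-compares non-stopword tokens):

  package matrix

  import "strings"

  // Trimmed mirrors of the package's shapes, just for the sketch.
  type Chunk struct{ Source, Text string }
  type Focus struct {
      Path            string
      DefinedSymbols  []string
      ImportedSymbols []string
  }

  // scoreRelevance sums the signals from the table above.
  // filename_match (+0.6), token_overlap (+0.4), and prefix_match
  // (+0.3) are omitted for brevity.
  func scoreRelevance(c Chunk, f Focus) float64 {
      score := 0.0
      if f.Path != "" && strings.Contains(c.Source, f.Path) {
          score += 1.0 // path_match
      }
      defined := mentionsAny(c.Text, f.DefinedSymbols)
      if defined {
          score += 0.6 // defined_match
      }
      if mentionsAny(c.Text, f.ImportedSymbols) && !defined {
          score -= 0.5 // import_penalty: adjacency pollution
      }
      return score
  }

  func mentionsAny(text string, symbols []string) bool {
      for _, s := range symbols {
          if s != "" && strings.Contains(text, s) {
              return true
          }
      }
      return false
  }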
What this does and doesn't do:
- DOES filter code-aware corpora (eventually lakehouse_arch_v1,
lakehouse_symbols_v1, scrum_findings_v1) — drops chunks about
code the focus file IMPORTS rather than DEFINES, the
"adjacency pollution" pattern that makes a reviewer LLM
hallucinate imported-crate internals as belonging to the focus
- DOES NOT meaningfully filter staffing data — the candidates
reality test 2026-04-29 had "exact skill match buried at #3"
which is a different problem (semantic-only ranking dominated
by secondary text). Staffing needs structured filtering
(status gates, location gates) that lives outside this
package — future work, not in SPEC §3.4 yet
Headline smoke assertion: focus = crates/queryd/src/db.go which
defines Connector and imports catalogd::Registry. The filter
scores:
Connector chunk: +0.68 (defined_match fires, kept)
Registry chunk: -0.46 (import_only penalty fires, dropped)
unrelated junk: 0.00 (no signals, dropped)
That's a 1.14-point gap between what we ARE and what we IMPORT —
the entire purpose of the filter.
Tests:
- 9 unit tests in internal/matrix/relevance_test.go covering
Tokenize, Jaccard, ExtractDefinedSymbols (Rust + TS),
ExtractImportedSymbols, FilePrefix, ScoreRelevance per-signal,
FilterChunks threshold splitting, and the headline
AdjacencyPollutionScenario
- scripts/relevance_smoke.sh integration smoke (3 assertions PASS):
adjacency-pollution scenario, empty-chunks 400, threshold honored
13-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the second staffing corpus and the first end-to-end reality test
through the full Go pipeline: parquet → corpusingest → embedd →
vectord → matrixd → gateway.
What's new:
- scripts/staffing_candidates/main.go — parquet Source over
candidates.parquet (1000 rows, 11 cols), single-chunk arrow-go
pqarrow read. Embed text: "Candidate skills: <s>. Based in
<city>, <state>. <years> years experience. Status: <status>.
<first> <last>." IDs prefixed "c-" so multi-corpus merges
against workers ("w-") stay unambiguous.
- scripts/candidates_e2e.sh — first integration smoke that runs
the full stack (storaged + embedd + vectord + matrixd + gateway),
ingests via corpusingest, runs a real query through
/v1/matrix/search, prints results. Ephemeral mode (vectord
persistence disabled via custom toml) so re-runs don't pollute
MinIO _vectors/ and break g1p_smoke's "only-one-persisted-index"
assertion.
Real bug caught + fixed in corpusingest:
When LogProgress > 0, the progress goroutine's only exit was
ctx.Done(). With context.Background() in the production driver,
Run hung forever after the pipeline finished. Added a stopProgress
channel that close()s after wg.Wait(). Regression test
TestRun_ProgressLoggerExits bounds Run's wall to 2s with
LogProgress=50ms.
This is the bug the unit tests didn't catch because every prior test
set LogProgress: 0. Reality test surfaced it on first real-data
run — exactly the hyperfocus-and-find-architectural-weakness
property J framed as the reason for the Go pass.
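The fix pattern, sketched (names follow the commit; the real loop is
in internal/corpusingest):

  package corpusingest

  import (
      "log"
      "sync"
      "time"
  )

  // runWithProgress gives the progress goroutine a second exit path
  // (stopProgress) closed after wg.Wait(), so Run terminates even under
  // context.Background().
  func runWithProgress(interval time.Duration, wg *sync.WaitGroup, progress func()) {
      stopProgress := make(chan struct{})
      go func() {
          ticker := time.NewTicker(interval)
          defer ticker.Stop()
          for {
              select {
              case <-ticker.C:
                  progress()
              case <-stopProgress:
                  return // exit path that exists even without ctx cancel
              }
          }
      }()

      wg.Wait()           // pipeline workers finish
      close(stopProgress) // progress logger exits; Run can return
      log.Println("pipeline complete")
  }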
End-to-end output (1000 candidates, query "Python AWS Docker
engineer in Chicago available now"):
populate: scanned=1000 embedded=1000 added=1000 wall=3.5s
matrix returned 5 hits in 26ms
The result quality is the interesting signal: top-5 had ZERO
Chicago candidates, ZERO active-status candidates, and the exact-
skill-match (Python,AWS,Docker) ranked #3 not #1. Pipeline works;
retrieval quality has real architectural limits (no structured
filtering, no relevance gate, semantic-only ranking dominated by
secondary signals like "1 year experience" and "engineer"). This
motivates SPEC §3.4 components 3 (relevance filter) and
eventually structured filtering — exactly the kind of finding the
deep field reality tests are supposed to surface before Enterprise
cutover.
12-smoke regression sweep all green. 9 corpusingest unit tests
including the new regression. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalizes the staffing_500k driver's embed-and-push loop into
internal/corpusingest. Per docs/SPEC.md §3.4 component 1 (corpus
builders): adding a new staffing/code/playbook corpus is now one
Source impl + one main.go calling Run, not 200 lines of pipeline
copy-paste.
API:
type Source interface { Next() (Row, error) }
func Run(ctx, Config, Source) (Stats, error)
Library owns:
- Index lifecycle (create, optional drop-existing, idempotent
reuse on 409)
- Parallel embed dispatcher (configurable workers + batch size)
- Vectord push batching
- Progress logging + Stats reporting
- Partial-failure semantics (log + continue per-batch errors;
operator decides on re-run via Stats.Embedded vs Scanned delta)
Per-corpus driver owns: source parsing + column→Row mapping +
post-ingest validation queries.
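A sketch of the driver side (Row and Source mirrored locally so the
snippet stands alone; io.EOF as the end-of-source sentinel is an
assumption):

  package main

  import (
      "errors"
      "io"
  )

  // Mirrors of the corpusingest shapes this sketch needs. Field names
  // beyond those the commit mentions (IndexName, Dimension, Scanned,
  // Embedded) are assumptions.
  type Row struct{ ID, Text string }

  type Source interface{ Next() (Row, error) }

  // csvSource is a hypothetical driver-side Source: wrap any reader,
  // map columns to Row, signal exhaustion with io.EOF.
  type csvSource struct {
      records [][]string // pre-read CSV records: [id, text]
      i       int
  }

  func (s *csvSource) Next() (Row, error) {
      if s.i >= len(s.records) {
          return Row{}, io.EOF // assumed end-of-source sentinel
      }
      rec := s.records[s.i]
      s.i++
      if len(rec) < 2 {
          return Row{}, errors.New("short record")
      }
      return Row{ID: rec[0], Text: rec[1]}, nil
  }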
Refactor scripts/staffing_500k/main.go to use it. Driver is now
~190 lines (was 339), with the embed/add plumbing replaced by one
Run call. -drop flag added so callers can opt out of the destructive
DELETE-first behavior (default still true to keep the 500K test
clean-recall semantics).
Unit tests (internal/corpusingest/ingest_test.go, 8/8 PASS):
- Pipeline shape: 50 rows / 16 batch → 4 embed + 4 add calls,
every ID added exactly once, vectors at correct dimension
- DropExisting fires DELETE
- 409 on create → reuse existing index
- Limit stops early
- Empty Text rows skipped (counted as scanned, not added)
- Required IndexName + Dimension validation
- Context cancel stops mid-pipeline
Real bug caught and fixed by the test suite: if embedd ever returns
fewer vectors than texts in the request (degraded backend), the
addBatch loop would panic with index-out-of-range. Worker now
length-checks the response and logs+skips on mismatch.
12-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix). vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the matrix indexer's first piece per docs/SPEC.md §3.4:
multi-corpus retrieve+merge with corpus attribution per result.
Future components (relevance filter, downgrade gate, learning-loop
integration) layer on top of this surface.
Architecture:
- internal/matrix/retrieve.go — Retriever takes (query, corpora,
k, per_corpus_k), parallel-fans across vectord indexes, merges
by distance ascending, preserves corpus origin per hit
- cmd/matrixd — HTTP service on :3217, fronts /v1/matrix/*
- gateway proxy + [matrixd] config + lakehouse.toml entry
- Either query_text (matrix calls embedd) or query_vector
(caller pre-embedded) — vector takes precedence if both set
Error policy: fail-loud on any corpus error. Silent partial returns
would lie about coverage, defeating the matrix's whole purpose.
Bubbles vectord errors as 502 (upstream), validation as 400.
Smoke (scripts/matrix_smoke.sh, 6 assertions PASS first try):
- /matrix/corpora lists indexes
- Multi-corpus search returns hits from BOTH corpora
- Top hit is the globally-closest across all corpora
(b-near beats a-near at distance 0.05 vs 0.1 — proves merge)
- Metadata round-trips through the merge
- Distances ascending in result list
- Negative paths: empty corpora → 400, missing corpus → 502,
no query → 400
12-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Network-callable Mem0-style trace memory at :3217, fronted by gateway
/v1/pathway/*. Closes the ADR-004 wire-up: store substrate landed in
2a6234f, this lands the HTTP surface + [pathwayd] config + acceptance
gate.
Smoke proves the architecturally distinctive properties: Revise →
History walks the predecessor chain backward (audit trail), Retire
excludes from Search default but stays Get-able, AddIdempotent bumps
replay_count without replacing — and all survive kill+restart via
JSONL log replay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "needs heavy integration smoke" follow-up from the
ADR-002 commit (423a381). Until now the per-prefix PUT cap was
verified only by unit tests and the commit's theory; this smoke runs
the actual cap path with real bytes.
Three assertions, ~2s wall:
1. PUT 300 MiB to _vectors/<key> → 200 (cap raised to 4 GiB
for the vectord persistence prefix).
2. PUT same 300 MiB to datasets/<key> → 413 (default 256 MiB
cap still protects routine traffic).
3. GET _vectors/<key> → sha256 round-trips (no truncation
between cap-raise and S3 multipart streaming).
scripts/storaged_cap_smoke.sh
Builds storaged + gateway, boots them, generates 300 MiB
deterministic /dev/zero payload (sha stable across runs),
runs the 3 assertions, cleans up the keys + processes via trap.
/dev/zero generation chosen over yes/head pipe — pipefail
catches the SIGPIPE from yes when head closes early.
just smoke-storaged-cap
Wrapper recipe. Outside the main `just verify` chain because
300 MiB payload generation + transfer is MB-heavy. Run after
meaningful storaged or vectord-persistence changes.
Verified:
bash scripts/storaged_cap_smoke.sh — 3/3 PASS · 2s wall
just verify — vet + test + 9 smokes still 33s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds cmd/fake_ollama, a minimal Ollama-API-compatible fake that
implements just enough surface for embedd to drive end-to-end
without a real Ollama install:
GET /api/tags — fixed model list including nomic-embed-text
POST /api/embeddings — deterministic dim-D vector from sha256(prompt)
GET /health — for the smoke's poll_health helper
Same prompt → bit-identical vector across runs, machines, and CI
nodes. Vectors are NOT semantically meaningful; the fake validates
the embed CONTRACT (dimension echo, response shape, status codes,
deterministic round-trip), not real semantic ranking. Real ranking
still requires real Ollama and lives in scripts/g2_smoke.sh + the
integration tier of the proof harness.
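A sketch of the determinism trick (not the fake's exact construction):
hash the prompt, then slice the digest into fixed windows, re-hashing
when it runs dry:

  package main

  import (
      "crypto/sha256"
      "encoding/binary"
  )

  // fakeEmbed derives a deterministic dim-D vector from sha256(prompt).
  // Same prompt, same bits, on every run, machine, and CI node.
  func fakeEmbed(prompt string, dim int) []float32 {
      vec := make([]float32, dim)
      digest := sha256.Sum256([]byte(prompt))
      buf := digest[:]
      for i := range vec {
          if len(buf) < 8 {
              next := sha256.Sum256(buf) // extend deterministically
              buf = next[:]
          }
          u := binary.BigEndian.Uint64(buf[:8])
          buf = buf[8:]
          vec[i] = float32(int64(u)) / float32(1<<63) // scale to [-1, 1)
      }
      return vec
  }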
scripts/g2_smoke_fixtures.sh — full chain smoke against the fake:
- Build fake_ollama + embedd + vectord + gateway
- Start fake on :11435 (distinct from real Ollama at :11434)
- Generate temp lakehouse.toml with provider_url override
- Boot embedd/vectord/gateway with --config <override>
- 4 assertions: dim=768, deterministic same-text, different-text
divergence, bad-model → 4xx/5xx (fake 404 → embedd 502)
- Trap-cleanup tears down all 4 binaries + tmp config
Wired into the task runner:
just smoke-g2-fixtures
Closes R-006 partially:
- Embed half: ✓ — CI / fresh-clone reviewers without Ollama can
now run the embed contract smoke
- Storage half: deferred — mocking S3 protocol is non-trivial
(multipart, signed URLs, etc.) and MinIO itself is lightweight
enough to install via Docker in any CI environment. Documented
as Sprint 0 follow-up if a CI system without Docker shows up.
What this DOESN'T cover:
- Real semantic similarity (use scripts/g2_smoke.sh + real Ollama)
- Real Ollama API quirks (timeouts, version-specific shapes,
/api/embed batch endpoint that newer versions support)
Verified:
bash scripts/g2_smoke_fixtures.sh — 4/4 assertions PASS, ~3s wall
just verify — vet + test + 9 smokes still green
Doesn't replace the existing g2_smoke.sh (which still requires real
Ollama and exercises the actual embed semantics). Adds an alternate
mode for portability.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sprint 0 / R-004 / GATE-0.4 — the 9-smoke chain is no longer
documentation only. One command (`just verify`) runs vet + tests +
all 9 smokes; pre-push hook calls it; a regression cannot leave
this machine without explicit --no-verify override.
Recipes:
just verify full gate (33s wall on this box)
just smoke <day> single smoke (d1..d6, g1, g1p, g2)
just smoke-all all 9 smokes only
just doctor dep probe with structured output
(--json for CI / pre-push)
just install-hooks install .git/hooks/pre-push
just fmt|vet|test|build|clean
scripts/doctor.sh probes Go ≥1.25, gcc, MinIO at :9000 with bucket
lakehouse-go-primary, Ollama at :11434 with nomic-embed-text loaded,
/etc/lakehouse/secrets-go.toml with [s3.primary]. Each missing dep
prints its install fix command. JSON mode emits the same shape for
CI / pre-push consumers.
README updated with the task-runner section + just install-hooks
on cold-start. Hooks live in .git/hooks/ (untracked); install
recipe recreates them on a fresh clone.
PATH note: justfile prepends /usr/local/go/bin so recipes find Go
without depending on the parent shell's PATH (Go lives there per
ADR-001 §1.x).
Verified: just verify exits 0 in 33s wall (vet ~0.1s + test ~0.1s +
9 smokes deterministic per audit baseline). Pre-push hook installed
and bash -n clean.
Closes audit risk R-004 (smokes not gated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/staffing_500k/main.go: driver that reads workers_500k.csv,
embeds combined-text per worker via /v1/embed, adds to vectord index
"workers_500k", runs canonical staffing queries against the populated
index. Reproducible end-to-end test of the staffing co-pilot pipeline
at production scale.
Run results (2026-04-29 ~02:30):
500,000 vectors ingested in 35m 36s (~234/sec avg)
vectord peak RSS 4.5 GB (~9 KB/vector incl. HNSW graph)
Query latency: embed 40-59ms + search 1-3ms = ~50ms end-to-end
GPU avg ~65% (Ollama not the bottleneck — vectord Add is)
Semantic recall on canonical queries:
"electrician with industrial wiring": top 2 are literal Electricians (d=0.30)
"CNC operator with first article": Assembler / Quality Techs (adjacent, d=0.24)
"forklift driver OSHA-30": warehouse roles (d=0.33)
"warehouse picker night shift bilingual": Material Handlers (d=0.31)
"dental hygienist": Production Workers at d=0.49+ — correctly
LOW-similarity, signals "no dental hygienists in this manufacturing
dataset" rather than hallucinating a fake match.
Documented gaps:
- storaged's 256 MiB PUT cap blocks single-file LHV1 persistence
above ~150K vectors at d=768. Test ran with persistence disabled.
- vectord Add is RWMutex-serialized — with GPU at 65% util this is
the throughput cap. Concurrent Adds would be 2-3x faster but
require careful audit of coder/hnsw thread-safety (G1 scrum
documented two known quirks).
PHASE_G0_KICKOFF.md gains a "Staffing scale test" section with full
metrics + the gaps-surfaced list. The architectural payoff is real:
six binaries, one HTTP route, ~50ms from text query to top-K
semantically-relevant workers across 500K records.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).
Wire format:
POST /v1/embed {"texts":[...], "model":"..."} // model optional
→ 200 {"model","dimension","vectors":[[...]]}
Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.
Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.
All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 0 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (3 tokens)
Fixed (2 — 1 convergent + 1 single-reviewer):
C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
batch was up to N×60s with no batch-level cap. One stuck Ollama
call would stall the whole handler indefinitely. Fix:
context.WithTimeout(r.Context(), 60s) wraps the entire batch.
O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
producing version-dependent garbage. Fix: reject "" with 400 at
the handler boundary so callers get a deterministic answer
instead of an upstream-conditional 502.
Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).
Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.
Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds optional persistence to vectord (G1's HNSW vector search). Single-
file framed format per index — eliminates the torn-write class that
the 3-way convergent scrum finding identified:
_vectors/<name>.lhv1 — single binary blob:
[4 bytes magic "LHV1"]
[4 bytes envelope_len uint32 BE]
[envelope bytes — JSON params + metadata + version]
[graph bytes — raw hnsw.Graph.Export]
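A sketch of the writer side of the frame (one buffer, one PUT, no
torn-write window):

  package vectord

  import (
      "bytes"
      "encoding/binary"
  )

  // encodeLHV1 frames envelope + graph into the single-blob layout
  // above. One file means one PUT: there is no state in which envelope
  // and graph can disagree.
  func encodeLHV1(envelope, graph []byte) ([]byte, error) {
      var buf bytes.Buffer
      buf.WriteString("LHV1") // 4-byte magic
      if err := binary.Write(&buf, binary.BigEndian, uint32(len(envelope))); err != nil {
          return nil, err
      }
      buf.Write(envelope) // JSON params + metadata + version
      buf.Write(graph)    // raw hnsw.Graph.Export bytes
      return buf.Bytes(), nil
  }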
Pre-extraction: internal/catalogd/store_client.go → internal/storeclient/
shared package, since both catalogd and vectord need it. Same pattern as
the pre-D5 catalogclient extraction.
Optional via [vectord].storaged_url config (empty = ephemeral mode).
On startup: List + Load each persisted index. After Create / batch Add /
DELETE: Save (or Delete from storaged). Save failures are logged-not-
fatal — in-memory state is the source of truth in flight.
Acceptance smoke G1P 8/8 PASS — kill+restart preserves state, post-
restart search returns dist=0 (graph round-trips exactly), DELETE
removes the file, post-delete restart shows count=0.
All 8 smokes (D1-D6 + G1 + G1P) PASS deterministically. The g1_smoke
gained scripts/g1_smoke.toml that disables persistence so the
in-memory API test stays decoupled from any rehydrate-from-storaged
state contamination.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 1 INFO
Fixed (3 — 1 convergent + 2 single-reviewer):
C1 (Opus + Kimi + Qwen 3-WAY CONVERGENT WARN): Save was non-atomic
across two PUTs — envelope-succeeds + graph-fails left a half-
saved index that passed the "both present" List filter and
silently mismatched metadata against vectors on Load.
Fix: collapse to single framed file (no torn-write window
possible).
O-B1 (Opus BLOCK): isNotFound substring-matched "key not found"
against the wrapped error message — brittle, any 5xx body
containing that text would silently misclassify as missing.
Fix: errors.Is(err, storeclient.ErrKeyNotFound).
O-I3 (Opus INFO): handleAdd pre-validation only covered id+dim;
NaN/Inf/zero-norm could still fail mid-batch leaving partial
commits. Fix: extend pre-validation to call ValidateVector
(newly exported) per item before any commit.
Dismissed (3 false positives):
K-B1 + Q-B1 ("safeKey double-escapes %2F segments") — false
convergent. Wire-protocol escape is decoded by storaged's chi
router on the way in; on-disk key is the original literal.
%2F round-trips correctly through PathEscape → URL → chi decode
→ S3 key.
Q-B2 ("List vulnerable to race conditions") — vectord is single-
process; no concurrent Save against List in the same vectord.
Deferred (3): rehydrate per-index timeout (G2+ multi-index scale),
saveAfter request ctx (matches G0 timeout deferral), Encode RLock
during slow writer (documented as buffer-only API).
The C1 finding is the strongest signal of the cross-lineage filter:
three independent reviewers all flagged the same torn-write hazard.
Single-file framing eliminates the class — there's now no Persistor
state where envelope and graph can disagree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First G1+ piece. Standalone vectord service with in-memory HNSW
indexes keyed by string IDs and optional opaque JSON metadata.
Wraps github.com/coder/hnsw v0.6.1 (pure Go, no cgo). New port
:3215 with /v1/vectors/* routed through gateway.
API:
POST /v1/vectors/index create
GET /v1/vectors/index list
GET /v1/vectors/index/{name} get info
DELETE /v1/vectors/index/{name}
POST /v1/vectors/index/{name}/add (batch)
POST /v1/vectors/index/{name}/search
Acceptance smoke 7/7 PASS — including recall=1 on inserted vector
w-042 (cosine distance 5.96e-8, float32 precision noise), 200-
vector batch round-trip, dim mismatch → 400, missing index → 404,
duplicate create → 409.
Two upstream library quirks worked around in the wrapper:
1. coder/hnsw.Add panics with "node not added" on re-adding an
existing key (length-invariant fires because internal
delete+re-add doesn't change Len). Pre-Delete fixes for n>1.
2. Delete of the LAST node leaves layers[0] non-empty but
entryless; next Add SIGSEGVs in Dims(). Workaround: when
re-adding to a 1-node graph, recreate the underlying graph
fresh via resetGraphLocked().
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (4 tokens)
Fixed (4 real + 2 cleanup):
O-W1: Lookup returned the raw []float32 from coder/hnsw — caller
mutation would corrupt index. Now copies before return.
O-W3: NaN/Inf vectors poison HNSW (distance comparisons return
false for both < and >, breaking heap invariants). Zero-norm
under cosine produces NaN. Now validated at Add time.
K-B1: Re-adding with nil metadata silently cleared the existing
entry — JSON-omitted "metadata" field deserializes as nil,
making upsert non-idempotent. Now nil = "leave alone"; explicit
{} or Delete to clear.
O-W4: Batch Add with mid-batch failure left items 0..N-1
committed and item N rejected. Now pre-validates all IDs+dims
before any Add.
O-I1: jsonItoa hand-roll replaced with strconv.Itoa — no
measured allocation win.
O-I2: distanceFn re-resolved per Search → use stored i.g.Distance.
Dismissed (2 false positives):
K-B2 "MaxBytesReader applied after full read" — false, applied
BEFORE Decode in decodeJSON
K-W1 "Search distances under read lock might see invalidated
slices from concurrent Add" — false, RWMutex serializes
write-lock during Add against read-lock during Search
Deferred (3): HTTP server timeouts (consistent G0 punt),
Content-Type validation (internal service behind gateway), Lookup
dim assertion (in-memory state can't drift).
The K-B1 finding is worth pausing on: nil metadata on re-add is
the kind of API ergonomics bug only a code-reading reviewer
catches — smoke would never detect it because the smoke always
sends explicit metadata. Three lines changed in Add; the resulting
API matches what callers actually expect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last day of Phase G0. Gateway promotes the D1 stub endpoints into
real reverse-proxies on :3110 fronting storaged + catalogd + ingestd
+ queryd. /v1 prefix lives at the edge — internal services route on
/storage, /catalog, /ingest, /sql, with the prefix stripped by a
custom Director per Kimi K2's D1-plan finding.
Routes:
/v1/storage/* → storaged
/v1/catalog/* → catalogd
/v1/ingest → ingestd
/v1/sql → queryd
Acceptance smoke 6/6 PASS — every assertion goes through :3110, none
direct to backing services. Full ingest → storage → catalog → query
round-trip verified end-to-end. The smoke's "rows[0].name=Alice"
assertion is the architectural payoff: five binaries, six HTTP
routes, one round-trip through one edge.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 2 WARN + 2 INFO
- Kimi K2-0905 (openrouter): 1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory)
- Qwen3-coder (openrouter): 5 completion tokens — "No BLOCKs."
Fixed (2, both Opus single-reviewer):
O-BLOCK: Director path stripping fails if upstream URL has a
non-empty path. The default Director's singleJoiningSlash runs
BEFORE the custom code, so an upstream like http://host/api
produces /api/v1/storage/... after the join — then TrimPrefix("/v1")
is a no-op because the string starts with /api. Fix: strip /v1
BEFORE calling origDirector (sketched below). New
TestProxy_SubPathUpstream regression
locks this in. Today: bare-host URLs only, dormant — but moving
gateway behind a sub-path in prod would have silently 404'd.
O-WARN2: url.Parse is permissive — typo "127.0.0.1:3211" (no scheme)
parses fine, produces empty Host, and every request 502s.
mustParseUpstream now fails fast at startup with a clear message
naming the offending config field.
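A sketch of the O-BLOCK fix shape:

  package main

  import (
      "net/http"
      "net/http/httputil"
      "net/url"
      "strings"
  )

  // newProxy strips /v1 BEFORE delegating to the default Director, so
  // an upstream with a sub-path (http://host/api) still joins paths
  // correctly instead of leaving /v1 embedded mid-path.
  func newProxy(upstream *url.URL) *httputil.ReverseProxy {
      proxy := httputil.NewSingleHostReverseProxy(upstream)
      origDirector := proxy.Director
      proxy.Director = func(r *http.Request) {
          r.URL.Path = strings.TrimPrefix(r.URL.Path, "/v1")
          origDirector(r) // singleJoiningSlash now sees the stripped path
      }
      return proxy
  }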
Dismissed (3, all Kimi, same false TrimPrefix theory):
K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single
check-and-trim, no loop
K-WARN "no upper bound on repeated // removal" — same false theory
K-WARN "goroutines leak if upstream parse fails while binaries
running" — confused scope; binaries are separate OS processes
launched by the smoke script
D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no
longer stubs /v1/ingest and /v1/sql), replaced with proxy probes that
verify the gateway forwards malformed requests to ingestd and queryd.
Launch order changed from parallel to dep-ordered (storaged →
catalogd → ingestd → queryd → gateway) since catalogd's rehydrate now
needs storaged and queryd's initial Refresh needs catalogd.
All six G0 smokes (D1 through D6) PASS end-to-end after every fix
round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes
applied across 6 days from cross-lineage review.
G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port,
distillation rebuild, observer + Langfuse integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector
that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET
(TYPE S3) on every new connection, sourced from SecretsProvider +
shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and
handler's SELECTs serialize through one connection (avoids cross-
connection MVCC visibility edge cases).
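A sketch of that connector against database/sql/driver (the shipped
wiring may differ; bootConnector and stmts are hypothetical):

    import (
        "context"
        "database/sql/driver"
        "errors"
    )

    // bootConnector runs the httpfs + secret bootstrap on every new
    // connection, so pooled reconnects are always S3-capable.
    type bootConnector struct {
        inner driver.Connector
        stmts []string // INSTALL httpfs; LOAD httpfs; CREATE ... SECRET
    }

    func (c *bootConnector) Connect(ctx context.Context) (driver.Conn, error) {
        conn, err := c.inner.Connect(ctx)
        if err != nil {
            return nil, err
        }
        ex, ok := conn.(driver.ExecerContext)
        if !ok {
            conn.Close()
            return nil, errors.New("duckdb conn lacks ExecerContext")
        }
        for _, s := range c.stmts {
            // Per-connect ctx only. Never a ctx captured at OpenDB
            // time: a cancelled captured ctx silently fails every
            // later reconnect (the B-CTX finding below).
            if _, err := ex.ExecContext(ctx, s, nil); err != nil {
                conn.Close()
                return nil, err
            }
        }
        return conn, nil
    }

    func (c *bootConnector) Driver() driver.Driver { return c.inner.Driver() }

sql.OpenDB on such a connector plus SetMaxOpenConns(1) gives the
single serialized connection described above.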
Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR
REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key')
per manifest, drops views for removed manifests, skips on unchanged
updated_at (the implicit etag). Drop pass runs BEFORE create pass so
a poison manifest can't block other manifest refreshes (post-scrum
C1 fix).
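Refresh's control flow, sketched (listCatalog, createView, dropView
bodies and the Manifest fields are assumptions; r.known maps view
name → updated_at):

    func (r *Registrar) Refresh(ctx context.Context) error {
        manifests, err := r.listCatalog(ctx) // GET catalogd /catalog/list
        if err != nil {
            return err
        }
        var errs []error
        // Drop pass FIRST: removals land even if a create fails.
        for name := range r.known {
            if _, still := manifests[name]; !still {
                if err := r.dropView(ctx, name); err != nil {
                    errs = append(errs, err)
                    continue
                }
                delete(r.known, name)
            }
        }
        // Create pass: collect per-view errors instead of aborting,
        // so one poison manifest can't block the rest (the C1 fix).
        for name, m := range manifests {
            if r.known[name] == m.UpdatedAt {
                continue // implicit etag: unchanged, skip
            }
            if err := r.createView(ctx, name, m); err != nil {
                errs = append(errs, err)
                continue
            }
            r.known[name] = m.UpdatedAt
        }
        return errors.Join(errs...)
    }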
POST /sql with JSON body {"sql":"…"} returns
{"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]],
"row_count":N}. []byte → string conversion so VARCHAR rows
JSON-encode as text. 30s default refresh ticker, configurable via
[queryd].refresh_every.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 4 WARN + 4 INFO
- Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO
Fixed (4):
C1 (Opus + Kimi convergent): Refresh aborts on first per-view error
→ drop pass first, collect errors, errors.Join. Poison manifest
no longer blocks the rest of the catalog from re-syncing.
B-CTX (Opus BLOCK): bootstrap closure captured OpenDB's ctx → a
cancelled ctx silently fails every reconnect. Fix:
context.Background() inside the closure; the passed ctx is used only
for the initial Ping.
B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80
chars but those 80 chars contained KEY_ID + SECRET prefix → log
aggregator captures credentials. Stable per-statement labels +
redactCreds() filter on wrapped DuckDB errors.
JSON-ERR (Opus WARN): swallowed json.Encode error → silent
truncated 200 on unsupported column types. slog.Warn the failure.
Dismissed (4 false positives):
Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit
Qwen BLOCK "MaxBytesReader after Decode" — false, applied before
Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a
deadlock, just serialization, by design with 10s timeout retry
Kimi WARN "dropView leaves r.known inconsistent" — current code
returns before the delete; the entry persists for retry
Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi
on the per-view error blocking, plus two independent single-reviewer
BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The
B-LEAK fix uses defense-in-depth: never pass SQL into the error
path AND redact known cred values from DuckDB's own error message.
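The redaction half, sketched (redactCreds is the name used above;
this body is an assumption):

    // redactCreds scrubs known secret values from an error before it
    // can reach a log aggregator. Flattening the wrap chain is an
    // acceptable cost at a logging boundary.
    func redactCreds(err error, secrets ...string) error {
        if err == nil {
            return nil
        }
        msg := err.Error()
        for _, s := range secrets {
            if s != "" {
                msg = strings.ReplaceAll(msg, s, "[REDACTED]")
            }
        }
        return errors.New(msg)
    }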
DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per
ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every
fix round.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema
inference per ADR-010 (default-to-string on ambiguity), single-pass
streaming CSV → Parquet via pqarrow batched writer (Snappy compressed,
8192 rows per batch), PUT to storaged at content-addressed key
datasets/<name>/<fp_hex>.parquet, register manifest with catalogd.
Acceptance smoke 6/6 PASS including idempotent re-ingest (proves
inference is deterministic — same CSV always produces same fingerprint)
and schema-drift → 409 (proves catalogd's gate fires on ingest traffic).
Schema fingerprint is SHA-256 over (name, type) tuples in header order
using ASCII record/unit separators (0x1e/0x1f) so column names with
commas can't collide. Nullability intentionally NOT in the fingerprint
— a column gaining nulls isn't a schema change.
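Concretely, assuming a Column struct with Name and Type strings
(imports crypto/sha256 and encoding/hex):

    // schemaFingerprint hashes (name, type) tuples in header order.
    // 0x1f separates name from type, 0x1e separates columns, so a
    // column name containing commas can't collide with another
    // layout. Nullability is deliberately excluded.
    func schemaFingerprint(cols []Column) string {
        h := sha256.New()
        for _, c := range cols {
            h.Write([]byte(c.Name))
            h.Write([]byte{0x1f}) // ASCII unit separator
            h.Write([]byte(c.Type))
            h.Write([]byte{0x1e}) // ASCII record separator
        }
        return hex.EncodeToString(h.Sum(nil))
    }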
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 4 WARN + 3 INFO (after 2 self-retracted BLOCKs)
- Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO
Fixed (2, both Opus single-reviewer):
C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet
meant a schema-drift ingest overwrote the live parquet BEFORE
catalogd's 409 fired → storaged inconsistent with manifest.
Fix: content-addressed key datasets/<name>/<fp_hex>.parquet.
Drift writes to a different file (orphan in G2 GC scope); the
live data is never corrupted.
C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks
buffered column data + OS resources per failed ingest.
Fix: deferred guarded close with wClosed flag.
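The pattern, sketched (the pqarrow calls are the real arrow-go API;
the surrounding names are assumptions):

    w, err := pqarrow.NewFileWriter(schema, out, props, arrProps)
    if err != nil {
        return err
    }
    wClosed := false
    defer func() {
        if !wClosed {
            _ = w.Close() // error path: free buffered column data
        }
    }()
    for _, rec := range batches {
        if err := w.Write(rec); err != nil {
            return err // deferred close still runs
        }
    }
    if err := w.Close(); err != nil { // happy path: flush the footer
        return err
    }
    wClosed = true
    return nil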
Dismissed (5, all false positives):
Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false,
Go csv handles RFC 4180 multi-line quoted fields by default
Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73
and csv.go:201
Kimi BLOCK "type assertion panic if pqarrow reorders fields" —
speculative, no real path
Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" —
false convergent. Outer defer rb.Release() captures the current
builder; in-loop release runs before reassignment. No leak.
Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample
boundary type mismatch (G0 cap bounds peak), string-match
paranoia on http.MaxBytesError, multipart double-buffer (G2 spool-
to-disk), separator validation, body close ordering, etc.
The D4 scrum produced fewer real findings than D3 (2 vs 6), but both
were architectural hazards smoke couldn't catch: the smoke's "schema
drift → 409" assertion passed even in the corrupted-state world. The
409 fires correctly; the bug was that the PUT had already mutated the
live parquet before the validation check ran.
Opus's PUT-then-register read of the order is exactly the kind of
architectural insight the cross-lineage scrum is designed to surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory
registry with the ADR-020 idempotency contract (same name+fingerprint
reuses dataset_id; different fingerprint → 409 Conflict), HTTP client
to storaged for persistence, and rehydration on startup. Acceptance
smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the
load-bearing test that the catalog/storaged service split actually
preserves state.
dataset_id derivation diverges from Rust: UUIDv5(namespace, name)
instead of v4 surrogate. Same name on any box generates the same
dataset_id; rehydrate after disk loss converges to the same identity
rather than silently re-issuing. Namespace pinned at
a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued
depends on these bytes.
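The derivation itself, using google/uuid (NewSHA1 is its v5
constructor):

    var datasetNS = uuid.MustParse("a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e")

    // Same name always yields the same dataset_id, on any box, so
    // rehydrate after disk loss converges instead of re-issuing.
    func datasetID(name string) uuid.UUID {
        return uuid.NewSHA1(datasetNS, []byte(name)) // UUIDv5
    }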
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO
- Kimi K2-0905 (openrouter, validated D2): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO
Fixed:
C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds
C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern
   (sketched after this list)
S1 split-brain on persist failure → candidate-then-swap
S2 brittle string-match for 400 vs 500 →
   ErrEmptyName/ErrEmptyFingerprint sentinels
S3 Get/List shallow-copy aliasing → cloneManifest deep copy
S4 keep-alive socket leak on error paths → drainAndClose helper
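The C2 pattern, sketched (store, Manifest, and byName are
hypothetical names): all I/O happens with no lock held, then the
mutex guards only a pointer swap.

    func (r *Registry) Rehydrate(ctx context.Context) error {
        fresh := make(map[string]*Manifest)
        keys, err := r.store.List(ctx, "manifests/") // HTTP to storaged
        if err != nil {
            return err
        }
        for _, k := range keys {
            m, err := r.store.GetManifest(ctx, k)
            if err != nil {
                return err // registry untouched on any failure
            }
            fresh[m.Name] = m
        }
        r.mu.Lock() // brief: no I/O under the lock
        r.byName = fresh
        r.mu.Unlock()
        return nil
    }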
Dismissed (false positives, all single-reviewer):
Kimi BLOCK "Decode crashes on empty Parquet" — already handled
Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required
Qwen INFO "rb.NewRecord() error unchecked" — API returns no error
Deferred to G1+: name validation regex, per-call deadlines, Snappy
compression, list pagination continuation tokens (storaged caps at
10k with sentinel for now).
Build clean, vet clean, all tests pass, smoke 6/6 PASS after every
fix round. arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced
by arrow-go's minimum.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.
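The gating stack, sketched with stdlib (gate is a hypothetical
middleware name; the policy numbers are the shipped ones):

    const maxBody = 256 << 20 // 256 MiB

    var slots = make(chan struct{}, 4) // 4-slot semaphore

    func gate(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // Known-oversize body: 413 before reading a byte.
            if r.ContentLength > maxBody {
                http.Error(w, "body too large",
                    http.StatusRequestEntityTooLarge)
                return
            }
            select {
            case slots <- struct{}{}:
                defer func() { <-slots }()
            default: // all slots busy: shed load rather than queue
                w.Header().Set("Retry-After", "5")
                http.Error(w, "busy", http.StatusServiceUnavailable)
                return
            }
            // A lying Content-Length is still capped mid-read.
            r.Body = http.MaxBytesReader(w, r.Body, maxBody)
            next.ServeHTTP(w, r)
        })
    }

gate matches chi's middleware signature, so r.Use(gate) puts both
checks in front of every route.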
Cross-lineage scrum on the shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
- Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
cap and the direct adapter's empty-content reasoning bug):
1 BLOCK + 2 WARN + 1 INFO
Fixed:
C1 buildRegistry ctx cancel footgun → context.Background()
(Opus + Kimi convergent; future credential refresh chains)
C2 MaxBytesReader unwrap through manager.Uploader multipart
goroutines → Content-Length up-front 413 + string-suffix fallback
(Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
(Opus + Kimi convergent; OOM guard)
S1 PUT response Content-Type: application/json (Opus single-reviewer)
Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters.
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.
Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).
Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>