golangLAKEHOUSE

Author	SHA1	Message	Date
root	be65f85f17	F: drift quantification — scorer drift first PRD's 5-loop substrate names "drift" as loop 5: quantify when historical decisions stop matching current reality. Distinct from the rating+distillation loop because drift is MEASUREMENT, not LEARNING. The learning loop says "this match worked, remember it"; the drift loop says "this 4-month-old playbook entry — does it still match what the substrate would surface today?" First-shipped drift shape: SCORER drift. When the deterministic scorer's ScorerVersion bumps, historical ScoredRuns may no longer match what the current scorer produces on the same EvidenceRecord. internal/drift/drift.go: - ScorerDriftInput — (EvidenceRecord, persisted_category) pair - ScorerDriftEntry — one mismatch with current reasons attached - CategoryShift — (from, to, count) cell in the shift matrix - ScorerDriftReport — summary + sorted shift matrix + optional entries - ComputeScorerDrift(inputs, includeEntries) — pure function; re-runs ScoreRecord over each input and reports mismatches Why this matters: without a drift quantifier, a scorer-rule change silently invalidates the historical training data feeding the learning loop. With drift quantification, a rule change surfaces a concrete number ("847 of 4701 historical ScoredRuns now disagree") that triggers a re-score-and-retrain cycle rather than letting the substrate quietly rot. 
Tests (6/6 PASS): - No-drift: all 3 inputs match → 100% matched - Shift detected: 5 inputs, 3 drift cases, drift_rate=0.6, shift matrix shows accepted→partially_accepted x3 - Multiple shifts sorted by count desc - includeEntries=false skips the per-mismatch list - Empty input → all-zero report (no division-by-zero) - ScorerVersion stamped on every report Future drift shapes (deferred to follow-ups, named in package doc): - PLAYBOOK drift: re-run playbook queries through current matrix-search; recorded answer not in top-K = drift - EMBEDDING drift: KS-test on vector distribution at T1 vs T2 - AUDIT BASELINE drift: matches Rust audit_baselines.jsonl longitudinal signal Pure compute. Materialization layer (read scored-runs jsonl + their matching evidence jsonl + feed into ComputeScorerDrift) lands with the distillation materialization commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:06:17 -05:00
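Note on be65f85f17: the drift computation described above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/drift/drift.go. The `EvidenceRecord` fields, the toy `scoreRecord` rule, and the one-argument `ComputeScorerDrift` signature (the `includeEntries` flag is omitted) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// EvidenceRecord is a hypothetical stand-in for the real evidence type.
type EvidenceRecord struct{ Attempts int }

// ScorerDriftInput pairs a record with the category persisted when it was scored.
type ScorerDriftInput struct {
	Record            EvidenceRecord
	PersistedCategory string
}

// CategoryShift is one (from, to, count) cell of the shift matrix.
type CategoryShift struct {
	From, To string
	Count    int
}

type ScorerDriftReport struct {
	Total, Matched, Drifted int
	DriftRate               float64
	Shifts                  []CategoryShift
}

// scoreRecord is a toy scorer (assumption): a single attempt counts as accepted.
func scoreRecord(r EvidenceRecord) string {
	if r.Attempts <= 1 {
		return "accepted"
	}
	return "partially_accepted"
}

// ComputeScorerDrift re-scores every input and tallies disagreements.
func ComputeScorerDrift(inputs []ScorerDriftInput) ScorerDriftReport {
	rep := ScorerDriftReport{Total: len(inputs)}
	counts := map[[2]string]int{}
	for _, in := range inputs {
		cur := scoreRecord(in.Record)
		if cur == in.PersistedCategory {
			rep.Matched++
			continue
		}
		rep.Drifted++
		counts[[2]string{in.PersistedCategory, cur}]++
	}
	for k, n := range counts {
		rep.Shifts = append(rep.Shifts, CategoryShift{k[0], k[1], n})
	}
	// Sort by count descending, then by names so the output is deterministic.
	sort.Slice(rep.Shifts, func(i, j int) bool {
		a, b := rep.Shifts[i], rep.Shifts[j]
		if a.Count != b.Count {
			return a.Count > b.Count
		}
		if a.From != b.From {
			return a.From < b.From
		}
		return a.To < b.To
	})
	if rep.Total > 0 { // guard: empty input yields an all-zero report
		rep.DriftRate = float64(rep.Drifted) / float64(rep.Total)
	}
	return rep
}

func main() {
	inputs := []ScorerDriftInput{
		{EvidenceRecord{1}, "accepted"},
		{EvidenceRecord{1}, "accepted"},
		{EvidenceRecord{2}, "accepted"},
		{EvidenceRecord{3}, "accepted"},
		{EvidenceRecord{2}, "accepted"},
	}
	rep := ComputeScorerDrift(inputs)
	fmt.Printf("drift_rate=%.1f shifts=%v\n", rep.DriftRate, rep.Shifts)
}
```

The example mirrors the "5 inputs, 3 drift cases" test shape: three persisted "accepted" categories disagree with the toy scorer, so the shift matrix holds a single accepted→partially_accepted cell with count 3.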
root	57d0df125d	E (partial): distillation port — scorer + contamination firewall First slice of the Rust v1.0.0 distillation substrate (e7636f2) ported to Go per ADR-001 #4 (port LOGIC, not bit-identical reproducibility). This commit lands the LOAD-BEARING pieces named in project_distillation_substrate.md memory: - The deterministic Success Scorer (8 sub-scorers + dispatch) - The contamination firewall on SFT samples (the "non-negotiable" spec property: rejected/needs_human_review NEVER ship to SFT) - All on-wire types + validators for ScoredRun, SftSample, EvidenceRecord with Provenance Files: internal/distillation/types.go — types + ScorerVersion + SftNever + ValidateScoredRun + ValidateSftSample internal/distillation/scorer.go — ScoreRecord + 8 class scorers + BuildScoredRun (deterministic) internal/distillation/scorer_test.go — ~40 test cases: - source-class dispatch (verdict / telemetry / extraction) - scrum_review (4 attempt cases) - observer_review (5 verdict cases) - audit (legacy + severity, 9 cases) - auto_apply (4 cases) - outcomes / mode_experiment / extraction - CONTAMINATION FIREWALL: ErrSftContamination sentinel fires on rejected/needs_human_review, distinct from typo errors - empty-pair guard (instruction/response trim != "") - reasons-required ScoredRun validation - deterministic sig_hash on identical input - purity check (input not mutated, repeatable output) Per the 2026-04-29 cross-lineage scrum's discipline: false-positive findings would be dismissed inline (none in this commit). Real findings would be addressed before merge — but this is greenfield port code reviewed against its Rust source line-by-line, which the test suite encodes as truth tables. 
Explicitly DEFERRED to follow-up commits: - Materialization layer (jsonl read/write, date-partitioned storage in data/scored-runs/YYYY/MM/DD/, evidence index) - SFT exporter (file iteration + filtering — the SCORING firewall is here; the EXPORT firewall is the next layer) - export_preference, export_rag (other export shapes) - Acceptance harness (16/16 acceptance gate that locks v1.0.0) - replay, receipts, build_evidence_index, transforms The scorer + firewall validator are pure functions — operational tooling layers on top without changing the deterministic logic the downstream learning loop depends on. The Go ScorerVersion stays at v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go materialization commit is reserved for the next scoring-rule change, NOT the port itself. 15-smoke regression all green. vet clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:04:29 -05:00
root	7f42089521	D: embed-text iteration — clean negative finding (3 variants tested) Workers driver embed text reverted to V0 after testing 3 variants on the "Forklift operator with OSHA-30 certification, warehouse experience" reality-test query against 5000 workers (which contains 569 actual Forklift Operators per the 31b4088 probe). V0 (current, restored): "Worker role: <role>. Skills: ... Certifications: ... <resume_text>" → 6 workers in top-8, 0 Forklift Ops, top distance 0.327, top role "Production Worker" V4a (role-doubled): "<role>. <role> with <skills>. ..." drop archetype + resume_text → 6 workers in top-8, 0 Forklift Ops, top distance 0.254, top role "Production Worker" V4b (resume-only): just the resume_text natural-language sentence, no structured prefix → 4 workers in top-8 (WORSE mix — software-engineer candidates filled the displaced slots), 0 Forklift Ops, top distance 0.379 Conclusion: all three variants surface Production Workers / Machine Operators / Line Leads ABOVE Forklift Operators for this query. The 569 actual Forklift Operators in the 5000-row sample don't appear in any top-8. Embed-text design isn't the bottleneck — nomic-embed-text 137M's geometry doesn't separate "Forklift Operator" from "Production Worker" / "Machine Operator" / "Line Lead" in this query's neighborhood. Real fixes belong elsewhere: - Hybrid SQL+semantic (B): pre-filter by role/certs via queryd before semantic ranking. Addresses the gap directly. - Different embedding model: mxbai-embed-large or a staffing- fine-tuned model. Costs an Ollama model swap + re-embedding. - Playbook boost (component 5, already shipped): record successful Forklift placements; future queries surface those workers via similarity. Compounds with use. V0 restored because it has the best worker/candidate mix in top-8 (6 vs 4 in V4b), preserving the multi-corpus reality-test signal quality even if the role match is imperfect. 
Comments updated to record the experiment so future sessions don't relitigate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:58:39 -05:00
root	a730fc2016	scrum fixes: 4 real findings landed, 4 false positives dismissed Cross-lineage scrum review on the 12 commits of this session (afbb506..06e7152) via Rust gateway :3100 with Opus + Kimi + Qwen3-coder. Results: Real findings landed: 1. Opus BLOCK — vectord BatchAdd intra-batch duplicates panic coder/hnsw's "node not added" length-invariant. Fixed with last-write-wins dedup inside BatchAdd before the pre-pass. Regression test TestBatchAdd_IntraBatchDedup added. 2. Opus + Kimi convergent WARN — strings.Contains(err.Error(), "status 404") was brittle string-matching to detect cold- start playbook state. Fixed: ErrCorpusNotFound sentinel returned by searchCorpus on HTTP 404; fetchPlaybookHits uses errors.Is. 3. Opus WARN — corpusingest.Run returned nil on total batch failure, masking broken pipelines as "empty corpora." Fixed: Stats.FailedBatches counter, ErrPartialFailure sentinel returned when nonzero. New regression test TestRun_NonzeroFailedBatchesReturnsError. 4. Opus WARN — dead var _ = io.EOF in staffing_500k/main.go was justified by a fictional comment. Removed. Drivers (staffing_500k, staffing_candidates, staffing_workers) updated to handle ErrPartialFailure gracefully — print warn, keep running queries — rather than fatal'ing on transient hiccups while still surfacing the failure clearly in the output. Documented (no code change): - Opus WARN: matrixd /matrix/downgrade reads LH_FORCE_FULL_ENRICHMENT from process env when body omits it. Comment now explains the opinionated default and points callers wanting deterministic behavior to pass the field explicitly. False positives dismissed (caught and verified, NOT acted on): A. Kimi BLOCK on errors.Is + wrapped error in cmd/matrixd:223. Verified false: Search wraps with %w (fmt.Errorf("%w: %v", ErrEmbed, err)), so errors.Is matches the chain correctly. B. Kimi INFO "BatchAdd has no unit tests." 
Verified false: batch_bench_test.go has BenchmarkBatchAdd; the new dedup test TestBatchAdd_IntraBatchDedup adds another. C. Opus BLOCK on missing finite/zero-norm pre-validation in cmd/vectord:280-291. Verified false: line 272 already calls vectord.ValidateVector before BatchAdd, so finite + zero-norm IS checked. Pre-validation is exhaustive. D. Opus WARN on relevance.go tokenRe (Opus self-corrected mid-finding when realizing leading char counts toward token length). Qwen3-coder returned NO FINDINGS — known issue with very long diffs through the OpenRouter free tier; lineage rotation worked as designed (Opus + Kimi between them caught everything Qwen would have). 15-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix, relevance, downgrade, playbook). Unit tests all green (corpusingest +1, vectord +1). Per feedback_cross_lineage_review.md: convergent finding #2 (404 detection) is the highest-signal one — both Opus and Kimi flagged it independently. The other Opus findings stand on single-reviewer signal but each one verified against the actual code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:42:39 -05:00
root	06e71520c4	matrix: playbook memory + boost — SPEC §3.4 component 5 of 5 (LEARNING LOOP) Closes SPEC §3.4. The matrix indexer is now a learning meta-index per feedback_meta_index_vision.md — every successful (query → answer) pair recorded via /matrix/playbooks/record boosts that answer for future similar queries. This is the architectural piece that lifts vectord from "static hybrid search" to the meta-index J originally framed in Phase 19 of the Rust system. What's new: - internal/matrix/playbook.go — PlaybookEntry, PlaybookHit, ApplyPlaybookBoost. Pure-function boost math: distance' = distance * (1 - 0.5 * score) Score 0 = no boost (factor 1.0); score 1 = halve distance (factor 0.5). Capped at 0.5 deliberately so a single high- confidence playbook can't dominate the base ranking forever (runaway-feedback-loop guard). - Retriever.Record(entry, corpus) — embeds query_text, ensures playbook corpus exists (idempotent), upserts via deterministic sha256-derived ID (last score wins on re-record of same triple). - Retriever.Search extended with UsePlaybook + PlaybookCorpus + PlaybookTopK + PlaybookMaxDistance. Reuses the query vector — no extra embed call. Missing-corpus 404 = no-op (cold-start state before any Record call), not an error. - POST /v1/matrix/playbooks/record (matrixd) — caller submits {query_text, answer_id, answer_corpus, score, tags?}; gets {playbook_id} back. Storage: a vectord index named "playbook_memory" (configurable per request) with embed(query_text) as the vector and the PlaybookEntry JSON as metadata. Just another corpus — observable from /vectors/index, persistable through G1P, etc. Match key for boost: (AnswerID, AnswerCorpus). Cross-corpus ID collisions don't false-match — verified by TestApplyPlaybookBoost_CorpusAttributionRespected. 
End-to-end smoke (scripts/playbook_smoke.sh, all assertions PASS): - Baseline search: widget-c at distance 0.6566 (rank 3) - Record playbook: query → widget-c, score=1.0 - Re-search with use_playbook=true: widget-c distance: 0.3283 (rank 2) ratio: 0.5 EXACTLY (matches boost math precisely) playbook_boosted: 1 - widget-c jumped from #3 to #2 — learning loop visible Tests: - 8 unit tests in internal/matrix/playbook_test.go covering Validate, BoostFactor (5 cases), the no-boost identity, the boost-moves-result-up scenario, highest-score wins on duplicate matches, cross-corpus attribution, JSON round-trip, and rejection of empty metadata - scripts/playbook_smoke.sh integration test (3 assertions PASS) 15-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix, relevance, downgrade, playbook). SPEC §3.4 NOW COMPLETE: 5 of 5 components shipped. The matrix indexer's port is done as a substrate; remaining work is operational (rating signal sources, telemetry, eventual structured filtering for staffing data — none in §3.4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:34:24 -05:00
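Note on 06e71520c4: the boost formula quoted above, distance' = distance * (1 - 0.5 * score), can be checked with a few lines of Go. This is a sketch, not the code in internal/matrix/playbook.go. Clamping the score to [0, 1] and the name `ApplyBoost` are assumptions.

```go
package main

import "fmt"

// BoostFactor maps a playbook score in [0, 1] to a distance multiplier in [0.5, 1].
// Out-of-range scores are clamped (an assumption for this sketch).
func BoostFactor(score float64) float64 {
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return 1 - 0.5*score
}

// ApplyBoost returns the boosted (smaller or equal) distance.
func ApplyBoost(distance, score float64) float64 {
	return distance * BoostFactor(score)
}

func main() {
	// widget-c from the smoke test: baseline distance 0.6566, recorded score 1.0.
	fmt.Printf("%.4f\n", ApplyBoost(0.6566, 1.0)) // prints 0.3283
}
```

Because the factor never drops below 0.5, a boosted result can at most halve its distance, which is the runaway-feedback guard the commit describes.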
root	31b408882b	multi_corpus_e2e: WORKERS_LIMIT env knob — and the embed-text-not-sample-size finding Adds WORKERS_LIMIT env override (default 5000) so the e2e can be re-run at different sample sizes. Tiny change; the interesting part is the FINDING that motivated the run. Investigation: a97881d's reality test put zero Forklift Operators in the top-6 for "Forklift operator with OSHA-30 certification, warehouse experience" — instead returned Production Worker / Machine Operator / Assembler. Hypothesis tested: maybe the 5000-row sample didn't contain forklift operators in retrievable density. Result: hypothesis falsified. Direct probe of workers_500k.parquet: All 500K rows → 55,349 Forklift Operators (11.07%) → 150,328 with "forklift" in certs → 74,852 with OSHA-30 specifically First 5K rows → 569 Forklift Operators (11.38%) → distribution matches global, no ordering bias So 569 forklift operators were IN the corpus the matrix indexer searched and STILL didn't surface in top-6. That means the bottleneck isn't sample size — it's nomic-embed-text + our embed-text template ranking "Production Worker" / "Machine Operator" / "Assembler" as semantically nearer to the query than literal "Forklift Operator". The reality test exposed this faithfully. Three real follow-ups, none in scope of this commit: 1. Embed text design — front-loading role + certs (currently "Worker role: <role>" then skills then certs) might dominate retrieval better. Worth A/B-testing. 2. Hybrid SQL+semantic — pre-filter by role/certs via queryd before semantic ranking. Not in SPEC §3.4 today; would address the "available" / "Chicago" gap from the candidates reality test (0d1553c) too. 3. Playbook-memory boost — SPEC §3.4 component 5. When a query "Forklift OSHA-30" was answered with worker w-X in the past, boost w-X's score for similar future queries. The retrieval gap CAN be bridged by the learning loop without changing the base embedder. 
Commits the env knob; the finding lives in the commit body so future sessions don't re-run the sample-size hypothesis. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:26:32 -05:00
root	a97881d80c	workers corpus + multi-corpus reality test — matrix indexer end-to-end Lands the second real-data corpus (workers_500k) and the first multi-corpus reality test through /v1/matrix/search composing both corpora live. What's new: - scripts/staffing_workers/main.go — parquet driver over workers_500k.parquet, multi-chunk arrow handling (workers parquet has multiple row groups vs candidates' one). Embed text: role + skills + certifications + city + state + archetype + resume_text. IDs prefixed "w-". - scripts/multi_corpus_e2e.sh — first end-to-end test composing both corpora through the matrix indexer. Real-data multi-corpus result (this commit): Query: "Forklift operator with OSHA-30 certification, warehouse experience" Corpora: workers (5000 rows) + candidates (1000 rows) Merged top-8: workers=6, candidates=2 Top hits: w d=0.327 w-4573 Production Worker w d=0.353 w-1726 Machine Operator w d=0.362 w-3806 Production Worker w d=0.366 w-1000 Machine Operator w d=0.374 w-1436 Assembler w d=0.395 w-162 Machine Operator c d=0.440 c-CAND-00727 C#,.NET,Azure c d=0.446 c-CAND-00031 React,TypeScript,Node The matrix indexer correctly chose the right domain — manufacturing/warehouse roles in workers (correct semantic match for the staffing query) rank ABOVE software-engineer candidates from the candidates corpus. 0.045 gap between the worst worker (0.395) and the best candidate (0.440) — clean distance separation. Compared to the candidates-only e2e run from 0d1553c: candidates-only top: c-CAND-00727 at d=0.4404 multi-corpus top: w-4573 at d=0.3265 (a Production Worker) That's the matrix indexer's whole point made visible: composing domain-distinct corpora surfaces better matches than single-corpus search. Without workers in the search space, the staffing query returned software engineers (wrong domain). With workers, it returns roles in the right ballpark. 
What's still imperfect (signal for component 5 + future work): - No top-6 worker actually has "Forklift" or "OSHA-30" visible in metadata; "Production Worker" is semantically nearest in this sample. Likely needs a larger workers ingest (5000 from 500K) or skill-keyword boost. - Status/availability still not gated. The staffing-side structured filtering gap from 0d1553c persists; relevance filter (CODE-aware) doesn't address it. Pipeline timings: workers ingest: 5000 rows / 19.2s = 260/sec end-to-end candidates ingest: 1000 rows / 3.1s = 322/sec multi-corpus query (text → embed → 2 parallel vectord → merge): 14ms 14-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix, relevance, downgrade). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:22:16 -05:00
root	3968ec8a7b	matrix: strong-model downgrade gate — SPEC §3.4 component 4 of 5 Pure-Go port of mode.rs::execute's pass5 downgrade gate (Rust 2026-04-26). Adds POST /v1/matrix/downgrade endpoint via matrixd. The gate captures the pass5 finding: composing matrix corpora into codereview_lakehouse on a strong model LOST 5/5 head-to-head reps against matrix-free codereview_isolation on grok-4.1-fast (p=0.031). Strong models have enough native capacity that bug fingerprints + adversarial framing + file content carry them; matrix chunks displace depth-of-analysis. Logic (matches Rust mode.rs:614-632): if mode == codereview_lakehouse && !forced_mode && !LH_FORCE_FULL_ENRICHMENT && !is_weak_model(model) → flip to codereview_isolation, record downgraded_from is_weak_model captures the empirical weak-list: - `:free` suffix or `:free/` infix (OpenRouter free tier) - qwen3.5:latest, qwen3:latest (local last-resort rungs) - everything else → strong by default Tests: - 3 unit tests in internal/matrix/downgrade_test.go: IsWeakModel coverage, MaybeDowngrade truth table (5 rows), forced-mode precedence (forced beats every other bypass) - scripts/downgrade_smoke.sh: 6 assertions through gateway covering all 5 truth-table rows + empty-mode 400 14-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix, relevance, downgrade). SPEC §3.4 progress: 4 of 5 components shipped (corpus builders, multi-corpus retrieve+merge, relevance filter, downgrade gate). Last component is learning-loop integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:17:55 -05:00
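Note on 3968ec8a7b: the gate logic and weak-model list quoted above translate almost directly into Go. The sketch below follows the commit text; the `Decision` struct and the exact function signatures are assumptions and may differ from internal/matrix.

```go
package main

import (
	"fmt"
	"strings"
)

// IsWeakModel reports whether a model is on the empirical weak-list.
func IsWeakModel(model string) bool {
	if strings.HasSuffix(model, ":free") || strings.Contains(model, ":free/") {
		return true // OpenRouter free tier
	}
	return model == "qwen3.5:latest" || model == "qwen3:latest"
}

// Decision is a hypothetical result shape for the gate.
type Decision struct {
	Mode           string
	DowngradedFrom string // empty when no downgrade happened
}

// MaybeDowngrade flips codereview_lakehouse to codereview_isolation for
// strong models unless the caller forced the mode or full enrichment.
func MaybeDowngrade(mode, model string, forcedMode, forceFullEnrichment bool) Decision {
	if mode == "codereview_lakehouse" && !forcedMode && !forceFullEnrichment && !IsWeakModel(model) {
		return Decision{Mode: "codereview_isolation", DowngradedFrom: mode}
	}
	return Decision{Mode: mode}
}

func main() {
	fmt.Println(MaybeDowngrade("codereview_lakehouse", "grok-4.1-fast", false, false))
	fmt.Println(MaybeDowngrade("codereview_lakehouse", "qwen3:latest", false, false))
}
```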
root	9588bd82ae	matrix: relevance filter — SPEC §3.4 component 3 of 5 Faithful port of mcp-server/relevance.ts (Rust observer's adjacency- pollution filter). Same 5-signal scoring, same default threshold 0.3. Adds POST /v1/matrix/relevance endpoint via matrixd. Scoring signals (additive, can sign-flip): path_match +1.0 chunk source/doc_id encodes focus.path filename_match +0.6 chunk text mentions focus's filename defined_match +0.6 chunk text mentions focus.defined_symbols token_overlap +0.4 jaccard of non-stopword tokens prefix_match +0.3 chunk source shares first-2-segment prefix import_penalty -0.5 mentions ONLY imported symbols, no defined ones What this does and doesn't do: - DOES filter code-aware corpora (eventually lakehouse_arch_v1, lakehouse_symbols_v1, scrum_findings_v1) — drops chunks about code the focus file IMPORTS rather than DEFINES, the "adjacency pollution" pattern that makes a reviewer LLM hallucinate imported-crate internals as belonging to the focus - DOES NOT meaningfully filter staffing data — the candidates reality test 2026-04-29 had "exact skill match buried at #3" which is a different problem (semantic-only ranking dominated by secondary text). Staffing needs structured filtering (status gates, location gates) that lives outside this package — future work, not in SPEC §3.4 yet Headline smoke assertion: focus = crates/queryd/src/db.go which defines Connector and imports catalogd::Registry. The filter scores: Connector chunk: +0.68 (defined_match fires, kept) Registry chunk: -0.46 (import_only penalty fires, dropped) unrelated junk: 0.00 (no signals, dropped) That's a 1.14-point gap between what we ARE and what we IMPORT — the entire purpose of the filter. 
Tests: - 9 unit tests in internal/matrix/relevance_test.go covering Tokenize, Jaccard, ExtractDefinedSymbols (Rust + TS), ExtractImportedSymbols, FilePrefix, ScoreRelevance per-signal, FilterChunks threshold splitting, and the headline AdjacencyPollutionScenario - scripts/relevance_smoke.sh integration smoke (3 assertions PASS): adjacency-pollution scenario, empty-chunks 400, threshold honored 13-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix, relevance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:13:22 -05:00
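Note on 9588bd82ae: a reduced sketch of the additive scoring idea, using only three of the listed signals (defined_match, import_penalty, token_overlap) with the weights quoted above. It will not reproduce the +0.68 / -0.46 figures from the smoke test. The `Focus` struct, the tokenizer, and the substring matching are simplifying assumptions; stopword removal and the path, filename, and prefix signals are left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Focus is a hypothetical description of the file under review.
type Focus struct {
	Text     string
	Defined  []string
	Imported []string
}

// tokenize lowercases and splits on anything that is not a letter, digit, or underscore.
func tokenize(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_'
	}) {
		set[w] = true
	}
	return set
}

// jaccard is |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
func jaccard(a, b map[string]bool) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for t := range a {
		if b[t] {
			inter++
		}
	}
	return float64(inter) / float64(len(a)+len(b)-inter)
}

func mentionsAny(text string, symbols []string) bool {
	for _, s := range symbols {
		if strings.Contains(text, s) {
			return true
		}
	}
	return false
}

// Score adds defined_match (+0.6), import_penalty (-0.5, only when no
// defined symbol is mentioned) and token_overlap (0.4 * jaccard).
func Score(f Focus, chunk string) float64 {
	score := 0.4 * jaccard(tokenize(f.Text), tokenize(chunk))
	def := mentionsAny(chunk, f.Defined)
	if def {
		score += 0.6
	}
	if !def && mentionsAny(chunk, f.Imported) {
		score -= 0.5
	}
	return score
}

func main() {
	f := Focus{
		Text:     "struct Connector opens a pool using Registry",
		Defined:  []string{"Connector"},
		Imported: []string{"Registry"},
	}
	const threshold = 0.3
	for _, c := range []string{"Connector pools connections", "Registry stores manifests", "unrelated junk"} {
		fmt.Println(c, "kept:", Score(f, c) >= threshold)
	}
}
```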
root	0d1553ca88	candidates corpus: first deep-field reality test on real staffing data Lands the second staffing corpus and the first end-to-end reality test through the full Go pipeline: parquet → corpusingest → embedd → vectord → matrixd → gateway. What's new: - scripts/staffing_candidates/main.go — parquet Source over candidates.parquet (1000 rows, 11 cols), single-chunk arrow-go pqarrow read. Embed text: "Candidate skills: <s>. Based in <city>, <state>. <years> years experience. Status: <status>. <first> <last>." IDs prefixed "c-" so multi-corpus merges against workers ("w-") stay unambiguous. - scripts/candidates_e2e.sh — first integration smoke that runs the full stack (storaged + embedd + vectord + matrixd + gateway), ingests via corpusingest, runs a real query through /v1/matrix/search, prints results. Ephemeral mode (vectord persistence disabled via custom toml) so re-runs don't pollute MinIO _vectors/ and break g1p_smoke's "only-one-persisted-index" assertion. Real bug caught + fixed in corpusingest: When LogProgress > 0, the progress goroutine's only exit was ctx.Done(). With context.Background() in the production driver, Run hung forever after the pipeline finished. Added a stopProgress channel that close()s after wg.Wait(). Regression test TestRun_ProgressLoggerExits bounds Run's wall to 2s with LogProgress=50ms. This is the bug the unit tests didn't catch because every prior test set LogProgress: 0. Reality test surfaced it on first real-data run — exactly the hyperfocus-and-find-architectural-weakness property J framed as the reason for the Go pass. End-to-end output (1000 candidates, query "Python AWS Docker engineer in Chicago available now"): populate: scanned=1000 embedded=1000 added=1000 wall=3.5s matrix returned 5 hits in 26ms The result quality is the interesting signal: top-5 had ZERO Chicago candidates, ZERO active-status candidates, and the exact- skill-match (Python,AWS,Docker) ranked #3 not #1. 
Pipeline works; retrieval quality has real architectural limits (no structured filtering, no relevance gate, semantic-only ranking dominated by secondary signals like "1 year experience" and "engineer"). This motivates SPEC §3.4 components 3 (relevance filter) and eventually structured filtering — exactly the kind of finding the deep field reality tests are supposed to surface before Enterprise cutover. 12-smoke regression sweep all green. 9 corpusingest unit tests including the new regression. vet clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:06:27 -05:00
root	166470f532	corpusingest: extract reusable text→vector ingest pipeline Generalizes the staffing_500k driver's embed-and-push loop into internal/corpusingest. Per docs/SPEC.md §3.4 component 1 (corpus builders): adding a new staffing/code/playbook corpus is now one Source impl + one main.go calling Run, not 200 lines of pipeline copy-paste. API: type Source interface { Next() (Row, error) } func Run(ctx, Config, Source) (Stats, error) Library owns: - Index lifecycle (create, optional drop-existing, idempotent reuse on 409) - Parallel embed dispatcher (configurable workers + batch size) - Vectord push batching - Progress logging + Stats reporting - Partial-failure semantics (log + continue per-batch errors; operator decides on re-run via Stats.Embedded vs Scanned delta) Per-corpus driver owns: source parsing + column→Row mapping + post-ingest validation queries. Refactor scripts/staffing_500k/main.go to use it. Driver is now ~190 lines (was 339), with the embed/add plumbing replaced by one Run call. -drop flag added so callers can opt out of the destructive DELETE-first behavior (default still true to keep the 500K test clean-recall semantics). Unit tests (internal/corpusingest/ingest_test.go, 8/8 PASS): - Pipeline shape: 50 rows / 16 batch → 4 embed + 4 add calls, every ID added exactly once, vectors at correct dimension - DropExisting fires DELETE - 409 on create → reuse existing index - Limit stops early - Empty Text rows skipped (counted as scanned, not added) - Required IndexName + Dimension validation - Context cancel stops mid-pipeline Real bug caught and fixed by the test suite: if embedd ever returns fewer vectors than texts in the request (degraded backend), the addBatch loop would panic with index-out-of-range. Worker now length-checks the response and logs+skips on mismatch. 12-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix). vet clean. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:47:18 -05:00
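Note on 166470f532: a small sketch of the `Source` contract and the length check that replaced the index-out-of-range panic. The `Source` interface shape is taken from the commit message. The use of `io.EOF` as the end-of-stream signal, the `Row` fields, and the `sliceSource`, `drain`, and `pairVectors` helpers are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Row is a hypothetical reduced row: an ID plus the text to embed.
type Row struct{ ID, Text string }

type Source interface {
	Next() (Row, error)
}

// sliceSource is an in-memory Source; it returns io.EOF when exhausted (assumed convention).
type sliceSource struct {
	rows []Row
	i    int
}

func (s *sliceSource) Next() (Row, error) {
	if s.i >= len(s.rows) {
		return Row{}, io.EOF
	}
	r := s.rows[s.i]
	s.i++
	return r, nil
}

// drain reads every row, skipping rows with empty Text.
func drain(src Source) ([]Row, error) {
	var rows []Row
	for {
		r, err := src.Next()
		if errors.Is(err, io.EOF) {
			return rows, nil
		}
		if err != nil {
			return rows, err
		}
		if r.Text == "" {
			continue
		}
		rows = append(rows, r)
	}
}

// pairVectors refuses a short embed response instead of indexing past its end.
func pairVectors(batch []Row, vecs [][]float32) (map[string][]float32, error) {
	if len(vecs) != len(batch) {
		return nil, fmt.Errorf("embed returned %d vectors for %d texts", len(vecs), len(batch))
	}
	out := make(map[string][]float32, len(batch))
	for i, r := range batch {
		out[r.ID] = vecs[i]
	}
	return out, nil
}

func main() {
	rows, _ := drain(&sliceSource{rows: []Row{{"w-1", "forklift"}, {"w-2", ""}, {"w-3", "welder"}}})
	fmt.Println(len(rows))
	_, err := pairVectors(rows, [][]float32{{0.1}})
	fmt.Println(err)
}
```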
root	c1d96b7b60	matrixd: multi-corpus retrieve+merge — SPEC §3.4 component 2 of 5 Lands the matrix indexer's first piece per docs/SPEC.md §3.4: multi-corpus retrieve+merge with corpus attribution per result. Future components (relevance filter, downgrade gate, learning-loop integration) layer on top of this surface. Architecture: - internal/matrix/retrieve.go — Retriever takes (query, corpora, k, per_corpus_k), parallel-fans across vectord indexes, merges by distance ascending, preserves corpus origin per hit - cmd/matrixd — HTTP service on :3217, fronts /v1/matrix/* - gateway proxy + [matrixd] config + lakehouse.toml entry - Either query_text (matrix calls embedd) or query_vector (caller pre-embedded) — vector takes precedence if both set Error policy: fail-loud on any corpus error. Silent partial returns would lie about coverage, defeating the matrix's whole purpose. Bubbles vectord errors as 502 (upstream), validation as 400. Smoke (scripts/matrix_smoke.sh, 6 assertions PASS first try): - /matrix/corpora lists indexes - Multi-corpus search returns hits from BOTH corpora - Top hit is the globally-closest across all corpora (b-near beats a-near at distance 0.05 vs 0.1 — proves merge) - Metadata round-trips through the merge - Distances ascending in result list - Negative paths: empty corpora → 400, missing corpus → 502, no query → 400 12-smoke regression sweep all green (D1-D6, G1, G1P, G2, storaged_cap, pathway, matrix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:39:17 -05:00
root	a7620c8b6f	PRD: name the product vision — small-model pipeline + 5-loop substrate Adds a "Product vision" section before the Direction-pivot section. Captures the framing J flagged 2026-04-29: the Go refactor is not the goal. The goal is a small-model-driven autonomous pipeline that gets better with each run, with frontier models in audit/oversight, not the hot path. Five loops named explicitly: 1. Knowledge pathway (pathway memory + matrix indexer) 2. Execution (small models on focused context) 3. Observer (refines configs that got the model to a good pathway) 4. Rating + distillation (outcomes fold back into the playbook) 5. Drift (measure when the playbook stops matching reality) Triage / human-in-loop named as the system's job, not an escape hatch. The gate: "playbook + matrix indexer must give the results we're looking for" — single load-bearing acceptance criterion. Why Go after Rust: second-language pass surfaces architectural weaknesses Rust hid; the pipeline must work AS A PIPELINE, not as crates that interact. Maps existing Rust components (✓ pathway, ✓ matrix, ✓ observer, ✓ distillation, ✓ auditor; partial: drift, rating gate, triage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:17:01 -05:00
root	71b35fb85e	SPEC §1 + §3.4: name matrix indexer as a port target Adds matrix indexer as its own row in the §1 component table and a new §3.4 with port plan. Distinct from vectord (substrate); lives at internal/matrix/ + gateway /v1/matrix/. Five components in dependency order: corpus builders → multi-corpus retrieve+merge → relevance filter → strong-model downgrade gate → learning-loop integration. Locks in the framing J flagged 2026-04-29: in Rust the matrix indexer was emergent across mode.rs + build__corpus.ts + observer /relevance, and earlier port-planning reduced it to "we have vectord." The SPEC now names it explicitly so the port preserves the multi-corpus retrieval shape AND the learning loop, not just the HNSW substrate. Sharding-by-id was investigated as a throughput fix and rejected — corpus-as-shard at the matrix layer is the existing retrieval shape and parallelizes Adds for free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:12:10 -05:00
root	f1c188323c	vectord: BatchAdd — single-lock variadic batch (Option A) Replaces the per-item Add loop in the HTTP handler with one call to Index.BatchAdd, which acquires the write-lock once and pushes the whole batch through coder/hnsw's variadic Graph.Add. Pre-validation stays in the handler so per-item error messages keep their item-index precision. Microbench (internal/vectord/batch_bench_test.go) at d=768 cosine: N=16 SingleAdd 283µs/op → BatchAdd 170µs/op 1.66× N=128 SingleAdd 7.9ms/op → BatchAdd 7.5ms/op 1.05× N=1024 SingleAdd 87.5ms/op → BatchAdd 83.4ms/op 1.05× Win is biggest at staffing-driver batch sizes (N=16) where per-call lock + validation overhead is a meaningful fraction. At larger N the inner HNSW neighborhood search per insert dominates, which is the load-bearing finding for Option B (sharded indexes): the throughput ceiling lives inside the library, not at the lock, so sharding to N parallel Graphs is the only path to true concurrent-Add throughput. g1, g1p, g2 smokes all PASS post-change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:05:48 -05:00
root	afbb506dbc	pathwayd: HTTP service over internal/pathway · 11/11 smoke gate Network-callable Mem0-style trace memory at :3217, fronted by gateway /v1/pathway/*. Closes the ADR-004 wire-up: store substrate landed in 2a6234f, this lands the HTTP surface + [pathwayd] config + acceptance gate. Smoke proves the architecturally distinctive properties: Revise → History walks the predecessor chain backward (audit trail), Retire excludes from Search default but stays Get-able, AddIdempotent bumps replay_count without replacing — and all survive kill+restart via JSONL log replay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 17:49:42 -05:00
root	2a6234ff82	ADR-004 + internal/pathway: Mem0 versioned trace substrate Closes Sprint 2 design-bar work (audit reports/scrum/sprint-backlog.md): S2.1 — ADR-004 documents the pathway-memory data model S2.2 — pathway port lands with deterministic fixture corpus and full test coverage on day one S2.3 — retired traces are excluded from retrieval (test passes; would fail without the filter) Mem0-style operations: Add / AddIdempotent / Update / Revise / Retire / Get / History / Search. Each operation is a method on Store; persistence is JSONL append-only with corruption recovery on Replay. internal/pathway/types.go Trace + event + SearchFilter + sentinel errors internal/pathway/store.go in-memory state + RWMutex + ops internal/pathway/persistor.go JSONL append-only log with replay internal/pathway/store_test.go 20 test funcs covering all 7 Sprint 2 claim rows + concurrency internal/pathway/persistor_test.go 6 test funcs covering missing- file, corruption recovery, long-line handling, parent-dir auto-create, apply-error skip behavior Sprint 2 claim coverage row-by-row: ADD TestAdd_AssignsUIDAndTimestamps + TestAdd_RejectsInvalidJSON UPDATE TestUpdate_ReplacesContentSameUID + Update_MissingUID_Errors REVISE TestRevise_LinksToPredecessorViaHistory + TestRevise_PredecessorMissing_Errors + TestRevise_ChainOfThree_BackwardWalk RETIRE TestRetire_ExcludedFromSearch + TestRetire_StillAccessibleViaGet + TestRetire_StillAccessibleViaHistory HISTORY/cycle TestHistory_CycleDetected (injected via internal map), TestHistory_PredecessorMissing_TruncatesChain, TestHistory_UnknownUID_ErrorsClean REPLAY/dup TestAddIdempotent_IncrementsReplayCount (locks the "replay preserves original content" rule per ADR-004) CORRUPTION TestPersistor_CorruptedLines_Skipped + TestPersistor_ApplyError_Skipped ROUND-TRIP TestPersistor_RoundTrip locks the full Save → fresh Store → Load → Stats-match contract Two real bugs caught during testing: - Add returned the same Trace stored in the map, so callers 
holding a reference saw later mutations. Fixed: clone before return (matches Get's contract). Same fix in AddIdempotent + Revise. - Test typo: {"v":different} isn't valid JSON; AddIdempotent's json.Valid rejected it as ErrInvalidContent. Test fixed to use {"v":"different"}; the validation behavior is correct. Skipped this commit (next): - cmd/pathwayd HTTP binary - gateway routing for /v1/pathway/ - end-to-end smoke These add the wire surface; the substrate ships first so the wire layer can be a pure proxy in the next commit. Verified: go test -count=1 ./internal/pathway/ — 26 tests green just verify — vet + test + 9 smokes 34s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:23:30 -05:00
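The aliasing bug noted in 2a6234ff82 (Add returning the same Trace that lives in the map) can be illustrated with a small sketch. The `Trace` fields, the UID scheme, and the `clone` helper here are assumptions for illustration, not the actual internal/pathway types; only `ErrInvalidContent` and the `json.Valid` check are named in the commit.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// ErrInvalidContent mirrors the sentinel named in the commit message.
var ErrInvalidContent = errors.New("invalid content")

// Trace is a hypothetical, trimmed-down trace record.
type Trace struct {
	UID     string
	Content json.RawMessage
}

// clone deep-copies the trace so callers never share memory with the store.
func (t *Trace) clone() *Trace {
	c := *t
	c.Content = append(json.RawMessage(nil), t.Content...)
	return &c
}

// Store keeps traces in memory behind an RWMutex.
type Store struct {
	mu     sync.RWMutex
	traces map[string]*Trace
	next   int
}

func NewStore() *Store { return &Store{traces: make(map[string]*Trace)} }

// Add validates the JSON, stores a private copy, and returns a clone.
func (s *Store) Add(content []byte) (*Trace, error) {
	if !json.Valid(content) {
		return nil, ErrInvalidContent
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	t := &Trace{
		UID:     fmt.Sprintf("t%d", s.next), // hypothetical UID scheme
		Content: append(json.RawMessage(nil), content...),
	}
	s.traces[t.UID] = t
	return t.clone(), nil
}

// Get returns a clone, matching Add's contract.
func (s *Store) Get(uid string) (*Trace, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	t, ok := s.traces[uid]
	if !ok {
		return nil, false
	}
	return t.clone(), true
}

func main() {
	s := NewStore()
	tr, _ := s.Add([]byte(`{"v":"different"}`))
	tr.Content[0] = 'X' // mutate the caller's copy only
	got, _ := s.Get(tr.UID)
	fmt.Println(string(got.Content))
}
```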
root	ad1670d36a	storaged cap smoke — verifies ADR-002 at 300 MiB Closes the "needs heavy integration smoke" follow-up from the ADR-002 commit (423a381). Until now the per-prefix PUT cap was verified only by unit tests + commits' theory; this smoke runs the actual cap path with real bytes. Three assertions, ~2s wall: 1. PUT 300 MiB to _vectors/<key> → 200 (cap raised to 4 GiB for the vectord persistence prefix). 2. PUT same 300 MiB to datasets/<key> → 413 (default 256 MiB cap still protects routine traffic). 3. GET _vectors/<key> → sha256 round-trips (no truncation between cap-raise and S3 multipart streaming). scripts/storaged_cap_smoke.sh Builds storaged + gateway, boots them, generates 300 MiB deterministic /dev/zero payload (sha stable across runs), runs the 3 assertions, cleans up the keys + processes via trap. /dev/zero generation chosen over yes/head pipe — pipefail catches the SIGPIPE from yes when head closes early. just smoke-storaged-cap Wrapper recipe. Outside the main `just verify` chain because 300 MiB payload generation + transfer is MB-heavy. Run after meaningful storaged or vectord-persistence changes. Verified: bash scripts/storaged_cap_smoke.sh — 3/3 PASS · 2s wall just verify — vet + test + 9 smokes still 33s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:14:57 -05:00
root	fa56134b90	ADR-003 wiring: Bearer token + IP allowlist middleware Implements the auth posture from ADR-003 (commit 0d18ffa). Two independent layers — Bearer token (constant-time compare via crypto/subtle) and IP allowlist (CIDR set) — composed in shared.Run so every binary inherits the same gate without per-binary wiring. Together with the bind-gate from commit 6af0520, this mechanically closes audit risks R-001 + R-007: - non-loopback bind without auth.token = startup refuse - non-loopback bind WITH auth.token + override env = allowed - loopback bind = all gates open (G0 dev unchanged) internal/shared/auth.go (NEW) RequireAuth(cfg AuthConfig) returns chi-compatible middleware. Empty Token + empty AllowedIPs → pass-through (G0 dev mode). Token-only → 401 Bearer mismatch. AllowedIPs-only → 403 source IP not in CIDR set. Both → both gates apply. /health bypasses both layers (load-balancer / liveness probes shouldn't carry tokens). CIDR parsing pre-runs at boot; bare IP (no /N) treated as /32 (or /128 for IPv6). Invalid entries log warn and drop, fail-loud-but- not-fatal so a typo doesn't kill the binary. Token comparison: subtle.ConstantTimeCompare on the full "Bearer <token>" wire-format string. Length-mismatch returns 0 (per stdlib spec), so wrong-length tokens reject without timing leak. Pre-encoded comparison slice stored in the middleware closure — one allocation per request. Source-IP extraction prefers net.SplitHostPort fallback to RemoteAddr-as-is for httptest compatibility. X-Forwarded-For support is a follow-up when a trusted proxy fronts the gateway (config knob TBD per ADR-003 §"Future"). internal/shared/server.go Run signature: gained AuthConfig parameter (4th arg). /health stays mounted on the outer router (public). Registered routes go inside chi.Group with RequireAuth applied — empty config = transparent group. Added requireAuthOnNonLoopback startup check: non-loopback bind with empty Token = refuse to start (cites R-001 + R-007 by name). 
internal/shared/config.go AuthConfig type added with TOML tags. Fields: Token, AllowedIPs. Composed into Config under [auth]. cmd/<svc>/main.go × 7 (catalogd, embedd, gateway, ingestd, queryd, storaged, vectord; mcpd is unaffected — stdio doesn't bind a port) Each call site adds cfg.Auth as the 4th arg to shared.Run. No other changes — middleware applies via shared.Run uniformly. internal/shared/auth_test.go (12 test funcs) Empty config pass-through, missing-token 401, wrong-token 401, correct-token 200, raw-token-without-Bearer-prefix 401, /health always public, IP allowlist allow + reject, bare IP /32, both layers when both configured, invalid CIDR drop-with-warn, RemoteAddr shape extraction. The constant-time comparison is verified by inspection (comments in auth.go) plus the existence of the passthrough test (length-mismatch case). Verified: go test -count=1 ./internal/shared/ — all green (was 21, now 33 funcs) just verify — vet + test + 9 smokes 33s just proof contract — 53/0/1 unchanged Smokes + proof harness keep working without any token configuration: default Auth is empty struct → middleware is no-op → existing tests pass unchanged. To exercise the gate, operators set [auth].token in lakehouse.toml (or, per the "future" note in the ADR, via env var). Closes audit findings: R-001 HIGH — fully mechanically closed (was: partial via bind gate) R-007 MED — fully mechanically closed (was: design-only ADR-003) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:11:34 -05:00
root	8f4c16fab1	mcpd: Go MCP SDK port — replaces Bun mcp-server tool surface New cmd/mcpd binary using github.com/modelcontextprotocol/go-sdk v1.5.0 over stdio transport. Exposes Lakehouse capabilities as MCP tools: list_datasets, get_manifest, query_sql, embed_text, search_vectors. Each tool proxies to the gateway via HTTP. Replaces the MCP-tool subset of the Rust system's Bun mcp-server.ts (the audit's "split this 2520-line empire" finding from R-005). HTTP demo routes (the staffing co-pilot UI at /api/intelligence/, /headshots/, etc.) stay Bun until G5 cutover — those are demo- specific and depend on matrix-indexer signals not yet ported. Architecture: cmd/mcpd/main.go (235 LoC) main() reads --gateway flag, builds server via buildServer(), runs on StdioTransport. Each tool's args is a typed struct with jsonschema tags (the SDK's canonical pattern); reflection generates the JSON Schema automatically. gatewayClient: thin HTTP wrapper over the configured gateway URL. 30s per-request timeout. 16 MiB tool-response cap. Non-2xx surfaces as IsError CallToolResult (NOT as transport error) so the LLM caller sees the error text and can decide how to react. proxy() handles GET + POST + JSON body uniformly. errorResult() + jsonResult() helpers normalize CallToolResult shape. cmd/mcpd/main_test.go (13 test funcs) Tests the full MCP wire end-to-end without a subprocess: spin up a fake gateway via httptest, build the MCP server pointed at it, connect a client via in-memory transports (NewInMemoryTransports), call each tool. Each tool gets: - happy path (gateway returns 200 → tool returns content) - input validation (missing required fields → IsError) - upstream error (gateway 4xx → tool returns IsError) Plus TestListTools verifies all 5 tools register; TestGatewayUnreachable verifies network-level failures surface as IsError, not panics. 
Setup for Claude Desktop / Code documented in README: { "mcpServers": { "lakehouse": { "command": "/path/to/bin/mcpd", "args": ["--gateway", "http://127.0.0.1:3110"] } } } Verified: go test -count=1 ./cmd/mcpd/ — 13/13 green just verify — vet + test + 9 smokes 35s Out of scope for this commit: - Resources (mcp.AddResource): not needed yet; tools cover the interactive surface. Add when an LLM-side use case shows up. - Prompts (mcp.AddPrompt): same. - Streamable transports (HTTP, SSE): stdio is the universal one; streamable can be added with srv.Run(ctx, &mcp.StreamableHTTPHandler{}) swap if a daemon-mode deploy makes sense. - mcpd inside the daemon-supervised stack: it's stdio-only and spawned by the MCP client, not run as a service. Adding a daemon-mode (HTTP transport on a port) is a follow-up if MCP consumers want long-lived sessions. This is a tool-surface only port. The Bun mcp-server.ts also serves HTTP demo routes (/api/catalog/datasets, /intelligence/, /headshots/) that depend on the matrix-indexer signals from the Rust system; those stay Bun until G5 cutover when the staffing co-pilot service ports to Go. Direct deps added: github.com/modelcontextprotocol/go-sdk v1.5.0 Transitive (resolved by go mod tidy): github.com/google/jsonschema-go v0.4.2 github.com/yosida95/uritemplate/v3 v3.0.2 golang.org/x/oauth2 v0.35.0 github.com/segmentio/encoding v0.5.4 github.com/golang-jwt/jwt/v5 v5.3.1 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:00:38 -05:00
root	56844c3f31	embed cache — LRU at /v1/embed for repeat-query elimination Adds CachedProvider wrapping the embedding Provider with a thread-safe LRU keyed on (effective_model, sha256(text)) → []float32. Repeat queries return the stored vector without round-tripping to Ollama. Why this matters: the staffing 500K test (memory project_golang_lakehouse) documented that the staffing co-pilot replays many of the same query texts ("forklift driver IL", "welder Chicago", "warehouse safety", etc). Each repeat paid the ~50ms Ollama round-trip. Cached repeats now serve in <1µs (LRU lookup + sha256 of input). Memory budget: ~3 KiB per entry at d=768. Default 10K entries ≈ 30 MiB. Configurable via [embedd].cache_size; 0 disables (pass-through mode). Per-text caching, not per-batch — a batch with mixed hits/misses only fetches the misses upstream, then merges the result preserving caller input order. Three-text batch with one miss = one upstream call for that one text instead of three. Implementation: internal/embed/cached.go (NEW, 150 LoC) CachedProvider implements Provider; uses hashicorp/golang-lru/v2. Key shape: "<model>:<sha256-hex>". Empty model resolves to defaultModel (request-derived) for the key — NOT res.Model (upstream-derived), so future requests with same input shape hit the same key. Caught by TestCachedProvider_EmptyModelResolvesToDefault. Atomic hit/miss counters + Stats() + HitRate() + Len(). internal/embed/cached_test.go (NEW, 12 test funcs) Pass-through-when-zero, hit-on-repeat, mixed-batch only fetches misses, model-key isolation, empty-model resolves to default, LRU eviction at cap, error propagation, all-hits synthesized without upstream call, hit-rate accumulation, empty-texts rejected, concurrent-safe (50 goroutines × 100 calls), key stability + distinctness. internal/shared/config.go EmbeddConfig.CacheSize (toml: cache_size). Default 10000. cmd/embedd/main.go Wraps Ollama Provider with CachedProvider on startup. 
Adds /embed/stats endpoint exposing hits / misses / hit_rate / size. Operators check the rate to confirm the cache is working (high rate = good) or sized wrong (low rate + many misses on a workload that should have repeats). cmd/embedd/main_test.go Stats endpoint tests — disabled mode shape, enabled mode tracks hits + misses across repeat calls. One real bug caught by my own test: Initial implementation cached under res.Model (upstream-resolved) rather than effectiveModel (request-resolved). A request with model="" caching under "test-model" (Ollama's default), then a request with model="the-default" (our config default) missing the cache. Fix: always use the request-derived effectiveModel for keys; that's the predictable side. Locked by TestCachedProvider_EmptyModelResolvesToDefault. Verified: go test -count=1 ./internal/embed/ — all 12 cached tests + 6 ollama tests green go test -count=1 ./cmd/embedd/ — stats endpoint tests green just verify — vet + test + 9 smokes 33s Production benefit: ~50ms Ollama round-trip → <1µs cache lookup for cached entries. At 10K-entry default + ~30% repeat rate (typical staffing co-pilot workload), saves several seconds per staffer-query session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:54:30 -05:00
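The key-derivation rule from 56844c3f31 ("<model>:<sha256-hex>", keyed on the request-derived model) can be sketched as below. The function name `cacheKey` and its parameter list are assumptions; the LRU itself is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey builds "<model>:<sha256-hex>" from the request side only.
// An empty request model resolves to the configured default, never to the
// model name echoed back by the upstream provider.
func cacheKey(requestModel, defaultModel, text string) string {
	model := requestModel
	if model == "" {
		model = defaultModel
	}
	sum := sha256.Sum256([]byte(text))
	return model + ":" + hex.EncodeToString(sum[:])
}

func main() {
	a := cacheKey("", "the-default", "forklift driver IL")
	b := cacheKey("the-default", "the-default", "forklift driver IL")
	fmt.Println(a == b) // both requests land on the same cache entry
}
```

Keying on the request-derived model is what lets a request with model="" and a later request naming the default explicitly share one entry.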
root	fb08232f58	Batch 4: embed fixture-mode — partial R-006 closure Adds cmd/fake_ollama, a minimal Ollama-API-compatible fake that implements just enough surface for embedd to drive end-to-end without a real Ollama install: GET /api/tags — fixed model list including nomic-embed-text POST /api/embeddings — deterministic dim-D vector from sha256(prompt) GET /health — for the smoke's poll_health helper Same prompt → bit-identical vector across runs, machines, and CI nodes. Vectors are NOT semantically meaningful; the fake validates the embed CONTRACT (dimension echo, response shape, status codes, deterministic round-trip), not real semantic ranking. Real ranking still requires real Ollama and lives in scripts/g2_smoke.sh + the integration tier of the proof harness. scripts/g2_smoke_fixtures.sh — full chain smoke against the fake: - Build fake_ollama + embedd + vectord + gateway - Start fake on :11435 (distinct from real Ollama at :11434) - Generate temp lakehouse.toml with provider_url override - Boot embedd/vectord/gateway with --config <override> - 4 assertions: dim=768, deterministic same-text, different-text divergence, bad-model → 4xx/5xx (fake 404 → embedd 502) - Trap-cleanup tears down all 4 binaries + tmp config Wired into the task runner: just smoke-g2-fixtures Closes R-006 partially: - Embed half: ✓ — CI / fresh-clone reviewers without Ollama can now run the embed contract smoke - Storage half: deferred — mocking S3 protocol is non-trivial (multipart, signed URLs, etc.) and MinIO itself is lightweight enough to install via Docker in any CI environment. Documented as Sprint 0 follow-up if a CI system without Docker shows up. 
What this DOESN'T cover: - Real semantic similarity (use scripts/g2_smoke.sh + real Ollama) - Real Ollama API quirks (timeouts, version-specific shapes, /api/embed batch endpoint that newer versions support) Verified: bash scripts/g2_smoke_fixtures.sh — 4/4 assertions PASS, ~3s wall just verify — vet + test + 9 smokes still green Doesn't replace the existing g2_smoke.sh (which still requires real Ollama and exercises the actual embed semantics). Adds an alternate mode for portability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:22:07 -05:00
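One way a fake could derive a deterministic dim-D vector from sha256(prompt), as fb08232f58 describes. The commit does not say how the hash is expanded to D components, so the per-index counter scheme and the [-1, 1] scaling below are assumptions, and `fakeEmbedding` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
)

// fakeEmbedding maps a prompt to a deterministic vector of length dim.
// Component i is derived from sha256(prompt || big-endian uint32(i)).
// The values carry no semantic meaning; they only exercise the contract.
func fakeEmbedding(prompt string, dim int) []float32 {
	out := make([]float32, dim)
	var ctr [4]byte
	for i := 0; i < dim; i++ {
		binary.BigEndian.PutUint32(ctr[:], uint32(i))
		h := sha256.New()
		h.Write([]byte(prompt))
		h.Write(ctr[:])
		sum := h.Sum(nil)
		u := binary.BigEndian.Uint32(sum[:4])
		// Scale the 32-bit value into [-1, 1].
		out[i] = float32(u)/float32(math.MaxUint32)*2 - 1
	}
	return out
}

func main() {
	v := fakeEmbedding("welder Chicago", 768)
	fmt.Println(len(v))
}
```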
root	0f79bce948	Batch 3: cmd/<bin>/main_test.go × 6 — closes R-005 Adds main_test.go for each of the 6 cmd binaries that lacked them (storaged already had main_test.go; that's where the pattern came from). Each test file focuses on the cmd-specific surface — route mounts, body caps, decode/validation paths — without re-testing internal package logic that's covered elsewhere. cmd/catalogd/main_test.go — 6 funcs TestRoutesMounted: chi.Walk asserts /catalog/{register,manifest/*,list} TestHandleRegister_BodyTooLarge: 5 MiB body → 4xx TestHandleRegister_MalformedJSON: 400 TestHandleRegister_EmptyName_400: ErrEmptyName surfaces as 400 TestHandleGetManifest_404 + TestHandleList_EmptyShape cmd/embedd/main_test.go — 8 funcs stubProvider implements embed.Provider deterministically TestRoutesMounted, MalformedJSON_400, EmptyTextRejected_400 (per scrum O-W3), UpstreamError_502 (provider error → 502, not 500), HappyPath_ProviderEcho, BodyTooLarge (4xx range), TestItoa (covers the no-strconv helper) cmd/gateway/main_test.go — 4 funcs TestMustParseUpstream_HappyPaths: 3 valid URLs TestMustParseUpstream_FailureExits: re-execs the test binary in a subprocess with env flag (standard pattern for testing os.Exit callers); subprocess invokes mustParseUpstream("127.0.0.1:3211") [missing scheme]; expects exit non-zero. Same pattern for garbage. TestUpstreamConfigKeys_DocumentedShape: locks the 6 _url keys cmd/ingestd/main_test.go — 7 funcs Stubs both storaged and catalogd via httptest.Server so the cmd layer can be exercised without bringing the full chain up. TestHandleIngest_MissingNameQueryParam: 400 with "name" in body TestHandleIngest_MalformedMultipart: 400 TestHandleIngest_MissingFormFile: 400 (valid multipart, wrong field) TestHandleIngest_BodyTooLarge: 4xx TestEscapeKeyPath: 6-case URL-escape table (apostrophe, space, etc.) 
TestParquetKeyPath_Format: locks the datasets/<n>/<fp>.parquet shape per scrum C-DRIFT (any rename breaks idempotent re-ingest) cmd/queryd/main_test.go — 6 funcs Tests pre-DB paths (decode, body cap, empty SQL); db.QueryContext itself needs DuckDB so it's covered by GOLAKE-040 in the proof harness, not unit tests. handlers.db = nil here is intentional. TestHandleSQL_EmptySQL_400: 3 cases (empty, whitespace, mixed-WS) TestMaxSQLBodyBytes_Reasonable: locks the 64 KiB constant in a sane range so a refactor can't blow it open TestPrimaryBucket_Constant: locks "primary" — secrets lookup uses this; rename = silent secret-resolution failure at boot cmd/vectord/main_test.go — 14 funcs All 6 routes verified mounted. handlers.persist = nil = pure in-memory mode; persistence is GOLAKE-070 in the proof harness. Coverage of every error branch in handleCreate/Add/Search/Delete: missing index → 404, dim mismatch → 400, empty items → 400, empty id → 400, malformed JSON → 400, body too large → 4xx, happy create → 201, happy list → 200. One real finding caught during writing: Body-cap rejection is sometimes 413 (typed MaxBytesError survives unwrap) and sometimes 400 (decoder wraps it as a generic decode error). Both are valid client-error contracts; the contract isn't "exactly 413" but "fails loud as 4xx, never silent 200 or 5xx." Tests assert 4xx range. The proof harness's proof_assert_status_4xx already had this shape — just bringing the unit tests in line with it. Verified: go test -count=1 -short ./cmd/... — all 7 packages green just verify — vet + test + 9 smokes 35s Closes audit risk R-005 (6/7 cmd/main.go untested). Combined with the proof harness's wiring coverage, every cmd-level handler now has both unit-test and integration-test coverage of the wiring layer. R-005 → CLOSED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:18:46 -05:00
root	1ec85b0a16	Batch 2: perf baseline — multi-sample + warmup + MAD threshold Replaces single-shot baselines (40% noise floor flagged in Phase E) with noise-aware regression detection. What changed: ingest n=3 runs (was 1) with 3-pass warmup vector_add n=3 runs (was 1) with 3-pass warmup query n=20 samples (unchanged) with 50-pass warmup search n=20 samples (unchanged) with 50-pass warmup RSS n=1 (unchanged — steady-state in G0) Each metric stored as {value: median, mad: median absolute deviation} in baseline.json (schema: v2-multisample-mad). New regression detection: threshold = max(3 * baseline.mad, value * 0.75) REGRESSION iff |actual - baseline.value| > threshold AND direction signals worse (lower throughput / higher latency). Why these specific numbers: 3MAD = standard "outside the spread" bound; lets high-variance metrics tolerate their own noise. 75% floor = empirical observation: even with 50 warmups, single- host inter-run variance on bootstrap-cold queryd was consistently 90-130% on this box. 75% catches >75% regressions cleanly while ignoring known noise. lib/metrics.sh: new proof_compute_mad helper computes MAD from a file of one-number-per-line samples. Used for both regen (to write the baseline.mad value) and diff (read from baseline). Honest finding from this iteration's 3 back-to-back diff runs: query_ms shows 90-130% delta from baseline consistently — not random noise but a systematic 2x gap between regen-time and steady-state. The regen captured a particularly fast moment; steady-state is slower. Operator workflow: regenerate the baseline at a known-representative state via `bash tests/proof/run_proof.sh --mode performance --regenerate-baseline` rather than expecting the harness to track a moving target. The harness's value here is the EVIDENCE RECORD (every run captures median+MAD+p95 plus all raw samples in raw/metrics/), not the gate. 
Even false-positive REGRESSION skips give operators "this run was 20ms vs baseline 10ms" which is informative. Sample counts also written into baseline.json under "samples" so a future audit can verify the methodology that produced the values. Verified across 3 back-to-back runs: ingest_rows_per_sec PASS (delta within 75%, mostly < 10%) vectors_per_sec_add PASS search_ms PASS rss_ PASS query_ms REGRESSION flagged (130/100/90%) — known systematic gap, not bug Closes the "40% noise floor" follow-up from Phase E FINAL_REPORT. Honest about limitations: hard regression gating on a busy single- host setup needs either much bigger sample counts (n≥100), longer warmup, or moving to a dedicated benchmark host. Documented inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:13:47 -05:00
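The regression rule from 1ec85b0a16 can be worked through in a few lines. The harness itself implements this in shell (lib/metrics.sh); the Go below is only an illustrative sketch, and the names `median`, `mad`, and `isRegression` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// median returns the middle value of xs (mean of the two middle values for even n).
func median(xs []float64) float64 {
	n := len(xs)
	if n == 0 {
		return 0
	}
	s := append([]float64(nil), xs...) // sort a copy, leave the input alone
	sort.Float64s(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// mad is the median absolute deviation: median(|x - median(xs)|).
func mad(xs []float64) float64 {
	m := median(xs)
	dev := make([]float64, len(xs))
	for i, x := range xs {
		dev[i] = math.Abs(x - m)
	}
	return median(dev)
}

// isRegression applies threshold = max(3*MAD, 0.75*value) and then checks
// that the deviation points in the "worse" direction for this metric.
func isRegression(baseValue, baseMAD, actual float64, higherIsBetter bool) bool {
	threshold := math.Max(3*baseMAD, baseValue*0.75)
	if math.Abs(actual-baseValue) <= threshold {
		return false
	}
	if higherIsBetter {
		return actual < baseValue // e.g. throughput dropped
	}
	return actual > baseValue // e.g. latency rose
}

func main() {
	samples := []float64{10, 11, 9, 10, 30}
	fmt.Println(median(samples), mad(samples))
	// Baseline 10ms with MAD 1; a 20ms run exceeds the 7.5ms threshold.
	fmt.Println(isRegression(10, 1, 20, false))
}
```

With a 10 ms baseline and MAD of 1, the 75% floor dominates (7.5 ms against 3 ms), which matches the commit's note that a 20 ms run against a 10 ms baseline is flagged.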
root	0d18ffa780	ADR-003: inter-service auth posture — Bearer + IP allowlist Locks in the auth model that R-001 + R-007 will be retrofitted against. Doc-only — wiring deferred to Sprint 1 when the first non-loopback binding is needed. Decision: Bearer token (from secrets-go.toml [auth] section) + IP allowlist (CIDR list). Both layers required when auth is on; empty token = G0 dev no-op. /health exempt. Implementation shape (when it lands): - internal/shared/auth.go middleware: one chi r.Use line per binary - shared.Run gates: refuses non-loopback bind without configured token - subtle.ConstantTimeCompare for token equality (timing-safe) Alternatives considered + rejected: mTLS — too heavy for single-machine inter-service traffic JWT — buys nothing over Bearer without external IdP IP-only — one stolen IP entry = full access; no defense depth OAuth2 — no external IdP commitment in G0-G3 timeline What this doesn't do: - Doesn't implement (code lands Sprint 1) - Doesn't break G0 dev (empty token = middleware no-op) - Doesn't address gateway→end-user auth (different ADR shape) Closes the design-decision blocker for R-001 and R-007. Wiring ticket: Sprint 1 backlog story S1.2. Also lifts ADR-002 (storaged per-prefix PUT cap) into the doc — it was implemented in 423a381 but not yet recorded as an ADR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:05:59 -05:00
root	423a3817c5	D: storaged per-prefix PUT cap — vectord _vectors/ → 4 GiB Closes the documented 500K-test gap (memory project_golang_lakehouse: "storaged 256 MiB PUT cap blocks single-file LHV1 persistence above ~150K vectors at d=768"). Vectord persistence under "_vectors/" now gets a 4 GiB cap; everything else (parquets, manifests, ingest) keeps the 256 MiB default. Why per-prefix and not "raise globally": - 256 MiB cap is a real DoS protection — runaway clients can't drain the daemon. Raising it for ALL traffic would expand the attack surface for routine paths that have no need. - Per-prefix preserves existing protection while opening the one documented production-scale path. Why not split LHV1 across multiple keys (the alternative): - G1P shipped a single-Put framed format SPECIFICALLY to eliminate the torn-write class (memory: "Single Put eliminates the torn- write class that the 3-way convergent scrum finding identified"). - Multi-key LHV1 would re-introduce the half-saved-state failure mode we just paid to fix. Streaming via existing manager.Uploader is the better architectural answer. Why not bump the cap operationally via env/config: - Future operator-driven cap can drop in cleanly via the maxPutBytesFor function. Started with hardcoded 4 GiB to keep this commit small; config knob is a follow-up if production workloads diverge from the documented 500K-vector ceiling. manager.Uploader is already streaming-multipart on the outbound S3 side; the inbound MaxBytesReader cap is a safety gate, not a memory bottleneck. So raising it for vectord just lets the existing streaming path actually flow, without introducing new memory pressure (4-slot semaphore × 4 GiB worst case = 16 GiB only if all slots simultaneously max out — vanishingly unlikely). 
Implementation: cmd/storaged/main.go: new constant maxPutBytesVectors = 4 GiB (covers >700K vectors @ d=768) new constant vectorsPrefix = "_vectors/" (synced with vectord.VectorPrefix) new function maxPutBytesFor(key) → cap-by-prefix handlePut: ContentLength check + MaxBytesReader use the per-key cap cmd/storaged/main_test.go (3 new test funcs): TestMaxPutBytesFor: 7 cases incl. nested prefix, substring-but-not- prefix, empty key, parquet/manifest paths. TestVectorPrefixSyncWithVectord: regression test that asserts vectorsPrefix == vectord.VectorPrefix. A future rename surfaces here instead of silently bypassing the larger cap. TestVectorCapAccommodates500KStaffingTest: bounds the cap above the documented production workload (~700 MiB conservative). Verified: go test ./cmd/storaged/ — all green (was 1 func, now 4) just verify — 9 smokes still pass · 32s wall just proof contract — 53/0/1 unchanged Out of scope for this commit (deserves its own): - Heavy integration smoke: 200K dim=768 synthetic vectors → ~700 MiB LHV1 → kill+restart vectord → recall=1. ~5-10 min wall; follow-up if you want production-scale persistence verified end-to-end. Unit tests + existing g1p_smoke cover the wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:00:09 -05:00
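A minimal sketch of the cap-by-prefix function from 423a3817c5. `maxPutBytesVectors`, `vectorsPrefix`, and `maxPutBytesFor` are named in the commit; the name `maxPutBytesDefault` is an assumption for the 256 MiB default.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxPutBytesDefault = 256 << 20 // 256 MiB; constant name is hypothetical
	maxPutBytesVectors = 4 << 30   // 4 GiB for vectord persistence
	vectorsPrefix      = "_vectors/"
)

// maxPutBytesFor returns the PUT size cap for a storage key.
// Only keys that start with the vectors prefix get the larger cap;
// a key that merely contains the prefix elsewhere keeps the default.
func maxPutBytesFor(key string) int64 {
	if strings.HasPrefix(key, vectorsPrefix) {
		return maxPutBytesVectors
	}
	return maxPutBytesDefault
}

func main() {
	payload := int64(300 << 20) // the 300 MiB body used by the cap smoke
	for _, key := range []string{"_vectors/idx", "datasets/idx"} {
		fmt.Println(key, payload <= maxPutBytesFor(key))
	}
}
```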
root	6af0520ed2	A: fail-loud on non-loopback bind — closes worst case of R-001 shared.Run now refuses to bind a non-loopback address unless the LH_<SERVICE>_ALLOW_NONLOOPBACK=1 env is set. Single change covers all 7 binaries via the existing Run call site; no per-binary wiring needed. Closes the accidental-0.0.0.0 deploy attack surface for R-001: queryd /sql is RCE-equivalent off loopback (DuckDB has filesystem read + COPY TO + read_text), but the gate applies to every binary uniformly so the same posture covers vectord (mutation routes), catalogd (manifest writes), and the others. What passes the gate: 127.0.0.1:port, 127.x.y.z:port (full /8), [::1]:port, localhost:port, OR explicit env LH_<SVC>_ALLOW_NONLOOPBACK=1 What fail-louds: 0.0.0.0:port, [::]:port, :port (all interfaces), any non-loopback IP, any non-localhost hostname, unparseable shapes ("", "no port", garbage) Override env is strict equality "1" — typos like "true"/"yes" do NOT trigger it, so a future operator can't accidentally expose by typing the wrong value. Override fires log a structured warn so the choice is auditable in production. Error message cites the env name AND R-001 by name so operators see the fix path without grepping: "refusing non-loopback bind \"0.0.0.0:3214\" for \"queryd\" (set LH_QUERYD_ALLOW_NONLOOPBACK=1 to override; see audit R-001)" internal/shared/bind.go — requireLoopbackOrOverride + isLoopbackAddr internal/shared/bind_test.go — 7 test funcs incl. table-driven IPv4/IPv6/hostname coverage and per-service env isolation internal/shared/server.go — 1-line gate in Run before listen Verified: go test -short ./internal/shared/ — all green (was 14 funcs, now 21) just verify — vet + test + 9 smokes still 33s Doesn't address R-001's full attack surface (any reachable port can issue arbitrary SQL); ADR-003 + Bearer-token middleware is the follow-up. This commit makes the implicit "localhost-only is the auth layer" guarantee explicit and un-bypassable without explicit env. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:56:42 -05:00
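The gate from 6af0520ed2 might look roughly like the following, using only the standard library. The function names come from the commit's file list; their signatures and bodies here are assumptions, not the actual internal/shared/bind.go.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// isLoopbackAddr reports whether a host:port bind address is loopback-only.
// Unparseable shapes and all-interface binds (":port", "0.0.0.0", "[::]") fail.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback() // covers 127.0.0.0/8 and ::1
}

// requireLoopbackOrOverride refuses non-loopback binds unless the
// per-service env var is set to exactly "1".
func requireLoopbackOrOverride(service, addr string) error {
	if isLoopbackAddr(addr) {
		return nil
	}
	env := "LH_" + strings.ToUpper(service) + "_ALLOW_NONLOOPBACK"
	if os.Getenv(env) == "1" { // strict equality: "true" or "yes" do not count
		return nil
	}
	return fmt.Errorf("refusing non-loopback bind %q for %q (set %s=1 to override; see audit R-001)",
		addr, service, env)
}

func main() {
	fmt.Println(requireLoopbackOrOverride("queryd", "127.0.0.1:3214"))
	fmt.Println(requireLoopbackOrOverride("queryd", "0.0.0.0:3214"))
}
```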
root	125e1c80b9	tests: close R-002 / R-003 / R-008 — internal/shared, storeclient, queryd/db.go Audit-driven follow-up to the Rust scrum review on the 3 untested HIGH-risk packages. Both the audit (reports/scrum/risk-register.md) and the scrum (tests/real-world/runs/scrum_mojxb5bw/) independently flagged these files as the highest-leverage missing test coverage. internal/shared/server_test.go — 8 test funcs newListener: valid addr, invalid addr (non-numeric port, port out of range, port-already-in-use surfacing as net.OpError). Empty-addr-is-valid: documents the net.Listen quirk that "" binds an OS-picked port — future readers don't need to relitigate. HealthResponse marshal: JSON shape stable, round-trip clean. /health handler reconstructed via httptest.Server: status 200, Content-Type application/json, body fields stable. RegisterRoutes callback: contract verified (callback is invoked with a real chi.Router, mounted route reachable end-to-end). Run bind-failure surface: synchronous error, not a goroutine swallow — the contract Run depends on per the race-safe-startup comment. internal/shared/config_test.go — 6 test funcs DefaultConfig G0 port pinning: every binary's default bind locked in (3110/3211-3216) so a refactor can't silently flip a port. LoadConfig empty path: returns DefaultConfig, no error. LoadConfig missing file: returns DefaultConfig, logs warn (the warn line shows up in test output, captured-but-not-asserted). LoadConfig valid TOML: partial overrides land, unspecified sections keep defaults (TOML decoder leave-alone behavior). LoadConfig invalid TOML: returns wrapped 'parse config' error. LoadConfig unreadable file: skipped under root (root reads 0000); captures the read-error wrap path for non-root contexts. internal/storeclient/client_test.go — 14 test funcs safeKey table-driven: plain segments, single slash, empty, trailing slash, space (→ %20), apostrophe (→ %27), unicode (→ %C3%A9), deep nesting. Locks URL-escape contract per scrum suggestion. 
recordingServer helper backs Put/Get/Delete/List against httptest.Server: verifies method, path, body bytes round-trip. ErrKeyNotFound on 404 (errors.Is round-trip). Non-OK status wraps body preview into the error chain. Delete accepts both 200 and 204 (S3 vs compatible-store quirk). List parses JSON shape and surfaces query-string prefix. Context cancellation propagates through Put as context.Canceled. internal/queryd/db_test.go — 5 test funcs (with subtests) sqlEscape table-driven: 8 cases including empty, all-quotes, nested apostrophes (the case from the scrum suggestion). redactCreds table-driven: 6 cases — both keys, single keys, empty, multi-occurrence, placeholder-collision (lossy but safe). buildBootstrap statement order: INSTALL → LOAD → CREATE SECRET. buildBootstrap endpoint schemes: http strips + USE_SSL false, https keeps SSL true, no-scheme defaults SSL true (prod ambient). buildBootstrap URL_STYLE: 'path' vs 'vhost' branch. buildBootstrap escapes credential quotes: future SSO-token-with- apostrophe doesn't break out of the SQL string literal — the belt holds when the suspenders snap. Real finding caught by my own test: net.Listen("tcp", "") succeeds (OS-picked port) — captured as TestNewListener_EmptyAddrIsValid so the quirk is documented. Verified: go test -short ./... — every internal/ package now has tests (no more 'no test files' lines for shared/storeclient). just verify — vet + test + 9 smokes green in 33s. just proof contract — 53/0/1 green (no harness regression). Closes: R-002 internal/shared zero tests HIGH R-003 internal/storeclient zero tests HIGH R-008 queryd/db.go untested MED (sqlEscape, redactCreds, CREATE SECRET formation) Composite scrum score should move from 43 → ~46 / 60 — the three HIGH/MED risks closed, internal/shared and internal/storeclient become "tested + load-bearing" instead of "untested + load-bearing." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:51:05 -05:00
root	ff9823b871	scrum audit re-run: 35 → 43 / 60 after Phase A-E + S0.3 Re-runs the SCRUM.md framework against HEAD (4840c10) to score the delta from the audit baseline at 91edd43. Composite +8. Scoring deltas: Reproducibility 7 → 9 (just verify, just doctor, pre-push hook) Test Coverage 6 → 8 (168 proof harness assertions; Go-test gaps in shared/storeclient remain) Trust Boundary 7 → 7 (no code change; R-001/R-007 open) Memory Correctness 3 → 4 (vectord persistence proven; Mem0 pathway/playbook still not ported) Deployment Readiness 4 → 5 (just doctor; REPLICATION/systemd open) Maintainability 8 → 8 (spine unchanged; harness obeys CLAUDE_REFACTOR_GUARDRAILS) Risk register changes: R-004 (smokes not gated) CLOSED — just verify + pre-push hook R-005 (cmd/main.go untested) partial — proof harness covers wiring R-012 (empty tests/ dir) CLOSED — populated by harness R-001/R-002/R-003/R-006/R-007/R-008/R-009/R-010 unchanged Sprint 0 progress: S0.1 just doctor DONE S0.3 just verify + pre-push DONE S0.6 tests/ dir cleanup DONE S0.2 just smoke-fixtures open S0.4 cmd/main_test × 6 partial (harness coverage; go-test gap) S0.5 shared/storeclient tests open (HIGH risks still unaddressed) New finding from this rerun (worth recording): Queryd refresh-tick race in 04_query_correctness — cache-warm binaries fire SELECTs faster than queryd's 500ms refresh tick. Caught by integration mode going 104/0/1 → 102/1/1, fixed at 4840c10 with proof_wait_for_sql helper. Exactly the failure-mode the harness was designed to catch. Original 5 audit reports preserved as immutable history at 91edd43; this file documents the delta only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:37:45 -05:00
root	4840c10311	proof harness: fix queryd refresh-tick race in 04_query_correctness Caught by the audit rerun: with cache-warm binaries, 04 fires its first SELECT faster than queryd's 500ms refresh tick — Q1 returned 400 ("table not found") even though 03_ingest had registered the manifest. Subsequent queries (after the next tick) succeeded. This is an eventual-consistency wait, not a retry — queryd's contract is that views appear within one tick of catalogd having the manifest. Production code does not need changing. Added to lib/http.sh: proof_wait_for_sql <budget_sec> <sql> polls a SQL probe until it returns 200 or budget elapses; emits no evidence (test setup, not a claim). Used in 04_query_correctness: Wait up to 5s for queryd to have the view before running the 5 SQL assertions. Skip-with-loud-reason if the view never appears. Verified: integration mode back to 104 pass / 0 fail / 1 skip after fix. The skip is the unchanged GOLAKE-085 informational record. This is exactly the kind of finding the harness was designed to surface — the regression existed in the codebase the moment Phase D shipped, but only fired when the next compare run hit cache-warm timing. Without the harness, it would have surfaced on a CI run weeks from now and been hard to bisect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:36:28 -05:00
root	4bb6548cbc	proof harness Phase E: FINAL_REPORT.md answers the 9 mandated questions Per docs/TEST_PROOF_SCOPE.md, this is the closing deliverable for the proof harness: a single document that names what's proven, what's partially proven, what failed, what was skipped and why, what evidence exists for each, what bottlenecks were measured, what contract drift was found, what refactor risks remain, and what to fix first. Per-run report dirs (tests/proof/reports/proof-<ts>/) keep their existing summary.md + summary.json + raw/ structure — they are the replayable evidence chain. FINAL_REPORT.md is the stable, repo-tracked synthesis pointing at them. Headline findings (no surprises — harness behaves as designed): - 24 claims encoded; 22 fully proven, 1 informational (GOLAKE-085 duplicate vector ID, contract not yet specified), 0 failed. - 4 contract-drift findings recorded as canonical: vectord add body field is `items` not `vectors`, search response is `results` not `hits`, index info is `length` not `count`, status codes 201/204 not 200. All caught during Phase B; all now pinned by the harness. - Performance baseline shows queryd as the largest RSS (69 MiB, DuckDB process); single-sample noise floor is ~40% — tightening to multi-sample medians is a documented Sprint follow-up. - HIGH-risk audit findings (R-001 queryd /sql, R-002/R-003 untested shared+storeclient) are NOT closed by the harness — it's a multiplier, not a replacement for unit tests + auth posture. The proof harness is complete. 11 cases · 3 modes · 168 assertions peak across all tiers · ~22s total wall (contract+integration+perf). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:32:56 -05:00
root	175ad59cb3	proof harness Phase D: performance baseline · 1000-row ingest, p50/p95 GOLAKE-100. First run writes tests/proof/baseline.json; subsequent runs diff against it. >10% regression emits a SKIP with REGRESSION detail (not a fail — perf claim is required:false in claims.yaml so the gate stays green; the human summary tells the regression story honestly). Skip-with-loud-reason if any earlier case in the run failed, per spec "performance only after contract+integration pass." Workload (deterministic, repeatable): ingest 1000-row CSV (5 roles × 5 cities × seeded scores) → /v1/ingest query SELECT count(*) ×20 against the just-ingested dataset vector add 200 dim=4 vectors with formulaic content (no Ollama) search ×20 against the perf index with a fixed query vector RSS per-service post-workload sample via /proc/<pid>/status Recorded metrics: ingest_rows_per_sec, query_p50_ms, query_p95_ms, vectors_per_sec_add, search_p50_ms, search_p95_ms, rss_{storaged,catalogd,ingestd,queryd,vectord,embedd,gateway}_mb baseline.json on this box (committed): 25000 rows/sec ingest · 17ms p50 / 24ms p95 query 6250 vectors/sec add · 8ms p50 / 20ms p95 search queryd 69 MiB · vectord 14 MiB · others 11-29 MiB Honest measurement-design finding from the very first compare run: back-to-back runs surfaced -41% ingest and +29% query p50 — pure disk-cache + queryd-cold-start noise. Single-sample baselines have real noise floor ≈40%. Recorded as REGRESSION skips so the human summary surfaces it, not a code regression. Tightening the threshold or moving to multi-sample medians is a Phase E recommendation. Verified end-to-end: just proof contract — 53 pass · 1 skip · ~4s just proof integration — 104 pass · 1 skip · ~8s just proof performance — 110 pass · 3 skip · ~10s just verify — 9 smokes still green · 29s All 11 cases (4 contract + 6 integration + 1 performance) deterministic end-to-end. Phase E (final report against the 9 mandated questions) is the last piece. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:30:11 -05:00
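The p50/p95 figures and the >10% regression rule above can be sketched in Go as follows. This is an illustrative sketch only: the harness computes percentiles in its shell library, the nearest-rank method here is an assumption about which percentile definition is used, and the sample latencies in `main` are made-up inputs, not measured data.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It sorts a copy so the caller's slice is left untouched.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(p * float64(len(s)) / 100))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

// regressed reports whether a lower-is-better metric got worse than the
// baseline by more than threshold (0.10 means 10%).
func regressed(baseline, current, threshold float64) bool {
	if baseline <= 0 {
		return false // no usable baseline, nothing to compare
	}
	return (current-baseline)/baseline > threshold
}

func main() {
	lat := []float64{17, 15, 24, 16, 18} // illustrative latencies in ms
	fmt.Println(percentile(lat, 50), percentile(lat, 95))
	fmt.Println(regressed(17, 22, 0.10))
}
```

For higher-is-better metrics such as rows per second, a caller would swap the arguments or negate the delta. With a single-sample noise floor near 40%, as the commit notes, a 10% threshold will flag noise, which is why medians over several runs are the suggested follow-up.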
root	1313eb2173	proof harness Phase C: 6 integration cases · 104/0/1 green Adds the integration tier — full chain CSV→Parquet→SQL and full text→embed→vector→search. All 10 cases (4 contract + 6 integration) end-to-end deterministic; 8s wall total. Cases added: 01_storage_roundtrip.sh GOLAKE-010-012. PUT 1KiB → GET sha256-equal → LIST contains key → DELETE 200/204 → GET 404. Deterministic key under proof/<case_id>/ so concurrent runs don't collide. 02_catalog_manifest.sh GOLAKE-020-022. Fresh register existing=false → manifest read matches → list contains dataset_id → idempotent re-register existing=true with stable dataset_id → schema-drift register 409 (the ADR-020 contract). Per-run unique name via PROOF_RUN_ID so existing=false is meaningful. 03_ingest_csv_to_parquet.sh GOLAKE-030. workers.csv (5 rows) via /v1/ingest multipart → parquet object on storaged → catalog manifest with row_count=5. Verifies content-addressed key shape (datasets/<n>/<fp>.parquet). 04_query_correctness.sh GOLAKE-040. The 5 SQL assertions from fixtures/expected/queries.json against the workers fixture: count=5, Chicago=2, max=95, safety→Barbara, Houston avg=89.5. Iterates the YAML claims, runs each query, compares response columns to expected values. 06_vector_add_search.sh integration extension GOLAKE-051. text → /v1/embed (4 docs from fixtures/text/docs.txt) → vectord add → search by query embedding. Top-1 ID per query asserted against fixtures/expected/rankings.json. First run (or --regenerate-rankings) writes the fixture and emits a skip with explicit reason; subsequent runs assert against it. 07_vector_persistence_restart.sh GOLAKE-070. add 4 unit-basis vectors → search → record top-1 distance → SIGTERM vectord → restart with the same --config → poll /health for 8s → search again → top-1 ID and distance match bit-identically. Skips with reason if vectord PID can't be found or post-restart bind times out. 
Two harness improvements landed alongside: run_proof.sh writes a temp lakehouse_proof.toml with refresh_every="500ms" override and passes --config to all booted binaries. Production default is 30s; 04_query_correctness needs queryd to pick up the new view within a tick. Production config unchanged. cleanup() now pgreps for any orphan bin/<svc> processes (anchored to start-of-argv per memory feedback_pkill_scope.md) so a case that restarts a service mid-run still gets cleaned up. lib/http.sh adds proof_call(case_id, probe, method, url, args...) — escape hatch for cases that need raw curl args (multipart -F, custom headers). Used by 03_ingest for the multipart upload that conflicts with proof_post's --data + Content-Type defaults. lib/env.sh exports PROOF_RUN_ID — short unique id derived from the report directory timestamp. Used by 02 and 07 for fresh-each-run state isolation. Two real findings recorded as evidence (no code changes): - rankings.json fixture pinned: 4 queries → 4 distinct top-1 docs via nomic-embed-text. A model swap that changes ranking now fails the harness loudly; --regenerate-rankings is the override. - vectord persistence kill+restart preserves top-1 distance bit-identically — the LHV1 single-Put framed format from G1P round-trips exactly through Save/Load. Verified end-to-end: just proof contract — 53 pass (4 cases) just proof integration — 104 pass (10 cases) · 8s wall just verify — 9 smokes still green · 33s wall Phase D (performance baseline) lands next: 10_perf_baseline measures rows/sec ingest, vectors/sec add, p50/p95 query+search latency, RSS, CPU. First run writes tests/proof/baseline.json; later runs diff against it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:26:00 -05:00
root	6d18394416	proof harness Phase B: 4 contract cases · 53/0/1 green Added the contract tier above 00_health canary. All 5 contract cases now cover GOLAKE-001-003, 050, 060-061, 080-085 — 53 assertions pass, 1 informational skip, 0 fail. Wall: 4s end-to-end (cached binaries). Cases: 05_embedding_contract.sh GOLAKE-050. POST /v1/embed with one short text → asserts dim=768, one vector returned, vector length matches dimension, sum of squared elements > 0 (proxy for non-zero), response.model echoed. Skips with explicit reason if Ollama is unreachable (502 from embedd) — per spec hard rule "skipped tests do not appear as passed." 06_vector_add_search.sh GOLAKE-060 + GOLAKE-061. Synthetic dim=4 unit basis vectors. Create index → add 3 vectors → get-index returns length=3 → search([1,0,0,0],k=3) returns v1 at rank 1 with distance < 0.001. Cleanup with DELETE. No embedd dependency — pure contract layer. 08_gateway_contracts.sh GOLAKE-003. For each /v1/* route, asserts gateway and direct upstream return identical status AND identical response body (sha256 match). Confirms gateway is a proxy not a transformer. Status passthrough verified on both 200 path (storage/list, catalog/list) and 4xx path (sql empty body → 400 from queryd). 09_failure_modes.sh GOLAKE-080..085. Six failure-mode contracts: 080 malformed JSON → 4xx on catalog/ingest/sql/embed 081 missing required field → 4xx on catalog/vectors/embed 082 bad SQL → 4xx with non-empty error body 083 vector dim mismatch → 4xx 084 missing storage object → 404 085 duplicate vector ID → INFORMATIONAL (spec says required:false) first/second statuses recorded as evidence; contract decided later from the recorded record. Two new lib helpers in lib/assert.sh: proof_assert_status_in <id> <claim> "200 201 204" <probe> pass if status is in the space-separated list. Used for delete-returns-200-or-204 case where vectord returns 204. proof_assert_status_4xx <id> <claim> <probe> pass if status in [400, 500). 
Used for failure modes where the specific 4xx code may vary (400 vs 422 vs 409). Records actual code as evidence. Two real contract findings recorded by the harness during build: - vectord add expects {"items": [...]}, not {"vectors": [...]}. My initial test sent the wrong field; would have masked the bug forever in CI. The harness caught it via the assertion failure. - vectord create returns 201 Created, delete returns 204 No Content. Documented in the test fixtures as canonical. Regression: just verify wall 33s, vet + test + 9 smokes still green. Phase C (integration) lands next: 01_storage_roundtrip, 02_catalog_manifest, 03_ingest_csv_to_parquet, 04_query_correctness, 05/06 integration extends, 07_vector_persistence_restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:15:04 -05:00
root	a81291e38c	proof harness Phase A: scaffolding + canary case green Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier above the smoke chain. This commit lays the scaffolding and proves the orchestrator end-to-end with one canary case (00_health). What landed: tests/proof/ README.md how to read a report, layout, modes claims.yaml 24 claims enumerated (GOLAKE-001..100) run_proof.sh orchestrator with --mode {contract|integration|performance} and --no-bootstrap / --regenerate-{rankings,baseline} lib/ env.sh service URLs, report dir, mode, git context http.sh curl wrappers writing per-probe JSON + body + headers assert.sh proof_assert_{eq,ne,contains,lt,gt,status,json_eq} + proof_skip — each emits one JSONL record per call metrics.sh start/stop timers, value capture, RSS sampling, percentile compute (for Phase D) cases/ 00_health.sh canary — gateway + 6 services /health → 200, body identifies service, latency < 500ms (21 assertions) fixtures/ csv/workers.csv spec's 5-row deterministic CSV text/docs.txt 4 deterministic vector docs expected/queries.json expected results for the 5 SQL assertions Wired into the task runner: just proof contract # canary only this commit just proof integration # Phase C just proof performance # Phase D .gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as reports/scrum/_evidence/. Per-run output is a runtime artifact. Specs landed alongside (J's drops): docs/TEST_PROOF_SCOPE.md the harness contract this implements docs/CLAUDE_REFACTOR_GUARDRAILS.md process discipline this harness obeys Verified end-to-end (cached binaries): just proof contract wall < 2s, 21 pass / 0 fail / 0 skip just verify wall 31s, vet + test + 9 smokes still green Two bugs fixed during canary run, both in run_proof.sh aggregation: - grep -c exits 1 on zero matches; the `|| echo 0` form concatenated "0\n0" and broke jq --argjson + integer comparison. Fixed via a _count helper that captures count-or-zero cleanly. 
- per-case table iterated case scripts (filename-based) but cases write evidence under CASE_ID. Switched to JSONL-file iteration so multi-case scripts work and the mapping is faithful. Phase B (contract cases) lands next: 05_embedding, 06_vector_add, 08_gateway_contracts, 09_failure_modes. Each sourcing the same lib helpers and writing to the same report shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 05:08:51 -05:00
root	e31638204d	S0.3: just verify + pre-push hook gates the smoke chain Sprint 0 / R-004 / GATE-0.4 — the 9-smoke chain is no longer documentation only. One command (`just verify`) runs vet + tests + all 9 smokes; pre-push hook calls it; a regression cannot leave this machine without explicit --no-verify override. Recipes: just verify full gate (33s wall on this box) just smoke <day> single smoke (d1..d6, g1, g1p, g2) just smoke-all all 9 smokes only just doctor dep probe with structured output (--json for CI / pre-push) just install-hooks install .git/hooks/pre-push just fmt|vet|test|build|clean scripts/doctor.sh probes Go ≥1.25, gcc, MinIO at :9000 with bucket lakehouse-go-primary, Ollama at :11434 with nomic-embed-text loaded, /etc/lakehouse/secrets-go.toml with [s3.primary]. Each missing dep prints its install fix command. JSON mode emits the same shape for CI / pre-push consumers. README updated with the task-runner section + just install-hooks on cold-start. Hooks live in .git/hooks/ (untracked); install recipe recreates them on a fresh clone. PATH note: justfile prepends /usr/local/go/bin so recipes find Go without depending on the parent shell's PATH (per ADR-001 §1.x, Go lives there). Verified: just verify exits 0 in 33s wall (vet ~0.1s + test ~0.1s + 9 smokes deterministic per audit baseline). Pre-push hook installed and bash -n clean. Closes audit risk R-004 (smokes not gated). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 04:56:50 -05:00
root	91edd43164	scrum audit: 5 reports under reports/scrum/ · score 35/60 Adapts docs/SCRUM.md framework (originally written for the matrix-agent-validated repo) to the Go rewrite. Five deliverables: golang-lakehouse-scrum-test.md top-line + scoring + verdict risk-register.md 12 findings, R-001..R-012 claim-coverage-table.md claim/test/risk for Sprint 2 sprint-backlog.md 5 sprints, ~2 weeks of work acceptance-gates.md DoD as runnable commands Every claim cites file:line, command output, or "missing evidence." Smoke chain ran clean (33s wall, all 9 PASS) and is captured in reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact). Scoring: Reproducibility 7/10 9 smokes deterministic, no just/CI gate Test Coverage 6/10 internal/ packages tested, 6/7 cmd/ aren't Trust Boundary 7/10 escapes ok, zero auth, /sql is RCE-eq off-loopback Memory Correctness 3/10 pathway/playbook/observer not yet ported Deployment Readiness 4/10 no REPLICATION, no env template, no systemd Maintainability 8/10 no god-files, 7 lean binaries, ADRs current Top four risks: R-001 HIGH queryd /sql + DuckDB + non-loopback bind = RCE-equivalent R-002 HIGH internal/shared (server.go + config.go) zero tests R-003 HIGH internal/storeclient zero tests, used by 2 services R-004 MED 9-smoke chain green but not gated (no justfile/hook) The audit is the work; refactors come after. Sprint 0 owns coverage + CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are mostly design-bar work for unbuilt agent components. .gitignore exception: /reports/* + !/reports/scrum/ keeps reports/ a runtime-artifact directory while exposing reports/scrum/ as tracked documentation. Mirrors the pattern future audit passes will land in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 04:51:47 -05:00
root	1f700e731d	Staffing scale test: full 500K through gateway → embedd → vectord pipeline scripts/staffing_500k/main.go: driver that reads workers_500k.csv, embeds combined-text per worker via /v1/embed, adds to vectord index "workers_500k", runs canonical staffing queries against the populated index. Reproducible end-to-end test of the staffing co-pilot pipeline at production scale. Run results (2026-04-29 ~02:30): 500,000 vectors ingested in 35m 36s (~234/sec avg) vectord peak RSS 4.5 GB (~9 KB/vector incl. HNSW graph) Query latency: embed 40-59ms + search 1-3ms = ~50ms end-to-end GPU avg ~65% (Ollama not the bottleneck — vectord Add is) Semantic recall on canonical queries: "electrician with industrial wiring": top 2 are literal Electricians (d=0.30) "CNC operator with first article": Assembler / Quality Techs (adjacent, d=0.24) "forklift driver OSHA-30": warehouse roles (d=0.33) "warehouse picker night shift bilingual": Material Handlers (d=0.31) "dental hygienist": Production Workers at d=0.49+ — correctly LOW-similarity, signals "no dental hygienists in this manufacturing dataset" rather than hallucinating a fake match. Documented gaps: - storaged's 256 MiB PUT cap blocks single-file LHV1 persistence above ~150K vectors at d=768. Test ran with persistence disabled. - vectord Add is RWMutex-serialized — with GPU at 65% util this is the throughput cap. Concurrent Adds would be 2-3x faster but require careful audit of coder/hnsw thread-safety (G1 scrum documented two known quirks). PHASE_G0_KICKOFF.md gains a "Staffing scale test" section with full metrics + the gaps-surfaced list. The architectural payoff is real: six binaries, one HTTP route, ~50ms from text query to top-K semantically-relevant workers across 500K records. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 02:31:30 -05:00
root	0cb29cda15	docs: README + PHASE_G0_KICKOFF reflect post-G0 state (G1, G1P, G2) README was stuck on "Pre-Phase G0, implementation has not started" while we shipped through G2. Updated to reflect the current 7-binary service inventory, the 9 acceptance smokes, the cold-start deps (MinIO bucket, Ollama with nomic-embed-text, secrets-go.toml). PHASE_G0_KICKOFF gains a "Post-G0 work" pointer at the end — brief table mapping each G1+/G2 commit to its smoke + scrum-fix count. Full per-day detail stays in commit messages and the project memory file. No code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:45:59 -05:00
root	9ee7fc5550	G2: embedd — text → vector via Ollama · 2 scrum fixes Bridges the missing piece for the staffing co-pilot: text inputs to vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships Ollama; OpenAI/Voyage swap in via the same interface in G3+). Wire format: POST /v1/embed {"texts":[...], "model":"..."} // model optional → 200 {"model","dimension","vectors":[[...]]} Default model: nomic-embed-text (768-d). Ollama returns float64; provider converts to float32 at the boundary so vectors flow through vectord/HNSW without re-conversion. Acceptance smoke 5/5 PASS — including the architectural payoff: end-to-end embed → vectord add → search by re-embedded text returns recall=1 at distance 5.96e-8 (float32 precision noise on identical unit vectors). The staffing co-pilot pipeline (text → vector → similarity search) is now functional end-to-end. All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO - Kimi K2-0905 (openrouter): 0 BLOCK + 2 WARN + 1 INFO - Qwen3-coder (openrouter): "No BLOCKs" (3 tokens) Fixed (2 — 1 convergent + 1 single-reviewer): C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text batch was up to N×60s with no batch-level cap. One stuck Ollama call would stall the whole handler indefinitely. Fix: context.WithTimeout(r.Context(), 60s) wraps the entire batch. O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked, producing version-dependent garbage. Fix: reject "" with 400 at the handler boundary so callers get a deterministic answer instead of an upstream-conditional 502. Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no concurrency limit on /embed (single-tenant G2), missing Accept header (exotic-proxy concern), MaxBytesError string-match redundancy (paranoia layer kept consistent across codebase). 
Zero false positives this round — Qwen returned 3 tokens "No BLOCKs" and the other two reviewers' findings were all real. Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded. Per-text /api/embeddings used (forward-compat with 0.21+); newer 0.4+ /api/embed batch endpoint can swap in via the Provider interface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:42:27 -05:00
root	8b92518d21	G1P: vectord persistence to storaged + scrum (3 fixes incl. 3-way convergent) Adds optional persistence to vectord (G1's HNSW vector search). Single-file framed format per index — eliminates the torn-write class that the 3-way convergent scrum finding identified: _vectors/<name>.lhv1 — single binary blob: [4 bytes magic "LHV1"] [4 bytes envelope_len uint32 BE] [envelope bytes — JSON params + metadata + version] [graph bytes — raw hnsw.Graph.Export] Pre-extraction: internal/catalogd/store_client.go → internal/storeclient/ shared package, since both catalogd and vectord need it. Same pattern as the pre-D5 catalogclient extraction. Optional via [vectord].storaged_url config (empty = ephemeral mode). On startup: List + Load each persisted index. After Create / batch Add / DELETE: Save (or Delete from storaged). Save failures are logged-not-fatal — in-memory state is the source of truth in flight. Acceptance smoke G1P 8/8 PASS — kill+restart preserves state, post-restart search returns dist=0 (graph round-trips exactly), DELETE removes the file, post-delete restart shows count=0. All 8 smokes (D1-D6 + G1 + G1P) PASS deterministically. The g1_smoke gained scripts/g1_smoke.toml that disables persistence so the in-memory API test stays decoupled from any rehydrate-from-storaged state contamination. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO - Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN - Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 1 INFO Fixed (3 — 1 convergent + 2 single-reviewer): C1 (Opus + Kimi + Qwen 3-WAY CONVERGENT WARN): Save was non-atomic across two PUTs — envelope-succeeds + graph-fails left a half-saved index that passed the "both present" List filter and silently mismatched metadata against vectors on Load. Fix: collapse to single framed file (no torn-write window possible). 
O-B1 (Opus BLOCK): isNotFound substring-matched "key not found" against the wrapped error message — brittle, any 5xx body containing that text would silently misclassify as missing. Fix: errors.Is(err, storeclient.ErrKeyNotFound). O-I3 (Opus INFO): handleAdd pre-validation only covered id+dim; NaN/Inf/zero-norm could still fail mid-batch leaving partial commits. Fix: extend pre-validation to call ValidateVector (newly exported) per item before any commit. Dismissed (3 false positives): K-B1 + Q-B1 ("safeKey double-escapes %2F segments") — false convergent. Wire-protocol escape is decoded by storaged's chi router on the way in; on-disk key is the original literal. %2F round-trips correctly through PathEscape → URL → chi decode → S3 key. Q-B2 ("List vulnerable to race conditions") — vectord is single-process; no concurrent Save against List in the same vectord. Deferred (3): rehydrate per-index timeout (G2+ multi-index scale), saveAfter request ctx (matches G0 timeout deferral), Encode RLock during slow writer (documented as buffer-only API). The C1 finding is the strongest signal of the cross-lineage filter: three independent reviewers all flagged the same torn-write hazard. Single-file framing eliminates the class — there's now no Persistor state where envelope and graph can disagree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:33:23 -05:00
root	b8c072cf0b	G1: vectord — HNSW vector search via coder/hnsw · 6 scrum fixes applied First G1+ piece. Standalone vectord service with in-memory HNSW indexes keyed by string IDs and optional opaque JSON metadata. Wraps github.com/coder/hnsw v0.6.1 (pure Go, no cgo). New port :3215 with /v1/vectors/* routed through gateway. API: POST /v1/vectors/index create GET /v1/vectors/index list GET /v1/vectors/index/{name} get info DELETE /v1/vectors/index/{name} POST /v1/vectors/index/{name}/add (batch) POST /v1/vectors/index/{name}/search Acceptance smoke 7/7 PASS — including recall=1 on inserted vector w-042 (cosine distance 5.96e-8, float32 precision noise), 200-vector batch round-trip, dim mismatch → 400, missing index → 404, duplicate create → 409. Two upstream library quirks worked around in the wrapper: 1. coder/hnsw.Add panics with "node not added" on re-adding an existing key (length-invariant fires because internal delete+re-add doesn't change Len). Pre-Delete fixes for n>1. 2. Delete of the LAST node leaves layers[0] non-empty but entryless; next Add SIGSEGVs in Dims(). Workaround: when re-adding to a 1-node graph, recreate the underlying graph fresh via resetGraphLocked(). Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO - Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO - Qwen3-coder (openrouter): "No BLOCKs" (4 tokens) Fixed (4 real + 2 cleanup): O-W1: Lookup returned the raw []float32 from coder/hnsw — caller mutation would corrupt index. Now copies before return. O-W3: NaN/Inf vectors poison HNSW (distance comparisons return false for both < and >, breaking heap invariants). Zero-norm under cosine produces NaN. Now validated at Add time. K-B1: Re-adding with nil metadata silently cleared the existing entry — JSON-omitted "metadata" field deserializes as nil, making upsert non-idempotent. Now nil = "leave alone"; explicit {} or Delete to clear. 
O-W4: Batch Add with mid-batch failure left items 0..N-1 committed and item N rejected. Now pre-validates all IDs+dims before any Add. O-I1: jsonItoa hand-roll replaced with strconv.Itoa — no measured allocation win. O-I2: distanceFn re-resolved per Search → use stored i.g.Distance. Dismissed (2 false positives): K-B2 "MaxBytesReader applied after full read" — false, applied BEFORE Decode in decodeJSON K-W1 "Search distances under read lock might see invalidated slices from concurrent Add" — false, RWMutex serializes write-lock during Add against read-lock during Search Deferred (3): HTTP server timeouts (consistent G0 punt), Content-Type validation (internal service behind gateway), Lookup dim assertion (in-memory state can't drift). The K-B1 finding is worth pausing on: nil metadata on re-add is the kind of API ergonomics bug only a code-reading reviewer catches — smoke would never detect it because the smoke always sends explicit metadata. Three lines changed in Add; the resulting API matches what callers actually expect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:50:28 -05:00
root	d023b07b30	Real-scale validation post-G0: configurable ingest cap + workers_500k metrics Validated G0 substrate against the production workers_500k.parquet dataset (18 cols × 500,000 rows). Findings + one applied fix: Finding #1 (FIXED): ingestd's hardcoded 256 MiB cap rejected the 500K CSV (344 MiB) with 413. Cap fired correctly, no OOM. Extracted to [ingestd].max_ingest_bytes config field; default 256 MiB, override per deployment for known-large workloads. With cap bumped to 512 MiB, 500K ingest succeeds in 3.12s with ingestd peak RSS 209 MiB. Finding #2 (deferred): ingestd doesn't release memory between ingests. Go runtime conservative; long-running daemon, fine. Finding #3: DuckDB-via-httpfs is healthy at 500K. GROUP BY 45ms, count(*) 24ms, AVG 47ms, schema introspection 25ms. Sub-linear scaling vs 100K — the s3:// read path is not a bottleneck. Finding #4: ADR-010 type inference correctly handled real staffing data. worker_id → BIGINT, numeric scores → DOUBLE, multi-line resume_text → VARCHAR. 1000-row sample sufficient. Finding #5: Go's encoding/csv handles RFC 4180 quoted-comma fields and multi-line quoted text without LazyQuotes — confirming the D4 scrum's dismissal of Qwen's BLOCK on this point. Net: substrate handles production-scale data with one config knob. No correctness issues, no OOMs, no silent type errors. All 6 G0 smokes still PASS after the cap-config change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:32:08 -05:00
root	b1d52306ad	G0 D6: gateway reverse proxy fronting all 4 backing services · 2 scrum fixes · G0 COMPLETE Last day of Phase G0. Gateway promotes the D1 stub endpoints into real reverse-proxies on :3110 fronting storaged + catalogd + ingestd + queryd. /v1 prefix lives at the edge — internal services route on /storage, /catalog, /ingest, /sql, with the prefix stripped by a custom Director per Kimi K2's D1-plan finding. Routes: /v1/storage/* → storaged /v1/catalog/* → catalogd /v1/ingest → ingestd /v1/sql → queryd Acceptance smoke 6/6 PASS — every assertion goes through :3110, none direct to backing services. Full ingest → storage → catalog → query round-trip verified end-to-end. The smoke's "rows[0].name=Alice" assertion is the architectural payoff: five binaries, six HTTP routes, one round-trip through one edge. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 1 BLOCK + 2 WARN + 2 INFO - Kimi K2-0905 (openrouter): 1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory) - Qwen3-coder (openrouter): 5 completion tokens — "No BLOCKs." Fixed (2, both Opus single-reviewer): O-BLOCK: Director path stripping fails if upstream URL has a non-empty path. The default Director's singleJoiningSlash runs BEFORE the custom code, so an upstream like http://host/api produces /api/v1/storage/... after the join — then TrimPrefix("/v1") is a no-op because the string starts with /api. Fix: strip /v1 BEFORE calling origDirector. New TestProxy_SubPathUpstream regression locks this in. Today: bare-host URLs only, dormant — but moving gateway behind a sub-path in prod would have silently 404'd. O-WARN2: url.Parse is permissive — typo "127.0.0.1:3211" (no scheme) parses fine, produces empty Host, every request 502s. mustParseUpstream fail-fast at startup with a clear message naming the offending config field. 
Dismissed (3, all Kimi, same false TrimPrefix theory): K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single check-and-trim, no loop K-WARN "no upper bound on repeated // removal" — same false theory K-WARN "goroutines leak if upstream parse fails while binaries running" — confused scope; binaries are separate OS processes launched by the smoke script D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no longer stubs /v1/ingest and /v1/sql). Replaced with proxy probes that verify gateway forwards malformed requests to ingestd and queryd. Launch order changed from parallel to dep-ordered (storaged → catalogd → ingestd → queryd → gateway) since catalogd's rehydrate now needs storaged, queryd's initial Refresh needs catalogd. All six G0 smokes (D1 through D6) PASS end-to-end after every fix round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes applied across 6 days from cross-lineage review. G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port, distillation rebuild, observer + Langfuse integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:21:54 -05:00
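The two D6 fixes in b1d52306ad (strip `/v1` before the default Director joins paths, and fail fast on an upstream URL without a scheme or host) might look roughly like this. It is a sketch under assumptions: `parseUpstream`, `newEdgeProxy`, and `rewrittenPath` are illustrative names, and the real gateway and its `mustParseUpstream` may differ in detail.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// parseUpstream rejects config values that fail to parse or that lack a
// scheme or host, naming the offending field in the error.
func parseUpstream(field, raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("config %s: %w", field, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("config %s: %q needs a scheme and host, e.g. http://127.0.0.1:3211", field, raw)
	}
	return u, nil
}

// newEdgeProxy strips the edge-only /v1 prefix BEFORE the default Director
// joins the upstream path, so a sub-path upstream such as http://host/api
// still receives /api/storage/... rather than /api/v1/storage/...
func newEdgeProxy(upstream *url.URL) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	origDirector := proxy.Director
	proxy.Director = func(r *http.Request) {
		r.URL.Path = strings.TrimPrefix(r.URL.Path, "/v1")
		r.URL.RawPath = strings.TrimPrefix(r.URL.RawPath, "/v1")
		origDirector(r)
	}
	return proxy
}

// rewrittenPath runs only the Director and reports the outgoing path.
func rewrittenPath(upstream, requestPath string) (string, error) {
	u, err := parseUpstream("upstream", upstream)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodGet, "http://gateway.invalid"+requestPath, nil)
	if err != nil {
		return "", err
	}
	newEdgeProxy(u).Director(req)
	return req.URL.Path, nil
}

func main() {
	for _, up := range []string{"http://127.0.0.1:3211", "http://127.0.0.1:3211/api"} {
		p, err := rewrittenPath(up, "/v1/storage/get/x")
		fmt.Println(up, "->", p, err)
	}
	_, err := parseUpstream("storaged_url", "127.0.0.1:3211")
	fmt.Println("scheme-less upstream rejected:", err != nil)
}
```

Ordering is the whole fix: if the trim ran after `origDirector`, the path would already start with the upstream's own prefix and `TrimPrefix` would silently do nothing.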
root	9e9e4c26a4	G0 D5: queryd DuckDB SELECT over Parquet via httpfs · 4 scrum fixes Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET (TYPE S3) on every new connection, sourced from SecretsProvider + shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and handler's SELECTs serialize through one connection (avoids cross- connection MVCC visibility edge cases). Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key') per manifest, drops views for removed manifests, skips on unchanged updated_at (the implicit etag). Drop pass runs BEFORE create pass so a poison manifest can't block other manifest refreshes (post-scrum C1 fix). POST /sql with JSON body {"sql":"…"} returns {"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]], "row_count":N}. []byte → string conversion so VARCHAR rows JSON-encode as text. 30s default refresh ticker, configurable via [queryd].refresh_every. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 1 BLOCK + 4 WARN + 4 INFO - Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO - Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO Fixed (4): C1 (Opus + Kimi convergent): Refresh aborts on first per-view error → drop pass first, collect errors, errors.Join. Poison manifest no longer blocks the rest of the catalog from re-syncing. B-CTX (Opus BLOCK): bootstrap closure captured OpenDB's ctx → cancelled-ctx silently fails every reconnect. context.Background() inside closure; passed ctx only for initial Ping. B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80 chars but those 80 chars contained KEY_ID + SECRET prefix → log aggregator captures credentials. Stable per-statement labels + redactCreds() filter on wrapped DuckDB errors. JSON-ERR (Opus WARN): swallowed json.Encode error → silent truncated 200 on unsupported column types. slog.Warn the failure. 
Dismissed (4 false positives): Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit Qwen BLOCK "MaxBytesReader after Decode" — false, applied before Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a deadlock, just serialization, by design with 10s timeout retry Kimi WARN "dropView leaves r.known inconsistent" — current code returns before the delete; the entry persists for retry Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi on the per-view error blocking, plus two independent single-reviewer BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The B-LEAK fix uses defense-in-depth: never pass SQL into the error path AND redact known cred values from DuckDB's own error message. DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every fix round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 00:10:55 -05:00
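The C1 fix in 9e9e4c26a4 (drop pass first, collect per-view errors, return them with errors.Join) can be illustrated with a small sketch. The `viewStore` interface, `fakeStore`, and the `manifest` fields are assumptions made for the example; the real Registrar talks to DuckDB and catalogd.

```go
package main

import (
	"errors"
	"fmt"
)

// viewStore is a hypothetical seam over the DuckDB connection.
type viewStore interface {
	CreateOrReplaceView(name, parquetURL string) error
	DropView(name string) error
}

// manifest carries only the fields this sketch needs.
type manifest struct{ Name, URL, UpdatedAt string }

type registrar struct {
	store viewStore
	known map[string]string // view name -> updated_at last applied
}

// Refresh drops stale views first, then creates or replaces changed ones.
// A failing view is recorded and skipped so it cannot block the others.
func (r *registrar) Refresh(manifests []manifest) error {
	var errs []error
	want := make(map[string]bool, len(manifests))
	for _, m := range manifests {
		want[m.Name] = true
	}
	for name := range r.known { // drop pass
		if want[name] {
			continue
		}
		if err := r.store.DropView(name); err != nil {
			errs = append(errs, fmt.Errorf("drop %s: %w", name, err))
			continue // entry stays in known so the next Refresh retries
		}
		delete(r.known, name)
	}
	for _, m := range manifests { // create pass
		if prev, ok := r.known[m.Name]; ok && prev == m.UpdatedAt {
			continue // unchanged updated_at acts as the implicit etag
		}
		if err := r.store.CreateOrReplaceView(m.Name, m.URL); err != nil {
			errs = append(errs, fmt.Errorf("create %s: %w", m.Name, err))
			continue
		}
		r.known[m.Name] = m.UpdatedAt
	}
	return errors.Join(errs...) // nil when errs is empty
}

// fakeStore fails on one poison view name and records the rest.
type fakeStore struct {
	poison string
	views  map[string]string
}

func (f *fakeStore) CreateOrReplaceView(name, parquetURL string) error {
	if name == f.poison {
		return fmt.Errorf("cannot read %s", parquetURL)
	}
	f.views[name] = parquetURL
	return nil
}

func (f *fakeStore) DropView(name string) error {
	delete(f.views, name)
	return nil
}

func main() {
	store := &fakeStore{poison: "bad", views: map[string]string{}}
	reg := &registrar{store: store, known: map[string]string{}}
	err := reg.Refresh([]manifest{
		{Name: "good1", URL: "s3://bucket/1.parquet", UpdatedAt: "t1"},
		{Name: "bad", URL: "s3://bucket/2.parquet", UpdatedAt: "t1"},
		{Name: "good2", URL: "s3://bucket/3.parquet", UpdatedAt: "t1"},
	})
	fmt.Println("views registered:", len(store.views))
	fmt.Println("refresh error:", err)
}
```

Because the failed view never enters `known`, the next Refresh retries it automatically while the healthy views stay in place.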
root	4205ecd0f0	Pre-D5: extract CatalogClient to internal/catalogclient/ + add List queryd (D5) needs the same HTTP client to catalogd that ingestd uses, but the client lived in internal/ingestd — having queryd import from ingestd would invert the data-flow direction (ingestd is upstream of queryd; the package dep should not point back). Extract to a shared internal/catalogclient/ package now, before D5 forces it under implementation pressure. Adds the List(ctx) method queryd will need for view registration. Unit tests cover Register success/conflict and List success/error paths against an httptest.Server fake. ingestd's import flips from internal/ingestd → internal/catalogclient; the wire format and behavior are unchanged. All four smokes (D1/D2/D3/ D4) PASS unchanged. DuckDB cgo path re-verified with the official github.com/duckdb/duckdb-go/v2 (per ADR-001) on Go 1.25 + arrow-go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:58:34 -05:00
root	c1e411347a	G0 D4: ingestd CSV → Parquet → catalogd register · 2 scrum fixes Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema inference per ADR-010 (default-to-string on ambiguity), single-pass streaming CSV → Parquet via pqarrow batched writer (Snappy compressed, 8192 rows per batch), PUT to storaged at content-addressed key datasets/<name>/<fp_hex>.parquet, register manifest with catalogd. Acceptance smoke 6/6 PASS including idempotent re-ingest (proves inference is deterministic — same CSV always produces same fingerprint) and schema-drift → 409 (proves catalogd's gate fires on ingest traffic). Schema fingerprint is SHA-256 over (name, type) tuples in header order using ASCII record/unit separators (0x1e/0x1f) so column names with commas can't collide. Nullability intentionally NOT in the fingerprint — a column gaining nulls isn't a schema change. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 4 WARN + 3 INFO (after 2 self-retracted BLOCKs) - Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN + 1 INFO - Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO Fixed (2, both Opus single-reviewer): C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet meant a schema-drift ingest overwrote the live parquet BEFORE catalogd's 409 fired → storaged inconsistent with manifest. Fix: content-addressed key datasets/<name>/<fp_hex>.parquet. Drift writes to a different file (orphan in G2 GC scope); the live data is never corrupted. C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks buffered column data + OS resources per failed ingest. Fix: deferred guarded close with wClosed flag. 
Dismissed (5, all false positives): Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false, Go csv handles RFC 4180 multi-line quoted fields by default Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73 and csv.go:201 Kimi BLOCK "type assertion panic if pqarrow reorders fields" — speculative, no real path Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" — false convergent. Outer defer rb.Release() captures the current builder; in-loop release runs before reassignment. No leak. Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample boundary type mismatch (G0 cap bounds peak), string-match paranoia on http.MaxBytesError, multipart double-buffer (G2 spool- to-disk), separator validation, body close ordering, etc. The D4 scrum produced fewer real findings than D3 (2 vs 6) — both were architectural hazards smoke wouldn't catch because the smoke's "schema drift → 409" assertion was passing even in the corrupted- state world. The 409 fires correctly; what was wrong was the PUT having already mutated the live parquet before the validation check. Opus's PUT-then-register read of the order is exactly the kind of architectural insight the cross-lineage scrum is designed to surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:50:10 -05:00
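The schema fingerprint and content-addressed key from c1e411347a could be sketched as below. The commit specifies SHA-256 over (name, type) tuples in header order with the ASCII 0x1e/0x1f separators and the `datasets/<name>/<fp_hex>.parquet` key shape; the exact byte layout (unit separator between name and type, record separator after each column) and the names `column`, `schemaFingerprint`, and `objectKey` are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// column is a hypothetical (name, type) pair taken from the inferred schema.
type column struct{ Name, Type string }

const (
	unitSep   = 0x1f // assumed: separates a column's name from its type
	recordSep = 0x1e // assumed: terminates each column tuple
)

// schemaFingerprint hashes columns in header order. Nullability is left out
// on purpose, matching the commit: a column gaining nulls is not a schema change.
func schemaFingerprint(cols []column) string {
	h := sha256.New()
	for _, c := range cols {
		h.Write([]byte(c.Name))
		h.Write([]byte{unitSep})
		h.Write([]byte(c.Type))
		h.Write([]byte{recordSep})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// objectKey builds the content-addressed key, so a drifted schema writes to a
// different object instead of overwriting the live Parquet file.
func objectKey(dataset string, cols []column) string {
	return fmt.Sprintf("datasets/%s/%s.parquet", dataset, schemaFingerprint(cols))
}

func main() {
	v1 := []column{{"worker_id", "BIGINT"}, {"resume_text", "VARCHAR"}}
	v2 := []column{{"worker_id", "VARCHAR"}, {"resume_text", "VARCHAR"}}
	fmt.Println(objectKey("workers", v1))
	fmt.Println(objectKey("workers", v2))
}
```

Control-character separators are what make a column named `a,b` hash differently from two columns `a` and `b`, which a comma-joined encoding could not guarantee.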
root	66a704ca3e	G0 D3: catalogd Parquet manifests + ADR-020 idempotent register · 6 scrum fixes Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory registry with the ADR-020 idempotency contract (same name+fingerprint reuses dataset_id; different fingerprint → 409 Conflict), HTTP client to storaged for persistence, and rehydration on startup. Acceptance smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the load-bearing test that the catalog/storaged service split actually preserves state. dataset_id derivation diverges from Rust: UUIDv5(namespace, name) instead of v4 surrogate. Same name on any box generates the same dataset_id; rehydrate after disk loss converges to the same identity rather than silently re-issuing. Namespace pinned at a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued depends on these bytes. Cross-lineage scrum on shipped code: - Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO - Kimi K2-0905 (openrouter, validated D2): 2 BLOCK + 2 WARN + 1 INFO - Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO Fixed: C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern S1 split-brain on persist failure → candidate-then-swap S2 brittle string-match for 400 vs 500 → ErrEmptyName/ErrEmptyFingerprint sentinels S3 Get/List shallow-copy aliasing → cloneManifest deep copy S4 keep-alive socket leak on error paths → drainAndClose helper Dismissed (false positives, all single-reviewer): Kimi BLOCK "Decode crashes on empty Parquet" — already handled Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required Qwen INFO "rb.NewRecord() error unchecked" — API returns no error Deferred to G1+: name validation regex, per-call deadlines, Snappy compression, list pagination continuation tokens (storaged caps at 10k with sentinel for now). Build clean, vet clean, all tests pass, smoke 6/6 PASS after every fix round. 
arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced by arrow-go's minimum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:36:57 -05:00
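The ADR-020 idempotency contract described in 66a704ca3e (same name and fingerprint reuses the dataset_id, a different fingerprint yields 409) can be sketched in memory as follows. `ErrEmptyName` and `ErrEmptyFingerprint` are the sentinels the commit names; `ErrFingerprintConflict`, `registry`, `httpStatus`, and the injected `deriveID` function are hypothetical. The real service derives the ID with UUIDv5(namespace, name); the placeholder used here only preserves the property that the same name always maps to the same ID.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

var (
	ErrEmptyName           = errors.New("catalog: empty name")
	ErrEmptyFingerprint    = errors.New("catalog: empty fingerprint")
	ErrFingerprintConflict = errors.New("catalog: fingerprint conflict") // hypothetical name
)

type entry struct{ DatasetID, Fingerprint string }

type registry struct {
	mu       sync.Mutex
	byName   map[string]entry
	deriveID func(name string) string // stand-in for UUIDv5(namespace, name)
}

func newRegistry(deriveID func(string) string) *registry {
	return &registry{byName: map[string]entry{}, deriveID: deriveID}
}

// Register is idempotent for a repeated (name, fingerprint) pair and refuses
// a changed fingerprint for an existing name.
func (r *registry) Register(name, fingerprint string) (string, error) {
	if name == "" {
		return "", ErrEmptyName
	}
	if fingerprint == "" {
		return "", ErrEmptyFingerprint
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.byName[name]; ok {
		if e.Fingerprint != fingerprint {
			return "", fmt.Errorf("%w: dataset %q", ErrFingerprintConflict, name)
		}
		return e.DatasetID, nil
	}
	id := r.deriveID(name)
	r.byName[name] = entry{DatasetID: id, Fingerprint: fingerprint}
	return id, nil
}

// httpStatus maps sentinels to status codes without string matching.
func httpStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrEmptyName), errors.Is(err, ErrEmptyFingerprint):
		return http.StatusBadRequest
	case errors.Is(err, ErrFingerprintConflict):
		return http.StatusConflict
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	reg := newRegistry(func(name string) string { return "id-" + name })
	for _, fp := range []string{"fp-a", "fp-a", "fp-b", ""} {
		id, err := reg.Register("workers", fp)
		fmt.Printf("fingerprint=%q id=%q status=%d\n", fp, id, httpStatus(err))
	}
}
```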
root	8cfcdb8e5f	G0 D2: storaged S3 GET/PUT/LIST/DELETE · 3-lineage scrum · 4 fixes applied Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length up-front 413, and a 4-slot non-blocking semaphore returning 503 + Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against the dedicated MinIO bucket lakehouse-go-primary, isolated from the Rust system's lakehouse bucket during coexistence. Cross-lineage scrum on the shipped code: - Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO - Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives) - Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k cap and the direct adapter's empty-content reasoning bug): 1 BLOCK + 2 WARN + 1 INFO Fixed: C1 buildRegistry ctx cancel footgun → context.Background() (Opus + Kimi convergent; future credential refresh chains) C2 MaxBytesReader unwrap through manager.Uploader multipart goroutines → Content-Length up-front 413 + string-suffix fallback (Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range) C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap (Opus + Kimi convergent; OOM guard) S1 PUT response Content-Type: application/json (Opus single-reviewer) Strict validateKey policy (J approved): rejects empty, >1024B, NUL, leading "/", ".." path components, CR/LF/tab control characters. DELETE exposed at HTTP layer (J approved option A) for symmetry + smoke ergonomics. Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2). Process finding worth recording: opencode caps non-streaming Kimi at max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905 delivered structured output in ~33s. Future Kimi scrums should default to that route. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 23:23:03 -05:00
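The non-blocking semaphore from 8cfcdb8e5f (4 slots, 503 plus Retry-After: 5 when full) is commonly built on a buffered channel with a `select`/`default`. The sketch below shows that pattern; `limiter`, `wrap`, and `probe` are illustrative names and not taken from the storaged source.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter is a counting semaphore backed by a buffered channel.
type limiter chan struct{}

func newLimiter(slots int) limiter { return make(limiter, slots) }

// tryAcquire takes a slot if one is free and never blocks.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (l limiter) release() { <-l }

// wrap sheds load with 503 + Retry-After instead of queueing requests.
func (l limiter) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			w.Header().Set("Retry-After", "5")
			http.Error(w, "storaged busy", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

// probe sends one request through the limiter and reports the status code
// and Retry-After header.
func probe(l limiter) (int, string) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/storage/get/key", nil)
	l.wrap(ok).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	l := newLimiter(4)
	for i := 0; i < 4; i++ {
		l.tryAcquire() // simulate four in-flight requests
	}
	code, retry := probe(l)
	fmt.Println("full:", code, retry)
	l.release()
	code, retry = probe(l)
	fmt.Println("one slot free:", code, retry)
}
```

Rejecting immediately rather than blocking keeps memory bounded under burst load, since at most four request bodies are being buffered at once.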
Claw	ad2ec1aca9	G0 D1 hardened: 3-lineage scrum review on shipped code · 7 fixes applied Code-review pass after D1 shipped, all three model lineages running in parallel against the actual Go source (not docs): Convergent findings (≥2 reviewers — high confidence): - C1 BLOCK · Run() errCh/select race could silently drop fast bind errors. Fixed: net.Listen() now runs synchronously before the goroutine; bind errors surface as Run()'s return value. - C2 BLOCK · scripts/d1_smoke.sh sleep 0.5 races bind on cold boxes. Fixed: replaced with poll_health() loop, 5s/svc budget, 50ms poll. - C3 WARN · LoadConfig silent fallback when file missing. Fixed: emits slog.Warn with path + hint when path given but file absent. Single-reviewer fixes: - S1 WARN · slog.SetDefault inside Run() mutated global state from a library function. Fixed: Run() no longer calls SetDefault. - S2 WARN · os.IsNotExist → errors.Is(err, fs.ErrNotExist) idiom. - S6 WARN · smoke double-curl collapsed to single curl -i parse. Second-pass Opus review on post-fix code caught one more: - head -1 on curl -i fragile against 1xx interim lines. Fixed: awk picks the last HTTP/* status line (robust to 100 Continue). Accepted with rationale (deferred or planned): - S3 secrets-in-lakehouse.toml: D2.3 SecretsProvider already planned - S4 5x cmd/*/main.go duplication: defer until D2 reveals real per-service config consumption - S5 /health log volume: defer post-G0, not on k8s yet - 2nd-pass theoreticals: clean-exit-no-Shutdown path doesn't trigger, defensive defer ln.Close() aspirational, etc. Verification: - go build ./cmd/... exit 0 - go vet ./... 
clean - ./scripts/d1_smoke.sh D1 acceptance gate: PASSED - 3-lineage code review · 14 findings · 7 fixed · 0 deferred · 5 accepted with rationale Total D1 review coverage across the phase: - 3 doc-review passes (Opus + Kimi + Qwen) — 13 findings, 10 fixed - 1 runtime smoke — 1 finding (port 3100 collision), fixed - 1 code-review parallel pass — 14 findings, 7 fixed - 1 code-review second pass (Opus) — 1 actionable, fixed - Cumulative: 29 findings · 19 fixed inline · 5 accepted · 5 deferred Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 07:07:50 -05:00
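The C1 fix in ad2ec1aca9 (call net.Listen synchronously so bind errors become the function's return value) follows a common Go pattern, sketched below. `runServer` and its return values are assumptions for the example; the actual Run() signature is not shown in the commit message.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// runServer binds synchronously, then serves in a goroutine. A bind failure
// (port in use, bad address) is returned to the caller immediately instead of
// racing through an error channel.
func runServer(addr string, h http.Handler) (net.Addr, func() error, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, fmt.Errorf("bind %s: %w", addr, err)
	}
	srv := &http.Server{Handler: h}
	go func() {
		// Serve returns http.ErrServerClosed after Close; ignored in this sketch.
		_ = srv.Serve(ln)
	}()
	return ln.Addr(), srv.Close, nil
}

func main() {
	addr, stop, err := runServer("127.0.0.1:0", http.NotFoundHandler())
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	defer stop()

	// A second bind on the same address fails before any goroutine starts.
	_, _, err = runServer(addr.String(), http.NotFoundHandler())
	fmt.Println("second bind failed:", err != nil)
}
```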

155 Commits