3-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder) on today's wave
landed 4 real findings (2 BLOCK + 2 WARN) and 2 INFO touch-ups.
Verbatim verdicts + disposition table at:
reports/scrum/_evidence/2026-04-30/
B-1 (BLOCK Opus + INFO Kimi convergent) — ResolveKey API:
collapse from 3-arg (envVar, envFileName, envFilePath) to 2-arg
(envVar, envFilePath). Pre-fix, every chatd caller passed the env
var name twice; if an operator renamed *_key_env in lakehouse.toml
while keeping the canonical KEY= line in the .env file, the fallback
silently missed it.
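A minimal sketch of the post-fix 2-arg shape, assuming env-then-.env
precedence as described above (helper structure and error texts in
internal/chat/builder.go may differ):
  package chat

  import (
      "fmt"
      "os"
      "strings"
  )

  // ResolveKey: the process env wins; otherwise fall back to the
  // canonical KEY=value line in the .env file at envFilePath.
  func ResolveKey(envVar, envFilePath string) (string, error) {
      if v := os.Getenv(envVar); v != "" {
          return v, nil
      }
      data, err := os.ReadFile(envFilePath)
      if err != nil {
          return "", fmt.Errorf("read %s: %w", envFilePath, err)
      }
      for _, line := range strings.Split(string(data), "\n") {
          if k, v, ok := strings.Cut(strings.TrimSpace(line), "="); ok && k == envVar {
              return v, nil
          }
      }
      return "", fmt.Errorf("%s not set in env or %s", envVar, envFilePath)
  }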
B-2 (WARN Opus + WARN Kimi convergent) — handleProviders probe:
drop the synthesize-then-Resolve probe; look up by name directly
via Registry.Available(name). The prior probe synthesized
"<name>/probe" model strings and routed them through Resolve, which
was fragile to any future routing rule (e.g. the cloud-suffix
special case).
B-3 (BLOCK Opus single — verified by trace + end-to-end probe) —
OllamaCloud.Chat StripPrefix used "cloud" but registry routes
"ollama_cloud/<m>". Result: upstream got the prefixed model name
and 400'd. Smoke missed it because chatd_smoke runs without
ollama_cloud registered. Now strips the right prefix; new
TestOllamaCloud_StripsCorrectPrefix locks both prefix + suffix
cases. Verified live: ollama_cloud/deepseek-v3.2 round-trips
cleanly through the real ollama.com endpoint.
B-4 (WARN Opus single) — Ollama finishReason: read done_reason
field instead of inferring from done bool alone. Newer Ollama
reports done=true with done_reason="length" on truncation; the
prior code mapped that to "stop" and lost the truncation signal
the playbook_lift judge needs to retry. New
TestFinishReasonFromOllama_PrefersDoneReason covers the fallback
ladder.
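A sketch of that ladder, assuming done_reason always wins when set
(the real mapping in internal/chat/ollama.go may differ):
  // finishReason maps Ollama's (done, done_reason) pair to the wire
  // finish_reason; done_reason wins, the done bool is the legacy fallback.
  func finishReason(done bool, doneReason string) string {
      switch doneReason {
      case "length":
          return "length" // truncation signal preserved for retry logic
      case "stop":
          return "stop"
      }
      if done {
          return "stop" // older Ollama: done=true with no done_reason
      }
      return "" // still streaming
  }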
INFOs:
- B-5: replace hand-rolled insertion sort in Registry.Names with
sort.Strings (Opus called the "avoid sort import" comment a
false economy — correct).
- A-1: clarify the playbook_lift.sh comment around -judge "" arg
passing (Opus noted the comment said "env priority" but didn't
reflect that the empty arg also passes through the Go driver's
resolution chain).
False positives dismissed (3, documented in disposition.md):
- Kimi: TestMaybeDowngrade_WithConfigList wrong assertion (test IS
correct per design — model excluded from weak list = strong = downgrade)
- Qwen: nil-deref claim (defensive code already handles nil)
- Opus: qwen3.5:latest doesn't exist on Ollama hub (true on the
public hub but local install has it)
just verify: PASS. chatd_smoke 6/6 PASS. New regression tests:
3 (B-2, B-3, B-4 each get a focused test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new cmd/chatd on :3220 routes /v1/chat to the right provider based
on model-name prefix or :cloud suffix. closes the architectural gap
named in lakehouse.toml [models]: tiers map to model IDs, but until
phase 4 there was no service that could actually CALL those models
from Go.
routing rules (registry.Resolve):
ollama/<m> → local Ollama (prefix stripped)
ollama_cloud/<m> → Ollama Cloud
<m>:cloud → Ollama Cloud (suffix variant — kimi-k2.6:cloud)
openrouter/<v>/<m> → OpenRouter (prefix stripped, OpenAI-compat)
opencode/<m> → OpenCode unified Zen+Go
kimi/<m> → Kimi For Coding (api.kimi.com/coding/v1)
bare names → local Ollama (default)
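A condensed sketch of that dispatch order (hypothetical: the real
registry.Resolve returns Provider instances, and prefix stripping is
split between registry and provider):
  package chat

  import "strings"

  // resolve maps a model string to (providerName, upstreamModel)
  // following the table above.
  func resolve(model string) (provider, upstream string) {
      switch {
      case strings.HasPrefix(model, "ollama/"):
          return "ollama", strings.TrimPrefix(model, "ollama/")
      case strings.HasPrefix(model, "ollama_cloud/"):
          return "ollama_cloud", strings.TrimPrefix(model, "ollama_cloud/")
      case strings.HasSuffix(model, ":cloud"):
          return "ollama_cloud", strings.TrimSuffix(model, ":cloud")
      case strings.HasPrefix(model, "openrouter/"):
          return "openrouter", strings.TrimPrefix(model, "openrouter/")
      case strings.HasPrefix(model, "opencode/"):
          return "opencode", strings.TrimPrefix(model, "opencode/")
      case strings.HasPrefix(model, "kimi/"):
          return "kimi", strings.TrimPrefix(model, "kimi/")
      default:
          return "ollama", model // bare names default to local Ollama
      }
  }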
provider implementations:
- internal/chat/types.go Provider interface, Request/Response, errors
- internal/chat/registry.go prefix + :cloud suffix dispatch
- internal/chat/ollama.go local Ollama via /api/chat (think=false default)
- internal/chat/ollama_cloud.go Ollama Cloud via /api/generate (Bearer auth)
- internal/chat/openai_compat.go shared OpenAI Chat Completions for the
OpenRouter/OpenCode/Kimi family
- internal/chat/builder.go BuildRegistry from BuilderInput;
ResolveKey reads env then .env file fallback
config:
- ChatdConfig in internal/shared/config.go with bind, ollama_url,
per-provider key env names + .env fallback paths, timeout
- Gateway gains chatd_url + /v1/chat + /v1/chat/* routes
- lakehouse.toml [chatd] block with /etc/lakehouse/<provider>.env defaults
tests (19 in internal/chat):
- registry: prefix + :cloud + errors + telemetry + provider listing
- ollama: happy path + prefix strip + format=json + 500 mapping +
flatten_messages
- openai_compat: happy path + format=json + 429 mapping + zero-choices
think=false default in ollama + ollama_cloud — local hot path skips
reasoning, low-budget callers (the playbook_lift judge at max_tokens=10)
get direct answers instead of empty content + done_reason=length.
proven via chatd_smoke acceptance.
acceptance gate: scripts/chatd_smoke.sh — 6/6 PASS:
1. /v1/chat/providers lists exactly registered providers (1 in dev mode)
2. bare model → ollama default with content + token counts + latency
3. explicit ollama/<m> → prefix stripped at upstream
4. <m>:cloud without ollama_cloud registered → 404 (no silent fall-through)
5. unknown/<m> → falls through to default → upstream 502 (no prefix rewrite)
6. missing model field → 400
just verify: PASS (vet + 30 packages × short tests + 9 smokes).
chatd_smoke is a domain smoke (not part of just verify; mirrors the
matrix / observer / pathway pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
migrate the strong-model auto-downgrade gate from a hardcoded weak
list to cfg.Models.WeakModels. backward compatible: existing API
preserved, callers that don't migrate keep using DefaultWeakModels.
changes:
- internal/matrix/downgrade.go: split IsWeakModel into rule-based
base (`:free` suffix/infix) + literal-list lookup. New
IsWeakModelInList(model, list) takes the config-supplied list.
DowngradeInput grows a WeakModels field; nil falls back to
DefaultWeakModels (preserves pre-phase-2 behavior).
- internal/workflow/modes.go: add MatrixDowngradeWithWeakList(list)
factory mirroring MatrixSearch's pattern. Plain MatrixDowngrade
kept for backward compat.
- cmd/matrixd/main.go: handlers struct holds weakModels populated
from cfg.Models.WeakModels at startup; handleDowngrade threads it
into every DowngradeInput.
- cmd/observerd/main.go: registerBuiltinModes accepts weakModels
and uses the factory variant. observerd reads cfg.Models.WeakModels
in main().
end-to-end verified: downgrade + matrix + observer + workflow smokes
all pass. Existing TestMaybeDowngrade_TruthTable + TestIsWeakModel
unchanged (backward compat). Two new tests cover the config path:
- TestIsWeakModelInList — covers rule + literal + empty + nil
- TestMaybeDowngrade_WithConfigList — verifies cfg list overrides
default
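A sketch of the two-layer check (assumes a strings import and the
package-level DefaultWeakModels; the real rule base may be stricter
than a plain substring test):
  // IsWeakModelInList: rule base first (":free" suffix/infix), then
  // the config-supplied literal list; nil falls back to
  // DefaultWeakModels to preserve pre-phase-2 behavior.
  func IsWeakModelInList(model string, list []string) bool {
      if strings.Contains(model, ":free") {
          return true
      }
      if list == nil {
          list = DefaultWeakModels
      }
      for _, m := range list {
          if m == model {
              return true
          }
      }
      return false
  }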
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codifies the small-model-pipeline tiering (per project_small_model_pipeline_vision.md)
in lakehouse.toml [models] section. Tier names map to actual model
IDs; bumping a model means editing one line, not hunting through code.
Tier philosophy:
- local_* : on-box Ollama. Inner-loop hot path. Repeated calls.
- cloud_* : Ollama Cloud (Pro plan). Larger context, fail-up tier.
- frontier_* : OpenRouter / OpenCode. Rate-limited, billed per call.
weak_models is the codified "local-hot-path eligible" list — phase 2
will migrate matrix.downgrade to read it instead of hardcoding.
Defaults reflect 2026-04-29 architecture: qwen3.5:latest as local
(stronger than qwen2.5, same JSON-clean property), kimi-k2.6 as cloud
judge (kimi-k2:1t still upstream-broken), opus-4-7 + kimi-k2-0905 as
frontier review/arch via OpenRouter, opencode/claude-opus-4-7 as
frontier_free leveraging the OpenCode subscription.
3 new tests in internal/shared/config_test.go:
- TestDefaultConfig_ModelsTier — locks tier defaults
- TestModelsConfig_IsWeak — weak-bypass list
- TestLoadConfig_ModelsTOMLRoundTrip — override semantics
just verify PASS (g2 had one flake on first run — Ollama transfer
truncation; clean on retry, unrelated to this change).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 small fixes from the §3.8 scrum2 review wave:
- workflow.stringifyValue now JSON-marshals maps/slices instead of
fmt.Sprint %v (Opus+Kimi convergent: LLM modes were getting Go's
map[k:v] syntax, which is unparseable as JSON context).
- workflow.detectCycle removed — duplicate of topoSort that discarded
the useful node ID. Validate() now calls topoSort directly and
returns its wrapped ErrCycle.
- observer.SourceWorkflow named constant — was an implicit string
cast (observer.Source("workflow")) at the cmd/observerd handler.
- Unused context imports + dead silencer comments removed across
workflow/modes.go and observerd/main.go.
- Unused store parameter dropped from registerBuiltinModes (reserved
comment removed; can be re-added when a mode actually needs it).
just verify still PASS — these are pure cleanup, no behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the workflow.Mode adapters for the §3.4 components + the
distillation scorer + drift quantifier. Workflows can now compose
real measurement capabilities; the substrate's parallel
capabilities become composable Lego bricks (per the prior commit's
closing insight).
Modes registered (in observerd's registerBuiltinModes):
Pure-function wrappers (no I/O):
- matrix.relevance → matrix.FilterChunks
- matrix.downgrade → matrix.MaybeDowngrade
- distillation.score → distillation.ScoreRecord
- drift.scorer → drift.ComputeScorerDrift
HTTP-backed:
- matrix.search → POST matrixd /matrix/search
(registered only when matrixd_url is set)
Fixture (kept from §3.8 first slice):
- fixture.echo, fixture.upper
internal/workflow/modes.go:
Each mode follows the same glue pattern: marshal generic input
through a typed struct (free schema validation + clear error
messages), call the underlying capability, return a generic
output map. Roundtrip-via-JSON gives us schema validation
without writing custom field-by-field coercion.
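The glue, sketched as a generic helper (hypothetical; the real modes
do this inline with their own typed structs):
  package workflow

  import (
      "encoding/json"
      "fmt"
  )

  // decodeInput round-trips a generic mode input through a typed
  // struct: free schema validation plus a clear error message.
  func decodeInput[T any](in map[string]any) (T, error) {
      var typed T
      raw, err := json.Marshal(in)
      if err != nil {
          return typed, fmt.Errorf("marshal mode input: %w", err)
      }
      if err := json.Unmarshal(raw, &typed); err != nil {
          return typed, fmt.Errorf("mode input does not match schema: %w", err)
      }
      return typed, nil
  }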
internal/workflow/modes_test.go (10 tests, all PASS):
- matrix.relevance filters adjacency pollution (Connector kept,
catalogd::Registry dropped — same headline as the relevance
smoke, run through the workflow mode)
- matrix.downgrade flips lakehouse→isolation on strong model;
keeps lakehouse on weak (qwen3.5:latest); errors on missing
fields
- distillation.score rates scrum_review attempt_1 as accepted;
rejects empty record
- drift.scorer reports zero drift on matched inputs; errors on
empty inputs slice
- matrix.search HTTP flow round-trips through httptest fake
matrixd; non-OK status surfaces a clear error
scripts/workflow_smoke.sh (5 assertions PASS, was 4):
New assertion #5: real-mode chain
matrix.downgrade (lakehouse + grok-4.1-fast → isolation)
→ distillation.score (scrum_review attempt_1 → accepted)
Proves §3.4 components compose through the workflow runner with
no fixture intermediation. Both nodes ran successfully, runner
recorded provenance, status=succeeded.
Mode listing assertion now expects 7 modes (5 real + 2 fixture)
instead of just the fixtures.
17-smoke regression all green. SPEC §3.8 acceptance gate G3.8.D
("Mode catalog dispatches matrix.search invocation to the matrixd
backend without going through HTTP") still pending — current path
goes through HTTP for matrix.search, which is the cleaner service-
mesh shape but slower than direct in-process. In-process dispatch
when matrixd is co-resident is a future optimization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the reality-test gap surfaced by the candidates and
multi-corpus e2e runs (0d1553c, a97881d): semantic-only retrieval
can't gate by status / state / availability. SearchRequest now
takes an optional MetadataFilter map; results whose metadata
doesn't match every key are dropped before top-K truncation.
Filter value semantics:
string|number|bool → exact equality (JSON-canonical, so 1 ≡ 1.0)
[]any → OR within key (any element matching wins)
AND across keys: every filter key must match.
Missing key in metadata = drop. Malformed metadata = drop. Filter
absent or empty = pass through (zero overhead).
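A sketch of the match predicate those rules imply (names hypothetical;
the real code lives in internal/matrix):
  package matrix

  import "encoding/json"

  // matches: AND across keys, OR within a []any value, JSON-canonical
  // equality for scalars (both sides decoded by encoding/json, so
  // 1 == 1.0 as float64).
  func matches(meta, filter map[string]any) bool {
      for key, want := range filter {
          got, ok := meta[key]
          if !ok {
              return false // missing key = drop
          }
          if list, isList := want.([]any); isList {
              hit := false
              for _, w := range list {
                  if equalJSON(got, w) {
                      hit = true
                      break
                  }
              }
              if !hit {
                  return false
              }
          } else if !equalJSON(got, want) {
              return false
          }
      }
      return true // absent or empty filter passes everything
  }

  func equalJSON(a, b any) bool {
      ja, _ := json.Marshal(a)
      jb, _ := json.Marshal(b)
      return string(ja) == string(jb)
  }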
The response now reports MetadataFilterDropped so callers can see
how aggressive the filter was without re-querying.
Caveat (also captured in code comment): this is POST-retrieval, not
PRE-filtering via SQL. Aggressive filters can shrink the result set
below K; caller should bump PerCorpusK to compensate. A queryd-
backed pre-filter is a future commit; this lands the user-visible
fix today.
Tests:
- 7 unit tests (internal/matrix/filter_test.go) covering: nil/
empty filter pass-through, missing-metadata always-fails,
single-value exact match (incl. numeric 5 ≡ 5.0), AND across
keys, OR within list, bool match, malformed JSON metadata
- matrix_smoke.sh: new assertion #7 — filter
label∈{"a near","b near"} drops the 4 mid/far entries from the
6-entry pool, keeping exactly 2 (one per corpus, both with the
matching label). Dropped count surfaces in the response.
15-smoke regression all green. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PRD's 5-loop substrate names "drift" as loop 5: quantify when
historical decisions stop matching current reality. Distinct from
the rating+distillation loop because drift is MEASUREMENT, not
LEARNING. The learning loop says "this match worked, remember it";
the drift loop says "this 4-month-old playbook entry — does it
still match what the substrate would surface today?"
First-shipped drift shape: SCORER drift. When the deterministic
scorer's ScorerVersion bumps, historical ScoredRuns may no longer
match what the current scorer produces on the same EvidenceRecord.
internal/drift/drift.go:
- ScorerDriftInput — (EvidenceRecord, persisted_category) pair
- ScorerDriftEntry — one mismatch with current reasons attached
- CategoryShift — (from, to, count) cell in the shift matrix
- ScorerDriftReport — summary + sorted shift matrix + optional entries
- ComputeScorerDrift(inputs, includeEntries) — pure function;
re-runs ScoreRecord over each input and reports mismatches
Why this matters: without a drift quantifier, a scorer-rule change
silently invalidates the historical training data feeding the
learning loop. With drift quantification, a rule change surfaces
a concrete number ("847 of 4701 historical ScoredRuns now
disagree") that triggers a re-score-and-retrain cycle rather than
letting the substrate quietly rot.
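Collapsed sketch of the computation (the real ComputeScorerDrift
re-runs ScoreRecord on each EvidenceRecord; here the current category
is pre-computed so the sketch stands alone):
  package drift

  // Input is a flattened stand-in: the persisted category vs. what
  // the current scorer produces for the same EvidenceRecord.
  type Input struct {
      Persisted string
      Current   string
  }

  // DriftRate reports the mismatch fraction plus the shift matrix.
  func DriftRate(inputs []Input) (rate float64, shifts map[string]int) {
      shifts = map[string]int{}
      if len(inputs) == 0 {
          return 0, shifts // empty input: all-zero report, no div-by-zero
      }
      drifted := 0
      for _, in := range inputs {
          if in.Persisted != in.Current {
              drifted++
              shifts[in.Persisted+"→"+in.Current]++
          }
      }
      return float64(drifted) / float64(len(inputs)), shifts
  }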
Tests (6/6 PASS):
- No-drift: all 3 inputs match → 100% matched
- Shift detected: 5 inputs, 3 drift cases, drift_rate=0.6,
shift matrix shows accepted→partially_accepted x3
- Multiple shifts sorted by count desc
- includeEntries=false skips the per-mismatch list
- Empty input → all-zero report (no division-by-zero)
- ScorerVersion stamped on every report
Future drift shapes (deferred to follow-ups, named in package doc):
- PLAYBOOK drift: re-run playbook queries through current
matrix-search; recorded answer not in top-K = drift
- EMBEDDING drift: KS-test on vector distribution at T1 vs T2
- AUDIT BASELINE drift: matches Rust audit_baselines.jsonl
longitudinal signal
Pure compute. Materialization layer (read scored-runs jsonl + their
matching evidence jsonl + feed into ComputeScorerDrift) lands with
the distillation materialization commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the Rust v1.0.0 distillation substrate (e7636f2)
ported to Go per ADR-001 #4 (port LOGIC, not bit-identical
reproducibility). This commit lands the LOAD-BEARING pieces named
in project_distillation_substrate.md memory:
- The deterministic Success Scorer (8 sub-scorers + dispatch)
- The contamination firewall on SFT samples (the "non-negotiable"
spec property: rejected/needs_human_review NEVER ship to SFT)
- All on-wire types + validators for ScoredRun, SftSample,
EvidenceRecord with Provenance
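The firewall property in isolation, sketched with pared-down types
(the real ValidateSftSample in types.go carries more fields):
  package distillation

  import (
      "errors"
      "fmt"
      "strings"
  )

  var ErrSftContamination = errors.New("sft contamination")

  type SftSample struct {
      Category    string
      Instruction string
      Response    string
  }

  // ValidateSftSample: rejected / needs_human_review NEVER ship to
  // SFT; the sentinel keeps that distinct from plain typo errors.
  func ValidateSftSample(s SftSample) error {
      switch s.Category {
      case "rejected", "needs_human_review":
          return fmt.Errorf("%w: category %q", ErrSftContamination, s.Category)
      }
      if strings.TrimSpace(s.Instruction) == "" || strings.TrimSpace(s.Response) == "" {
          return errors.New("empty instruction/response pair")
      }
      return nil
  }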
Files:
internal/distillation/types.go — types + ScorerVersion + SftNever
+ ValidateScoredRun + ValidateSftSample
internal/distillation/scorer.go — ScoreRecord + 8 class scorers +
BuildScoredRun (deterministic)
internal/distillation/scorer_test.go — ~40 test cases:
- source-class dispatch (verdict / telemetry / extraction)
- scrum_review (4 attempt cases)
- observer_review (5 verdict cases)
- audit (legacy + severity, 9 cases)
- auto_apply (4 cases)
- outcomes / mode_experiment / extraction
- CONTAMINATION FIREWALL: ErrSftContamination sentinel fires
on rejected/needs_human_review, distinct from typo errors
- empty-pair guard (instruction/response trim != "")
- reasons-required ScoredRun validation
- deterministic sig_hash on identical input
- purity check (input not mutated, repeatable output)
Per the 2026-04-29 cross-lineage scrum's discipline: false-positive
findings would be dismissed inline (none in this commit). Real
findings would be addressed before merge — but this is greenfield
port code reviewed against its Rust source line-by-line, which the
test suite encodes as truth tables.
Explicitly DEFERRED to follow-up commits:
- Materialization layer (jsonl read/write, date-partitioned
storage in data/scored-runs/YYYY/MM/DD/, evidence index)
- SFT exporter (file iteration + filtering — the SCORING firewall
is here; the EXPORT firewall is the next layer)
- export_preference, export_rag (other export shapes)
- Acceptance harness (16/16 acceptance gate that locks v1.0.0)
- replay, receipts, build_evidence_index, transforms
The scorer + firewall validator are pure functions — operational
tooling layers on top without changing the deterministic logic the
downstream learning loop depends on. The Go ScorerVersion stays at
v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go
materialization commit is reserved for the next scoring-rule
change, NOT the port itself.
15-smoke regression all green. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-lineage scrum review on the 12 commits of this session
(afbb506..06e7152) via Rust gateway :3100 with Opus + Kimi +
Qwen3-coder. Results:
Real findings landed:
1. Opus BLOCK — vectord BatchAdd intra-batch duplicates panic
coder/hnsw's "node not added" length-invariant. Fixed with
last-write-wins dedup inside BatchAdd before the pre-pass (sketch
after this list). Regression test TestBatchAdd_IntraBatchDedup added.
2. Opus + Kimi convergent WARN — strings.Contains(err.Error(),
"status 404") was brittle string-matching to detect cold-
start playbook state. Fixed: ErrCorpusNotFound sentinel
returned by searchCorpus on HTTP 404; fetchPlaybookHits
uses errors.Is.
3. Opus WARN — corpusingest.Run returned nil on total batch
failure, masking broken pipelines as "empty corpora." Fixed:
Stats.FailedBatches counter, ErrPartialFailure sentinel
returned when nonzero. New regression test
TestRun_NonzeroFailedBatchesReturnsError.
4. Opus WARN — dead var _ = io.EOF in staffing_500k/main.go
was justified by a fictional comment. Removed.
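The dedup shape from finding 1, sketched inside BatchAdd with a
hypothetical Item type standing in for vectord's batch item:
  // Last-write-wins dedup before the pre-pass, so coder/hnsw's
  // length invariant never sees the same key twice in one batch.
  seen := make(map[string]int, len(items))
  deduped := make([]Item, 0, len(items))
  for _, it := range items {
      if j, ok := seen[it.ID]; ok {
          deduped[j] = it // later duplicate wins
          continue
      }
      seen[it.ID] = len(deduped)
      deduped = append(deduped, it)
  }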
Drivers (staffing_500k, staffing_candidates, staffing_workers)
updated to handle ErrPartialFailure gracefully — print warn, keep
running queries — rather than fatal'ing on transient hiccups
while still surfacing the failure clearly in the output.
Documented (no code change):
- Opus WARN: matrixd /matrix/downgrade reads
LH_FORCE_FULL_ENRICHMENT from process env when body omits
it. Comment now explains the opinionated default and points
callers wanting deterministic behavior to pass the field
explicitly.
False positives dismissed (caught and verified, NOT acted on):
A. Kimi BLOCK on errors.Is + wrapped error in cmd/matrixd:223.
Verified false: Search wraps with %w (fmt.Errorf("%w: %v",
ErrEmbed, err)), so errors.Is matches the chain correctly.
B. Kimi INFO "BatchAdd has no unit tests." Verified false:
batch_bench_test.go has BenchmarkBatchAdd; the new dedup
test TestBatchAdd_IntraBatchDedup adds another.
C. Opus BLOCK on missing finite/zero-norm pre-validation in
cmd/vectord:280-291. Verified false: line 272 already calls
vectord.ValidateVector before BatchAdd, so finite + zero-
norm IS checked. Pre-validation is exhaustive.
D. Opus WARN on relevance.go tokenRe (Opus self-corrected
mid-finding when it realized the leading char counts toward token
length).
Qwen3-coder returned NO FINDINGS — known issue with very long
diffs through the OpenRouter free tier; lineage rotation worked
as designed (Opus + Kimi between them caught everything Qwen
would have).
15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).
Unit tests all green (corpusingest +1, vectord +1).
Per feedback_cross_lineage_review.md: convergent finding #2 (404
detection) is the highest-signal one — both Opus and Kimi
flagged it independently. The other Opus findings stand on
single-reviewer signal but each one verified against the actual
code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes SPEC §3.4. The matrix indexer is now a learning meta-index per
feedback_meta_index_vision.md — every successful (query → answer)
pair recorded via /matrix/playbooks/record boosts that answer for
future similar queries.
This is the architectural piece that lifts vectord from "static
hybrid search" to the meta-index J originally framed in Phase 19 of
the Rust system.
What's new:
- internal/matrix/playbook.go — PlaybookEntry, PlaybookHit,
ApplyPlaybookBoost. Pure-function boost math:
distance' = distance * (1 - 0.5 * score)
Score 0 = no boost (factor 1.0); score 1 = halve distance
(factor 0.5). Capped at 0.5 deliberately so a single high-
confidence playbook can't dominate the base ranking forever
(runaway-feedback-loop guard; see the sketch after this list).
- Retriever.Record(entry, corpus) — embeds query_text, ensures
playbook corpus exists (idempotent), upserts via deterministic
sha256-derived ID (last score wins on re-record of same triple).
- Retriever.Search extended with UsePlaybook + PlaybookCorpus +
PlaybookTopK + PlaybookMaxDistance. Reuses the query vector —
no extra embed call. Missing-corpus 404 = no-op (cold-start
state before any Record call), not an error.
- POST /v1/matrix/playbooks/record (matrixd) — caller submits
{query_text, answer_id, answer_corpus, score, tags?}; gets
{playbook_id} back.
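The boost math from the first bullet, sketched:
  // BoostFactor: score 0 → 1.0 (no boost); score 1 → 0.5 (halve the
  // distance). The 0.5 floor is the runaway-feedback-loop guard.
  func BoostFactor(score float64) float64 {
      if score < 0 {
          score = 0
      }
      if score > 1 {
          score = 1
      }
      return 1 - 0.5*score
  }

  // boosted: distance' = distance * BoostFactor(score)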
Storage: a vectord index named "playbook_memory" (configurable per
request) with embed(query_text) as the vector and the
PlaybookEntry JSON as metadata. Just another corpus — observable
from /vectors/index, persistable through G1P, etc.
Match key for boost: (AnswerID, AnswerCorpus). Cross-corpus ID
collisions don't false-match — verified by
TestApplyPlaybookBoost_CorpusAttributionRespected.
End-to-end smoke (scripts/playbook_smoke.sh, all assertions PASS):
- Baseline search: widget-c at distance 0.6566 (rank 3)
- Record playbook: query → widget-c, score=1.0
- Re-search with use_playbook=true:
widget-c distance: 0.3283 (rank 2)
ratio: 0.5 EXACTLY (matches boost math precisely)
playbook_boosted: 1
- widget-c jumped from #3 to #2 — learning loop visible
Tests:
- 8 unit tests in internal/matrix/playbook_test.go covering
Validate, BoostFactor (5 cases), the no-boost identity, the
boost-moves-result-up scenario, highest-score wins on duplicate
matches, cross-corpus attribution, JSON round-trip, and
rejection of empty metadata
- scripts/playbook_smoke.sh integration test (3 assertions PASS)
15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).
SPEC §3.4 NOW COMPLETE: 5 of 5 components shipped. The matrix
indexer's port is done as a substrate; remaining work is operational
(rating signal sources, telemetry, eventual structured filtering for
staffing data — none in §3.4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Faithful port of mcp-server/relevance.ts (Rust observer's adjacency-
pollution filter). Same 5-signal scoring, same default threshold 0.3.
Adds POST /v1/matrix/relevance endpoint via matrixd.
Scoring signals (additive, can sign-flip):
path_match +1.0 chunk source/doc_id encodes focus.path
filename_match +0.6 chunk text mentions focus's filename
defined_match +0.6 chunk text mentions focus.defined_symbols
token_overlap +0.4 jaccard of non-stopword tokens
prefix_match +0.3 chunk source shares first-2-segment prefix
import_penalty -0.5 mentions ONLY imported symbols, no defined ones
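The additive shape, sketched (signal extraction from chunk text is
the real work in relevance.go; the booleans here are stand-ins):
  type signals struct {
      pathMatch, filenameMatch, definedMatch, prefixMatch, importOnly bool
      tokenJaccard float64 // jaccard of non-stopword tokens, in [0,1]
  }

  // scoreRelevance sums the signals; a chunk survives when the total
  // clears the threshold (default 0.3).
  func scoreRelevance(s signals) float64 {
      score := 0.0
      if s.pathMatch {
          score += 1.0
      }
      if s.filenameMatch {
          score += 0.6
      }
      if s.definedMatch {
          score += 0.6
      }
      score += 0.4 * s.tokenJaccard
      if s.prefixMatch {
          score += 0.3
      }
      if s.importOnly {
          score -= 0.5 // mentions ONLY imported symbols
      }
      return score
  }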
What this does and doesn't do:
- DOES filter code-aware corpora (eventually lakehouse_arch_v1,
lakehouse_symbols_v1, scrum_findings_v1) — drops chunks about
code the focus file IMPORTS rather than DEFINES, the
"adjacency pollution" pattern that makes a reviewer LLM
hallucinate imported-crate internals as belonging to the focus
- DOES NOT meaningfully filter staffing data — the candidates
reality test 2026-04-29 had "exact skill match buried at #3"
which is a different problem (semantic-only ranking dominated
by secondary text). Staffing needs structured filtering
(status gates, location gates) that lives outside this
package — future work, not in SPEC §3.4 yet
Headline smoke assertion: focus = crates/queryd/src/db.go, which
defines Connector and imports catalogd::Registry. The filter
scores:
Connector chunk: +0.68 (defined_match fires, kept)
Registry chunk: -0.46 (import_only penalty fires, dropped)
unrelated junk: 0.00 (no signals, dropped)
That's a 1.14-point gap between what we ARE and what we IMPORT —
the entire purpose of the filter.
Tests:
- 9 unit tests in internal/matrix/relevance_test.go covering
Tokenize, Jaccard, ExtractDefinedSymbols (Rust + TS),
ExtractImportedSymbols, FilePrefix, ScoreRelevance per-signal,
FilterChunks threshold splitting, and the headline
AdjacencyPollutionScenario
- scripts/relevance_smoke.sh integration smoke (3 assertions PASS):
adjacency-pollution scenario, empty-chunks 400, threshold honored
13-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the second staffing corpus and the first end-to-end reality test
through the full Go pipeline: parquet → corpusingest → embedd →
vectord → matrixd → gateway.
What's new:
- scripts/staffing_candidates/main.go — parquet Source over
candidates.parquet (1000 rows, 11 cols), single-chunk arrow-go
pqarrow read. Embed text: "Candidate skills: <s>. Based in
<city>, <state>. <years> years experience. Status: <status>.
<first> <last>." IDs prefixed "c-" so multi-corpus merges
against workers ("w-") stay unambiguous.
- scripts/candidates_e2e.sh — first integration smoke that runs
the full stack (storaged + embedd + vectord + matrixd + gateway),
ingests via corpusingest, runs a real query through
/v1/matrix/search, prints results. Ephemeral mode (vectord
persistence disabled via custom toml) so re-runs don't pollute
MinIO _vectors/ and break g1p_smoke's "only-one-persisted-index"
assertion.
Real bug caught + fixed in corpusingest:
When LogProgress > 0, the progress goroutine's only exit was
ctx.Done(). With context.Background() in the production driver,
Run hung forever after the pipeline finished. Added a stopProgress
channel that close()s after wg.Wait(). Regression test
TestRun_ProgressLoggerExits bounds Run's wall to 2s with
LogProgress=50ms.
This is the bug the unit tests didn't catch because every prior test
set LogProgress: 0. Reality test surfaced it on first real-data
run — exactly the hyperfocus-and-find-architectural-weakness
property J framed as the reason for the Go pass.
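The fix shape, sketched (surrounding pipeline variables assumed; the
real code lives in corpusingest.Run):
  stopProgress := make(chan struct{})
  go func() {
      t := time.NewTicker(cfg.LogProgress)
      defer t.Stop()
      for {
          select {
          case <-t.C:
              log.Printf("ingest: still running")
          case <-ctx.Done():
              return
          case <-stopProgress:
              return // the exit that was missing pre-fix
          }
      }
  }()
  wg.Wait()
  close(stopProgress) // pipeline drained; release the logger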
End-to-end output (1000 candidates, query "Python AWS Docker
engineer in Chicago available now"):
populate: scanned=1000 embedded=1000 added=1000 wall=3.5s
matrix returned 5 hits in 26ms
The result quality is the interesting signal: top-5 had ZERO
Chicago candidates, ZERO active-status candidates, and the exact-
skill-match (Python,AWS,Docker) ranked #3 not #1. Pipeline works;
retrieval quality has real architectural limits (no structured
filtering, no relevance gate, semantic-only ranking dominated by
secondary signals like "1 year experience" and "engineer"). This
motivates SPEC §3.4 component 3 (relevance filter) and
eventually structured filtering — exactly the kind of finding the
deep field reality tests are supposed to surface before Enterprise
cutover.
12-smoke regression sweep all green. 9 corpusingest unit tests
including the new regression. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalizes the staffing_500k driver's embed-and-push loop into
internal/corpusingest. Per docs/SPEC.md §3.4 component 1 (corpus
builders): adding a new staffing/code/playbook corpus is now one
Source impl + one main.go calling Run, not 200 lines of pipeline
copy-paste.
API:
type Source interface { Next() (Row, error) }
func Run(ctx, Config, Source) (Stats, error)
Library owns:
- Index lifecycle (create, optional drop-existing, idempotent
reuse on 409)
- Parallel embed dispatcher (configurable workers + batch size)
- Vectord push batching
- Progress logging + Stats reporting
- Partial-failure semantics (log + continue per-batch errors;
operator decides on re-run via Stats.Embedded vs Scanned delta)
Per-corpus driver owns: source parsing + column→Row mapping +
post-ingest validation queries.
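A minimal driver against that API, sketched (Row/Config field names
and the EOF-as-done contract assumed from the drivers above; import
path hypothetical):
  package main

  import (
      "context"
      "encoding/csv"
      "log"
      "os"

      "lakehouse/internal/corpusingest" // import path assumed
  )

  // csvSource adapts one CSV file to corpusingest.Source.
  type csvSource struct{ r *csv.Reader }

  func (s *csvSource) Next() (corpusingest.Row, error) {
      rec, err := s.r.Read() // io.EOF when exhausted
      if err != nil {
          return corpusingest.Row{}, err
      }
      return corpusingest.Row{ID: "x-" + rec[0], Text: rec[1]}, nil
  }

  func main() {
      f, err := os.Open("corpus.csv")
      if err != nil {
          log.Fatal(err)
      }
      stats, err := corpusingest.Run(context.Background(), corpusingest.Config{
          IndexName: "my_corpus_v1",
          Dimension: 768,
      }, &csvSource{r: csv.NewReader(f)})
      if err != nil {
          log.Fatal(err)
      }
      log.Printf("scanned=%d embedded=%d", stats.Scanned, stats.Embedded)
  }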
Refactor scripts/staffing_500k/main.go to use it. Driver is now
~190 lines (was 339), with the embed/add plumbing replaced by one
Run call. -drop flag added so callers can opt out of the destructive
DELETE-first behavior (default still true to keep the 500K test
clean-recall semantics).
Unit tests (internal/corpusingest/ingest_test.go, 8/8 PASS):
- Pipeline shape: 50 rows / 16 batch → 4 embed + 4 add calls,
every ID added exactly once, vectors at correct dimension
- DropExisting fires DELETE
- 409 on create → reuse existing index
- Limit stops early
- Empty Text rows skipped (counted as scanned, not added)
- Required IndexName + Dimension validation
- Context cancel stops mid-pipeline
Real bug caught and fixed by the test suite: if embedd ever returns
fewer vectors than texts in the request (degraded backend), the
addBatch loop would panic with index-out-of-range. Worker now
length-checks the response and logs+skips on mismatch.
12-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix). vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the matrix indexer's first piece per docs/SPEC.md §3.4:
multi-corpus retrieve+merge with corpus attribution per result.
Future components (relevance filter, downgrade gate, learning-loop
integration) layer on top of this surface.
Architecture:
- internal/matrix/retrieve.go — Retriever takes (query, corpora,
k, per_corpus_k), parallel-fans across vectord indexes, merges
by distance ascending, preserves corpus origin per hit
- cmd/matrixd — HTTP service on :3217, fronts /v1/matrix/*
- gateway proxy + [matrixd] config + lakehouse.toml entry
- Either query_text (matrix calls embedd) or query_vector
(caller pre-embedded) — vector takes precedence if both set
Error policy: fail-loud on any corpus error. Silent partial returns
would lie about coverage, defeating the matrix's whole purpose.
Bubbles vectord errors as 502 (upstream), validation as 400.
Smoke (scripts/matrix_smoke.sh, 6 assertions PASS first try):
- /matrix/corpora lists indexes
- Multi-corpus search returns hits from BOTH corpora
- Top hit is the globally-closest across all corpora
(b-near beats a-near at distance 0.05 vs 0.1 — proves merge)
- Metadata round-trips through the merge
- Distances ascending in result list
- Negative paths: empty corpora → 400, missing corpus → 502,
no query → 400
12-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the per-item Add loop in the HTTP handler with one call to
Index.BatchAdd, which acquires the write-lock once and pushes the
whole batch through coder/hnsw's variadic Graph.Add. Pre-validation
stays in the handler so per-item error messages keep their item-index
precision.
Microbench (internal/vectord/batch_bench_test.go) at d=768 cosine:
N=16 SingleAdd 283µs/op → BatchAdd 170µs/op 1.66×
N=128 SingleAdd 7.9ms/op → BatchAdd 7.5ms/op 1.05×
N=1024 SingleAdd 87.5ms/op → BatchAdd 83.4ms/op 1.05×
Win is biggest at staffing-driver batch sizes (N=16) where
per-call lock + validation overhead is a meaningful fraction. At
larger N the inner HNSW neighborhood search per insert dominates,
which is the load-bearing finding for Option B (sharded indexes):
the throughput ceiling lives inside the library, not at the lock,
so sharding to N parallel Graphs is the only path to true
concurrent-Add throughput.
g1, g1p, g2 smokes all PASS post-change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Network-callable Mem0-style trace memory at :3217, fronted by gateway
/v1/pathway/*. Closes the ADR-004 wire-up: store substrate landed in
2a6234f, this lands the HTTP surface + [pathwayd] config + acceptance
gate.
Smoke proves the architecturally distinctive properties: Revise →
History walks the predecessor chain backward (audit trail), Retire
excludes from Search default but stays Get-able, AddIdempotent bumps
replay_count without replacing — and all survive kill+restart via
JSONL log replay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Sprint 2 design-bar work (audit reports/scrum/sprint-backlog.md):
S2.1 — ADR-004 documents the pathway-memory data model
S2.2 — pathway port lands with deterministic fixture corpus
and full test coverage on day one
S2.3 — retired traces are excluded from retrieval (test
passes; would fail without the filter)
Mem0-style operations: Add / AddIdempotent / Update / Revise /
Retire / Get / History / Search. Each operation is a method on
Store; persistence is JSONL append-only with corruption recovery
on Replay.
internal/pathway/types.go Trace + event + SearchFilter + sentinel errors
internal/pathway/store.go in-memory state + RWMutex + ops
internal/pathway/persistor.go JSONL append-only log with replay
internal/pathway/store_test.go 20 test funcs covering all 7
Sprint 2 claim rows + concurrency
internal/pathway/persistor_test.go 6 test funcs covering missing-
file, corruption recovery, long-line
handling, parent-dir auto-create,
apply-error skip behavior
Sprint 2 claim coverage row-by-row:
ADD TestAdd_AssignsUIDAndTimestamps + TestAdd_RejectsInvalidJSON
UPDATE TestUpdate_ReplacesContentSameUID + Update_MissingUID_Errors
REVISE TestRevise_LinksToPredecessorViaHistory +
TestRevise_PredecessorMissing_Errors +
TestRevise_ChainOfThree_BackwardWalk
RETIRE TestRetire_ExcludedFromSearch +
TestRetire_StillAccessibleViaGet +
TestRetire_StillAccessibleViaHistory
HISTORY/cycle TestHistory_CycleDetected (injected via internal map),
TestHistory_PredecessorMissing_TruncatesChain,
TestHistory_UnknownUID_ErrorsClean
REPLAY/dup TestAddIdempotent_IncrementsReplayCount (locks the
"replay preserves original content" rule per ADR-004)
CORRUPTION TestPersistor_CorruptedLines_Skipped +
TestPersistor_ApplyError_Skipped
ROUND-TRIP TestPersistor_RoundTrip locks the full Save → fresh
Store → Load → Stats-match contract
Two real bugs caught during testing:
- Add returned the same *Trace stored in the map, so callers
holding a reference saw later mutations. Fixed: clone before
return (matches Get's contract). Same fix in AddIdempotent
+ Revise.
- Test typo: {"v":different} isn't valid JSON; AddIdempotent's
json.Valid rejected it as ErrInvalidContent. Test fixed to
use {"v":"different"}; the validation behavior is correct.
Skipped this commit (next):
- cmd/pathwayd HTTP binary
- gateway routing for /v1/pathway/*
- end-to-end smoke
These add the wire surface; the substrate ships first so the
wire layer can be a pure proxy in the next commit.
Verified:
go test -count=1 ./internal/pathway/ — 26 tests green
just verify — vet + test + 9 smokes 34s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the auth posture from ADR-003 (commit 0d18ffa). Two
independent layers — Bearer token (constant-time compare via
crypto/subtle) and IP allowlist (CIDR set) — composed in shared.Run
so every binary inherits the same gate without per-binary wiring.
Together with the bind-gate from commit 6af0520, this mechanically
closes audit risks R-001 + R-007:
- non-loopback bind without auth.token = startup refuse
- non-loopback bind WITH auth.token + override env = allowed
- loopback bind = all gates open (G0 dev unchanged)
internal/shared/auth.go (NEW)
RequireAuth(cfg AuthConfig) returns chi-compatible middleware.
Empty Token + empty AllowedIPs → pass-through (G0 dev mode).
Token-only → 401 Bearer mismatch.
AllowedIPs-only → 403 source IP not in CIDR set.
Both → both gates apply.
/health bypasses both layers (load-balancer / liveness probes
shouldn't carry tokens).
CIDR parsing pre-runs at boot; bare IP (no /N) treated as /32 (or
/128 for IPv6). Invalid entries log warn and drop, fail-loud-but-
not-fatal so a typo doesn't kill the binary.
Token comparison: subtle.ConstantTimeCompare on the full
"Bearer <token>" wire-format string. Length-mismatch returns 0
(per stdlib spec), so wrong-length tokens reject without timing
leak. Pre-encoded comparison slice stored in the middleware
closure — one allocation per request.
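The comparison in isolation, sketched (the real middleware composes
with the IP layer and the /health bypass):
  package shared

  import (
      "crypto/subtle"
      "net/http"
  )

  // requireToken: the expected "Bearer <token>" bytes are built once
  // and closed over; ConstantTimeCompare returns 0 on length
  // mismatch, so wrong-length tokens reject without a timing oracle.
  func requireToken(token string, next http.Handler) http.Handler {
      want := []byte("Bearer " + token)
      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          got := []byte(r.Header.Get("Authorization")) // the one per-request allocation
          if subtle.ConstantTimeCompare(got, want) != 1 {
              http.Error(w, "unauthorized", http.StatusUnauthorized)
              return
          }
          next.ServeHTTP(w, r)
      })
  }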
Source-IP extraction prefers net.SplitHostPort, falling back to
RemoteAddr-as-is for httptest compatibility. X-Forwarded-For
support is a follow-up when a trusted proxy fronts the gateway
(config knob TBD per ADR-003 §"Future").
internal/shared/server.go
Run signature: gained AuthConfig parameter (4th arg).
/health stays mounted on the outer router (public).
Registered routes go inside chi.Group with RequireAuth applied —
empty config = transparent group.
Added requireAuthOnNonLoopback startup check: non-loopback bind
with empty Token = refuse to start (cites R-001 + R-007 by name).
internal/shared/config.go
AuthConfig type added with TOML tags. Fields: Token, AllowedIPs.
Composed into Config under [auth].
cmd/<svc>/main.go × 7 (catalogd, embedd, gateway, ingestd, queryd,
storaged, vectord; mcpd is unaffected since stdio doesn't bind a port)
Each call site adds cfg.Auth as the 4th arg to shared.Run. No
other changes — middleware applies via shared.Run uniformly.
internal/shared/auth_test.go (12 test funcs)
Empty config pass-through, missing-token 401, wrong-token 401,
correct-token 200, raw-token-without-Bearer-prefix 401, /health
always public, IP allowlist allow + reject, bare IP /32, both
layers when both configured, invalid CIDR drop-with-warn, RemoteAddr
shape extraction. The constant-time comparison is verified by
inspection (comments in auth.go) plus the existence of the
passthrough test (length-mismatch case).
Verified:
go test -count=1 ./internal/shared/ — all green (was 21, now 33 funcs)
just verify — vet + test + 9 smokes 33s
just proof contract — 53/0/1 unchanged
Smokes + proof harness keep working without any token configuration:
default Auth is empty struct → middleware is no-op → existing tests
pass unchanged. To exercise the gate, operators set [auth].token in
lakehouse.toml (or, per the "future" note in the ADR, via env var).
Closes audit findings:
R-001 HIGH — fully mechanically closed (was: partial via bind gate)
R-007 MED — fully mechanically closed (was: design-only ADR-003)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CachedProvider wrapping the embedding Provider with a thread-safe
LRU keyed on (effective_model, sha256(text)) → []float32. Repeat
queries return the stored vector without round-tripping to Ollama.
Why this matters: the staffing 500K test (memory project_golang_lakehouse)
documented that the staffing co-pilot replays many of the same query
texts ("forklift driver IL", "welder Chicago", "warehouse safety", etc).
Each repeat paid the ~50ms Ollama round-trip. Cached repeats now serve
in <1µs (LRU lookup + sha256 of input).
Memory budget: ~3 KiB per entry at d=768. Default 10K entries ≈ 30 MiB.
Configurable via [embedd].cache_size; 0 disables (pass-through mode).
Per-text caching, not per-batch — a batch with mixed hits/misses only
fetches the misses upstream, then merges the result preserving caller
input order. Three-text batch with one miss = one upstream call for
that one text instead of three.
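The merge, sketched (struct fields upstream, lru, and defaultModel
assumed; the real CachedProvider also bumps hit/miss counters):
  // Embed: serve hits from the LRU, batch only the misses upstream,
  // merge back preserving caller order.
  func (c *CachedProvider) Embed(ctx context.Context, texts []string, model string) ([][]float32, error) {
      out := make([][]float32, len(texts))
      var missIdx []int
      var missTexts []string
      for i, t := range texts {
          if v, ok := c.lru.Get(c.key(model, t)); ok {
              out[i] = v
              continue
          }
          missIdx = append(missIdx, i)
          missTexts = append(missTexts, t)
      }
      if len(missTexts) == 0 {
          return out, nil // all hits: no upstream call at all
      }
      vecs, err := c.upstream.Embed(ctx, missTexts, model)
      if err != nil {
          return nil, err
      }
      for j, i := range missIdx {
          c.lru.Add(c.key(model, texts[i]), vecs[j])
          out[i] = vecs[j]
      }
      return out, nil
  }

  // key: "<model>:<sha256-hex>", empty model resolved to the
  // request-derived default (see the bug note below).
  func (c *CachedProvider) key(model, text string) string {
      if model == "" {
          model = c.defaultModel
      }
      sum := sha256.Sum256([]byte(text))
      return model + ":" + hex.EncodeToString(sum[:])
  }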
Implementation:
internal/embed/cached.go (NEW, 150 LoC)
CachedProvider implements Provider; uses hashicorp/golang-lru/v2.
Key shape: "<model>:<sha256-hex>". Empty model resolves to
defaultModel (request-derived) for the key — NOT res.Model
(upstream-derived), so future requests with same input shape
hit the same key. Caught by TestCachedProvider_EmptyModelResolvesToDefault.
Atomic hit/miss counters + Stats() + HitRate() + Len().
internal/embed/cached_test.go (NEW, 12 test funcs)
Pass-through-when-zero, hit-on-repeat, mixed-batch only fetches
misses, model-key isolation, empty-model resolves to default,
LRU eviction at cap, error propagation, all-hits synthesized
without upstream call, hit-rate accumulation, empty-texts
rejected, concurrent-safe (50 goroutines × 100 calls), key
stability + distinctness.
internal/shared/config.go
EmbeddConfig.CacheSize (toml: cache_size). Default 10000.
cmd/embedd/main.go
Wraps Ollama Provider with CachedProvider on startup. Adds
/embed/stats endpoint exposing hits / misses / hit_rate / size.
Operators check the rate to confirm the cache is working
(high rate = good) or sized wrong (low rate + many misses on a
workload that should have repeats).
cmd/embedd/main_test.go
Stats endpoint tests — disabled mode shape, enabled mode tracks
hits + misses across repeat calls.
One real bug caught by my own test:
The initial implementation cached under res.Model
(upstream-resolved) rather than effectiveModel (request-resolved):
a request with model="" would cache under "test-model" (Ollama's
default), and a later request with model="the-default" (our config
default) would miss the cache. Fix: always use the request-derived
effectiveModel for keys; that's the predictable side. Locked by
TestCachedProvider_EmptyModelResolvesToDefault.
Verified:
go test -count=1 ./internal/embed/ — all 12 cached tests + 6 ollama tests green
go test -count=1 ./cmd/embedd/ — stats endpoint tests green
just verify — vet + test + 9 smokes 33s
Production benefit:
~50ms Ollama round-trip → <1µs cache lookup for cached entries.
At 10K-entry default + ~30% repeat rate (typical staffing co-pilot
workload), saves several seconds per staffer-query session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shared.Run now refuses to bind a non-loopback address unless the
LH_<SERVICE>_ALLOW_NONLOOPBACK=1 env is set. Single change covers
all 7 binaries via the existing Run call site; no per-binary
wiring needed.
Closes the accidental-0.0.0.0 deploy attack surface for R-001:
queryd /sql is RCE-equivalent off loopback (DuckDB has filesystem
read + COPY TO + read_text), but the gate applies to every binary
uniformly so the same posture covers vectord (mutation routes),
catalogd (manifest writes), and the others.
What passes the gate:
127.0.0.1:port, 127.x.y.z:port (full /8), [::1]:port,
localhost:port, OR explicit env LH_<SVC>_ALLOW_NONLOOPBACK=1
What fail-louds:
0.0.0.0:port, [::]:port, :port (all interfaces),
any non-loopback IP, any non-localhost hostname,
unparseable shapes ("", "no port", garbage)
Override env is strict equality "1"; typos like "true"/"yes" do NOT
trigger it, so a future operator can't accidentally expose a service
by typing the wrong value. When the override fires, it logs a
structured warn so the choice is auditable in production.
Error message cites the env name AND R-001 by name so operators see
the fix path without grepping:
"refusing non-loopback bind \"0.0.0.0:3214\" for \"queryd\"
(set LH_QUERYD_ALLOW_NONLOOPBACK=1 to override; see audit R-001)"
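The loopback test in isolation, sketched (the real bind.go also
consults the override env and emits the message above):
  package shared

  import "net"

  func isLoopbackAddr(addr string) bool {
      host, _, err := net.SplitHostPort(addr)
      if err != nil {
          return false // unparseable shapes fail loud
      }
      if host == "localhost" {
          return true
      }
      // Covers the full 127.0.0.0/8 and ::1. ":port" yields an empty
      // host, which ParseIP rejects, so all-interfaces binds fail here.
      ip := net.ParseIP(host)
      return ip != nil && ip.IsLoopback()
  }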
internal/shared/bind.go — requireLoopbackOrOverride + isLoopbackAddr
internal/shared/bind_test.go — 7 test funcs incl. table-driven
IPv4/IPv6/hostname coverage and
per-service env isolation
internal/shared/server.go — 1-line gate in Run before listen
Verified:
go test -short ./internal/shared/ — all green (was 14 funcs, now 21)
just verify — vet + test + 9 smokes still 33s
Doesn't address R-001's full attack surface (any reachable port can
issue arbitrary SQL); ADR-003 + Bearer-token middleware is the
follow-up. This commit makes the implicit "localhost-only is the auth
layer" guarantee explicit and un-bypassable without explicit env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit-driven follow-up to the Rust scrum review on the 3 untested
HIGH-risk packages. Both the audit (reports/scrum/risk-register.md)
and the scrum (tests/real-world/runs/scrum_mojxb5bw/) independently
flagged these files as the highest-leverage missing test coverage.
internal/shared/server_test.go — 8 test funcs
newListener: valid addr, invalid addr (non-numeric port, port
out of range, port-already-in-use surfacing as net.OpError).
Empty-addr-is-valid: documents the net.Listen quirk that "" binds
an OS-picked port — future readers don't need to relitigate.
HealthResponse marshal: JSON shape stable, round-trip clean.
/health handler reconstructed via httptest.Server: status 200,
Content-Type application/json, body fields stable.
RegisterRoutes callback: contract verified (callback is invoked
with a real chi.Router, mounted route reachable end-to-end).
Run bind-failure surface: synchronous error, not a goroutine swallow
— the contract Run depends on per the race-safe-startup comment.
internal/shared/config_test.go — 6 test funcs
DefaultConfig G0 port pinning: every binary's default bind locked
in (3110/3211-3216) so a refactor can't silently flip a port.
LoadConfig empty path: returns DefaultConfig, no error.
LoadConfig missing file: returns DefaultConfig, logs warn (the warn
line shows up in test output, captured-but-not-asserted).
LoadConfig valid TOML: partial overrides land, unspecified sections
keep defaults (TOML decoder leave-alone behavior).
LoadConfig invalid TOML: returns wrapped 'parse config' error.
LoadConfig unreadable file: skipped under root (root reads 0000);
captures the read-error wrap path for non-root contexts.
internal/storeclient/client_test.go — 14 test funcs
safeKey table-driven: plain segments, single slash, empty, trailing
slash, space (→ %20), apostrophe (→ %27), unicode (→ %C3%A9),
deep nesting. Locks URL-escape contract per scrum suggestion.
recordingServer helper backs Put/Get/Delete/List against
httptest.Server: verifies method, path, body bytes round-trip.
ErrKeyNotFound on 404 (errors.Is round-trip).
Non-OK status wraps body preview into the error chain.
Delete accepts both 200 and 204 (S3 vs compatible-store quirk).
List parses JSON shape and surfaces query-string prefix.
Context cancellation propagates through Put as context.Canceled.
internal/queryd/db_test.go — 5 test funcs (with subtests)
sqlEscape table-driven: 8 cases including empty, all-quotes,
nested apostrophes (the case from the scrum suggestion).
redactCreds table-driven: 6 cases — both keys, single keys,
empty, multi-occurrence, placeholder-collision (lossy but safe).
buildBootstrap statement order: INSTALL → LOAD → CREATE SECRET.
buildBootstrap endpoint schemes: http strips + USE_SSL false,
https keeps SSL true, no-scheme defaults SSL true (prod ambient).
buildBootstrap URL_STYLE: 'path' vs 'vhost' branch.
buildBootstrap escapes credential quotes: future SSO-token-with-
apostrophe doesn't break out of the SQL string literal — the
belt holds when the suspenders snap.
Real finding caught by my own test:
net.Listen("tcp", "") succeeds (OS-picked port) — captured as
TestNewListener_EmptyAddrIsValid so the quirk is documented.
Verified:
go test -short ./... — every internal/ package now has tests
(no more 'no test files' lines for shared/storeclient).
just verify — vet + test + 9 smokes green in 33s.
just proof contract — 53/0/1 green (no harness regression).
Closes:
R-002 internal/shared zero tests HIGH
R-003 internal/storeclient zero tests HIGH
R-008 queryd/db.go untested MED (sqlEscape, redactCreds,
CREATE SECRET formation)
Composite scrum score should move from 43 → ~46 / 60 — the three
HIGH/MED risks closed, internal/shared and internal/storeclient
become "tested + load-bearing" instead of "untested + load-bearing."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).
Wire format:
POST /v1/embed {"texts":[...], "model":"..."} // model optional
→ 200 {"model","dimension","vectors":[[...]]}
Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.
Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.
All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 0 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (3 tokens)
Fixed (2 — 1 convergent + 1 single-reviewer):
C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
batch allowed up to N×60s with no batch-level cap; a run of stuck
Ollama calls could stall the whole handler for that long. Fix:
context.WithTimeout(r.Context(), 60s) wraps the entire batch.
O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
producing version-dependent garbage. Fix: reject "" with 400 at
the handler boundary so callers get a deterministic answer
instead of an upstream-conditional 502.
Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).
Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.
Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds optional persistence to vectord (G1's HNSW vector search). Single-
file framed format per index — eliminates the torn-write class that
the 3-way convergent scrum finding identified:
_vectors/<name>.lhv1 — single binary blob:
[4 bytes magic "LHV1"]
[4 bytes envelope_len uint32 BE]
[envelope bytes — JSON params + metadata + version]
[graph bytes — raw hnsw.Graph.Export]
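Writing the frame, sketched; one buffer means one PUT, which is what
closes the torn-write window (the real persistor may stream instead):
  package vectord

  import "encoding/binary"

  // encodeFrame assembles the .lhv1 blob per the layout above.
  func encodeFrame(envelope, graph []byte) []byte {
      buf := make([]byte, 0, 8+len(envelope)+len(graph))
      buf = append(buf, "LHV1"...)
      buf = binary.BigEndian.AppendUint32(buf, uint32(len(envelope)))
      buf = append(buf, envelope...)
      buf = append(buf, graph...)
      return buf
  }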
Pre-extraction: internal/catalogd/store_client.go → internal/storeclient/
shared package, since both catalogd and vectord need it. Same pattern as
the pre-D5 catalogclient extraction.
Optional via [vectord].storaged_url config (empty = ephemeral mode).
On startup: List + Load each persisted index. After Create / batch Add /
DELETE: Save (or Delete from storaged). Save failures are logged-not-
fatal — in-memory state is the source of truth in flight.
Acceptance smoke G1P 8/8 PASS — kill+restart preserves state, post-
restart search returns dist=0 (graph round-trips exactly), DELETE
removes the file, post-delete restart shows count=0.
All 8 smokes (D1-D6 + G1 + G1P) PASS deterministically. The g1_smoke
gained scripts/g1_smoke.toml that disables persistence so the
in-memory API test stays decoupled from any rehydrate-from-storaged
state contamination.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 1 INFO
Fixed (3 — 1 convergent + 2 single-reviewer):
C1 (Opus + Kimi + Qwen 3-WAY CONVERGENT WARN): Save was non-atomic
across two PUTs — envelope-succeeds + graph-fails left a half-
saved index that passed the "both present" List filter and
silently mismatched metadata against vectors on Load.
Fix: collapse to single framed file (no torn-write window
possible).
O-B1 (Opus BLOCK): isNotFound substring-matched "key not found"
against the wrapped error message — brittle, any 5xx body
containing that text would silently misclassify as missing.
Fix: errors.Is(err, storeclient.ErrKeyNotFound).
O-I3 (Opus INFO): handleAdd pre-validation only covered id+dim;
NaN/Inf/zero-norm could still fail mid-batch leaving partial
commits. Fix: extend pre-validation to call ValidateVector
(newly exported) per item before any commit.
Dismissed (3 false positives):
K-B1 + Q-B1 ("safeKey double-escapes %2F segments") — false
convergent. Wire-protocol escape is decoded by storaged's chi
router on the way in; on-disk key is the original literal.
%2F round-trips correctly through PathEscape → URL → chi decode
→ S3 key.
Q-B2 ("List vulnerable to race conditions") — vectord is single-
process; no concurrent Save against List in the same vectord.
Deferred (3): rehydrate per-index timeout (G2+ multi-index scale),
saveAfter request ctx (matches G0 timeout deferral), Encode RLock
during slow writer (documented as buffer-only API).
The C1 finding is the strongest signal of the cross-lineage filter:
three independent reviewers all flagged the same torn-write hazard.
Single-file framing eliminates the class — there's now no Persistor
state where envelope and graph can disagree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First G1+ piece. Standalone vectord service with in-memory HNSW
indexes keyed by string IDs and optional opaque JSON metadata.
Wraps github.com/coder/hnsw v0.6.1 (pure Go, no cgo). New port
:3215 with /v1/vectors/* routed through gateway.
API:
POST /v1/vectors/index create
GET /v1/vectors/index list
GET /v1/vectors/index/{name} get info
DELETE /v1/vectors/index/{name}
POST /v1/vectors/index/{name}/add (batch)
POST /v1/vectors/index/{name}/search
Acceptance smoke 7/7 PASS — including recall=1 on inserted vector
w-042 (cosine distance 5.96e-8, float32 precision noise), 200-
vector batch round-trip, dim mismatch → 400, missing index → 404,
duplicate create → 409.
Two upstream library quirks worked around in the wrapper:
1. coder/hnsw.Add panics with "node not added" on re-adding an
existing key (length-invariant fires because internal
delete+re-add doesn't change Len). Pre-Delete fixes for n>1.
2. Delete of the LAST node leaves layers[0] non-empty but
entryless; next Add SIGSEGVs in Dims(). Workaround: when
re-adding to a 1-node graph, recreate the underlying graph
fresh via resetGraphLocked().
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (4 tokens)
Fixed (4 real + 2 cleanup):
O-W1: Lookup returned the raw []float32 from coder/hnsw — caller
mutation would corrupt index. Now copies before return.
O-W3: NaN/Inf vectors poison HNSW (distance comparisons return
false for both < and >, breaking heap invariants). Zero-norm
under cosine produces NaN. Now validated at Add time.
K-B1: Re-adding with nil metadata silently cleared the existing
entry — JSON-omitted "metadata" field deserializes as nil,
making upsert non-idempotent. Now nil = "leave alone"; explicit
{} or Delete to clear.
O-W4: Batch Add with mid-batch failure left items 0..N-1
committed and item N rejected. Now pre-validates all IDs+dims
before any Add.
O-I1: jsonItoa hand-roll replaced with strconv.Itoa — no
measured allocation win.
O-I2: distanceFn re-resolved per Search → use stored i.g.Distance.
Dismissed (2 false positives):
K-B2 "MaxBytesReader applied after full read" — false, applied
BEFORE Decode in decodeJSON
K-W1 "Search distances under read lock might see invalidated
slices from concurrent Add" — false, RWMutex serializes
write-lock during Add against read-lock during Search
Deferred (3): HTTP server timeouts (consistent G0 punt),
Content-Type validation (internal service behind gateway), Lookup
dim assertion (in-memory state can't drift).
The K-B1 finding is worth pausing on: nil metadata on re-add is
the kind of API ergonomics bug only a code-reading reviewer
catches — smoke would never detect it because the smoke always
sends explicit metadata. Three lines changed in Add; the resulting
API matches what callers actually expect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Validated G0 substrate against the production workers_500k.parquet
dataset (18 cols × 500,000 rows). Findings + one applied fix:
Finding #1 (FIXED): ingestd's hardcoded 256 MiB cap rejected the 500K
CSV (344 MiB) with 413. The cap fired correctly, no OOM. Extracted it
to an [ingestd].max_ingest_bytes config field; default 256 MiB,
overridable per deployment for known-large workloads (sketch after
the findings list). With the cap bumped to 512 MiB, the 500K ingest
succeeds in 3.12s with ingestd peak RSS 209 MiB.
Finding #2 (deferred): ingestd doesn't release memory between
ingests. The Go runtime returns freed memory to the OS
conservatively; acceptable for a long-running daemon.
Finding #3: DuckDB-via-httpfs is healthy at 500K. GROUP BY 45ms,
count(*) 24ms, AVG 47ms, schema introspection 25ms. Sub-linear
scaling vs 100K — the s3:// read path is not a bottleneck.
Finding #4: ADR-010 type inference correctly handled real staffing
data. worker_id → BIGINT, numeric scores → DOUBLE, multi-line
resume_text → VARCHAR. 1000-row sample sufficient.
Finding #5: Go's encoding/csv handles RFC 4180 quoted-comma fields
and multi-line quoted text without LazyQuotes — confirming the D4
scrum's dismissal of Qwen's BLOCK on this point.
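Finding #1's knob in sketch form (struct and handler shape
illustrative; the field name, default, and up-front 413 are from
the fix):

    type IngestdConfig struct {
        MaxIngestBytes int64 `toml:"max_ingest_bytes"` // default 256 MiB
    }

    func (s *Server) handleIngest(w http.ResponseWriter, r *http.Request) {
        if r.ContentLength > s.cfg.MaxIngestBytes {
            http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
            return // up-front 413, body never read
        }
        r.Body = http.MaxBytesReader(w, r.Body, s.cfg.MaxIngestBytes)
        // ... multipart CSV ingest proceeds as before ...
    }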
Net: substrate handles production-scale data with one config knob.
No correctness issues, no OOMs, no silent type errors.
All 6 G0 smokes still PASS after the cap-config change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last day of Phase G0. Gateway promotes the D1 stub endpoints into
real reverse-proxies on :3110 fronting storaged + catalogd + ingestd
+ queryd. /v1 prefix lives at the edge — internal services route on
/storage, /catalog, /ingest, /sql, with the prefix stripped by a
custom Director per Kimi K2's D1-plan finding.
Routes:
/v1/storage/* → storaged
/v1/catalog/* → catalogd
/v1/ingest → ingestd
/v1/sql → queryd
Acceptance smoke 6/6 PASS — every assertion goes through :3110, none
direct to backing services. Full ingest → storage → catalog → query
round-trip verified end-to-end. The smoke's "rows[0].name=Alice"
assertion is the architectural payoff: five binaries, six HTTP
routes, one round-trip through one edge.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 2 WARN + 2 INFO
- Kimi K2-0905 (openrouter): 1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory)
- Qwen3-coder (openrouter): 5 completion tokens — "No BLOCKs."
Fixed (2, both Opus single-reviewer):
O-BLOCK: Director path stripping fails if upstream URL has a
non-empty path. The default Director's singleJoiningSlash runs
BEFORE the custom code, so an upstream like http://host/api
produces /api/v1/storage/... after the join — then TrimPrefix("/v1")
is a no-op because the string starts with /api. Fix: strip /v1
BEFORE calling origDirector. New TestProxy_SubPathUpstream regression
locks this in. Today's config uses bare-host URLs only, so the bug
was dormant — but moving the gateway behind a sub-path in prod would
have silently 404'd (sketch after this list).
O-WARN2: url.Parse is permissive — the typo "127.0.0.1:3211" (no
scheme) parses fine but yields an empty Host, so every request 502s.
mustParseUpstream now fails fast at startup with a clear message
naming the offending config field.
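Both fixes in sketch form (message wording illustrative; the
strip-before-origDirector ordering is the fix):

    proxy := httputil.NewSingleHostReverseProxy(upstream)
    origDirector := proxy.Director
    proxy.Director = func(r *http.Request) {
        // Strip /v1 BEFORE the default Director joins upstream.Path;
        // post-join, http://host/api yields /api/v1/... and the trim
        // silently no-ops.
        r.URL.Path = strings.TrimPrefix(r.URL.Path, "/v1")
        origDirector(r)
    }

    // Fail fast on scheme-less upstreams like "127.0.0.1:3211",
    // which url.Parse accepts with an empty Host.
    func mustParseUpstream(field, raw string) *url.URL {
        u, err := url.Parse(raw)
        if err != nil || u.Scheme == "" || u.Host == "" {
            log.Fatalf("config %s: %q is not a valid URL (need scheme://host)", field, raw)
        }
        return u
    }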
Dismissed (3, all Kimi, same false TrimPrefix theory):
K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single
check-and-trim, no loop
K-WARN "no upper bound on repeated // removal" — same false theory
K-WARN "goroutines leak if upstream parse fails while binaries
running" — confused scope; binaries are separate OS processes
launched by the smoke script
D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no
longer stubs /v1/ingest and /v1/sql). Replaced with proxy probes that
verify gateway forwards malformed requests to ingestd and queryd. Launch
order changed from parallel to dep-ordered (storaged → catalogd →
ingestd → queryd → gateway) since catalogd's rehydrate now needs
storaged and queryd's initial Refresh needs catalogd.
All six G0 smokes (D1 through D6) PASS end-to-end after every fix
round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes
applied across 6 days from cross-lineage review.
G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port,
distillation rebuild, observer + Langfuse integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector
that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET
(TYPE S3) on every new connection, sourced from SecretsProvider +
shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and
handler's SELECTs serialize through one connection (avoids cross-
connection MVCC visibility edge cases).
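Connector shape, sketched assuming the duckdb-go driver's
per-connection init hook (NewConnector plus callback; exact
signatures may differ from the shipped code):

    connector, err := duckdb.NewConnector("", func(ex driver.ExecerContext) error {
        // Runs on EVERY new connection, so httpfs and the S3 secret
        // exist even if the pool recreates the connection.
        for _, stmt := range []string{
            "INSTALL httpfs",
            "LOAD httpfs",
            secretSQL, // CREATE OR REPLACE SECRET (TYPE S3, ...) from SecretsProvider + shared.S3Config
        } {
            if _, err := ex.ExecContext(context.Background(), stmt, nil); err != nil {
                return err
            }
        }
        return nil
    })
    if err != nil {
        return nil, err
    }
    db := sql.OpenDB(connector)
    db.SetMaxOpenConns(1) // registrar DDL and handler SELECTs share one conn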
Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR
REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key')
per manifest, drops views for removed manifests, skips on unchanged
updated_at (the implicit etag). Drop pass runs BEFORE create pass so
a poison manifest can't block other manifest refreshes (post-scrum
C1 fix).
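Per-manifest refresh step, sketched (loop variables illustrative;
the drop pass has already run per the C1 fix):

    // Implicit etag: skip manifests whose updated_at hasn't moved.
    if m.UpdatedAt == r.known[m.Name] {
        continue
    }
    ddl := fmt.Sprintf(
        `CREATE OR REPLACE VIEW "%s" AS SELECT * FROM read_parquet('s3://%s/%s')`,
        m.Name, bucket, m.Key)
    if _, err := db.ExecContext(ctx, ddl); err != nil {
        errs = append(errs, err) // collected, errors.Join at the end
        continue // one poison manifest doesn't block the rest
    }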
POST /sql with JSON body {"sql":"…"} returns
{"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]],
"row_count":N}. []byte → string conversion so VARCHAR rows
JSON-encode as text. 30s default refresh ticker, configurable via
[queryd].refresh_every.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 4 WARN + 4 INFO
- Kimi K2-0905 (openrouter): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO
Fixed (4):
C1 (Opus + Kimi convergent): Refresh aborts on first per-view error
→ drop pass first, collect errors, errors.Join. Poison manifest
no longer blocks the rest of the catalog from re-syncing.
B-CTX (Opus BLOCK): the bootstrap closure captured OpenDB's ctx →
a cancelled ctx silently fails every reconnect. Fix:
context.Background() inside the closure; the caller's ctx is used
only for the initial Ping.
B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80
chars, but those 80 chars contained the KEY_ID + SECRET prefix → a
log aggregator would capture credentials. Fix: stable per-statement
labels + redactCreds() filter on wrapped DuckDB errors (sketch
after this list).
JSON-ERR (Opus WARN): swallowed json.Encode error → silent
truncated 200 on unsupported column types. slog.Warn the failure.
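The B-LEAK filter, sketched (signature illustrative):

    // Defense-in-depth: error paths carry a stable label, never SQL
    // text, and known secret values are scrubbed from DuckDB's own
    // message before it is wrapped.
    func redactCreds(err error, secrets []string) error {
        if err == nil {
            return nil
        }
        msg := err.Error()
        for _, s := range secrets {
            if s != "" {
                msg = strings.ReplaceAll(msg, s, "[REDACTED]")
            }
        }
        return errors.New(msg)
    }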
Dismissed (4 false positives):
Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit
Qwen BLOCK "MaxBytesReader after Decode" — false, applied before
Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a
deadlock, just serialization, by design with 10s timeout retry
Kimi WARN "dropView leaves r.known inconsistent" — current code
returns before the delete; the entry persists for retry
Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi
on the per-view error blocking, plus two independent single-reviewer
BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The
B-LEAK fix uses defense-in-depth: never pass SQL into the error
path AND redact known cred values from DuckDB's own error message.
DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per
ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every
fix round.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
queryd (D5) needs the same HTTP client to catalogd that ingestd uses,
but the client lived in internal/ingestd — having queryd import from
ingestd would invert the data-flow direction (ingestd is upstream of
queryd; the package dep should not point back). Extract to a shared
internal/catalogclient/ package now, before D5 forces it under
implementation pressure.
Adds the List(ctx) method queryd will need for view registration.
Unit tests cover Register success/conflict and List success/error
paths against an httptest.Server fake.
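The List shape queryd consumes, sketched against the /catalog/list
route (Client fields and the Manifest type illustrative):

    func (c *Client) List(ctx context.Context) ([]Manifest, error) {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.baseURL+"/catalog/list", nil)
        if err != nil {
            return nil, err
        }
        resp, err := c.hc.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return nil, fmt.Errorf("catalog list: unexpected status %s", resp.Status)
        }
        var ms []Manifest
        return ms, json.NewDecoder(resp.Body).Decode(&ms)
    }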
ingestd's import flips from internal/ingestd → internal/catalogclient;
the wire format and behavior are unchanged. All four smokes (D1/D2/D3/
D4) PASS unchanged. DuckDB cgo path re-verified with the official
github.com/duckdb/duckdb-go/v2 (per ADR-001) on Go 1.25 + arrow-go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema
inference per ADR-010 (default-to-string on ambiguity), single-pass
streaming CSV → Parquet via pqarrow batched writer (Snappy compressed,
8192 rows per batch), PUT to storaged at content-addressed key
datasets/<name>/<fp_hex>.parquet, register manifest with catalogd.
Acceptance smoke 6/6 PASS including idempotent re-ingest (proves
inference is deterministic — same CSV always produces same fingerprint)
and schema-drift → 409 (proves catalogd's gate fires on ingest traffic).
Schema fingerprint is SHA-256 over (name, type) tuples in header order
using ASCII record/unit separators (0x1e/0x1f) so column names with
commas can't collide. Nullability intentionally NOT in the fingerprint
— a column gaining nulls isn't a schema change.
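The fingerprint, sketched with the (name, type) pairs already
stringified (function name illustrative):

    // 0x1f splits name from type, 0x1e splits columns, so a comma in
    // a column name can't make two schemas collide. Nullability is
    // deliberately absent from the hash.
    func schemaFingerprint(cols [][2]string) string {
        h := sha256.New()
        for _, c := range cols {
            h.Write([]byte(c[0])) // column name
            h.Write([]byte{0x1f}) // unit separator
            h.Write([]byte(c[1])) // arrow type string
            h.Write([]byte{0x1e}) // record separator
        }
        return hex.EncodeToString(h.Sum(nil))
    }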
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 4 WARN + 3 INFO (after 2 self-retracted BLOCKs)
- Kimi K2-0905 (openrouter): 1 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO
Fixed (2, both Opus single-reviewer):
C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet
meant a schema-drift ingest overwrote the live parquet BEFORE
catalogd's 409 fired → storaged inconsistent with manifest.
Fix: content-addressed key datasets/<name>/<fp_hex>.parquet.
Drift writes to a different file (orphan in G2 GC scope); the
live data is never corrupted.
C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks
buffered column data + OS resources per failed ingest.
Fix: deferred guarded close with a wClosed flag (both fixes
sketched after this list).
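Both fixes in sketch form (variable names illustrative):

    // C-DRIFT: content-addressed key. A drifted schema writes a new
    // object; the live parquet is never touched.
    key := fmt.Sprintf("datasets/%s/%s.parquet", name, fpHex)

    // C-WCLOSE: exactly one Close on every path.
    w, err := pqarrow.NewFileWriter(schema, out, writerProps, arrowProps)
    if err != nil {
        return err
    }
    wClosed := false
    defer func() {
        if !wClosed {
            w.Close() // frees buffered column data on error paths
        }
    }()
    // ... stream 8192-row batches ...
    wClosed = true
    if err := w.Close(); err != nil {
        return err
    }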
Dismissed (5, all false positives):
Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false,
Go csv handles RFC 4180 multi-line quoted fields by default
Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73
and csv.go:201
Kimi BLOCK "type assertion panic if pqarrow reorders fields" —
speculative, no real path
Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" —
false convergent. Outer defer rb.Release() captures the current
builder; in-loop release runs before reassignment. No leak.
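That csv dismissal is checkable in a few lines with stock
encoding/csv; the quoted comma and embedded newline survive:

    package main

    import (
        "encoding/csv"
        "fmt"
        "strings"
    )

    func main() {
        in := "id,resume_text\n1,\"line one\nline two, with comma\"\n"
        rows, err := csv.NewReader(strings.NewReader(in)).ReadAll() // no LazyQuotes
        if err != nil {
            panic(err)
        }
        fmt.Printf("%q\n", rows[1][1]) // "line one\nline two, with comma"
    }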
Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample
boundary type mismatch (G0 cap bounds peak), string-match
paranoia on http.MaxBytesError, multipart double-buffer (G2 spool-
to-disk), separator validation, body close ordering, etc.
The D4 scrum produced fewer real findings than D3 (2 vs 6) — both
were architectural hazards smoke wouldn't catch because the smoke's
"schema drift → 409" assertion was passing even in the corrupted-
state world. The 409 fires correctly; what was wrong was the PUT
having already mutated the live parquet before the validation check.
Opus's PUT-then-register read of the order is exactly the kind of
architectural insight the cross-lineage scrum is designed to surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory
registry with the ADR-020 idempotency contract (same name+fingerprint
reuses dataset_id; different fingerprint → 409 Conflict), HTTP client
to storaged for persistence, and rehydration on startup. Acceptance
smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the
load-bearing test that the catalog/storaged service split actually
preserves state.
dataset_id derivation diverges from Rust: UUIDv5(namespace, name)
instead of v4 surrogate. Same name on any box generates the same
dataset_id; rehydrate after disk loss converges to the same identity
rather than silently re-issuing. Namespace pinned at
a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued
depends on these bytes.
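The derivation, sketched with google/uuid (function name
illustrative; NewSHA1 is the package's RFC 4122 version-5
constructor):

    var nsDataset = uuid.MustParse("a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e")

    // Same name yields the same dataset_id on any box, after any
    // disk loss.
    func datasetID(name string) uuid.UUID {
        return uuid.NewSHA1(nsDataset, []byte(name))
    }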
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 5 WARN + 3 INFO
- Kimi K2-0905 (openrouter, validated D2): 2 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 2 WARN + 2 INFO
Fixed:
C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds
C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern
S1 split-brain on persist failure → candidate-then-swap
S2 brittle string-match for 400 vs 500 → ErrEmptyName/ErrEmptyFingerprint sentinels
S3 Get/List shallow-copy aliasing → cloneManifest deep copy
S4 keep-alive socket leak on error paths → drainAndClose helper
(sketch after this list)
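S4's helper, sketched: an HTTP/1.1 connection only returns to the
keep-alive pool if the response body is fully drained before Close:

    func drainAndClose(body io.ReadCloser) {
        io.Copy(io.Discard, body) // drain to EOF so the socket is reusable
        body.Close()
    }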
Dismissed (false positives, all single-reviewer):
Kimi BLOCK "Decode crashes on empty Parquet" — already handled
Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required
Qwen INFO "rb.NewRecord() error unchecked" — API returns no error
Deferred to G1+: name validation regex, per-call deadlines, Snappy
compression, list pagination continuation tokens (storaged caps at
10k with sentinel for now).
Build clean, vet clean, all tests pass, smoke 6/6 PASS after every
fix round. arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced
by arrow-go's minimum.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.
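The admission control, sketched (wrapper shape illustrative; the
4-slot count, 503, and Retry-After:5 are from the shipped code):

    var slots = make(chan struct{}, 4)

    func limit(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            select {
            case slots <- struct{}{}: // non-blocking acquire
                defer func() { <-slots }()
                next.ServeHTTP(w, r)
            default: // full: shed load instead of queueing
                w.Header().Set("Retry-After", "5")
                http.Error(w, "busy", http.StatusServiceUnavailable)
            }
        })
    }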
Cross-lineage scrum on the shipped code:
- Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
- Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
- Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
cap and the direct adapter's empty-content reasoning bug):
1 BLOCK + 2 WARN + 1 INFO
Fixed:
C1 buildRegistry ctx cancel footgun → context.Background()
(Opus + Kimi convergent; future credential refresh chains)
C2 MaxBytesReader unwrap through manager.Uploader multipart
goroutines → Content-Length up-front 413 + string-suffix fallback
(Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
(Opus + Kimi convergent; OOM guard)
S1 PUT response Content-Type: application/json (Opus single-reviewer)
Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters.
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.
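validateKey policy in sketch form (error wording illustrative; the
rule set is the approved list above):

    func validateKey(key string) error {
        switch {
        case key == "":
            return errors.New("empty key")
        case len(key) > 1024:
            return errors.New("key exceeds 1024 bytes")
        case strings.ContainsAny(key, "\x00\r\n\t"):
            return errors.New("key contains NUL or control characters")
        case strings.HasPrefix(key, "/"):
            return errors.New("key has leading slash")
        }
        for _, part := range strings.Split(key, "/") {
            if part == ".." {
                return errors.New("key contains .. path component")
            }
        }
        return nil
    }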
Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).
Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>