3-lineage scrum review of the role-gate work (commits 7f2f112..0331288)
ran Opus 4.7 / Kimi K2.6 / Qwen3-coder via scripts/scrum_review.sh.
All three flagged the same edge case: the homegrown plural-stripper
in roleNormalize would collapse non-plural-s tokens like "Sales" →
"Sale", "Logistics" → "Logistic", "Operations" → "Operation". In a
staffing domain those are real role names; the silent normalization
would have caused false role-equality matches and re-opened the
cross-role bleed for those clusters.
Fix:
- nonPluralSWords allowlist for known staffing-domain non-plural-s
tokens (sales, logistics, operations, facilities, premises, news,
physics, economics, mathematics, analytics).
- Last-word-only stripping ("Sales Associate" stays whole; only
"Associates" head noun is plural-checked).
- -ss ending check so "Press Operator", "Boss" don't lose their s.
- strings.ToLower + strings.TrimSpace replace the homegrown rune-
loop ASCII normalizer (Opus INFO — minor cleanup, folded in).
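The three rules above (allowlist, last-word-only stripping, -ss check) can be sketched as follows. This is a minimal illustration, not the shipped roleNormalize: the helper name singularize and the exact "-es" stems it handles are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// nonPluralSWords lists tokens that end in "s" but are not plurals
// (the allowlist named in this commit).
var nonPluralSWords = map[string]bool{
	"sales": true, "logistics": true, "operations": true, "facilities": true,
	"premises": true, "news": true, "physics": true, "economics": true,
	"mathematics": true, "analytics": true,
}

// singularize strips a plural suffix from one lowercase word.
// ASSUMPTION: the "-es" stems handled here are chosen for the sketch.
func singularize(w string) string {
	switch {
	case nonPluralSWords[w] || strings.HasSuffix(w, "ss"):
		return w // allowlisted, or an "-ss" ending such as "boss", "press"
	case strings.HasSuffix(w, "xes"), strings.HasSuffix(w, "ches"),
		strings.HasSuffix(w, "shes"), strings.HasSuffix(w, "sses"):
		return strings.TrimSuffix(w, "es")
	case len(w) > 1 && strings.HasSuffix(w, "s"):
		return strings.TrimSuffix(w, "s")
	}
	return w
}

// roleNormalize lowercases and trims the role, then singularizes only the
// last word (the head noun), so "Sales Associates" keeps "sales" intact.
func roleNormalize(role string) string {
	words := strings.Fields(strings.ToLower(strings.TrimSpace(role)))
	if len(words) == 0 {
		return ""
	}
	words[len(words)-1] = singularize(words[len(words)-1])
	return strings.Join(words, " ")
}

func main() {
	roles := []string{"Sales", "Sale", "Forklift Operators", "Boxes", "Boss", "Sales Associates"}
	for _, r := range roles {
		fmt.Printf("%q -> %q\n", r, roleNormalize(r))
	}
}
```

With this shape "Sales" and "Sale" normalize to different strings, which is the property the gate needs to avoid false role-equality matches.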
Tests:
- TestRoleNormalize_NonPluralS: 18 cases covering the allowlist,
-ss ending, real plurals (Operators → Operator, Boxes → Box),
multi-word real plurals (Forklift Operators → forklift operator),
whitespace/case tolerance.
- TestRoleEqual_NonPluralS: gate-level pairing — proves equal-
shape allowlisted tokens compare equal AND that "Sales" ≠ "Sale"
(the original bug shape).
- Existing TestRoleEqual_PluralAndCase still green (refactor
preserved behavior).
Other scrum findings dispositioned (not actioned):
- Opus WARN on empty-role fail-open semantics: documented
backward-compat behavior; production path closes via opt-in LLM
extractor (real_004).
- Opus INFO on unsynchronized package-global cache map: harness is
single-goroutine; add sync.Mutex when/if it parallelizes.
- Opus INFO on parallel constructor (NewPlaybookEntryWithRole vs
optional arg): API smell only, both forms preserved.
- Kimi 2 BLOCKs (NewPlaybookEntryWithRole missing, ApplyPlaybookBoost
signature breakage): FALSE positives. Pre-push smoke chain green
on 0331288, both symbols + all call sites compile clean. Matches
feedback_cross_lineage_review.md's documented Kimi truncation
behavior — Kimi BLOCKs warrant trace verification before action.
Disposition (local): reports/scrum/_evidence/2026-04-30/verdicts/role_gate_v1_disposition.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
real_003 left a known-weak hole: shorthand-style queries
("{count} {role} {city} {state} ...") have no separator between
role and city, so a regex can't reliably extract — leaving the
cross-role gate disabled when both record AND query are shorthand.
This commit adds a roleExtractor with regex-first + LLM fallback:
- Regex first (fast, deterministic) — handles need + client_first +
looking from real_003b. ~75% of styles, no LLM cost paid.
- LLM fallback when regex returns empty AND model is configured —
Ollama-shape /api/chat with format=json, schema-tight prompt,
temperature 0. ~1-3s on local qwen2.5.
- Per-process cache — paraphrase + rejudge passes reuse the same
query 4× per run; cache prevents 4× LLM cost.
- Off-by-default — opt-in via -llm-role-extract flag (CLI) and
LLM_ROLE_EXTRACT=1 env var (harness wrapper). real_003b shipping
config unchanged unless explicitly enabled.
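A rough sketch of the regex-first, LLM-fallback, cached flow described above. The struct fields, the regexRole helper, and the single "need" pattern are assumptions made for illustration; the LLM call is injected as a plain function so the example runs without a model, and the cache is unsynchronized because the harness is single-goroutine.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// needRe is a stand-in for the regex stage. ASSUMPTION: the real patterns
// differ; this one only covers a "Need N {role} in ..." query.
var needRe = regexp.MustCompile(`(?i)^need\s+\d+\s+(.+?)\s+in\s`)

// roleExtractor is a hypothetical shape: llm == nil means the fallback is off.
type roleExtractor struct {
	llm   func(query string) (string, error)
	cache map[string]string // per-process, single-goroutine use only
}

func regexRole(q string) string {
	if m := needRe.FindStringSubmatch(strings.TrimSpace(q)); m != nil {
		return m[1]
	}
	return ""
}

// Extract tries the regex first, then the cache, then the LLM.
// A nil receiver is safe and runs the regex stage only.
func (e *roleExtractor) Extract(q string) string {
	if role := regexRole(q); role != "" {
		return role // fast path, no LLM cost
	}
	if e == nil || e.llm == nil {
		return "" // opt-in default: empty role leaves the gate disabled
	}
	if role, ok := e.cache[q]; ok {
		return role
	}
	role, err := e.llm(q)
	if err != nil {
		return "" // failures are not cached in this sketch
	}
	if e.cache == nil {
		e.cache = map[string]string{}
	}
	e.cache[q] = role
	return role
}

func main() {
	calls := 0
	e := &roleExtractor{llm: func(string) (string, error) {
		calls++
		return "CNC Operator", nil // canned answer standing in for the model
	}}
	fmt.Println(e.Extract("Need 3 Forklift Operators in Detroit MI")) // regex path
	for i := 0; i < 3; i++ {
		fmt.Println(e.Extract("2 CNC Operator Detroit MI")) // LLM once, then cache
	}
	fmt.Println("LLM calls:", calls)
}
```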
8 new tests in scripts/playbook_lift/main_test.go:
- TestRoleExtractor_RegexFirst: LLM not called when regex matches
- TestRoleExtractor_LLMFallback: shorthand goes to LLM
- TestRoleExtractor_LLMOffLeavesEmpty: opt-in default preserved
- TestRoleExtractor_Cache: 3 calls = 1 LLM hit
- TestRoleExtractor_NilSafe: nil receiver runs regex only
- TestExtractRoleViaLLM_HTTPError + _BadJSON: failure paths
- TestRoleExtractor_ClosesCrossRoleShorthandBleed: synthetic
witness for the real_003 scenario — both record + query are
shorthand, regex returns "" for both, LLM produces DIFFERENT
role tokens for CNC vs Forklift, so matrix gate's cross-role
rejection (locked separately in
TestInjectPlaybookMisses_RoleGateRejectsCrossRole) fires
correctly. This is the load-bearing verification.
Reality test real_004 ran the same 40-query stress as real_003 with
LLM extraction on. Cross-style same-role boosts fired correctly
across all 4 styles for Loaders + Packers + Shipping Clerk clusters
(including shorthand → other-style transfer). No cross-role bleed
observed. The reality test alone can't be a clean "with vs without"
comparison (HNSW build is non-deterministic across runs, and
real_004 stochastics didn't trigger a shorthand recording at all),
which is why the unit-test witness exists.
Production note (in real_004_findings.md): LLM extraction is for
reality-test coverage of arbitrary query shapes. Production should
extract role at INGEST time (when the inbox parser already runs an
LLM) and pass already-resolved role through requests — same shape
as multi_coord_stress's existing Demand{Role: ...} model. The hot
path should never need the harness extractor's per-query LLM cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stress-tests the role gate with 40 queries (10 fill_events rows × 4
styles): need, client_first, looking, shorthand. Each row's role +
client + city stays the same; only the surface phrasing changes.
real_003 (original extractor) confirmed the shorthand-vs-shorthand
failure mode: CNC Operator shorthand recording leaked w-2404 onto
Forklift Operator shorthand query within the same Beacon Freight
Detroit cluster. Both record + query had empty role (extractor
returns "" for shorthand because there's no separator between role
and city), gate disabled, distance check passed, bleed fired.
Fix: extended extractRoleFromNeed to handle client_first
("{client} needs N {role} in...") and looking ("Looking for N
{role} at...") patterns. Shorthand left intentionally unmatched —
"Forklift Operator Detroit" is shape-indistinguishable from
"Forklift" + "Operator Detroit" without an LLM extractor or known-
cities lookup.
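For illustration, the three matched styles could be expressed as ordered patterns like the ones below. The regular expressions and the name extractRoleSketch are assumptions written for this example, not the shipped extractRoleFromNeed; the point is that shorthand has no anchor word ("in", "at", "needs") to delimit the role, so it falls through to the empty result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rolePatterns is tried in order; each captures the role in group 1.
// ASSUMPTION: these expressions are illustrative, not the shipped ones.
var rolePatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)^need\s+\d+\s+(.+?)\s+in\s`),           // need
	regexp.MustCompile(`(?i)^looking\s+for\s+\d+\s+(.+?)\s+at\s`),  // looking
	regexp.MustCompile(`(?i)^.+?\s+needs\s+\d+\s+(.+?)\s+in\s`),    // client_first
}

// extractRoleSketch returns the captured role, or "" when no style matches.
func extractRoleSketch(q string) string {
	q = strings.TrimSpace(q)
	for _, re := range rolePatterns {
		if m := re.FindStringSubmatch(q); m != nil {
			return m[1]
		}
	}
	return ""
}

func main() {
	queries := []string{
		"Need 2 CNC Operators in Detroit MI starting at 6am for Beacon Freight",
		"Beacon Freight needs 2 CNC Operators in Detroit MI",
		"Looking for 2 CNC Operators at Beacon Freight in Detroit MI",
		"2 Forklift Operator Detroit MI", // shorthand: intentionally unmatched
	}
	for _, q := range queries {
		fmt.Printf("%q -> %q\n", q, extractRoleSketch(q))
	}
}
```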
real_003b (extended extractor) verifies bleed closed across all 4
styles for this dataset. Forklift Operator queries keep w-2136 (the
cold-pass-correct match) regardless of which style the query came
in. Same-role boosts now fire correctly across styles — a CNC
Operator recording made in `looking` style boosts the CNC need-form
query.
scripts/cutover/gen_real_queries.go: added -styles flag with values
need|client_first|looking|shorthand|all (default need preserves
real_001/002 behavior). tests/reality/real_coord_queries_v2.txt is
the 40-query stress file.
scripts/playbook_lift/main_test.go: 10 sub-tests lock the four
documented patterns + shorthand limitation + lift-suite-style
queries (no clean role, returns empty as expected).
Aggregate metrics:
- real_003 (original): disc=7, lift=7, boost=14, meanΔ=-0.108
- real_003b (extended): disc=11, lift=10, boost=31, meanΔ=-0.202
The growth reflects more LEGITIMATE same-role same-cluster transfer
firing across styles, not bleed (verified by per-cluster bleed
table — Forklift Operator queries unchanged across all 4 styles).
Known limitation documented in real_003_findings.md: same-cluster
queries in shorthand still embed close enough that a
shorthand recording could bleed onto a different-role shorthand
query if both record + query strip role. Closing this requires
LLM extraction or known-cities lookup at record + query time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
real_001 surfaced same-client+city queries bleeding across roles:
Q#2 (Forklift Operator @ Beacon Freight Detroit) recorded e-6193
in the playbook corpus. Q#5 (Pickers same client+city) and Q#10
(CNC Operator same client+city) embedded within 0.13-0.18 cosine of
Q#2's query — well inside the 0.20 inject threshold — so e-6193
injected on both, demoting the cold-pass-correct workers.
Root cause: the inject distance threshold isn't tight enough on
the same-client+city cluster. Cosine collapses queries that share
city + client + count-token + time-token regardless of role. The
existing judge gate is per-injection at record time and doesn't
fire at retrieve time.
Fix: structural role gate in front of both Shape A boost and
Shape B inject. PlaybookEntry gains Role; SearchRequest gains
QueryRole. When both are non-empty and differ under roleEqual's
case+plural normalization, the entry is rejected before BoostFactor
or judge-gate logic runs.
Backward-compat: empty role on either side disables the gate —
preserves behavior for the lift suite's free-form multi-constraint
queries that have no clean single role. Caller-supplied (not
inferred), so existing recordings unaffected.
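The gate decision described above reduces to a small predicate. The sketch below is a simplified stand-in: roleEqualSketch only lowercases and strips one trailing "s", which is an assumption made for brevity and not the full plural handling in roleEqual.

```go
package main

import (
	"fmt"
	"strings"
)

// roleEqualSketch compares roles under case + trailing-"s" normalization.
// ASSUMPTION: deliberately simpler than the real roleEqual.
func roleEqualSketch(a, b string) bool {
	norm := func(s string) string {
		return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(s)), "s")
	}
	return norm(a) == norm(b)
}

// roleGateAllows reports whether a playbook entry may boost or inject for a
// query. An empty role on either side disables the gate (backward-compat).
func roleGateAllows(entryRole, queryRole string) bool {
	if entryRole == "" || queryRole == "" {
		return true
	}
	return roleEqualSketch(entryRole, queryRole)
}

func main() {
	fmt.Println(roleGateAllows("Forklift Operator", "CNC Operator"))       // false: cross-role rejected
	fmt.Println(roleGateAllows("Forklift Operator", "forklift operators")) // true: same role
	fmt.Println(roleGateAllows("", "CNC Operator"))                        // true: gate disabled
}
```

Running the check before BoostFactor or judge-gate logic keeps the rejection cheap: no distance math or LLM call happens for a cross-role candidate.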
Wire-through:
- internal/matrix/playbook.go: Role field, NewPlaybookEntryWithRole,
roleEqual helper with plural+case normalization
- internal/matrix/retrieve.go: QueryRole on SearchRequest, threaded
to both ApplyPlaybookBoost + InjectPlaybookMisses
- cmd/matrixd/main.go: role on POST /matrix/playbooks/record + bulk
- scripts/playbook_lift/main.go: extractRoleFromNeed regex pulls
role from "Need N {role}{s} in" queries (the fill_events shape);
free-form queries fall back to empty (gate disabled)
Tests (5 new):
- TestInjectPlaybookMisses_RoleGateRejectsCrossRole: exact Q#10
scenario (distance 0.135, recorded "Forklift Operator", query
"CNC Operator") — locks the bleed at unit level
- TestInjectPlaybookMisses_RoleGateAllowsSameRole: Forklift Operator
recording fires on Forklift Operators query (plural normalization)
- TestInjectPlaybookMisses_RoleGateBackwardCompat: empty Role on
either side = gate disabled, preserves current behavior
- TestApplyPlaybookBoost_RoleGateRejectsCrossRole: Shape A defense
in depth — boost doesn't fire on cross-role even when answer is
in cold top-K
- TestRoleEqual_PluralAndCase: case + -s + -es plural normalization
Verification (real_002, same query set as real_001):
- Q#5 Pickers @ Beacon Freight: e-6193 → e-8499 (no bleed)
- Q#10 CNC Operator @ Beacon Freight: e-6193 → w-2404 (no bleed)
- Discoveries + lifts unchanged at 2 each (same-role lift still fires)
- Mean Δdist tightens from -0.127 to -0.040 (boosts no longer
pulling distances through the floor on cross-role mismatches)
Findings: reports/reality-tests/real_002_findings.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First retrieval probe with non-synthetic query distribution. Pulls
N rows from /home/profit/lakehouse/data/datasets/fill_events.parquet
(real-shape demand data) and translates each to the natural language
a coordinator would type: "Need {count} {role}s in {city} {state}
starting at {at} for {client}".
Headline: 8/10 cold-pass top-1 = judge-best on real distribution.
Substrate works on queries it was never trained for. v2-moe + workers
corpus carry the load.
Surfaced finding (the real value of running this): same-client+city
queries cluster, and Shape A's distance boost bleeds across roles
within the cluster. Q#2 (Forklift @ Beacon Freight Detroit) records
e-6193 in the playbook corpus. Q#5 (Pickers same client+city) and
Q#10 (CNC Operator same client+city) inherit e-6193 at warm top-1
even though:
- Neither query has its own recorded playbook.
- Neither warm pass triggers a Shape B inject (boosted=0).
- The roles are different staffing categories.
Q#10 specifically demoted the cold-pass-correct w-3759 (judge rating
4 at rank 0) for a worker who was approved by the judge for a
different role on a different query.
Why the lift suite missed it: synthetic queries use 7 disjoint
scenario buckets (forklift+OSHA+WI / CDL+IL / etc.). Real demand
clusters on (client, city). The cluster doesn't exist in the
synthetic distribution.
Why the judge gate doesn't catch it: the gate (5a3364f) is
per-injection at record time. After approval the worker rides Shape A
distance boosts on all later same-cluster queries with no second
gate call.
Becomes new OPEN #1. Fix candidate: role-scoped playbook corpus
metadata + Shape A boost gate on role match. Cheap; doesn't need
new judge calls.
Files:
- scripts/cutover/gen_real_queries.go: parquet → coordinator NL
- tests/reality/real_coord_queries.txt: 10 generated queries
- reports/reality-tests/playbook_lift_real_001.md: harness output
- reports/reality-tests/real_001_findings.md: the reading
Repro:
  go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt
  QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_001 \
    WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.
Why embed first: the parity invariant is one math identity (cosine
sim of vectors against same input). Retrieve has thousands of edge
cases. If embed parity holds, all downstream vector consumers
inherit confidence; if it doesn't, we catch it in 30s instead of
after a flip.
Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both
Ollamas have it loaded). On these samples the math is equivalent
across the gateway plumbing.
Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
dimension} (singular). Wire-format adapter is the only real
cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
direction (cos=1.0); harmless under cosine-distance HNSW (which
is Go vectord's default), but worth fixing in internal/embed/
before extending to euclidean indexes.
reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).
Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds langfuseMiddleware in internal/shared so every daemon's
shared.Run gets free production-traffic trace visibility when
LANGFUSE_URL + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY are set.
Same env names + file shape as the multi_coord_stress driver, so
operators ship one /etc/lakehouse/langfuse.env across the deploy.
Wiring is auth-gated: middleware runs INSIDE the RequireAuth group,
so 401s from credential-stuffing don't pollute traces. /health is
exempt so LB probes don't either. Missing env vars → nil client →
middleware is a passthrough no-op (fail-open per ADR-005 5.1).
Bundled deploy:
- langfuse.env.example template (mode 0640, root:lakehouse)
- 11 systemd units gain `EnvironmentFile=-/etc/lakehouse/langfuse.env`
(leading - so missing file = OK)
- REPLICATION.md bootstrap section documents setup
Tests (4): nil passthrough, /health bypass, real-request emission,
status-writer wrapping. All green.
STATE_OF_PLAY OPEN list: 5 rows → 4 rows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lift suite run #004 left two unresolved tail issues:
- Q6 ("Forklift loader") ↔ Q7 ("Hazmat warehouse, cold storage")
swap recordings as warm top-1 because their embeddings are within
0.20 cosine of each other. Distance gate can't tell them apart.
- Q9 + Q15 lose paraphrase recovery when qwen2.5 rephrases past the
0.20 threshold. Distance says "drift too far"; sometimes the drift
is real (skip), sometimes the paraphrase is still on-domain (don't
want to skip).
Multi-coord run #008's judge re-rating proved the LLM can
distinguish: Q3 crane case landed at distance 0.23 (looks tight)
but rating 1 (irrelevant). The judge sees domain mismatch the
embedder doesn't.
This commit lifts that pattern into the matrix substrate. Shape B
inject now optionally routes every candidate through a judge gate
before the rank insert lands. Distance + judge BOTH have to approve.
internal/matrix/playbook.go:
- InjectPlaybookMisses signature gains a query string + an
optional InjectGate. nil gate preserves pre-judge-gating
behavior (current tests already pass with nil).
- New InjectGate interface + InjectGateFunc adapter for tests
and non-LLM callers.
- Per-candidate gate.Approve(query, hit) call inserted between
the dedup and the inject. Rejected candidates skip silently;
injected count reflects post-gate decision.
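The gate contract can be sketched as below. The Hit fields, the bool return type of Approve, and the injectSketch helper are assumptions made for the example (the message above only names gate.Approve(query, hit)); dedup and rank insertion are omitted, and the worker IDs are invented.

```go
package main

import "fmt"

// Hit is a hypothetical candidate shape for this sketch.
type Hit struct {
	ID            string
	Distance      float64
	RecordedQuery string
}

// InjectGate decides whether a candidate may be injected for a query.
// ASSUMPTION: a bool return; the real signature may differ.
type InjectGate interface {
	Approve(query string, hit Hit) bool
}

// InjectGateFunc adapts a plain function to InjectGate, which is handy for
// tests and non-LLM callers.
type InjectGateFunc func(query string, hit Hit) bool

func (f InjectGateFunc) Approve(query string, hit Hit) bool { return f(query, hit) }

// injectSketch keeps candidates that pass the distance check AND the gate.
// A nil gate preserves distance-only behavior.
func injectSketch(query string, cands []Hit, maxDist float64, gate InjectGate) []Hit {
	var out []Hit
	for _, h := range cands {
		if h.Distance > maxDist {
			continue // distance says no
		}
		if gate != nil && !gate.Approve(query, h) {
			continue // judge says no; skip silently
		}
		out = append(out, h)
	}
	return out
}

func main() {
	cands := []Hit{
		{ID: "w-1", Distance: 0.12, RecordedQuery: "forklift loader"},
		{ID: "w-2", Distance: 0.31, RecordedQuery: "crane operator"},
	}
	rejectAll := InjectGateFunc(func(string, Hit) bool { return false })
	fmt.Println(len(injectSketch("hazmat warehouse", cands, 0.20, nil)))       // 1
	fmt.Println(len(injectSketch("hazmat warehouse", cands, 0.20, rejectAll))) // 0
}
```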
internal/matrix/judge.go (new, ~140 lines):
- LLMJudgeGate calls an Ollama-shape /api/chat endpoint with the
same 1-5 staffing-rubric prompt that worked in multi_coord
run #008. fail-closed on HTTP/JSON errors (don't inject if
judge can't speak — better miss than wrong-domain).
- NewLLMJudgeGate returns nil when URL or Model is empty,
matching InjectGate's nil-means-no-judge semantics.
internal/matrix/retrieve.go:
- SearchRequest gains JudgeURL, JudgeModel, JudgeMinRating
fields. Run() builds an LLMJudgeGate when set; passes nil
otherwise. Backward compatible — existing callers see no
behavior change.
Tests:
- TestInjectPlaybookMisses_GateRejectsCandidate (rejectAll → 0
injected, even with tight distance)
- TestInjectPlaybookMisses_GateApprovesCandidate (approveAll →
same as nil-gate behavior)
- TestInjectPlaybookMisses_GateSeesCorrectQuery (gate receives
CURRENT query + RECORDED query separately so it can score
the (current, candidate) pair)
- All 5 existing inject tests updated to new signature
go test ./internal/matrix → all 8 inject tests pass.
go test ./internal/matrix ./internal/shared
./cmd/{matrixd,queryd,pathwayd,observerd} → all green.
STATE_OF_PLAY:
- OPEN item #1 (judge-gated injection) closed.
- DO NOT RELITIGATE adds the substrate-level judge-gate lock.
- OPEN list now 5 rows (was 6).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sprint 4 row removed (shipped: a59ef5b systemd + 54a05d9 docker).
ADR-006 row already dropped on the previous STATE update.
Two lift-suite tail items (Q6↔Q7 adjacent-query, Q9/Q15 liberal-
paraphrase) consolidated into one "judge-gated playbook injection"
row — both are downstream of the same fix (let the judge approve
before Shape B inserts). Captures the design lineage from
multi-coord run #008's judge-rating pattern.
Three items folded into a single "operational nice-to-haves" row:
real-time clock, chatd fixture storage half, liberal-paraphrase
calibration. None are product-blocking; each lights up when
someone hits its specific trigger.
Reorder reflects leverage on the active product theory (multi-
coord staffing co-pilot via the 5-loop substrate), not effort:
1. Judge-gated injection (lift quality + lift-tail closure)
2. Wider Langfuse instrumentation (production observability)
3. Fresh→main merge (operational hygiene as the corpus grows)
4. Distillation full port (production dependency, not yet)
5. Drift quantification (research)
6. Operational nice-to-haves
Lead-in note added: "Items move to closed when the work demands
them, not on a calendar." Locks intent against future drift toward
a sprawling todo list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-003 locked the auth substrate; ADR-006 ratifies the operator
playbook + adds two implementation pieces needed for Sprint 4
deployment: env-resolved tokens and dual-token rotation.
Six decisions locked in docs/DECISIONS.md:
- 6.1: Non-loopback bind requires auth.token (mechanical gate at
shared.Run, already implemented; this ratifies it).
- 6.2: Token from env, not TOML. /etc/lakehouse/auth.env (mode 0600)
loaded by systemd EnvironmentFile=. New TokenEnv field on
AuthConfig defaults to "AUTH_TOKEN".
- 6.3: AllowedIPs for inter-service same-trust-domain; Token for
cross-trust-boundary (gateway ↔ external).
- 6.4: /health stays unauthenticated; everything else under
shared.Run is gated. Already implemented; ratified here.
- 6.5: Token rotation is dual-token. New SecondaryTokens []string
on AuthConfig — both primary and any secondary pass auth
during the rotation window. Implemented in this commit.
- 6.6: TLS terminates at the network edge (nginx/Caddy), not
in-process. Daemons stay HTTP-only; internal traffic stays
on private subnets per Decision 6.3.
Implementation:
- internal/shared/config.go: AuthConfig gains TokenEnv +
SecondaryTokens fields. New resolveAuthFromEnv() called by
LoadConfig fills Token from os.Getenv(TokenEnv) when Token is
empty. TokenEnv defaults to "AUTH_TOKEN" so the happy path needs
no TOML config.
- internal/shared/auth.go: RequireAuth pre-encodes Bearer headers
for primary + every secondary token; per-request constant-time
compare walks the slice. Fast path is 1 compare (primary).
Tests:
- TestLoadConfig_AuthTokenFromEnv (3 sub-tests): default env name,
custom token_env, explicit Token wins over env.
- TestRequireAuth_SecondaryTokenAccepted: both primary + secondary
tokens pass during rotation window.
- TestRequireAuth_SecondaryTokensOnly: only-secondary path works
for the case where primary was just promoted-to-empty mid-rotation.
go test ./internal/shared all green; existing auth_test.go
unchanged (constant-time compare path preserved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Anchor was last touched at v4 split-threshold; since then the
multi-coord stress harness landed end-to-end across 11 commits.
Future sessions reading this file need to see the verified state,
not derive from git log.
Major additions:
- New "Multi-coordinator stress test (Phase 1 → 3)" section in
VERIFIED WORKING. 11-row capability table covering per-coord
playbook isolation, diversity metrics, paraphrase handover,
ExcludeIDs swap, fresh-resume two-tier, inbox endpoints, LLM
demand parsing, judge re-rating, Langfuse tracing.
- Substrate-gains list under that section: ExcludeIDs on
SearchRequest, observer.SourceInbox + /observer/inbox,
internal/langfuse client, embedd default bumped to v2-moe,
two-tier fresh_workers index pattern.
- Last-verified bumped to 16:42 CDT on the run #011 anchor.
DO NOT RELITIGATE expanded with six new locks:
1. Boost / inject use SEPARATE thresholds (0.5 / 0.20)
2. Multi-coord product theory is empirically VALIDATED
3. Fresh content uses two-tier indexing (fresh_workers)
4. embedd.default_model = nomic-embed-text-v2-moe (don't downgrade)
5. Inbox flow: parse + search + judge + trace
6. Langfuse Go-side client lives at internal/langfuse/
OPEN list refresh:
- Removed: re-judge metric (shipped as b13b5cd), adjacent-query as
separate item (folded into a single "judge-approves-before-inject"
follow-up), liberal-paraphrase (kept).
- Added: real-time 48-hour clock, wider Langfuse instrumentation,
periodic fresh→main merge job.
RECENT VERIFIED WAVE table extended with 11 commits (b13b5cd..5d49967).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reality test table extends from #001-#003 to #001-#004; v4 row marked
as "the honest configuration" because OOD cross-pollination is gone.
- Shape B section gains the split-threshold rationale (boost safe at
loose, inject structurally riskier so tighter).
- Verbatim drop framing rewritten — v3→v4 is configuration evolution,
not regression.
- OPEN: closed "Shape B cap/decay" + the conditional Q15 boost-math
item (Shape B + split threshold addressed both). Replaced with two
finer-grained follow-ups: adjacent-query Q6↔Q7 swap (might be
correct, verify with v4 re-judge metric) and liberal-paraphrase
recovery loss (Q9/Q15 missed because qwen2.5 drifted >0.20).
- RECENT VERIFIED WAVE adds 94fc3b6 + 67d1957.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reality test section now spans v1/v2/v3 across one table — the
product story (boost-only verbatim → paraphrase gap → Shape B
closes the gap) is legible without reading the reports.
- Verbatim-lift drop v1→v3 (7→2) explicitly framed as
cross-pollination, NOT regression — and filed as v4 re-judge metric
in OPEN.
- "DO NOT RELITIGATE" gains: Shape B is the stance now (don't revert
to boost-only); local_judge stays on qwen2.5 (don't bump to qwen3.5
for cleanliness — vision-SSM cost geometry).
- OPEN list: removed the now-closed paraphrase v2 row + the boost-math
Q15 row (Shape B may have addressed it; flagged for verify after v4).
Added v4 re-judge metric and Shape B injection cap/decay design call.
- RECENT VERIFIED WAVE adds the four new commits past 6c02c90
(2c71d1c, 9ce067b, e9822f0, 154a72e).
- Matrix indexer §5/5 component description now references
InjectPlaybookMisses + the run #002→#003 evidence chain.
- [models] tier registry comment locks the local_judge=qwen2.5 choice
with the rationale inline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the OPEN item from STATE_OF_PLAY. Required because observerd is
now on the prod-realistic data path via the lift harness boot (b2e45f7),
so the next consumer (scrum runner / distillation rebuild / production
workflow) needs the fail-safe rationale locked, not implicit.
The Rust "verdict:accept on crash" anti-pattern doesn't translate
one-to-one to the Go observer (witness, not gate). But four adjacent
fail-safe decisions are real and live:
5.1 Persist failure is logged-not-fatal; ring is in-flight source of
truth. Persist-required mode deferred to a future opt-in ADR.
5.2 Mode failure → Success=false, no panic-swallow path. The runner
catches mode errors and surfaces them via node.Error; downstream
consumers see failures explicitly rather than as fake successes
(the Rust anti-pattern surface).
5.3 One row per node, recorded post-run. A workflow with N nodes
produces N audit rows, never a per-workflow catch-all that
survives partial crashes. Known gap: recording happens after
runner.Run returns (acceptable for short workflows; streaming
callback is the right shape when workflows get longer).
5.4 /observer/event accepts on full ring (oldest evicted). Refusing
to write would translate every burst into client errors — wrong
direction for an audit witness.
Mostly ratifies existing behavior; cross-checked claims against
actual code (caught one error in Decision 5.3 draft — recording is
post-run-batched, not per-node-as-it-completes — and the ADR now
states reality).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 5-loop substrate's load-bearing gate is verified — playbook +
matrix indexer give the results we're looking for. Per the report's
rubric, lift ≥ 50% of discoveries means matrix is doing real work;
7/8 = 87.5% blew through that.
Harness was structurally hiding bugs behind a 5-daemon stripped boot.
Expanding to the full 10-daemon prod stack surfaced 7 fixes in cascade:
1. driver→matrixd: {"query": ...} → {"query_text": ...} field name
2. harness temp toml missing [s3] → wrong default bucket → catalogd
rehydrate 500 on first call
3. harness→queryd SQL probe: {"q": ...} → {"sql": ...} field name
4. expand boot from 5 → 10 daemons in dep-ordered launch
5. add SQL surface probe (3-row CSV ingest → COUNT(*)=3 assertion)
6. candidates corpus was synthetic SWE-tech (Swift/iOS, Scala/Spark) —
wrong domain for staffing queries; replaced with ethereal_workers
(10K rows, real staffing schema, "e-" id prefix to avoid collision
with workers' "w-"). staffing_workers driver gains -index-name +
-id-prefix flags so the same binary serves both corpora
7. local_judge qwen3.5:latest is a vision-SSM 256K-ctx build running
~30s per judge call against the lift loop; reverted to
qwen2.5:latest (~1s/call, 30× faster, held lift theory)
Each contract drift (1, 3) is now locked into a cmd/<bin>/main_test.go
so future drift fires in `go test`, not in a reality run. R-005 closed:
- cmd/matrixd/main_test.go (new) — playbook record drift detector +
score bounds + 6 routes mounted
- cmd/queryd/main_test.go — wrong-field-name drift detector
- cmd/pathwayd/main_test.go (new) — 9 routes + add round-trip + retire
- cmd/observerd/main_test.go (new) — 4 routes + invalid-op + unknown-mode
`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green.
Reality test results (reports/reality-tests/playbook_lift_001.{json,md}):
  Queries              21 (staffing-domain, 7 categories)
  Discoveries          8 (judge ≠ cosine top-1)
  Lifts                7/8 (87.5%)
  Boosts triggered     9
  Mean Δ distance      -0.053 (warm closer than cold)
  OOD honesty          dental/RN/SWE rated 1, no fake matches
  Cross-corpus boosts  confirmed (e- ↔ w- swaps in lifts)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SPEC §1 component table: add chatd row marked DONE; replaces
Rust gateway's v1::ollama_cloud / openrouter / opencode adapters
+ the aibridge crate.
- SPEC §3.9 — chatd shipped: 5-provider routing (ollama, ollama_cloud,
openrouter, opencode, kimi) by model-name prefix or :cloud suffix.
Captures the Anthropic 4.7 temperature-deprecation quirk + the
local-Ollama think=false default that the playbook_lift judge
needed. Mentions scrum_review.sh as the reusable cross-lineage
vehicle eating chatd's own /v1/chat.
- SPEC §3.10 — local-review-harness sibling tool: separate repo at
git.agentview.dev/profit/local-review-harness, MVP shipped today.
Documents the cross-pollination plan for when both substrates
stabilize (chatd as the harness's LLM backend; harness findings
as Lakehouse pathway-memory drift signal; .memory/known-risks
as a matrix corpus). Explicit "don't re-port" so future Claudes
don't try to absorb the harness into Lakehouse.
- STATE_OF_PLAY.md: SIBLING TOOLS section with 1-line summary
+ pointer to SPEC §3.10.
No code changes. `just verify` still passes; only docs were touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>