golangLAKEHOUSE

profit/golangLAKEHOUSE

Fork 0

Commit Graph

Author	SHA1	Message	Date
root	6c02c905c8	scrum lift_001: 4 fixes from cross-lineage review Cross-lineage scrum on b2e45f7 produced 1 convergent + 3 single-reviewer findings worth fixing. All apply. 1. (Opus WARN + Qwen INFO convergent) scripts/playbook_lift.sh: replace sleep 2.5 in SQL probe with active polling up to 5s. refresh_every=1s is a lower bound; under load the manifest may not be visible in a fixed sleep, which would 4xx the probe and abort the reality run. 2. (Opus INFO) scripts/playbook_lift.sh: report template glued "env JUDGE_MODEL" + value as "env JUDGE_MODELqwen2.5:latest" with no separator. Replaced two :+/:- substitution chains with a single JUDGE_SOURCE variable computed once at the top of the harness. 3. (Opus INFO) scripts/staffing_workers/main.go: -id-prefix "" silently allowed, defeating the flag's purpose (cross-corpus collision prevent). Now log.Fatal at startup with explicit hint. 4. (Opus WARN) cmd/{pathwayd,observerd}/main_test.go: newTestRouter returned http.Handler then re-cast to chi.Router for chi.Walk. Returning chi.Router directly satisfies http.Handler AND avoids an assertion that would panic if future middleware wraps the router. Dismissed (with rationale): - Kimi INFO hardcoded MinIO endpoint: harness is local-by-design. - Kimi WARN matrixd accepts 502/500: documented; real retriever needs real upstreams the test doesn't spin up. - Qwen INFO queryd string.Contains: brittle but very low risk; restating through typed-error path would couple without adding signal. go test ./cmd/{matrixd,queryd,pathwayd,observerd} all green. Verdicts at reports/scrum/_evidence/2026-04-30/verdicts/lift_001_*.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 06:27:24 -05:00
root	b2e45f7f26	playbook_lift: harness expansion + reality test #001 (7/8 lift, 87.5%) The 5-loop substrate's load-bearing gate is verified — playbook + matrix indexer give the results we're looking for. Per the report's rubric, lift ≥ 50% of discoveries means matrix is doing real work; 7/8 = 87.5% blew through that. Harness was structurally hiding bugs behind a 5-daemon stripped boot. Expanding to the full 10-daemon prod stack surfaced 7 fixes in cascade: 1. driver→matrixd: {"query": ...} → {"query_text": ...} field name 2. harness temp toml missing [s3] → wrong default bucket → catalogd rehydrate 500 on first call 3. harness→queryd SQL probe: {"q": ...} → {"sql": ...} field name 4. expand boot from 5 → 10 daemons in dep-ordered launch 5. add SQL surface probe (3-row CSV ingest → COUNT(*)=3 assertion) 6. candidates corpus was synthetic SWE-tech (Swift/iOS, Scala/Spark) — wrong domain for staffing queries; replaced with ethereal_workers (10K rows, real staffing schema, "e-" id prefix to avoid collision with workers' "w-"). staffing_workers driver gains -index-name + -id-prefix flags so the same binary serves both corpora 7. local_judge qwen3.5:latest is a vision-SSM 256K-ctx build running ~30s per judge call against the lift loop; reverted to qwen2.5:latest (~1s/call, 30× faster, held lift theory) Each contract drift (1, 3) is now locked into a cmd/<bin>/main_test.go so future drift fires in `go test`, not in a reality run. R-005 closed: - cmd/matrixd/main_test.go (new) — playbook record drift detector + score bounds + 6 routes mounted - cmd/queryd/main_test.go — wrong-field-name drift detector - cmd/pathwayd/main_test.go (new) — 9 routes + add round-trip + retire - cmd/observerd/main_test.go (new) — 4 routes + invalid-op + unknown-mode `go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green. Reality test results (reports/reality-tests/playbook_lift_001.{json,md}): Queries 21 (staffing-domain, 7 categories) Discoveries 8 (judge ≠ cosine top-1) Lifts 7/8 (87.5%) Boosts triggered 9 Mean Δ distance -0.053 (warm closer than cold) OOD honesty dental/RN/SWE rated 1, no fake matches Cross-corpus boosts confirmed (e- ↔ w- swaps in lifts) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 06:22:21 -05:00

Author

SHA1

Message

Date

root

6c02c905c8

scrum lift_001: 4 fixes from cross-lineage review

Cross-lineage scrum on b2e45f7 produced 1 convergent + 3 single-reviewer
findings worth fixing. All apply.

1. (Opus WARN + Qwen INFO convergent) scripts/playbook_lift.sh: replace
   sleep 2.5 in SQL probe with active polling up to 5s. refresh_every=1s
   is a lower bound; under load the manifest may not be visible in a
   fixed sleep, which would 4xx the probe and abort the reality run.

2. (Opus INFO) scripts/playbook_lift.sh: report template glued
   "env JUDGE_MODEL" + value as "env JUDGE_MODELqwen2.5:latest" with no
   separator. Replaced two :+/:- substitution chains with a single
   JUDGE_SOURCE variable computed once at the top of the harness.

3. (Opus INFO) scripts/staffing_workers/main.go: -id-prefix "" silently
   allowed, defeating the flag's purpose (cross-corpus collision prevent).
   Now log.Fatal at startup with explicit hint.

4. (Opus WARN) cmd/{pathwayd,observerd}/main_test.go: newTestRouter
   returned http.Handler then re-cast to chi.Router for chi.Walk.
   Returning chi.Router directly satisfies http.Handler AND avoids an
   assertion that would panic if future middleware wraps the router.

Dismissed (with rationale):
- Kimi INFO hardcoded MinIO endpoint: harness is local-by-design.
- Kimi WARN matrixd accepts 502/500: documented; real retriever needs
  real upstreams the test doesn't spin up.
- Qwen INFO queryd string.Contains: brittle but very low risk; restating
  through typed-error path would couple without adding signal.

go test ./cmd/{matrixd,queryd,pathwayd,observerd} all green.

Verdicts at reports/scrum/_evidence/2026-04-30/verdicts/lift_001_*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 06:27:24 -05:00

root

b2e45f7f26

playbook_lift: harness expansion + reality test #001 (7/8 lift, 87.5%)

The 5-loop substrate's load-bearing gate is verified — playbook +
matrix indexer give the results we're looking for. Per the report's
rubric, lift ≥ 50% of discoveries means matrix is doing real work;
7/8 = 87.5% blew through that.

Harness was structurally hiding bugs behind a 5-daemon stripped boot.
Expanding to the full 10-daemon prod stack surfaced 7 fixes in cascade:

1. driver→matrixd: {"query": ...} → {"query_text": ...} field name
2. harness temp toml missing [s3] → wrong default bucket → catalogd
   rehydrate 500 on first call
3. harness→queryd SQL probe: {"q": ...} → {"sql": ...} field name
4. expand boot from 5 → 10 daemons in dep-ordered launch
5. add SQL surface probe (3-row CSV ingest → COUNT(*)=3 assertion)
6. candidates corpus was synthetic SWE-tech (Swift/iOS, Scala/Spark) —
   wrong domain for staffing queries; replaced with ethereal_workers
   (10K rows, real staffing schema, "e-" id prefix to avoid collision
   with workers' "w-"). staffing_workers driver gains -index-name +
   -id-prefix flags so the same binary serves both corpora
7. local_judge qwen3.5:latest is a vision-SSM 256K-ctx build running
   ~30s per judge call against the lift loop; reverted to
   qwen2.5:latest (~1s/call, 30× faster, held lift theory)

Each contract drift (1, 3) is now locked into a cmd/<bin>/main_test.go
so future drift fires in `go test`, not in a reality run. R-005 closed:

- cmd/matrixd/main_test.go (new) — playbook record drift detector +
  score bounds + 6 routes mounted
- cmd/queryd/main_test.go — wrong-field-name drift detector
- cmd/pathwayd/main_test.go (new) — 9 routes + add round-trip + retire
- cmd/observerd/main_test.go (new) — 4 routes + invalid-op + unknown-mode

`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green.

Reality test results (reports/reality-tests/playbook_lift_001.{json,md}):
  Queries              21 (staffing-domain, 7 categories)
  Discoveries          8 (judge ≠ cosine top-1)
  Lifts                7/8 (87.5%)
  Boosts triggered     9
  Mean Δ distance      -0.053 (warm closer than cold)
  OOD honesty          dental/RN/SWE rated 1, no fake matches
  Cross-corpus boosts  confirmed (e- ↔ w- swaps in lifts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 06:22:21 -05:00

2 Commits