STATE_OF_PLAY: trim OPEN list — 9 rows → 6, ordered by product leverage

Sprint 4 row removed (shipped: a59ef5b systemd + 54a05d9 docker). ADR-006 row already dropped on the previous STATE update. Two lift-suite tail items (Q6↔Q7 adjacent-query, Q9/Q15 liberal- paraphrase) consolidated into one "judge-gated playbook injection" row — both are downstream of the same fix (let the judge approve before Shape B inserts). Captures the design lineage from multi-coord run #008's judge-rating pattern. Three items folded into a single "operational nice-to-haves" row: real-time clock, chatd fixture storage half, liberal-paraphrase calibration. None are product-blocking; each lights up when someone hits its specific trigger. Reorder reflects leverage on the active product theory (multi- coord staffing co-pilot via the 5-loop substrate), not effort: 1. Judge-gated injection (lift quality + lift-tail closure) 2. Wider Langfuse instrumentation (production observability) 3. Fresh→main merge (operational hygiene as the corpus grows) 4. Distillation full port (production dependency, not yet) 5. Drift quantification (research) 6. Operational nice-to-haves Lead-in note added: "Items move to closed when the work demands them, not on a calendar." Locks intent against future drift toward a sprawling todo list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:32:31 -05:00 · 2026-04-30 19:32:31 -05:00 · 247e36e687
commit 247e36e687
parent 54a05d9311
1 changed files with 9 additions and 10 deletions
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@ -215,17 +215,16 @@ Verbatim verdicts at `reports/scrum/_evidence/2026-04-30/verdicts/`. Disposition

 ## OPEN — what's not done yet

-| Item | What | When to act |
+The list is intentionally short. Items move to closed when the work demands them, not on a calendar. Ordered by leverage on the active product theory (multi-coord staffing co-pilot via the 5-loop substrate), not by effort.
+
+| # | Item | When to act |
 |---|---|---|
-| **Real-time 48-hour clock** | Multi-coord stress fires phases as discrete steps with simulated-hour labels (0/6/12/18/24/30/36/42/48). A real-time variant would space events on actual wall-clock with `time.Sleep`, simulating the rhythm of a coordinator workday. Cosmetic; doesn't change product behavior — but lets the harness mimic the cadence at which inbox events would arrive in production. ~30 min. | When stress test starts being used to capture realistic per-hour throughput numbers. |
-| **Wider Langfuse instrumentation across daemons** | multi_coord_stress traces every external call, but the daemons themselves (matrixd, observerd, chatd) don't yet emit traces from their own request handlers. Adding `internal/langfuse/middleware.go` would auto-emit a span per HTTP request, giving production-traffic visibility for free. | When production traffic starts hitting the Go gateway. |
-| **Periodic fresh→main index merge** | Two-tier `fresh_workers` pattern is verified working but no scheduled job consolidates fresh→main. Fresh corpus grows monotonically; eventually has its own recall issues. A daily cron that ingests the fresh corpus' contents into the main `workers` index + drops fresh would solve it. | When fresh_workers grows past ~500 items. |
-| **Adjacent-query cross-pollination (lift suite Q6↔Q7)** | After lift v4's split threshold, OOD cross-pollination is gone but Q6 / Q7 still swap recordings as warm top-1 because their embeddings are within 0.20 cosine of each other. Multi-coord run #008 inbox-judge re-rating proved the judge can distinguish — gating injection on "judge approves before injecting" closes this. ~1 hr. | When playbook injection quality matters more than retrieval throughput. |
-| **Liberal-paraphrase recovery loss (lift suite Q9, Q15)** | Q9 + Q15 in run #004 lost paraphrase recovery because qwen2.5 rephrased liberally enough to drift past 0.20 inject threshold. Acceptable (system refusing to inject when not confident), but might be tightenable with a different paraphrase prompt or a per-pair `paraphrase_max_drift` measurement. | When real coordinator queries are available for a calibration run. |
-| **Sprint 4 — deployment** | No `REPLICATION.md`, `secrets-go.toml.example`, `deploy/systemd/<bin>.service`, `Dockerfile`. Largest open Sprint. Required input for any G5 cutover plan. | When G5 cutover is on the table. |
-| **chatd fixture-mode storage half** | `g2_smoke_fixtures.sh` closed embed half via fake_ollama; storage half (mock S3) still deferred. Closes R-006 fully. | When CI box without MinIO is needed. |
-| **Distillation full port** | `57d0df1` shipped scorer + contamination firewall (E partial); SFT export pipeline + audit_baselines lineage not yet ported. | When distillation is needed for production. |
-| **Drift full quantification** | `be65f85` is "scorer drift first." Full distribution-drift signal underspecified everywhere — research gap, not a port. | Open research item. |
+| 1 | **Judge-gated playbook injection** — close lift-suite tail issues (Q6↔Q7 swap, Q9/Q15 paraphrase drift) by routing every Shape B injection through the judge before the rank insert lands. Multi-coord run #008 already proved the judge can distinguish tight-but-wrong from tight-and-right; this lifts that pattern into the matrix substrate. ~1.5 hr. | When playbook quality starts mattering more than retrieval throughput. |
+| 2 | **Wider Langfuse instrumentation across daemons** — `internal/langfuse/middleware.go` that auto-emits one span per HTTP request from every daemon's `shared.Run`. Production traffic gets free trace visibility without per-handler wiring. | When production traffic actually starts hitting the gateway. |
+| 3 | **Periodic fresh→main index merge** — two-tier pattern works but `fresh_workers` grows monotonically. A scheduled job that re-ingests the fresh corpus into `workers` (with the v2-moe embedder) + clears fresh closes the loop. | When `fresh_workers` crosses ~500 items in production. |
+| 4 | **Distillation full port** — `57d0df1` shipped scorer + contamination firewall (E partial). SFT export pipeline + audit_baselines lineage still on the Rust side. | When distillation becomes a production dependency. |
+| 5 | **Drift quantification** — `be65f85` is "scorer drift first." Full distribution-drift signal is underspecified everywhere; this is research, not a port. | Open research item; no calendar. |
+| 6 | **Operational nice-to-haves** — real-time wall-clock for the stress harness; chatd fixture-mode storage half (mock S3 for CI without MinIO); liberal-paraphrase calibration once real coordinator queries land. | When any of these block someone. |

 ---