From c5c31b6ca632252706dd59c2335714cb01ee91bb Mon Sep 17 00:00:00 2001
From: root
Date: Thu, 30 Apr 2026 00:37:24 -0500
Subject: [PATCH] docs: STATE_OF_PLAY.md — Go-side truth anchor (mirrors Rust convention)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the "verified working RIGHT NOW / DO NOT RELITIGATE / OPEN" anchor
at the repo root, mirroring /home/profit/lakehouse/STATE_OF_PLAY.md.
Memory files (project_golang_lakehouse.md) supplement; this file is the
verified-truth pointer.

Sections:
- VERIFIED WORKING: 13 cmd binaries + 18 smokes + 5 matrix components +
  Mem0 pathway + observerd + workflow runner + chatd 5-provider
  dispatcher + model tier registry. just verify PASS in 31s.
- DO NOT RELITIGATE: 4 ratified ADRs (DECISIONS.md ADR-001..004) +
  today's scrum dispositions (B-1..B-4 fixed, FP-A1/A2/C1 dismissed) +
  session frame items (Rust legacy is maintenance-only, etc.).
- OPEN: reality test held on J's queries, 3 daemon main_test.go gap,
  Sprint 4 deployment, ADR-005 observer fail-safe, ADR-006 auth posture.
- RECENT WAVE: 6-commit table 05273ac..e4ee002 documenting today's
  4 phases + scrum + tooling.
- RUNTIME CHEATSHEET: just verify, chatd boot, /v1/chat/providers probe,
  scrum_review.sh usage.
- VISION: 5-loop substrate gate from
  project_small_model_pipeline_vision.md.

The read-mem skill (in /root/.claude/skills/read-mem/) and project
memory file are updated to reference this file as the primary Go anchor.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 STATE_OF_PLAY.md | 211 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 STATE_OF_PLAY.md

diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
new file mode 100644
index 0000000..fd2bb66
--- /dev/null
+++ b/STATE_OF_PLAY.md
@@ -0,0 +1,211 @@
+# STATE OF PLAY — Lakehouse-Go
+
+**Last verified:** 2026-04-30 ~01:00 CDT
+**Verified by:** live probes + `just verify` PASS, not memory.
+
+> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
+
+---
+
+## VERIFIED WORKING RIGHT NOW
+
+### Substrate (G0 + G1 family)
+
+13 service binaries under `cmd/` plus 2 driver scripts under `scripts/staffing_*` build into `bin/`. **18 smoke scripts all PASS.** `just verify` (vet + 30 packages × short tests + 9 core smokes) green in ~31s wall.
+
+| Binary | Port | What |
+|---|---|---|
+| `gateway` | 3110 | reverse proxy, single OpenAI-compat-style edge |
+| `storaged` | 3211 | S3 GET/PUT/LIST/DELETE w/ per-prefix PUT cap (ADR-002) |
+| `catalogd` | 3212 | Parquet manifests, ADR-020 idempotent register |
+| `ingestd` | 3213 | CSV → Parquet → catalogd, content-addressed keys |
+| `queryd` | 3214 | DuckDB SELECT over Parquet via httpfs |
+| `vectord` | 3215 | HNSW indexes (coder/hnsw), persistence to storaged |
+| `embedd` | 3216 | Ollama-backed embedder w/ LRU cache |
+| `pathwayd` | 3217 | Mem0 ops (Add/Update/Revise/Retire/History/Search) |
+| `matrixd` | 3218 | Multi-corpus retrieve+merge + relevance + downgrade + playbook |
+| `observerd` | 3219 | Witness loop, workflow runner with DAG executor |
+| `chatd` | 3220 | LLM dispatcher: ollama / ollama_cloud / openrouter / opencode / kimi |
+| `mcpd` | — | MCP SDK port (Bun mcp-server replacement) |
+| `fake_ollama` | — | Test fixture (used by `g2_smoke_fixtures.sh`) |
+
+### Matrix indexer — all 5 SPEC §3.4 components shipped
+
+1. **Corpus builders** (`internal/corpusingest`)
+2. **Multi-corpus retrieve+merge** (`matrixd /matrix/search`)
+3. **Relevance filter** (`internal/matrix/relevance.go` 376 LoC + 289 LoC test)
+4. **Strong-model downgrade gate** (`internal/matrix/downgrade.go`, reads `cfg.Models.WeakModels` after Phase 2)
+5. **Playbook memory + boost** (`internal/matrix/playbook.go`, learning loop)
+
+### Pathway memory (Mem0 substrate)
+
+Full ADR-004 surface shipped. **Cycle-detection + retired-trace exclusion proven by tests:** `TestHistory_CycleDetected`, `TestRetire_ExcludedFromSearch`, `TestRevise_ChainOfThree_BackwardWalk`. JSONL append-only persistence with corruption tolerance.
+
+### Observer + workflow runner
+
+- `observerd` ring buffer + JSONL persistence
+- Workflow DAG executor (Archon-style) with 5 native modes wired: `matrix.relevance`, `matrix.downgrade`, `matrix.search`, `distillation.score`, `drift.scorer`. Plus `fixture.echo` / `fixture.upper` for runner mechanics smokes.
+
+### Distillation + drift
+
+- **E (partial)** at `57d0df1` — scorer + contamination firewall ported from Rust v1.0.0 (logic only per ADR-001 §1.4; not bit-identical).
+- **F (first slice)** at `be65f85` — drift quantification, scorer drift first.
+
+### chatd — Phase 4 (shipped 2026-04-30, scrum-hardened same day)
+
+Multi-provider LLM dispatcher routing `/v1/chat` by model-name prefix or `:cloud` suffix:
+
+| Prefix / suffix | Provider | Auth |
+|---|---|---|
+| `ollama/` or bare | `ollama` (local) | none |
+| `ollama_cloud/` or `:cloud` | `ollama_cloud` | Bearer (OLLAMA_CLOUD_KEY) |
+| `openrouter//` | `openrouter` | Bearer (OPENROUTER_API_KEY) |
+| `opencode/` | `opencode` | Bearer (OPENCODE_API_KEY) |
+| `kimi/` | `kimi` | Bearer (KIMI_API_KEY) |
+
+All 4 keys live in `/etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env` files (mode 0600). Empty/missing files leave that provider unregistered (404 at first call instead of 503). Test request: `POST /v1/chat {"model":"opencode/claude-opus-4-7","messages":[{"role":"user","content":"hi"}],"max_tokens":8}`.
+
+`Request.Temperature` is `*float64` (pointer) — Anthropic 4.7 deprecates `temperature` entirely, so we omit the field when the caller doesn't set it.
+
+### Model tier registry
+
+`lakehouse.toml [models]` names model IDs by tier so swaps are 1-line:
+
+```toml
+local_fast = "qwen3.5:latest"
+local_judge = "qwen3.5:latest"
+cloud_judge = "kimi-k2.6:cloud"
+cloud_review = "qwen3-coder:480b"
+frontier_review = "openrouter/anthropic/claude-opus-4-7"
+frontier_arch = "openrouter/moonshotai/kimi-k2-0905"
+frontier_free = "opencode/claude-opus-4-7"
+weak_models = ["qwen3.5:latest", "qwen3:latest"] # matrix.downgrade bypass
+```
+
+Callers read `cfg.Models.LocalJudge` etc. instead of literal strings. `playbook_lift` harness, `matrix.downgrade`, and observerd's `MatrixDowngradeWithWeakList` factory all migrated.
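
Editor's sketch (not part of the patch): the chatd prefix/suffix routing table above can be illustrated with a small Go function. This is a minimal sketch under stated assumptions, not chatd's actual code. The name `route`, the prefix order, stripping the provider prefix before forwarding, and keeping the `:cloud` suffix in the forwarded model name are all hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// providerPrefixes lists the explicit "<provider>/" prefixes from the routing
// table. The order is illustrative; none of these prefixes shadows another.
var providerPrefixes = []string{"ollama_cloud", "openrouter", "opencode", "kimi", "ollama"}

// route is a hypothetical helper (not chatd's real API). It returns the
// provider name and the model string that would be forwarded upstream.
func route(model string) (provider, upstream string) {
	for _, p := range providerPrefixes {
		// Assumption: an explicit provider prefix is stripped before forwarding.
		if rest, ok := strings.CutPrefix(model, p+"/"); ok {
			return p, rest
		}
	}
	// Assumption: the ":cloud" suffix stays in the forwarded model name.
	if strings.HasSuffix(model, ":cloud") {
		return "ollama_cloud", model
	}
	// Bare model names fall through to the local ollama provider.
	return "ollama", model
}

func main() {
	models := []string{
		"qwen3.5:latest",
		"kimi-k2.6:cloud",
		"opencode/claude-opus-4-7",
		"openrouter/moonshotai/kimi-k2-0905",
	}
	for _, m := range models {
		p, u := route(m)
		fmt.Printf("%s -> provider=%s upstream=%s\n", m, p, u)
	}
}
```

Checking explicit prefixes before the `:cloud` suffix means a name such as `kimi-k2.6:cloud` only reaches the suffix rule because it carries no `<provider>/` prefix; whether chatd resolves the two rules in that order is not stated in the patch.
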
+
+### Code health
+
+- `go vet ./...` → **0 warnings, 0 errors**
+- `go test -short ./...` → **all green**, 349 test functions
+- `just verify` → PASS (vet + tests + 9 smokes) in ~31s
+- 18 smoke scripts (9 core gating verify + 9 domain smokes for new daemons)
+
+### Latest scrum: 2026-04-30 cross-lineage wave
+
+Composite **50/60** at scrum2 head `c7e3124` (was 35 baseline → 43 R1 → 50 R2). Today's chatd wave reviewed by Opus + Kimi + Qwen3-coder via chatd's own `/v1/chat`; **2 BLOCKs + 2 WARNs landed as fixes** (`0efc736`); reusable driver at `scripts/scrum_review.sh`.
+
+---
+
+## DO NOT RELITIGATE
+
+### Ratified ADRs (`docs/DECISIONS.md`)
+
+- **ADR-001**: DuckDB via cgo, HTMX UI, Gitea hosting, distillation rebuilt-not-ported, pathway memory clean start, auditor longitudinal signal restarts. **6 sub-decisions, all final.**
+- **ADR-002**: storaged per-prefix PUT cap (4 GiB for `_vectors/`, 256 MiB elsewhere) — implemented at `423a381`. Operator-config bump rather than constant change is the documented path if 4 GiB is ever insufficient.
+- **ADR-003**: Inter-service auth = Bearer + IP allowlist, opt-in via `cfg.Auth.Token`. Wiring deferred to Sprint 1 but **the design is locked** — alternatives (mTLS, JWT, OAuth2, IP-only) all considered + rejected.
+- **ADR-004**: Pathway memory = Mem0 versioned traces, JSONL append-only persistence, opaque `json.RawMessage` content. Implemented in `internal/pathway/`.
+
+### Today's scrum dispositions (2026-04-30)
+
+Verbatim verdicts at `reports/scrum/_evidence/2026-04-30/verdicts/`. Disposition table: `reports/scrum/_evidence/2026-04-30/disposition.md`.
+
+**Real findings, all fixed in `0efc736`:**
+- B-1 (Opus+Kimi convergent): `ResolveKey` 3-arg API → 2-arg
+- B-2 (Opus+Kimi convergent): `handleProviders` direct map lookup, drop synthesis-via-Resolve
+- B-3 (Opus single, trace-verified): `OllamaCloud.Chat` strips `ollama_cloud/` prefix correctly
+- B-4 (Opus single): Ollama `done_reason` surfaced to FinishReason
+
+**False positives dismissed (3, documented):**
+- FP-A1: Kimi misread `TestMaybeDowngrade_WithConfigList` assertion
+- FP-A2: Qwen claimed nil-deref in `MaybeDowngrade` that doesn't exist
+- FP-C1: Opus claimed `qwen3.5:latest` doesn't exist on Ollama hub (it does on this box's local install)
+
+### Session frame (don't redo)
+
+- The Rust legacy is **maintenance-only** until Go reaches feature parity. Don't propose ports of components already shipped here.
+- The matrix indexer **5/5 components** are shipped. Don't propose to "build the matrix indexer" — it's done.
+- `qwen3.5:latest` IS available locally on this box. Opus's hub-only knowledge is a known-stale signal; the chatd_smoke uses it daily.
+- `temperature` is **omitted** for Anthropic 4.7 (handled by `Request.Temperature *float64`); don't re-add it.
+- chatd-smoke runs with **all cloud providers disabled** intentionally so the suite doesn't depend on API keys; that's why it can't catch B-3-class bugs (those need a fake-server fixture, see Sprint 0 follow-up).
+
+---
+
+## OPEN — what's not done yet
+
+| Item | What | When to act |
+|---|---|---|
+| **Reality test for the 5-loop substrate** | `playbook_lift_001.json` exists at `reports/reality-tests/` but the harness hasn't been run against real queries yet (J held it). Driver: `scripts/playbook_lift.sh`. Needs J's 20+ staffing queries in `tests/reality/playbook_lift_queries.txt` first (5 placeholders shipped). | When J supplies queries OR explicitly green-lights running with placeholders. |
+| **`cmd/{matrixd,observerd,pathwayd}/main_test.go` absent** | 3 new daemons each mount ≥4 routes with no wiring test. Original 6 binaries all closed via `0f79bce`. New gap reopens R-005. | ~1 hr pattern-match against `cmd/storaged/main_test.go`. Cheap. |
+| **Sprint 4 — deployment** | No `REPLICATION.md`, `secrets-go.toml.example`, `deploy/systemd/.service`, `Dockerfile`. Largest open Sprint. Required input for any G5 cutover plan. | When G5 cutover is on the table. |
+| **ADR-005 — observer fail-safe semantics** | Observer ported but the upstream "verdict:accept on crash" anti-pattern still has no Go-side decision locked. Doc-only, ~30 min. | Before observer is wired into production paths. |
+| **ADR-006 — auth posture for non-loopback deploy** | Locks R-001 + R-007 from "opt-in middleware exists" to "wired-by-default for X, opt-in for Y." Doc-only, ~1 hr. | Required before any Go binary binds non-loopback in prod. |
+| **chatd fixture-mode storage half** | `g2_smoke_fixtures.sh` closed embed half via fake_ollama; storage half (mock S3) still deferred. Closes R-006 fully. | When CI box without MinIO is needed. |
+| **Distillation full port** | `57d0df1` shipped scorer + contamination firewall (E partial); SFT export pipeline + audit_baselines lineage not yet ported. | When distillation is needed for production. |
+| **Drift full quantification** | `be65f85` is "scorer drift first." Full distribution-drift signal underspecified everywhere — research gap, not a port. | Open research item. |
+
+---
+
+## RECENT VERIFIED WAVE (2026-04-30)
+
+`05273ac..e4ee002` — 4 phases + scrum + tooling, all gate-tested.
+
+| SHA | What |
+|---|---|
+| `ec1d031` | Phase 1: `[models]` tier config (additive, no callers migrate) |
+| `622e124` | Phase 2: `matrix.downgrade` reads `cfg.Models.WeakModels` |
+| `848cbf5` | Phase 3: `playbook_lift` harness defaults from config |
+| `05273ac` | Phase 4: chatd + 5 providers (1,624 LoC) |
+| `0efc736` | Scrum: 4 fixes (B-1..B-4) + 2 INFOs from cross-lineage review |
+| `e4ee002` | `scripts/scrum_review.sh` — reusable 3-lineage driver |
+
+Plus on Rust side (`8de94eb`, `3d06868`): qwen2.5 → qwen3.5:latest backport in active defaults; distillation acceptance reports regenerated (run_hash refresh, reproducibility property still holds).
+
+---
+
+## RUNTIME CHEATSHEET
+
+```bash
+# Verify everything green
+cd /home/profit/golangLAKEHOUSE
+just verify  # vet + tests + 9 core smokes (~31s)
+just doctor  # dep probe (go/gcc/minio/ollama/secrets)
+
+# Boot the chat dispatcher (Phase 4)
+nohup ./bin/chatd -config lakehouse.toml > /tmp/chatd.log 2>&1 & disown
+nohup ./bin/gateway -config lakehouse.toml > /tmp/gateway.log 2>&1 & disown
+curl -sf http://127.0.0.1:3110/v1/chat/providers | jq  # all 5 providers should report true
+
+# Test a chat call to each lineage
+for m in "qwen3.5:latest" "opencode/claude-opus-4-7" "openrouter/moonshotai/kimi-k2-0905"; do
+  curl -sS -X POST http://127.0.0.1:3110/v1/chat \
+    -H 'Content-Type: application/json' \
+    -d "{\"model\":\"$m\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: OK\"}],\"max_tokens\":8}" \
+    | jq -c '{model,provider,content}'
+done
+
+# Run the scrum on a diff
+./scripts/scrum_review.sh path/to/bundle.diff bundle_label
+ls reports/scrum/_evidence/$(date +%Y-%m-%d)/verdicts/
+
+# Domain smokes (not in `just verify`)
+for s in chatd matrix observer pathway playbook relevance downgrade workflow; do
+  bash scripts/${s}_smoke.sh > /tmp/${s}.log 2>&1 && echo "$s ✓" || echo "$s ✗"
+done
+```
+
+---
+
+## VISION — what we're actually building
+
+J's framing (canonical at `/root/.claude/projects/-home-profit/memory/project_small_model_pipeline_vision.md`): a small-model-driven autonomous pipeline that gets better with each run. Frontier APIs (Opus, Kimi, GPT-5) are too expensive + rate-limited for the inner loop — they live in audit/oversight via `frontier_*` tier. The hot path runs on local `qwen3.5:latest` given:
+
+1. **Pathway memory** — what we tried before, how it went (Mem0 substrate ✓)
+2. **Matrix indexer** — multi-corpus retrieve+merge giving the small model the right slice for this task (5/5 components ✓)
+3. **Observer** — watches each run, refines configs (not prompts) toward good pathways
+
+Successful runs get **rated and distilled back into the playbook**. Each iteration the playbook gets denser, runs get cheaper, results get better. **Drift** in the distilled playbook is a measured signal, not vibes.
+
+**The single load-bearing gate:** *"the playbook + matrix indexer must give the results we're looking for."* Throughput, scaling, code elegance are all secondary. The `playbook_lift` reality test is the regression gate before Enterprise cutover (where real contracts + live profile updates land).
+
+When evaluating any Go workstream, ask: which of the 5 loops does this advance? Strong workstreams advance ≥1; weak workstreams sit in infra-for-its-own-sake.