root 2c71d1c637 ADR-005: observer fail-safe semantics
Closes the OPEN item from STATE_OF_PLAY. Required because observerd is
now on the prod-realistic data path via the lift harness boot (b2e45f7),
so the next consumer (scrum runner / distillation rebuild / production
workflow) needs the fail-safe rationale locked, not implicit.

The Rust "verdict:accept on crash" anti-pattern doesn't translate
one-to-one to the Go observer (witness, not gate). But four adjacent
fail-safe decisions are real and live:

5.1 Persist failure is logged-not-fatal; ring is in-flight source of
    truth. Persist-required mode deferred to a future opt-in ADR.

5.2 Mode failure → Success=false, no panic-swallow path. The runner
    catches mode errors and surfaces them via node.Error; downstream
    consumers see failures explicitly rather than as fake successes
    (the Rust anti-pattern surface).

5.3 One row per node, recorded post-run. A workflow with N nodes
    produces N audit rows, never a per-workflow catch-all that
    survives partial crashes. Known gap: recording happens after
    runner.Run returns (acceptable for short workflows; streaming
    callback is the right shape when workflows get longer).

5.4 /observer/event accepts on full ring (oldest evicted). Refusing
    to write would translate every burst into client errors — wrong
    direction for an audit witness.

Mostly ratifies existing behavior; cross-checked claims against
actual code (caught one error in Decision 5.3 draft — recording is
post-run-batched, not per-node-as-it-completes — and the ADR now
states reality).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:32:12 -05:00



# STATE OF PLAY — Lakehouse-Go
**Last verified:** 2026-04-30 ~05:50 CDT

**Verified by:** live probes + `just verify` PASS + reality test PASS (7/8 lift), not memory.
> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
---
## VERIFIED WORKING RIGHT NOW
### Substrate (G0 + G1 family)
13 service binaries under `cmd/` plus 2 driver scripts under `scripts/staffing_*` build into `bin/`. **18 smoke scripts all PASS.** `just verify` (vet + 30 packages × short tests + 9 core smokes) green in ~31s wall.
| Binary | Port | What |
|---|---|---|
| `gateway` | 3110 | reverse proxy, single OpenAI-compat-style edge |
| `storaged` | 3211 | S3 GET/PUT/LIST/DELETE w/ per-prefix PUT cap (ADR-002) |
| `catalogd` | 3212 | Parquet manifests, ADR-020 idempotent register |
| `ingestd` | 3213 | CSV → Parquet → catalogd, content-addressed keys |
| `queryd` | 3214 | DuckDB SELECT over Parquet via httpfs |
| `vectord` | 3215 | HNSW indexes (coder/hnsw), persistence to storaged |
| `embedd` | 3216 | Ollama-backed embedder w/ LRU cache |
| `pathwayd` | 3217 | Mem0 ops (Add/Update/Revise/Retire/History/Search) |
| `matrixd` | 3218 | Multi-corpus retrieve+merge + relevance + downgrade + playbook |
| `observerd` | 3219 | Witness loop, workflow runner with DAG executor |
| `chatd` | 3220 | LLM dispatcher: ollama / ollama_cloud / openrouter / opencode / kimi |
| `mcpd` | — | MCP SDK port (Bun mcp-server replacement) |
| `fake_ollama` | — | Test fixture (used by `g2_smoke_fixtures.sh`) |
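
ADR-002's per-prefix PUT cap (listed for `storaged` above) reduces to a prefix lookup. A minimal sketch, assuming object keys carry `_vectors/` as a literal leading prefix; `PutCap`, `AllowPut`, and the table shape are hypothetical, not storaged's code. In a real handler the limit would typically be enforced on the body stream (for example with `http.MaxBytesReader`) rather than trusted from `Content-Length`.

```go
package main

import "strings"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// Hypothetical cap table: the first matching prefix wins.
// Limits are ADR-002's (4 GiB for _vectors/, 256 MiB elsewhere).
var putCaps = []struct {
	prefix string
	limit  int64
}{
	{"_vectors/", 4 * GiB},
}

const defaultPutCap = 256 * MiB

// PutCap returns the maximum PUT body size allowed for an object key.
func PutCap(key string) int64 {
	for _, c := range putCaps {
		if strings.HasPrefix(key, c.prefix) {
			return c.limit
		}
	}
	return defaultPutCap
}

// AllowPut reports whether a body of size bytes may be stored at key.
func AllowPut(key string, size int64) bool {
	return size <= PutCap(key)
}
```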
### Matrix indexer — all 5 SPEC §3.4 components shipped
1. **Corpus builders** (`internal/corpusingest`)
2. **Multi-corpus retrieve+merge** (`matrixd /matrix/search`)
3. **Relevance filter** (`internal/matrix/relevance.go` 376 LoC + 289 LoC test)
4. **Strong-model downgrade gate** (`internal/matrix/downgrade.go`, reads `cfg.Models.WeakModels` after Phase 2)
5. **Playbook memory + boost** (`internal/matrix/playbook.go`, learning loop)
### Pathway memory (Mem0 substrate)
Full ADR-004 surface shipped. **Cycle-detection + retired-trace exclusion proven by tests:** `TestHistory_CycleDetected`, `TestRetire_ExcludedFromSearch`, `TestRevise_ChainOfThree_BackwardWalk`. JSONL append-only persistence with corruption tolerance.
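
The History and Retire properties those tests name can be illustrated with a backward walk over revision pointers. A minimal sketch, assuming each revision stores its predecessor's ID and retirement is a flag; `Trace`, `History`, and `Searchable` are hypothetical stand-ins, not the `internal/pathway` API. The visited set is what turns a corrupted chain into an error rather than an infinite loop.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Trace is a hypothetical, pared-down pathway record. Each revision points
// back at the trace it supersedes via PrevID ("" for the first version).
type Trace struct {
	ID      string
	PrevID  string
	Retired bool
	Content json.RawMessage // opaque payload, per ADR-004
}

// ErrCycle is returned when a PrevID chain loops back on itself.
var ErrCycle = errors.New("pathway: cycle in revision chain")

// History walks backward from id toward the first version, newest first.
// The visited set turns a corrupted chain (a -> b -> a) into an error.
func History(store map[string]Trace, id string) ([]Trace, error) {
	var out []Trace
	seen := make(map[string]bool)
	for id != "" {
		if seen[id] {
			return out, ErrCycle
		}
		seen[id] = true
		t, ok := store[id]
		if !ok {
			break // dangling pointer: stop at the last version found
		}
		out = append(out, t)
		id = t.PrevID
	}
	return out, nil
}

// Searchable returns IDs of non-retired traces (map order, unsorted).
func Searchable(store map[string]Trace) []string {
	var ids []string
	for id, t := range store {
		if !t.Retired {
			ids = append(ids, id)
		}
	}
	return ids
}
```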
### Observer + workflow runner
- `observerd` ring buffer + JSONL persistence
- Workflow DAG executor (Archon-style) with 5 native modes wired: `matrix.relevance`, `matrix.downgrade`, `matrix.search`, `distillation.score`, `drift.scorer`. Plus `fixture.echo` / `fixture.upper` for runner mechanics smokes.
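
Per the ADR-005 notes in the header (5.4), `/observer/event` accepts writes on a full ring by evicting the oldest entry rather than refusing them. A minimal sketch of that policy, assuming a fixed-capacity slice guarded by a mutex; `Ring`, `Event`, and their methods are hypothetical names, and JSONL persistence (logged-not-fatal per 5.1) is left out.

```go
package main

import "sync"

// Event is a hypothetical observer record.
type Event struct {
	Seq  int
	Kind string
}

// Ring is a hypothetical fixed-capacity buffer that always accepts writes:
// when full, Push overwrites the oldest entry instead of returning an error.
type Ring struct {
	mu    sync.Mutex
	buf   []Event
	start int // index of the oldest live entry
	n     int // number of live entries
}

// NewRing allocates a ring; capacity must be > 0.
func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]Event, capacity)}
}

// Push stores e and reports whether the oldest event was evicted.
func (r *Ring) Push(e Event) (evicted bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.n < len(r.buf) {
		r.buf[(r.start+r.n)%len(r.buf)] = e
		r.n++
		return false
	}
	r.buf[r.start] = e // full: overwrite the oldest slot
	r.start = (r.start + 1) % len(r.buf)
	return true
}

// Snapshot returns the live events, oldest first.
func (r *Ring) Snapshot() []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]Event, 0, r.n)
	for i := 0; i < r.n; i++ {
		out = append(out, r.buf[(r.start+i)%len(r.buf)])
	}
	return out
}
```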
### Distillation + drift
- **E (partial)** at `57d0df1` — scorer + contamination firewall ported from Rust v1.0.0 (logic only per ADR-001 §1.4; not bit-identical).
- **F (first slice)** at `be65f85` — drift quantification, scorer drift first.
### chatd — Phase 4 (shipped 2026-04-30, scrum-hardened same day)
Multi-provider LLM dispatcher routing `/v1/chat` by model-name prefix or `:cloud` suffix:
| Prefix / suffix | Provider | Auth |
|---|---|---|
| `ollama/<m>` or bare | `ollama` (local) | none |
| `ollama_cloud/<m>` or `<m>:cloud` | `ollama_cloud` | Bearer (OLLAMA_CLOUD_KEY) |
| `openrouter/<v>/<m>` | `openrouter` | Bearer (OPENROUTER_API_KEY) |
| `opencode/<m>` | `opencode` | Bearer (OPENCODE_API_KEY) |
| `kimi/<m>` | `kimi` | Bearer (KIMI_API_KEY) |
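
The prefix/suffix table amounts to an ordered string match. A minimal sketch, assuming the routing prefix is stripped before the upstream call (the behavior fixed in B-3) and that a `:cloud` suffix is passed through untouched, which the table does not specify; `Route` is a hypothetical name, not chatd's resolver.

```go
package main

import "strings"

// Route is a hypothetical resolver: it maps a model string to a provider
// name and the model ID sent upstream, following the prefix/suffix table.
func Route(model string) (provider, upstream string) {
	for _, p := range []string{"ollama_cloud", "openrouter", "opencode", "kimi", "ollama"} {
		if rest, ok := strings.CutPrefix(model, p+"/"); ok {
			return p, rest // e.g. openrouter/<v>/<m> -> "<v>/<m>"
		}
	}
	if strings.HasSuffix(model, ":cloud") {
		return "ollama_cloud", model // suffix kept as-is (assumption)
	}
	return "ollama", model // bare names go to local Ollama
}
```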

All 4 provider keys live in `/etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env` files (mode 0600); local `ollama` needs none. An empty or missing file leaves that provider unregistered (404 at first call instead of 503). Test request: `POST /v1/chat {"model":"opencode/claude-opus-4-7","messages":[{"role":"user","content":"hi"}],"max_tokens":8}`.

`Request.Temperature` is `*float64` (a pointer): Anthropic 4.7 deprecates `temperature` entirely, so we omit the field when the caller doesn't set it.
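
Why a pointer: with a plain `float64` plus `omitempty`, an explicit `0.0` would also be dropped, so the pointer is what separates "unset" from "zero". A minimal sketch with a hypothetical `ChatRequest` subset (not chatd's actual `Request` type):

```go
package main

import "encoding/json"

// ChatRequest is a hypothetical subset of chatd's request body.
// A nil *float64 with omitempty drops the key entirely; a non-nil
// pointer to 0.0 is still emitted as "temperature":0.
type ChatRequest struct {
	Model       string   `json:"model"`
	MaxTokens   int      `json:"max_tokens,omitempty"`
	Temperature *float64 `json:"temperature,omitempty"`
}

// encode marshals r; Marshal cannot fail for this struct.
func encode(r ChatRequest) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```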
### Model tier registry
`lakehouse.toml [models]` names model IDs by tier so swaps are 1-line:
```toml
[models]
local_fast = "qwen3.5:latest"
local_judge = "qwen3.5:latest"
cloud_judge = "kimi-k2.6:cloud"
cloud_review = "qwen3-coder:480b"
frontier_review = "openrouter/anthropic/claude-opus-4-7"
frontier_arch = "openrouter/moonshotai/kimi-k2-0905"
frontier_free = "opencode/claude-opus-4-7"
weak_models = ["qwen3.5:latest", "qwen3:latest"] # matrix.downgrade bypass
```
Callers read `cfg.Models.LocalJudge` etc. instead of literal strings. `playbook_lift` harness, `matrix.downgrade`, and observerd's `MatrixDowngradeWithWeakList` factory all migrated.
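
A minimal sketch of the struct those callers read. Only `LocalJudge` and `WeakModels` appear in the text; the other field names and the `IsWeak` helper are hypothetical, and the `toml` tags assume a decoder such as BurntSushi/toml (decoding is omitted to keep the sketch stdlib-only).

```go
package main

// Models is a hypothetical mirror of the [models] table.
type Models struct {
	LocalFast      string   `toml:"local_fast"`
	LocalJudge     string   `toml:"local_judge"`
	CloudJudge     string   `toml:"cloud_judge"`
	CloudReview    string   `toml:"cloud_review"`
	FrontierReview string   `toml:"frontier_review"`
	FrontierArch   string   `toml:"frontier_arch"`
	FrontierFree   string   `toml:"frontier_free"`
	WeakModels     []string `toml:"weak_models"`
}

// IsWeak is a hypothetical helper: it reports whether model is listed in
// weak_models, the list matrix.downgrade consults.
func (m Models) IsWeak(model string) bool {
	for _, w := range m.WeakModels {
		if w == model {
			return true
		}
	}
	return false
}
```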
### Code health
- `go vet ./...` → **0 warnings, 0 errors**
- `go test -short ./...` → **all green**, 349 test functions
- `just verify` → PASS (vet + tests + 9 smokes) in ~31s
- 18 smoke scripts (9 core gating verify + 9 domain smokes for new daemons)
### Latest scrum: 2026-04-30 cross-lineage wave
Composite **50/60** at scrum2 head `c7e3124` (was 35 baseline → 43 R1 → 50 R2). Today's chatd wave was reviewed by Opus + Kimi + Qwen3-coder through chatd's own `/v1/chat`; **2 BLOCKs + 2 WARNs landed as fixes** (`0efc736`); the reusable driver is at `scripts/scrum_review.sh`.
### Reality test PASSED — `playbook_lift_001` (2026-04-30 ~05:50 CDT)
The 5-loop substrate's load-bearing gate (per `project_small_model_pipeline_vision.md`: *"the playbook + matrix indexer must give the results we're looking for"*) is verified.
| Metric | Value |
|---|---:|
| Queries | 21 (staffing-domain, 7 categories) |
| Cold-pass discoveries (judge-best ≠ top-1) | 8 |
| **Warm-pass lifts** (recorded playbook → top-1) | **7 / 8 (87.5%)** |
| Boosts triggered | 9 |
| Mean Δ top-1 distance | -0.053 (warm consistently closer) |
| OOD honesty (dental/RN/SWE queries) | rated 1, no fake matches |
| Cross-corpus boosts | confirmed (e- ↔ w- swaps in lifts) |

Evidence: `reports/reality-tests/playbook_lift_001.{json,md}`. Per the report's rubric (lift ≥ 50% means the matrix is doing real work), 87.5% is well past the validation bar.
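
The lift metric can be stated precisely. A minimal sketch, assuming each query records its cold top-1, the judge's pick, and the warm top-1, and that a "lift" means the warm pass promoted the judge's pick to top-1; `QueryResult` and `LiftRate` are hypothetical, and the report's actual fields may differ. With 8 discoveries and 7 lifts this yields 0.875, above the 0.5 rubric line.

```go
package main

// QueryResult is a hypothetical per-query record from one lift run.
type QueryResult struct {
	ColdTop1  string // top-1 ID on the cold pass
	JudgeBest string // ID the judge rated best on the cold pass
	WarmTop1  string // top-1 ID on the warm pass (playbook recorded)
}

// LiftRate counts cold-pass discoveries (judge-best != top-1) and how many
// of those the warm pass promoted to top-1.
func LiftRate(rs []QueryResult) (discoveries, lifts int, rate float64) {
	for _, r := range rs {
		if r.JudgeBest == r.ColdTop1 {
			continue // retrieval was already right: nothing to learn
		}
		discoveries++
		if r.WarmTop1 == r.JudgeBest {
			lifts++
		}
	}
	if discoveries > 0 {
		rate = float64(lifts) / float64(discoveries)
	}
	return
}
```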
### Harness expansion (2026-04-30 ~05:30 CDT)
`scripts/playbook_lift.sh` rewritten from a 5-daemon stripped harness to the **full 10-daemon prod-realistic stack** (chatd stays up independently). The 5-daemon version was structurally hiding bugs; expanding the daemon set surfaced 7 distinct fixes:
| # | Fix | Lock |
|---|---|---|
| 1 | driver→matrixd: `query` → `query_text` field name | `cmd/matrixd/main_test.go` TestPlaybookRecord_OldFieldNameRejected |
| 2 | harness toml missing `[s3]` block | inline comment in `scripts/playbook_lift.sh` |
| 3 | harness→queryd: `q` → `sql` field name | `cmd/queryd/main_test.go` TestHandleSQL_WrongFieldName_400 |
| 4 | 5→10 daemon boot order | inline comment + dep-ordered launch |
| 5 | SQL surface probe (3-row CSV → COUNT=3) | `[lift] ✓ SQL surface probe passed` assertion |
| 6 | `candidates` corpus was SWE-tech, not staffing | swapped to `ethereal_workers.parquet` (10K rows, real staffing schema, "e-" id prefix) |
| 7 | `qwen3.5:latest` is vision-SSM 256K-ctx → 30s/judge | reverted `local_judge` to `qwen2.5:latest` (1s/judge, 30× faster) |
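
Fixes #1 and #3 are the same bug class: a renamed JSON field silently decodes to a zero value. One way a handler can turn that into a 400, sketched with the standard library's `json.Decoder.DisallowUnknownFields` plus a required-field check; whether matrixd uses this exact mechanism is not stated, and `recordRequest` and `decodeRecord` are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// recordRequest is a hypothetical playbook-record body; only the
// query_text field name comes from the fix table.
type recordRequest struct {
	QueryText string `json:"query_text"`
}

// decodeRecord rejects unknown keys, so a caller still sending the old
// "query" field gets an error (mapped to 400) instead of an empty QueryText.
func decodeRecord(body string) (recordRequest, error) {
	var req recordRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return req, err
	}
	if req.QueryText == "" {
		return req, errors.New("query_text is required")
	}
	return req, nil
}
```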
### R-005 closed (2026-04-30 ~05:35 CDT)
Four `cmd/<bin>/main_test.go` files (three new, one extended) with chi router-level contract tests:
- `cmd/matrixd/main_test.go` (123 lines) — playbook record drift detector + score bounds + 6 routes mounted
- `cmd/queryd/main_test.go` (extended) — wrong-field-name drift detector
- `cmd/pathwayd/main_test.go` (102 lines) — 9 routes + add round-trip + retire-nonexistent
- `cmd/observerd/main_test.go` (98 lines) — 4 routes + invalid-op 400 + unknown-mode 400
`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green. R-005 from the prior STATE OPEN list is closed.

---
## DO NOT RELITIGATE
### Ratified ADRs (`docs/DECISIONS.md`)
- **ADR-001**: DuckDB via cgo, HTMX UI, Gitea hosting, distillation rebuilt-not-ported, pathway memory clean start, auditor longitudinal signal restarts. **6 sub-decisions, all final.**
- **ADR-002**: storaged per-prefix PUT cap (4 GiB for `_vectors/`, 256 MiB elsewhere), implemented at `423a381`. If 4 GiB ever proves insufficient, the documented path is an operator-config bump rather than a constant change.
- **ADR-003**: Inter-service auth = Bearer + IP allowlist, opt-in via `cfg.Auth.Token`. Wiring deferred to Sprint 1 but **the design is locked** — alternatives (mTLS, JWT, OAuth2, IP-only) all considered + rejected.
- **ADR-004**: Pathway memory = Mem0 versioned traces, JSONL append-only persistence, opaque `json.RawMessage` content. Implemented in `internal/pathway/`.
### Today's scrum dispositions (2026-04-30)
Verbatim verdicts at `reports/scrum/_evidence/2026-04-30/verdicts/`. Disposition table: `reports/scrum/_evidence/2026-04-30/disposition.md`.
**Real findings, all fixed in `0efc736`:**
- B-1 (Opus+Kimi convergent): `ResolveKey` 3-arg API → 2-arg
- B-2 (Opus+Kimi convergent): `handleProviders` direct map lookup, drop synthesis-via-Resolve
- B-3 (Opus single, trace-verified): `OllamaCloud.Chat` strips `ollama_cloud/` prefix correctly
- B-4 (Opus single): Ollama `done_reason` surfaced to FinishReason
**False positives dismissed (3, documented):**
- FP-A1: Kimi misread `TestMaybeDowngrade_WithConfigList` assertion
- FP-A2: Qwen claimed nil-deref in `MaybeDowngrade` that doesn't exist
- FP-C1: Opus claimed `qwen3.5:latest` doesn't exist on Ollama hub (it does on this box's local install)
### Session frame (don't redo)
- The Rust legacy is **maintenance-only** until Go reaches feature parity. Don't propose ports of components already shipped here.
- The matrix indexer **5/5 components** are shipped. Don't propose to "build the matrix indexer" — it's done.
- `qwen3.5:latest` IS available locally on this box. Opus's hub-only knowledge is a known-stale signal; the chatd_smoke uses it daily.
- `temperature` is **omitted** for Anthropic 4.7 (handled by `Request.Temperature *float64`); don't re-add it.
- chatd-smoke runs with **all cloud providers disabled** intentionally so the suite doesn't depend on API keys; that's why it can't catch B-3-class bugs (those need a fake-server fixture, see Sprint 0 follow-up).
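
A fake-server fixture for B-3-class bugs can be built with `httptest`: stand up a fake upstream, point the provider at it, and assert on the model name it actually received. A minimal sketch, assuming the provider POSTs JSON with a `model` field to `<base>/api/chat`; `fakeCloud`, `newFakeCloud`, and `sendCloudChat` are hypothetical and are not the Sprint 0 fixture.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// fakeCloud is a hypothetical stand-in upstream that records the model
// field of the last request it received.
type fakeCloud struct {
	mu        sync.Mutex
	lastModel string
}

func (f *fakeCloud) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var body struct {
		Model string `json:"model"`
	}
	if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	f.mu.Lock()
	f.lastModel = body.Model
	f.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	w.Write([]byte(`{"done":true}`))
}

func (f *fakeCloud) LastModel() string {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.lastModel
}

func newFakeCloud() (*fakeCloud, *httptest.Server) {
	f := &fakeCloud{}
	return f, httptest.NewServer(f)
}

// sendCloudChat is a hypothetical provider call that strips the routing
// prefix (the B-3 behavior) before POSTing to baseURL.
func sendCloudChat(baseURL, model string) error {
	payload, err := json.Marshal(map[string]string{
		"model": strings.TrimPrefix(model, "ollama_cloud/"),
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/chat", "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upstream status %d", resp.StatusCode)
	}
	return nil
}
```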
---
## OPEN — what's not done yet
| Item | What | When to act |
|---|---|---|
| **Reality test v2: paraphrase queries** | The 21 queries in `tests/reality/playbook_lift_queries.txt` exercise verbatim replay only. The interesting case is *similar but not identical* queries hitting a recorded playbook: does the cosine on `query_text` still find the playbook hit? Add a paraphrase pass and measure. | When J wants to push the harness past the v1 baseline. |
| **Q15 boost-math edge case** | "Engaged warehouse associate with strong safety compliance" — judge picked rank-9 result; score=1.0 boost halves distance but rank-9 was >2× top-1 distance, so not promoted. Documented in caveat #2. Either (a) accept the math limit, or (b) tier scores so judge-best-found-deep gets score>1.0. Open design call. | When a second reality run shows the same edge case persisting. |
| **Sprint 4 — deployment** | No `REPLICATION.md`, `secrets-go.toml.example`, `deploy/systemd/<bin>.service`, `Dockerfile`. Largest open Sprint. Required input for any G5 cutover plan. | When G5 cutover is on the table. |
| **ADR-006 — auth posture for non-loopback deploy** | Locks R-001 + R-007 from "opt-in middleware exists" to "wired-by-default for X, opt-in for Y." Doc-only, ~1 hr. | Required before any Go binary binds non-loopback in prod. |
| **chatd fixture-mode storage half** | `g2_smoke_fixtures.sh` closed embed half via fake_ollama; storage half (mock S3) still deferred. Closes R-006 fully. | When CI box without MinIO is needed. |
| **Distillation full port** | `57d0df1` shipped scorer + contamination firewall (E partial); SFT export pipeline + audit_baselines lineage not yet ported. | When distillation is needed for production. |
| **Drift full quantification** | `be65f85` is "scorer drift first." Full distribution-drift signal underspecified everywhere — research gap, not a port. | Open research item. |
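
The Q15 boost-math edge case in the table above follows from a one-line inequality. Assuming, as that row states, that a score = 1.0 boost halves a result's distance, and that promotion means the boosted distance drops below the unboosted top-1 distance $d_1$, a halving boost can only promote a result with $d_k < 2\,d_1$. The rank-9 pick fails that test:

```math
d_9' = \tfrac{1}{2}\, d_9, \qquad d_9 > 2\, d_1
\;\;\Longrightarrow\;\; d_9' > \tfrac{1}{2} \cdot 2\, d_1 = d_1
```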
---
## RECENT VERIFIED WAVE (2026-04-30)
`05273ac..e4ee002` — 4 phases + scrum + tooling, all gate-tested.
| SHA | What |
|---|---|
| `ec1d031` | Phase 1: `[models]` tier config (additive, no callers migrate) |
| `622e124` | Phase 2: `matrix.downgrade` reads `cfg.Models.WeakModels` |
| `848cbf5` | Phase 3: `playbook_lift` harness defaults from config |
| `05273ac` | Phase 4: chatd + 5 providers (1,624 LoC) |
| `0efc736` | Scrum: 4 fixes (B-1..B-4) + 2 INFOs from cross-lineage review |
| `e4ee002` | `scripts/scrum_review.sh` — reusable 3-lineage driver |
| `b2e45f7` | playbook_lift harness expansion + reality test #001 (7/8 lift, 87.5%) |
| `6c02c90` | scrum lift_001: 4 fixes (sleep→polling SQL probe, JUDGE_SOURCE template, -id-prefix validation, chi.Router cast) |
| (next) | ADR-005: observer fail-safe semantics (this commit) |

Plus, on the Rust side (`8de94eb`, `3d06868`): qwen2.5 → qwen3.5:latest backport in active defaults; distillation acceptance reports regenerated (run_hash refresh; the reproducibility property still holds).

---
## RUNTIME CHEATSHEET
```bash
# Verify everything green
cd /home/profit/golangLAKEHOUSE
just verify # vet + tests + 9 core smokes (~31s)
just doctor # dep probe (go/gcc/minio/ollama/secrets)
# Boot the chat dispatcher (Phase 4)
nohup ./bin/chatd -config lakehouse.toml > /tmp/chatd.log 2>&1 & disown
nohup ./bin/gateway -config lakehouse.toml > /tmp/gateway.log 2>&1 & disown
curl -sf http://127.0.0.1:3110/v1/chat/providers | jq # all 5 providers should report true
# Test a chat call to each lineage
for m in "qwen3.5:latest" "opencode/claude-opus-4-7" "openrouter/moonshotai/kimi-k2-0905"; do
curl -sS -X POST http://127.0.0.1:3110/v1/chat \
-H 'Content-Type: application/json' \
-d "{\"model\":\"$m\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: OK\"}],\"max_tokens\":8}" \
| jq -c '{model,provider,content}'
done
# Run the scrum on a diff
./scripts/scrum_review.sh path/to/bundle.diff bundle_label
ls reports/scrum/_evidence/$(date +%Y-%m-%d)/verdicts/
# Domain smokes (not in `just verify`)
for s in chatd matrix observer pathway playbook relevance downgrade workflow; do
bash scripts/${s}_smoke.sh > /tmp/${s}.log 2>&1 && echo "$s ✓" || echo "$s ✗"
done
```
---
## VISION — what we're actually building
J's framing (canonical at `/root/.claude/projects/-home-profit/memory/project_small_model_pipeline_vision.md`): a small-model-driven autonomous pipeline that gets better with each run. Frontier APIs (Opus, Kimi, GPT-5) are too expensive + rate-limited for the inner loop — they live in audit/oversight via `frontier_*` tier. The hot path runs on local `qwen3.5:latest` given:
1. **Pathway memory** — what we tried before, how it went (Mem0 substrate ✓)
2. **Matrix indexer** — multi-corpus retrieve+merge giving the small model the right slice for this task (5/5 components ✓)
3. **Observer** — watches each run, refines configs (not prompts) toward good pathways

Successful runs get **rated and distilled back into the playbook**. Each iteration the playbook gets denser, runs get cheaper, and results get better. **Drift** in the distilled playbook is a measured signal, not vibes.

**The single load-bearing gate:** *"the playbook + matrix indexer must give the results we're looking for."* Throughput, scaling, and code elegance are all secondary. The `playbook_lift` reality test is the regression gate before Enterprise cutover (where real contracts + live profile updates land).

When evaluating any Go workstream, ask: which of the 5 loops does this advance? Strong workstreams advance ≥1; weak workstreams sit in infra-for-its-own-sake.

---
## SIBLING TOOLS (separate repos, intentional integration target later)
**`local-review-harness`** at `git.agentview.dev/profit/local-review-harness` (also SMB-mounted at `/home/profit/share/local-review-harness-full-md/`). Local-first code review harness: 12 evidence-bearing static analyzers, Scrum-style reports, no cloud deps. Phase A + B (MVP) shipped 2026-04-30. Phases C–E (Ollama LLM review, validation, memory) pending.
**Cross-pollination plan when both stabilize:**
- Replace harness's `internal/llm/ollama.go` with a chatd `/v1/chat` client → frontier judges via config toggle
- Feed harness findings into Lakehouse pathway memory as a drift signal
- Treat harness's `.memory/known-risks.json` as a matrix-indexer corpus
Detail at `docs/SPEC.md` §3.10. Don't re-port harness functionality into Lakehouse-Go — the standalone tool is the design.