Closes the OPEN item from STATE_OF_PLAY. Required because observerd is
now on the prod-realistic data path via the lift harness boot (b2e45f7),
so the next consumer (scrum runner / distillation rebuild / production
workflow) needs the fail-safe rationale locked, not implicit.

The Rust "verdict:accept on crash" anti-pattern doesn't translate
one-to-one to the Go observer (witness, not gate). But four adjacent
fail-safe decisions are real and live:

5.1 Persist failure is logged-not-fatal; ring is in-flight source of
    truth. Persist-required mode deferred to a future opt-in ADR.

5.2 Mode failure → Success=false, no panic-swallow path. The runner
    catches mode errors and surfaces them via node.Error; downstream
    consumers see failures explicitly rather than as fake successes
    (the Rust anti-pattern surface).

5.3 One row per node, recorded post-run. A workflow with N nodes
    produces N audit rows, never a per-workflow catch-all that
    survives partial crashes. Known gap: recording happens after
    runner.Run returns (acceptable for short workflows; streaming
    callback is the right shape when workflows get longer).

5.4 /observer/event accepts on full ring (oldest evicted). Refusing
    to write would translate every burst into client errors — wrong
    direction for an audit witness.

Mostly ratifies existing behavior; cross-checked claims against
actual code (caught one error in Decision 5.3 draft — recording is
post-run-batched, not per-node-as-it-completes — and the ADR now
states reality).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
268 lines
16 KiB
Markdown
# STATE OF PLAY — Lakehouse-Go

**Last verified:** 2026-04-30 ~05:50 CDT
**Verified by:** live probes + `just verify` PASS + reality test PASS (7/8 lift), not memory.

> **Read this FIRST.** When the user says "we're working on lakehouse," default to the Go rewrite (this repo); the Rust legacy at `/home/profit/lakehouse/` is maintenance-only. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.

---

## VERIFIED WORKING RIGHT NOW

### Substrate (G0 + G1 family)

13 service binaries under `cmd/` plus 2 driver scripts under `scripts/staffing_*` build into `bin/`. **18 smoke scripts all PASS.** `just verify` (vet + 30 packages × short tests + 9 core smokes) green in ~31s wall.

| Binary | Port | What |
|---|---|---|
| `gateway` | 3110 | reverse proxy, single OpenAI-compat-style edge |
| `storaged` | 3211 | S3 GET/PUT/LIST/DELETE w/ per-prefix PUT cap (ADR-002) |
| `catalogd` | 3212 | Parquet manifests, ADR-020 idempotent register |
| `ingestd` | 3213 | CSV → Parquet → catalogd, content-addressed keys |
| `queryd` | 3214 | DuckDB SELECT over Parquet via httpfs |
| `vectord` | 3215 | HNSW indexes (coder/hnsw), persistence to storaged |
| `embedd` | 3216 | Ollama-backed embedder w/ LRU cache |
| `pathwayd` | 3217 | Mem0 ops (Add/Update/Revise/Retire/History/Search) |
| `matrixd` | 3218 | Multi-corpus retrieve+merge + relevance + downgrade + playbook |
| `observerd` | 3219 | Witness loop, workflow runner with DAG executor |
| `chatd` | 3220 | LLM dispatcher: ollama / ollama_cloud / openrouter / opencode / kimi |
| `mcpd` | — | MCP SDK port (Bun mcp-server replacement) |
| `fake_ollama` | — | Test fixture (used by `g2_smoke_fixtures.sh`) |

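To illustrate the "content-addressed keys" noted for `ingestd`, here is a minimal Go sketch. It assumes SHA-256 over the raw bytes; the `contentKey` name, the `datasets/` prefix, and the `.parquet` suffix are hypothetical and are not taken from the real service.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentKey derives an object key from the bytes themselves, so identical
// uploads map to the same key. The "datasets/" prefix and ".parquet" suffix
// are hypothetical, not the real ingestd key layout.
func contentKey(data []byte) string {
	sum := sha256.Sum256(data)
	return "datasets/" + hex.EncodeToString(sum[:]) + ".parquet"
}

func main() {
	a := contentKey([]byte("id,name\n1,ada\n"))
	b := contentKey([]byte("id,name\n1,ada\n"))
	c := contentKey([]byte("id,name\n2,bob\n"))
	fmt.Println(a == b) // same bytes, same key
	fmt.Println(a == c) // different bytes, different key
}
```

A key derived this way makes a repeated ingest of the same file naturally idempotent, which fits the idempotent register described for `catalogd`.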
### Matrix indexer — all 5 SPEC §3.4 components shipped

1. **Corpus builders** (`internal/corpusingest`)
2. **Multi-corpus retrieve+merge** (`matrixd /matrix/search`)
3. **Relevance filter** (`internal/matrix/relevance.go` 376 LoC + 289 LoC test)
4. **Strong-model downgrade gate** (`internal/matrix/downgrade.go`, reads `cfg.Models.WeakModels` after Phase 2)
5. **Playbook memory + boost** (`internal/matrix/playbook.go`, learning loop)

### Pathway memory (Mem0 substrate)

Full ADR-004 surface shipped. **Cycle-detection + retired-trace exclusion proven by tests:** `TestHistory_CycleDetected`, `TestRetire_ExcludedFromSearch`, `TestRevise_ChainOfThree_BackwardWalk`. JSONL append-only persistence with corruption tolerance.

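The three properties named by those tests (backward walk over a revise chain, cycle detection, retired-trace exclusion) can be sketched as follows. This is an illustration under assumptions, not the code in `internal/pathway/`: the `Trace` type, its `PrevID` link, and the `history` and `searchable` helpers are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Trace is a hypothetical stand-in for a versioned pathway trace.
type Trace struct {
	ID      string
	PrevID  string // ID of the trace this one revised; "" for the first version
	Retired bool
}

// ErrCycle is returned when the revise chain loops back on itself.
var ErrCycle = errors.New("history: cycle detected")

// history walks backward from id through PrevID links, newest first.
// A visited set turns a corrupted, looping chain into an error instead of
// an infinite loop.
func history(store map[string]Trace, id string) ([]Trace, error) {
	seen := map[string]bool{}
	var out []Trace
	for id != "" {
		if seen[id] {
			return nil, ErrCycle
		}
		seen[id] = true
		t, ok := store[id]
		if !ok {
			return nil, fmt.Errorf("history: unknown trace %q", id)
		}
		out = append(out, t)
		id = t.PrevID
	}
	return out, nil
}

// searchable filters out retired traces before any search runs over them.
func searchable(ts []Trace) []Trace {
	var out []Trace
	for _, t := range ts {
		if !t.Retired {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	store := map[string]Trace{
		"v1": {ID: "v1"},
		"v2": {ID: "v2", PrevID: "v1"},
		"v3": {ID: "v3", PrevID: "v2"},
	}
	h, err := history(store, "v3")
	fmt.Println(len(h), err) // prints: 3 <nil>
}
```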
### Observer + workflow runner

- `observerd` ring buffer + JSONL persistence
- Workflow DAG executor (Archon-style) with 5 native modes wired: `matrix.relevance`, `matrix.downgrade`, `matrix.search`, `distillation.score`, `drift.scorer`. Plus `fixture.echo` / `fixture.upper` for runner mechanics smokes.

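The fail-safe behavior ratified for the observer (accept on a full ring by evicting the oldest row, log persist failures instead of failing, surface mode errors as `Success=false`, record one row per node after the run) can be sketched in Go as below. Every type and function name here is hypothetical, and nodes run sequentially for brevity; the real DAG executor's dependency handling is out of scope for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Row is a hypothetical audit row: one per workflow node.
type Row struct {
	Node    string
	Success bool
	Error   string
}

// Ring is a fixed-capacity buffer that evicts the oldest row when full.
type Ring struct {
	rows     []Row
	capacity int
}

func NewRing(capacity int) *Ring {
	if capacity < 1 {
		capacity = 1
	}
	return &Ring{capacity: capacity}
}

// Add always accepts; on a full ring the oldest row is dropped first.
func (r *Ring) Add(row Row) {
	if len(r.rows) == r.capacity {
		copy(r.rows, r.rows[1:])
		r.rows = r.rows[:len(r.rows)-1]
	}
	r.rows = append(r.rows, row)
}

// Rows returns a copy of the buffered rows, oldest first.
func (r *Ring) Rows() []Row { return append([]Row(nil), r.rows...) }

// Witness keeps rows in the ring and makes a best-effort persist attempt.
type Witness struct {
	ring    *Ring
	persist func(Row) error // e.g. a JSONL append; allowed to fail
}

// Record never fails: a persist error is logged and the ring keeps the row.
func (w *Witness) Record(row Row) {
	w.ring.Add(row)
	if err := w.persist(row); err != nil {
		log.Printf("observer: persist failed for node %s: %v", row.Node, err)
	}
}

// Node pairs a name with a hypothetical mode function.
type Node struct {
	Name string
	Run  func() error
}

// runAndRecord runs every node, then records one row per node after the run
// has returned (post-run batching, not streaming).
func runAndRecord(w *Witness, nodes []Node) {
	rows := make([]Row, 0, len(nodes))
	for _, n := range nodes {
		row := Row{Node: n.Name, Success: true}
		if err := n.Run(); err != nil {
			row.Success = false // a failure is never reported as success
			row.Error = err.Error()
		}
		rows = append(rows, row)
	}
	for _, row := range rows {
		w.Record(row)
	}
}

func okMode() error              { return nil }
func failMode() error            { return errors.New("mode failed") }
func brokenPersist(Row) error    { return errors.New("disk unavailable") }

func main() {
	w := &Witness{ring: NewRing(2), persist: brokenPersist}
	runAndRecord(w, []Node{{Name: "a", Run: okMode}, {Name: "b", Run: failMode}, {Name: "c", Run: okMode}})
	for _, row := range w.ring.Rows() {
		fmt.Printf("%+v\n", row)
	}
}
```

With a capacity of 2 and three nodes, the row for `a` is evicted and the rows for `b` (failed) and `c` remain, even though every persist attempt failed.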
### Distillation + drift

- **E (partial)** at `57d0df1` — scorer + contamination firewall ported from Rust v1.0.0 (logic only per ADR-001 §1.4; not bit-identical).
- **F (first slice)** at `be65f85` — drift quantification, scorer drift first.

### chatd — Phase 4 (shipped 2026-04-30, scrum-hardened same day)

Multi-provider LLM dispatcher routing `/v1/chat` by model-name prefix or `:cloud` suffix:

| Prefix / suffix | Provider | Auth |
|---|---|---|
| `ollama/<m>` or bare | `ollama` (local) | none |
| `ollama_cloud/<m>` or `<m>:cloud` | `ollama_cloud` | Bearer (OLLAMA_CLOUD_KEY) |
| `openrouter/<v>/<m>` | `openrouter` | Bearer (OPENROUTER_API_KEY) |
| `opencode/<m>` | `opencode` | Bearer (OPENCODE_API_KEY) |
| `kimi/<m>` | `kimi` | Bearer (KIMI_API_KEY) |

All 4 keys live in `/etc/lakehouse/{ollama_cloud,openrouter,opencode,kimi}.env` files (mode 0600); local `ollama` needs none. Empty/missing files leave that provider unregistered (404 at first call instead of 503). Test request: `POST /v1/chat {"model":"opencode/claude-opus-4-7","messages":[{"role":"user","content":"hi"}],"max_tokens":8}`.

`Request.Temperature` is `*float64` (pointer): Anthropic 4.7 deprecates `temperature` entirely, so we omit the field when the caller doesn't set it.

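The routing table and the pointer-typed temperature can be sketched together in Go. This is an assumption-laden illustration, not chatd's code: `resolveProvider` and `encode` are hypothetical helpers, the `Request` struct carries only a subset of fields, and passing a `:cloud` model name upstream unchanged is a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Request is a trimmed, illustrative request body. Only the pointer type of
// Temperature comes from the text; the rest is an assumption.
type Request struct {
	Model       string   `json:"model"`
	MaxTokens   int      `json:"max_tokens,omitempty"`
	Temperature *float64 `json:"temperature,omitempty"`
}

// resolveProvider maps a model name to (provider, upstream model name)
// following the prefix / suffix table: explicit prefix first, then the
// ":cloud" suffix, then bare names fall through to local ollama.
func resolveProvider(model string) (provider, upstream string) {
	for _, p := range []string{"ollama_cloud", "openrouter", "opencode", "kimi", "ollama"} {
		if strings.HasPrefix(model, p+"/") {
			return p, strings.TrimPrefix(model, p+"/")
		}
	}
	if strings.HasSuffix(model, ":cloud") {
		return "ollama_cloud", model // assumption: suffix is passed through
	}
	return "ollama", model
}

// encode marshals a request; a nil Temperature is dropped by omitempty,
// while a pointer to 0 is kept, which a plain float64 could not express.
func encode(r Request) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	for _, m := range []string{"qwen3.5:latest", "kimi-k2.6:cloud", "openrouter/anthropic/claude-opus-4-7"} {
		p, up := resolveProvider(m)
		fmt.Printf("%s -> provider=%s upstream=%s\n", m, p, up)
	}
	zero := 0.0
	fmt.Println(encode(Request{Model: "m"}))
	fmt.Println(encode(Request{Model: "m", Temperature: &zero}))
}
```

The pointer is what lets "caller did not set a temperature" and "caller asked for 0" produce different payloads.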
### Model tier registry

`lakehouse.toml [models]` names model IDs by tier so swaps are 1-line:

```toml
local_fast = "qwen3.5:latest"
local_judge = "qwen3.5:latest"
cloud_judge = "kimi-k2.6:cloud"
cloud_review = "qwen3-coder:480b"
frontier_review = "openrouter/anthropic/claude-opus-4-7"
frontier_arch = "openrouter/moonshotai/kimi-k2-0905"
frontier_free = "opencode/claude-opus-4-7"
weak_models = ["qwen3.5:latest", "qwen3:latest"]  # matrix.downgrade bypass
```

Callers read `cfg.Models.LocalJudge` etc. instead of literal strings. `playbook_lift` harness, `matrix.downgrade`, and observerd's `MatrixDowngradeWithWeakList` factory all migrated.

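A rough Go view of how a caller might consume these tiers is sketched below. Only the accessor names `cfg.Models.LocalJudge` and `cfg.Models.WeakModels` appear in the text; the struct layout, the `IsWeak` helper, and the hard-coded values (standing in for a parsed `lakehouse.toml`) are assumptions.

```go
package main

import "fmt"

// Models is a hypothetical Go view of the [models] table, showing a subset
// of the tiers.
type Models struct {
	LocalFast  string
	LocalJudge string
	CloudJudge string
	WeakModels []string
}

// Config is a hypothetical top-level config holding the tier registry.
type Config struct {
	Models Models
}

// IsWeak reports whether model is on the configured weak list, the list the
// downgrade gate is described as reading.
func (m Models) IsWeak(model string) bool {
	for _, w := range m.WeakModels {
		if w == model {
			return true
		}
	}
	return false
}

func main() {
	// Values are inlined here; a real caller would load them from the file.
	cfg := Config{Models: Models{
		LocalFast:  "qwen3.5:latest",
		LocalJudge: "qwen3.5:latest",
		CloudJudge: "kimi-k2.6:cloud",
		WeakModels: []string{"qwen3.5:latest", "qwen3:latest"},
	}}
	judge := cfg.Models.LocalJudge // a tier lookup, not a literal model string
	fmt.Println(judge, cfg.Models.IsWeak(judge))
	fmt.Println(cfg.Models.CloudJudge, cfg.Models.IsWeak(cfg.Models.CloudJudge))
}
```

Because callers name a tier rather than a model, swapping the judge model stays a one-line config change.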
### Code health

- `go vet ./...` → **0 warnings, 0 errors**
- `go test -short ./...` → **all green**, 349 test functions
- `just verify` → PASS (vet + tests + 9 smokes) in ~31s
- 18 smoke scripts (9 core gating verify + 9 domain smokes for new daemons)

### Latest scrum: 2026-04-30 cross-lineage wave

Composite **50/60** at scrum2 head `c7e3124` (was 35 baseline → 43 R1 → 50 R2). Today's chatd wave reviewed by Opus + Kimi + Qwen3-coder via the chatd's own `/v1/chat`; **2 BLOCKs + 2 WARNs landed as fixes** (`0efc736`); reusable driver at `scripts/scrum_review.sh`.

### Reality test PASSED — `playbook_lift_001` (2026-04-30 ~05:50 CDT)

The 5-loop substrate's load-bearing gate (per `project_small_model_pipeline_vision.md`: *"the playbook + matrix indexer must give the results we're looking for"*) is verified.

| Metric | Value |
|---|---:|
| Queries | 21 (staffing-domain, 7 categories) |
| Cold-pass discoveries (judge-best ≠ top-1) | 8 |
| **Warm-pass lifts** (recorded playbook → top-1) | **7 / 8 (87.5%)** |
| Boosts triggered | 9 |
| Mean Δ top-1 distance | -0.053 (warm consistently closer) |
| OOD honesty (dental/RN/SWE queries) | rated 1, no fake matches |
| Cross-corpus boosts | confirmed (e- ↔ w- swaps in lifts) |

Evidence: `reports/reality-tests/playbook_lift_001.{json,md}`. Per the report's rubric (lift ≥ 50% = matrix doing real work), 87.5% means we're well past validation.

### Harness expansion (2026-04-30 ~05:30 CDT)

`scripts/playbook_lift.sh` rewritten from a 5-daemon stripped harness to the **full 10-daemon prod-realistic stack** (chatd stays up independently). The 5-daemon version was structurally hiding bugs; expanding the daemon set surfaced 7 distinct fixes:

| # | Fix | Lock |
|---|---|---|
| 1 | driver→matrixd: `query` → `query_text` field name | `cmd/matrixd/main_test.go` TestPlaybookRecord_OldFieldNameRejected |
| 2 | harness toml missing `[s3]` block | inline comment in `scripts/playbook_lift.sh` |
| 3 | harness→queryd: `q` → `sql` field name | `cmd/queryd/main_test.go` TestHandleSQL_WrongFieldName_400 |
| 4 | 5→10 daemon boot order | inline comment + dep-ordered launch |
| 5 | SQL surface probe (3-row CSV → COUNT=3) | `[lift] ✓ SQL surface probe passed` assertion |
| 6 | `candidates` corpus was SWE-tech, not staffing | swapped to `ethereal_workers.parquet` (10K rows, real staffing schema, "e-" id prefix) |
| 7 | `qwen3.5:latest` is vision-SSM 256K-ctx → 30s/judge | reverted `local_judge` to `qwen2.5:latest` (1s/judge, 30× faster) |

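Fixes 1 and 3 are both locked by a "wrong field name is rejected" test. One way such a lock can work is sketched below in Go, assuming strict JSON decoding with the standard library. The handler, the `/sql` path, and the `postStatus` helper are hypothetical; the real tests run against a chi router and may check something different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// sqlRequest is a hypothetical request body; the "sql" field name is the
// one fix 3 switched to.
type sqlRequest struct {
	SQL string `json:"sql"`
}

// handleSQL rejects unknown fields, so a client still sending the old "q"
// field gets a 400 instead of a silently empty query.
func handleSQL(w http.ResponseWriter, r *http.Request) {
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()
	var req sqlRequest
	if err := dec.Decode(&req); err != nil || req.SQL == "" {
		http.Error(w, `body must be {"sql": "..."}`, http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// postStatus sends body to the handler in-process and returns the status.
func postStatus(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/sql", strings.NewReader(body))
	rec := httptest.NewRecorder()
	handleSQL(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(postStatus(`{"q":"SELECT 1"}`))   // old field name: 400
	fmt.Println(postStatus(`{"sql":"SELECT 1"}`)) // current field name: 200
}
```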
### R-005 closed (2026-04-30 ~05:35 CDT)

Four `cmd/<bin>/main_test.go` files (three new, one extended) with chi router-level contract tests:

- `cmd/matrixd/main_test.go` (123 lines) — playbook record drift detector + score bounds + 6 routes mounted
- `cmd/queryd/main_test.go` (extended) — wrong-field-name drift detector
- `cmd/pathwayd/main_test.go` (102 lines) — 9 routes + add round-trip + retire-nonexistent
- `cmd/observerd/main_test.go` (98 lines) — 4 routes + invalid-op 400 + unknown-mode 400

`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green. R-005 from prior STATE OPEN list is closed.

---

## DO NOT RELITIGATE

### Ratified ADRs (`docs/DECISIONS.md`)

- **ADR-001**: DuckDB via cgo, HTMX UI, Gitea hosting, distillation rebuilt-not-ported, pathway memory clean start, auditor longitudinal signal restarts. **6 sub-decisions, all final.**
- **ADR-002**: storaged per-prefix PUT cap (4 GiB for `_vectors/`, 256 MiB elsewhere) — implemented at `423a381`. Operator-config bump rather than constant change is the documented path if 4 GiB is ever insufficient.
- **ADR-003**: Inter-service auth = Bearer + IP allowlist, opt-in via `cfg.Auth.Token`. Wiring deferred to Sprint 1 but **the design is locked** — alternatives (mTLS, JWT, OAuth2, IP-only) all considered + rejected.
- **ADR-004**: Pathway memory = Mem0 versioned traces, JSONL append-only persistence, opaque `json.RawMessage` content. Implemented in `internal/pathway/`.

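The ADR-002 rule is small enough to sketch directly. The Go below is a hedged illustration: the two limits and the `_vectors/` prefix come from the text, while `putCap`, `allowPut`, and the use of constants (the ADR describes the caps as operator-configurable) are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// putCap returns the maximum PUT size for a key: 4 GiB under "_vectors/",
// 256 MiB everywhere else. Real deployments would read these from config.
func putCap(key string) int64 {
	if strings.HasPrefix(key, "_vectors/") {
		return 4 * GiB
	}
	return 256 * MiB
}

// allowPut reports whether an upload of size bytes fits under the cap.
func allowPut(key string, size int64) bool {
	return size <= putCap(key)
}

func main() {
	fmt.Println(allowPut("_vectors/index.bin", 3*GiB))    // true: under 4 GiB
	fmt.Println(allowPut("datasets/rows.parquet", 1*GiB)) // false: over 256 MiB
}
```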
### Today's scrum dispositions (2026-04-30)

Verbatim verdicts at `reports/scrum/_evidence/2026-04-30/verdicts/`. Disposition table: `reports/scrum/_evidence/2026-04-30/disposition.md`.

**Real findings, all fixed in `0efc736`:**

- B-1 (Opus+Kimi convergent): `ResolveKey` 3-arg API → 2-arg
- B-2 (Opus+Kimi convergent): `handleProviders` direct map lookup, drop synthesis-via-Resolve
- B-3 (Opus single, trace-verified): `OllamaCloud.Chat` strips `ollama_cloud/` prefix correctly
- B-4 (Opus single): Ollama `done_reason` surfaced to FinishReason

**False positives dismissed (3, documented):**

- FP-A1: Kimi misread `TestMaybeDowngrade_WithConfigList` assertion
- FP-A2: Qwen claimed nil-deref in `MaybeDowngrade` that doesn't exist
- FP-C1: Opus claimed `qwen3.5:latest` doesn't exist on Ollama hub (it does on this box's local install)

### Session frame (don't redo)

- The Rust legacy is **maintenance-only** until Go reaches feature parity. Don't propose ports of components already shipped here.
- The matrix indexer **5/5 components** are shipped. Don't propose to "build the matrix indexer" — it's done.
- `qwen3.5:latest` IS available locally on this box. Opus's hub-only knowledge is a known-stale signal; the chatd_smoke uses it daily.
- `temperature` is **omitted** for Anthropic 4.7 (handled by `Request.Temperature *float64`); don't re-add it.
- chatd-smoke runs with **all cloud providers disabled** intentionally so the suite doesn't depend on API keys; that's why it can't catch B-3-class bugs (those need a fake-server fixture, see Sprint 0 follow-up).

---

## OPEN — what's not done yet

| Item | What | When to act |
|---|---|---|
| **Reality test v2: paraphrase queries** | The 21 verbatim queries in `tests/reality/playbook_lift_queries.txt` exercise verbatim replay only. The interesting case is *similar but not identical* queries hitting a recorded playbook — does the cosine on `query_text` find the playbook hit? Add a paraphrase pass and measure. | When J wants to push the harness past v1 baseline. |
| **Q15 boost-math edge case** | "Engaged warehouse associate with strong safety compliance" — judge picked rank-9 result; score=1.0 boost halves distance but rank-9 was >2× top-1 distance, so not promoted. Documented in caveat #2. Either (a) accept the math limit, or (b) tier scores so judge-best-found-deep gets score>1.0. Open design call. | When a second reality run shows the same edge case persisting. |
| **Sprint 4 — deployment** | No `REPLICATION.md`, `secrets-go.toml.example`, `deploy/systemd/<bin>.service`, `Dockerfile`. Largest open Sprint. Required input for any G5 cutover plan. | When G5 cutover is on the table. |
| **ADR-006 — auth posture for non-loopback deploy** | Locks R-001 + R-007 from "opt-in middleware exists" to "wired-by-default for X, opt-in for Y." Doc-only, ~1 hr. | Required before any Go binary binds non-loopback in prod. |
| **chatd fixture-mode storage half** | `g2_smoke_fixtures.sh` closed embed half via fake_ollama; storage half (mock S3) still deferred. Closes R-006 fully. | When CI box without MinIO is needed. |
| **Distillation full port** | `57d0df1` shipped scorer + contamination firewall (E partial); SFT export pipeline + audit_baselines lineage not yet ported. | When distillation is needed for production. |
| **Drift full quantification** | `be65f85` is "scorer drift first." Full distribution-drift signal underspecified everywhere — research gap, not a port. | Open research item. |

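The Q15 arithmetic can be written out as a short derivation. The symbols are introduced here for illustration only: $d_1$ is the top-1 distance, $d_9$ the rank-9 distance, $d_9'$ the boosted rank-9 distance, and $f$ a generic boost factor. It assumes the top-1 result itself is not boosted, which the table does not state. The last line shows why option (b) would need a factor stronger than the current one-half.

```latex
% score = 1.0 halves the distance of the boosted result
d_9' = \tfrac{1}{2}\, d_9
% the rank-9 result started more than twice as far away as top-1
d_9 > 2\, d_1 \;\Longrightarrow\; d_9' = \tfrac{1}{2}\, d_9 > d_1
% so it stays behind top-1; promotion needs a factor f with
f \cdot d_9 < d_1 \;\Longleftrightarrow\; f < \frac{d_1}{d_9} < \tfrac{1}{2}
```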
---

## RECENT VERIFIED WAVE (2026-04-30)

`05273ac..e4ee002` — 4 phases + scrum + tooling, all gate-tested.

| SHA | What |
|---|---|
| `ec1d031` | Phase 1: `[models]` tier config (additive, no callers migrate) |
| `622e124` | Phase 2: `matrix.downgrade` reads `cfg.Models.WeakModels` |
| `848cbf5` | Phase 3: `playbook_lift` harness defaults from config |
| `05273ac` | Phase 4: chatd + 5 providers (1,624 LoC) |
| `0efc736` | Scrum: 4 fixes (B-1..B-4) + 2 INFOs from cross-lineage review |
| `e4ee002` | `scripts/scrum_review.sh` — reusable 3-lineage driver |
| `b2e45f7` | playbook_lift harness expansion + reality test #001 (7/8 lift, 87.5%) |
| `6c02c90` | scrum lift_001: 4 fixes (sleep→polling SQL probe, JUDGE_SOURCE template, -id-prefix validation, chi.Router cast) |
| (next) | ADR-005: observer fail-safe semantics (this commit) |

Plus on Rust side (`8de94eb`, `3d06868`): qwen2.5 → qwen3.5:latest backport in active defaults; distillation acceptance reports regenerated (run_hash refresh, reproducibility property still holds).

---

## RUNTIME CHEATSHEET

```bash
# Verify everything green
cd /home/profit/golangLAKEHOUSE
just verify   # vet + tests + 9 core smokes (~31s)
just doctor   # dep probe (go/gcc/minio/ollama/secrets)

# Boot the chat dispatcher (Phase 4)
nohup ./bin/chatd -config lakehouse.toml > /tmp/chatd.log 2>&1 & disown
nohup ./bin/gateway -config lakehouse.toml > /tmp/gateway.log 2>&1 & disown
curl -sf http://127.0.0.1:3110/v1/chat/providers | jq   # all 5 providers should report true

# Test a chat call to each lineage
for m in "qwen3.5:latest" "opencode/claude-opus-4-7" "openrouter/moonshotai/kimi-k2-0905"; do
  curl -sS -X POST http://127.0.0.1:3110/v1/chat \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$m\",\"messages\":[{\"role\":\"user\",\"content\":\"reply: OK\"}],\"max_tokens\":8}" \
    | jq -c '{model,provider,content}'
done

# Run the scrum on a diff
./scripts/scrum_review.sh path/to/bundle.diff bundle_label
ls reports/scrum/_evidence/$(date +%Y-%m-%d)/verdicts/

# Domain smokes (not in `just verify`)
for s in chatd matrix observer pathway playbook relevance downgrade workflow; do
  bash scripts/${s}_smoke.sh > /tmp/${s}.log 2>&1 && echo "$s ✓" || echo "$s ✗"
done
```

---

## VISION — what we're actually building

J's framing (canonical at `/root/.claude/projects/-home-profit/memory/project_small_model_pipeline_vision.md`): a small-model-driven autonomous pipeline that gets better with each run. Frontier APIs (Opus, Kimi, GPT-5) are too expensive + rate-limited for the inner loop — they live in audit/oversight via `frontier_*` tier. The hot path runs on local `qwen3.5:latest` given:

1. **Pathway memory** — what we tried before, how it went (Mem0 substrate ✓)
2. **Matrix indexer** — multi-corpus retrieve+merge giving the small model the right slice for this task (5/5 components ✓)
3. **Observer** — watches each run, refines configs (not prompts) toward good pathways

Successful runs get **rated and distilled back into the playbook**. Each iteration the playbook gets denser, runs get cheaper, results get better. **Drift** in the distilled playbook is a measured signal, not vibes.

**The single load-bearing gate:** *"the playbook + matrix indexer must give the results we're looking for."* Throughput, scaling, code elegance are all secondary. The `playbook_lift` reality test is the regression gate before Enterprise cutover (where real contracts + live profile updates land).

When evaluating any Go workstream, ask: which of the 5 loops does this advance? Strong workstreams advance ≥1; weak workstreams sit in infra-for-its-own-sake.

---

## SIBLING TOOLS (separate repos, intentional integration target later)

**`local-review-harness`** at `git.agentview.dev/profit/local-review-harness` (also SMB-mounted at `/home/profit/share/local-review-harness-full-md/`). Local-first code review harness — 12 evidence-bearing static analyzers, Scrum-style reports, no cloud deps. Phase A + B (MVP) shipped 2026-04-30. Phases C–E (Ollama LLM review, validation, memory) pending.

**Cross-pollination plan when both stabilize:**

- Replace harness's `internal/llm/ollama.go` with a chatd `/v1/chat` client → frontier judges via config toggle
- Feed harness findings into Lakehouse pathway memory as a drift signal
- Treat harness's `.memory/known-risks.json` as a matrix-indexer corpus

Detail at `docs/SPEC.md` §3.10. Don't re-port harness functionality into Lakehouse-Go — the standalone tool is the design.