148 Commits

Author SHA1 Message Date
root
22c0b42e96 config: mark unwired ModelsConfig tier fields as scaffolding
Surfaced during today's local-only audit. The ModelsConfig struct
parses 11 tier fields from lakehouse.toml (LocalFast/LocalEmbed/
LocalJudge/LocalReview/CloudJudge/CloudReview/CloudStrong/
FrontierReview/FrontierArch/FrontierStrong/FrontierFree) and
exposes Resolve(tier) → model. As of 2026-05-03, NO code calls
Resolve(). Operators setting these in lakehouse.toml see no runtime
effect — silent config drift.

Not removing the fields (would silently swallow operator's existing
TOML and make it harder to wire later). Adding a clear UNWIRED
warning comment instead.

WeakModels IS live (consumed by internal/workflow/modes.go +
internal/matrix/downgrade.go) — split out with its own comment so
it's not lumped with the dead fields.

If a future commit wires up Resolve() consumers, replace the
UNWIRED comment with the consumer reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 02:54:10 -05:00
root
5d3996b51d STATE_OF_PLAY: Rust is not maintenance-only as of 2026-05-02
Frames the Rust system accurately — it's receiving parity work +
infrastructure (Lance gauntlet, sidecar drop, observability parity),
not just security fixes. Points readers at lakehouse/STATE_OF_PLAY.md
+ docs/ARCHITECTURE_COMPARISON.md for the cross-runtime view.

Also commits today's parity probe report regenerations (5/5 still
32/32 post-Lance work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:24:14 -05:00
root
916923440a docs(comparison): close Lance backend deferral, reframe as Lance-vs-Parquet+HNSW
Lance went from "deferred until corpus exceeds 5M rows" to verified
production-ready at 10M in a single wave (lakehouse repo commits
7594725 + 5d30b3d). Captures the 4-pack (sanitizer + tests + smoke +
bench) + the root-cause fix (auto-build doc_id btree in migrate handler).

Reframes the strategic question: HNSW at 10M doesn't fit RAM, so the
real choice is Lance vs Parquet+HNSW-with-spilling, deferred until we
have a workload where the Parquet path is the bottleneck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:10:46 -05:00
root
b314ed1c94 parity: /v1/embed cross-runtime probe (5th probe, 8/8 cosine match)
Today's sidecar drop (lakehouse ba928b1) changed Rust's embed
transport from gateway → sidecar → Ollama (2 hops) to gateway →
Ollama directly. Go's embedd has always been direct. A drift here
would mean: same query, different vector → different HNSW top-K →
different staffing recommendations. This probe is the regression
gate for that surface.

Fixtures cover staffing-domain shapes (forklift, welder, OSHA,
dental, CNC) plus stress shapes (unicode "Café résumé  你好",
single char "x", 200-word long fixture).

Match metric: cosine similarity ≥ 0.99999. Byte-equal isn't
expected — Go round-trips through []float32 internally while Rust
stays at Vec<f64>, so JSON serialization introduces small float
drift. What matters operationally is vector direction (HNSW uses
cosine distance), and both runtimes preserve it when calling the
same Ollama with the same model.

Result: **8/8 fixtures match** including the long + unicode cases.
Sidecar drop didn't disturb the embed surface. The probe also
forces both endpoints to use `nomic-embed-text` so the v1-vs-v2-moe
default difference doesn't pollute the comparison.

5th cross-runtime parity probe joining the family:
  - validator_parity (6/6)
  - extract_json_parity (12/12)
  - session_log_parity (4/4)
  - materializer_parity (2/2)
  - embed_parity (8/8) — this commit

Cumulative: 32/32 parity assertions across 5 probes covering
HTTP shape (validator, embed), CLI output (materializer), unit
behavior (extract_json), and persisted shape (session_log).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:28:40 -05:00
root
a21a34b057 docs: close 2 cross-runtime parity gaps + document unified log
Companion to lakehouse 98b6647. Architecture comparison decisions
tracker now captures:

  - Go validatord direct header read (fixes 6847bbc): closes the
    case where Langfuse-off middleware passthrough silently dropped
    forwarded X-Lakehouse-Trace-Id
  - Rust IterateResponse trace_id echo (fixes 98b6647): closes the
    asymmetry where Go's response carried the join key and Rust's
    didn't
  - Unified longitudinal log demonstrated end-to-end: both daemons
    co-writing /tmp/lakehouse-validator/sessions.jsonl, distinct
    daemon tags, one DuckDB query covers both

24/24 parity assertions (validator 6/6, extract_json 12/12,
session_log 4/4, materializer 2/2) hold against live :3100 + :4110.
Both runtimes deployed with today's full stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:25:21 -05:00
root
6847bbc180 validatord: honor X-Lakehouse-Trace-Id even when Langfuse is off
Surfaced by the 2026-05-02 cross-runtime test: when a caller
forwarded X-Lakehouse-Trace-Id but the langfuse middleware was a
passthrough (no Langfuse env), the header was never read — Go minted
a fallback id, breaking cross-daemon parent-trace linkage.

The middleware only honored the header when its lf client was
non-nil. With LANGFUSE_URL unset on the persistent stack, every
inbound iterate request lost the parent linkage.

Fix: validatord's iterate handler reads the header DIRECTLY (matches
Rust's iterate.rs pattern) before falling through to the ctx value
+ fallback id. Now Go behavior matches Rust regardless of Langfuse
configuration.

Resolution order is:
  1. req.TraceID (caller put it in the JSON body)
  2. X-Lakehouse-Trace-Id header (read directly here)
  3. context value from langfuse middleware (when configured)
  4. fallback to a locally-minted time-ordered hex id

Verified end-to-end:
  curl -H 'X-Lakehouse-Trace-Id: go-cmp-fixed' POST /v1/iterate
  → response.trace_id = "go-cmp-fixed" ✓
  → sessions.jsonl row session_id = "go-cmp-fixed" ✓

Pre-fix (this commit's parent ran from /tmp/val-fresh3 binary):
  same call → trace_id minted as 18abbb5a008061b7-008061e9
  (header silently ignored)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:16:25 -05:00
root
1263720497 validatord: always populate session_id (fallback when Langfuse off)
Surfaced during the 2026-05-02 deploy + reality wave: the persistent
Go stack runs without LANGFUSE_URL/PUBLIC_KEY/SECRET_KEY env, so
shared.langfuseMiddleware operates as a passthrough — never minting
a trace id, never stashing it on the request context. Result:
session_id was empty on every JSONL row, breaking correlation across
the longitudinal log + replay_runs.jsonl + future Langfuse traces.

The fix: validatord falls back to a locally-generated time-ordered
hex id when both the X-Lakehouse-Trace-Id header AND the middleware
context are empty. Same shape Langfuse accepts, so a future deploy
that turns Langfuse on doesn't break correlation — already-emitted
session_ids stay valid as Langfuse trace ids.

Verified post-deploy by driving 9 /v1/iterate sessions through the
persistent stack at :4110:
  - 6 accepted on iter 0 (qwen2.5:latest first-shot 75%)
  - 2 max_iter_exhausted (no_json on prose-y prompts)
  - 1 infra_error (chatd cold-start probe timed out at 5s)

Latest row's session_id: "18abbabdc2306a83-c2306aa9" (was: "")

Probe re-runs (validator_parity, session_log_parity) included as
post-deploy artifacts; both 6/6 + 4/4 with the freshly-restarted
persistent gateway+validatord binaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:03:43 -05:00
root
fa4e1b4e16 parity: session_log probe + Rust observability parity recorded
Companion to lakehouse commit 57bde63 (Rust gateway gains
trace-id propagation + coordinator session JSONL). The
cross-runtime parity probe is the regression gate that prevents
silent schema drift between the two runtimes.

scripts/cutover/parity/session_log_parity.sh:
  - 4 fixtures (accepted_grounded, max_iter_exhausted, infra_error,
    unicode_in_prompt) feed identical input to both helpers
  - jq -e validity gate + non-trivial-equal guard prevents the
    "both sides fail identically → spurious match" failure mode
    (caught one IFS='||' bug during initial authoring — recorded
    in the script comment)
  - normalize() strips timestamp + daemon (legitimate per-producer
    differences); everything else must be byte-equal
  - Result: 4/4 fixtures match, including unicode

scripts/cutover/parity/session_log_helper/main.go:
  - Tiny stdin/stdout Go helper that round-trips a fixture
    through validator.SessionRecord serde
  - Counterpart to crates/gateway/src/bin/parity_session_log.rs

docs/ARCHITECTURE_COMPARISON.md decisions tracker:
  - "Rust observability parity" row added (DONE 2026-05-02)
  - Cross-runtime probe documented as reusable gate

STATE_OF_PLAY refreshed.

Both observability pieces (trace-id propagation, session JSONL)
now exist on both runtimes. Operators who point Rust gateway and
Go validatord at the same session-log path get a unified
longitudinal stream queryable via DuckDB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:39:49 -05:00
root
1a3a82aedb validatord: coordinator session JSONL for offline analysis (B follow-up)
Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    coordinator_sessions.jsonl (this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section — these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:22:09 -05:00
root
d6d2fdf81f trace-id propagation through /v1/iterate (multi-call observability)
Closes J's 2026-05-02 multi-call observability gap: a single
/v1/iterate session with N retries used to surface in Langfuse as
N+1 disconnected traces (one per /v1/chat hop + one for the iterate
request itself), with no parent/child linkage. Operators couldn't
scroll the retry chain in one trace tree to spot where grounding
failed.

## Wire-level change

- New header constant `shared.TraceIDHeader = "X-Lakehouse-Trace-Id"`
- `langfuseMiddleware` honors the header on inbound requests: if
  set, reuses that trace id instead of minting a new one. Stashes
  the trace id on the request context so handlers can attach
  application-level child spans.
- `validatord.chatCaller` forwards the header to chatd. Every chat
  hop in an iterate session lands as a child of the parent trace.

## Application-level spans

- `validator.IterateConfig` gains `Tracer` (optional callback).
  When wired, each iteration attempt emits one Langfuse span
  via `validator.AttemptSpan`:
    Name: iterate.attempt[N]
    Input: { iteration, model, provider, prompt }
    Output: { verdict, raw, error }
    Level: WARNING when verdict != accepted
- `validatord.iterTracer` is the production hook — bridges
  `validator.Tracer` → `langfuse.Client.Span`.
- `IterateRequest`/`IterateResponse`/`IterateFailure` gain
  `TraceID`; each `IterateAttempt` gains `SpanID`. The /v1/iterate
  caller can pivot from the JSON response straight into the
  Langfuse trace tree.

## What an operator sees post-cutover

  GET /v1/iterate {kind=fill, prompt=...} → Trace TR-1
    ├─ http.request span (from middleware)
    ├─ iterate.attempt[0] span (validator.Iterate emit)
    │     input: prompt+model
    │     output: { verdict: validation_failed, error: ..., raw }
    ├─ chatd /v1/chat call (X-Lakehouse-Trace-Id: TR-1)
    │     ├─ http.request span (chatd middleware)
    │     └─ chatd-internal spans (existing)
    ├─ iterate.attempt[1] span
    └─ ...

All in one Langfuse trace tree, not N+1 separate traces.

## Hallucinated-worker safety net is unchanged

The /v1/iterate flow's hard correctness gate is still
FillValidator + WorkerLookup. Phantom candidate IDs raise
ValidationError::Consistency which 422s and forces the iteration
loop to retry. The trace-id propagation is the OBSERVABILITY layer
on top — it makes the existing safety net's outcomes visible per-call,
not a replacement for it.

## Verification

- internal/validator: 4 new tests
  - TestIterate_TracerEmitsSpanPerAttempt — span/attempt count + SpanID
  - TestIterate_NoTraceIDSkipsTracer — no orphan spans without trace_id
  - TestIterate_ChatCallerReceivesTraceID — propagation contract
  - (existing iterate tests updated for new ChatCaller signature)
- internal/shared: 1 new test
  - TestLangfuseMiddleware_HonorsTraceIDHeader — cross-service linkage
- cmd/validatord: existing HTTP tests still PASS via the dual-shape
  UnmarshalJSON contract.
- validatord_smoke.sh: 5/5 PASS through gateway :3110 (unchanged).
- Full go test ./... green across 33 packages.

## Architecture invariant added

STATE_OF_PLAY "DO NOT RELITIGATE" gains a paragraph documenting
the X-Lakehouse-Trace-Id header contract + the iterate.attempt[N]
span emission. Future-Claude won't re-propose "wire trace-id
propagation" — the header IS the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:13:18 -05:00
root
afdeca80d9 docs: record Python sidecar drop in architecture comparison
Companion to lakehouse commit ba928b1 (aibridge: drop Python sidecar
from hot path; AiClient → direct Ollama).

ARCHITECTURE_COMPARISON.md:
  - Decisions tracker: "Drop Python sidecar" moves _open_ → DONE
  - "Cross-cutting abstracts to address" item #3 marked complete
  - "Python dependency (the load-bearing axis)" section reframed:
    pre/post-2026-05-02 diagram, lab UI / pipeline_lab clarified
    as dev-only Python (not on runtime hot path)
  - Change log: new entry

STATE_OF_PLAY: timestamp refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:00:51 -05:00
root
7d6636b33e validator: align ValidationError JSON to Rust serde shape (6/6 parity)
Closes the 2026-05-02 parity finding: validator_parity probe found
5/6 body shapes diverging because Go emitted {"Kind":"...","Field":"...","Reason":"..."}
while Rust emits the externally-tagged-enum {"Schema":{"field":"...","reason":"..."}}.
A caller parsing the error envelope would break silently in cutover.

## Changes

internal/validator/types.go:
- Custom MarshalJSON emits the Rust shape:
    Schema:       {"Schema":      {"field":"x","reason":"y"}}
    Completeness: {"Completeness":{"reason":"y"}}
    Consistency:  {"Consistency": {"reason":"y"}}
    Policy:       {"Policy":      {"reason":"y"}}
- Custom UnmarshalJSON accepts BOTH the new Rust shape AND the legacy
  flat shape (migration safety for any persisted error rows).
- Unknown variants (e.g. a future Rust addition Go hasn't learned)
  surface as an Unmarshal error, not a silent default.

internal/validator/types_test.go:
- 4 pinning tests anchor the wire format. Failing them = wire-format
  drift; the parity probe is the secondary line of defense.

scripts/validatord_smoke.sh:
- Updated probes to read the new variant-name shape (jq keys[0],
  .Schema.field) instead of legacy .Kind/.Field.

## Verification

- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to flat
  shape so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).

## Pattern

Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap;
50 LOC of MarshalJSON closed it; 4 pinning tests prevent regression;
the probe is the longitudinal gate. Cutover-friendly direction (Go
matches Rust) chosen because Rust is the existing production
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:49:28 -05:00
root
b0c8a3f227 parity probes: materializer + extract_json (caught + fixed real bug)
Two new cross-runtime parity probes joining the validator probe from
the gauntlet wave. Pattern: feed identical input through Rust and Go;
diff outputs. Each probe surfaced a different signal.

## Materializer parity probe
scripts/cutover/parity/materializer_parity.sh runs Bun + Go
materializer against an identical synthetic data/_kb/ root, diffs the
resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at).

**First run: 0/2 match.** Real finding: Go's Provenance.LineOffset
had `json:"line_offset,omitempty"` which strips the field when value
is 0. Line offset 0 is the FIRST ROW of every source file — a real
semantic value, not absent. Bun side always emits it.

Fix: drop `omitempty` on Provenance.LineOffset. Updated comment
explaining why.

**Re-run: 2/2 match.** On-wire JSON parity holds.

## extract_json parity probe
scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture
strings through both runtimes' extract_json:
  - fenced ```json``` blocks
  - unfenced ``` blocks
  - bare braces with prose around
  - first-balanced-of-many
  - nested objects
  - unicode in string values
  - escaped quotes
  - empty object
  - top-level array (both return first inner object)
  - no JSON
  - depth-balanced but invalid syntax
  - trailing garbage

Substrate gate: cargo test -p gateway extract_json PASS before probe.

**Result: 12/12 match.** Algorithms genuinely equivalent.

## scripts/cutover/parity/extract_json_helper/main.go
Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints
{matched, value} JSON. Counterpart to the Rust parity_extract_json
binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit).

## Pattern crystallized
Every cross-runtime port should land with a parity probe. Three
probes now exist:
  - validator (5/6 wire-format gap captured 2026-05-02)
  - materializer (caught + fixed real bug 2026-05-02)
  - extract_json (12/12 match 2026-05-02)

The instrument is reusable — each new shared HTTP/CLI surface gets
a probe row added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:43:54 -05:00
root
e8cf113af8 gauntlet 2026-05-02: smoke chain + per-component scrum + parity probe
Production-readiness gauntlet exploiting the dual Rust/Go
implementation as a measurement instrument.

## Phase 1 — Full smoke chain
21/21 PASS in ~60s. Substrate intact across the full service surface.

## Phase 2 — Per-component scrum (token-volume fix)
Prior wave (165KB diff): Kimi 62 tokens out, Qwen 297 → no useful
analysis. This wave splits today's commits into 4 focused bundles
(36-71KB each):
  c1 validatord (46KB) → 0 convergent / 11 distinct
  c2 vectord substrate (36KB) → 0 convergent / 10 distinct
  c3 materializer (71KB) → 0 convergent / 6 distinct (Opus emitted
                           a BLOCK then self-retracted in same response)
  c4 replay (45KB) → 0 convergent / 10 distinct

Reviewer engagement vs prior wave: Kimi went 62 → ~250 tokens out
once bundles dropped below 60KB.

scripts/scrum_review.sh hardening:
  * Diff-size guard (warn >60KB, hard-fail >100KB,
    SCRUM_FORCE_OVERSIZE=1 override)
  * Tightened prompt — file path must appear EXACTLY as in diff
    so post-processor can grep WHERE: lines reliably
  * Auto-tally step dedupes by (reviewer, location); convergence
    counts distinct lineages (closes the prior `opus+opus+opus`
    false-convergence bug)

## Phase 3 — Cross-runtime validator parity probe (the headline finding)
scripts/cutover/parity/validator_parity.sh sends 6 identical
/v1/validate cases to Rust :3100 AND Go :4110, compares status+body.

Result: **6/6 status codes match · 5/6 body shapes diverge.**

Rust returns serde-tagged enum:   {"Schema":{"field":"x","reason":"y"}}
Go returns flat exported-fields:  {"Kind":"schema","Field":"x","Reason":"y"}

Both round-trip inside their own runtime; a caller swapping one for
the other would break parsing silently. Captured as new _open_ row
in docs/ARCHITECTURE_COMPARISON.md decisions tracker.

This is the "use the dual-implementation as a measurement instrument"
return — single-repo scrums can't catch this class of cross-runtime
drift.

## Phase 4 — Production assessment
ship-with-known-gap. Validator wire-format gap is documented, not
regressed. ~50 LOC future fix on Go side (custom MarshalJSON on
ValidationError to match Rust's serde shape).

Persistent stack config (/tmp/lakehouse-persistent.toml) gains
validatord on :3221 + persistent-validatord binary so operators
bringing up the persistent stack get the new daemon automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:05:18 -05:00
root
f9e72412c1 validatord: /v1/validate + /v1/iterate HTTP surface (port 3221)
Closes the last "Go primary" backlog item in
docs/ARCHITECTURE_COMPARISON.md. Go now owns the entire validator
path end-to-end — no Rust dep for staffing safety net.

Architecture: cmd/validatord on :3221 hosts both endpoints. Calls
chatd directly for the iterate loop's LLM hop (no gateway
self-loopback like the Rust shape). Gateway proxies /v1/validate +
/v1/iterate to validatord.

What's in:
- internal/validator/playbook.go — 3rd validator kind (PRD checks:
  fill: prefix, endorsed_names ≤ target_count×2, fingerprint required)
- internal/validator/lookup_jsonl.go — JSONL roster loader (Parquet
  deferred; producer one-liner documented in package comment)
- internal/validator/iterate.go — ExtractJSON helper + Iterate
  orchestrator with ChatCaller seam for unit tests
- cmd/validatord/main.go — HTTP routes, roster load, chat client
- internal/shared/config.go — ValidatordConfig + gateway URL field
- lakehouse.toml — [validatord] section
- cmd/gateway/main.go — proxy routes for /v1/validate + /v1/iterate

Smoke: 5/5 PASS through gateway :3110:
  ✓ playbook happy path
  ✓ playbook missing fingerprint → 422 schema/fingerprint
  ✓ phantom candidate W-PHANTOM → 422 consistency
  ✓ unknown kind → 400
  ✓ roster loaded with 3 records

go test ./... green across 33 packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:53:20 -05:00
root
09299a27b7 scrum 2026-05-02: materializer+replay+vectord — ship-with-fixes
Cross-lineage review of 89ca72d (Opus + Kimi + Qwen3-coder).

Convergent findings (≥2 reviewers): NONE.

- Kimi BLOCK (materializer main.go exits 0 on validation fail):
  confabulation. Code does os.Exit(1) at lines 65-66.
- Qwen BLOCK (saveTask race condition): confabulation. All access
  to inflight/pending is under s.mu.
- Qwen WARN (saveAfter nil deref): confabulation. Explicit
  `if h.persist == nil { return }` guard at line 184.
- Opus BLOCK (TestSaveTask_Coalesces): self-withdrawn in same
  response.

Opus WARNs actioned:
- Detached docstring on TestAdd_SmallIndex_ConcurrentDistinctIDs —
  attached.
- isoDatePartition fallback comment — clarified as defense-in-depth
  (MaterializeAll guards upstream; branch unreachable through public
  surface).

Disposition + verdicts in reports/scrum/_evidence/2026-05-02/.

Pattern matches feedback_cross_lineage_review.md: only Opus emits
BLOCK-class findings worth verifying; Kimi/Qwen single-reviewer
BLOCKs failed trace verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:35:12 -05:00
root
89ca72d471 materializer + replay ports + vectord substrate fix verified at scale
Two threads landing together — the doc edits interleave so they ship
in a single commit.

1. **vectord substrate fix verified at original scale** (closes the
   2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211
   scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix).
   Throughput dropped 1,115 → 438/sec because previously-broken
   scenarios now do real HNSW Add work — honest cost of correctness.
   The fix (i.vectors side-store + safeGraphAdd recover wrappers +
   smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the
   footprint that originally surfaced the bug.

2. **Materializer port** — internal/materializer + cmd/materializer +
   scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts
   (12 transforms) + build_evidence_index.ts (idempotency, day-partition,
   receipt). On-wire JSON shape matches TS so Bun and Go runs are
   interchangeable. 14 tests green.

3. **Replay port** — internal/replay + cmd/replay +
   scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts
   (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL
   phase 7 live invocation on the Go side. Both runtimes append to the
   same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained
prompt_tokens, completion_tokens, and metadata fields to mirror the TS
shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions
tracker moves the materializer + replay items from _open_ to DONE and
adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:31:02 -05:00
root
277884b5eb multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail
J asked for a much more sophisticated test using the 100k corpus from
the Rust legacy database. This commit ships:

scripts/cutover/multitier/main.go — 6-scenario harness with weighted
random selection per goroutine. Mixes search, email/SMS/fill
validators (in-process via internal/validator), profile swap with
ExcludeIDs, repeat-cache exercise, and playbook record/replay.

Scenarios + weights (cumulative scenario fractions):
  35% cold_search_email      — search + email outreach + EmailValidator
  15% surge_fill_validate    — search + fill proposal + FillValidator + record
  15% profile_swap           — original search + ExcludeIDs swap + no-overlap check
  15% repeat_cache           — same query × 5 (cache effectiveness)
  10% sms_validate           — SMS draft (≤160 chars, phone for SSN-FP guard)
  10% playbook_record_replay — cold → record → warm w/ use_playbook=true

Test results (5-min sustained, conc=50, 100k workers indexed):
  TOTAL 335,257 scenarios @ 1,115/sec
  cold_search_email     117k @ 0.0% fail · p50 2.2ms · p99 8.6ms
  surge_fill_validate    50k @ 98.8% fail (substrate bug below)
  profile_swap           50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified
  repeat_cache           50k × 5 = 252k searches @ 0.0% fail · p50 11.7ms
  sms_validate           33k @ 0.0% fail · phone-pattern guard works
  playbook_record_replay 33k @ 96.8% fail (substrate bug below)
  Total successful workflows: ~250k+

Validator integration verified at load:
  150,930 EmailValidator passes across cold_search_email + sms_validate
  35 + 1,061 successful FillValidator + playbook_record (where the bug
    didn't fire)
  zero false positives on the SSN-pattern guard against phone numbers

Resource footprint at 100k:
  vectord 1.23GB RSS (linear with 100k vectors)
  matrixd 26MB, 75% CPU (1-core saturated at conc=50)
  Total across 11 daemons: 1.7GB
  Compare to Rust at 14.9GB — ~10× less even at 100k.

SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in
layerNode.search at graph.go:95. Triggers on /v1/matrix/playbooks/record
under sustained writes to the small playbook_memory index. Both Add
and Search paths can panic.

Workaround applied (this commit) in internal/vectord/index.go
BatchAdd: recover() guard converts panic to error; daemon stays up
instead of crashing the request handler.

Operator recovery procedure (also documented in the report):
  curl -X DELETE http://localhost:4215/vectors/index/playbook_memory
Next record recreates the index fresh.

Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md
Decisions tracker. Three options:
  a) upstream patch to coder/hnsw
  b) custom small-index Add path that always rebuilds when len < threshold
  c) alternate store for playbook_memory (Lance? in-memory map?)

Evidence: reports/cutover/multitier_100k.md (full methodology +
results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md
Decisions tracker updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:28:50 -05:00
root
3a2823c02f g5 cutover: bigger load test — 5.87M req, 0 errors, 370MB RSS
Larger-scale follow-up to the original load test. Three axis
expansions: corpus 200→5K workers, body variety 6→200 distinct
queries, concurrency sweep 10/50/100/200, plus mixed
embed+search workload.

Concurrency sweep on /v1/matrix/search direct (3 min each):
  conc=10:  486,733 req  · 2,704 RPS · p50 2.19ms · p99 6.7ms
  conc=50:  1,148,543 req · 6,381 RPS · p50 7.08ms · p99 20ms
  conc=100: 1,253,389 req · 6,963 RPS · p50 13.34ms · p99 37ms
  conc=200: 1,460,676 req · 8,114 RPS · p50 23.45ms · p99 56ms

Mixed embed+search at 60 conc each, 90s:
  /v1/embed: 1,127,854 req · 12,531 RPS · p50 3.31ms · p99 14.6ms
  /v1/matrix/search: 392,229 req · 4,358 RPS · p50 12.68ms · p99 33.8ms

TOTAL: 5,869,424 requests across ~13.5 minutes. ZERO errors.

Resource footprint during peak load:
  matrixd  105% CPU, 33MB RSS (bottleneck — pegs 1 core)
  vectord   39% CPU, 82MB RSS
  gateway   44% CPU, 41MB RSS
  embedd    30% CPU, 67MB RSS
  Total RSS across 11 daemons: ~370MB

Compare to Rust gateway under similar load: 14.9GB RSS, 374% CPU.
Go uses ~40x less memory + spreads load across daemons rather
than packing into one mega-process.

Saturation analysis:
- conc 10→50: +135% RPS (linear-ish scaling)
- conc 50→100: +9% RPS (saturation begins)
- conc 100→200: +17% RPS (matrixd 1-core pegged)

Headroom paths if production exceeds current demand:
1. Run multiple matrixd instances behind a load balancer.
   Substrate is stateless (recordings via storaged), horizontal
   scale is straightforward.
2. Profile matrixd's per-request work (role-gate + judge-eligibility
   + result merge).
3. Skip Bun for hot endpoints (direct nginx → Go = 5.7x previously
   measured).

Evidence: reports/cutover/g5_load_test_big.md (full tables +
methodology + repro script).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 05:18:00 -05:00
root
2a974d6dea docs: ARCHITECTURE_COMPARISON.md as living source file
Per J's request: move the parallel-runtime comparison from
reports/cutover/ (where it lived as cutover-prep evidence) into
docs/ as the source-of-truth file. J will keep updating it as
fixes ship on either side.

Restructured for living-document use:
- Status header (last refresh date, owner, update triggers)
- 'How to update this doc' section with explicit dos and don'ts
- Decisions tracker at top — actioned items with commit refs
  + open backlog with LOC estimates
- Each comparison section now has 'Last verified' columns where
  numbers are time-sensitive
- Change log section at bottom for one-line entries on every
  meaningful refresh

The original at reports/cutover/architecture_comparison.md gains
a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept
as historical record but no longer the place to update.

Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md
so the doc is reachable from either repo side. That file explicitly
says the source lives in golangLAKEHOUSE and warns against
authoritative content in the pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:56:20 -05:00
root
b03521a506 validator: port FillValidator + EmailValidator from Rust validator crate
Per architecture_comparison.md universal-win for Go side: ports the
Rust crates/validator/src/staffing/ to internal/validator/. Production
safety net Go was missing — FillValidator catches phantom worker IDs
+ status/blacklist/geo/role mismatches; EmailValidator catches
SSN-shape PII + salary disclosure + wrong-target name in email/SMS
drafts.

Files:
- types.go: Artifact (FillProposal | EmailDraft), Validator interface,
  WorkerLookup interface, ValidationError + Finding + Severity
- lookup.go: InMemoryWorkerLookup with case-insensitive ID lookup
- fill.go: FillValidator — schema → completeness → cross-roster
  (phantom ID / status / blacklist / geo / role)
- email.go: EmailValidator — schema → length → PII (SSN + salary)
  → worker-name consistency
- fill_test.go + email_test.go: 24 tests covering happy path +
  every error variant + the load-bearing edge cases (phone-pattern
  not flagged as SSN, flanking-digit guard rejects extended
  numeric runs)

Validator names match Rust (staffing.fill / staffing.email) so
cross-runtime audit logs share the same identifier. PII scanners
(containsSSNPattern, containsSalaryDisclosure) ported byte-for-byte
so a draft flagged by one runtime is flagged by the other.

Caveat: the Rust validator crate also has parquet_lookup.rs (loads
workers_500k.parquet at startup) and playbook.rs (additional
checks). Those weren't ported in this wave — only the two
load-bearing validators that were named in the comparison doc.

Closes one of the two universal-win items for Go side. The other
(materializer port) remains deferred — it's a bigger surface change
and depends on transforms.ts source-class adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:49:55 -05:00
root
b3ad14832d architecture_comparison: Rust vs Go lakehouse — weaknesses, strengths, abstracts to address
J asked for the comparison before locking in primary line. This
report documents what's actually structurally different vs
implementation-level different, and what to do about each.

Key findings:

1. Python sidecar is the single biggest architectural lever
   - Rust: gateway → HTTP → Python sidecar :3200 → HTTP → Ollama
   - Go:   gateway → HTTP → embedd → HTTP → Ollama (no Python)
   - Sidecar adds zero compute over Ollama (just pydantic + httpx)
   - 63× perf gap (8,119 vs 128 RPS) driven by sidecar + cache absence

2. Process model: Rust 1 mega-binary (14.9G RSS), Go 11 daemons
   - Rust: simpler ops at small scale, panic blast radius = whole system
   - Go: per-daemon scale + crash isolation, more config surface

3. Code volume: Go 15,128 lines vs Rust 35,447 + 1,237 sidecar
   - Go is 43% the size doing similar work
   - Gap concentrated in vectord (Rust 11k lines, Go 804 — Lance + benchmarking)

4. Distillation pipeline asymmetry
   - Audit/observation: BOTH sides parallel-mature
   - Production: Rust-only (materializer + replay + RAG/pref export)
   - Go can READ everything but can't PRODUCE evidence

5. Production validators (FillValidator/EmailValidator/'/v1/validate')
   - Rust has them (1,286 lines, 12 tests each)
   - Go doesn't — matrix gate covers role bleed but not structural validation

Cross-cutting abstracts to address regardless of which wins:
- Drop Python sidecar from Rust (call Ollama directly)
- Add LRU embed cache to Rust aibridge
- Port materializer + replay + validators to Go
- Pin shared JSONL schemas as canonical (both runtimes consume same spec)
- Decide on Lance backend (defer until corpus > 5M rows)

If keeping Go primary: port materializer first, validators second,
skip Lance. If keeping Rust primary: drop Python + add cache,
port chatd 5-provider dispatcher + cross-role gate from Go.

Bottom line: substrate is parallel-mature on observation; producer
side is Rust-only; performance structurally favors Go ~60× on warm
workloads; operations favors Go on isolation; production deployment
favors Rust today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:34:24 -05:00
root
c164a3da96 g5 cutover: production load test — 0 errors / 101k req · Go direct = 2,772 RPS
Sustained-traffic load test against the cutover slice. Three runs,
zero correctness errors across 101,770 total requests. Substrate
holds up under concurrent load — matrix gate, vectord HNSW,
embedd cache, gateway proxy all hold. This was the load test's
primary question; latency numbers are secondary.

scripts/cutover/loadgen — focused Go load generator. 6-query
rotating body mix (Forklift/CNC/Warehouse/Picker/Loader/Shipping).
Configurable URL/concurrency/duration. Reports per-status-code
counts + p50/p95/p99 latencies + JSON summary on stderr.

Three runs:

  baseline (Bun → Go, conc=1, 10s):
    4,085 req · 408 RPS · p50 1.3ms · p99 32ms · max 215ms

  sustained (Bun → Go, conc=10, 30s):
    14,527 req · 484 RPS · p50 4.6ms · p99 92ms · max 372ms

  direct (→ Go, conc=10, 30s):
    83,158 req · 2,772 RPS · p50 2.5ms · p99 8.5ms · max 16ms

Critical findings:

1. ZERO correctness errors across 101k requests. No 5xx, no
   transport errors, no panics. Concurrency-safety verified across
   matrix gate / vectord / gateway / embedd cache.

2. Direct-to-Go is production-grade. 2,772 RPS at p99 8.5ms on a
   single host, no scaling cliff at concurrency=10.

3. Bun frontend is the bottleneck. -82% RPS, +982% p99 vs direct.
   Single-process JS event loop queueing under concurrent
   requests — known Bun proxy-mode characteristic. The substrate
   itself isn't the limiter.

4. For staffing-domain demand levels (<1 RPS typical per
   coordinator), Bun-fronted 484 RPS has 480× headroom. No
   urgency to optimize Bun out of the data path. If/when
   concurrent demand grows orders of magnitude, the path is
   nginx → Go direct for hot endpoints, skip Bun.

Substrate is now load-tested and verified production-ready.

What this load test does NOT cover (documented in
g5_load_test.md): cold-cache embed, larger corpus, mixed
read/write, multi-host, full 5-loop traffic with judge gate
calls. Each is its own probe shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:20:41 -05:00
root
6507dff26d g5 cutover: first 5-loop end-to-end through Bun frontend
Companion to c522ace (cutover slice live). That commit proved
infrastructure (Bun /_go/* → Go gateway). This commit proves the
SUBSTRATE'S CORE LEARNING BEHAVIOR through the same path.

Two tests against persistent Go stack on :4110 with the 200-worker
corpus, all traffic via Bun frontend on :3700:

TEST 1: same-role boost fires with exact math
  Q1: Need 3 Forklift Operators in Aurora IL for Parallel Machining
  query_role: "Forklift Operator"

  cold (use_playbook=false):
    rank=0 id=w-43  dist=0.4449  Brian Ramirez, Springfield IL

  POST /_go/v1/matrix/playbooks/record:
    query_text=Q1, role=Forklift Operator, answer_id=w-43, score=1.0
    → playbook_id=pb-1126c52bd106df6b

  warm (use_playbook=true):
    rank=0 id=w-43  dist=0.2224  ← halved
    boosted=1, injected=0

  Math check: BoostFactor = 1 - 0.5*score = 0.5 (for score=1.0).
  Expected warm_dist = 0.4449 * 0.5 = 0.22245.
  Observed: 0.2224. 4-decimal exact through 3 HTTP hops.

TEST 2: cross-role gate prevents bleed
  Q2: Need 1 CNC Operator in Detroit MI for Beacon Freight
  query_role: "CNC Operator"
  use_playbook: true (Forklift recording from Test 1 in playbook corpus)

  result:
    rank=0 id=w-175  Kevin Ruiz       (Machine Operator, Detroit MI)
    rank=2 id=w-102  Laura Long       (Forklift Operator, Cleveland OH)
    boosted=0, injected=0  ← role gate fired correctly

  w-102 (Forklift Operator) appears at rank 2 organically via
  cosine retrieval — but boosted=0 confirms the Forklift PLAYBOOK
  did NOT influence this query. Surgical: gate suppresses
  playbook-driven boosts from cross-role recordings, leaves
  organic retrieval untouched.

What this confirms about the substrate:
1. Learning works — single recording → measurable, math-exact boost
2. Bleed protection works — role gate (real_001 fix) holds through
   cutover slice
3. Math holds across HTTP hops — Bun → gateway → matrixd → vectord
   with no drift
4. Substrate works through real production-shape framing — CORS,
   content-type, body forwarding, all transparent

The substrate's reason-for-being (5-loop learning) is now
demonstrably executing on persistent daemons under
production-shape frontend traffic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:14:21 -05:00
root
c522acec8b g5 cutover slice live — first real Bun-frontend traffic to Go substrate
J said "let's go" → "next" (option 3): actual flip via Bun
mcp-server. Done. Real Bun-frontend traffic now reaches the Go
substrate via /_go/* on Bun :3700, routed to the persistent Go
gateway at :4110.

Companion change in /home/profit/lakehouse (Rust legacy):
  mcp-server/index.ts: new /_go/* pass-through, opt-in via
  GO_LAKEHOUSE_URL env var. Off-by-default (returns 503 on
  /_go/* with rationale). Existing /api/* (Rust gateway) path
  unchanged. Committed locally on the demo/post-pr11 branch.

System config:
  /etc/systemd/system/lakehouse-agent.service.d/go-cutover.conf
  adds Environment=GO_LAKEHOUSE_URL=http://127.0.0.1:4110 to
  the systemd-managed Bun service. Reversible via systemctl
  revert lakehouse-agent.

Live verification (operator curl through Bun frontend):
- /_go/health: gateway responds {"status":"ok","service":"gateway"}
- /_go/v1/embed: nomic-embed-text-v2-moe vectors, dim=768
- /_go/v1/matrix/search vs persistent 200-worker corpus:
    rank=0 id=w-43  Brian Ramirez   (Forklift Operator, Springfield IL)
    rank=1 id=w-102 Laura Long      (Forklift Operator, Cleveland   OH)
    rank=2 id=w-101 Terrence Gray   (Forklift Operator, Champaign   IL)
    3/3 role match, top-1 in IL exactly
- /api/health: lakehouse ok (Rust path unchanged — control verified)

What this is NOT:
- Not an nginx flip — devop.live/lakehouse/* still goes through
  /api/* → Rust :3100. /_go/* is parallel slice for opt-in.
- Not a tool-level cutover — each /_go/<path> is a manual choice;
  no automatic mapping of Rust paths to Go equivalents.
- Not a transformation layer — caller sends Go-shaped requests
  (e.g. /_go/v1/embed expects {texts, model}, not {text}).

Three cutover unit properties verified:
- ADDITIVE: zero modification to any existing Bun tool
- REVERSIBLE: unset GO_LAKEHOUSE_URL → /_go/* → 503
- ISOLATED: Rust gateway state unaffected (different port,
  different binary, different MinIO bucket)

This is the cutover slice operators can use to validate Go-side
handlers under realistic frontend conditions before any
production-traffic flip. Next step (deferred): pick a specific
mcp-server tool to optionally route through Go with response-
shape adapter — that's a product-visible flip rather than this
infrastructure-visible slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:45:41 -05:00
root
4fd560cad6 start_go_stack.sh: third isolation layer (port range :4xxx for persistent)
Earlier push exposed the gap in the previous 2-layer isolation:
smokes still failed because they tried to bind :3211-:3220 which
my persistent stack already had. Smoke catalogd's bind-failure
went undetected because poll_health 3212 succeeded responding to
the persistent catalogd, and smoke proceeded against the wrong
backend with the wrong bucket expectations.

Fix: persistent stack now uses :4110 + :4211-:4219 via additional
sed in the temp toml (bind addresses + upstream URLs). Smoke
harnesses keep :3110 + :3211-:3219. Both reach the SAME chatd at
:3220 because chatd is read-mostly (no state to clobber) and
operators don't want to maintain two LLM provider key sets.

Three isolation layers now in effect:
1. Binary names (bin/persistent-* via symlinks)
2. MinIO buckets (lakehouse-go-persistent vs lakehouse-go-primary)
3. Port range (:4xxx vs :3xxx, with shared chatd on :3220)

Verified pre-push:
- 11 persistent ports listening on :4xxx + :3220
- 0 smoke ports listening on :3110-:3219 (free for smokes)

Pushed while persistent stack live — first cross-isolation test
(no port collision, no bucket collision, no name collision).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:26:41 -05:00
root
c48b58ff8d start_go_stack.sh: 2-layer isolation from smoke harness
The 2026-05-01 persistent-stack milestone exposed two collision
modes between the long-running Go stack and the pre-push smoke
harness:

1. PKILL COLLISION: smoke teardown uses anchored
   `pkill -f "bin/(storaged|...|gateway)$"`. Same-named persistent
   processes match → smokes kill 7 of 11 persistent daemons.
2. MINIO STATE COLLISION: persistent stack writes
   `_vectors/workers.lhv1` to the shared lakehouse-go-primary
   bucket. Smoke vectord rehydrates from same bucket → sees both
   smoke-owned and persistent-owned indexes → assertion failures.

Both fixed in this commit by adding two isolation layers:

LAYER 1 — distinct binary names via symlink:
  bin/persistent-<daemon> → bin/<daemon>
  Persistent stack runs as ./bin/persistent-gateway etc.
  Smoke pattern `bin/(name)$` matches `bin/gateway$` but NOT
  `bin/persistent-gateway$` (regex group requires bin/ followed
  immediately by a daemon name; "bin/p..." doesn't qualify).
  Cmdline lookup verified: 7 persistent procs, 0 match smoke pkill.

LAYER 2 — separate MinIO bucket via temp config:
  Persistent stack writes to lakehouse-go-persistent (configurable
  via $LH_PERSISTENT_BUCKET). Temp toml at /tmp/lakehouse-persistent.toml
  inherits everything from lakehouse.toml except [s3].bucket which
  is sed-replaced. Bucket auto-created via mc if missing.
  Verified: workers.lhv1 lands in persistent bucket; primary
  bucket _vectors/ stays empty.

Net effect: the persistent stack should survive `git push` (which
runs smokes that rehydrate vectord from primary bucket and pkill
their own bin/<name>$ daemons). This commit is the first push test
WITH the persistent stack live.

Caveat: bin/persistent-* symlinks are gitignored already (/bin/ is
in .gitignore wholesale), so the symlinks need to be created on
each fresh checkout — which start_go_stack.sh does idempotently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:20:00 -05:00
root
77a3dcf266 cutover: first end-to-end coordinator query against persistent Go stack
Three real-shape demand queries against the long-running 11-daemon
stack with 500 workers ingested from workers_500k.parquet (real
production data). Substrate is producing useful answers:

Q1 (Forklift @ Aurora IL): 5/5 role match, top 3 in IL, dist 0.44-0.46
Q2 (CNC @ Detroit MI):     top-1 in Detroit MI exactly, role pulls
                           Machine Operator (semantic neighbor)
Q3 (Warehouse @ Indianapolis IN): top-1 in Indianapolis IN, 5/5 role
                                  match, dist 0.42-0.54

This is the FIRST end-to-end coordinator-shape query against the
persistent Go stack — every prior reality test (real_001..real_005)
ran through harness-transient stacks that died on exit. This one
ran against daemons that have been up for minutes and stayed up
through retrieval.

Geo is load-bearing: top-1 city/state matched in 3/3 queries.
Embedder treats geography as a primary feature.

Q2's CNC→Machine Operator gap exposes the playbook learning loop's
purpose: judge would rate this ~3/5; the first time a coordinator
approves a Machine Operator for a CNC Operator query, that
recording starts shifting substrate behavior. That's the loop
we've been building toward — the persistent stack is now the
substrate that loop will run on.

Evidence: reports/cutover/persistent_stack_first_query.md (full
top-K tables + read on each query).

What this does NOT prove:
- Production-volume load (3 queries, 500 workers)
- Concurrent latency
- Full 5-loop substrate (this exercised retrieval only; no
  playbook recordings exist on the persistent stack yet)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 03:10:09 -05:00
root
54b2e7db76 start_go_stack.sh: document smoke-vs-persistent-stack pkill conflict
Caught immediately after the prior commit pushed: pre-push smokes
killed 7 of 11 persistent Go daemons because the smokes' anchored
`pkill -f "bin/(name)$"` teardown matches ANY process named
`bin/<daemon>`, not just the smokes' own children.

Documented in the script header as a KNOWN CONSTRAINT with a
workaround (re-run start_go_stack.sh after every push) and a
proper-fix sketch (give the persistent stack a different binary
name via build tag or symlink). Proper fix deferred until trigger
fires — operators living through this once will know to want it.

Persistent stack restored (all 11 healthy as of this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:56:52 -05:00
root
09904d5222 cutover: persistent Go stack milestone — first long-running deployment + first Go-emitted audit_baselines entry
J's "let's go" instruction: leave OPEN list behind, push the Go
substrate forward into actual deployment shape. This commit marks
the first time the Go side has run as long-running daemons rather
than per-harness transient processes, and the first time the
shared cross-runtime longitudinal log has carried a Go-emitted
entry alongside the Rust ones.

What landed:

scripts/cutover/start_go_stack.sh — the persistent-stack runbook.
Brings up all 11 daemons (storaged → catalogd → ingestd → queryd
→ embedd → vectord → pathwayd → observerd → matrixd → gateway,
plus chatd-if-not-already-up) in dependency order via nohup +
disown. Anchored pkill per feedback_pkill_scope (never bare
"bin/"). Logs land in /tmp/gostack-logs/<bin>.log, one per daemon.

Verified live state:
- All 11 services healthy on :3110 + :3211-:3220
- gateway → embedd proxy returns nomic-embed-text-v2-moe vectors
- chatd reports 5/5 providers loaded
- No port collision with Rust gateway on :3100
- Daemons stay up after exit of the start script (production shape,
  not harness-transient)

audit_baselines.jsonl crosses the runtime boundary:
- 7 Rust-emitted entries (last: ca7375ea 2026-04-27)
- 1 Go-emitted entry (ee2a40c 2026-05-01T07:53:54Z) appended via
  ./bin/audit_full -append-baseline
- Same envelope shape, same metric set, same drift comparator
  semantics — operators running either runtime grow the same log

What this DOES prove:
- Substrate parity at deployment shape (not just unit tests)
- Cross-runtime artifact write-side compatibility (was previously
  proven on read side via audit_baselines roundtrip)
- The deploy machinery works end-to-end for the persistent case

What this does NOT prove (still ahead):
- Real coordinator traffic against the Go stack (no nginx flip yet;
  devop.live/lakehouse/ still serves through Rust)
- Go-side production materializer (Phase 2 is observer-only)
- Replay tool parity (Phase 7 is observer-only)
- The 5-loop product gate against actual humans

reports/cutover/SUMMARY.md now logs three new rows:
- audit-FULL with 12/12 phases ported
- First Go-emitted audit_baselines entry
- Persistent Go stack live

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:55:29 -05:00
root
ee2a40c505 audit-FULL: port phases 1/2/5/7 — only acceptance.ts (TS-only) remains skipped
Closes 4 of the 5 phases the initial audit-FULL port left as
deferred. The pattern: most "deferred" phases didn't actually need
the un-ported Rust pieces — they were observer-mode by design and
just needed to read existing on-disk artifacts.

Phase 1 (schema validators) → ported via exec.Command:
  Invokes `go test ./internal/distillation/...` — the Go equivalent
  of Rust's `bun test auditor/schemas/distillation/`. New
  GoTestModule field on AuditFullOptions controls the package
  pattern; empty disables the invocation (test mode, prevents
  recursion when audit-full is invoked from inside `go test`).

Phase 2 (evidence materialization) → ported as observer:
  Reads data/evidence/ directly and tallies rows + tier-1 source
  hits. Doesn't re-run the materializer (which is Rust-side TS).
  Emits p2_evidence_rows + p2_evidence_skips metrics matching
  Rust shape — drop-in audit_baselines.jsonl entries possible.

Phase 5 (run summary) → ported as observer:
  Reads reports/distillation/{run_id}/summary.json + 5 stage
  receipts. Validates schema_version=1, run_hash sha256, git_commit
  40-char hex, all stage receipts decode as JSON. Full schema
  validation (StageReceipt schema) is intentionally NOT ported —
  it would require porting the TS schemas/distillation/ validators
  in full; basic shape checks catch the load-bearing invariants.

Phase 7 (replay log) → ported as observer:
  Reads data/_kb/replay_runs.jsonl, validates last 50 rows parse
  as JSON. Skips the live-replay invocation that Rust's phase 7
  also does — porting Rust replay.ts is substantial and not in
  scope. The "log shape sanity" check is what audit-full actually
  needs; the live invocation is a separate concern.

Phase 6 (acceptance gate) — STILL SKIPPED:
  Rust acceptance.ts is a TS-only fixture harness with bun-specific
  deps. Porting the fixtures (tests/fixtures/distillation/acceptance/)
  + the 22-invariant runner to Go is an ADR-worth undertaking.
  Documented in the header comment.

Live-data probe (against /home/profit/lakehouse):
  Skips count: 4 → 1 (only phase 6).
  Required checks: 6/6 → 12/12 PASS.
  New metric: p2_evidence_rows=1055, BYTE-EQUAL to the Rust
  pipeline's collect.records_out from the latest summary.json.
  Cross-runtime parity now extends across phases 0/1/2/3/4/5/7.

6 new tests:
- TestPhase2_EvidenceTallyFromOnDisk: row + tier-1-hit tallying
- TestPhase5_FullSummaryFlow: complete run-summary fixture passes
- TestPhase5_ShortRunHashCaught: bad run_hash fails required check
- TestPhase7_ReplayLogReadsFromDisk: row-count reporting
- TestPhase7_MalformedTailRowsCaught: structural parse failure
- TestRunAuditFull_FullFixtureFlow updated to seed evidence/ +
  reports/distillation/ for the phases now wired.

Cleanup: removed local sortStrings helper (replaced with sort.Strings
now that `sort` is imported for phase 5's mtime-sort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:35:13 -05:00
root
55b8c76a8c distillation: audit-FULL pipeline port (phases 0/3/4) — cross-runtime metric parity verified
Ports the metric-collection passes from scripts/distillation/audit_full.ts.
The substrate that PRODUCES audit_baselines.jsonl entries — the
half OPEN #2 left as "deferred to next wave" after the read/write
substrate landed in ca142b9.

Phase coverage:
  Phase 0 (file presence)             ported
  Phase 1 (schema validators)         skipped (Go's `go test` covers it)
  Phase 2 (materializer dry-run)      deferred (Go materializer not yet ported)
  Phase 3 (scored-runs distribution)  ported
  Phase 4 (contamination firewall)    ported
  Phase 5 (receipts validation)       deferred (Go run-summary JSON not yet emitted)
  Phase 6 (replay sanity)             deferred (Go replay tool not ported)
  Phase 7 (run summary lineage)       deferred (same)

Cross-runtime parity verified end-to-end:
  Go-side audit-full against /home/profit/lakehouse produced
  metrics IDENTICAL to the last Rust-emitted audit_baselines.jsonl
  entry. All 8 ported metrics match byte-for-byte:
    p3_accepted=386, p3_partial=132, p3_rejected=57, p3_human=480,
    p4_sft_rows=353, p4_rag_rows=448, p4_pref_pairs=83, p4_total_quarantined=1325
  6/6 required checks pass on live data.

Components:
- internal/distillation/audit_full.go: PhaseCheck struct (mirrors
  Rust shape), PhaseCheckReport aggregation, RunAuditFull
  orchestrator, auditPhase0/3/4 implementations, FormatAuditFullReport
  Markdown writer.
- cmd/audit_full/main.go: CLI binary with -root, -out, -json,
  -append-baseline flags. Operators run "./bin/audit_full
  -append-baseline" to grow the longitudinal log alongside the
  Rust pipeline (entries are interchangeable — same envelope shape).
- 6 new tests: empty-root failure handling, full-fixture clean PASS
  (locks all 8 metrics + all 6 required checks), SFT firewall
  contamination detection, preference self-pair detection, sig_hash
  regex correctness (rejects wrong-length + uppercase), Markdown
  formatter smoke.

Live-data probe captured at reports/cutover/audit_full_go_vs_rust.md
(linked from reports/cutover/SUMMARY.md). Same shape as the
audit_baselines round-trip evidence — both Go-side ports of the
distillation surface are now validated against real Rust data, not
just fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:30:23 -05:00
root
eb0dfdff04 vectord: v2 envelope + handleMerge robustness — actions post_role_gate_v1 scrum
3-lineage scrum on 434f466..0d4f033 surfaced one convergent finding
(Opus + Kimi) and 3 Opus-only real bugs. All actioned in this
commit. Two false positives (Kimi rollback misreading, Opus stale-
comment claim) verified + rejected — both required manual control-
flow inspection to refute, matching the documented Kimi-truncation
behavior in feedback_cross_lineage_review.md.

Convergent fix — DecodeIndex lost nil-meta items:
- Envelope version bumped 1 → 2.
- New v2 field: IDs []string carries the canonical ID set
  explicitly, independent of meta map's nil-vs-{} sparseness.
- DecodeIndex accepts both versions: v2 reads from env.IDs; v1
  falls back to meta-key inference (with the documented
  limitation that nil-meta items are invisible — preserved for
  backward-compat with already-persisted indexes).
- Encode emits v2 going forward.
- 2 new regression tests:
  - TestEncodeDecode_NilMetaItemsSurviveRoundTrip: items added
    with nil metadata MUST survive Encode → Decode and remain
    visible to IDs(). Pre-fix would have yielded IDs() == [].
  - TestDecodeIndex_V1BackwardCompat: hand-crafted v1 envelope
    still decodes (proves the fallback path).

Opus-only fixes:
- handleMerge: non-ErrIndexNotFound errors at h.reg.Get(name) /
  h.reg.Get(req.Dest) now return 500 + log instead of falling
  through with nil src/dest pointers (which would panic on the
  next deref). Real bug — only the sentinel error was handled.
- internal/drift/drift.go: mathLog wrapper removed; math.Log
  inlined. Wrapper added no value (math was already imported).
- internal/distillation/audit_baseline.go: BuildAuditDriftTable's
  bubble sort replaced with sort.Slice. Idiomatic + shorter.

Rejected after verification:
- Kimi WARN "missing rollback on partial merge": misread the
  control flow. Code at cmd/vectord/main.go:404-414 does NOT
  delete from src when dest.Add fails (continue before reaching
  src.Delete). Only successful Adds trigger Deletes.
- Opus INFO "TimestampUnixNano comment references missing field":
  field exists at scripts/multi_coord_stress/main.go:128. Opus
  saw only the diff context, not the full file.

Deferred (no fired trigger):
- Opus WARN "no per-index lock during merge": no concurrent merge
  callers today (operators run merge as deliberate one-shot job).
  Worth a lock if/when matrixd or chatd start auto-triggering.

Disposition: reports/scrum/_evidence/2026-05-01/verdicts/post_role_gate_v1_disposition.md.

Build + vet + tests green; 2 new regression tests + all prior tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:20:37 -05:00
root
0d4f033b34 audit_baselines: round-trip validation against live Rust data
Same shape of proof as embed_parity.sh for the embed endpoint:
take the just-shipped Go port (ca142b9) and validate it against
the actual production data the Rust legacy emits, not just unit-
test fixtures. Locks the cross-runtime parity that operators
running mixed pipelines depend on.

scripts/cutover/audit_baselines_validate.go:
- Reads /home/profit/lakehouse/data/_kb/audit_baselines.jsonl
- Parses every entry via the Go AuditBaseline struct
- Round-trips the last entry: encode → decode → field-by-field
  equality check (catches any silently-dropped JSON keys)
- Calls LoadLastBaseline against the live file (proves the public
  API works on real shapes, not just inline parsing)
- Computes BuildAuditDriftTable(first → last) — full-window
  lineage drift over the captured baselines

Live-data probe results (reports/cutover/audit_baselines_roundtrip.md):
- 7 entries parse without error
- Round-trip is byte-equal on every metric + every header field
- Drift table fires the expected verdicts:
  - p2_evidence_rows 12→82 (+583%) → warn (above 20% threshold)
  - p3_accepted/partial/rejected/human 0→non-zero → warn (the
    zero-baseline edge case TestBuildAuditDriftTable_ZeroBaseline
    was designed to lock — verified now firing on real history)
  - p4_* metrics +0% → ok (stable across the window)

What this does NOT prove (documented in the report): the Go-side
audit-FULL pipeline that PRODUCES baselines doesn't exist yet.
Only the load/append/drift substrate is ported. Operators running
audit-full from Go would still need a metric-collection pass —
that's a separate port deliberately not in this wave.

reports/cutover/SUMMARY.md gains a new row alongside the embed
parity entries; cutover-prep verification log keeps the
discipline of "verified against real data, not just fixtures."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:20:18 -05:00
root
ca142b9271 distillation: audit-baselines lineage port — fully closes the OPEN #2 surface
The original OPEN #2 line called for "SFT export pipeline +
audit_baselines lineage." Commit 7bb432f shipped the SFT export.
This commit ports the audit_baselines half — the longitudinal
drift signal that distinguishes "metrics shifted because the world
changed" from "metrics shifted because we broke something."

Mirrors Rust scripts/distillation/audit_full.ts's substrate:

- LoadLastBaseline(path) reads the most recent entry from
  data/_kb/audit_baselines.jsonl. Returns (nil, nil) on missing
  file (first run), errors on truncated last line (partial-write
  detection — operators don't lose drift signal silently).
- AppendBaseline(path, baseline) appends one entry as a JSON line.
  Atomic at the line level via bufio + O_APPEND. Creates the
  parent directory if missing.
- BuildAuditDriftTable(prior, current, threshold) computes
  per-metric drift. flag values mirror Rust exactly: first_run,
  ok, warn. DefaultDriftWarnThreshold = 0.20 = Rust's 20%.
- FormatAuditDriftTable renders a fixed-width text grid for
  stdout dumps in audit-full runs.

Edge cases handled:
- Zero-baseline: prior=0 means no division — PctChange stays nil.
  current=0 → ok (no change). current>0 → warn (zero→nonzero is
  always notable, never silently fine).
- New metric in current: flagged first_run, not "0%-change".
  Operators see "this is a new signal we haven't tracked before."
- Sort: stable by metric name for deterministic JSON output and
  clean CI diffs.

Generic on metric name (vs Rust's pinned p2_evidence_rows etc.):
the Rust phase numbering doesn't translate to Go directly. The
AuditBaselineRustCompat constant pins the Rust names so operators
running both runtimes use the same labels, which makes drift
comparison meaningful across the two pipelines.

13 new tests covering: missing file, last-line-wins, blank-line
tolerance, malformed-line errors, append round-trip, append-to-
existing, schema validation, first-run, threshold boundary,
zero-baseline, new-metric-in-current, sort-by-metric stability,
formatter output rendering.

OPEN #2's "audit_baselines lineage" half now closed. The
distillation package surface is at parity with the Rust pipeline:
scorer, scored runs, SFT export, audit baselines all available
on the Go side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:11:47 -05:00
root
7bb432f6c8 distillation: full SFT export port — closes OPEN #2 fully
Follow-up to b216b7e (which shipped the SFT export substrate). This
commit ports the synthesis logic, completing the migration:

- SynthesizeSft(scored, ev, recordedAt, sftID) → *SftSample
  Mirrors the Rust synthesizeSft byte-for-byte. Returns nil for
  extraction-class records + empty-text records (same skip
  semantics as Rust).
- LoadEvidenceByRunID(scoredPath, cache) reads the paired evidence
  JSONL (path derived by /scored-runs/ → /evidence/ replacement).
  Per-call cache so multiple scored-runs files in the same dir
  don't reload the same evidence.
- buildInstruction maps source_file stem → per-class instruction
  template. All 8 templates (scrum_reviews, mode_experiments,
  auto_apply, audits, observer_reviews, contract_analyses,
  outcomes, default) match Rust output exactly so a/b validation
  between runtimes can diff JSONL byte-for-byte.
- stemFromSourceFile strips data/_kb/ prefix + .jsonl suffix.
- ExportSft now writes data/distilled/sft/sft_export.jsonl with
  the synthesized samples (DryRun=true skips file write).

Per-class templates verified by 8-case sub-test:
- scrum_reviews → "Review the file '...' against the PRD..."
- mode_experiments → "Run task_class='...' for file..."
- auto_apply → "Auto-apply: emit a 6-line surgical patch..."
- audits with phase: prefix → strips to bare phase name
- observer_reviews → "Observer-review the latest attempt..."
- contract_analyses with permit: prefix → strips to permit ID
- outcomes → "Run scenario; report per-event outcome..."
- unknown source → "Source 'X' run; produce the appropriate output"

Caveat documented inline: contract_analyses uses ev.metadata.contractor
in Rust to produce "Analyze contractor 'X' for permit 'Y'" when
present. Go's EvidenceRecord doesn't carry a free-form metadata bag
yet, so we always emit the no-contractor form. Operators needing
contractor-aware instructions can extend EvidenceRecord with an
explicit Metadata field (separate ADR).

Test additions (5 new):
- TestSynthesizeSft_PerSourceClass: 8 sub-cases, one per template
- TestSynthesizeSft_RejectsExtraction: extraction-role records skipped
- TestSynthesizeSft_RejectsEmptyText: empty/whitespace text skipped
- TestSynthesizeSft_ContextAssembly: matrix + pathway + model
  context string formatting matches Rust " · " join
- TestExportSft_FullPort_WritesJSONL: end-to-end fixture, asserts
  output contains expected instruction + omits firewalled records

Pre-existing TestExportSft_PartialPort_FirewallFires renamed +
updated to TestExportSft_FirewallFiresBeforeEvidenceLoad — reflects
the new contract that records passing the firewall but lacking
evidence land in "not-instructable" rather than being silently
exported. Honest semantics shift documented in the test.

OPEN #2 now fully closed (was: substrate-only). The synthesis path
no longer requires the Rust pipeline to be invoked — Go-side
operators can run the full distillation export end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:06:57 -05:00
root
b216b7e5b6 fix the other 4: close all OPEN-list items in one wave
Substantial wave addressing all 4 prior OPEN items. Three closed in
full, one partially (the speculative half deliberately deferred).

OPEN #1 — Periodic fresh→main index merge (FULL):
- POST /v1/vectors/index/{src}/merge with {dest, clear_source}
- Idempotent on re-runs (existing-in-dest items skipped)
- internal/vectord/index.go: new Index.IDs() snapshot method +
  i.ids tracker field as canonical ID set, independent of meta
  map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta
  alone missed items added with nil metadata)
- 4 cmd-level integration tests (happy path drain+clear, dim
  mismatch, dest not found, self-merge rejection) + 1 unit test
- DecodeIndex backward-compat: old envelopes restore i.ids from
  meta keys (best effort; new items going forward use the tracker)

OPEN #2 — Distillation SFT export (SUBSTRATE):
- internal/distillation/sft_export.go ports the load-bearing half:
  IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/
  MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft.
- Synthesis (instruction/input/response generation) deferred to a
  separate wave — too big for this session, but the substrate
  makes the next wave a port-not-design exercise.
- TestSftNever_PinsExpectedSet locks the contamination firewall
  set: if a future commit adds/removes from SftNever, this test
  fails — forcing the change through review.
- 5 new tests; firewall fires end-to-end through the partial port.

OPEN #3 — Distribution drift via PSI (FULL):
- internal/drift/drift.go: ComputeDistributionDrift via Population
  Stability Index. Standard finance/risk metric, well-defined
  verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25).
- Equal-width bucketing over combined min/max so neither dist
  falls outside; epsilon-clamping for empty buckets so log doesn't
  blow up. Per-bucket breakdown for drilldown.
- Pairs with the existing ComputeScorerDrift: scorer drift is
  categorical, distribution drift is continuous. Different shapes,
  same package.
- 7 new tests covering identical-is-stable, hard-shift-is-major,
  moderate-detected-not-stable, empty-inputs-safe, all-identical-
  safe, bucket-counts-conserved, num-buckets-clamping.

OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others
deferred):
- (a) Real-time wall-clock for stress harness: per-phase elapsed
  time logged to stdout as it runs (`[stress] phase NAME starting
  (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`).
  Output.PhaseTimings + Output.TotalElapsedMs in JSON.
- (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase
  calibration: not actioned — no fired trigger, would be
  speculative. Documented as deferred-until-need rather than
  ignored. Per the project's discipline ("don't add features
  beyond what the task requires").

OPEN list now empty / steady-state. Future items will land as
production triggers fire.

Build + vet + tests green; 18 new tests across the 4 closures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:42:11 -05:00
root
356d76b4b0 multi_coord_stress: thread role through matrix retrieve + playbook record
Real wire-up gap discovered post-scrum: Demand.Role was already
extracted at every call site in multi_coord_stress (44 occurrences,
both contract-driven and LLM-parsed inbox-triggered paths), but
neither matrixSearch nor playbookRecord accepted role in their
signatures. Cross-role gate (real_001..real_004 work) was bypassed
for the entire multi-coord harness — recordings and queries went
through with empty role, gate fell back to lenient behavior.

Fix:
- matrixSearchReq gains query_role field
- matrixSearch signature: (..., query, role string, ...)
- tracedSearch wrapper gains role param + emits it in span input
  metadata for Langfuse visibility
- playbookRecord signature: (..., query, role, ...) — body emits
  role only when non-empty (preserves backward compat at API)
- 14 call sites updated:
    contract-driven Demand loops → d.Role
    LLM-parsed inbox path → parsed.Role (qwen2.5 already extracts it)
    swap path (warehouseDemand) → warehouseDemand.Role
    reissue path → ev.Role (captured at original event time)
    fresh-verify (resume snippet, no role concept) → ""

Build clean, vet clean, all tests pass. Cross-role gate now fires
end-to-end across the multi-coord harness — matches the playbook_lift
harness's coverage from the original real_001 fix.

This closes the symmetric gap to scripts/playbook_lift's existing
wire-through. Both production-shape harnesses now exercise the role
gate; future reality tests automatically inherit the protection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:10:49 -05:00
root
cca32344f3 reality_test real_005: negation probe — substrate gap is correctly out-of-scope
5 explicit-negation queries ("Need Forklift Operators in Aurora IL,
NOT in Detroit", "excluding Cornerstone Fabrication roster", etc.)
through the standard playbook_lift harness. Goal: characterize
whether the substrate has negation handling or silently treats
"NOT X" as "X".

Headline: substrate has zero negation handling. Cosine on dense
embeddings tokenizes "NOT in Detroit" identical to "in Detroit"
plus noise — there is no logical-quantifier representation in the
embedding space. This is a structural property of dense embeddings,
not a substrate bug.

Per-query observations:
- Q1 (Aurora IL, NOT Detroit): all top-10 rated 1-2/5 by judge
- Q2 (NOT Beacon Freight): top-1 rated 4/5 — accidentally OK
  because role+city signal pulled non-Beacon worker naturally
- Q3 (excluding Cornerstone): unanimous 1/5 across top-10
- Q4 (NOT Detroit-area): all top-10 rated 1-2/5
- Q5 (exclude Heritage Foods): top-1 rated 4/5 — accidentally OK

The judge IS the safety net: when retrieval can't honor the
constraint, the judge refuses to approve any result. That's the
honesty signal — `discovery=0` for the run aggregates it.

No code change. The architectural answer for production is:
- UI surfaces an "exclude" affordance that populates ExcludeIDs
  (already supported, added in multi-coord stress 200-worker swap)
- Coordinators don't type natural-language negation — they click
- Substrate's role: surface honesty signal (judge ratings) + don't
  pretend to honor unparseable constraints

Adding NL-negation handling at the substrate level would be product
debt — it would let coordinators type sloppier queries that
silently fail when the LLM extractor misses a phrasing. Don't ship
until production traffic demonstrates demand for it.

Findings: reports/reality-tests/real_005_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:06:06 -05:00
root
434f466288 matrix: roleNormalize allowlist for non-plural-s tokens (scrum role_gate_v1)
3-lineage scrum review of the role-gate work (commits 7f2f112..0331288)
ran Opus 4.7 / Kimi K2.6 / Qwen3-coder via scripts/scrum_review.sh.
All three flagged the same edge case: the homegrown plural-stripper
in roleNormalize would collapse non-plural-s tokens like "Sales" →
"Sale", "Logistics" → "Logistic", "Operations" → "Operation". In a
staffing domain those are real role names; the silent normalization
would have caused false role-equality matches and re-opened the
cross-role bleed for those clusters.

Fix:
- nonPluralSWords allowlist for known staffing-domain non-plural-s
  tokens (sales, logistics, operations, facilities, premises, news,
  physics, economics, mathematics, analytics).
- Last-word-only stripping ("Sales Associate" stays whole; only
  "Associates" head noun is plural-checked).
- -ss ending check so "Press Operator", "Boss" don't lose their s.
- strings.ToLower + strings.TrimSpace replace the homegrown rune-
  loop ASCII normalizer (Opus INFO — minor cleanup, folded in).

Tests:
- TestRoleNormalize_NonPluralS: 18 cases covering the allowlist,
  -ss ending, real plurals (Operators → Operator, Boxes → Box),
  multi-word real plurals (Forklift Operators → forklift operator),
  whitespace/case tolerance.
- TestRoleEqual_NonPluralS: gate-level pairing — proves equal-
  shape allowlisted tokens compare equal AND that "Sales" ≠ "Sale"
  (the original bug shape).
- Existing TestRoleEqual_PluralAndCase still green (refactor
  preserved behavior).

Other scrum findings dispositioned (not actioned):
- Opus WARN on empty-role fail-open semantics: documented
  backward-compat behavior; production path closes via opt-in LLM
  extractor (real_004).
- Opus INFO on unsynchronized package-global cache map: harness is
  single-goroutine; add sync.Mutex when/if it parallelizes.
- Opus INFO on parallel constructor (NewPlaybookEntryWithRole vs
  optional arg): API smell only, both forms preserved.
- Kimi 2 BLOCKs (NewPlaybookEntryWithRole missing, ApplyPlaybookBoost
  signature breakage): FALSE positives. Pre-push smoke chain green
  on 0331288, both symbols + all call sites compile clean. Matches
  feedback_cross_lineage_review.md's documented Kimi truncation
  behavior — Kimi BLOCKs warrant trace verification before action.

Disposition (local): reports/scrum/_evidence/2026-04-30/verdicts/role_gate_v1_disposition.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:58:02 -05:00
root
0331288641 playbook_lift: LLM-based role extractor closes shorthand bleed (real_004)
real_003 left a known-weak hole: shorthand-style queries
("{count} {role} {city} {state} ...") have no separator between
role and city, so a regex can't reliably extract — leaving the
cross-role gate disabled when both record AND query are shorthand.

This commit adds a roleExtractor with regex-first + LLM fallback:

- Regex first (fast, deterministic) — handles need + client_first +
  looking from real_003b. ~75% of styles, no LLM cost paid.
- LLM fallback when regex returns empty AND model is configured —
  Ollama-shape /api/chat with format=json, schema-tight prompt,
  temperature 0. ~1-3s on local qwen2.5.
- Per-process cache — paraphrase + rejudge passes reuse the same
  query 4× per run; cache prevents 4× LLM cost.
- Off-by-default — opt-in via -llm-role-extract flag (CLI) and
  LLM_ROLE_EXTRACT=1 env var (harness wrapper). real_003b shipping
  config unchanged unless explicitly enabled.

8 new tests in scripts/playbook_lift/main_test.go:
- TestRoleExtractor_RegexFirst: LLM not called when regex matches
- TestRoleExtractor_LLMFallback: shorthand goes to LLM
- TestRoleExtractor_LLMOffLeavesEmpty: opt-in default preserved
- TestRoleExtractor_Cache: 3 calls = 1 LLM hit
- TestRoleExtractor_NilSafe: nil receiver runs regex only
- TestExtractRoleViaLLM_HTTPError + _BadJSON: failure paths
- TestRoleExtractor_ClosesCrossRoleShorthandBleed: synthetic
  witness for the real_003 scenario — both record + query are
  shorthand, regex returns "" for both, LLM produces DIFFERENT
  role tokens for CNC vs Forklift, so matrix gate's cross-role
  rejection (locked separately in
  TestInjectPlaybookMisses_RoleGateRejectsCrossRole) fires
  correctly. This is the load-bearing verification.

Reality test real_004 ran the same 40-query stress as real_003 with
LLM extraction on. Cross-style same-role boosts fired correctly
across all 4 styles for Loaders + Packers + Shipping Clerk clusters
(including shorthand → other-style transfer). No cross-role bleed
observed. The reality test alone can't be a clean "with vs without"
comparison (HNSW build is non-deterministic across runs, and
real_004 stochastics didn't trigger a shorthand recording at all),
which is why the unit-test witness exists.

Production note (in real_004_findings.md): LLM extraction is for
reality-test coverage of arbitrary query shapes. Production should
extract role at INGEST time (when the inbox parser already runs an
LLM) and pass already-resolved role through requests — same shape
as multi_coord_stress's existing Demand{Role: ...} model. The hot
path should never need the harness extractor's per-query LLM cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:51:27 -05:00
root
3263254f1c reality_test real_003: 40-query paraphrase stress + extractor extension
Stress-tests the role gate with 40 queries (10 fill_events rows × 4
styles): need, client_first, looking, shorthand. Each row's role +
client + city stays the same; only the surface phrasing changes.

real_003 (original extractor) confirmed the shorthand-vs-shorthand
failure mode: CNC Operator shorthand recording leaked w-2404 onto
Forklift Operator shorthand query within the same Beacon Freight
Detroit cluster. Both record + query had empty role (extractor
returns "" for shorthand because there's no separator between role
and city), gate disabled, distance check passed, bleed fired.

Fix: extended extractRoleFromNeed to handle client_first
("{client} needs N {role} in...") and looking ("Looking for N
{role} at...") patterns. Shorthand left intentionally unmatched —
"Forklift Operator Detroit" is shape-indistinguishable from
"Forklift" + "Operator Detroit" without an LLM extractor or known-
cities lookup.

real_003b (extended extractor) verifies bleed closed across all 4
styles for this dataset. Forklift Operator queries keep w-2136 (the
cold-pass-correct match) regardless of which style the query came
in. Same-role boosts now fire correctly across styles — a CNC
Operator recording made in `looking` style boosts the CNC need-form
query.

scripts/cutover/gen_real_queries.go: added -styles flag with values
need|client_first|looking|shorthand|all (default need preserves
real_001/002 behavior). Tests/reality/real_coord_queries_v2.txt is
the 40-query stress file.

scripts/playbook_lift/main_test.go: 10 sub-tests lock the four
documented patterns + shorthand limitation + lift-suite-style
queries (no clean role, returns empty as expected).

Aggregate metrics:
- real_003  (original): disc=7,  lift=7,  boost=14, meanΔ=-0.108
- real_003b (extended): disc=11, lift=10, boost=31, meanΔ=-0.202
The growth reflects more LEGITIMATE same-role same-cluster transfer
firing across styles, not bleed (verified by per-cluster bleed
table — Forklift Operator queries unchanged across all 4 styles).

Known limitation documented in real_003_findings.md: same-cluster,
same-role queries in shorthand still embed close enough that a
shorthand recording could bleed onto a different-role shorthand
query if both record + query strip role. Closing this requires
LLM extraction or known-cities lookup at record + query time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:42:02 -05:00
root
997527be4d matrix: cross-role playbook gate — closes real_001 bleed (OPEN #1)
real_001 surfaced same-client+city queries bleeding across roles:
Q#2 (Forklift Operator @ Beacon Freight Detroit) recorded e-6193
in the playbook corpus. Q#5 (Pickers same client+city) and Q#10
(CNC Operator same client+city) embedded within 0.13-0.18 cosine of
Q#2's query — well inside the 0.20 inject threshold — so e-6193
injected on both, demoting the cold-pass-correct workers.

Root cause: the inject distance threshold isn't tight enough on
the same-client+city cluster. Cosine collapses queries that share
city + client + count-token + time-token regardless of role. The
existing judge gate is per-injection at record time and doesn't
fire at retrieve time.

Fix: structural role gate in front of both Shape A boost and
Shape B inject. PlaybookEntry gains Role; SearchRequest gains
QueryRole. When both are non-empty and differ under roleEqual's
case+plural normalization, the entry is rejected before BoostFactor
or judge-gate logic runs.

Backward-compat: empty role on either side disables the gate —
preserves behavior for the lift suite's free-form multi-constraint
queries that have no clean single role. Caller-supplied (not
inferred), so existing recordings unaffected.

Wire-through:
- internal/matrix/playbook.go: Role field, NewPlaybookEntryWithRole,
  roleEqual helper with plural+case normalization
- internal/matrix/retrieve.go: QueryRole on SearchRequest, threaded
  to both ApplyPlaybookBoost + InjectPlaybookMisses
- cmd/matrixd/main.go: role on POST /matrix/playbooks/record + bulk
- scripts/playbook_lift/main.go: extractRoleFromNeed regex pulls
  role from "Need N {role}{s} in" queries (the fill_events shape);
  free-form queries fall back to empty (gate disabled)

Tests (5 new):
- TestInjectPlaybookMisses_RoleGateRejectsCrossRole: exact Q#10
  scenario (distance 0.135, recorded "Forklift Operator", query
  "CNC Operator") — locks the bleed at unit level
- TestInjectPlaybookMisses_RoleGateAllowsSameRole: Forklift Operator
  recording fires on Forklift Operators query (plural normalization)
- TestInjectPlaybookMisses_RoleGateBackwardCompat: empty Role on
  either side = gate disabled, preserves current behavior
- TestApplyPlaybookBoost_RoleGateRejectsCrossRole: Shape A defense
  in depth — boost doesn't fire on cross-role even when answer is
  in cold top-K
- TestRoleEqual_PluralAndCase: case + -s + -es plural normalization

Verification (real_002, same query set as real_001):
- Q#5 Pickers @ Beacon Freight: e-6193 → e-8499 (no bleed)
- Q#10 CNC Operator @ Beacon Freight: e-6193 → w-2404 (no bleed)
- Discoveries + lifts unchanged at 2 each (same-role lift still fires)
- Mean Δdist tightens from -0.127 to -0.040 (boosts no longer
  pulling distances through the floor on cross-role mismatches)

Findings: reports/reality-tests/real_002_findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:34:10 -05:00
root
7f2f112e6a reality_test real_001: real-shape coordinator queries — surfaces cross-role bleed
First retrieval probe with non-synthetic query distribution. Pulls
N rows from /home/profit/lakehouse/data/datasets/fill_events.parquet
(real-shape demand data) and translates each to the natural language
a coordinator would type: "Need {count} {role}s in {city} {state}
starting at {at} for {client}".

Headline: 8/10 cold-pass top-1 = judge-best on real distribution.
Substrate works on queries it was never trained for. v2-moe + workers
corpus carry the load.

Surfaced finding (the real value of running this): same-client+city
queries cluster, and Shape A's distance boost bleeds across roles
within the cluster. Q#2 (Forklift @ Beacon Freight Detroit) records
e-6193 in the playbook corpus. Q#5 (Pickers same client+city) and
Q#10 (CNC Operator same client+city) inherit e-6193 at warm top-1
even though:
- Neither query has its own recorded playbook.
- Neither warm pass triggers a Shape B inject (boosted=0).
- The roles are different staffing categories.

Q#10 specifically demoted the cold-pass-correct w-3759 (judge rating
4 at rank 0) for a worker who was approved by the judge for a
different role on a different query.

Why the lift suite missed it: synthetic queries use 7 disjoint
scenario buckets (forklift+OSHA+WI / CDL+IL / etc.). Real demand
clusters on (client, city). The cluster doesn't exist in the
synthetic distribution.

Why the judge gate doesn't catch it: the gate (5a3364f) is
per-injection at record time. After approval the worker rides Shape A
distance boosts on all later same-cluster queries with no second
gate call.

Becomes new OPEN #1. Fix candidate: role-scoped playbook corpus
metadata + Shape A boost gate on role match. Cheap; doesn't need
new judge calls.

Files:
- scripts/cutover/gen_real_queries.go: parquet → coordinator NL
- tests/reality/real_coord_queries.txt: 10 generated queries
- reports/reality-tests/playbook_lift_real_001.md: harness output
- reports/reality-tests/real_001_findings.md: the reading

Repro:
  go run scripts/cutover/gen_real_queries.go -limit 10 > tests/reality/real_coord_queries.txt
  QUERIES_FILE=tests/reality/real_coord_queries.txt RUN_ID=real_001 \
    WITH_PARAPHRASE=0 WITH_REJUDGE=0 ./scripts/playbook_lift.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:18:40 -05:00
root
5687ec65c2 G5 cutover prep: embed parity probe — Rust /ai/embed ↔ Go /v1/embed verified
First concrete cutover artifact: scripts/cutover/embed_parity.sh
brings up Go embedd + gateway alongside the live Rust gateway,
hits both /ai/embed and /v1/embed with the same forced model, and
emits a per-date verdict report under reports/cutover/.

Why embed first: the parity invariant is one math identity (cosine
sim of vectors against same input). Retrieve has thousands of edge
cases. If embed parity holds, all downstream vector consumers
inherit confidence; if it doesn't, we catch it in 30s instead of
after a flip.

Verdict 2026-04-30: 5/5 samples cosine=1.000000 with model forced
to nomic-embed-text (v1). Same with nomic-embed-text-v2-moe (both
Ollamas have it loaded). Math is provably equivalent across the
gateway plumbing.

Drift catalog (reports/cutover/SUMMARY.md):
- URL: Rust /ai/embed vs Go /v1/embed
- Wire: Rust {embeddings, dimensions} (plural) vs Go {vectors,
  dimension} (singular). Wire-format adapter is the only real
  cutover work for this endpoint.
- L2 norm: Rust unit vectors (~1.0); Go raw Ollama (~20-23). Same
  direction (cos=1.0); harmless under cosine-distance HNSW (which
  is Go vectord's default), but worth fixing in internal/embed/
  before extending to euclidean indexes.

reports/cutover/ now tracked (joined the scrum/ + reality-tests/
exemptions in .gitignore).

Next probe: /v1/matrix/retrieve ↔ Rust /vectors/hybrid for the
real user-facing retrieve path. Embed parity gives that probe a
clean foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:07:04 -05:00
root
a2fa9a2ce7 scripts/scrum_review: pipe diff via temp files — fixes argv overflow on large bundles
`jq --arg` and `curl --data-binary @-` both read stdin/argv-bound
buffers. Diffs >~128KB blow past the kernel's argv limit even when
piped via stdin (because we still build `body` as a shell variable
first, then feed it to curl). Voice-ai full bundle was 156K and
hit it.

Switch to writing user/system/body to mktemp files, jq reads via
--rawfile, curl reads via @file. Same on-the-wire shape, no argv
involvement. Cleanup with rm at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:57:34 -05:00
root
68d9e554b0 shared: auto-emit Langfuse trace+span per HTTP request — closes OPEN #2
Adds langfuseMiddleware in internal/shared so every daemon's
shared.Run gets free production-traffic trace visibility when
LANGFUSE_URL + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY are set.
Same env names + file shape as the multi_coord_stress driver, so
operators ship one /etc/lakehouse/langfuse.env across the deploy.

Wiring is auth-gated: middleware runs INSIDE the RequireAuth group,
so 401s from credential-stuffing don't pollute traces. /health is
exempt so LB probes don't either. Missing env vars → nil client →
middleware is a passthrough no-op (fail-open per ADR-005 5.1).

Bundled deploy:
- langfuse.env.example template (mode 0640, root:lakehouse)
- 11 systemd units gain `EnvironmentFile=-/etc/lakehouse/langfuse.env`
  (leading - so missing file = OK)
- REPLICATION.md bootstrap section documents setup

Tests (4): nil passthrough, /health bypass, real-request emission,
status-writer wrapping. All green.

STATE_OF_PLAY OPEN list: 5 rows → 4 rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:55:42 -05:00
root
5a3364f539 matrix: judge-gated Shape B inject — closes lift-suite tail issues
Lift suite run #004 left two unresolved tail issues:
- Q6 ("Forklift loader") ↔ Q7 ("Hazmat warehouse, cold storage")
  swap recordings as warm top-1 because their embeddings are within
  0.20 cosine of each other. Distance gate can't tell them apart.
- Q9 + Q15 lose paraphrase recovery when qwen2.5 rephrases past the
  0.20 threshold. Distance says "drift too far"; sometimes the drift
  is real (skip), sometimes the paraphrase is still on-domain (don't
  want to skip).

Multi-coord run #008's judge re-rating proved the LLM can
distinguish: Q3 crane case landed at distance 0.23 (looks tight)
but rating 1 (irrelevant). The judge sees domain mismatch the
embedder doesn't.

This commit lifts that pattern into the matrix substrate. Shape B
inject now optionally routes every candidate through a judge gate
before the rank insert lands. Distance + judge BOTH have to approve.

internal/matrix/playbook.go:
- InjectPlaybookMisses signature gains a query string + an
  optional InjectGate. nil gate preserves pre-judge-gating
  behavior (current tests already pass with nil).
- New InjectGate interface + InjectGateFunc adapter for tests
  and non-LLM callers.
- Per-candidate gate.Approve(query, hit) call inserted between
  the dedup and the inject. Rejected candidates skip silently;
  injected count reflects post-gate decision.

internal/matrix/judge.go (new, ~140 lines):
- LLMJudgeGate calls an Ollama-shape /api/chat endpoint with the
  same 1-5 staffing-rubric prompt that worked in multi_coord
  run #008. fail-closed on HTTP/JSON errors (don't inject if
  judge can't speak — better miss than wrong-domain).
- NewLLMJudgeGate returns nil when URL or Model is empty,
  matching InjectGate's nil-means-no-judge semantics.

internal/matrix/retrieve.go:
- SearchRequest gains JudgeURL, JudgeModel, JudgeMinRating
  fields. Run() builds an LLMJudgeGate when set; passes nil
  otherwise. Backward compatible — existing callers see no
  behavior change.

Tests:
- TestInjectPlaybookMisses_GateRejectsCandidate (rejectAll → 0
  injected, even with tight distance)
- TestInjectPlaybookMisses_GateApprovesCandidate (approveAll →
  same as nil-gate behavior)
- TestInjectPlaybookMisses_GateSeesCorrectQuery (gate receives
  CURRENT query + RECORDED query separately so it can score
  the (current, candidate) pair)
- All 5 existing inject tests updated to new signature

go test ./internal/matrix → all 8 inject tests pass.
go test ./internal/matrix ./internal/shared ./cmd/{matrixd,
queryd,pathwayd,observerd} → all green.

STATE_OF_PLAY:
- OPEN item #1 (judge-gated injection) closed.
- DO NOT RELITIGATE adds the substrate-level judge-gate lock.
- OPEN list now 5 rows (was 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:38:12 -05:00
root
247e36e687 STATE_OF_PLAY: trim OPEN list — 9 rows → 6, ordered by product leverage
Sprint 4 row removed (shipped: a59ef5b systemd + 54a05d9 docker).
ADR-006 row already dropped on the previous STATE update.

Two lift-suite tail items (Q6↔Q7 adjacent-query, Q9/Q15 liberal-
paraphrase) consolidated into one "judge-gated playbook injection"
row — both are downstream of the same fix (let the judge approve
before Shape B inserts). Captures the design lineage from
multi-coord run #008's judge-rating pattern.

Three items folded into a single "operational nice-to-haves" row:
real-time clock, chatd fixture storage half, liberal-paraphrase
calibration. None are product-blocking; each lights up when
someone hits its specific trigger.

Reorder reflects leverage on the active product theory (multi-
coord staffing co-pilot via the 5-loop substrate), not effort:
1. Judge-gated injection (lift quality + lift-tail closure)
2. Wider Langfuse instrumentation (production observability)
3. Fresh→main merge (operational hygiene as the corpus grows)
4. Distillation full port (production dependency, not yet)
5. Drift quantification (research)
6. Operational nice-to-haves

Lead-in note added: "Items move to closed when the work demands
them, not on a calendar." Locks intent against future drift toward
a sprawling todo list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:32:31 -05:00
root
54a05d9311 Sprint 4 deployment artifacts: Dockerfile + docker-compose
Parallel deploy target to the systemd units that landed in a59ef5b.
Single image carries all 11 daemons; docker-compose runs one
container per daemon with the same dependency graph as the systemd
units. Useful when systemd isn't available (Mac dev, remote VMs
without root) or when isolation to a private docker network is
preferred.

Dockerfile (multi-stage):
- Builder: golang:1.25-bookworm. DuckDB cgo needs gcc + glibc;
  alpine's musl doesn't link the official duckdb-go bindings cleanly.
- Runtime: debian:bookworm-slim — same libc, much smaller surface.
  Adds ca-certificates (outbound HTTPS to OpenRouter/OpenCode/Kimi),
  curl + jq (in-container healthchecks + smoke probes), tini (PID 1
  signal forwarding so docker stop sends SIGTERM to the daemon, not
  to a wrapper).
- Single image, multiple binaries. Ships all 11 cmd/* + 3 scripts/
  (staffing_workers, playbook_lift, multi_coord_stress) so deployed
  stacks can run reality tests against themselves.
- Non-root runtime user (uid 999 lakehouse). Layout matches
  /usr/local/bin/lakehouse/<daemon> from REPLICATION.md.
- ENTRYPOINT=tini; no default CMD — operators / compose pick
  which daemon explicitly.

docker-compose.yml (11 services):
- Same dependency graph as deploy/systemd/. depends_on with
  service_healthy condition matches Requires= equivalents:
    catalogd → storaged
    ingestd → storaged + catalogd
    queryd → catalogd
    matrixd → embedd + vectord
- Gateway uses bare depends_on (no health condition) — Wants=
  equivalent so single-upstream restart doesn't cascade.
- chatd has per-provider env_file entries (one each for
  ollama_cloud, openrouter, opencode, kimi) — missing files are
  silently OK, matching the systemd unit's EnvironmentFile=- list.
- Persistent state on the lakehouse-state named volume; commented
  driver_opts shows how to bind to a host path for off-volume
  backups.

.dockerignore:
- Excludes bin/ + reports/ + data/ + git metadata + .env files.
- Especially excludes lakehouse.toml/secrets-go.toml/auth.env so
  local dev configs don't accidentally bake into a published image.

REPLICATION.md gains a Docker section between systemd setup and
the logs section. Ten-line copy-paste from "git clone" to
"docker compose up -d", plus a docker-vs-systemd differences
table covering process supervision, logs, restart policy, file
ownership, host networking quirks, and backup targets.

Validation: docker compose config --quiet → exit 0 (with
placeholder env files in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:58:47 -05:00