golangLAKEHOUSE/lakehouse.toml
root 1a3a82aedb validatord: coordinator session JSONL for offline analysis (B follow-up)
Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    coordinator_sessions.jsonl (this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section — these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:22:09 -05:00

182 lines
7.2 KiB
TOML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Lakehouse-Go config — G0 dev defaults. Overrides via env are a
# G1+ concern; for G0 edit this file and restart the affected service.
# G0 dev ports — shifted to 3110+ so the Go services run alongside
# the live Rust lakehouse on 3100/3201-3204 without colliding. G5
# (demo cutover) flips gateway back to 3100 when Rust retires.
[gateway]
bind = "127.0.0.1:3110"
storaged_url = "http://127.0.0.1:3211"
catalogd_url = "http://127.0.0.1:3212"
ingestd_url = "http://127.0.0.1:3213"
queryd_url = "http://127.0.0.1:3214"
vectord_url = "http://127.0.0.1:3215"
embedd_url = "http://127.0.0.1:3216"
pathwayd_url = "http://127.0.0.1:3217"
matrixd_url = "http://127.0.0.1:3218"
observerd_url = "http://127.0.0.1:3219"
chatd_url = "http://127.0.0.1:3220"
validatord_url = "http://127.0.0.1:3221"
[storaged]
bind = "127.0.0.1:3211"
[catalogd]
bind = "127.0.0.1:3212"
storaged_url = "http://127.0.0.1:3211"
[ingestd]
bind = "127.0.0.1:3213"
storaged_url = "http://127.0.0.1:3211"
catalogd_url = "http://127.0.0.1:3212"
# CSV uploads are ~4-6× the resulting Parquet. 256 MiB cap keeps the in-memory
# parse + Arrow + Parquet output footprint bounded. Bump for known large
# datasets (e.g. workers_500k → 344 MiB CSV needs 512 MiB).
max_ingest_bytes = 268435456
[vectord]
bind = "127.0.0.1:3215"
# Optional — set to empty string to disable persistence (dev/test).
storaged_url = "http://127.0.0.1:3211"
[embedd]
bind = "127.0.0.1:3216"
# G2: Ollama local. G3+ may swap in OpenAI/Voyage by changing
# this URL + the wire format inside the provider.
provider_url = "http://localhost:11434"
default_model = "nomic-embed-text-v2-moe"
[queryd]
bind = "127.0.0.1:3214"
catalogd_url = "http://127.0.0.1:3212"
secrets_path = "/etc/lakehouse/secrets-go.toml"
refresh_every = "30s"
[pathwayd]
bind = "127.0.0.1:3217"
# Empty = in-memory only (dev/test). Production sets a path under
# /var/lib/lakehouse/pathway/state.jsonl so traces survive restart.
persist_path = ""
[matrixd]
bind = "127.0.0.1:3218"
# matrixd calls embedd (query-text → vector) and vectord (per-corpus
# search) directly. Localhost defaults; in distributed deployments
# these point at the gateway's upstream addresses.
embedd_url = "http://127.0.0.1:3216"
vectord_url = "http://127.0.0.1:3215"
[observerd]
bind = "127.0.0.1:3219"
# Empty = in-memory only (dev/test). Production sets a path under
# /var/lib/lakehouse/observer/ops.jsonl so ops survive restart.
persist_path = ""
[chatd]
# LLM chat dispatcher (Phase 4). Routes /v1/chat to the right provider
# based on model name prefix or :cloud suffix:
# ollama/<model> → local Ollama (no auth)
# ollama_cloud/<model> → Ollama Cloud
# <model>:cloud → Ollama Cloud (suffix variant)
# openrouter/<vendor>/<m> → OpenRouter
# opencode/<model> → OpenCode (Zen+Go unified)
# kimi/<model> → Kimi For Coding
# bare names (e.g. qwen3.5:latest) → local Ollama (default)
bind = "127.0.0.1:3220"
ollama_url = "http://localhost:11434"
# Per-provider key resolution: env var first, then .env file fallback.
# Empty file path skips the file lookup. systemd EnvironmentFile=
# loads these natively, so service runtime sees the env vars.
ollama_cloud_key_env = "OLLAMA_CLOUD_KEY"
openrouter_key_env = "OPENROUTER_API_KEY"
opencode_key_env = "OPENCODE_API_KEY"
kimi_key_env = "KIMI_API_KEY"
ollama_cloud_key_file = "/etc/lakehouse/ollama_cloud.env"
openrouter_key_file = "/etc/lakehouse/openrouter.env"
opencode_key_file = "/etc/lakehouse/opencode.env"
kimi_key_file = "/etc/lakehouse/kimi.env"
# Per-call timeout (seconds). Cloud reasoning models can take >60s
# for long prompts, so 180 is the default.
timeout_secs = 180
[validatord]
# Production-validator network surface (Phase 43 PRD parity).
# Hosts /validate (FillValidator + EmailValidator + PlaybookValidator)
# and /iterate (generate→validate→correct loop).
bind = "127.0.0.1:3221"
chatd_url = "http://127.0.0.1:3220"
# Roster of valid workers. Empty = no roster — worker-existence checks
# all fail Consistency (correct fail-closed posture). Production points
# at a path regenerated from workers_500k.parquet on a schedule:
# roster_path = "/var/lib/lakehouse/validator/roster.jsonl"
roster_path = ""
# Per-call cap on the iteration loop (Phase 43 default: 3).
default_max_iterations = 3
# Per-call cap on chat hop max_tokens.
default_max_tokens = 4096
# Chat hop timeout (seconds). 240s tolerates frontier reasoning models.
chat_timeout_secs = 240
# Session log: where to append one JSONL row per /v1/iterate session
# for offline DuckDB analysis. Empty = disabled. Production:
# session_log_path = "/var/lib/lakehouse/validator/sessions.jsonl"
# Each row: schema=session.iterate.v1, session_id (= Langfuse trace_id),
# kind, model, iterations, attempts[], final_verdict, grounded_in_roster,
# duration_ms. See internal/validator/session_log.go.
session_log_path = ""
[s3]
endpoint = "http://localhost:9000"
region = "us-east-1"
bucket = "lakehouse-go-primary" # G0 dedicated bucket so Rust + Go coexist
access_key_id = "" # populated by SecretsProvider from /etc/lakehouse/secrets-go.toml
secret_access_key = "" # ditto
use_path_style = true
[log]
level = "info"
# Model tier registry — names map to actual model IDs per the small-
# model pipeline architecture (project_small_model_pipeline_vision.md).
# Bumping a model means editing one line here, not hunting through code.
#
# Tier philosophy:
# - local_* : on-box Ollama. Inner-loop hot path. Repeated calls.
# - cloud_* : Ollama Cloud (Pro plan). Larger context, fail-up tier.
# - frontier_* : OpenRouter / OpenCode. Rate-limited, billed per call.
# Reserved for blockers and full-scope reviews.
#
# weak_models is the codified "local-hot-path eligible" list that the
# matrix.downgrade gate reads. A model in this list bypasses the
# strong-model downgrade rule (it's already weak — no need to downgrade
# corpora further).
[models]
# Tier 1 — local hot path
local_fast = "qwen3.5:latest"
local_embed = "nomic-embed-text-v2-moe" # 475M MoE, drop-in upgrade from 137M v1 — verified 2026-04-30 same 768-dim
# local_judge stays on qwen2.5:latest — qwen3.5:latest is a vision-SSM
# build with 256K context that runs ~30s per judge call against the
# playbook_lift loop (verified 2026-04-30). qwen2.5:latest at ~1s/call
# is 30× faster and held lift theory across the 21-query reality test
# (7/8 lift, 87.5%). The 8de94eb "bump qwen2.5 → qwen3.5" was a casual
# version-up; this revert is workload-specific.
local_judge = "qwen2.5:latest"
local_review = "qwen3.5:latest"
# Tier 2 — Ollama Cloud (Pro). kimi-k2:1t still upstream-broken;
# deepseek/kimi-k2.6/qwen3-coder are the working primaries.
cloud_judge = "kimi-k2.6:cloud"
cloud_review = "qwen3-coder:480b"
cloud_strong = "deepseek-v3.2"
# Tier 3 — frontier. Use sparingly; rate-limited + per-call billing.
frontier_review = "openrouter/anthropic/claude-opus-4-7"
frontier_arch = "openrouter/moonshotai/kimi-k2-0905"
frontier_strong = "openrouter/openai/gpt-5"
frontier_free = "opencode/claude-opus-4-7"
# Local-hot-path eligible — matrix.downgrade bypass list.
weak_models = ["qwen3.5:latest", "qwen3:latest"]