lakehouse/lakehouse.toml
root 57bde63a06
gateway: trace-id propagation + coordinator session JSONL (Rust parity)
Brings cross-runtime parity with the Go-side observability wave (commits
d6d2fdf + 1a3a82a in golangLAKEHOUSE). This adds the two layers J flagged:
the LIVE per-call view (Langfuse) and the LONGITUDINAL forensic view
(JSONL queryable via DuckDB). The hard correctness gate (FillValidator
phantom-rejection) was already in place; this commit adds the
observability on top.

## Trace-id propagation

An X-Lakehouse-Trace-Id header constant is declared in
crates/gateway/src/v1/iterate.rs (matching Go's shared.TraceIDHeader
byte-for-byte). When it is set on an inbound /v1/iterate request, the
handler reuses it; the chat + validate self-loopback hops forward
the same header, so chatd's trace emit nests under the parent rather
than minting a fresh top-level trace per call.

ChatTrace gains a parent_trace_id field. When a parent is set,
emit_chat_inner skips the trace-create event and emits only the
generation-create, which attaches to the existing trace tree. Result:
an iterate session with N retries shows up in Langfuse as ONE tree,
not N+1 disconnected traces.
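The skip rule amounts to this (a hypothetical Go mirror of the Rust logic; type and function names are assumptions, not the crate's API):

```go
package main

import "fmt"

// ChatTrace sketches the shape described above; parent empty means a
// top-level trace.
type ChatTrace struct {
	TraceID       string
	ParentTraceID string
}

// eventsFor returns which Langfuse events to emit: with a parent set,
// skip trace-create and only attach a generation to the existing tree.
func eventsFor(t ChatTrace) []string {
	if t.ParentTraceID != "" {
		return []string{"generation-create"}
	}
	return []string{"trace-create", "generation-create"}
}

func main() {
	fmt.Println(eventsFor(ChatTrace{TraceID: "t1"}))                      // top-level: both events
	fmt.Println(eventsFor(ChatTrace{TraceID: "t2", ParentTraceID: "t1"})) // nested: generation only
}
```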

emit_attempt_span (new) writes one Langfuse span per iteration
attempt with input={iteration, model, provider, prompt} and
output={verdict, raw, error}; non-accepted verdicts are emitted at
WARNING level. The returned span id is stamped on the corresponding
SessionRecord attempt for cross-log correlation.
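A sketch of the per-attempt payload, assuming Langfuse's DEFAULT/WARNING observation levels (field names beyond those listed above are illustrative):

```go
package main

import "fmt"

// attemptSpan is a hypothetical mirror of the span payload described
// in the text: input/output maps plus an observation level.
type attemptSpan struct {
	Input  map[string]any
	Output map[string]any
	Level  string
}

// spanFor builds the payload for one iteration attempt; non-accepted
// verdicts surface at WARNING level so they stand out in the tree.
func spanFor(iteration int, model, provider, prompt, verdict string) attemptSpan {
	level := "DEFAULT"
	if verdict != "accepted" {
		level = "WARNING"
	}
	return attemptSpan{
		Input:  map[string]any{"iteration": iteration, "model": model, "provider": provider, "prompt": prompt},
		Output: map[string]any{"verdict": verdict},
		Level:  level,
	}
}

func main() {
	fmt.Println(spanFor(1, "qwen3.5:latest", "ollama", "p", "accepted").Level) // DEFAULT
	fmt.Println(spanFor(2, "qwen3.5:latest", "ollama", "p", "rejected").Level) // WARNING
}
```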

## Coordinator session JSONL

crates/gateway/src/v1/session_log.rs — new writer matching Go's
internal/validator/session_log.go schema byte-for-byte:
  - SessionRecord with schema=session.iterate.v1
  - SessionAttemptRecord per retry
  - SessionLogger.append: tokio Mutex serialized append-only
  - Best-effort posture (slog.Warn on error, never blocks request)

iterate.rs builds + appends a row on EVERY code path:
  - accepted: write_session_accepted with grounded_in_roster bool
    derived from validate_workers WorkerLookup (matches Go's
    handlers.rosterCheckFor("fill") semantics)
  - max-iter-exhausted: write_session_failure
  - infra-error: write_infra_error (so a missing /v1/iterate event
    never silently disappears from the longitudinal log)

A new [gateway].session_log_path config field (empty = disabled).
Production path: /var/lib/lakehouse/gateway/sessions.jsonl. Operators
who want a unified longitudinal stream can point both the Rust and Go
loggers at the same path; append-only writes are safe at the row
sizes we produce.

## Cross-runtime parity probe

crates/gateway/src/bin/parity_session_log: tiny stdin/stdout helper
that round-trips a fixture through SessionRecord serde.
golangLAKEHOUSE/scripts/cutover/parity/session_log_parity.sh feeds
4 fixtures through both helpers and diffs the rows after stripping
timestamp + daemon (the two fields that legitimately differ between
producers).
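The normalization the script applies before diffing can be sketched as a hypothetical Go re-implementation (the timestamp/daemon field names come from the text above):

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// rowsEqual compares two JSONL rows after stripping the two fields
// that legitimately differ between producers. Invalid JSON never
// counts as equal — the non-trivial-equal guard — so both sides
// failing identically is still a parity failure.
func rowsEqual(a, b []byte) bool {
	var ma, mb map[string]any
	if json.Unmarshal(a, &ma) != nil || json.Unmarshal(b, &mb) != nil {
		return false
	}
	for _, k := range []string{"timestamp", "daemon"} {
		delete(ma, k)
		delete(mb, k)
	}
	return reflect.DeepEqual(ma, mb)
}

func main() {
	goRow := []byte(`{"schema":"session.iterate.v1","timestamp":"t1","daemon":"validatord"}`)
	rsRow := []byte(`{"schema":"session.iterate.v1","timestamp":"t2","daemon":"gateway"}`)
	fmt.Println(rowsEqual(goRow, rsRow)) // true: rows match after normalization
}
```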

Result: **4/4 byte-equal**, including the unicode-prompt fixture
("Café résumé  你好"). Schema parity holds. The probe's
non-trivial-equal guard rejects the degenerate case where both sides
fail identically, protecting against a regression in which one side
silently stops producing valid JSON.

## Verification

- cargo test -p gateway --lib: 90/90 PASS (3 new session_log tests
  including concurrent-append safety)
- cargo check --workspace: clean
- session_log_parity.sh: 4/4 fixtures byte-equal
- Both runtimes can append to the same path; DuckDB sees one stream
- The Go-side validatord smoke remains 5/5 (unchanged)

## Architecture invariant

Don't propose to "wire trace-id propagation in Rust" or "add Rust
session log" — both are now shipped on the demo/post-pr11-polish
branch. The longitudinal log + Langfuse tree together cover the
multi-call observability concern J flagged 2026-05-02.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:39:29 -05:00


# Lakehouse Configuration
[gateway]
host = "0.0.0.0"
port = 3100
# Coordinator session JSONL — one row per /v1/iterate session for
# offline DuckDB analysis. Cross-runtime parity with the Go-side
# [validatord].session_log_path. Empty = disabled. Production:
# session_log_path = "/var/lib/lakehouse/gateway/sessions.jsonl"
session_log_path = ""
[storage]
root = "./data"
profile_root = "./data/_profiles"
rescue_bucket = "rescue"
[[storage.buckets]]
name = "primary"
backend = "local"
root = "./data"
[[storage.buckets]]
name = "rescue"
backend = "local"
root = "./data/_rescue"
[[storage.buckets]]
name = "testing"
backend = "local"
root = "./data/_testing"
# S3 bucket via MinIO. The name "s3:lakehouse" is the convention
# lance_backend.rs uses to emit s3:// URIs for Lance datasets.
# Credentials resolved via environment (AWS_ACCESS_KEY_ID etc) or
# the secrets provider.
[[storage.buckets]]
name = "s3:lakehouse"
backend = "s3"
bucket = "lakehouse"
endpoint = "http://localhost:9000"
region = "us-east-1"
secret_ref = "minio-lakehouse"
[catalog]
# Manifests persisted to object storage under this prefix
manifest_prefix = "_catalog/manifests"
[query]
# max_rows_per_query = 10000
[sidecar]
# Post-2026-05-02: AiClient talks directly to Ollama; the Python
# sidecar's hot-path role (~120 LOC of pure Ollama wrappers) was
# retired. Field name kept for migration compat — value now points
# at Ollama on :11434. Lab UI + pipeline_lab Python remains as a
# dev-only tool, NOT on this URL.
url = "http://localhost:11434"
[ai]
embed_model = "nomic-embed-text"
# Local-tier defaults bumped 2026-04-30: qwen3.5:latest is the
# stronger local rung in the 5-loop substrate (per
# project_small_model_pipeline_vision.md). Same JSON-clean property
# as qwen2.5, more capacity. Ollama still serves both — bump back
# in this file if a workload regressed.
gen_model = "qwen3.5:latest"
rerank_model = "qwen3.5:latest"
[auth]
enabled = false
# api_key = "changeme"
[observability]
# Export traces to stdout (set to "otlp" for OpenTelemetry collector)
exporter = "stdout"
service_name = "lakehouse"
[agent]
# Phase 16.2 — background autotune agent. Opt-in: set enabled = true to
# let the agent continuously propose + trial HNSW configs and auto-promote
# winners. Defaults are conservative so it stays out of the way of live
# search traffic on shared Ollama.
enabled = true
cycle_interval_secs = 120 # periodic wake if no triggers
cooldown_between_trials_secs = 10 # min gap between trials
min_recall = 0.9 # never promote below this
max_trials_per_hour = 20 # hard budget cap
# Model roster — available for profile hot-swap
# qwen3.5:latest: stronger local rung — JSON-clean, 8K+ context,
# default for gen_model and rerank_model
# qwen3: 8.2B, 40K context, thinking+tools, best for reasoning tasks
# qwen2.5: 7B, 8K context, fast — kept loaded for the 2026-04 era
# comparison runs; new defaults use qwen3.5:latest
# nomic-embed-text: 137M, embedding-only, used by all profiles