golangLAKEHOUSE/REPLICATION.md
root 68d9e554b0 shared: auto-emit Langfuse trace+span per HTTP request — closes OPEN #2
Adds langfuseMiddleware in internal/shared so every daemon's
shared.Run gets free production-traffic trace visibility when
LANGFUSE_URL + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY are set.
Same env names + file shape as the multi_coord_stress driver, so
operators ship one /etc/lakehouse/langfuse.env across the deploy.

Wiring is auth-gated: middleware runs INSIDE the RequireAuth group,
so 401s from credential-stuffing don't pollute traces. /health is
exempt so LB probes don't either. Missing env vars → nil client →
middleware is a passthrough no-op (fail-open per ADR-005 5.1).

Bundled deploy:
- langfuse.env.example template (mode 0640, root:lakehouse)
- 11 systemd units gain `EnvironmentFile=-/etc/lakehouse/langfuse.env`
  (leading - so missing file = OK)
- REPLICATION.md bootstrap section documents setup

Tests (4): nil passthrough, /health bypass, real-request emission,
status-writer wrapping. All green.

STATE_OF_PLAY OPEN list: 5 rows → 4 rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:55:42 -05:00


Lakehouse-Go — Replication Runbook

How to deploy Lakehouse-Go onto a fresh Linux host. Mirrors the layout the dev box uses; covers prereqs, secrets, systemd units, validation.

Prereqs

The host needs these external services reachable BEFORE the Lakehouse daemons can usefully start. None are managed by Lakehouse-Go's own units; they're operator infrastructure.

| Service | Purpose | Reachability |
|---|---|---|
| Go 1.25+ | builds the binaries | go version returns ≥ 1.25 |
| gcc | DuckDB cgo (queryd) | gcc --version |
| MinIO (or AWS S3) | storaged backing store | curl http://localhost:9000/minio/health/live returns 200; bucket lakehouse-go-primary exists |
| Ollama | embedd + chatd LLM dispatch | curl http://localhost:11434/api/tags returns 200 with nomic-embed-text-v2-moe (or whatever [embedd].default_model names) loaded |
| Langfuse (optional) | trace + span observability | curl http://localhost:3001/api/public/health returns 200 |
| PostgreSQL (optional) | only if Langfuse is wanted | bundled with the Langfuse docker compose |
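
The reachability column can be scripted as a preflight step. The following is a minimal sketch, not something the repo ships: probe and go_version_ok are hypothetical helper names, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical preflight helpers; not shipped with Lakehouse-Go.

# probe URL: succeeds only when the endpoint answers HTTP 200 within 3s.
probe() {
  code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$1" 2>/dev/null) || code=000
  [ "$code" = "200" ]
}

# go_version_ok VERSION: succeeds when VERSION (e.g. "go1.25.3") is >= 1.25.
go_version_ok() {
  v=${1#go}                  # "go1.25.3" -> "1.25.3"
  major=${v%%.*}             # "1"
  rest=${v#*.}               # "25.3"
  minor=${rest%%.*}          # "25"
  minor=${minor%%[!0-9]*}    # drop suffixes such as "rc1"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 25 ]; }
}
```

For example, probe http://localhost:9000/minio/health/live || echo "MinIO not reachable" checks the storaged backing store, and go_version_ok "$(go env GOVERSION)" checks the toolchain. The bucket-exists and model-loaded conditions are not covered by this sketch.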

Bind ports the daemons use (G0 dev defaults; shifted by 10 from the Rust legacy on 3100/3201-3204 so both stacks coexist):

| Daemon | Port |
|---|---|
| gateway | 3110 |
| storaged | 3211 |
| catalogd | 3212 |
| ingestd | 3213 |
| queryd | 3214 |
| vectord | 3215 |
| embedd | 3216 |
| pathwayd | 3217 |
| matrixd | 3218 |
| observerd | 3219 |
| chatd | 3220 |
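
Before the first start it can help to confirm that nothing else already listens on these ports. A rough sketch, assuming the ss output layout in which the fourth column is Local Address:Port; busy_ports is a hypothetical helper, not part of the repo:

```shell
# Ports from the table above.
LAKEHOUSE_PORTS="3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220"

# busy_ports: reads `ss -ltn` style output on stdin and prints each
# Lakehouse port that already has a listener (once per port).
busy_ports() {
  awk -v ports="$LAKEHOUSE_PORTS" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      m = split($4, a, ":")      # port is the last colon-separated field
      port = a[m]
      if ((port in want) && !seen[port]++) print port
    }
  '
}
```

Feed it live data with ss -ltn | busy_ports; empty output means all eleven ports are free.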

Bootstrap

1. User + directories

sudo useradd --system --no-create-home --shell /usr/sbin/nologin lakehouse
sudo mkdir -p /var/lib/lakehouse/{pathway,observer} /var/log/lakehouse \
              /usr/local/bin/lakehouse /etc/lakehouse
sudo chown -R lakehouse:lakehouse /var/lib/lakehouse /var/log/lakehouse

2. Build + install binaries

From a clone of the repo:

git clone https://git.agentview.dev/profit/golangLAKEHOUSE.git
cd golangLAKEHOUSE
just verify    # vet + tests + 9 core smokes — ~31s
go build -o bin/ ./cmd/...   # 11 binaries land in ./bin/
sudo cp bin/{gateway,storaged,catalogd,ingestd,queryd,vectord,embedd,pathwayd,observerd,matrixd,chatd} /usr/local/bin/lakehouse/
sudo chmod 755 /usr/local/bin/lakehouse/*

3. Config + secrets

# Main config — edit ports/URLs/model tier as needed
sudo cp lakehouse.toml /etc/lakehouse/lakehouse.toml

# S3 credentials — fill in real keys
sudo cp deploy/etc-lakehouse/secrets-go.toml.example /etc/lakehouse/secrets-go.toml
sudo chown root:lakehouse /etc/lakehouse/secrets-go.toml
sudo chmod 0640 /etc/lakehouse/secrets-go.toml
sudo $EDITOR /etc/lakehouse/secrets-go.toml  # set [s3.primary] keys

# Auth token — required ONLY if any daemon binds non-loopback
sudo cp deploy/etc-lakehouse/auth.env.example /etc/lakehouse/auth.env
sudo chown root:lakehouse /etc/lakehouse/auth.env
sudo chmod 0640 /etc/lakehouse/auth.env
# For non-loopback deploys, set:
#   AUTH_TOKEN=<generate via `openssl rand -hex 32`>
sudo $EDITOR /etc/lakehouse/auth.env

# Optional: Langfuse traces. When set, every authenticated HTTP
# request to every daemon emits a trace + span (production
# observability per OPEN item #2 closure). Missing file = no
# traces, no warnings.
sudo cp deploy/etc-lakehouse/langfuse.env.example /etc/lakehouse/langfuse.env
sudo chown root:lakehouse /etc/lakehouse/langfuse.env
sudo chmod 0640 /etc/lakehouse/langfuse.env
sudo $EDITOR /etc/lakehouse/langfuse.env  # set URL + PUBLIC_KEY + SECRET_KEY

# Optional: chatd cloud provider keys, one file per provider
# (each is its own EnvironmentFile so rotations don't restart all chatd)
for provider in ollama_cloud openrouter opencode kimi; do
  echo "${provider^^}_API_KEY=" | sudo tee /etc/lakehouse/$provider.env > /dev/null
  sudo chown root:lakehouse /etc/lakehouse/$provider.env
  sudo chmod 0640 /etc/lakehouse/$provider.env
done
sudo $EDITOR /etc/lakehouse/openrouter.env  # etc per provider you need
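
Every secrets file above ends up as mode 0640, owned root:lakehouse. A small check like the following can confirm that after editing. It is a sketch that assumes GNU stat (the -c flag); check_secret_file is a hypothetical name.

```shell
# check_secret_file PATH [OWNER:GROUP]
# Succeeds when PATH exists with mode 640 and, if given, the expected owner.
check_secret_file() {
  [ -f "$1" ] || { echo "missing: $1"; return 1; }
  mode=$(stat -c '%a' "$1") || return 1
  [ "$mode" = "640" ] || { echo "bad mode $mode: $1"; return 1; }
  if [ -n "${2:-}" ]; then
    owner=$(stat -c '%U:%G' "$1") || return 1
    [ "$owner" = "$2" ] || { echo "bad owner $owner: $1"; return 1; }
  fi
}
```

For example, check_secret_file /etc/lakehouse/auth.env root:lakehouse prints nothing and returns 0 when the file matches the layout used in this step.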

4. systemd units

sudo cp deploy/systemd/*.service deploy/systemd/*.target /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable lakehouse-go.target
sudo systemctl start lakehouse-go.target

5. Validation

# All 11 daemons should be active
systemctl status 'lakehouse-*.service' --no-pager | grep -E "Active|●"

# Health endpoints respond on each port
for port in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
  printf "%5d: " "$port"
  curl -sS --max-time 2 "http://127.0.0.1:$port/health" || echo "FAIL"
done

# Through the gateway: all chatd providers register (cloud keys present)
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# End-to-end: ingest a tiny CSV → queryd SELECT (a matrix.search probe is not part of this snippet)
echo -e "id,name,role\n1,Alice,Forklift Operator" > /tmp/probe.csv
curl -sS -F "file=@/tmp/probe.csv" "http://127.0.0.1:3110/v1/ingest?name=probe"
curl -sS -X POST http://127.0.0.1:3110/v1/sql \
  -H 'content-type: application/json' \
  -d '{"sql":"SELECT COUNT(*) FROM probe"}' | jq

Auth posture

Per ADR-006:

  • Loopback-only deploy (every daemon binds 127.0.0.1): no auth needed. Empty AUTH_TOKEN is fine. Network is the boundary.
  • Non-loopback deploy (gateway exposed beyond loopback, daemons internal-private): set AUTH_TOKEN in /etc/lakehouse/auth.env. The mechanical gate at startup refuses to bind without one.
  • Multi-host deploy (gateway + daemons on separate machines): set AUTH_TOKEN and [auth].allowed_ips in lakehouse.toml to the gateway's address. Both layers gate.
  • TLS: terminate at nginx/Caddy in front of the gateway. The Go daemons speak HTTP; in-process TLS is explicitly out of scope per ADR-006 Decision 6.6.
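
The first two bullets reduce to one question: is the bind address loopback? The sketch below illustrates that decision for operator-side checks. It only approximates the rule and is not the daemons' actual startup gate; is_loopback and auth_required are hypothetical names.

```shell
# is_loopback HOST: succeeds for 127.x.x.x, ::1 and "localhost".
# Pattern matching only; it does not resolve hostnames.
is_loopback() {
  case "$1" in
    127.*|::1|\[::1\]|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# auth_required HOST:PORT: succeeds when the bind address is non-loopback,
# i.e. when AUTH_TOKEN must be set. An empty host (":3110") counts as
# non-loopback because it binds every interface.
auth_required() {
  host=${1%:*}    # strip the trailing :port
  ! is_loopback "$host"
}
```

With this, auth_required 0.0.0.0:3110 && echo "set AUTH_TOKEN first" flags an exposed gateway, while 127.0.0.1:3110 passes silently.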

Token rotation

Per ADR-006 Decision 6.5 — dual-token window:

# 1. Generate new token
NEW=$(openssl rand -hex 32)

# 2. Add as secondary, keep old as primary
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target

# 3. Update every caller to use NEW token
# 4. Promote: NEW becomes primary, secondary clears
sudo sed -i "s|^AUTH_TOKEN=.*|AUTH_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target
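
The sed lines in steps 2 and 4 only rewrite a line that already exists: if auth.env has no AUTH_SECONDARY_TOKEN= line, step 2 changes nothing and still exits 0. Whether the shipped auth.env.example contains that line is not stated here, so a replace-or-append helper is a safer sketch. set_env_key is a hypothetical name, and it assumes GNU sed (-i) and values free of | and & (true for openssl rand -hex output).

```shell
# set_env_key FILE KEY VALUE
# Replace KEY's line in FILE, or append KEY=VALUE if the key is absent.
set_env_key() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Because auth.env is root-owned, this would run from a root shell, e.g. set_env_key /etc/lakehouse/auth.env AUTH_SECONDARY_TOKEN "$NEW" for step 2, followed by the same restart as above.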

Docker / docker-compose deploy (alternative to systemd)

The single-image Dockerfile carries all 11 daemons; docker-compose.yml runs one container per daemon with the same dependency graph as the systemd units. Useful when the host doesn't have systemd (Mac dev boxes, remote VMs without root) or when you want all of Lakehouse-Go isolated to a private docker network.

# Build the image (multi-stage; ~3 min on first build, ~30s with
# cached go module download).
docker build -t lakehouse-go:latest .

# Place config + secrets next to docker-compose.yml. The compose file
# bind-mounts these into every container at /etc/lakehouse/.
# lakehouse.toml is already in the repo; edit it in place if needed
cp deploy/etc-lakehouse/secrets-go.toml.example secrets-go.toml
chmod 0600 secrets-go.toml
cp deploy/etc-lakehouse/auth.env.example auth.env
chmod 0600 auth.env
# Per-provider chatd keys (each its own file so missing == provider
# unregistered, NOT chatd startup failure):
for p in ollama_cloud openrouter opencode kimi; do
  echo "${p^^}_API_KEY=" > $p.env
  chmod 0600 $p.env
done

# $EDITOR each file to fill in real values...

# Bring up the stack.
docker compose up -d
docker compose ps    # all 11 services Healthy
docker compose logs -f gateway

# Validate via the gateway like the systemd path.
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# Tear down.
docker compose down
# State volume (pathway/observer JSONLs) survives `down`. To wipe:
docker compose down -v

Key docker-vs-systemd differences

| Concern | systemd | docker-compose |
|---|---|---|
| Process supervision | systemd | tini + docker daemon |
| Logs | journald | docker logs (or routed to a sink via logging driver) |
| Restarts on failure | Restart=on-failure | restart: unless-stopped |
| File ownership | User=lakehouse (uid varies) | user: 999:999 (uid is fixed in the image) |
| Reaches MinIO/Ollama | host network | host's address from inside the bridge network, typically host.docker.internal (Mac/Win) or 172.17.0.1 (Linux). Set [s3].endpoint + [embedd].provider_url accordingly. |
| Backup target | /var/lib/lakehouse/ on host | the lakehouse-state named volume; bind to a host path via the commented-out driver_opts in compose if needed |
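
The "Reaches MinIO/Ollama" row is the one most likely to need per-host edits. As an illustration only, the typical addresses named there can be picked from uname output; docker_host_addr is a hypothetical helper, and the ports 9000 and 11434 are the ones from the Prereqs section.

```shell
# docker_host_addr KERNEL: print the address containers typically use to
# reach the host, given `uname -s` output. Both values are the "typical"
# ones from the table; a custom bridge setup may differ.
docker_host_addr() {
  case "$1" in
    Linux) echo "172.17.0.1" ;;
    *)     echo "host.docker.internal" ;;   # Docker Desktop on Mac/Windows
  esac
}

# Print candidate values for [s3].endpoint and [embedd].provider_url.
host=$(docker_host_addr "$(uname -s)")
echo "s3 endpoint:         http://$host:9000"
echo "embedd provider_url: http://$host:11434"
```

The printed URLs are starting points to paste into lakehouse.toml, not values the compose file sets on its own.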

Logs

systemd routes everything to journald with per-daemon SyslogIdentifier:

journalctl -u lakehouse-gateway.service -f
journalctl -u 'lakehouse-*.service' --since '5 min ago'

Stopping

sudo systemctl stop lakehouse-go.target  # cascades to all 11 daemons

Backup / state preservation

| Path | What | Backup priority |
|---|---|---|
| /var/lib/lakehouse/pathway/state.jsonl | Mem0 trace store (append-only) | high |
| /var/lib/lakehouse/observer/ops.jsonl | observer ring's persistor backup | medium |
| MinIO lakehouse-go-primary bucket | parquets, vector LHV1 indexes, catalog manifests | high |
| /etc/lakehouse/lakehouse.toml | service config | medium |
| /etc/lakehouse/secrets-go.toml + *.env | secrets | high (in your secrets manager, not on disk) |
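
For the two JSONL paths, a dated tarball of the state directory is one simple option. This is a sketch under assumptions: backup_state and the archive naming are hypothetical, and it covers neither the MinIO bucket nor the secrets.

```shell
# backup_state STATE_DIR DEST_DIR
# Write a timestamped tar.gz of STATE_DIR into DEST_DIR and print its path.
backup_state() {
  src=$1 dest=$2
  stamp=$(date +%Y%m%dT%H%M%S)
  archive="$dest/lakehouse-state-$stamp.tar.gz"
  mkdir -p "$dest" &&
    tar -czf "$archive" -C "$src" . &&
    echo "$archive"
}
```

On a systemd host this would be called as backup_state /var/lib/lakehouse /some/backup/dir, with the destination being whatever location the operator chooses. Because the files are append-only, a copy taken while the daemons run may end on a partial last line; stopping the target first avoids that.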

Troubleshooting

Daemon refuses to start with "refuse non-loopback bind without auth.token"
This is the ADR-006 6.1 mechanical gate. Set AUTH_TOKEN in /etc/lakehouse/auth.env or bind back to loopback.

Daemon refuses to start with "refusing non-loopback bind ... see audit R-001"
This is the previous loopback-bind gate. For dev, LH_<NAME>_ALLOW_NONLOOPBACK=1 overrides it. For prod, set AUTH_TOKEN and keep the override (or move to loopback + reverse-proxy).

catalogd 500 / NoSuchBucket
storaged is pointing at a bucket that doesn't exist. Either create the bucket in MinIO or fix [s3].bucket in lakehouse.toml.

embedd 502 on /v1/embed
Ollama is not running, or [embedd].default_model is not loaded. Run ollama list to verify and ollama pull nomic-embed-text-v2-moe to load it.

chatd /v1/chat/providers shows false for cloud providers
The provider's env file is missing or empty. Check /etc/lakehouse/<provider>.env.

queryd unable to read parquet
Check that [queryd].secrets_path points at the right secrets-go.toml and that the file's owner and mode allow the lakehouse user to read it.
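
For the chatd provider case, the env files can be inspected directly. A minimal sketch, assuming the <PROVIDER>_API_KEY naming produced by the bootstrap loop in step 3; provider_key_set is a hypothetical helper.

```shell
# provider_key_set DIR PROVIDER
# Succeeds when DIR/PROVIDER.env defines a non-empty <PROVIDER>_API_KEY.
provider_key_set() {
  file="$1/$2.env"
  key=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')_API_KEY
  [ -f "$file" ] && grep -Eq "^${key}=.+" "$file"
}

# report_providers DIR: one line per provider from the bootstrap loop.
report_providers() {
  for p in ollama_cloud openrouter opencode kimi; do
    if provider_key_set "$1" "$p"; then
      echo "$p: key present"
    else
      echo "$p: missing or empty"
    fi
  done
}
```

The files are mode 0640 root:lakehouse, so report_providers /etc/lakehouse needs to run as root or as a member of the lakehouse group.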

See also

  • STATE_OF_PLAY.md — verified-working snapshot
  • docs/DECISIONS.md — all ADRs, especially ADR-003 (auth substrate) + ADR-006 (auth posture)
  • docs/SPEC.md §1 — component table