golangLAKEHOUSE/REPLICATION.md
root 54a05d9311 Sprint 4 deployment artifacts: Dockerfile + docker-compose
Parallel deploy target to the systemd units that landed in a59ef5b.
Single image carries all 11 daemons; docker-compose runs one
container per daemon with the same dependency graph as the systemd
units. Useful when systemd isn't available (Mac dev, remote VMs
without root) or when isolation to a private docker network is
preferred.

Dockerfile (multi-stage):
- Builder: golang:1.25-bookworm. DuckDB cgo needs gcc + glibc;
  alpine's musl doesn't link the official duckdb-go bindings cleanly.
- Runtime: debian:bookworm-slim — same libc, much smaller surface.
  Adds ca-certificates (outbound HTTPS to OpenRouter/OpenCode/Kimi),
  curl + jq (in-container healthchecks + smoke probes), tini (PID 1
  signal forwarding so docker stop sends SIGTERM to the daemon, not
  to a wrapper).
- Single image, multiple binaries. Ships all 11 cmd/* + 3 scripts/
  (staffing_workers, playbook_lift, multi_coord_stress) so deployed
  stacks can run reality tests against themselves.
- Non-root runtime user (uid 999 lakehouse). Layout matches
  /usr/local/bin/lakehouse/<daemon> from REPLICATION.md.
- ENTRYPOINT=tini; no default CMD — operators / compose pick
  which daemon explicitly.

docker-compose.yml (11 services):
- Same dependency graph as deploy/systemd/. depends_on with
  service_healthy condition matches Requires= equivalents:
    catalogd → storaged
    ingestd → storaged + catalogd
    queryd → catalogd
    matrixd → embedd + vectord
- Gateway uses bare depends_on (no health condition) — Wants=
  equivalent so single-upstream restart doesn't cascade.
- chatd has per-provider env_file entries (one each for
  ollama_cloud, openrouter, opencode, kimi) — missing files are
  silently OK, matching the systemd unit's EnvironmentFile=- list.
- Persistent state on the lakehouse-state named volume; commented
  driver_opts shows how to bind to a host path for off-volume
  backups.

.dockerignore:
- Excludes bin/ + reports/ + data/ + git metadata + .env files.
- Especially excludes lakehouse.toml/secrets-go.toml/auth.env so
  local dev configs don't accidentally bake into a published image.

REPLICATION.md gains a Docker section between systemd setup and
the logs section. Ten-line copy-paste from "git clone" to
"docker compose up -d", plus a docker-vs-systemd differences
table covering process supervision, logs, restart policy, file
ownership, host networking quirks, and backup targets.

Validation: docker compose config --quiet → exit 0 (with
placeholder env files in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:58:47 -05:00


Lakehouse-Go — Replication Runbook

How to deploy Lakehouse-Go onto a fresh Linux host. Mirrors the layout the dev box uses; covers prereqs, secrets, systemd units, validation.

Prereqs

The host needs the build tools and external services below in place BEFORE the Lakehouse daemons can usefully start. None are managed by Lakehouse-Go's own units; they're operator infrastructure.

| Service | Purpose | Reachability |
|---|---|---|
| Go 1.25+ | builds the binaries | go version returns ≥ 1.25 |
| gcc | DuckDB cgo (queryd) | gcc --version |
| MinIO (or AWS S3) | storaged backing store | curl http://localhost:9000/minio/health/live returns 200; bucket lakehouse-go-primary exists |
| Ollama | embedd + chatd LLM dispatch | curl http://localhost:11434/api/tags returns 200 with nomic-embed-text-v2-moe (or whatever [embedd].default_model names) loaded |
| Langfuse (optional) | trace + span observability | curl http://localhost:3001/api/public/health returns 200 |
| PostgreSQL (optional) | only if Langfuse is wanted | bundled with the Langfuse docker compose |

Bind ports the daemons use (G0 dev defaults; shifted by 10 from the Rust legacy on 3100/3201-3204 so both stacks coexist):

| Daemon | Port |
|---|---|
| gateway | 3110 |
| storaged | 3211 |
| catalogd | 3212 |
| ingestd | 3213 |
| queryd | 3214 |
| vectord | 3215 |
| embedd | 3216 |
| pathwayd | 3217 |
| matrixd | 3218 |
| observerd | 3219 |
| chatd | 3220 |
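
For scripts that probe a single daemon, it can help to keep the port table above in one place rather than scattering literals. The following is a minimal sketch: daemon_port is a hypothetical helper (not part of the repo), and the values are simply copied from the table, so they only hold while lakehouse.toml keeps the G0 dev defaults.

```shell
# daemon_port NAME: print the default bind port for a Lakehouse-Go daemon.
# Hypothetical helper; values mirror the port table and are not read from lakehouse.toml.
daemon_port() {
  case "$1" in
    gateway)   echo 3110 ;;
    storaged)  echo 3211 ;;
    catalogd)  echo 3212 ;;
    ingestd)   echo 3213 ;;
    queryd)    echo 3214 ;;
    vectord)   echo 3215 ;;
    embedd)    echo 3216 ;;
    pathwayd)  echo 3217 ;;
    matrixd)   echo 3218 ;;
    observerd) echo 3219 ;;
    chatd)     echo 3220 ;;
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}
```

A probe for one daemon could then read: curl -sS "http://127.0.0.1:$(daemon_port queryd)/health".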

Bootstrap

1. User + directories

sudo useradd --system --no-create-home --shell /usr/sbin/nologin lakehouse
sudo mkdir -p /var/lib/lakehouse/{pathway,observer} /var/log/lakehouse \
              /usr/local/bin/lakehouse /etc/lakehouse
sudo chown -R lakehouse:lakehouse /var/lib/lakehouse /var/log/lakehouse

2. Build + install binaries

From a clone of the repo:

git clone https://git.agentview.dev/profit/golangLAKEHOUSE.git
cd golangLAKEHOUSE
just verify    # vet + tests + 9 core smokes — ~31s
go build -o bin/ ./cmd/...   # 11 binaries land in ./bin/
sudo cp bin/{gateway,storaged,catalogd,ingestd,queryd,vectord,embedd,pathwayd,observerd,matrixd,chatd} /usr/local/bin/lakehouse/
sudo chmod 755 /usr/local/bin/lakehouse/*

3. Config + secrets

# Main config — edit ports/URLs/model tier as needed
sudo cp lakehouse.toml /etc/lakehouse/lakehouse.toml

# S3 credentials — fill in real keys
sudo cp deploy/etc-lakehouse/secrets-go.toml.example /etc/lakehouse/secrets-go.toml
sudo chown root:lakehouse /etc/lakehouse/secrets-go.toml
sudo chmod 0640 /etc/lakehouse/secrets-go.toml
sudo $EDITOR /etc/lakehouse/secrets-go.toml  # set [s3.primary] keys

# Auth token — required ONLY if any daemon binds non-loopback
sudo cp deploy/etc-lakehouse/auth.env.example /etc/lakehouse/auth.env
sudo chown root:lakehouse /etc/lakehouse/auth.env
sudo chmod 0640 /etc/lakehouse/auth.env
# For non-loopback deploys, set:
#   AUTH_TOKEN=<generate via `openssl rand -hex 32`>
sudo $EDITOR /etc/lakehouse/auth.env

# Optional: chatd cloud provider keys, one file per provider
# (each is its own EnvironmentFile so rotations don't restart all chatd)
for provider in ollama_cloud openrouter opencode kimi; do
  echo "${provider^^}_API_KEY=" | sudo tee /etc/lakehouse/$provider.env > /dev/null
  sudo chown root:lakehouse /etc/lakehouse/$provider.env
  sudo chmod 0640 /etc/lakehouse/$provider.env
done
sudo $EDITOR /etc/lakehouse/openrouter.env  # etc per provider you need
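
With six or so secret files in play, one of them can easily end up with the wrong mode. A minimal sketch of a mode check follows; has_mode_0640 is a hypothetical helper, it verifies the mode only (not the root:lakehouse ownership), and it parses the ls -l mode string so that it does not depend on GNU-only stat flags.

```shell
# has_mode_0640 FILE: succeed only if FILE is a regular file with mode rw-r----- (0640).
# Hypothetical helper; ownership is not checked here.
has_mode_0640() {
  [ "$(ls -ld -- "$1" 2>/dev/null | cut -c1-10)" = "-rw-r-----" ]
}

# Example (run as root so the files are visible):
#   for f in /etc/lakehouse/secrets-go.toml /etc/lakehouse/*.env; do
#     has_mode_0640 "$f" || echo "unexpected mode: $f"
#   done
```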

4. systemd units

sudo cp deploy/systemd/*.service deploy/systemd/*.target /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable lakehouse-go.target
sudo systemctl start lakehouse-go.target

5. Validation

# All 11 daemons should be active
systemctl status 'lakehouse-*.service' --no-pager | grep -E "Active|●"

# Health endpoints respond on each port
for port in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
  printf "%5d: " "$port"
  curl -fsS --max-time 2 "http://127.0.0.1:$port/health" || echo "FAIL"   # -f: HTTP 4xx/5xx also counts as FAIL
done

# Through the gateway: all chatd providers register (cloud keys present)
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# End-to-end: ingest a tiny CSV → queryd SELECT (matrix.search is not exercised by these commands)
printf 'id,name,role\n1,Alice,Forklift Operator\n' > /tmp/probe.csv
curl -sS -F "file=@/tmp/probe.csv" "http://127.0.0.1:3110/v1/ingest?name=probe"
curl -sS -X POST http://127.0.0.1:3110/v1/sql \
  -H 'content-type: application/json' \
  -d '{"sql":"SELECT COUNT(*) FROM probe"}' | jq
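
Right after systemctl start, daemons with upstream dependencies may need a few seconds before their health endpoint answers, so a single probe can report a false failure. A small retry wrapper is one way to handle that. The sketch below is generic shell; retry is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds
# or ATTEMPTS runs have failed. Sleeps DELAY_SECONDS between attempts.
retry() (
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && exit 0
    i=$((i + 1))
    # Skip the sleep after the final failed attempt.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  exit 1
)
```

For example, retry 10 1 curl -fsS --max-time 2 http://127.0.0.1:3110/health waits up to roughly ten seconds for the gateway before giving up.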

Auth posture

Per ADR-006:

  • Loopback-only deploy (every daemon binds 127.0.0.1): no auth needed. Empty AUTH_TOKEN is fine. Network is the boundary.
  • Non-loopback deploy (gateway exposed beyond loopback, daemons internal-private): set AUTH_TOKEN in /etc/lakehouse/auth.env. The mechanical gate at startup refuses to bind without one.
  • Multi-host deploy (gateway + daemons on separate machines): set AUTH_TOKEN and [auth].allowed_ips in lakehouse.toml to the gateway's address. Both layers gate.
  • TLS: terminate at nginx/Caddy in front of the gateway. The Go daemons speak HTTP; in-process TLS is explicitly out of scope per ADR-006 Decision 6.6.
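
For deploy scripts, the first two rules reduce to one question: is the bind address loopback? The sketch below illustrates that decision as written above. It is not the daemons' actual startup gate; is_loopback and auth_token_required are hypothetical names, and the matching is deliberately simple (a bare host with no port suffix).

```shell
# is_loopback HOST: succeed for 127.0.0.0/8 addresses, ::1, and the name "localhost".
# Simplified pattern match; it does not validate that HOST is a well-formed address.
is_loopback() {
  case "$1" in
    127.*|::1|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# auth_token_required HOST: succeed when the posture rules call for AUTH_TOKEN,
# i.e. whenever the bind host is not loopback.
auth_token_required() {
  ! is_loopback "$1"
}
```

A pre-flight step could then warn early, for example: if auth_token_required 0.0.0.0; then echo "set AUTH_TOKEN in /etc/lakehouse/auth.env"; fi.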

Token rotation

Per ADR-006 Decision 6.5 — dual-token window:

# 1. Generate new token
NEW=$(openssl rand -hex 32)

# 2. Add as secondary, keep old as primary
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target

# 3. Update every caller to use NEW token
# 4. Promote: NEW becomes primary, secondary clears
sudo sed -i "s|^AUTH_TOKEN=.*|AUTH_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target
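
Note that the sed commands above only rewrite a line that already exists: if auth.env has no AUTH_SECONDARY_TOKEN= line, step 2 silently changes nothing. A minimal sketch of an update-or-append alternative follows. set_env_var is a hypothetical helper, it assumes simple KEY=value lines, and it has to run in a shell that can write the file (for auth.env, a root shell).

```shell
# set_env_var FILE KEY VALUE: replace every "KEY=..." line in FILE with KEY=VALUE,
# or append KEY=VALUE when no such line exists. Other lines keep their order.
set_env_var() (
  file=$1; key=$2; value=$3
  tmp=$(mktemp) || exit 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" || exit 1
  # Write back through the existing file so its owner and mode are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)
```

Step 2 would then become set_env_var /etc/lakehouse/auth.env AUTH_SECONDARY_TOKEN "$NEW", followed by the same restart.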

Docker / docker-compose deploy (alternative to systemd)

The single-image Dockerfile carries all 11 daemons; docker-compose.yml runs one container per daemon with the same dependency graph as the systemd units. Useful when the host doesn't have systemd (Mac dev boxes, remote VMs without root) or when you want all of Lakehouse-Go isolated to a private docker network.

# Build the image (multi-stage; ~3 min on first build, ~30s with
# cached go module download).
docker build -t lakehouse-go:latest .

# Place config + secrets next to docker-compose.yml. The compose file
# bind-mounts these into every container at /etc/lakehouse/.
# lakehouse.toml is already in the repo, so no copy is needed; edit it in place if required.
cp deploy/etc-lakehouse/secrets-go.toml.example secrets-go.toml
chmod 0600 secrets-go.toml
cp deploy/etc-lakehouse/auth.env.example auth.env
chmod 0600 auth.env
# Per-provider chatd keys (each its own file so missing == provider
# unregistered, NOT chatd startup failure):
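for p in ollama_cloud openrouter opencode kimi; do
  # tr instead of ${p^^}: macOS ships bash 3.2, which lacks that expansion
  key="$(printf '%s' "$p" | tr '[:lower:]' '[:upper:]')_API_KEY"
  echo "${key}=" > "$p.env"
  chmod 0600 "$p.env"
done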

# $EDITOR each file to fill in real values...

# Bring up the stack.
docker compose up -d
docker compose ps    # all 11 services Healthy
docker compose logs -f gateway

# Validate via the gateway like the systemd path.
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# Tear down.
docker compose down
# State volume (pathway/observer JSONLs) survives `down`. To wipe:
docker compose down -v

Key docker-vs-systemd differences

| Concern | systemd | docker-compose |
|---|---|---|
| Process supervision | systemd | tini + docker daemon |
| Logs | journald | docker logs (or routed to a sink via logging driver) |
| Restarts on failure | Restart=on-failure | restart: unless-stopped |
| File ownership | User=lakehouse (uid varies) | user: 999:999 (uid is fixed in the image) |
| Reaches MinIO/Ollama | host network | host's address from inside the bridge network: typically host.docker.internal (Mac/Win) or 172.17.0.1 (Linux). Set [s3].endpoint + [embedd].provider_url accordingly. |
| Backup target | /var/lib/lakehouse/ on host | the lakehouse-state named volume; bind to a host path via the commented-out driver_opts in compose if needed |
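
The host-address row is the one most likely to need per-machine edits. A minimal sketch of picking the address by platform follows. host_addr_for_os is a hypothetical helper; the two addresses are the typical values named in the table and may differ on hosts with a customised Docker bridge.

```shell
# host_addr_for_os OS_NAME: print the address a container would typically use
# to reach services on the host. OS_NAME is the output of `uname -s`.
host_addr_for_os() {
  case "$1" in
    Linux) echo "172.17.0.1" ;;            # default docker0 bridge gateway
    *)     echo "host.docker.internal" ;;  # Docker Desktop (Mac/Win)
  esac
}

# Example:
#   addr=$(host_addr_for_os "$(uname -s)")
#   echo "MinIO:  http://$addr:9000"
#   echo "Ollama: http://$addr:11434"
```

The printed URLs use the MinIO and Ollama ports from the Prereqs table; they still have to be written into lakehouse.toml by hand.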

Logs

systemd routes everything to journald with per-daemon SyslogIdentifier:

journalctl -u lakehouse-gateway.service -f
journalctl -u 'lakehouse-*.service' --since '5 min ago'

Stopping

sudo systemctl stop lakehouse-go.target  # cascades to all 11 daemons

Backup / state preservation

| Path | What | Backup priority |
|---|---|---|
| /var/lib/lakehouse/pathway/state.jsonl | Mem0 trace store (append-only) | high |
| /var/lib/lakehouse/observer/ops.jsonl | observer ring's persistor backup | medium |
| MinIO lakehouse-go-primary bucket | parquets, vector LHV1 indexes, catalog manifests | high |
| /etc/lakehouse/lakehouse.toml | service config | medium |
| /etc/lakehouse/secrets-go.toml + *.env | secrets | high (in your secrets manager, not on disk) |
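
For the two on-disk JSONL paths, a timestamped tarball of /var/lib/lakehouse is one simple option. The sketch below is an assumption-laden example rather than a documented procedure: backup_state and the archive naming are hypothetical, and because the files are append-only, an archive taken while the daemons are running may end on a partially written line.

```shell
# backup_state SRC_DIR DEST_DIR: archive SRC_DIR into
# DEST_DIR/lakehouse-state-<UTC timestamp>.tar.gz and print the archive path.
backup_state() (
  src=$1; dest=$2
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  out="$dest/lakehouse-state-$stamp.tar.gz"
  mkdir -p "$dest" || exit 1
  # -C stores paths relative to SRC_DIR, so a restore can target any directory.
  tar -czf "$out" -C "$src" . || exit 1
  echo "$out"
)
```

On the systemd layout this would be invoked as backup_state /var/lib/lakehouse /some/backup/dir, where the destination directory is the operator's choice. The MinIO bucket and the secrets need their own backup path.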

Troubleshooting

Daemon refuses to start with "refuse non-loopback bind without auth.token"
ADR-006 6.1 mechanical gate. Set AUTH_TOKEN in /etc/lakehouse/auth.env or bind back to loopback.

Daemon refuses to start with "refusing non-loopback bind ... see audit R-001"
The previous loopback-bind gate. For dev: LH_<NAME>_ALLOW_NONLOOPBACK=1 overrides. For prod: set AUTH_TOKEN AND keep the override (or move to loopback + reverse-proxy).

catalogd 500 / NoSuchBucket
storaged is pointing at a bucket that doesn't exist. Either create the bucket in MinIO or fix [s3].bucket in lakehouse.toml.

embedd 502 on /v1/embed
Ollama not running OR [embedd].default_model not loaded. ollama list to verify; ollama pull nomic-embed-text-v2-moe to load.

chatd /v1/chat/providers shows false for cloud providers
The provider's env file is missing or empty. Check /etc/lakehouse/<provider>.env.

queryd unable to read parquet
Check [queryd].secrets_path points at the right secrets-go.toml AND the file's owner+mode allow the lakehouse user to read.

See also

  • STATE_OF_PLAY.md — verified-working snapshot
  • docs/DECISIONS.md — all ADRs, especially ADR-003 (auth substrate) + ADR-006 (auth posture)
  • docs/SPEC.md §1 — component table