Parallel deploy target to the systemd units that landed in a59ef5b.
Single image carries all 11 daemons; docker-compose runs one
container per daemon with the same dependency graph as the systemd
units. Useful when systemd isn't available (Mac dev, remote VMs
without root) or when isolation to a private docker network is
preferred.
Dockerfile (multi-stage):
- Builder: golang:1.25-bookworm. DuckDB cgo needs gcc + glibc;
alpine's musl doesn't link the official duckdb-go bindings cleanly.
- Runtime: debian:bookworm-slim — same libc, much smaller surface.
Adds ca-certificates (outbound HTTPS to OpenRouter/OpenCode/Kimi),
curl + jq (in-container healthchecks + smoke probes), tini (PID 1
signal forwarding so docker stop sends SIGTERM to the daemon, not
to a wrapper).
- Single image, multiple binaries. Ships all 11 cmd/* + 3 scripts/
(staffing_workers, playbook_lift, multi_coord_stress) so deployed
stacks can run reality tests against themselves.
- Non-root runtime user (uid 999 lakehouse). Layout matches
/usr/local/bin/lakehouse/<daemon> from REPLICATION.md.
- ENTRYPOINT=tini; no default CMD — the operator or compose picks
the daemon explicitly.
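The stage layout described above, sketched (stage names, the exact package list, and build paths here are illustrative assumptions; the repo's Dockerfile is authoritative):

```dockerfile
# Builder: glibc toolchain so the DuckDB cgo bindings link cleanly.
FROM golang:1.25-bookworm AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /out/ ./cmd/...

# Runtime: same libc family, small surface.
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl jq tini \
    && rm -rf /var/lib/apt/lists/*
RUN useradd --system --uid 999 lakehouse
COPY --from=builder /out/ /usr/local/bin/lakehouse/
USER lakehouse
ENTRYPOINT ["/usr/bin/tini", "--"]
# No CMD: compose (or the operator) names the daemon to run.
```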
docker-compose.yml (11 services):
- Same dependency graph as deploy/systemd/. depends_on with
service_healthy condition matches Requires= equivalents:
catalogd → storaged
ingestd → storaged + catalogd
queryd → catalogd
matrixd → embedd + vectord
- Gateway uses bare depends_on (no health condition) — Wants=
equivalent so single-upstream restart doesn't cascade.
- chatd has per-provider env_file entries (one each for
ollama_cloud, openrouter, opencode, kimi) — missing files are
silently OK, matching the systemd unit's EnvironmentFile=- list.
- Persistent state on the lakehouse-state named volume; commented
driver_opts shows how to bind to a host path for off-volume
backups.
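In compose terms, the graph and env-file behavior look roughly like this (fragment for illustration only; healthcheck commands and timings are assumptions, the repo's docker-compose.yml is authoritative, and `env_file` entries with `required: false` need Compose v2.24+):

```yaml
services:
  storaged:
    image: lakehouse-go:latest
    command: ["/usr/local/bin/lakehouse/storaged"]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:3211/health"]
      interval: 5s
      retries: 6
  catalogd:
    image: lakehouse-go:latest
    command: ["/usr/local/bin/lakehouse/catalogd"]
    depends_on:
      storaged:
        condition: service_healthy   # Requires= equivalent
  chatd:
    image: lakehouse-go:latest
    command: ["/usr/local/bin/lakehouse/chatd"]
    env_file:
      - path: ./openrouter.env
        required: false              # missing file OK, like EnvironmentFile=-
```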
.dockerignore:
- Excludes bin/ + reports/ + data/ + git metadata + .env files.
- Especially excludes lakehouse.toml/secrets-go.toml/auth.env so
local dev configs don't accidentally bake into a published image.
REPLICATION.md gains a Docker section between systemd setup and
the logs section. Ten-line copy-paste from "git clone" to
"docker compose up -d", plus a docker-vs-systemd differences
table covering process supervision, logs, restart policy, file
ownership, host networking quirks, and backup targets.
Validation: docker compose config --quiet → exit 0 (with
placeholder env files in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Lakehouse-Go — Replication Runbook

How to deploy Lakehouse-Go onto a fresh Linux host. Mirrors the layout
the dev box uses; covers prereqs, secrets, systemd units, validation.

## Prereqs

The host needs these external services reachable BEFORE the Lakehouse
daemons can usefully start. None are managed by Lakehouse-Go's own
units; they're operator infrastructure.

| Service | Purpose | Reachability |
|---|---|---|
| **Go 1.25+** | builds the binaries | `go version` returns ≥ 1.25 |
| **gcc** | DuckDB cgo (queryd) | `gcc --version` |
| **MinIO** (or AWS S3) | storaged backing store | `curl http://localhost:9000/minio/health/live` returns 200; bucket `lakehouse-go-primary` exists |
| **Ollama** | embedd + chatd LLM dispatch | `curl http://localhost:11434/api/tags` returns 200 with `nomic-embed-text-v2-moe` (or whatever `[embedd].default_model` names) loaded |
| **Langfuse** *(optional)* | trace + span observability | `curl http://localhost:3001/api/public/health` returns 200 |
| **PostgreSQL** *(optional)* | only if Langfuse is wanted | bundled with the Langfuse docker compose |

Bind ports the daemons use (G0 dev defaults; shifted by 10 from the
Rust legacy on 3100/3201–3204 so both stacks coexist):

| Daemon | Port |
|---|---:|
| gateway | 3110 |
| storaged | 3211 |
| catalogd | 3212 |
| ingestd | 3213 |
| queryd | 3214 |
| vectord | 3215 |
| embedd | 3216 |
| pathwayd | 3217 |
| matrixd | 3218 |
| observerd | 3219 |
| chatd | 3220 |

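A quick preflight that probes each row of the table above (a sketch using the dev-default endpoints; adjust the URLs to where your services actually live):

```shell
# Probe one dependency; print OK or FAIL without aborting the script.
check() {
  name=$1; url=$2
  if curl -fsS --max-time 2 "$url" > /dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name ($url)"
  fi
}

check "MinIO"    "http://localhost:9000/minio/health/live"
check "Ollama"   "http://localhost:11434/api/tags"
check "Langfuse" "http://localhost:3001/api/public/health"   # optional

# Toolchain checks from the same table.
command -v go  > /dev/null && go version || echo "FAIL go toolchain"
command -v gcc > /dev/null && echo "OK   gcc" || echo "FAIL gcc"
```
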
## Bootstrap

### 1. User + directories

```bash
sudo useradd --system --no-create-home --shell /usr/sbin/nologin lakehouse
sudo mkdir -p /var/lib/lakehouse/{pathway,observer} /var/log/lakehouse \
  /usr/local/bin/lakehouse /etc/lakehouse
sudo chown -R lakehouse:lakehouse /var/lib/lakehouse /var/log/lakehouse
```

### 2. Build + install binaries

From a clone of the repo:

```bash
git clone https://git.agentview.dev/profit/golangLAKEHOUSE.git
cd golangLAKEHOUSE
just verify                  # vet + tests + 9 core smokes — ~31s
go build -o bin/ ./cmd/...   # 11 binaries land in ./bin/
sudo cp bin/{gateway,storaged,catalogd,ingestd,queryd,vectord,embedd,pathwayd,observerd,matrixd,chatd} /usr/local/bin/lakehouse/
sudo chmod 755 /usr/local/bin/lakehouse/*
```

### 3. Config + secrets

```bash
# Main config — edit ports/URLs/model tier as needed
sudo cp lakehouse.toml /etc/lakehouse/lakehouse.toml

# S3 credentials — fill in real keys
sudo cp deploy/etc-lakehouse/secrets-go.toml.example /etc/lakehouse/secrets-go.toml
sudo chown root:lakehouse /etc/lakehouse/secrets-go.toml
sudo chmod 0640 /etc/lakehouse/secrets-go.toml
sudo $EDITOR /etc/lakehouse/secrets-go.toml   # set [s3.primary] keys

# Auth token — required ONLY if any daemon binds non-loopback
sudo cp deploy/etc-lakehouse/auth.env.example /etc/lakehouse/auth.env
sudo chown root:lakehouse /etc/lakehouse/auth.env
sudo chmod 0640 /etc/lakehouse/auth.env
# For non-loopback deploys, set:
#   AUTH_TOKEN=<generate via `openssl rand -hex 32`>
sudo $EDITOR /etc/lakehouse/auth.env

# Optional: chatd cloud provider keys, one file per provider
# (each is its own EnvironmentFile so rotations don't restart all chatd)
for provider in ollama_cloud openrouter opencode kimi; do
  echo "${provider^^}_API_KEY=" | sudo tee /etc/lakehouse/$provider.env > /dev/null
  sudo chown root:lakehouse /etc/lakehouse/$provider.env
  sudo chmod 0640 /etc/lakehouse/$provider.env
done
sudo $EDITOR /etc/lakehouse/openrouter.env   # etc. per provider you need
```

### 4. systemd units

```bash
sudo cp deploy/systemd/*.service deploy/systemd/*.target /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable lakehouse-go.target
sudo systemctl start lakehouse-go.target
```

### 5. Validation

```bash
# All 11 daemons should be active
systemctl status 'lakehouse-*.service' --no-pager | grep -E "Active|●"

# Health endpoints respond on each port
for port in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
  printf "%5d: " "$port"
  curl -sS --max-time 2 "http://127.0.0.1:$port/health" || echo "FAIL"
done

# Through the gateway: all chatd providers register (cloud keys present)
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# End-to-end: ingest a tiny CSV → queryd SELECT → matrix.search
echo -e "id,name,role\n1,Alice,Forklift Operator" > /tmp/probe.csv
curl -sS -F "file=@/tmp/probe.csv" "http://127.0.0.1:3110/v1/ingest?name=probe"
curl -sS -X POST http://127.0.0.1:3110/v1/sql \
  -H 'content-type: application/json' \
  -d '{"sql":"SELECT COUNT(*) FROM probe"}' | jq
```

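Right after `systemctl start`, daemons are still coming up in dependency order, so one-shot probes can fail spuriously. A small retry wrapper (a sketch; the `/health` path matches the loop above):

```shell
# Poll one health URL until it answers, or give up after N attempts.
wait_healthy() {
  url=$1 tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    curl -fsS --max-time 2 "$url" > /dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage — gate the rest of the validation on the whole stack being up:
#   for port in 3110 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220; do
#     wait_healthy "http://127.0.0.1:$port/health" || echo "$port: not healthy"
#   done
```
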
## Auth posture

Per ADR-006:

- **Loopback-only deploy** (every daemon binds 127.0.0.1): no auth needed. Empty `AUTH_TOKEN` is fine. Network is the boundary.
- **Non-loopback deploy** (gateway exposed beyond loopback, daemons internal-private): set `AUTH_TOKEN` in `/etc/lakehouse/auth.env`. The mechanical gate at startup refuses to bind without one.
- **Multi-host deploy** (gateway + daemons on separate machines): set `AUTH_TOKEN` *and* `[auth].allowed_ips` in lakehouse.toml to the gateway's address. Both layers gate.
- **TLS**: terminate at nginx/Caddy in front of the gateway. The Go daemons speak HTTP; in-process TLS is explicitly out of scope per ADR-006 Decision 6.6.

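To audit which posture a host is actually in, list the listening sockets on the Lakehouse ports and flag any non-loopback bind (a sketch; the ports are the G0 dev defaults from the table above):

```shell
# Classify listen addresses read from stdin as loopback or exposed.
classify() {
  while read -r addr; do
    case "$addr" in
      127.*|"[::1]"*) echo "loopback $addr" ;;
      *)              echo "EXPOSED  $addr" ;;
    esac
  done
}

# Feed it the local listeners on the gateway + daemon ports.
ss -ltnH 2>/dev/null \
  | awk '{print $4}' \
  | grep -E ':(3110|321[1-9]|3220)$' \
  | classify
```
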
## Token rotation

Per ADR-006 Decision 6.5 — dual-token window:

```bash
# 1. Generate new token
NEW=$(openssl rand -hex 32)

# 2. Add as secondary, keep old as primary
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target

# 3. Update every caller to use NEW token
# 4. Promote: NEW becomes primary, secondary clears
sudo sed -i "s|^AUTH_TOKEN=.*|AUTH_TOKEN=$NEW|" /etc/lakehouse/auth.env
sudo sed -i "s|^AUTH_SECONDARY_TOKEN=.*|AUTH_SECONDARY_TOKEN=|" /etc/lakehouse/auth.env
sudo systemctl restart lakehouse-go.target
```

## Docker / docker-compose deploy (alternative to systemd)

The single-image `Dockerfile` carries all 11 daemons; `docker-compose.yml`
runs one container per daemon with the same dependency graph as the
systemd units. Useful when the host doesn't have systemd (Mac dev
boxes, remote VMs without root) or when you want all of Lakehouse-Go
isolated to a private docker network.

```bash
# Build the image (multi-stage; ~3 min on first build, ~30s with
# cached go module download).
docker build -t lakehouse-go:latest .

# Place config + secrets next to docker-compose.yml. The compose file
# bind-mounts these into every container at /etc/lakehouse/.
# lakehouse.toml is already in the repo root; edit it if needed.
cp deploy/etc-lakehouse/secrets-go.toml.example secrets-go.toml
chmod 0600 secrets-go.toml
cp deploy/etc-lakehouse/auth.env.example auth.env
chmod 0600 auth.env
# Per-provider chatd keys (each its own file, so a missing file means
# the provider is unregistered, NOT a chatd startup failure):
for p in ollama_cloud openrouter opencode kimi; do
  echo "${p^^}_API_KEY=" > $p.env
  chmod 0600 $p.env
done

# $EDITOR each file to fill in real values...

# Bring up the stack.
docker compose up -d
docker compose ps              # all 11 services Healthy
docker compose logs -f gateway

# Validate via the gateway like the systemd path.
curl -sS http://127.0.0.1:3110/v1/chat/providers | jq

# Tear down.
docker compose down
# State volume (pathway/observer JSONLs) survives `down`. To wipe:
docker compose down -v
```

### Key docker-vs-systemd differences

| Concern | systemd | docker-compose |
|---|---|---|
| Process supervision | systemd | tini + docker daemon |
| Logs | journald | `docker logs` (or routed to a sink via logging driver) |
| Restarts on failure | `Restart=on-failure` | `restart: unless-stopped` |
| File ownership | `User=lakehouse` (uid varies) | `user: 999:999` (uid is fixed in the image) |
| Reaches MinIO/Ollama | host network | host's address from inside the bridge network — typically `host.docker.internal` (Mac/Win) or `172.17.0.1` (Linux). Set `[s3].endpoint` + `[embedd].provider_url` accordingly. |
| Backup target | `/var/lib/lakehouse/` on host | the `lakehouse-state` named volume; bind to a host path via the commented-out `driver_opts` in compose if needed |

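On Linux specifically, a small helper for the endpoint rewrite that last table row describes (a sketch: 172.17.0.1 is the stock default-bridge gateway, which you can confirm with `docker network inspect bridge`, and the URL patterns assume the dev-default MinIO/Ollama ports from the prereqs table):

```shell
# Rewrite host-local MinIO/Ollama endpoints to an address containers can
# reach. Reads a config on stdin, writes the rewritten config to stdout.
rewrite_endpoints() {
  host=${1:-172.17.0.1}   # Linux default-bridge gateway
  sed -e "s|http://localhost:9000|http://${host}:9000|g" \
      -e "s|http://localhost:11434|http://${host}:11434|g"
}

# Usage:
#   rewrite_endpoints < lakehouse.toml > lakehouse.docker.toml
#   (Mac/Windows: rewrite_endpoints host.docker.internal < lakehouse.toml)
```
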
## Logs

systemd routes everything to journald with per-daemon SyslogIdentifier:

```bash
journalctl -u lakehouse-gateway.service -f
journalctl -u 'lakehouse-*.service' --since '5 min ago'
```

## Stopping

```bash
sudo systemctl stop lakehouse-go.target   # cascades to all 11 daemons
```

## Backup / state preservation

| Path | What | Backup priority |
|---|---|---|
| `/var/lib/lakehouse/pathway/state.jsonl` | Mem0 trace store (append-only) | high |
| `/var/lib/lakehouse/observer/ops.jsonl` | observer ring's persistor backup | medium |
| MinIO `lakehouse-go-primary` bucket | parquets, vector LHV1 indexes, catalog manifests | high |
| `/etc/lakehouse/lakehouse.toml` | service config | medium |
| `/etc/lakehouse/secrets-go.toml` + `*.env` | secrets | high (in your secrets manager, not on disk) |

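A sketch of the on-disk half of that table as a script (the function name and argument layout are illustrative; mirror the MinIO bucket separately with your own S3 tooling, e.g. `mc mirror` if the MinIO client is installed):

```shell
# Copy the JSONL state stores and the service config to a backup dir.
# Append-only JSONL files are safe to copy while daemons run.
backup_state() {
  state_dir=$1 etc_dir=$2 dest=$3
  mkdir -p "$dest"
  for f in pathway/state.jsonl observer/ops.jsonl; do
    [ -f "$state_dir/$f" ] && mkdir -p "$dest/$(dirname "$f")" \
      && cp "$state_dir/$f" "$dest/$f"
  done
  # Config only; secrets belong in a secrets manager, not this copy.
  [ -f "$etc_dir/lakehouse.toml" ] && cp "$etc_dir/lakehouse.toml" "$dest/"
  return 0
}

# Usage:
#   backup_state /var/lib/lakehouse /etc/lakehouse \
#     /srv/backup/lakehouse-$(date +%Y%m%d)
```
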
## Troubleshooting

**Daemon refuses to start with "refuse non-loopback bind without auth.token"**
ADR-006 6.1 mechanical gate. Set `AUTH_TOKEN` in `/etc/lakehouse/auth.env` or bind back to loopback.

**Daemon refuses to start with "refusing non-loopback bind ... see audit R-001"**
The older loopback-bind gate. For dev, `LH_<NAME>_ALLOW_NONLOOPBACK=1` overrides it. For prod, set `AUTH_TOKEN` AND keep the override (or move to loopback + reverse proxy).

**catalogd 500 / NoSuchBucket**
storaged is pointing at a bucket that doesn't exist. Either create the bucket in MinIO or fix `[s3].bucket` in lakehouse.toml.

**embedd 502 on /v1/embed**
Ollama isn't running, or `[embedd].default_model` isn't loaded. Run `ollama list` to verify; `ollama pull nomic-embed-text-v2-moe` to load it.

**chatd `/v1/chat/providers` shows `false` for cloud providers**
The provider's env file is missing or empty. Check `/etc/lakehouse/<provider>.env`.

**queryd unable to read parquet**
Check that `[queryd].secrets_path` points at the right secrets-go.toml AND that the file's owner + mode let the lakehouse user read it.

## Related docs

- `STATE_OF_PLAY.md` — verified-working snapshot
- `docs/DECISIONS.md` — all ADRs, especially ADR-003 (auth substrate) + ADR-006 (auth posture)
- `docs/SPEC.md` §1 — component table