golangLAKEHOUSE/Dockerfile
root 54a05d9311 Sprint 4 deployment artifacts: Dockerfile + docker-compose
Parallel deploy target to the systemd units that landed in a59ef5b.
Single image carries all 11 daemons; docker-compose runs one
container per daemon with the same dependency graph as the systemd
units. Useful when systemd isn't available (Mac dev, remote VMs
without root) or when isolation to a private docker network is
preferred.

Dockerfile (multi-stage):
- Builder: golang:1.25-bookworm. DuckDB cgo needs gcc + glibc;
  alpine's musl doesn't link the official duckdb-go bindings cleanly.
- Runtime: debian:bookworm-slim — same libc, much smaller surface.
  Adds ca-certificates (outbound HTTPS to OpenRouter/OpenCode/Kimi),
  curl + jq (in-container healthchecks + smoke probes), tini (PID 1
  signal forwarding so docker stop sends SIGTERM to the daemon, not
  to a wrapper).
- Single image, multiple binaries. Ships all 11 cmd/* + 3 scripts/
  (staffing_workers, playbook_lift, multi_coord_stress) so deployed
  stacks can run reality tests against themselves.
- Non-root runtime user (uid 999 lakehouse). Layout matches
  /usr/local/bin/lakehouse/<daemon> from REPLICATION.md.
- ENTRYPOINT=tini; no default CMD — operators / compose pick
  which daemon explicitly.
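
As a rough illustration of the "no default CMD" design, the sketch below wraps the build and run commands from the Dockerfile header in two shell functions. The names `build_image`, `run_daemon`, and `IMAGE` are hypothetical helpers, not part of the repository, and `--rm` is added here only for convenience. The image tag and binary path are the ones stated above.

```shell
# Hypothetical helpers (not in the repo) around the documented commands.
IMAGE="lakehouse-go:latest"

# Build the single image that carries every daemon binary.
build_image() {
    docker build -t "$IMAGE" .
}

# Run one daemon. The image has no default CMD, so the binary path
# must be passed explicitly; tini (the ENTRYPOINT) launches it and
# forwards signals to it.
run_daemon() {
    daemon="$1"
    docker run --rm "$IMAGE" "/usr/local/bin/lakehouse/$daemon"
}
```

For example, `run_daemon gateway` starts only the gateway; calling `docker run` on the image with no arguments starts nothing useful, which is the intended behaviour.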

docker-compose.yml (11 services):
- Same dependency graph as deploy/systemd/. depends_on with
  service_healthy condition matches Requires= equivalents:
    catalogd → storaged
    ingestd → storaged + catalogd
    queryd → catalogd
    matrixd → embedd + vectord
- Gateway uses bare depends_on (no health condition), the Wants=
  equivalent, so restarting a single upstream doesn't cascade.
- chatd has per-provider env_file entries (one each for
  ollama_cloud, openrouter, opencode, kimi) — missing files are
  silently OK, matching the systemd unit's EnvironmentFile=- list.
- Persistent state on the lakehouse-state named volume; commented
  driver_opts shows how to bind to a host path for off-volume
  backups.
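
The health-gated edges above imply a startup order. A minimal sketch, assuming only the four dependencies listed in this message (the remaining daemons and the gateway's ungated depends_on are left out), uses coreutils `tsort` to print one valid order. `startup_order` is a hypothetical helper name.

```shell
# Each input line is "<dependency> <dependent>": tsort prints the
# left-hand name somewhere before the right-hand one.
startup_order() {
    tsort <<'EOF'
storaged catalogd
storaged ingestd
catalogd ingestd
catalogd queryd
embedd matrixd
vectord matrixd
EOF
}
```

Calling `startup_order` prints one daemon name per line. Several orders are valid, so only the relative positions (storaged before catalogd, embedd and vectord before matrixd, and so on) are meaningful.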

.dockerignore:
- Excludes bin/ + reports/ + data/ + git metadata + .env files.
- Especially excludes lakehouse.toml/secrets-go.toml/auth.env so
  local dev configs don't accidentally bake into a published image.
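
One way to guard the exclusion rule above is a small pre-build check. This is a sketch under the assumption that each sensitive file appears in .dockerignore as its own exact line; the real file may use globs or directory prefixes instead. `check_dockerignore` is a hypothetical helper.

```shell
# Return non-zero if any sensitive config name is missing from the
# given ignore file. The three names come from the commit message.
check_dockerignore() {
    file="$1"
    status=0
    for name in lakehouse.toml secrets-go.toml auth.env; do
        # -q: quiet, -x: whole-line match, -F: literal string
        if ! grep -qxF "$name" "$file"; then
            echo "missing from $file: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `check_dockerignore .dockerignore` before `docker build` would then fail loudly if one of the entries had been dropped.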

REPLICATION.md gains a Docker section between systemd setup and
the logs section. Ten-line copy-paste from "git clone" to
"docker compose up -d", plus a docker-vs-systemd differences
table covering process supervision, logs, restart policy, file
ownership, host networking quirks, and backup targets.

Validation: docker compose config --quiet → exit 0 (with
placeholder env files in place).
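
The validation step can be reproduced roughly as follows. The env file names are placeholders invented for illustration (this message names the four providers but not the file paths), and `make_placeholder_envs` and `validate_compose` are hypothetical helpers.

```shell
# Create empty placeholder env files, one per chatd provider.
# ASSUMPTION: the "<provider>.env" naming is illustrative only.
make_placeholder_envs() {
    dir="$1"
    mkdir -p "$dir"
    for provider in ollama_cloud openrouter opencode kimi; do
        # touch leaves any existing file's contents untouched
        touch "$dir/$provider.env"
    done
}

# Parse and validate the compose file without printing the resolved
# config. Exit status 0 means the file is valid.
validate_compose() {
    docker compose config --quiet
}
```

`validate_compose` is meant to be run from the directory that holds docker-compose.yml, after the placeholder files exist wherever the compose file expects them.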

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:58:47 -05:00


# syntax=docker/dockerfile:1.6
#
# Multi-stage Dockerfile for Lakehouse-Go.
#
# Single image carries all 11 daemon binaries; docker-compose runs
# one container per daemon (matches the systemd unit topology in
# deploy/systemd/). Operators can also `docker run lakehouse-go
# /usr/local/bin/lakehouse/<daemon>` to invoke any one daemon
# directly.
#
# Builder uses golang:1.25-bookworm (DuckDB cgo needs gcc + glibc;
# alpine's musl doesn't link the official duckdb-go bindings cleanly).
# Runtime is debian:bookworm-slim — same libc, much smaller surface.
#
# Build:
#   docker build -t lakehouse-go:latest .
# Or tagged with the current commit:
#   docker build -t lakehouse-go:$(git rev-parse --short HEAD) .
# ── Stage 1: builder ────────────────────────────────────────────
FROM golang:1.25-bookworm AS builder
# build-essential pulls gcc + make + libc-dev — DuckDB cgo needs all three.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /src
# Copy go.mod + go.sum first so module download is cacheable across
# source-only changes.
COPY go.mod go.sum ./
RUN go mod download
# Source.
COPY . .
# Build all 11 daemon binaries plus the three scripts
# (staffing_workers, playbook_lift, multi_coord_stress). They ship
# in the same image so operators can run reality tests against a
# deployed stack.
RUN go build -trimpath -o /out/ \
    ./cmd/storaged ./cmd/catalogd ./cmd/ingestd ./cmd/queryd \
    ./cmd/embedd ./cmd/vectord ./cmd/pathwayd ./cmd/observerd \
    ./cmd/matrixd ./cmd/gateway ./cmd/chatd \
    ./scripts/staffing_workers ./scripts/playbook_lift ./scripts/multi_coord_stress
# ── Stage 2: runtime ────────────────────────────────────────────
FROM debian:bookworm-slim
# CA certs for outbound HTTPS (Ollama Cloud, OpenRouter, OpenCode,
# Kimi). curl + jq for in-container health checks + smoke probes.
# tini handles PID 1 signal forwarding so docker stop sends SIGTERM
# to the actual daemon, not just to a wrapper.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        jq \
        tini \
    && rm -rf /var/lib/apt/lists/*
# Non-root runtime user — same name as the systemd User= directive
# in deploy/systemd/, so file ownership stays consistent across
# deployment modes (docker-compose vs systemd).
RUN groupadd --system --gid 999 lakehouse \
    && useradd --system --uid 999 --gid 999 \
        --no-create-home --shell /usr/sbin/nologin lakehouse
# Layout matches /usr/local/bin/lakehouse/<daemon> from REPLICATION.md
# so docs apply equally to systemd + docker deployments.
COPY --from=builder /out/* /usr/local/bin/lakehouse/
# /var/lib/lakehouse for pathway/observer JSONLs; /var/log/lakehouse
# in case operators want file logs in addition to docker logs.
RUN mkdir -p /var/lib/lakehouse/pathway /var/lib/lakehouse/observer /var/log/lakehouse \
    && chown -R lakehouse:lakehouse /var/lib/lakehouse /var/log/lakehouse
USER lakehouse
WORKDIR /var/lib/lakehouse
# No default CMD — operators (or docker-compose) MUST specify which
# daemon. Forces explicit topology rather than implicit "run
# everything in one container."
ENTRYPOINT ["/usr/bin/tini", "--"]
# Default healthcheck targets the gateway's port (3110). Each compose
# service overrides it with a check against its own port.
HEALTHCHECK --interval=10s --timeout=2s --start-period=5s --retries=3 \
    CMD curl -sSf http://127.0.0.1:3110/health || exit 1