ADR-003 wiring: Bearer token + IP allowlist middleware

Implements the auth posture from ADR-003 (commit 0d18ffa). Two
independent layers — Bearer token (constant-time compare via
crypto/subtle) and IP allowlist (CIDR set) — composed in shared.Run
so every binary inherits the same gate without per-binary wiring.

Together with the bind-gate from commit 6af0520, this mechanically
closes audit risks R-001 + R-007:
  - non-loopback bind without auth.token = startup refuse
  - non-loopback bind WITH auth.token + override env = allowed
  - loopback bind = all gates open (G0 dev unchanged)
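
The three cases above can be sketched as one decision function. This is a hypothetical condensation, not the code in this commit: the name startupGate and its boolean inputs are assumptions standing in for the two real checks (the bind gate, then the auth gate), applied in that order.

```go
package main

import (
	"errors"
	"fmt"
)

// startupGate is a hypothetical condensation of the two startup checks.
// loopback: the bind address is on the loopback interface.
// override: the per-service allow-non-loopback env var is set.
// token:    the configured auth token ("" means unset).
func startupGate(loopback, override bool, token string) error {
	if loopback {
		return nil // G0 dev: both gates pass
	}
	if !override {
		return errors.New("refuse: non-loopback bind without override env")
	}
	if token == "" {
		return errors.New("refuse: non-loopback bind without auth.token")
	}
	return nil
}

func main() {
	cases := []struct {
		loopback, override bool
		token              string
	}{
		{true, false, ""},
		{false, false, "secret123"},
		{false, true, ""},
		{false, true, "secret123"},
	}
	for _, c := range cases {
		err := startupGate(c.loopback, c.override, c.token)
		fmt.Printf("loopback=%v override=%v tokenSet=%v -> %v\n",
			c.loopback, c.override, c.token != "", err)
	}
}
```

Only the loopback case and the override-plus-token case start; a token alone does not bypass the bind gate.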

internal/shared/auth.go (NEW)
  RequireAuth(cfg AuthConfig) returns chi-compatible middleware.
  Empty Token + empty AllowedIPs → pass-through (G0 dev mode).
  Token-only → 401 Bearer mismatch.
  AllowedIPs-only → 403 source IP not in CIDR set.
  Both → both gates apply.
  /health bypasses both layers (load-balancer / liveness probes
  shouldn't carry tokens).

  CIDRs are parsed once at boot; a bare IP (no /N) is treated as /32
  (or /128 for IPv6). Invalid entries are logged at warn level and
  dropped: fail-loud-but-not-fatal, so a typo doesn't kill the binary.
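
A minimal, self-contained sketch of that parsing rule, for illustration. The function name parseAllowlist and its second return value are assumptions; the middleware in this commit logs the bad entry instead of returning it.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseAllowlist (hypothetical helper) turns allowlist entries into
// networks. A bare IP becomes a single-host CIDR; entries that fail to
// parse are collected in skipped rather than aborting.
func parseAllowlist(entries []string) (nets []*net.IPNet, skipped []string) {
	for _, raw := range entries {
		cidr := raw
		if !strings.Contains(cidr, "/") {
			if strings.Contains(cidr, ":") {
				cidr += "/128" // bare IPv6 address
			} else {
				cidr += "/32" // bare IPv4 address
			}
		}
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			skipped = append(skipped, raw)
			continue
		}
		nets = append(nets, n)
	}
	return nets, skipped
}

func main() {
	nets, skipped := parseAllowlist([]string{
		"10.0.0.0/8", "127.0.0.1", "::1", "not-a-cidr",
	})
	for _, n := range nets {
		fmt.Println(n.String())
	}
	fmt.Println("skipped:", skipped)
}
```

One consequence worth keeping in mind: the middleware only applies the IP layer when at least one entry parsed, so an allowlist made up entirely of invalid entries leaves that layer inactive.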

  Token comparison: subtle.ConstantTimeCompare on the full
  "Bearer <token>" wire-format string. A length mismatch returns 0
  immediately (per the stdlib docs), so wrong-length tokens are
  rejected; that reveals only that the length differs, not any token
  bytes. The expected header is pre-encoded once and stored in the
  middleware closure, leaving one allocation per request (the []byte
  conversion of the incoming header).
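
The comparison can be tried in isolation with a sketch like the following. The helper name bearerMatches is an assumption made for this example; only subtle.ConstantTimeCompare is the real stdlib call.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// bearerMatches (hypothetical helper) reports whether the raw
// Authorization header value equals the precomputed "Bearer <token>"
// bytes. ConstantTimeCompare returns 1 only for equal length and
// equal contents.
func bearerMatches(header string, expected []byte) bool {
	return subtle.ConstantTimeCompare([]byte(header), expected) == 1
}

func main() {
	// Computed once, outside the per-request path.
	expected := []byte("Bearer " + "secret123")

	for _, h := range []string{
		"Bearer secret123",   // exact match
		"Bearer wrong-token", // different length
		"Bearer secret124",   // same length, different content
		"secret123",          // raw token without the prefix
		"",                   // header missing
	} {
		fmt.Printf("%q -> %v\n", h, bearerMatches(h, expected))
	}
}
```

Comparing the whole header, prefix included, means a raw token without "Bearer " can never match.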

  Source-IP extraction uses net.SplitHostPort on RemoteAddr and falls
  back to RemoteAddr as-is for httptest compatibility. X-Forwarded-For
  support is a follow-up for when a trusted proxy fronts the gateway
  (config knob TBD per ADR-003 §"Future").
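
As a rough illustration of the extraction and its fallback, the sketch below takes the address as a plain string. The name sourceIP is hypothetical; the function in this commit takes the *http.Request instead.

```go
package main

import (
	"fmt"
	"net"
)

// sourceIP (hypothetical helper) extracts the IP from a RemoteAddr-style
// string. It tries "host:port" first, then falls back to treating the
// whole string as a bare IP. A nil result means "unparseable", which an
// allowlist check should treat as not allowed.
func sourceIP(remoteAddr string) net.IP {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr
	}
	return net.ParseIP(host)
}

func main() {
	for _, addr := range []string{
		"10.1.2.3:54321", // usual ip:port shape
		"[::1]:8080",     // bracketed IPv6 with port
		"10.1.2.3",       // bare IP, takes the fallback path
		"garbage",        // unparseable
	} {
		fmt.Printf("%q -> %v\n", addr, sourceIP(addr))
	}
}
```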

internal/shared/server.go
  Run signature: gained AuthConfig parameter (4th arg).
  /health stays mounted on the outer router (public).
  Registered routes go inside chi.Group with RequireAuth applied —
  empty config = transparent group.
  Added requireAuthOnNonLoopback startup check: non-loopback bind
  with empty Token = refuse to start (cites R-001 + R-007 by name).
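
The routing shape (public /health outside, everything else behind the gate) can be sketched with the standard library alone. This is a stand-in, not the chi wiring from this commit: compose, denyAll, ok, and status are hypothetical names, and http.ServeMux replaces chi.Group here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// denyAll is a stand-in gate that rejects every request, which makes it
// easy to see which routes sit behind the gate.
func denyAll(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	})
}

// ok returns a handler that writes body with the default 200 status.
func ok(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		_, _ = w.Write([]byte(body))
	})
}

// compose mounts /health on the outer mux and wraps all other routes
// in the gate.
func compose(gate func(http.Handler) http.Handler, routes http.Handler) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/health", ok("ok"))
	mux.Handle("/", gate(routes))
	return mux
}

// status serves one in-memory GET and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := compose(denyAll, ok("private"))
	fmt.Println("/health:", status(h, "/health"))
	fmt.Println("/data:  ", status(h, "/data"))
}
```

Because the gate wraps the route set rather than individual handlers, a newly registered route is gated by default.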

internal/shared/config.go
  AuthConfig type added with TOML tags. Fields: Token, AllowedIPs.
  Composed into Config under [auth].

cmd/<svc>/main.go × 7 (catalogd, embedd, gateway, ingestd, queryd,
storaged, vectord; mcpd is unaffected because stdio doesn't bind a port)
  Each call site adds cfg.Auth as the 4th arg to shared.Run. No
  other changes: middleware applies via shared.Run uniformly.

internal/shared/auth_test.go (12 test funcs)
  Empty config pass-through, missing-token 401, wrong-token 401,
  correct-token 200, raw-token-without-Bearer-prefix 401, /health
  always public, IP allowlist allow + reject, bare IP /32, both
  layers when both configured, invalid CIDR drop-with-warn, RemoteAddr
  shape extraction. The constant-time comparison is verified by
  inspection (comments in auth.go); the missing-, wrong-, and raw-token
  tests all exercise its length-mismatch path.

Verified:
  go test -count=1 ./internal/shared/  — all green (was 21, now 33 funcs)
  just verify                            — vet + test + 9 smokes 33s
  just proof contract                    — 53/0/1 unchanged

Smokes + proof harness keep working without any token configuration:
default Auth is empty struct → middleware is no-op → existing tests
pass unchanged. To exercise the gate, operators set [auth].token in
lakehouse.toml (or, per the "future" note in the ADR, via env var).
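
On the caller side, a request has to carry the token in the same wire format the middleware compares against. A minimal sketch, assuming a hypothetical helper name (newAuthedRequest) and a placeholder address and token:

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthedRequest (hypothetical helper) builds a request whose
// Authorization header is "Bearer <token>", matching the string the
// middleware compares in full.
func newAuthedRequest(method, url, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// The address and token below are placeholders for illustration.
	req, err := newAuthedRequest(http.MethodGet, "http://127.0.0.1:3110/v1/embed", "secret123")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```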

Closes audit findings:
  R-001 HIGH — fully mechanically closed (was: partial via bind gate)
  R-007 MED  — fully mechanically closed (was: design-only ADR-003)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-29 07:11:34 -05:00
parent 8f4c16fab1
commit fa56134b90
12 changed files with 411 additions and 13 deletions


@@ -49,7 +49,7 @@ func main() {
 	h := newHandlers(registry)
-	if err := shared.Run("catalogd", cfg.Catalogd.Bind, h.register); err != nil {
+	if err := shared.Run("catalogd", cfg.Catalogd.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -56,7 +56,7 @@ func main() {
 		"enabled", cfg.Embedd.CacheSize > 0)
 	h := &handlers{provider: cached, cache: cached}
-	if err := shared.Run("embedd", cfg.Embedd.Bind, h.register); err != nil {
+	if err := shared.Run("embedd", cfg.Embedd.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -72,6 +72,7 @@ func main() {
 	embeddProxy := gateway.NewProxyHandler(embeddURL)
 	if err := shared.Run("gateway", cfg.Gateway.Bind, func(r chi.Router) {
 		// Storage / catalog have multi-segment paths under their
 		// prefix (e.g. /v1/storage/get/<key>). chi's `*` wildcard
 		// captures the rest of the path.
@@ -87,7 +88,7 @@ func main() {
 		r.Handle("/v1/vectors/*", vectordProxy)
 		// Embedding service — /v1/embed
 		r.Handle("/v1/embed", embeddProxy)
-	}); err != nil {
+	}, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -65,7 +65,7 @@ func main() {
 		maxBytes: maxBytes,
 	}
-	if err := shared.Run("ingestd", cfg.Ingestd.Bind, h.register); err != nil {
+	if err := shared.Run("ingestd", cfg.Ingestd.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -99,7 +99,7 @@ func main() {
 	h := &handlers{db: db}
-	if err := shared.Run("queryd", cfg.Queryd.Bind, h.register); err != nil {
+	if err := shared.Run("queryd", cfg.Queryd.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -78,7 +78,7 @@ func main() {
 	h := newHandlers(registry)
-	if err := shared.Run("storaged", cfg.Storaged.Bind, h.register); err != nil {
+	if err := shared.Run("storaged", cfg.Storaged.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}


@@ -62,7 +62,7 @@ func main() {
 		}
 	}
-	if err := shared.Run("vectord", cfg.Vectord.Bind, h.register); err != nil {
+	if err := shared.Run("vectord", cfg.Vectord.Bind, h.register, cfg.Auth); err != nil {
 		slog.Error("server", "err", err)
 		os.Exit(1)
 	}

internal/shared/auth.go Normal file

@@ -0,0 +1,129 @@
// auth.go — inter-service auth middleware per ADR-003.
//
// Two layers, each independently configurable:
//   - Bearer token (constant-time compare via crypto/subtle)
//   - IP allowlist (CIDR set; bare IPs treated as /32)
//
// /health is exempt from both layers (load balancers + monitors need
// it open; the route doesn't expose anything sensitive).
//
// When both Token and AllowedIPs are empty, RequireAuth returns a
// pass-through that does no work — preserves G0 dev-mode behavior
// where every binary binds 127.0.0.1 and the network is the auth
// layer.
//
// The non-loopback-bind + empty-token coupling is enforced at
// startup in shared.Run, not in the middleware — the middleware
// only sees per-request auth, not the bind config. Together they
// make the audit's worst case (R-001 + R-007: queryd /sql RCE-eq
// off-loopback with no auth) mechanically impossible.
package shared

import (
	"crypto/subtle"
	"log/slog"
	"net"
	"net/http"
	"strings"
)

// RequireAuth returns a chi-compatible middleware that enforces
// the configured AuthConfig. Empty config returns a pass-through.
func RequireAuth(cfg AuthConfig) func(http.Handler) http.Handler {
	tokenSet := cfg.Token != ""
	if !tokenSet && len(cfg.AllowedIPs) == 0 {
		// G0 dev mode — no auth wired.
		return passthrough
	}

	// Pre-parse CIDRs once. Invalid entries log a warning and are
	// dropped — fail-loud-but-not-fatal so a typo in one CIDR
	// doesn't kill the binary; operator sees the warning at startup.
	var allowedNets []*net.IPNet
	for _, raw := range cfg.AllowedIPs {
		cidr := raw
		if !strings.Contains(cidr, "/") {
			// Bare IP — single-host CIDR.
			if strings.Contains(cidr, ":") {
				cidr += "/128"
			} else {
				cidr += "/32"
			}
		}
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			slog.Warn("auth: invalid CIDR in allowed_ips, skipping",
				"raw", raw, "err", err)
			continue
		}
		allowedNets = append(allowedNets, n)
	}

	// Pre-encode the wire-format Bearer token so per-request
	// comparison is one allocation against a precomputed slice.
	expectedHeader := []byte("Bearer " + cfg.Token)

	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// /health bypasses both layers. Operators rely on it
			// being public for liveness probes.
			if r.URL.Path == "/health" {
				next.ServeHTTP(w, r)
				return
			}
			if len(allowedNets) > 0 && !ipAllowed(r, allowedNets) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
			if tokenSet {
				got := []byte(r.Header.Get("Authorization"))
				// ConstantTimeCompare returns 1 only when both
				// slices have the same length and contents; it
				// returns 0 otherwise, including on any length
				// mismatch.
				if subtle.ConstantTimeCompare(got, expectedHeader) != 1 {
					http.Error(w, "unauthorized", http.StatusUnauthorized)
					return
				}
			}
			next.ServeHTTP(w, r)
		})
	}
}

// passthrough is the no-op middleware returned when no auth is
// configured. Used by RequireAuth in G0 dev mode.
func passthrough(next http.Handler) http.Handler { return next }

// ipAllowed checks whether the request's source IP is in any of
// the allowed networks. Falls back to false for unparseable
// RemoteAddr — a deploy with broken peer-IP logging would otherwise
// silently bypass the allowlist.
func ipAllowed(r *http.Request, nets []*net.IPNet) bool {
	ip := remoteIP(r)
	if ip == nil {
		return false
	}
	for _, n := range nets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// remoteIP extracts the request's source IP. Today: r.RemoteAddr.
// Future: when a trusted proxy fronts the gateway and adds
// X-Forwarded-For, we'd add a config knob to honor the first hop.
// G0 deploys are direct-to-binary so RemoteAddr suffices.
func remoteIP(r *http.Request) net.IP {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		// SplitHostPort failure could mean the test's httptest.Server
		// passed a bare IP; try parsing as-is.
		host = r.RemoteAddr
	}
	return net.ParseIP(host)
}


@@ -0,0 +1,204 @@
package shared

import (
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/go-chi/chi/v5"
)

// Closes the audit's R-001 + R-007 mechanically once auth is wired
// per ADR-003. Tests cover: pass-through on empty config, token
// gate (401 on missing/wrong, 200 on correct), IP gate (403 on
// disallowed, 200 on allowed), /health bypass, both-layers when
// both configured, and constant-time comparison usage.

func mountWithAuth(cfg AuthConfig) http.Handler {
	r := chi.NewRouter()
	r.Get("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})
	r.Group(func(authed chi.Router) {
		authed.Use(RequireAuth(cfg))
		authed.Get("/data", func(w http.ResponseWriter, _ *http.Request) {
			w.WriteHeader(http.StatusOK)
			_, _ = w.Write([]byte("private"))
		})
	})
	return r
}

func get(t *testing.T, srv *httptest.Server, path string, header string) (int, string) {
	t.Helper()
	req, err := http.NewRequest(http.MethodGet, srv.URL+path, nil)
	if err != nil {
		t.Fatalf("NewRequest: %v", err)
	}
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		t.Fatalf("Do: %v", err)
	}
	defer resp.Body.Close()
	buf := make([]byte, 1024)
	n, _ := resp.Body.Read(buf)
	return resp.StatusCode, string(buf[:n])
}

func TestRequireAuth_EmptyConfig_PassesThrough(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{}))
	defer srv.Close()
	if status, _ := get(t, srv, "/data", ""); status != http.StatusOK {
		t.Errorf("empty config should pass through, got status %d", status)
	}
	if status, _ := get(t, srv, "/health", ""); status != http.StatusOK {
		t.Errorf("/health on empty config should pass, got %d", status)
	}
}

func TestRequireAuth_TokenSet_RejectsMissing(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{Token: "secret123"}))
	defer srv.Close()
	if status, _ := get(t, srv, "/data", ""); status != http.StatusUnauthorized {
		t.Errorf("missing Authorization should return 401, got %d", status)
	}
}

func TestRequireAuth_TokenSet_RejectsWrong(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{Token: "secret123"}))
	defer srv.Close()
	if status, _ := get(t, srv, "/data", "Bearer wrong-token"); status != http.StatusUnauthorized {
		t.Errorf("wrong token should return 401, got %d", status)
	}
}

func TestRequireAuth_TokenSet_AcceptsCorrect(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{Token: "secret123"}))
	defer srv.Close()
	status, body := get(t, srv, "/data", "Bearer secret123")
	if status != http.StatusOK {
		t.Errorf("correct Bearer should return 200, got %d", status)
	}
	if body != "private" {
		t.Errorf("expected 'private' body, got %q", body)
	}
}

func TestRequireAuth_TokenSet_HealthRemainsPublic(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{Token: "secret123"}))
	defer srv.Close()
	if status, _ := get(t, srv, "/health", ""); status != http.StatusOK {
		t.Errorf("/health must stay public when token is set, got %d", status)
	}
}

func TestRequireAuth_RejectsTokenAsRawNotBearer(t *testing.T) {
	srv := httptest.NewServer(mountWithAuth(AuthConfig{Token: "secret123"}))
	defer srv.Close()
	// "secret123" alone (without "Bearer " prefix) should NOT pass.
	if status, _ := get(t, srv, "/data", "secret123"); status != http.StatusUnauthorized {
		t.Errorf("raw-token without 'Bearer ' prefix must reject, got %d", status)
	}
}

func TestRequireAuth_IPAllowlist_AllowsLoopback(t *testing.T) {
	cfg := AuthConfig{AllowedIPs: []string{"127.0.0.0/8"}}
	srv := httptest.NewServer(mountWithAuth(cfg))
	defer srv.Close()
	// httptest.Server binds 127.0.0.1, so the test client always
	// connects from 127.0.0.1 → allowed.
	if status, _ := get(t, srv, "/data", ""); status != http.StatusOK {
		t.Errorf("loopback in allowed CIDR should pass, got %d", status)
	}
}

func TestRequireAuth_IPAllowlist_RejectsNonAllowed(t *testing.T) {
	// Allowlist excludes loopback — every test request from 127.x.x.x
	// gets rejected.
	cfg := AuthConfig{AllowedIPs: []string{"10.0.0.0/8"}}
	srv := httptest.NewServer(mountWithAuth(cfg))
	defer srv.Close()
	if status, _ := get(t, srv, "/data", ""); status != http.StatusForbidden {
		t.Errorf("non-allowed IP should return 403, got %d", status)
	}
}

func TestRequireAuth_BareIPInAllowlist(t *testing.T) {
	// Bare IP without /N suffix should be treated as /32.
	cfg := AuthConfig{AllowedIPs: []string{"127.0.0.1"}}
	srv := httptest.NewServer(mountWithAuth(cfg))
	defer srv.Close()
	if status, _ := get(t, srv, "/data", ""); status != http.StatusOK {
		t.Errorf("bare IP /32 should pass, got %d", status)
	}
}

func TestRequireAuth_BothLayers_RequiresBoth(t *testing.T) {
	cfg := AuthConfig{
		Token:      "secret123",
		AllowedIPs: []string{"127.0.0.0/8"},
	}
	srv := httptest.NewServer(mountWithAuth(cfg))
	defer srv.Close()
	// Missing token: 401 even from allowed IP.
	if status, _ := get(t, srv, "/data", ""); status != http.StatusUnauthorized {
		t.Errorf("allowed IP without token should still 401, got %d", status)
	}
	// Wrong token: 401.
	if status, _ := get(t, srv, "/data", "Bearer wrong"); status != http.StatusUnauthorized {
		t.Errorf("allowed IP + wrong token should 401, got %d", status)
	}
	// Right token: 200 (we're on loopback, IP allowed).
	if status, _ := get(t, srv, "/data", "Bearer secret123"); status != http.StatusOK {
		t.Errorf("allowed IP + correct token should 200, got %d", status)
	}
	// /health: 200 always.
	if status, _ := get(t, srv, "/health", ""); status != http.StatusOK {
		t.Errorf("/health bypass on both-layers cfg, got %d", status)
	}
}

func TestRequireAuth_InvalidCIDR_LoggedAndDropped(t *testing.T) {
	// Mix of valid and invalid CIDRs: invalid should be skipped
	// (warning logged) and the valid one should still gate.
	cfg := AuthConfig{AllowedIPs: []string{
		"not-a-cidr",
		"10.0.0.0/8",
	}}
	srv := httptest.NewServer(mountWithAuth(cfg))
	defer srv.Close()
	// Loopback test client → still rejected because the only valid
	// rule is 10.0.0.0/8.
	if status, _ := get(t, srv, "/data", ""); status != http.StatusForbidden {
		t.Errorf("loopback should reject when allowlist excludes it, got %d", status)
	}
}

func TestRemoteIP_SplitHostPortShape(t *testing.T) {
	// Sanity: real httptest requests come through with "ip:port"
	// shape; ensure remoteIP returns the IP portion.
	r, _ := http.NewRequest(http.MethodGet, "/", nil)
	r.RemoteAddr = "10.1.2.3:54321"
	ip := remoteIP(r)
	if ip == nil {
		t.Fatal("remoteIP returned nil for valid 'ip:port' addr")
	}
	if ip.String() != "10.1.2.3" {
		t.Errorf("remoteIP = %q, want 10.1.2.3", ip.String())
	}
}


@@ -46,6 +46,26 @@ func requireLoopbackOrOverride(serviceName, addr string) error {
 		"(set %s=1 to override; see audit R-001)", addr, serviceName, envKey)
 }

+// requireAuthOnNonLoopback closes the audit's R-001 + R-007 worst
+// case: any binary that's deployed off-loopback MUST have an auth
+// token configured. An off-loopback bind without auth is the literal
+// "queryd /sql is RCE-equivalent" failure mode.
+//
+// Pairs with requireLoopbackOrOverride: that gate refuses non-loopback
+// bind unless an explicit env override fires; this gate refuses the
+// same bind unless auth.token is also set. Together they make the
+// worst case mechanically impossible.
+func requireAuthOnNonLoopback(serviceName, addr string, auth AuthConfig) error {
+	if isLoopbackAddr(addr) {
+		return nil
+	}
+	if auth.Token != "" {
+		return nil
+	}
+	return fmt.Errorf("refuse non-loopback bind %q for %q without auth.token configured "+
+		"(R-001 + R-007 — see ADR-003)", addr, serviceName)
+}
+
 // isLoopbackAddr returns true iff addr's host portion is on the
 // loopback interface. Covers IPv4 127.0.0.0/8, IPv6 ::1, and
 // "localhost". Empty host (":port"), empty string, and any


@@ -28,6 +28,7 @@ type Config struct {
 	Embedd EmbeddConfig `toml:"embedd"`
 	S3 S3Config `toml:"s3"`
 	Log LogConfig `toml:"log"`
+	Auth AuthConfig `toml:"auth"`
 }

 // IngestConfig adds ingestd-specific knobs. ingestd needs to PUT
@@ -126,6 +127,27 @@ type LogConfig struct {
 	Level string `toml:"level"`
 }

+// AuthConfig is the inter-service auth posture from ADR-003.
+// Token is a Bearer token; empty means "no auth" (G0 dev mode).
+// AllowedIPs is a list of CIDRs (or bare IPs treated as /32);
+// empty means "any source IP."
+//
+// Both layers operate independently when set:
+//   - Token + AllowedIPs both empty → middleware is a no-op
+//   - Token only → 401 unless Bearer matches
+//   - AllowedIPs only → 403 unless r.RemoteAddr in CIDR
+//   - Both → both gates apply
+//
+// The startup gate in shared.Run refuses to start with non-loopback
+// bind AND empty Token — that's the audit's R-001 + R-007 worst
+// case (no auth, world-reachable). LH_<SVC>_ALLOW_NONLOOPBACK=1 still
+// bypasses the bind gate for explicit dev cases; the auth gate is
+// independent of that bypass and is the real production guard.
+type AuthConfig struct {
+	Token      string   `toml:"token"`
+	AllowedIPs []string `toml:"allowed_ips"`
+}
+
 // DefaultConfig returns the G0 dev defaults. Ports are shifted to
 // 3110+ to coexist with the live Rust lakehouse on 3100/3201-3204
 // during the migration. G5 cutover flips gateway back to 3100.


@@ -48,14 +48,28 @@ type RegisterRoutes func(r chi.Router)
 // (Per Kimi review #4: shared library functions shouldn't silently
 // mutate package globals.)
 //
-// Refuses to bind a non-loopback address unless the
-// LH_<SERVICE>_ALLOW_NONLOOPBACK=1 env is set — closes the accidental
-// 0.0.0.0 deploy path for R-001 (queryd /sql is RCE-equivalent off
-// loopback, but the gate applies to every binary uniformly).
-func Run(serviceName, addr string, register RegisterRoutes) error {
+// Three startup gates apply in order:
+//
+//  1. requireLoopbackOrOverride — refuses non-loopback bind unless
+//     LH_<SERVICE>_ALLOW_NONLOOPBACK=1 is set. Closes the accidental
+//     0.0.0.0 deploy path for R-001.
+//  2. requireAuthOnNonLoopback — refuses non-loopback bind when
+//     auth.token is empty. Mechanically prevents R-001 + R-007's
+//     worst case: world-reachable bind with no auth layer.
+//  3. RequireAuth middleware — runs per-request on registered routes.
+//     /health stays exempt (mounted on the outer router, before the
+//     authed group).
+//
+// Per ADR-003: empty auth.token + empty allowed_ips → middleware is
+// a no-op. Smokes and proof harness keep working without setting
+// either.
+func Run(serviceName, addr string, register RegisterRoutes, auth AuthConfig) error {
 	if err := requireLoopbackOrOverride(serviceName, addr); err != nil {
 		return err
 	}
+	if err := requireAuthOnNonLoopback(serviceName, addr, auth); err != nil {
+		return err
+	}

 	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
 		Level: slog.LevelInfo,
@@ -67,6 +81,8 @@ func Run(serviceName, addr string, register RegisterRoutes) error {
 	r.Use(middleware.Recoverer)
 	r.Use(slogRequest(logger))

+	// /health stays on the outer router — public, no auth. Operators
+	// rely on it for liveness probes that don't carry a token.
 	r.Get("/health", func(w http.ResponseWriter, _ *http.Request) {
 		w.Header().Set("Content-Type", "application/json")
 		_ = json.NewEncoder(w).Encode(HealthResponse{
@@ -76,7 +92,13 @@ func Run(serviceName, addr string, register RegisterRoutes) error {
 	})

 	if register != nil {
-		register(r)
+		// Registered routes live inside an auth-gated group so
+		// RequireAuth applies uniformly without per-handler wiring.
+		// Empty auth → middleware is a no-op (group is transparent).
+		r.Group(func(authed chi.Router) {
+			authed.Use(RequireAuth(auth))
+			register(authed)
+		})
 	}

 	srv := &http.Server{