ADR-003: inter-service auth posture — Bearer + IP allowlist

Locks in the auth model that R-001 + R-007 will be retrofitted against.
Doc-only: wiring deferred to Sprint 1 when the first non-loopback
binding is needed.

Decision: Bearer token (from secrets-go.toml [auth] section) + IP
allowlist (CIDR list). Both layers required when auth is on; empty
token = G0 dev no-op. /health exempt.

Implementation shape (when it lands):
- internal/shared/auth.go middleware: one chi r.Use line per binary
- shared.Run gates: refuses non-loopback bind without configured token
- subtle.ConstantTimeCompare for token equality (timing-safe)

Alternatives considered + rejected:
- mTLS: too heavy for single-machine inter-service traffic
- JWT: buys nothing over Bearer without external IdP
- IP-only: one stolen IP entry = full access; no defense depth
- OAuth2: no external IdP commitment in G0-G3 timeline

What this doesn't do:
- Doesn't implement (code lands Sprint 1)
- Doesn't break G0 dev (empty token = middleware no-op)
- Doesn't address gateway→end-user auth (different ADR shape)

Closes the design-decision blocker for R-001 and R-007. Wiring
ticket: Sprint 1 backlog story S1.2.

Also lifts ADR-002 (storaged per-prefix PUT cap) into the doc; it
was implemented in 423a381 but not yet recorded as an ADR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

parent 423a3817c5
commit 0d18ffa780

@@ -121,6 +121,127 @@ historical record.

---

## ADR-002: storaged per-prefix PUT cap (vectord _vectors/ → 4 GiB)

**Date:** 2026-04-29
**Decided by:** J
**Status:** Implemented (commit `423a381`)

`storaged` enforces a 256 MiB per-PUT body cap as DoS protection
(`MaxBytesReader` + Content-Length check). Keys under `_vectors/`
(vectord LHV1 persistence) get a raised cap of 4 GiB; everything
else stays at 256 MiB.

**Rationale:** the 500K staffing test surfaced that single-file LHV1
above ~150K vectors at d=768 hits the 256 MiB cap. `manager.Uploader`
already streams on the outbound side, so the cap is a safety gate,
not a memory bottleneck: raising it for the vector path doesn't
introduce new memory pressure. Per-prefix preserves the safety
gate for routine traffic while opening the documented production
path. Splitting LHV1 across multiple keys was rejected because G1P
specifically shipped the single-Put framed format to eliminate
torn-write; multi-key would re-introduce that failure mode.

**Follow-up:** if production workloads exceed 4 GiB single-file
LHV1, refactor to operator-driven config (env/TOML) rather than
bumping the constant. The function-level `maxPutBytesFor(key)` in
`cmd/storaged/main.go` keeps that drop-in clean.
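
A per-prefix cap of the kind described above could be sketched roughly as follows. Only the name `maxPutBytesFor`, the `_vectors/` prefix, and the two limits come from this ADR; the constant names and the exact prefix-matching rule are assumptions, and the real function in `cmd/storaged/main.go` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical constant names; only the values come from ADR-002.
const (
	defaultMaxPutBytes int64 = 256 << 20 // 256 MiB cap for routine PUTs
	vectorsMaxPutBytes int64 = 4 << 30   // 4 GiB raised cap for vector persistence
	vectorsPrefix            = "_vectors/"
)

// maxPutBytesFor returns the per-PUT body cap for a storage key.
// Assumption: keys are matched by a plain prefix check on the key string.
func maxPutBytesFor(key string) int64 {
	if strings.HasPrefix(key, vectorsPrefix) {
		return vectorsMaxPutBytes
	}
	return defaultMaxPutBytes
}

func main() {
	// Example keys are made up for illustration.
	for _, key := range []string{"_vectors/idx.lhv1", "datasets/a.parquet"} {
		fmt.Printf("%-20s cap = %d MiB\n", key, maxPutBytesFor(key)>>20)
	}
}
```

Keeping the decision in one function means a later move to env/TOML config would only change where the two limits are read from, not the call sites.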

---

## ADR-003: Inter-service auth posture (Bearer token + IP allowlist)

**Date:** 2026-04-29
**Decided by:** J + Claude
**Status:** Decided; wiring deferred to Sprint 1

**Decision:** When inter-service auth is needed (the moment any
binary binds non-loopback or the deployment crosses a trust
boundary), the auth model is **a Bearer token loaded from
`secrets-go.toml` plus a configurable IP allowlist**. Both layers
required: the token authenticates the caller; the allowlist
narrows the network surface.

**Status today (G0):** zero auth middleware. Every binary binds
`127.0.0.1` by default; commit `6af0520` (R-001 partial fix) refuses
non-loopback bind unless the per-service `LH_<SVC>_ALLOW_NONLOOPBACK=1`
env override is set. The override-and-no-auth combination is the
worst case; this ADR locks in what we'll require before any
production override fires.
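
For orientation, the bind gate described above could look something like the sketch below. This is not the code from commit `6af0520`: the function names `isLoopbackBind` and `checkBind`, the treatment of `localhost`, and the injected `getenv` parameter are all assumptions made for illustration. Only the `LH_<SVC>_ALLOW_NONLOOPBACK=1` override name comes from this document.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// isLoopbackBind reports whether addr ("host:port") binds only to loopback.
// An empty host (":8080") listens on all interfaces, so it is not loopback.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	if host == "localhost" { // assumption: treat the name as loopback
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkBind (hypothetical) refuses a non-loopback bind unless the
// per-service override variable is set to "1".
func checkBind(svc, addr string, getenv func(string) string) error {
	if isLoopbackBind(addr) {
		return nil
	}
	envName := "LH_" + strings.ToUpper(svc) + "_ALLOW_NONLOOPBACK"
	if getenv(envName) == "1" {
		return nil
	}
	return fmt.Errorf("%s: refusing non-loopback bind %q (set %s=1 to override)",
		svc, addr, envName)
}

func main() {
	fmt.Println(checkBind("queryd", "127.0.0.1:8080", os.Getenv))
	fmt.Println(checkBind("queryd", "0.0.0.0:8080", os.Getenv))
}
```

Passing `getenv` as a parameter is only a testing convenience here; it lets the gate be exercised without mutating the process environment.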

### What gets implemented when auth lands

1. **`secrets-go.toml` adds an `[auth]` section:**
   ```toml
   [auth]
   token = "..." # 32+ random bytes, hex-encoded
   allowed_ips = ["10.0.0.0/8", "127.0.0.1/32"] # CIDR list
   ```

2. **`internal/shared/auth.go`** ships a single chi middleware:
   ```go
   func RequireAuth(cfg AuthConfig) func(http.Handler) http.Handler
   ```
   - Empty `cfg.Token` → middleware is a no-op (G0 dev mode).
   - Non-empty token → reject with 401 unless the request carries
     `Authorization: Bearer <token>` and the token matches under a
     constant-time comparison.
   - Non-empty `allowed_ips` → reject with 403 unless `r.RemoteAddr`
     (or the `X-Forwarded-For` first hop, configurable) is in the
     CIDR set.
   - `/health` is exempt; load balancers + monitors need it open.

3. **Every `cmd/<svc>/main.go` adds one line:**
   ```go
   r.Use(shared.RequireAuth(cfg.Auth))
   ```
   Mounted before `register(r)` so it covers every route the binary
   exposes after `/health`.

4. **`shared.Run` startup gate:** if bind is non-loopback AND
   `cfg.Auth.Token == ""`, refuse to start. The implicit
   "localhost is the auth layer" guarantee becomes explicit when
   crossing the loopback boundary.

### Alternatives considered

| Option | Why rejected |
|---|---|
| **mTLS** | Strongest but heaviest: every binary needs cert provisioning, rotation tooling, and cert-aware client wiring. Overkill for inter-service traffic that already passes through a single gateway. Reconsider when Lakehouse-Go runs across machines. |
| **JWT with short TTL** | Buys nothing over Bearer here: there's no third-party identity provider, no claim hierarchy worth modelling. Pure token has the same security properties at half the wire complexity. |
| **No auth, IP-allowlist only** | One stolen IP allowlist entry → full access. Token + IP is defense in depth; either alone is too weak. |
| **OAuth2 via external IdP** | Rejected for G0–G3 timeline. No external IdP commitment. Revisit if Lakehouse-Go ever serves end-user requests directly (today everything fronts through the staffing co-pilot which has its own session model). |

### Constant-time comparison + token hygiene

Token comparison must use `crypto/subtle.ConstantTimeCompare`;
naive `==` is vulnerable to timing attacks by an attacker who
can issue many requests and measure round-trip time. Token rotation
is operator-driven via `secrets-go.toml` edit + restart; G0 doesn't
need rotate-without-restart.
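
As an illustration of the two hygiene points (a 32-byte hex token and constant-time equality), a minimal sketch follows. The helper names `newToken` and `tokenEqual` are hypothetical. Hashing both sides with SHA-256 before comparing is an optional hardening that is not part of this ADR; it is included because `ConstantTimeCompare` returns immediately when the two inputs differ in length.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newToken returns n random bytes, hex-encoded (n=32 gives 64 hex characters).
func newToken(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// tokenEqual compares two tokens in constant time. Both sides are hashed
// to a fixed 32 bytes first so the comparison never short-circuits on a
// length mismatch.
func tokenEqual(presented, expected string) bool {
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	tok, err := newToken(32)
	if err != nil {
		panic(err)
	}
	// prints: 64 true false
	fmt.Println(len(tok), tokenEqual(tok, tok), tokenEqual(tok, "wrong"))
}
```

A token generated this way can be pasted into the `token` field of the `[auth]` section; rotation then remains the edit-and-restart procedure described above.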

### What this ADR does NOT do

- **Does not implement the middleware.** Code lands in Sprint 1.
- **Does not require token in G0 dev.** Empty token → no-op. Smokes and
  proof harness keep working without setting tokens.
- **Does not address gateway → end-user auth.** Gateway terminates
  inter-service auth at its inbound; if end-users hit gateway from
  a browser, that's a different ADR (likely cookie/session, fronted
  by a reverse proxy that handles user auth).

### How this closes audit findings

- **R-001 (queryd /sql RCE-equivalent off-loopback):** the bind
  gate prevents accidental exposure today; this ADR specifies the
  guardrail when intentional exposure is needed.
- **R-007 (zero auth middleware):** answered by the design above;
  R-007 stays open until the middleware is implemented but is no
  longer "design TBD."
- **R-010 (no CORS posture):** orthogonal to inter-service auth,
  but the `RequireAuth` middleware sits at the right layer to add
  CORS handling later (browsers don't reach inter-service routes
  in the current design, so CORS is also Sprint 1+ when end-user
  requests start landing).

---

(Future ADRs from ADR-004 onward will be added as the Go
implementation accrues design decisions, e.g. HNSW parameter
choices, pathway-memory hash function, auditor model rotation, etc.)