3 Commits

Author SHA1 Message Date
root
814197cfd3 ADR-006: auth posture for non-loopback deploy + token rotation impl
ADR-003 locked the auth substrate; ADR-006 ratifies the operator
playbook + adds two implementation pieces needed for Sprint 4
deployment: env-resolved tokens and dual-token rotation.

Six decisions locked in docs/DECISIONS.md:
- 6.1: Non-loopback bind requires auth.token (mechanical gate at
       shared.Run, already implemented; this ratifies it).
- 6.2: Token from env, not TOML. /etc/lakehouse/auth.env (mode 0600)
       loaded by systemd EnvironmentFile=. New TokenEnv field on
       AuthConfig defaults to "AUTH_TOKEN".
- 6.3: AllowedIPs for inter-service same-trust-domain; Token for
       cross-trust-boundary (gateway ↔ external).
- 6.4: /health stays unauthenticated; everything else under
       shared.Run is gated. Already implemented; ratified here.
- 6.5: Token rotation is dual-token. New SecondaryTokens []string
       on AuthConfig — both primary and any secondary pass auth
       during the rotation window. Implemented in this commit.
- 6.6: TLS terminates at the network edge (nginx/Caddy), not
       in-process. Daemons stay HTTP-only; internal traffic stays
       on private subnets per Decision 6.3.

Implementation:
- internal/shared/config.go: AuthConfig gains TokenEnv +
  SecondaryTokens fields. New resolveAuthFromEnv() called by
  LoadConfig fills Token from os.Getenv(TokenEnv) when Token is
  empty. TokenEnv defaults to "AUTH_TOKEN" so the happy path needs
  no TOML config.
- internal/shared/auth.go: RequireAuth pre-encodes Bearer headers
  for primary + every secondary token; per-request constant-time
  compare walks the slice. Fast path is 1 compare (primary).

Tests:
- TestLoadConfig_AuthTokenFromEnv (3 sub-tests): default env name,
  custom token_env, explicit Token wins over env.
- TestRequireAuth_SecondaryTokenAccepted: both primary + secondary
  tokens pass during rotation window.
- TestRequireAuth_SecondaryTokensOnly: only-secondary path works
  for the case where primary was just promoted-to-empty mid-rotation.

go test ./internal/shared all green; existing auth_test.go
unchanged (constant-time compare path preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:51:14 -05:00
root
ec1d031996 phase 1: add [models] tier config — additive, no callers migrate yet
Codifies the small-model-pipeline tiering (per project_small_model_pipeline_vision.md)
in lakehouse.toml [models] section. Tier names map to actual model
IDs; bumping a model means editing one line, not hunting through code.

Tier philosophy:
- local_*    : on-box Ollama. Inner-loop hot path. Repeated calls.
- cloud_*    : Ollama Cloud (Pro plan). Larger context, fail-up tier.
- frontier_* : OpenRouter / OpenCode. Rate-limited, billed per call.

weak_models is the codified "local-hot-path eligible" list — phase 2
will migrate matrix.downgrade to read it instead of hardcoding.

Defaults reflect 2026-04-29 architecture: qwen3.5:latest as local
(stronger than qwen2.5, same JSON-clean property), kimi-k2.6 as cloud
judge (kimi-k2:1t still upstream-broken), opus-4-7 + kimi-k2-0905 as
frontier review/arch via OpenRouter, opencode/claude-opus-4-7 as
frontier_free leveraging the OpenCode subscription.

3 new tests in internal/shared/config_test.go:
- TestDefaultConfig_ModelsTier — locks tier defaults
- TestModelsConfig_IsWeak     — weak-bypass list
- TestLoadConfig_ModelsTOMLRoundTrip — override semantics

just verify PASS (g2 had one flake on first run — Ollama transfer
truncation; clean on retry, unrelated to this change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:48:45 -05:00
root
125e1c80b9 tests: close R-002 / R-003 / R-008 — internal/shared, storeclient, queryd/db.go
Audit-driven follow-up to the Rust scrum review on the 3 untested
HIGH-risk packages. Both the audit (reports/scrum/risk-register.md)
and the scrum (tests/real-world/runs/scrum_mojxb5bw/) independently
flagged these files as the highest-leverage missing test coverage.

internal/shared/server_test.go — 8 test funcs
  newListener: valid addr, invalid addr (non-numeric port, port
    out of range, port-already-in-use surfacing as net.OpError).
  Empty-addr-is-valid: documents the net.Listen quirk that "" binds
    an OS-picked port — future readers don't need to relitigate.
  HealthResponse marshal: JSON shape stable, round-trip clean.
  /health handler reconstructed via httptest.Server: status 200,
    Content-Type application/json, body fields stable.
  RegisterRoutes callback: contract verified (callback is invoked
    with a real chi.Router, mounted route reachable end-to-end).
  Run bind-failure surface: synchronous error, not a goroutine swallow
    — the contract Run depends on per the race-safe-startup comment.

internal/shared/config_test.go — 6 test funcs
  DefaultConfig G0 port pinning: every binary's default bind locked
    in (3110/3211-3216) so a refactor can't silently flip a port.
  LoadConfig empty path: returns DefaultConfig, no error.
  LoadConfig missing file: returns DefaultConfig, logs warn (the warn
    line shows up in test output, captured-but-not-asserted).
  LoadConfig valid TOML: partial overrides land, unspecified sections
    keep defaults (TOML decoder leave-alone behavior).
  LoadConfig invalid TOML: returns wrapped 'parse config' error.
  LoadConfig unreadable file: skipped under root (root reads 0000);
    captures the read-error wrap path for non-root contexts.

internal/storeclient/client_test.go — 14 test funcs
  safeKey table-driven: plain segments, single slash, empty, trailing
    slash, space (→ %20), apostrophe (→ %27), unicode (→ %C3%A9),
    deep nesting. Locks URL-escape contract per scrum suggestion.
  recordingServer helper backs Put/Get/Delete/List against
    httptest.Server: verifies method, path, body bytes round-trip.
  ErrKeyNotFound on 404 (errors.Is round-trip).
  Non-OK status wraps body preview into the error chain.
  Delete accepts both 200 and 204 (S3 vs compatible-store quirk).
  List parses JSON shape and surfaces query-string prefix.
  Context cancellation propagates through Put as context.Canceled.

internal/queryd/db_test.go — 5 test funcs (with subtests)
  sqlEscape table-driven: 8 cases including empty, all-quotes,
    nested apostrophes (the case from the scrum suggestion).
  redactCreds table-driven: 6 cases — both keys, single keys,
    empty, multi-occurrence, placeholder-collision (lossy but safe).
  buildBootstrap statement order: INSTALL → LOAD → CREATE SECRET.
  buildBootstrap endpoint schemes: http strips + USE_SSL false,
    https keeps SSL true, no-scheme defaults SSL true (prod ambient).
  buildBootstrap URL_STYLE: 'path' vs 'vhost' branch.
  buildBootstrap escapes credential quotes: future SSO-token-with-
    apostrophe doesn't break out of the SQL string literal — the
    belt holds when the suspenders snap.

Real finding caught by my own test:
  net.Listen("tcp", "") succeeds (OS-picked port) — captured as
  TestNewListener_EmptyAddrIsValid so the quirk is documented.

Verified:
  go test -short ./... — every internal/ package now has tests
    (no more 'no test files' lines for shared/storeclient).
  just verify — vet + test + 9 smokes green in 33s.
  just proof contract — 53/0/1 green (no harness regression).

Closes:
  R-002 internal/shared zero tests        HIGH
  R-003 internal/storeclient zero tests   HIGH
  R-008 queryd/db.go untested             MED (sqlEscape, redactCreds,
                                              CREATE SECRET formation)

Composite scrum score should move from 43 → ~46 / 60 — the three
HIGH/MED risks closed, internal/shared and internal/storeclient
become "tested + load-bearing" instead of "untested + load-bearing."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:51:05 -05:00