root 125e1c80b9 tests: close R-002 / R-003 / R-008 — internal/shared, storeclient, queryd/db.go
Audit-driven follow-up to the Rust scrum review on the 3 untested
load-bearing packages (R-002 and R-003 HIGH, R-008 MED). Both the
audit (reports/scrum/risk-register.md)
and the scrum (tests/real-world/runs/scrum_mojxb5bw/) independently
flagged these files as the highest-leverage missing test coverage.

internal/shared/server_test.go — 8 test funcs
  newListener: valid addr, invalid addr (non-numeric port, port
    out of range, port-already-in-use surfacing as net.OpError).
  Empty-addr-is-valid: documents the net.Listen quirk that "" binds
    an OS-picked port — future readers don't need to relitigate.
  HealthResponse marshal: JSON shape stable, round-trip clean.
  /health handler reconstructed via httptest.Server: status 200,
    Content-Type application/json, body fields stable.
  RegisterRoutes callback: contract verified (callback is invoked
    with a real chi.Router, mounted route reachable end-to-end).
  Run bind-failure surface: synchronous error, not a goroutine swallow
    — the contract Run depends on per the race-safe-startup comment.
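
A minimal sketch of the /health check described above, assuming a hypothetical healthResponse shape (the real HealthResponse fields are not listed in this message). It uses the standard library's http.HandlerFunc and httptest.NewRecorder in place of chi and httptest.Server, so it only illustrates the assertions (status 200, Content-Type application/json, stable body fields), not the actual test code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthResponse is a hypothetical stand-in for shared.HealthResponse;
// the field names here are assumptions for illustration only.
type healthResponse struct {
	Status  string `json:"status"`
	Service string `json:"service"`
}

// healthHandler returns a handler that reports a fixed service name as JSON.
func healthHandler(service string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_ = json.NewEncoder(w).Encode(healthResponse{Status: "ok", Service: service})
	}
}

// probeHealth drives the handler in-process and returns the status code,
// the Content-Type header and the decoded body.
func probeHealth(h http.Handler) (int, string, healthResponse, error) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	h.ServeHTTP(rec, req)

	var body healthResponse
	err := json.NewDecoder(rec.Body).Decode(&body)
	return rec.Code, rec.Header().Get("Content-Type"), body, err
}

func main() {
	code, ctype, body, err := probeHealth(healthHandler("gateway"))
	fmt.Println(code, ctype, body.Status, body.Service, err)
}
```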

internal/shared/config_test.go — 6 test funcs
  DefaultConfig G0 port pinning: every binary's default bind locked
    in (3110/3211-3216) so a refactor can't silently flip a port.
  LoadConfig empty path: returns DefaultConfig, no error.
  LoadConfig missing file: returns DefaultConfig, logs warn (the warn
    line shows up in test output, captured-but-not-asserted).
  LoadConfig valid TOML: partial overrides land, unspecified sections
    keep defaults (TOML decoder leave-alone behavior).
  LoadConfig invalid TOML: returns wrapped 'parse config' error.
  LoadConfig unreadable file: skipped under root (root reads 0000);
    captures the read-error wrap path for non-root contexts.
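
The "partial overrides land, unspecified sections keep defaults" behavior can be sketched as decoding on top of a pre-populated struct. This sketch uses encoding/json as a stand-in for the TOML decoder so that it stays standard-library only. The config shape, the ":3110"-style bind strings and the function names are assumptions, not the real shared.Config, and returning a zero config on a parse error is a choice made for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceConfig and config are hypothetical shapes for illustration.
type serviceConfig struct {
	Bind string `json:"bind"`
}

type config struct {
	Gateway  serviceConfig `json:"gateway"`
	Storaged serviceConfig `json:"storaged"`
}

// defaultConfig pins two of the default ports named in this commit.
func defaultConfig() config {
	return config{
		Gateway:  serviceConfig{Bind: ":3110"},
		Storaged: serviceConfig{Bind: ":3211"},
	}
}

// loadConfig decodes raw on top of the defaults, so any section that is
// absent from raw keeps its default value. Empty input returns defaults.
func loadConfig(raw []byte) (config, error) {
	cfg := defaultConfig()
	if len(raw) == 0 {
		return cfg, nil
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig([]byte(`{"gateway":{"bind":":9999"}}`))
	fmt.Println(cfg.Gateway.Bind, cfg.Storaged.Bind, err)
}
```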

internal/storeclient/client_test.go — 14 test funcs
  safeKey table-driven: plain segments, single slash, empty, trailing
    slash, space (→ %20), apostrophe (→ %27), unicode (→ %C3%A9),
    deep nesting. Locks URL-escape contract per scrum suggestion.
  recordingServer helper backs Put/Get/Delete/List against
    httptest.Server: verifies method, path, body bytes round-trip.
  ErrKeyNotFound on 404 (errors.Is round-trip).
  Non-OK status wraps body preview into the error chain.
  Delete accepts both 200 and 204 (S3 vs compatible-store quirk).
  List parses JSON shape and surfaces query-string prefix.
  Context cancellation propagates through Put as context.Canceled.
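
For reference, the URL-escape contract in the safeKey table can be reproduced with url.PathEscape applied per path segment. escapeKey below is a hypothetical stand-in and not necessarily how safeKey is implemented; it only shows the escapes the table locks in (space, apostrophe, non-ASCII) while slashes are preserved.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapeKey is a hypothetical stand-in for storeclient's safeKey.
// It escapes each "/"-separated segment on its own so that the
// separators survive while everything unsafe inside a segment is
// percent-encoded.
func escapeKey(key string) string {
	segs := strings.Split(key, "/")
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	return strings.Join(segs, "/")
}

func main() {
	for _, k := range []string{"a/b", "my file.csv", "it's.csv", "café/x", "dir/", ""} {
		fmt.Printf("%q -> %q\n", k, escapeKey(k))
	}
}
```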

internal/queryd/db_test.go — 5 test funcs (with subtests)
  sqlEscape table-driven: 8 cases including empty, all-quotes,
    nested apostrophes (the case from the scrum suggestion).
  redactCreds table-driven: 6 cases — both keys, single keys,
    empty, multi-occurrence, placeholder-collision (lossy but safe).
  buildBootstrap statement order: INSTALL → LOAD → CREATE SECRET.
  buildBootstrap endpoint schemes: http strips + USE_SSL false,
    https keeps SSL true, no-scheme defaults SSL true (prod ambient).
  buildBootstrap URL_STYLE: 'path' vs 'vhost' branch.
  buildBootstrap escapes credential quotes: future SSO-token-with-
    apostrophe doesn't break out of the SQL string literal — the
    belt holds when the suspenders snap.
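
A rough sketch of the behavior these tests pin down: quote doubling, the endpoint-scheme branch, and the INSTALL → LOAD → CREATE SECRET order. All function names are hypothetical and the exact statement text is an assumption; only the ordering, the USE_SSL and URL_STYLE branches and the escaping rule come from the list above. Stripping the https:// prefix is also an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteSQL wraps s in single quotes and doubles any embedded single
// quote, so the value cannot terminate the SQL string literal early.
func quoteSQL(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// splitEndpoint strips a URL scheme if present and reports whether SSL
// should be used. No scheme defaults to SSL on.
func splitEndpoint(endpoint string) (host string, useSSL bool) {
	switch {
	case strings.HasPrefix(endpoint, "http://"):
		return strings.TrimPrefix(endpoint, "http://"), false
	case strings.HasPrefix(endpoint, "https://"):
		return strings.TrimPrefix(endpoint, "https://"), true
	default:
		return endpoint, true
	}
}

// bootstrapStatements returns the three statements in the order the
// tests assert: INSTALL, LOAD, CREATE SECRET.
func bootstrapStatements(keyID, secret, endpoint string, pathStyle bool) []string {
	host, useSSL := splitEndpoint(endpoint)
	style := "vhost"
	if pathStyle {
		style = "path"
	}
	create := fmt.Sprintf(
		"CREATE SECRET (TYPE S3, KEY_ID %s, SECRET %s, ENDPOINT %s, URL_STYLE %s, USE_SSL %t)",
		quoteSQL(keyID), quoteSQL(secret), quoteSQL(host), quoteSQL(style), useSSL)
	return []string{"INSTALL httpfs", "LOAD httpfs", create}
}

func main() {
	for _, stmt := range bootstrapStatements("key", "s'ecret", "http://localhost:9000", true) {
		fmt.Println(stmt + ";")
	}
}
```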

Real finding caught by my own test:
  net.Listen("tcp", "") succeeds (OS-picked port) — captured as
  TestNewListener_EmptyAddrIsValid so the quirk is documented.
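
The quirk is easy to reproduce outside the test suite. The sketch below is illustrative only, and listenPort and isOpError are hypothetical helpers, not functions from internal/shared. It binds an empty address and reports the OS-picked port, then shows an out-of-range port surfacing as a *net.OpError.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// listenPort binds addr, returns the port the listener ended up on,
// and closes the listener again.
func listenPort(addr string) (int, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

// isOpError reports whether err wraps a *net.OpError.
func isOpError(err error) bool {
	var opErr *net.OpError
	return errors.As(err, &opErr)
}

func main() {
	// Empty address: succeeds, the OS picks a free port.
	port, err := listenPort("")
	fmt.Println("empty addr:", port > 0, err)

	// Port out of range: fails before any bind is attempted.
	_, err = listenPort("127.0.0.1:99999")
	fmt.Println("out of range:", isOpError(err), err)
}
```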

Verified:
  go test -short ./... — every internal/ package now has tests
    (no more 'no test files' lines for shared/storeclient).
  just verify — vet + test + 9 smokes green in 33s.
  just proof contract — 53/0/1 green (no harness regression).

Closes:
  R-002 internal/shared zero tests        HIGH
  R-003 internal/storeclient zero tests   HIGH
  R-008 queryd/db.go untested             MED (sqlEscape, redactCreds,
                                              CREATE SECRET formation)

Composite scrum score should move from 43 → ~46 / 60 — the three
HIGH/MED risks closed, internal/shared and internal/storeclient
become "tested + load-bearing" instead of "untested + load-bearing."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:51:05 -05:00

golangLAKEHOUSE

Go reimplementation of the Lakehouse — a versioned knowledge substrate for staffing analytics + local AI workloads.

Status

Phase G0 complete + G1/G1P/G2 shipped. Seven binaries in total (see Service inventory): the five-binary G0 skeleton plus vectord and embedd, with the gateway fronting the other six. Acceptance smokes green for D1-D6 + G1 + G1P + G2.

End-to-end staffing co-pilot pipeline functional through the gateway:

text → /v1/embed → /v1/vectors/index/<name>/add
text → /v1/embed → /v1/vectors/index/<name>/search → top-K hits

Plus the SQL path:

CSV  → /v1/ingest    (parses, writes Parquet via storaged, registers
                      manifest with catalogd)
SQL  → /v1/sql       (DuckDB over the registered Parquets via httpfs)

See docs/PHASE_G0_KICKOFF.md for the day-by-day record (D1-D6 + real-scale validation + G1/G1P/G2 pointer at the bottom).

Service inventory

Bin       Port  Role
gateway   3110  Reverse proxy fronting all backing services
storaged  3211  Object I/O over S3 (MinIO in dev)
catalogd  3212  Parquet manifest registry, ADR-020 idempotency
ingestd   3213  CSV → Parquet → register loop
queryd    3214  DuckDB SELECT over registered Parquets via httpfs
vectord   3215  HNSW vector search (+ optional persistence to storaged)
embedd    3216  Text → vector via Ollama (default nomic-embed-text 768-d)
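
As a small illustration of how the inventory above might be used from Go, the sketch below maps each binary to its default port and builds a health-probe URL. servicePorts and healthURL are hypothetical names, and the assumption that every service answers /health on its own port is drawn from the D1 smoke description rather than from a documented API.

```go
package main

import (
	"fmt"
	"sort"
)

// servicePorts mirrors the default ports in the service inventory.
var servicePorts = map[string]int{
	"gateway":  3110,
	"storaged": 3211,
	"catalogd": 3212,
	"ingestd":  3213,
	"queryd":   3214,
	"vectord":  3215,
	"embedd":   3216,
}

// healthURL builds the (assumed) /health URL for a service on host.
func healthURL(host, service string) (string, error) {
	port, ok := servicePorts[service]
	if !ok {
		return "", fmt.Errorf("unknown service %q", service)
	}
	return fmt.Sprintf("http://%s:%d/health", host, port), nil
}

func main() {
	// Sort names so the output order is stable.
	names := make([]string, 0, len(servicePorts))
	for name := range servicePorts {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		u, _ := healthURL("localhost", name)
		fmt.Println(name, u)
	}
}
```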

Acceptance smokes

scripts/d1_smoke.sh   # 5-binary skeleton + chi /health + gateway proxy probes
scripts/d2_smoke.sh   # storaged GET/PUT/LIST/DELETE + 256 MiB cap + concurrency cap
scripts/d3_smoke.sh   # catalogd register/manifest/list + rehydrate-across-restart
scripts/d4_smoke.sh   # ingestd CSV → Parquet round-trip + schema-drift 409
scripts/d5_smoke.sh   # queryd DuckDB SELECT through httpfs over MinIO
scripts/d6_smoke.sh   # full ingest → query through gateway only
scripts/g1_smoke.sh   # vectord HNSW recall + dim mismatch + duplicate-create 409
scripts/g1p_smoke.sh  # vectord state survives kill+restart via storaged
scripts/g2_smoke.sh   # embed → vectord add → search round-trip

Or run the full gate via the task runner (see below):

just verify     # vet + tests + 9 smokes; ~33s wall

Task runner

just                 # show available recipes
just verify          # full Sprint 0 gate (vet + tests + 9 smokes)
just smoke <day>     # single smoke (d1..d6, g1, g1p, g2)
just doctor          # check cold-start deps; --json for CI
just install-hooks   # install pre-push hook that runs just verify

After a fresh clone, run just install-hooks once so that git push is gated on the same just verify chain. The hook lives in .git/hooks/pre-push (not tracked; recreated by the recipe).

Cold-start dependencies

  • Go 1.25+ at /usr/local/go/bin (arrow-go pulled the 1.25 floor)
  • gcc + libc-dev for the DuckDB cgo binding (ADR-001 §1.1)
  • just task runner (apt install just on Debian 13+)
  • MinIO running on :9000 with bucket lakehouse-go-primary
  • Ollama running on :11434 with nomic-embed-text loaded (G2)
  • /etc/lakehouse/secrets-go.toml with [s3.primary] credentials (storaged + queryd both read this)

just doctor probes all of the above and reports the fix command for each missing dep. CI / scripts can use just doctor --json.

Layout

docs/                         Direction + spec + ADRs + day-by-day
cmd/                          One main package per binary
internal/                     Shared packages — storeclient, catalogclient,
                                secrets, shared, embed, gateway, plus
                                per-service implementation packages
scripts/                      Smokes + ancillary tooling

Reading order

  1. docs/PRD.md — what we're building and why
  2. docs/SPEC.md — how, per-component
  3. docs/DECISIONS.md — ADRs (ADR-001 foundational)
  4. docs/PHASE_G0_KICKOFF.md — day-by-day from D1 through G2
  5. docs/RUST_PATHWAY_MEMORY_NOTE.md — historical reference for the Rust era's pathway memory (not migrated, by ADR-001 #5)

Predecessor

The Rust Lakehouse this rewrite supersedes lives at git.agentview.dev/profit/lakehouse. It remains the live system serving devop.live/lakehouse/ until this Go implementation reaches feature parity per docs/SPEC.md §7. Then Rust enters maintenance-only mode.
