32 Commits

Author SHA1 Message Date
root
1ec85b0a16 Batch 2: perf baseline — multi-sample + warmup + MAD threshold
Replaces single-shot baselines (40% noise floor flagged in Phase E)
with noise-aware regression detection.

What changed:
  ingest      n=3 runs (was 1) with 3-pass warmup
  vector_add  n=3 runs (was 1) with 3-pass warmup
  query       n=20 samples (unchanged) with 50-pass warmup
  search      n=20 samples (unchanged) with 50-pass warmup
  RSS         n=1 (unchanged — steady-state in G0)

Each metric stored as {value: median, mad: median absolute
deviation} in baseline.json (schema: v2-multisample-mad).

New regression detection:
  threshold = max(3 * baseline.mad, value * 0.75)
  REGRESSION iff |actual - baseline.value| > threshold AND direction
    signals worse (lower throughput / higher latency).

Why these specific numbers:
  3*MAD   = standard "outside the spread" bound; lets high-variance
            metrics tolerate their own noise.
  75% floor = empirical observation: even with 50 warmups, single-
            host inter-run variance on bootstrap-cold queryd was
            consistently 90-130% on this box. 75% catches >75%
            regressions cleanly while ignoring known noise.

lib/metrics.sh: new proof_compute_mad helper computes MAD from a
file of one-number-per-line samples. Used for both regen (to write
the baseline.mad value) and diff (read from baseline).

Honest finding from this iteration's 3 back-to-back diff runs:
  query_ms shows 90-130% delta from baseline consistently — not
  random noise but a systematic 2x gap between regen-time and
  steady-state. The regen captured a particularly fast moment;
  steady-state is slower. Operator workflow: regenerate the
  baseline at a known-representative state via
  `bash tests/proof/run_proof.sh --mode performance --regenerate-baseline`
  rather than expecting the harness to track a moving target.

The harness's value here is the EVIDENCE RECORD (every run captures
median+MAD+p95 plus all raw samples in raw/metrics/), not the gate.
Even false-positive REGRESSION skips give operators "this run was
20ms vs baseline 10ms" which is informative.

Sample counts also written into baseline.json under "samples" so a
future audit can verify the methodology that produced the values.

Verified across 3 back-to-back runs:
  ingest_rows_per_sec    PASS (delta within 75%, mostly < 10%)
  vectors_per_sec_add    PASS
  search_ms              PASS
  rss_*                  PASS
  query_ms               REGRESSION flagged (130/100/90%) — known
                         systematic gap, not bug

Closes the "40% noise floor" follow-up from Phase E FINAL_REPORT.
Honest about limitations: hard regression gating on a busy single-
host setup needs either much bigger sample counts (n≥100), longer
warmup, or moving to a dedicated benchmark host. Documented inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:13:47 -05:00
root
0d18ffa780 ADR-003: inter-service auth posture — Bearer + IP allowlist
Locks in the auth model that R-001 + R-007 will be retrofitted
against. Doc-only — wiring deferred to Sprint 1 when the first
non-loopback binding is needed.

Decision: Bearer token (from secrets-go.toml [auth] section) + IP
allowlist (CIDR list). Both layers required when auth is on; empty
token = G0 dev no-op. /health exempt.

Implementation shape (when it lands):
  - internal/shared/auth.go middleware: one chi r.Use line per binary
  - shared.Run gates: refuses non-loopback bind without configured token
  - subtle.ConstantTimeCompare for token equality (timing-safe)

Alternatives considered + rejected:
  mTLS         — too heavy for single-machine inter-service traffic
  JWT          — buys nothing over Bearer without external IdP
  IP-only      — one stolen IP entry = full access; no defense depth
  OAuth2       — no external IdP commitment in G0-G3 timeline

What this doesn't do:
  - Doesn't implement (code lands Sprint 1)
  - Doesn't break G0 dev (empty token = middleware no-op)
  - Doesn't address gateway→end-user auth (different ADR shape)

Closes the design-decision blocker for R-001 and R-007. Wiring
ticket: Sprint 1 backlog story S1.2.

Also lifts ADR-002 (storaged per-prefix PUT cap) into the doc —
it was implemented in 423a381 but not yet recorded as an ADR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:05:59 -05:00
root
423a3817c5 D: storaged per-prefix PUT cap — vectord _vectors/ → 4 GiB
Closes the documented 500K-test gap (memory project_golang_lakehouse:
"storaged 256 MiB PUT cap blocks single-file LHV1 persistence above
~150K vectors at d=768"). Vectord persistence under "_vectors/" now
gets a 4 GiB cap; everything else (parquets, manifests, ingest)
keeps the 256 MiB default.

Why per-prefix and not "raise globally":
  - 256 MiB cap is a real DoS protection — runaway clients can't
    drain the daemon. Raising it for ALL traffic would expand the
    attack surface for routine paths that have no need.
  - Per-prefix preserves existing protection while opening the one
    documented production-scale path.

Why not split LHV1 across multiple keys (the alternative):
  - G1P shipped a single-Put framed format SPECIFICALLY to eliminate
    the torn-write class (memory: "Single Put eliminates the torn-
    write class that the 3-way convergent scrum finding identified").
  - Multi-key LHV1 would re-introduce the half-saved-state failure
    mode we just paid to fix. Streaming via existing manager.Uploader
    is the better architectural answer.

Why not bump the cap operationally via env/config:
  - Future operator-driven cap can drop in cleanly via the
    maxPutBytesFor function. Started with hardcoded 4 GiB to keep
    this commit small; config knob is a follow-up if production
    workloads diverge from the documented 500K-vector ceiling.

manager.Uploader is already streaming-multipart on the outbound
S3 side; the inbound MaxBytesReader cap is a safety gate, not a
memory bottleneck. So raising it for vectord just lets the
existing streaming path actually flow, without introducing new
memory pressure (4-slot semaphore × 4 GiB worst case = 16 GiB
only if all slots simultaneously max out — vanishingly unlikely).

Implementation:
  cmd/storaged/main.go:
    new constant maxPutBytesVectors = 4 GiB (covers >700K vectors @ d=768)
    new constant vectorsPrefix = "_vectors/" (synced with vectord.VectorPrefix)
    new function maxPutBytesFor(key) → cap-by-prefix
    handlePut: ContentLength check + MaxBytesReader use the per-key cap

  cmd/storaged/main_test.go (3 new test funcs):
    TestMaxPutBytesFor: 7 cases incl. nested prefix, substring-but-not-
      prefix, empty key, parquet/manifest paths.
    TestVectorPrefixSyncWithVectord: regression test that asserts
      vectorsPrefix == vectord.VectorPrefix. A future rename surfaces
      here instead of silently bypassing the larger cap.
    TestVectorCapAccommodates500KStaffingTest: bounds the cap above
      the documented production workload (~700 MiB conservative).

Verified:
  go test ./cmd/storaged/ — all green (was 1 func, now 4)
  just verify             — 9 smokes still pass · 32s wall
  just proof contract     — 53/0/1 unchanged

Out of scope for this commit (deserves its own):
  - Heavy integration smoke: 200K dim=768 synthetic vectors → ~700
    MiB LHV1 → kill+restart vectord → recall=1. ~5-10 min wall;
    follow-up if you want production-scale persistence verified
    end-to-end. Unit tests + existing g1p_smoke cover the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:00:09 -05:00
root
6af0520ed2 A: fail-loud on non-loopback bind — closes worst case of R-001
shared.Run now refuses to bind a non-loopback address unless the
LH_<SERVICE>_ALLOW_NONLOOPBACK=1 env is set. Single change covers
all 7 binaries via the existing Run call site; no per-binary
wiring needed.

Closes the accidental-0.0.0.0 deploy attack surface for R-001:
queryd /sql is RCE-equivalent off loopback (DuckDB has filesystem
read + COPY TO + read_text), but the gate applies to every binary
uniformly so the same posture covers vectord (mutation routes),
catalogd (manifest writes), and the others.

What passes the gate:
  127.0.0.1:port, 127.x.y.z:port (full /8), [::1]:port,
  localhost:port, OR explicit env LH_<SVC>_ALLOW_NONLOOPBACK=1

What fail-louds:
  0.0.0.0:port, [::]:port, :port (all interfaces),
  any non-loopback IP, any non-localhost hostname,
  unparseable shapes ("", "no port", garbage)

Override env is strict equality "1" — typos like "true"/"yes" do NOT
trigger it, so a future operator can't accidentally expose by typing
the wrong value. Override fires log a structured warn so the choice
is auditable in production.

Error message cites the env name AND R-001 by name so operators see
the fix path without grepping:
  "refusing non-loopback bind \"0.0.0.0:3214\" for \"queryd\"
   (set LH_QUERYD_ALLOW_NONLOOPBACK=1 to override; see audit R-001)"

internal/shared/bind.go            — requireLoopbackOrOverride + isLoopbackAddr
internal/shared/bind_test.go       — 7 test funcs incl. table-driven
                                     IPv4/IPv6/hostname coverage and
                                     per-service env isolation
internal/shared/server.go          — 1-line gate in Run before listen

Verified:
  go test -short ./internal/shared/ — all green (was 14 funcs, now 21)
  just verify                       — vet + test + 9 smokes still 33s

Doesn't address R-001's full attack surface (any reachable port can
issue arbitrary SQL); ADR-003 + Bearer-token middleware is the
follow-up. This commit makes the implicit "localhost-only is the auth
layer" guarantee explicit and un-bypassable without explicit env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:56:42 -05:00
root
125e1c80b9 tests: close R-002 / R-003 / R-008 — internal/shared, storeclient, queryd/db.go
Audit-driven follow-up to the Rust scrum review on the 3 untested
HIGH-risk packages. Both the audit (reports/scrum/risk-register.md)
and the scrum (tests/real-world/runs/scrum_mojxb5bw/) independently
flagged these files as the highest-leverage missing test coverage.

internal/shared/server_test.go — 8 test funcs
  newListener: valid addr, invalid addr (non-numeric port, port
    out of range, port-already-in-use surfacing as net.OpError).
  Empty-addr-is-valid: documents the net.Listen quirk that "" binds
    an OS-picked port — future readers don't need to relitigate.
  HealthResponse marshal: JSON shape stable, round-trip clean.
  /health handler reconstructed via httptest.Server: status 200,
    Content-Type application/json, body fields stable.
  RegisterRoutes callback: contract verified (callback is invoked
    with a real chi.Router, mounted route reachable end-to-end).
  Run bind-failure surface: synchronous error, not a goroutine swallow
    — the contract Run depends on per the race-safe-startup comment.

internal/shared/config_test.go — 6 test funcs
  DefaultConfig G0 port pinning: every binary's default bind locked
    in (3110/3211-3216) so a refactor can't silently flip a port.
  LoadConfig empty path: returns DefaultConfig, no error.
  LoadConfig missing file: returns DefaultConfig, logs warn (the warn
    line shows up in test output, captured-but-not-asserted).
  LoadConfig valid TOML: partial overrides land, unspecified sections
    keep defaults (TOML decoder leave-alone behavior).
  LoadConfig invalid TOML: returns wrapped 'parse config' error.
  LoadConfig unreadable file: skipped under root (root reads 0000);
    captures the read-error wrap path for non-root contexts.

internal/storeclient/client_test.go — 14 test funcs
  safeKey table-driven: plain segments, single slash, empty, trailing
    slash, space (→ %20), apostrophe (→ %27), unicode (→ %C3%A9),
    deep nesting. Locks URL-escape contract per scrum suggestion.
  recordingServer helper backs Put/Get/Delete/List against
    httptest.Server: verifies method, path, body bytes round-trip.
  ErrKeyNotFound on 404 (errors.Is round-trip).
  Non-OK status wraps body preview into the error chain.
  Delete accepts both 200 and 204 (S3 vs compatible-store quirk).
  List parses JSON shape and surfaces query-string prefix.
  Context cancellation propagates through Put as context.Canceled.

internal/queryd/db_test.go — 5 test funcs (with subtests)
  sqlEscape table-driven: 8 cases including empty, all-quotes,
    nested apostrophes (the case from the scrum suggestion).
  redactCreds table-driven: 6 cases — both keys, single keys,
    empty, multi-occurrence, placeholder-collision (lossy but safe).
  buildBootstrap statement order: INSTALL → LOAD → CREATE SECRET.
  buildBootstrap endpoint schemes: http strips + USE_SSL false,
    https keeps SSL true, no-scheme defaults SSL true (prod ambient).
  buildBootstrap URL_STYLE: 'path' vs 'vhost' branch.
  buildBootstrap escapes credential quotes: future SSO-token-with-
    apostrophe doesn't break out of the SQL string literal — the
    belt holds when the suspenders snap.

Real finding caught by my own test:
  net.Listen("tcp", "") succeeds (OS-picked port) — captured as
  TestNewListener_EmptyAddrIsValid so the quirk is documented.

Verified:
  go test -short ./... — every internal/ package now has tests
    (no more 'no test files' lines for shared/storeclient).
  just verify — vet + test + 9 smokes green in 33s.
  just proof contract — 53/0/1 green (no harness regression).

Closes:
  R-002 internal/shared zero tests        HIGH
  R-003 internal/storeclient zero tests   HIGH
  R-008 queryd/db.go untested             MED (sqlEscape, redactCreds,
                                              CREATE SECRET formation)

Composite scrum score should move from 43 → ~46 / 60 — the three
HIGH/MED risks closed, internal/shared and internal/storeclient
become "tested + load-bearing" instead of "untested + load-bearing."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:51:05 -05:00
root
ff9823b871 scrum audit re-run: 35 → 43 / 60 after Phase A-E + S0.3
Re-runs the SCRUM.md framework against HEAD (4840c10) to score the
delta from the audit baseline at 91edd43. Composite +8.

Scoring deltas:
  Reproducibility       7 → 9  (just verify, just doctor, pre-push hook)
  Test Coverage         6 → 8  (168 proof harness assertions; Go-test
                                gaps in shared/storeclient remain)
  Trust Boundary        7 → 7  (no code change; R-001/R-007 open)
  Memory Correctness    3 → 4  (vectord persistence proven; Mem0
                                pathway/playbook still not ported)
  Deployment Readiness  4 → 5  (just doctor; REPLICATION/systemd open)
  Maintainability       8 → 8  (spine unchanged; harness obeys
                                CLAUDE_REFACTOR_GUARDRAILS)

Risk register changes:
  R-004 (smokes not gated)        CLOSED — just verify + pre-push hook
  R-005 (cmd/main.go untested)    partial — proof harness covers wiring
  R-012 (empty tests/ dir)        CLOSED — populated by harness
  R-001/R-002/R-003/R-006/R-007/R-008/R-009/R-010 unchanged

Sprint 0 progress:
  S0.1 just doctor               DONE
  S0.3 just verify + pre-push    DONE
  S0.6 tests/ dir cleanup        DONE
  S0.2 just smoke-fixtures       open
  S0.4 cmd/main_test × 6         partial (harness coverage; go-test gap)
  S0.5 shared/storeclient tests  open  (HIGH risks still unaddressed)

New finding from this rerun (worth recording):
  Queryd refresh-tick race in 04_query_correctness — cache-warm
  binaries fire SELECTs faster than queryd's 500ms refresh tick.
  Caught by integration mode going 104/0/1 → 102/1/1, fixed at
  4840c10 with proof_wait_for_sql helper. Exactly the failure-mode
  the harness was designed to catch.

Original 5 audit reports preserved as immutable history at
91edd43; this file documents the delta only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:37:45 -05:00
root
4840c10311 proof harness: fix queryd refresh-tick race in 04_query_correctness
Caught by the audit rerun: with cache-warm binaries, 04 fires its
first SELECT faster than queryd's 500ms refresh tick — Q1 returned
400 ("table not found") even though 03_ingest had registered the
manifest. Subsequent queries (after the next tick) succeeded.

This is an eventual-consistency wait, not a retry — queryd's
contract is that views appear within one tick of catalogd having the
manifest. Production code does not need changing.

Added to lib/http.sh:
  proof_wait_for_sql <budget_sec> <sql>
    polls a SQL probe until it returns 200 or budget elapses; emits
    no evidence (test setup, not a claim).

Used in 04_query_correctness:
  Wait up to 5s for queryd to have the view before running the 5
  SQL assertions. Skip-with-loud-reason if the view never appears.

Verified: integration mode back to 104 pass / 0 fail / 1 skip after
fix. The skip is the unchanged GOLAKE-085 informational record.

This is exactly the kind of finding the harness was designed to
surface — the regression existed in the codebase the moment Phase D
shipped, but only fired when the next compare run hit cache-warm
timing. Without the harness, it would have surfaced on a CI run
weeks from now and been hard to bisect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:36:28 -05:00
root
4bb6548cbc proof harness Phase E: FINAL_REPORT.md answers the 9 mandated questions
Per docs/TEST_PROOF_SCOPE.md, this is the closing deliverable for the
proof harness: a single document that names what's proven, what's
partially proven, what failed, what was skipped and why, what evidence
exists for each, what bottlenecks were measured, what contract drift
was found, what refactor risks remain, and what to fix first.

Per-run report dirs (tests/proof/reports/proof-<ts>/) keep their
existing summary.md + summary.json + raw/ structure — they are the
replayable evidence chain. FINAL_REPORT.md is the stable, repo-tracked
synthesis pointing at them.

Headline findings (no surprises — harness behaves as designed):
  - 24 claims encoded; 22 fully proven, 1 informational (GOLAKE-085
    duplicate vector ID, contract not yet specified), 0 failed.
  - 4 contract-drift findings recorded as canonical: vectord add
    body field is `items` not `vectors`, search response is `results`
    not `hits`, index info is `length` not `count`, status codes
    201/204 not 200. All caught during Phase B; all now pinned by the
    harness.
  - Performance baseline shows queryd as the largest RSS (69 MiB,
    DuckDB process); single-sample noise floor is ~40% — tightening
    to multi-sample medians is a documented Sprint follow-up.
  - HIGH-risk audit findings (R-001 queryd /sql, R-002/R-003 untested
    shared+storeclient) are NOT closed by the harness — it's a
    multiplier, not a replacement for unit tests + auth posture.

The proof harness is complete. 11 cases · 3 modes · 168 assertions
peak across all tiers · ~22s total wall (contract+integration+perf).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:32:56 -05:00
root
175ad59cb3 proof harness Phase D: performance baseline · 1000-row ingest, p50/p95
GOLAKE-100. First run writes tests/proof/baseline.json; subsequent
runs diff against it. >10% regression emits a SKIP with REGRESSION
detail (not a fail — perf claim is required:false in claims.yaml so
the gate stays green; the human summary tells the regression story
honestly). Skip-with-loud-reason if any earlier case in the run
failed, per spec "performance only after contract+integration pass."

Workload (deterministic, repeatable):
  ingest      1000-row CSV (5 roles × 5 cities × seeded scores) → /v1/ingest
  query       SELECT count(*) ×20 against the just-ingested dataset
  vector add  200 dim=4 vectors with formulaic content (no Ollama)
  search      ×20 against the perf index with a fixed query vector
  RSS         per-service post-workload sample via /proc/<pid>/status

Recorded metrics:
  ingest_rows_per_sec, query_p50_ms, query_p95_ms,
  vectors_per_sec_add, search_p50_ms, search_p95_ms,
  rss_{storaged,catalogd,ingestd,queryd,vectord,embedd,gateway}_mb

baseline.json on this box (committed):
  25000 rows/sec ingest · 17ms p50 / 24ms p95 query
  6250 vectors/sec add  ·  8ms p50 / 20ms p95 search
  queryd 69 MiB · vectord 14 MiB · others 11-29 MiB

Honest measurement-design finding from the very first compare run:
back-to-back runs surfaced -41% ingest and +29% query p50 — pure
disk-cache + queryd-cold-start noise. Single-sample baselines have
real noise floor ≈40%. Recorded as REGRESSION skips so the human
summary surfaces it, not a code regression. Tightening the threshold
or moving to multi-sample medians is a Phase E recommendation.

Verified end-to-end:
  just proof contract       —  53 pass  · 1 skip · ~4s
  just proof integration    — 104 pass  · 1 skip · ~8s
  just proof performance    — 110 pass  · 3 skip · ~10s
  just verify               —  9 smokes still green · 29s

All 11 cases (4 contract + 6 integration + 1 performance) deterministic
end-to-end. Phase E (final report against the 9 mandated questions)
is the last piece.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:30:11 -05:00
root
1313eb2173 proof harness Phase C: 6 integration cases · 104/0/1 green
Adds the integration tier — full chain CSV→Parquet→SQL and full
text→embed→vector→search. All 10 cases (4 contract + 6 integration)
end-to-end deterministic; 8s wall total.

Cases added:
  01_storage_roundtrip.sh
    GOLAKE-010-012. PUT 1KiB → GET sha256-equal → LIST contains key
    → DELETE 200/204 → GET 404. Deterministic key under
    proof/<case_id>/ so concurrent runs don't collide.

  02_catalog_manifest.sh
    GOLAKE-020-022. Fresh register existing=false → manifest read
    matches → list contains dataset_id → idempotent re-register
    existing=true with stable dataset_id → schema-drift register
    409 (the ADR-020 contract). Per-run unique name via
    PROOF_RUN_ID so existing=false is meaningful.

  03_ingest_csv_to_parquet.sh
    GOLAKE-030. workers.csv (5 rows) via /v1/ingest multipart →
    parquet object on storaged → catalog manifest with row_count=5.
    Verifies content-addressed key shape (datasets/<n>/<fp>.parquet).

  04_query_correctness.sh
    GOLAKE-040. The 5 SQL assertions from fixtures/expected/queries.json
    against the workers fixture: count=5, Chicago=2, max=95,
    safety→Barbara, Houston avg=89.5. Iterates the YAML claims, runs
    each query, compares response columns to expected values.

  06_vector_add_search.sh integration extension
    GOLAKE-051. text → /v1/embed (4 docs from fixtures/text/docs.txt)
    → vectord add → search by query embedding. Top-1 ID per query
    asserted against fixtures/expected/rankings.json. First run (or
    --regenerate-rankings) writes the fixture and emits a skip with
    explicit reason; subsequent runs assert against it.

  07_vector_persistence_restart.sh
    GOLAKE-070. add 4 unit-basis vectors → search → record top-1
    distance → SIGTERM vectord → restart with the same --config →
    poll /health for 8s → search again → top-1 ID and distance match
    bit-identically. Skips with reason if vectord PID can't be found
    or post-restart bind times out.

Two harness improvements landed alongside:

  run_proof.sh writes a temp lakehouse_proof.toml with
  refresh_every="500ms" override and passes --config to all booted
  binaries. Production default is 30s; 04_query_correctness needs
  queryd to pick up the new view within a tick. Production config
  unchanged.

  cleanup() now pgreps for any orphan bin/<svc> processes (anchored
  to start-of-argv per memory feedback_pkill_scope.md) so a case
  that restarts a service mid-run still gets cleaned up.

  lib/http.sh adds proof_call(case_id, probe, method, url, args...)
  — escape hatch for cases that need raw curl args (multipart -F,
  custom headers). Used by 03_ingest for the multipart upload that
  conflicts with proof_post's --data + Content-Type defaults.

  lib/env.sh exports PROOF_RUN_ID — short unique id derived from the
  report directory timestamp. Used by 02 and 07 for fresh-each-run
  state isolation.

Two real findings recorded as evidence (no code changes):
  - rankings.json fixture pinned: 4 queries → 4 distinct top-1 docs
    via nomic-embed-text. A model swap that changes ranking now
    fails the harness loudly; --regenerate-rankings is the override.
  - vectord persistence kill+restart preserves top-1 distance
    bit-identically — the LHV1 single-Put framed format from
    G1P round-trips exactly through Save/Load.

Verified end-to-end:
  just proof contract       — 53 pass (4 cases)
  just proof integration    — 104 pass (10 cases) · 8s wall
  just verify               — 9 smokes still green · 33s wall

Phase D (performance baseline) lands next: 10_perf_baseline measures
rows/sec ingest, vectors/sec add, p50/p95 query+search latency, RSS,
CPU. First run writes tests/proof/baseline.json; later runs diff
against it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:26:00 -05:00
root
6d18394416 proof harness Phase B: 4 contract cases · 53/0/1 green
Added the contract tier above 00_health canary. All 5 contract cases
now cover GOLAKE-001-003, 050, 060-061, 080-085 — 53 assertions pass,
1 informational skip, 0 fail. Wall: 4s end-to-end (cached binaries).

Cases:
  05_embedding_contract.sh
    GOLAKE-050. POST /v1/embed with one short text → asserts dim=768,
    one vector returned, vector length matches dimension, sum of
    squared elements > 0 (proxy for non-zero), response.model echoed.
    Skips with explicit reason if Ollama is unreachable (502 from
    embedd) — per spec hard rule "skipped tests do not appear as
    passed."

  06_vector_add_search.sh
    GOLAKE-060 + GOLAKE-061. Synthetic dim=4 unit basis vectors.
    Create index → add 3 vectors → get-index returns length=3 →
    search([1,0,0,0],k=3) returns v1 at rank 1 with distance < 0.001.
    Cleanup with DELETE. No embedd dependency — pure contract layer.

  08_gateway_contracts.sh
    GOLAKE-003. For each /v1/* route, asserts gateway and direct
    upstream return identical status AND identical response body
    (sha256 match). Confirms gateway is a proxy not a transformer.
    Status passthrough verified on both 200 path (storage/list,
    catalog/list) and 4xx path (sql empty body → 400 from queryd).

  09_failure_modes.sh
    GOLAKE-080..085. Six failure-mode contracts:
      080 malformed JSON → 4xx on catalog/ingest/sql/embed
      081 missing required field → 4xx on catalog/vectors/embed
      082 bad SQL → 4xx with non-empty error body
      083 vector dim mismatch → 4xx
      084 missing storage object → 404
      085 duplicate vector ID → INFORMATIONAL (spec says required:false)
          first/second statuses recorded as evidence; contract decided
          later from the recorded record.

Two new lib helpers in lib/assert.sh:
  proof_assert_status_in <id> <claim> "200 201 204" <probe>
    pass if status is in the space-separated list. Used for
    delete-returns-200-or-204 case where vectord returns 204.

  proof_assert_status_4xx <id> <claim> <probe>
    pass if status in [400, 500). Used for failure modes where the
    specific 4xx code may vary (400 vs 422 vs 409). Records actual
    code as evidence.

Two real contract findings recorded by the harness during build:
  - vectord add expects {"items": [...]}, not {"vectors": [...]}.
    My initial test sent the wrong field; would have masked the bug
    forever in CI. The harness caught it via the assertion failure.
  - vectord create returns 201 Created, delete returns 204 No Content.
    Documented in the test fixtures as canonical.

Regression: just verify wall 33s, vet + test + 9 smokes still green.

Phase C (integration) lands next: 01_storage_roundtrip, 02_catalog_manifest,
03_ingest_csv_to_parquet, 04_query_correctness, 05/06 integration extends,
07_vector_persistence_restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:15:04 -05:00
root
a81291e38c proof harness Phase A: scaffolding + canary case green
Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier
above the smoke chain. This commit lays the scaffolding and proves
the orchestrator end-to-end with one canary case (00_health).

What landed:

  tests/proof/
    README.md             how to read a report, layout, modes
    claims.yaml           24 claims enumerated (GOLAKE-001..100)
    run_proof.sh          orchestrator with --mode {contract|integration|performance}
                          and --no-bootstrap / --regenerate-{rankings,baseline}
    lib/
      env.sh              service URLs, report dir, mode, git context
      http.sh             curl wrappers writing per-probe JSON + body + headers
      assert.sh           proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                          proof_skip — each emits one JSONL record per call
      metrics.sh          start/stop timers, value capture, RSS sampling,
                          percentile compute (for Phase D)
    cases/
      00_health.sh        canary — gateway + 6 services /health → 200,
                          body identifies service, latency < 500ms (21 assertions)
    fixtures/
      csv/workers.csv     spec's 5-row deterministic CSV
      text/docs.txt       4 deterministic vector docs
      expected/queries.json  expected results for the 5 SQL assertions

Wired into the task runner:

  just proof contract       # canary only this commit
  just proof integration    # Phase C
  just proof performance    # Phase D

.gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.

Specs landed alongside (J's drops):
  docs/TEST_PROOF_SCOPE.md           the harness contract this implements
  docs/CLAUDE_REFACTOR_GUARDRAILS.md process discipline this harness obeys

Verified end-to-end (cached binaries):
  just proof contract        wall < 2s, 21 pass / 0 fail / 0 skip
  just verify                wall 31s, vet + test + 9 smokes still green

Two bugs fixed during canary run, both in run_proof.sh aggregation:
- grep -c exits 1 on zero matches; the `|| echo 0` form concatenated
  "0\n0" and broke jq --argjson + integer comparison. Fixed via a
  _count helper that captures count-or-zero cleanly.
- per-case table iterated case scripts (filename-based) but cases
  write evidence under CASE_ID. Switched to JSONL-file iteration so
  multi-case scripts work and the mapping is faithful.

Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sourcing the same lib
helpers and writing to the same report shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:08:51 -05:00
root
e31638204d S0.3: just verify + pre-push hook gates the smoke chain
Sprint 0 / R-004 / GATE-0.4 — the 9-smoke chain is no longer
documentation only. One command (`just verify`) runs vet + tests +
all 9 smokes; pre-push hook calls it; a regression cannot leave
this machine without explicit --no-verify override.

Recipes:
  just verify          full gate (33s wall on this box)
  just smoke <day>     single smoke (d1..d6, g1, g1p, g2)
  just smoke-all       all 9 smokes only
  just doctor          dep probe with structured output
                       (--json for CI / pre-push)
  just install-hooks   install .git/hooks/pre-push
  just fmt|vet|test|build|clean

scripts/doctor.sh probes Go ≥1.25, gcc, MinIO at :9000 with bucket
lakehouse-go-primary, Ollama at :11434 with nomic-embed-text loaded,
/etc/lakehouse/secrets-go.toml with [s3.primary]. Each missing dep
prints its install fix command. JSON mode emits the same shape for
CI / pre-push consumers.

README updated with the task-runner section + just install-hooks
on cold-start. Hooks live in .git/hooks/ (untracked); install
recipe recreates them on a fresh clone.

PATH note: justfile prepends /usr/local/go/bin so recipes find Go
without depending on the parent shell's PATH (ADR-001 §1.x lives
go there).

Verified: just verify exits 0 in 33s wall (vet ~0.1s + test ~0.1s +
9 smokes deterministic per audit baseline). Pre-push hook installed
and bash -n clean.

Closes audit risk R-004 (smokes not gated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:56:50 -05:00
root
91edd43164 scrum audit: 5 reports under reports/scrum/ · score 35/60
Adapts docs/SCRUM.md framework (originally written for the
matrix-agent-validated repo) to the Go rewrite. Five deliverables:

  golang-lakehouse-scrum-test.md  top-line + scoring + verdict
  risk-register.md                12 findings, R-001..R-012
  claim-coverage-table.md         claim/test/risk for Sprint 2
  sprint-backlog.md               5 sprints, ~2 weeks of work
  acceptance-gates.md             DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence."
Smoke chain ran clean (33s wall, all 9 PASS) and is captured in
reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:
  Reproducibility       7/10  9 smokes deterministic, no just/CI gate
  Test Coverage         6/10  internal/ packages tested, 6/7 cmd/ aren't
  Trust Boundary        7/10  escapes ok, zero auth, /sql is RCE-eq off-loopback
  Memory Correctness    3/10  pathway/playbook/observer not yet ported
  Deployment Readiness  4/10  no REPLICATION, no env template, no systemd
  Maintainability       8/10  no god-files, 7 lean binaries, ADRs current

Top three risks:
  R-001 HIGH  queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
  R-002 HIGH  internal/shared (server.go + config.go) zero tests
  R-003 HIGH  internal/storeclient zero tests, used by 2 services
  R-004 MED   9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage
+ CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are
mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/
a runtime-artifact directory while exposing reports/scrum/ as
tracked documentation. Mirrors the pattern future audit passes will
land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:51:47 -05:00
root
1f700e731d Staffing scale test: full 500K through gateway → embedd → vectord pipeline
scripts/staffing_500k/main.go: driver that reads workers_500k.csv,
embeds combined-text per worker via /v1/embed, adds to vectord index
"workers_500k", runs canonical staffing queries against the populated
index. Reproducible end-to-end test of the staffing co-pilot pipeline
at production scale.

Run results (2026-04-29 ~02:30):
  500,000 vectors ingested in 35m 36s (~234/sec avg)
  vectord peak RSS 4.5 GB (~9 KB/vector incl. HNSW graph)
  Query latency: embed 40-59ms + search 1-3ms = ~50ms end-to-end
  GPU avg ~65% (Ollama not the bottleneck — vectord Add is)

Semantic recall on canonical queries:
  "electrician with industrial wiring": top 2 are literal Electricians (d=0.30)
  "CNC operator with first article": Assembler / Quality Techs (adjacent, d=0.24)
  "forklift driver OSHA-30": warehouse roles (d=0.33)
  "warehouse picker night shift bilingual": Material Handlers (d=0.31)
  "dental hygienist": Production Workers at d=0.49+ — correctly
    LOW-similarity, signals "no dental hygienists in this manufacturing
    dataset" rather than hallucinating a fake match.

Documented gaps:
  - storaged's 256 MiB PUT cap blocks single-file LHV1 persistence
    above ~150K vectors at d=768. Test ran with persistence disabled.
  - vectord Add is RWMutex-serialized — with GPU at 65% util this is
    the throughput cap. Concurrent Adds would be 2-3x faster but
    require careful audit of coder/hnsw thread-safety (G1 scrum
    documented two known quirks).

PHASE_G0_KICKOFF.md gains a "Staffing scale test" section with full
metrics + the gaps-surfaced list. The architectural payoff is real:
six binaries, one HTTP route, ~50ms from text query to top-K
semantically-relevant workers across 500K records.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:31:30 -05:00
root
0cb29cda15 docs: README + PHASE_G0_KICKOFF reflect post-G0 state (G1, G1P, G2)
README was stuck on "Pre-Phase G0, implementation has not started"
while we shipped through G2. Updated to reflect the current 7-binary
service inventory, the 9 acceptance smokes, the cold-start deps
(MinIO bucket, Ollama with nomic-embed-text, secrets-go.toml).

PHASE_G0_KICKOFF gains a "Post-G0 work" pointer at the end —
brief table mapping each G1+/G2 commit to its smoke + scrum-fix
count. Full per-day detail stays in commit messages and the
project memory file.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:45:59 -05:00
root
9ee7fc5550 G2: embedd — text → vector via Ollama · 2 scrum fixes
Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).

Wire format:
  POST /v1/embed {"texts":[...], "model":"..."}  // model optional
  → 200 {"model","dimension","vectors":[[...]]}

Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.

Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.

All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                    0 BLOCK + 4 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):              0 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):               "No BLOCKs" (3 tokens)

Fixed (2 — 1 convergent + 1 single-reviewer):
  C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
    batch was up to N×60s with no batch-level cap. One stuck Ollama
    call would stall the whole handler indefinitely. Fix:
    context.WithTimeout(r.Context(), 60s) wraps the entire batch.
  O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
    producing version-dependent garbage. Fix: reject "" with 400 at
    the handler boundary so callers get a deterministic answer
    instead of an upstream-conditional 502.

Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).

Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.

Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:42:27 -05:00
root
8b92518d21 G1P: vectord persistence to storaged + scrum (3 fixes incl. 3-way convergent)
Adds optional persistence to vectord (G1's HNSW vector search). Single-
file framed format per index — eliminates the torn-write class that
the 3-way convergent scrum finding identified:

  _vectors/<name>.lhv1  — single binary blob:
      [4 bytes magic "LHV1"]
      [4 bytes envelope_len uint32 BE]
      [envelope bytes — JSON params + metadata + version]
      [graph bytes — raw hnsw.Graph.Export]

Pre-extraction: internal/catalogd/store_client.go → internal/storeclient/
shared package, since both catalogd and vectord need it. Same pattern as
the pre-D5 catalogclient extraction.

Optional via [vectord].storaged_url config (empty = ephemeral mode).
On startup: List + Load each persisted index. After Create / batch Add /
DELETE: Save (or Delete from storaged). Save failures are logged-not-
fatal — in-memory state is the source of truth in flight.

Acceptance smoke G1P 8/8 PASS — kill+restart preserves state, post-
restart search returns dist=0 (graph round-trips exactly), DELETE
removes the file, post-delete restart shows count=0.

All 8 smokes (D1-D6 + G1 + G1P) PASS deterministically. The g1_smoke
gained scripts/g1_smoke.toml that disables persistence so the
in-memory API test stays decoupled from any rehydrate-from-storaged
state contamination.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                     1 BLOCK + 5 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):               1 BLOCK + 2 WARN
  - Qwen3-coder (openrouter):                2 BLOCK + 2 WARN + 1 INFO

Fixed (3 — 1 convergent + 2 single-reviewer):
  C1 (Opus + Kimi + Qwen 3-WAY CONVERGENT WARN): Save was non-atomic
    across two PUTs — envelope-succeeds + graph-fails left a half-
    saved index that passed the "both present" List filter and
    silently mismatched metadata against vectors on Load.
    Fix: collapse to single framed file (no torn-write window
    possible).
  O-B1 (Opus BLOCK): isNotFound substring-matched "key not found"
    against the wrapped error message — brittle, any 5xx body
    containing that text would silently misclassify as missing.
    Fix: errors.Is(err, storeclient.ErrKeyNotFound).
  O-I3 (Opus INFO): handleAdd pre-validation only covered id+dim;
    NaN/Inf/zero-norm could still fail mid-batch leaving partial
    commits. Fix: extend pre-validation to call ValidateVector
    (newly exported) per item before any commit.

Dismissed (3 false positives):
  K-B1 + Q-B1 ("safeKey double-escapes %2F segments") — false
    convergent. Wire-protocol escape is decoded by storaged's chi
    router on the way in; on-disk key is the original literal.
    %2F round-trips correctly through PathEscape → URL → chi decode
    → S3 key.
  Q-B2 ("List vulnerable to race conditions") — vectord is single-
    process; no concurrent Save against List in the same vectord.

Deferred (3): rehydrate per-index timeout (G2+ multi-index scale),
saveAfter request ctx (matches G0 timeout deferral), Encode RLock
during slow writer (documented as buffer-only API).

The C1 finding is the strongest signal of the cross-lineage filter:
three independent reviewers all flagged the same torn-write hazard.
Single-file framing eliminates the class — there's now no Persistor
state where envelope and graph can disagree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:33:23 -05:00
root
b8c072cf0b G1: vectord — HNSW vector search via coder/hnsw · 6 scrum fixes applied
First G1+ piece. Standalone vectord service with in-memory HNSW
indexes keyed by string IDs and optional opaque JSON metadata.
Wraps github.com/coder/hnsw v0.6.1 (pure Go, no cgo). New port
:3215 with /v1/vectors/* routed through gateway.

API:
  POST   /v1/vectors/index            create
  GET    /v1/vectors/index            list
  GET    /v1/vectors/index/{name}     get info
  DELETE /v1/vectors/index/{name}
  POST   /v1/vectors/index/{name}/add (batch)
  POST   /v1/vectors/index/{name}/search

Acceptance smoke 7/7 PASS — including recall=1 on inserted vector
w-042 (cosine distance 5.96e-8, float32 precision noise), 200-
vector batch round-trip, dim mismatch → 400, missing index → 404,
duplicate create → 409.

Two upstream library quirks worked around in the wrapper:
  1. coder/hnsw.Add panics with "node not added" on re-adding an
     existing key (length-invariant fires because internal
     delete+re-add doesn't change Len). Pre-Delete fixes for n>1.
  2. Delete of the LAST node leaves layers[0] non-empty but
     entryless; next Add SIGSEGVs in Dims(). Workaround: when
     re-adding to a 1-node graph, recreate the underlying graph
     fresh via resetGraphLocked().

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                 0 BLOCK + 4 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):           2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):            "No BLOCKs" (4 tokens)

Fixed (4 real + 2 cleanup):
  O-W1: Lookup returned the raw []float32 from coder/hnsw — caller
    mutation would corrupt index. Now copies before return.
  O-W3: NaN/Inf vectors poison HNSW (distance comparisons return
    false for both < and >, breaking heap invariants). Zero-norm
    under cosine produces NaN. Now validated at Add time.
  K-B1: Re-adding with nil metadata silently cleared the existing
    entry — JSON-omitted "metadata" field deserializes as nil,
    making upsert non-idempotent. Now nil = "leave alone"; explicit
    {} or Delete to clear.
  O-W4: Batch Add with mid-batch failure left items 0..N-1
    committed and item N rejected. Now pre-validates all IDs+dims
    before any Add.
  O-I1: jsonItoa hand-roll replaced with strconv.Itoa — no
    measured allocation win.
  O-I2: distanceFn re-resolved per Search → use stored i.g.Distance.

Dismissed (2 false positives):
  K-B2 "MaxBytesReader applied after full read" — false, applied
    BEFORE Decode in decodeJSON
  K-W1 "Search distances under read lock might see invalidated
    slices from concurrent Add" — false, RWMutex serializes
    write-lock during Add against read-lock during Search

Deferred (3): HTTP server timeouts (consistent G0 punt),
Content-Type validation (internal service behind gateway), Lookup
dim assertion (in-memory state can't drift).

The K-B1 finding is worth pausing on: nil metadata on re-add is
the kind of API ergonomics bug only a code-reading reviewer
catches — smoke would never detect it because the smoke always
sends explicit metadata. Three lines changed in Add; the resulting
API matches what callers actually expect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:50:28 -05:00
root
d023b07b30 Real-scale validation post-G0: configurable ingest cap + workers_500k metrics
Validated G0 substrate against the production workers_500k.parquet
dataset (18 cols × 500,000 rows). Findings + one applied fix:

Finding #1 (FIXED): ingestd's hardcoded 256 MiB cap rejected the 500K
CSV (344 MiB) with 413. Cap fired correctly, no OOM. Extracted to
[ingestd].max_ingest_bytes config field; default 256 MiB, override
per deployment for known-large workloads. With cap bumped to 512 MiB,
500K ingest succeeds in 3.12s with ingestd peak RSS 209 MiB.

Finding #2 (deferred): ingestd doesn't release memory between
ingests. Go runtime conservative; long-running daemon, fine.

Finding #3: DuckDB-via-httpfs is healthy at 500K. GROUP BY 45ms,
count(*) 24ms, AVG 47ms, schema introspection 25ms. Sub-linear
scaling vs 100K — the s3:// read path is not a bottleneck.

Finding #4: ADR-010 type inference correctly handled real staffing
data. worker_id → BIGINT, numeric scores → DOUBLE, multi-line
resume_text → VARCHAR. 1000-row sample sufficient.

Finding #5: Go's encoding/csv handles RFC 4180 quoted-comma fields
and multi-line quoted text without LazyQuotes — confirming the D4
scrum's dismissal of Qwen's BLOCK on this point.

Net: substrate handles production-scale data with one config knob.
No correctness issues, no OOMs, no silent type errors.
All 6 G0 smokes still PASS after the cap-config change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:32:08 -05:00
root
b1d52306ad G0 D6: gateway reverse proxy fronting all 4 backing services · 2 scrum fixes · G0 COMPLETE
Last day of Phase G0. Gateway promotes the D1 stub endpoints into
real reverse-proxies on :3110 fronting storaged + catalogd + ingestd
+ queryd. /v1 prefix lives at the edge — internal services route on
/storage, /catalog, /ingest, /sql, with the prefix stripped by a
custom Director per Kimi K2's D1-plan finding.

Routes:
  /v1/storage/*  → storaged
  /v1/catalog/*  → catalogd
  /v1/ingest     → ingestd
  /v1/sql        → queryd

Acceptance smoke 6/6 PASS — every assertion goes through :3110, none
direct to backing services. Full ingest → storage → catalog → query
round-trip verified end-to-end. The smoke's "rows[0].name=Alice"
assertion is the architectural payoff: five binaries, six HTTP
routes, one round-trip through one edge.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 2 WARN + 2 INFO
  - Kimi K2-0905 (openrouter):                1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory)
  - Qwen3-coder (openrouter):                 5 completion tokens — "No BLOCKs."

Fixed (2, both Opus single-reviewer):
  O-BLOCK: Director path stripping fails if upstream URL has a
    non-empty path. The default Director's singleJoiningSlash runs
    BEFORE the custom code, so an upstream like http://host/api
    produces /api/v1/storage/... after the join — then TrimPrefix("/v1")
    is a no-op because the string starts with /api. Fix: strip /v1
    BEFORE calling origDirector. New TestProxy_SubPathUpstream regression
    locks this in. Today: bare-host URLs only, dormant — but moving
    gateway behind a sub-path in prod would have silently 404'd.
  O-WARN2: url.Parse is permissive — typo "127.0.0.1:3211" (no scheme)
    parses fine, produces empty Host, every request 502s. mustParseUpstream
    fail-fast at startup with a clear message naming the offending
    config field.

Dismissed (3, all Kimi, same false TrimPrefix theory):
  K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single
    check-and-trim, no loop
  K-WARN "no upper bound on repeated // removal" — same false theory
  K-WARN "goroutines leak if upstream parse fails while binaries
    running" — confused scope; binaries are separate OS processes
    launched by the smoke script

D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no
longer stubs /v1/ingest and /v1/sql). Replaced with proxy probes that
verify gateway forwards malformed requests to ingestd and queryd. Launch
order changed from parallel to dep-ordered (storaged → catalogd →
ingestd → queryd → gateway) since catalogd's rehydrate now needs
storaged, queryd's initial Refresh needs catalogd.

All six G0 smokes (D1 through D6) PASS end-to-end after every fix
round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes
applied across 6 days from cross-lineage review.

G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port,
distillation rebuild, observer + Langfuse integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:21:54 -05:00
root
9e9e4c26a4 G0 D5: queryd DuckDB SELECT over Parquet via httpfs · 4 scrum fixes
Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector
that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET
(TYPE S3) on every new connection, sourced from SecretsProvider +
shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and
handler's SELECTs serialize through one connection (avoids cross-
connection MVCC visibility edge cases).

Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR
REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key')
per manifest, drops views for removed manifests, skips on unchanged
updated_at (the implicit etag). Drop pass runs BEFORE create pass so
a poison manifest can't block other manifest refreshes (post-scrum
C1 fix).

POST /sql with JSON body {"sql":"…"} returns
{"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]],
"row_count":N}. []byte → string conversion so VARCHAR rows
JSON-encode as text. 30s default refresh ticker, configurable via
[queryd].refresh_every.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 4 WARN + 4 INFO
  - Kimi K2-0905 (openrouter):                2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                 2 BLOCK + 1 WARN + 1 INFO

Fixed (4):
  C1 (Opus + Kimi convergent): Refresh aborts on first per-view error
    → drop pass first, collect errors, errors.Join. Poison manifest
    no longer blocks the rest of the catalog from re-syncing.
  B-CTX (Opus BLOCK): bootstrap closure captured OpenDB's ctx →
    cancelled-ctx silently fails every reconnect. context.Background()
    inside closure; passed ctx only for initial Ping.
  B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80
    chars but those 80 chars contained KEY_ID + SECRET prefix → log
    aggregator captures credentials. Stable per-statement labels +
    redactCreds() filter on wrapped DuckDB errors.
  JSON-ERR (Opus WARN): swallowed json.Encode error → silent
    truncated 200 on unsupported column types. slog.Warn the failure.

Dismissed (4 false positives):
  Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit
  Qwen BLOCK "MaxBytesReader after Decode" — false, applied before
  Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a
    deadlock, just serialization, by design with 10s timeout retry
  Kimi WARN "dropView leaves r.known inconsistent" — current code
    returns before the delete; the entry persists for retry

Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi
on the per-view error blocking, plus two independent single-reviewer
BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The
B-LEAK fix uses defense-in-depth: never pass SQL into the error
path AND redact known cred values from DuckDB's own error message.

DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per
ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every
fix round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:10:55 -05:00
root
4205ecd0f0 Pre-D5: extract CatalogClient to internal/catalogclient/ + add List
queryd (D5) needs the same HTTP client to catalogd that ingestd uses,
but the client lived in internal/ingestd — having queryd import from
ingestd would invert the data-flow direction (ingestd is upstream of
queryd; the package dep should not point back). Extract to a shared
internal/catalogclient/ package now, before D5 forces it under
implementation pressure.

Adds the List(ctx) method queryd will need for view registration.
Unit tests cover Register success/conflict and List success/error
paths against an httptest.Server fake.

ingestd's import flips from internal/ingestd → internal/catalogclient;
the wire format and behavior are unchanged. All four smokes (D1/D2/D3/
D4) PASS unchanged. DuckDB cgo path re-verified with the official
github.com/duckdb/duckdb-go/v2 (per ADR-001) on Go 1.25 + arrow-go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:58:34 -05:00
root
c1e411347a G0 D4: ingestd CSV → Parquet → catalogd register · 2 scrum fixes
Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema
inference per ADR-010 (default-to-string on ambiguity), single-pass
streaming CSV → Parquet via pqarrow batched writer (Snappy compressed,
8192 rows per batch), PUT to storaged at content-addressed key
datasets/<name>/<fp_hex>.parquet, register manifest with catalogd.
Acceptance smoke 6/6 PASS including idempotent re-ingest (proves
inference is deterministic — same CSV always produces same fingerprint)
and schema-drift → 409 (proves catalogd's gate fires on ingest traffic).

Schema fingerprint is SHA-256 over (name, type) tuples in header order
using ASCII record/unit separators (0x1e/0x1f) so column names with
commas can't collide. Nullability intentionally NOT in the fingerprint
— a column gaining nulls isn't a schema change.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       4 WARN + 3 INFO (after 2 self-retracted BLOCKs)
  - Kimi K2-0905 (openrouter):                 1 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed (2, both Opus single-reviewer):
  C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet
    meant a schema-drift ingest overwrote the live parquet BEFORE
    catalogd's 409 fired → storaged inconsistent with manifest.
    Fix: content-addressed key datasets/<name>/<fp_hex>.parquet.
    Drift writes to a different file (orphan in G2 GC scope); the
    live data is never corrupted.
  C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks
    buffered column data + OS resources per failed ingest.
    Fix: deferred guarded close with wClosed flag.

Dismissed (5, all false positives):
  Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false,
    Go csv handles RFC 4180 multi-line quoted fields by default
  Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73
    and csv.go:201
  Kimi BLOCK "type assertion panic if pqarrow reorders fields" —
    speculative, no real path
  Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" —
    false convergent. Outer defer rb.Release() captures the current
    builder; in-loop release runs before reassignment. No leak.

Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample
boundary type mismatch (G0 cap bounds peak), string-match
paranoia on http.MaxBytesError, multipart double-buffer (G2 spool-
to-disk), separator validation, body close ordering, etc.

The D4 scrum produced fewer real findings than D3 (2 vs 6) — both
were architectural hazards smoke wouldn't catch because the smoke's
"schema drift → 409" assertion was passing even in the corrupted-
state world. The 409 fires correctly; what was wrong was the PUT
having already mutated the live parquet before the validation check.
Opus's PUT-then-register read of the order is exactly the kind of
architectural insight the cross-lineage scrum is designed to surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:50:10 -05:00
root
66a704ca3e G0 D3: catalogd Parquet manifests + ADR-020 idempotent register · 6 scrum fixes
Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory
registry with the ADR-020 idempotency contract (same name+fingerprint
reuses dataset_id; different fingerprint → 409 Conflict), HTTP client
to storaged for persistence, and rehydration on startup. Acceptance
smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the
load-bearing test that the catalog/storaged service split actually
preserves state.

dataset_id derivation diverges from Rust: UUIDv5(namespace, name)
instead of v4 surrogate. Same name on any box generates the same
dataset_id; rehydrate after disk loss converges to the same identity
rather than silently re-issuing. Namespace pinned at
a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued
depends on these bytes.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       1 BLOCK + 5 WARN + 3 INFO
  - Kimi K2-0905 (openrouter, validated D2):   2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed:
  C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds
  C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern
  S1 split-brain on persist failure → candidate-then-swap
  S2 brittle string-match for 400 vs 500 → ErrEmptyName/ErrEmptyFingerprint sentinels
  S3 Get/List shallow-copy aliasing → cloneManifest deep copy
  S4 keep-alive socket leak on error paths → drainAndClose helper

Dismissed (false positives, all single-reviewer):
  Kimi BLOCK "Decode crashes on empty Parquet" — already handled
  Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required
  Qwen INFO "rb.NewRecord() error unchecked" — API returns no error

Deferred to G1+: name validation regex, per-call deadlines, Snappy
compression, list pagination continuation tokens (storaged caps at
10k with sentinel for now).

Build clean, vet clean, all tests pass, smoke 6/6 PASS after every
fix round. arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced
by arrow-go's minimum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:36:57 -05:00
root
8cfcdb8e5f G0 D2: storaged S3 GET/PUT/LIST/DELETE · 3-lineage scrum · 4 fixes applied
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.

Cross-lineage scrum on the shipped code:
  - Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
  - Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
  - Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
    cap and the direct adapter's empty-content reasoning bug):
    1 BLOCK + 2 WARN + 1 INFO

Fixed:
  C1 buildRegistry ctx cancel footgun → context.Background()
     (Opus + Kimi convergent; future credential refresh chains)
  C2 MaxBytesReader unwrap through manager.Uploader multipart
     goroutines → Content-Length up-front 413 + string-suffix fallback
     (Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
  C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
     (Opus + Kimi convergent; OOM guard)
  S1 PUT response Content-Type: application/json (Opus single-reviewer)

Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters.
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.

Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).

Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:23:03 -05:00
Claw
ad2ec1aca9 G0 D1 hardened: 3-lineage scrum review on shipped code · 7 fixes applied
Code-review pass after D1 shipped, all three model lineages running
in parallel against the actual Go source (not docs):

Convergent findings (≥2 reviewers — high confidence):
- C1 BLOCK · Run() errCh/select race could silently drop fast bind
  errors. Fixed: net.Listen() now runs synchronously before the
  goroutine; bind errors surface as Run()'s return value.
- C2 BLOCK · scripts/d1_smoke.sh sleep 0.5 races bind on cold boxes.
  Fixed: replaced with poll_health() loop, 5s/svc budget, 50ms poll.
- C3 WARN · LoadConfig silent fallback when file missing. Fixed:
  emits slog.Warn with path + hint when path given but file absent.

Single-reviewer fixes:
- S1 WARN · slog.SetDefault inside Run() mutated global state from a
  library function. Fixed: Run() no longer calls SetDefault.
- S2 WARN · os.IsNotExist → errors.Is(err, fs.ErrNotExist) idiom.
- S6 WARN · smoke double-curl collapsed to single curl -i parse.

Second-pass Opus review on post-fix code caught one more:
- head -1 on curl -i fragile against 1xx interim lines. Fixed:
  awk picks the last HTTP/* status line (robust to 100 Continue).

Accepted with rationale (deferred or planned):
- S3 secrets-in-lakehouse.toml: D2.3 SecretsProvider already planned
- S4 5x cmd/*/main.go duplication: defer until D2 reveals real
  per-service config consumption
- S5 /health log volume: defer post-G0, not on k8s yet
- 2nd-pass theoreticals: clean-exit-no-Shutdown path doesn't trigger,
  defensive defer ln.Close() aspirational, etc.

Verification:
- go build ./cmd/...  exit 0
- go vet ./...         clean
- ./scripts/d1_smoke.sh  D1 acceptance gate: PASSED
- 3-lineage code review · 14 findings · 7 fixed · 0 deferred · 5
  accepted with rationale

Total D1 review coverage across the phase:
- 3 doc-review passes (Opus + Kimi + Qwen) — 13 findings, 10 fixed
- 1 runtime smoke — 1 finding (port 3100 collision), fixed
- 1 code-review parallel pass — 14 findings, 7 fixed
- 1 code-review second pass (Opus) — 1 actionable, fixed
- Cumulative: 29 findings · 19 fixed inline · 5 accepted · 5 deferred

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:07:50 -05:00
Claw
1142f54f23 G0 D1 ships: skeleton + chi + /health × 5 binaries · acceptance gate PASSED
Phase G0 Day 1 executed end-to-end after a third-pass review by
qwen3-coder:480b consolidated all findings across Opus/Kimi/Qwen
lineages.

Cross-lineage review consolidation (3 model passes + 1 runtime pass):
- Opus 4.7: 9 findings · 7 fixed inline · 2 deferred
- Kimi K2.6: 2 BLOCKs (introduced by Opus fixes) · 2 fixed
- Qwen3-coder:480b: 2 WARNs · 1 fixed (D2.4 256 MiB cap + 4-slot
  semaphore on PUTs) · 1 deferred (Q2 view refresh batching)
- Runtime smoke: 1 finding (port 3100 collision with live Rust
  lakehouse) · fixed (Go dev ports shifted to 3110+)
- Total: 14 findings · 11 fixed · 3 deferred to G2

What landed in code:
- internal/shared/server.go — chi factory, slog JSON, /health,
  graceful shutdown via signal.NotifyContext
- internal/shared/config.go — TOML loader, DefaultConfig, -config flag
- cmd/{gateway,storaged,catalogd,ingestd,queryd}/main.go — five
  binaries, each ~30 lines using the shared factory
- lakehouse.toml — G0 dev defaults (3110-3214)
- scripts/d1_smoke.sh — repeatable smoke that exits 0 on PASS
- go.mod / go.sum — chi v5.2.5, pelletier/go-toml/v2 v2.3.0

Verified end-to-end via scripts/d1_smoke.sh:
- All 5 /health endpoints return 200 with correct service name
- Gateway /v1/ingest + /v1/sql stubs return 501 with X-Lakehouse-Stub
- Graceful shutdown logs cleanly on SIGTERM
- DuckDB cgo path verified separately (sql.Open("duckdb","") + ping)

D1 ACCEPTANCE GATE: PASSED.

Next: D2 — storaged S3 GET/PUT/LIST against MinIO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:00:37 -05:00
Claw
a74fdb1204 docs: Phase G0 kickoff — Kimi K2.6 cross-lineage review pass
Second-pass review via opencode/kimi-k2.6 (different lineage than
Opus 4.7 used in the first pass) caught 2 BLOCKs that Opus missed —
and that the Opus-pass fixes themselves introduced:

- K1: D0.6 used `go install pkg@latest` to verify cgo, but that
  command requires a main package; duckdb-go/v2 is a library, so
  the verification fails BEFORE exercising cgo and could pass on a
  broken-cgo box. Replaced with a real compile-and-run smoke
  (tmp module + 5-line main.go that imports + calls sql.Open).
- K2: Gateway stubbed /v1/ingest and /v1/sql in D1.10, but ingestd
  serves /ingest and queryd serves /sql. httputil.NewSingleHostReverseProxy
  preserves the inbound path by default — D6.1 now specifies a
  custom Director that strips the /v1 prefix before forwarding.

Demonstrates the cross-lineage rotation value: one model's review
of the original ≠ different model's review of the post-fix version.
Same dynamic the Rust auditor exploits with Kimi/Haiku/Opus.

Disposition table appended below the Opus pass for full audit trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:50:54 -05:00
Claw
ed3ccf7c53 docs: Phase G0 kickoff plan + scrum-style independent review
7-day day-by-day plan for the smallest end-to-end ingest+query path
in Go: D0 ops setup → D1 skeleton + chi + /health × 5 binaries → D2
storaged S3 → D3 catalogd Parquet manifests → D4 ingestd CSV→Parquet
→ D5 queryd DuckDB → D6 gate-day end-to-end → D7 cleanup + retro.

Plan was reviewed by opencode/claude-opus-4-7 via the gateway
(same path the production overseer correction loop uses post-G0).
9 findings (2 BLOCK + 5 WARN + 2 INFO):

- 2 BLOCK fixed inline:
  - cgo build dependency surfaced on D0 not D5
  - DuckDB CREATE SECRET (S3) plumbed from SecretsProvider on D5.1
- 4 of 5 WARN fixed inline:
  - storaged binds 127.0.0.1 only + 2 GiB body cap
  - queryd uses TTL-cached views + etag invalidation, not refresh-per-call
  - gateway reverse-proxy stubbed on D1.10 (501), promoted on D6
  - ADR stubs go in at start of D4/D5, finalized on D7
- 1 WARN deferred (orphan GC on two-phase write — punted to G2)
- 1 WARN accepted with note (shared-server.go refactor — G1+ follow-up)
- 2 INFO fixed inline (go mod tidy timing, ADR-after-fact inversion)

Disposition table appended to the doc itself for auditability —
matches the human_overrides.jsonl pattern from the Rust auditor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:47:15 -05:00
Claw
29468b1413 docs: 2026-04-28 upstream survey — three SPEC-changing pivots
Pre-Phase-G0 research sweep against current Go ecosystem state. Three
upstream changes that the day-of SPEC missed:

1. DuckDB Go binding ownership transferred. marcboeker/go-duckdb is
   deprecated as of v2.5.0 — official maintainer is now
   github.com/duckdb/duckdb-go/v2 (DuckDB team + Marc Boeker joint
   hand-off). Current v2.10502.0 / DuckDB v1.5.2. SPEC §3.1 +
   component table updated.

2. Official Go MCP SDK exists. Switching from mark3labs/mcp-go
   (community) to github.com/modelcontextprotocol/go-sdk (official,
   Google collaboration, v1.5.0 stable, 4.4k stars, targets MCP spec
   2025-11-25). Component table updated.

3. arrow-go is on v18, not v15. v18.5.2 (March 2026) has parquet
   encryption fixes relevant for PII-masked safe views. PRD locked
   stack + SPEC component table updated.

Validated unchanged: coder/hnsw (220 stars, active), chi (still the
clean-architecture pick over fiber/gin/echo).

Surfaced for future use: anthropics/anthropic-sdk-go (official,
available for direct Claude calls bypassing opencode if ever needed),
duckdb-wasm (browser-side analytics future option), IVF as HNSW
fallback if recall gate fails.

See docs/RESEARCH_LOG_2026-04-28.md for full survey + sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:40:26 -05:00
Claw
f07668064e docs: seed PRD + SPEC for the Go-direction rewrite
Two documents only — no Go code yet. PRD restates the problem and
preserves the Rust PRD's invariants verbatim, then maps the locked
stack to Go libraries and surfaces four hard problems (DuckDB-via-cgo
for the query engine, Lance dropped, Dioxus → HTMX, arrow-go maturity).
SPEC walks each Rust crate + TS surface and tags the port with library
choice / effort estimate / risk + a 5-phase migration plan from
skeleton (Phase G0) to demo parity (Phase G5).

Six open questions remain that gate Phase G0:
- DuckDB cgo OK?
- HTMX vs React for the UI?
- Repo location?
- Distillation v1.0.0 port verbatim or rebuild?
- Pathway memory data — port 88 traces or start clean?
- Auditor lineage — port audit_baselines.jsonl or restart?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:35:23 -05:00