33 Commits

Author SHA1 Message Date
root
1a3a82aedb validatord: coordinator session JSONL for offline analysis (B follow-up)
Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    coordinator_sessions.jsonl (this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section — these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:22:09 -05:00
root
afdeca80d9 docs: record Python sidecar drop in architecture comparison
Companion to lakehouse commit ba928b1 (aibridge: drop Python sidecar
from hot path; AiClient → direct Ollama).

ARCHITECTURE_COMPARISON.md:
  - Decisions tracker: "Drop Python sidecar" moves _open_ → DONE
  - "Cross-cutting abstracts to address" item #3 marked complete
  - "Python dependency (the load-bearing axis)" section reframed:
    pre/post-2026-05-02 diagram, lab UI / pipeline_lab clarified
    as dev-only Python (not on runtime hot path)
  - Change log: new entry

STATE_OF_PLAY: timestamp refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:00:51 -05:00
root
7d6636b33e validator: align ValidationError JSON to Rust serde shape (6/6 parity)
Closes the 2026-05-02 parity finding: validator_parity probe found
5/6 body shapes diverging because Go emitted {"Kind":"...","Field":"...","Reason":"..."}
while Rust emits the externally-tagged-enum {"Schema":{"field":"...","reason":"..."}}.
A caller parsing the error envelope would break silently in cutover.

## Changes

internal/validator/types.go:
- Custom MarshalJSON emits the Rust shape:
    Schema:       {"Schema":      {"field":"x","reason":"y"}}
    Completeness: {"Completeness":{"reason":"y"}}
    Consistency:  {"Consistency": {"reason":"y"}}
    Policy:       {"Policy":      {"reason":"y"}}
- Custom UnmarshalJSON accepts BOTH the new Rust shape AND the legacy
  flat shape (migration safety for any persisted error rows).
- Unknown variants (e.g. a future Rust addition Go hasn't learned)
  surface as an Unmarshal error, not a silent default.

internal/validator/types_test.go:
- 4 pinning tests anchor the wire format. Failing them = wire-format
  drift; the parity probe is the secondary line of defense.

scripts/validatord_smoke.sh:
- Updated probes to read the new variant-name shape (jq keys[0],
  .Schema.field) instead of legacy .Kind/.Field.

## Verification

- internal/validator unit tests: PASS (4 new + all existing).
- cmd/validatord HTTP tests: PASS (UnmarshalJSON falls through to flat
  shape so existing tests reading ValidationError still work).
- validatord_smoke.sh: 5/5 PASS through gateway :3110.
- validator parity probe re-run: **6/6 match** (was 1/6).

## Pattern

Per architecture_comparison's "use the dual-implementation as a
measurement instrument" thesis: a parity probe surfaced this gap;
50 LOC of MarshalJSON closed it; 4 pinning tests prevent regression;
the probe is the longitudinal gate. Cutover-friendly direction (Go
matches Rust) chosen because Rust is the existing production
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:49:28 -05:00
root
b0c8a3f227 parity probes: materializer + extract_json (caught + fixed real bug)
Two new cross-runtime parity probes joining the validator probe from
the gauntlet wave. Pattern: feed identical input through Rust and Go;
diff outputs. Each probe surfaced a different signal.

## Materializer parity probe
scripts/cutover/parity/materializer_parity.sh runs Bun + Go
materializer against an identical synthetic data/_kb/ root, diffs the
resulting evidence/ JSONL byte-equivalent (modulo provenance.recorded_at).

**First run: 0/2 match.** Real finding: Go's Provenance.LineOffset
had `json:"line_offset,omitempty"` which strips the field when value
is 0. Line offset 0 is the FIRST ROW of every source file — a real
semantic value, not absent. Bun side always emits it.

Fix: drop `omitempty` on Provenance.LineOffset. Updated comment
explaining why.

**Re-run: 2/2 match.** On-wire JSON parity holds.

## extract_json parity probe
scripts/cutover/parity/extract_json_parity.sh feeds 12 fixture
strings through both runtimes' extract_json:
  - fenced ```json``` blocks
  - unfenced ``` blocks
  - bare braces with prose around
  - first-balanced-of-many
  - nested objects
  - unicode in string values
  - escaped quotes
  - empty object
  - top-level array (both return first inner object)
  - no JSON
  - depth-balanced but invalid syntax
  - trailing garbage

Substrate gate: cargo test -p gateway extract_json PASS before probe.

**Result: 12/12 match.** Algorithms genuinely equivalent.

## scripts/cutover/parity/extract_json_helper/main.go
Tiny Go binary that reads stdin, calls validator.ExtractJSON, prints
{matched, value} JSON. Counterpart to the Rust parity_extract_json
binary in golangLAKEHOUSE's sibling lakehouse repo (separate commit).

## Pattern crystallized
Every cross-runtime port should land with a parity probe. Three
probes now exist:
  - validator (5/6 wire-format gap captured 2026-05-02)
  - materializer (caught + fixed real bug 2026-05-02)
  - extract_json (12/12 match 2026-05-02)

The instrument is reusable — each new shared HTTP/CLI surface gets
a probe row added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:43:54 -05:00
root
e8cf113af8 gauntlet 2026-05-02: smoke chain + per-component scrum + parity probe
Production-readiness gauntlet exploiting the dual Rust/Go
implementation as a measurement instrument.

## Phase 1 — Full smoke chain
21/21 PASS in ~60s. Substrate intact across the full service surface.

## Phase 2 — Per-component scrum (token-volume fix)
Prior wave (165KB diff): Kimi 62 tokens out, Qwen 297 → no useful
analysis. This wave splits today's commits into 4 focused bundles
(36-71KB each):
  c1 validatord (46KB) → 0 convergent / 11 distinct
  c2 vectord substrate (36KB) → 0 convergent / 10 distinct
  c3 materializer (71KB) → 0 convergent / 6 distinct (Opus emitted
                           a BLOCK then self-retracted in same response)
  c4 replay (45KB) → 0 convergent / 10 distinct

Reviewer engagement vs prior wave: Kimi went 62 → ~250 tokens out
once bundles dropped below 60KB.

scripts/scrum_review.sh hardening:
  * Diff-size guard (warn >60KB, hard-fail >100KB,
    SCRUM_FORCE_OVERSIZE=1 override)
  * Tightened prompt — file path must appear EXACTLY as in diff
    so post-processor can grep WHERE: lines reliably
  * Auto-tally step dedupes by (reviewer, location); convergence
    counts distinct lineages (closes the prior `opus+opus+opus`
    false-convergence bug)

## Phase 3 — Cross-runtime validator parity probe (the headline finding)
scripts/cutover/parity/validator_parity.sh sends 6 identical
/v1/validate cases to Rust :3100 AND Go :4110, compares status+body.

Result: **6/6 status codes match · 5/6 body shapes diverge.**

Rust returns serde-tagged enum:   {"Schema":{"field":"x","reason":"y"}}
Go returns flat exported-fields:  {"Kind":"schema","Field":"x","Reason":"y"}

Both round-trip inside their own runtime; a caller swapping one for
the other would break parsing silently. Captured as new _open_ row
in docs/ARCHITECTURE_COMPARISON.md decisions tracker.

This is the "use the dual-implementation as a measurement instrument"
return — single-repo scrums can't catch this class of cross-runtime
drift.

## Phase 4 — Production assessment
ship-with-known-gap. Validator wire-format gap is documented, not
regressed. ~50 LOC future fix on Go side (custom MarshalJSON on
ValidationError to match Rust's serde shape).

Persistent stack config (/tmp/lakehouse-persistent.toml) gains
validatord on :3221 + persistent-validatord binary so operators
bringing up the persistent stack get the new daemon automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:05:18 -05:00
root
f9e72412c1 validatord: /v1/validate + /v1/iterate HTTP surface (port 3221)
Closes the last "Go primary" backlog item in
docs/ARCHITECTURE_COMPARISON.md. Go now owns the entire validator
path end-to-end — no Rust dep for staffing safety net.

Architecture: cmd/validatord on :3221 hosts both endpoints. Calls
chatd directly for the iterate loop's LLM hop (no gateway
self-loopback like the Rust shape). Gateway proxies /v1/validate +
/v1/iterate to validatord.

What's in:
- internal/validator/playbook.go — 3rd validator kind (PRD checks:
  fill: prefix, endorsed_names ≤ target_count×2, fingerprint required)
- internal/validator/lookup_jsonl.go — JSONL roster loader (Parquet
  deferred; producer one-liner documented in package comment)
- internal/validator/iterate.go — ExtractJSON helper + Iterate
  orchestrator with ChatCaller seam for unit tests
- cmd/validatord/main.go — HTTP routes, roster load, chat client
- internal/shared/config.go — ValidatordConfig + gateway URL field
- lakehouse.toml — [validatord] section
- cmd/gateway/main.go — proxy routes for /v1/validate + /v1/iterate

Smoke: 5/5 PASS through gateway :3110:
  ✓ playbook happy path
  ✓ playbook missing fingerprint → 422 schema/fingerprint
  ✓ phantom candidate W-PHANTOM → 422 consistency
  ✓ unknown kind → 400
  ✓ roster loaded with 3 records

go test ./... green across 33 packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:53:20 -05:00
root
89ca72d471 materializer + replay ports + vectord substrate fix verified at scale
Two threads landing together — the doc edits interleave so they ship
in a single commit.

1. **vectord substrate fix verified at original scale** (closes the
   2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211
   scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix).
   Throughput dropped 1,115 → 438/sec because previously-broken
   scenarios now do real HNSW Add work — honest cost of correctness.
   The fix (i.vectors side-store + safeGraphAdd recover wrappers +
   smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the
   footprint that originally surfaced the bug.

2. **Materializer port** — internal/materializer + cmd/materializer +
   scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts
   (12 transforms) + build_evidence_index.ts (idempotency, day-partition,
   receipt). On-wire JSON shape matches TS so Bun and Go runs are
   interchangeable. 14 tests green.

3. **Replay port** — internal/replay + cmd/replay +
   scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts
   (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL
   phase 7 live invocation on the Go side. Both runtimes append to the
   same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained
prompt_tokens, completion_tokens, and metadata fields to mirror the TS
shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions
tracker moves the materializer + replay items from _open_ to DONE and
adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:31:02 -05:00
root
277884b5eb multitier_100k: 335k scenarios @ 1,115/sec against 100k corpus, 4/6 at 0% fail
J asked for a much more sophisticated test using the 100k corpus from
the Rust legacy database. This commit ships:

scripts/cutover/multitier/main.go — 6-scenario harness with weighted
random selection per goroutine. Mixes search, email/SMS/fill
validators (in-process via internal/validator), profile swap with
ExcludeIDs, repeat-cache exercise, and playbook record/replay.

Scenarios + weights (cumulative scenario fractions):
  35% cold_search_email      — search + email outreach + EmailValidator
  15% surge_fill_validate    — search + fill proposal + FillValidator + record
  15% profile_swap           — original search + ExcludeIDs swap + no-overlap check
  15% repeat_cache           — same query × 5 (cache effectiveness)
  10% sms_validate           — SMS draft (≤160 chars, phone for SSN-FP guard)
  10% playbook_record_replay — cold → record → warm w/ use_playbook=true

Test results (5-min sustained, conc=50, 100k workers indexed):
  TOTAL 335,257 scenarios @ 1,115/sec
  cold_search_email     117k @ 0.0% fail · p50 2.2ms · p99 8.6ms
  surge_fill_validate    50k @ 98.8% fail (substrate bug below)
  profile_swap           50k @ 0.0% fail · p50 4.5ms · ExcludeIDs verified
  repeat_cache           50k × 5 = 252k searches @ 0.0% fail · p50 11.7ms
  sms_validate           33k @ 0.0% fail · phone-pattern guard works
  playbook_record_replay 33k @ 96.8% fail (substrate bug below)
  Total successful workflows: ~250k+

Validator integration verified at load:
  150,930 EmailValidator passes across cold_search_email + sms_validate
  35 + 1,061 successful FillValidator + playbook_record (where the bug
    didn't fire)
  zero false positives on the SSN-pattern guard against phone numbers

Resource footprint at 100k:
  vectord 1.23GB RSS (linear with 100k vectors)
  matrixd 26MB, 75% CPU (1-core saturated at conc=50)
  Total across 11 daemons: 1.7GB
  Compare to Rust at 14.9GB — ~10× less even at 100k.

SUBSTRATE BUG SURFACED: coder/hnsw v0.6.1 nil-deref in
layerNode.search at graph.go:95. Triggers on /v1/matrix/playbooks/record
under sustained writes to the small playbook_memory index. Both Add
and Search paths can panic.

Workaround applied (this commit) in internal/vectord/index.go
BatchAdd: recover() guard converts panic to error; daemon stays up
instead of crashing the request handler.

Operator recovery procedure (also documented in the report):
  curl -X DELETE http://localhost:4215/vectors/index/playbook_memory
Next record recreates the index fresh.

Real fix DEFERRED — open in docs/ARCHITECTURE_COMPARISON.md
Decisions tracker. Three options:
  a) upstream patch to coder/hnsw
  b) custom small-index Add path that always rebuilds when len < threshold
  c) alternate store for playbook_memory (Lance? in-memory map?)

Evidence: reports/cutover/multitier_100k.md (full methodology +
results + repro + bug analysis). docs/ARCHITECTURE_COMPARISON.md
Decisions tracker updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:28:50 -05:00
root
2a974d6dea docs: ARCHITECTURE_COMPARISON.md as living source file
Per J's request: move the parallel-runtime comparison from
reports/cutover/ (where it lived as cutover-prep evidence) into
docs/ as the source-of-truth file. J will keep updating it as
fixes ship on either side.

Restructured for living-document use:
- Status header (last refresh date, owner, update triggers)
- 'How to update this doc' section with explicit dos and don'ts
- Decisions tracker at top — actioned items with commit refs
  + open backlog with LOC estimates
- Each comparison section now has 'Last verified' columns where
  numbers are time-sensitive
- Change log section at bottom for one-line entries on every
  meaningful refresh

The original at reports/cutover/architecture_comparison.md gains
a 'THIS IS A SNAPSHOT' header pointing at the docs/ source. Kept
as historical record but no longer the place to update.

Sister pointer file in /home/profit/lakehouse/docs/ARCHITECTURE_COMPARISON.md
so the doc is reachable from either repo side. That file explicitly
says the source lives in golangLAKEHOUSE and warns against
authoritative content in the pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:56:20 -05:00
root
814197cfd3 ADR-006: auth posture for non-loopback deploy + token rotation impl
ADR-003 locked the auth substrate; ADR-006 ratifies the operator
playbook + adds two implementation pieces needed for Sprint 4
deployment: env-resolved tokens and dual-token rotation.

Six decisions locked in docs/DECISIONS.md:
- 6.1: Non-loopback bind requires auth.token (mechanical gate at
       shared.Run, already implemented; this ratifies it).
- 6.2: Token from env, not TOML. /etc/lakehouse/auth.env (mode 0600)
       loaded by systemd EnvironmentFile=. New TokenEnv field on
       AuthConfig defaults to "AUTH_TOKEN".
- 6.3: AllowedIPs for inter-service same-trust-domain; Token for
       cross-trust-boundary (gateway ↔ external).
- 6.4: /health stays unauthenticated; everything else under
       shared.Run is gated. Already implemented; ratified here.
- 6.5: Token rotation is dual-token. New SecondaryTokens []string
       on AuthConfig — both primary and any secondary pass auth
       during the rotation window. Implemented in this commit.
- 6.6: TLS terminates at the network edge (nginx/Caddy), not
       in-process. Daemons stay HTTP-only; internal traffic stays
       on private subnets per Decision 6.3.

Implementation:
- internal/shared/config.go: AuthConfig gains TokenEnv +
  SecondaryTokens fields. New resolveAuthFromEnv() called by
  LoadConfig fills Token from os.Getenv(TokenEnv) when Token is
  empty. TokenEnv defaults to "AUTH_TOKEN" so the happy path needs
  no TOML config.
- internal/shared/auth.go: RequireAuth pre-encodes Bearer headers
  for primary + every secondary token; per-request constant-time
  compare walks the slice. Fast path is 1 compare (primary).

Tests:
- TestLoadConfig_AuthTokenFromEnv (3 sub-tests): default env name,
  custom token_env, explicit Token wins over env.
- TestRequireAuth_SecondaryTokenAccepted: both primary + secondary
  tokens pass during rotation window.
- TestRequireAuth_SecondaryTokensOnly: only-secondary path works
  for the case where primary was just promoted-to-empty mid-rotation.

go test ./internal/shared all green; existing auth_test.go
unchanged (constant-time compare path preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:51:14 -05:00
root
2c71d1c637 ADR-005: observer fail-safe semantics
Closes the OPEN item from STATE_OF_PLAY. Required because observerd is
now on the prod-realistic data path via the lift harness boot (b2e45f7),
so the next consumer (scrum runner / distillation rebuild / production
workflow) needs the fail-safe rationale locked, not implicit.

The Rust "verdict:accept on crash" anti-pattern doesn't translate
one-to-one to the Go observer (witness, not gate). But four adjacent
fail-safe decisions are real and live:

5.1 Persist failure is logged-not-fatal; ring is in-flight source of
    truth. Persist-required mode deferred to a future opt-in ADR.

5.2 Mode failure → Success=false, no panic-swallow path. The runner
    catches mode errors and surfaces them via node.Error; downstream
    consumers see failures explicitly rather than as fake successes
    (the Rust anti-pattern surface).

5.3 One row per node, recorded post-run. A workflow with N nodes
    produces N audit rows, never a per-workflow catch-all that
    survives partial crashes. Known gap: recording happens after
    runner.Run returns (acceptable for short workflows; streaming
    callback is the right shape when workflows get longer).

5.4 /observer/event accepts on full ring (oldest evicted). Refusing
    to write would translate every burst into client errors — wrong
    direction for an audit witness.

Mostly ratifies existing behavior; cross-checked claims against
actual code (caught one error in Decision 5.3 draft — recording is
post-run-batched, not per-node-as-it-completes — and the ADR now
states reality).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:32:12 -05:00
root
511083ae40 docs: SPEC §3.9 (chatd) + §3.10 (local-review-harness sibling)
- SPEC §1 component table: add chatd row marked DONE; replaces
  Rust gateway's v1::ollama_cloud / openrouter / opencode adapters
  + the aibridge crate.
- SPEC §3.9 — chatd shipped: 5-provider routing (ollama, ollama_cloud,
  openrouter, opencode, kimi) by model-name prefix or :cloud suffix.
  Captures the Anthropic 4.7 temperature-deprecation quirk + the
  local-Ollama think=false default that the playbook_lift judge
  needed. Mentions scrum_review.sh as the reusable cross-lineage
  vehicle eating chatd's own /v1/chat.
- SPEC §3.10 — local-review-harness sibling tool: separate repo at
  git.agentview.dev/profit/local-review-harness, MVP shipped today.
  Documents the cross-pollination plan for when both substrates
  stabilize (chatd as the harness's LLM backend; harness findings
  as Lakehouse pathway-memory drift signal; .memory/known-risks
  as a matrix corpus). Explicit "don't re-port" so future Claudes
  don't try to absorb the harness into Lakehouse.
- STATE_OF_PLAY.md: SIBLING TOOLS section with 1-line summary
  + pointer to SPEC §3.10.

No code changes. just verify still PASS — touched only docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:01:23 -05:00
root
97dd3f826d SPEC §3.5/§3.6/§3.7/§3.8 — name F/B/C as port targets + add Archon-style workflow runner
Per the 2026-04-29 scope-discipline pause: the wave shipped four
pieces beyond SPEC §3.4 component scope, and one architectural
pattern surfaced (Archon-style multi-pass workflow runner) that's
the observer's natural growth path. Document them as port targets
so the next scrum review has authoritative SPEC components.

§3.5 — Drift quantification (loop 5 of the PRD)
  Names the SCORER drift work shipped in be65f85 + the deferred
  shapes (PLAYBOOK drift, EMBEDDING drift, AUDIT BASELINE drift).
  Acceptance gates G3.5.A–B.

§3.6 — Staffing-side structured filter
  Names the metadata-filter MVP shipped in b199093 + the deferred
  pre-retrieval SQL gate via queryd. Acceptance gates G3.6.A–C.

§3.7 — Operational rating wiring
  Names the bulk playbook-record endpoint shipped in 6392772 + the
  deferred UI shim, negative-feedback path, and time-decay.
  Acceptance gates G3.7.A–B.

§3.8 — Observer-KB workflow runner (Archon-style multi-pass) —
       PORT TARGET, not yet started
  Documents the architecture J was working on across the Rust
  observer-kb branch (10 commits ahead of main, never merged) and
  the local Archon mod (committed 2026-04-29 as 3f2afc8 in
  /home/profit/external/Archon, not pushed to coleam00/Archon).

  The pattern: multi-pass mode chain (extract → validator →
  hallucination → consensus → redteam → pipeline → render) where
  each pass is a deterministic measurement. The observer is the
  natural home — workflows ARE observation patterns whose every
  step is recorded. Five components in dependency order: workflow
  definition (YAML), node executor (DAG runner), provenance
  recording (ObservedOps), mode catalog (matrix.search,
  distillation.score, drift.scorer, llm.chat), HTTP surface
  (/v1/observer/workflow/run).

  Reference materials on the system (preserved, not lost):
    - /home/profit/lakehouse/.archon/workflows/lakehouse-architect-review.yaml
      (Rust main, 69919d9) — 3-node Archon-via-Lakehouse proof
    - /home/profit/external/Archon dev branch — upstream engine
      with local pi/provider.ts mod (3f2afc8) for Lakehouse routing
    - Rust observer-kb branch — apps/observer-kb/docs/PRD.md +
      Python prototypes proven on real ChatGPT/Claude PDF data

  Acceptance gates G3.8.A–D. Estimated effort: L.

PRD updated with "Observer as system resource (clarified
2026-04-29)" section pointing at §3.8 as the architectural growth
path. The bare-bones observerd in bc9ab93 is the substrate; the
workflow runner is what makes it the "objective measurement engine"
the small-model pipeline needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:27:41 -05:00
root
a7620c8b6f PRD: name the product vision — small-model pipeline + 5-loop substrate
Adds a "Product vision" section before the Direction-pivot section.
Captures the framing J flagged 2026-04-29: the Go refactor is not the
goal. The goal is a small-model-driven autonomous pipeline that gets
better with each run, with frontier models in audit/oversight, not
the hot path.

Five loops named explicitly:
  1. Knowledge pathway (pathway memory + matrix indexer)
  2. Execution (small models on focused context)
  3. Observer (refines configs that got the model to a good pathway)
  4. Rating + distillation (outcomes fold back into the playbook)
  5. Drift (measure when the playbook stops matching reality)

Triage / human-in-loop named as the system's job, not an escape
hatch. The gate: "playbook + matrix indexer must give the results
we're looking for" — single load-bearing acceptance criterion.

Why Go after Rust: second-language pass surfaces architectural
weaknesses Rust hid; the pipeline must work AS A PIPELINE, not as
crates that interact. Maps existing Rust components (✓ pathway, ✓
matrix, ✓ observer, ✓ distillation, ✓ auditor; partial: drift,
rating gate, triage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:17:01 -05:00
root
71b35fb85e SPEC §1 + §3.4: name matrix indexer as a port target
Adds matrix indexer as its own row in the §1 component table and a
new §3.4 with port plan. Distinct from vectord (substrate); lives at
internal/matrix/ + gateway /v1/matrix/*.

Five components in dependency order: corpus builders → multi-corpus
retrieve+merge → relevance filter → strong-model downgrade gate →
learning-loop integration.

Locks in the framing J flagged 2026-04-29: in Rust the matrix indexer
was emergent across mode.rs + build_*_corpus.ts + observer /relevance,
and earlier port-planning reduced it to "we have vectord." The SPEC
now names it explicitly so the port preserves the multi-corpus
retrieval shape AND the learning loop, not just the HNSW substrate.

Sharding-by-id was investigated as a throughput fix and rejected —
corpus-as-shard at the matrix layer is the existing retrieval shape
and parallelizes Adds for free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:12:10 -05:00
root
2a6234ff82 ADR-004 + internal/pathway: Mem0 versioned trace substrate
Closes Sprint 2 design-bar work (audit reports/scrum/sprint-backlog.md):
  S2.1 — ADR-004 documents the pathway-memory data model
  S2.2 — pathway port lands with deterministic fixture corpus
         and full test coverage on day one
  S2.3 — retired traces are excluded from retrieval (test
         passes; would fail without the filter)

Mem0-style operations: Add / AddIdempotent / Update / Revise /
Retire / Get / History / Search. Each operation is a method on
Store; persistence is JSONL append-only with corruption recovery
on Replay.

internal/pathway/types.go     Trace + event + SearchFilter + sentinel errors
internal/pathway/store.go     in-memory state + RWMutex + ops
internal/pathway/persistor.go JSONL append-only log with replay
internal/pathway/store_test.go  20 test funcs covering all 7
                                Sprint 2 claim rows + concurrency
internal/pathway/persistor_test.go  6 test funcs covering missing-
                                file, corruption recovery, long-line
                                handling, parent-dir auto-create,
                                apply-error skip behavior

Sprint 2 claim coverage row-by-row:
  ADD          TestAdd_AssignsUIDAndTimestamps + TestAdd_RejectsInvalidJSON
  UPDATE       TestUpdate_ReplacesContentSameUID + Update_MissingUID_Errors
  REVISE       TestRevise_LinksToPredecessorViaHistory +
               TestRevise_PredecessorMissing_Errors +
               TestRevise_ChainOfThree_BackwardWalk
  RETIRE       TestRetire_ExcludedFromSearch +
               TestRetire_StillAccessibleViaGet +
               TestRetire_StillAccessibleViaHistory
  HISTORY/cycle TestHistory_CycleDetected (injected via internal map),
                TestHistory_PredecessorMissing_TruncatesChain,
                TestHistory_UnknownUID_ErrorsClean
  REPLAY/dup   TestAddIdempotent_IncrementsReplayCount (locks the
               "replay preserves original content" rule per ADR-004)
  CORRUPTION   TestPersistor_CorruptedLines_Skipped +
               TestPersistor_ApplyError_Skipped
  ROUND-TRIP   TestPersistor_RoundTrip locks the full Save → fresh
               Store → Load → Stats-match contract

Two real bugs caught during testing:
  - Add returned the same *Trace stored in the map, so callers
    holding a reference saw later mutations. Fixed: clone before
    return (matches Get's contract). Same fix in AddIdempotent
    + Revise.
  - Test typo: {"v":different} isn't valid JSON; AddIdempotent's
    json.Valid rejected it as ErrInvalidContent. Test fixed to
    use {"v":"different"}; the validation behavior is correct.

Skipped this commit (next):
  - cmd/pathwayd HTTP binary
  - gateway routing for /v1/pathway/*
  - end-to-end smoke
  These add the wire surface; the substrate ships first so the
  wire layer can be a pure proxy in the next commit.

Verified:
  go test -count=1 ./internal/pathway/ — 26 tests green
  just verify                          — vet + test + 9 smokes 34s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:23:30 -05:00
root
0d18ffa780 ADR-003: inter-service auth posture — Bearer + IP allowlist
Locks in the auth model that R-001 + R-007 will be retrofitted
against. Doc-only — wiring deferred to Sprint 1 when the first
non-loopback binding is needed.

Decision: Bearer token (from secrets-go.toml [auth] section) + IP
allowlist (CIDR list). Both layers required when auth is on; empty
token = G0 dev no-op. /health exempt.

Implementation shape (when it lands):
  - internal/shared/auth.go middleware: one chi r.Use line per binary
  - shared.Run gates: refuses non-loopback bind without configured token
  - subtle.ConstantTimeCompare for token equality (timing-safe)

Alternatives considered + rejected:
  mTLS         — too heavy for single-machine inter-service traffic
  JWT          — buys nothing over Bearer without external IdP
  IP-only      — one stolen IP entry = full access; no defense depth
  OAuth2       — no external IdP commitment in G0-G3 timeline

What this doesn't do:
  - Doesn't implement (code lands Sprint 1)
  - Doesn't break G0 dev (empty token = middleware no-op)
  - Doesn't address gateway→end-user auth (different ADR shape)

Closes the design-decision blocker for R-001 and R-007. Wiring
ticket: Sprint 1 backlog story S1.2.

Also lifts ADR-002 (storaged per-prefix PUT cap) into the doc —
it was implemented in 423a381 but not yet recorded as an ADR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:05:59 -05:00
root
a81291e38c proof harness Phase A: scaffolding + canary case green
Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier
above the smoke chain. This commit lays the scaffolding and proves
the orchestrator end-to-end with one canary case (00_health).

What landed:

  tests/proof/
    README.md             how to read a report, layout, modes
    claims.yaml           24 claims enumerated (GOLAKE-001..100)
    run_proof.sh          orchestrator with --mode {contract|integration|performance}
                          and --no-bootstrap / --regenerate-{rankings,baseline}
    lib/
      env.sh              service URLs, report dir, mode, git context
      http.sh             curl wrappers writing per-probe JSON + body + headers
      assert.sh           proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                          proof_skip — each emits one JSONL record per call
      metrics.sh          start/stop timers, value capture, RSS sampling,
                          percentile compute (for Phase D)
    cases/
      00_health.sh        canary — gateway + 6 services /health → 200,
                          body identifies service, latency < 500ms (21 assertions)
    fixtures/
      csv/workers.csv     spec's 5-row deterministic CSV
      text/docs.txt       4 deterministic vector docs
      expected/queries.json  expected results for the 5 SQL assertions

Wired into the task runner:

  just proof contract       # canary only this commit
  just proof integration    # Phase C
  just proof performance    # Phase D

.gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.

Specs landed alongside (J's drops):
  docs/TEST_PROOF_SCOPE.md           the harness contract this implements
  docs/CLAUDE_REFACTOR_GUARDRAILS.md process discipline this harness obeys

Verified end-to-end (cached binaries):
  just proof contract        wall < 2s, 21 pass / 0 fail / 0 skip
  just verify                wall 31s, vet + test + 9 smokes still green

Two bugs fixed during canary run, both in run_proof.sh aggregation:
- grep -c exits 1 on zero matches; the `|| echo 0` form concatenated
  "0\n0" and broke jq --argjson + integer comparison. Fixed via a
  _count helper that captures count-or-zero cleanly.
- per-case table iterated case scripts (filename-based) but cases
  write evidence under CASE_ID. Switched to JSONL-file iteration so
  multi-case scripts work and the mapping is faithful.

Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sourcing the same lib
helpers and writing to the same report shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:08:51 -05:00
root
91edd43164 scrum audit: 5 reports under reports/scrum/ · score 35/60
Adapts docs/SCRUM.md framework (originally written for the
matrix-agent-validated repo) to the Go rewrite. Five deliverables:

  golang-lakehouse-scrum-test.md  top-line + scoring + verdict
  risk-register.md                12 findings, R-001..R-012
  claim-coverage-table.md         claim/test/risk for Sprint 2
  sprint-backlog.md               5 sprints, ~2 weeks of work
  acceptance-gates.md             DoD as runnable commands

Every claim cites file:line, command output, or "missing evidence."
Smoke chain ran clean (33s wall, all 9 PASS) and is captured in
reports/scrum/_evidence/smoke_chain.log (gitignored — runtime artifact).

Scoring:
  Reproducibility       7/10  9 smokes deterministic, no just/CI gate
  Test Coverage         6/10  internal/ packages tested, 6/7 cmd/ aren't
  Trust Boundary        7/10  escapes ok, zero auth, /sql is RCE-eq off-loopback
  Memory Correctness    3/10  pathway/playbook/observer not yet ported
  Deployment Readiness  4/10  no REPLICATION, no env template, no systemd
  Maintainability       8/10  no god-files, 7 lean binaries, ADRs current

Top three risks:
  R-001 HIGH  queryd /sql + DuckDB + non-loopback bind = RCE-equivalent
  R-002 HIGH  internal/shared (server.go + config.go) zero tests
  R-003 HIGH  internal/storeclient zero tests, used by 2 services
  R-004 MED   9-smoke chain green but not gated (no justfile/hook)

The audit is the work; refactors come after. Sprint 0 owns coverage
+ CI gating; Sprint 1 owns trust-boundary decisions; Sprints 2-3 are
mostly design-bar work for unbuilt agent components.

.gitignore exception: /reports/* + !/reports/scrum/ keeps reports/
a runtime-artifact directory while exposing reports/scrum/ as
tracked documentation. Mirrors the pattern future audit passes will
land in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:51:47 -05:00
root
1f700e731d Staffing scale test: full 500K through gateway → embedd → vectord pipeline
scripts/staffing_500k/main.go: driver that reads workers_500k.csv,
embeds combined-text per worker via /v1/embed, adds to vectord index
"workers_500k", runs canonical staffing queries against the populated
index. Reproducible end-to-end test of the staffing co-pilot pipeline
at production scale.

Run results (2026-04-29 ~02:30):
  500,000 vectors ingested in 35m 36s (~234/sec avg)
  vectord peak RSS 4.5 GB (~9 KB/vector incl. HNSW graph)
  Query latency: embed 40-59ms + search 1-3ms = ~50ms end-to-end
  GPU avg ~65% (Ollama not the bottleneck — vectord Add is)

Semantic recall on canonical queries:
  "electrician with industrial wiring": top 2 are literal Electricians (d=0.30)
  "CNC operator with first article": Assembler / Quality Techs (adjacent, d=0.24)
  "forklift driver OSHA-30": warehouse roles (d=0.33)
  "warehouse picker night shift bilingual": Material Handlers (d=0.31)
  "dental hygienist": Production Workers at d=0.49+ — correctly
    LOW-similarity, signals "no dental hygienists in this manufacturing
    dataset" rather than hallucinating a fake match.

Documented gaps:
  - storaged's 256 MiB PUT cap blocks single-file LHV1 persistence
    above ~150K vectors at d=768. Test ran with persistence disabled.
  - vectord Add is RWMutex-serialized — with GPU at 65% util this is
    the throughput cap. Concurrent Adds would be 2-3x faster but
    require careful audit of coder/hnsw thread-safety (G1 scrum
    documented two known quirks).

PHASE_G0_KICKOFF.md gains a "Staffing scale test" section with full
metrics + the gaps-surfaced list. The architectural payoff is real:
six binaries, one HTTP route, ~50ms from text query to top-K
semantically-relevant workers across 500K records.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:31:30 -05:00
root
0cb29cda15 docs: README + PHASE_G0_KICKOFF reflect post-G0 state (G1, G1P, G2)
README was stuck on "Pre-Phase G0, implementation has not started"
while we shipped through G2. Updated to reflect the current 7-binary
service inventory, the 9 acceptance smokes, the cold-start deps
(MinIO bucket, Ollama with nomic-embed-text, secrets-go.toml).

PHASE_G0_KICKOFF gains a "Post-G0 work" pointer at the end —
brief table mapping each G1+/G2 commit to its smoke + scrum-fix
count. Full per-day detail stays in commit messages and the
project memory file.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:45:59 -05:00
root
d023b07b30 Real-scale validation post-G0: configurable ingest cap + workers_500k metrics
Validated G0 substrate against the production workers_500k.parquet
dataset (18 cols × 500,000 rows). Findings + one applied fix:

Finding #1 (FIXED): ingestd's hardcoded 256 MiB cap rejected the 500K
CSV (344 MiB) with 413. Cap fired correctly, no OOM. Extracted to
[ingestd].max_ingest_bytes config field; default 256 MiB, override
per deployment for known-large workloads. With cap bumped to 512 MiB,
500K ingest succeeds in 3.12s with ingestd peak RSS 209 MiB.

Finding #2 (deferred): ingestd doesn't release memory between
ingests. Go runtime conservative; long-running daemon, fine.

Finding #3: DuckDB-via-httpfs is healthy at 500K. GROUP BY 45ms,
count(*) 24ms, AVG 47ms, schema introspection 25ms. Sub-linear
scaling vs 100K — the s3:// read path is not a bottleneck.

Finding #4: ADR-010 type inference correctly handled real staffing
data. worker_id → BIGINT, numeric scores → DOUBLE, multi-line
resume_text → VARCHAR. 1000-row sample sufficient.

Finding #5: Go's encoding/csv handles RFC 4180 quoted-comma fields
and multi-line quoted text without LazyQuotes — confirming the D4
scrum's dismissal of Qwen's BLOCK on this point.

Net: substrate handles production-scale data with one config knob.
No correctness issues, no OOMs, no silent type errors.
All 6 G0 smokes still PASS after the cap-config change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:32:08 -05:00
root
b1d52306ad G0 D6: gateway reverse proxy fronting all 4 backing services · 2 scrum fixes · G0 COMPLETE
Last day of Phase G0. Gateway promotes the D1 stub endpoints into
real reverse-proxies on :3110 fronting storaged + catalogd + ingestd
+ queryd. /v1 prefix lives at the edge — internal services route on
/storage, /catalog, /ingest, /sql, with the prefix stripped by a
custom Director per Kimi K2's D1-plan finding.

Routes:
  /v1/storage/*  → storaged
  /v1/catalog/*  → catalogd
  /v1/ingest     → ingestd
  /v1/sql        → queryd

Acceptance smoke 6/6 PASS — every assertion goes through :3110, none
direct to backing services. Full ingest → storage → catalog → query
round-trip verified end-to-end. The smoke's "rows[0].name=Alice"
assertion is the architectural payoff: five binaries, six HTTP
routes, one round-trip through one edge.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 2 WARN + 2 INFO
  - Kimi K2-0905 (openrouter):                1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory)
  - Qwen3-coder (openrouter):                 5 completion tokens — "No BLOCKs."

Fixed (2, both Opus single-reviewer):
  O-BLOCK: Director path stripping fails if upstream URL has a
    non-empty path. The default Director's singleJoiningSlash runs
    BEFORE the custom code, so an upstream like http://host/api
    produces /api/v1/storage/... after the join — then TrimPrefix("/v1")
    is a no-op because the string starts with /api. Fix: strip /v1
    BEFORE calling origDirector. New TestProxy_SubPathUpstream regression
    locks this in. Today: bare-host URLs only, dormant — but moving
    gateway behind a sub-path in prod would have silently 404'd.
  O-WARN2: url.Parse is permissive — typo "127.0.0.1:3211" (no scheme)
    parses fine, produces empty Host, every request 502s. mustParseUpstream
    fail-fast at startup with a clear message naming the offending
    config field.

Dismissed (3, all Kimi, same false TrimPrefix theory):
  K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single
    check-and-trim, no loop
  K-WARN "no upper bound on repeated // removal" — same false theory
  K-WARN "goroutines leak if upstream parse fails while binaries
    running" — confused scope; binaries are separate OS processes
    launched by the smoke script

D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no
longer stubs /v1/ingest and /v1/sql). Replaced with proxy probes that
verify gateway forwards malformed requests to ingestd and queryd. Launch
order changed from parallel to dep-ordered (storaged → catalogd →
ingestd → queryd → gateway) since catalogd's rehydrate now needs
storaged, queryd's initial Refresh needs catalogd.

All six G0 smokes (D1 through D6) PASS end-to-end after every fix
round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes
applied across 6 days from cross-lineage review.

G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port,
distillation rebuild, observer + Langfuse integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:21:54 -05:00
root
9e9e4c26a4 G0 D5: queryd DuckDB SELECT over Parquet via httpfs · 4 scrum fixes
Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector
that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET
(TYPE S3) on every new connection, sourced from SecretsProvider +
shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and
handler's SELECTs serialize through one connection (avoids cross-
connection MVCC visibility edge cases).

Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR
REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key')
per manifest, drops views for removed manifests, skips on unchanged
updated_at (the implicit etag). Drop pass runs BEFORE create pass so
a poison manifest can't block other manifest refreshes (post-scrum
C1 fix).

POST /sql with JSON body {"sql":"…"} returns
{"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]],
"row_count":N}. []byte → string conversion so VARCHAR rows
JSON-encode as text. 30s default refresh ticker, configurable via
[queryd].refresh_every.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 4 WARN + 4 INFO
  - Kimi K2-0905 (openrouter):                2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                 2 BLOCK + 1 WARN + 1 INFO

Fixed (4):
  C1 (Opus + Kimi convergent): Refresh aborts on first per-view error
    → drop pass first, collect errors, errors.Join. Poison manifest
    no longer blocks the rest of the catalog from re-syncing.
  B-CTX (Opus BLOCK): bootstrap closure captured OpenDB's ctx →
    cancelled-ctx silently fails every reconnect. context.Background()
    inside closure; passed ctx only for initial Ping.
  B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80
    chars but those 80 chars contained KEY_ID + SECRET prefix → log
    aggregator captures credentials. Stable per-statement labels +
    redactCreds() filter on wrapped DuckDB errors.
  JSON-ERR (Opus WARN): swallowed json.Encode error → silent
    truncated 200 on unsupported column types. slog.Warn the failure.

Dismissed (4 false positives):
  Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit
  Qwen BLOCK "MaxBytesReader after Decode" — false, applied before
  Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a
    deadlock, just serialization, by design with 10s timeout retry
  Kimi WARN "dropView leaves r.known inconsistent" — current code
    returns before the delete; the entry persists for retry

Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi
on the per-view error blocking, plus two independent single-reviewer
BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The
B-LEAK fix uses defense-in-depth: never pass SQL into the error
path AND redact known cred values from DuckDB's own error message.

DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per
ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every
fix round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:10:55 -05:00
root
c1e411347a G0 D4: ingestd CSV → Parquet → catalogd register · 2 scrum fixes
Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema
inference per ADR-010 (default-to-string on ambiguity), single-pass
streaming CSV → Parquet via pqarrow batched writer (Snappy compressed,
8192 rows per batch), PUT to storaged at content-addressed key
datasets/<name>/<fp_hex>.parquet, register manifest with catalogd.
Acceptance smoke 6/6 PASS including idempotent re-ingest (proves
inference is deterministic — same CSV always produces same fingerprint)
and schema-drift → 409 (proves catalogd's gate fires on ingest traffic).

Schema fingerprint is SHA-256 over (name, type) tuples in header order
using ASCII record/unit separators (0x1e/0x1f) so column names with
commas can't collide. Nullability intentionally NOT in the fingerprint
— a column gaining nulls isn't a schema change.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       4 WARN + 3 INFO (after 2 self-retracted BLOCKs)
  - Kimi K2-0905 (openrouter):                 1 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed (2, both Opus single-reviewer):
  C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet
    meant a schema-drift ingest overwrote the live parquet BEFORE
    catalogd's 409 fired → storaged inconsistent with manifest.
    Fix: content-addressed key datasets/<name>/<fp_hex>.parquet.
    Drift writes to a different file (orphan in G2 GC scope); the
    live data is never corrupted.
  C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks
    buffered column data + OS resources per failed ingest.
    Fix: deferred guarded close with wClosed flag.

Dismissed (5, all false positives):
  Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false,
    Go csv handles RFC 4180 multi-line quoted fields by default
  Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73
    and csv.go:201
  Kimi BLOCK "type assertion panic if pqarrow reorders fields" —
    speculative, no real path
  Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" —
    false convergent. Outer defer rb.Release() captures the current
    builder; in-loop release runs before reassignment. No leak.

Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample
boundary type mismatch (G0 cap bounds peak), string-match
paranoia on http.MaxBytesError, multipart double-buffer (G2 spool-
to-disk), separator validation, body close ordering, etc.

The D4 scrum produced fewer real findings than D3 (2 vs 6) — both
were architectural hazards smoke wouldn't catch because the smoke's
"schema drift → 409" assertion was passing even in the corrupted-
state world. The 409 fires correctly; what was wrong was the PUT
having already mutated the live parquet before the validation check.
Opus's PUT-then-register read of the order is exactly the kind of
architectural insight the cross-lineage scrum is designed to surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:50:10 -05:00
root
66a704ca3e G0 D3: catalogd Parquet manifests + ADR-020 idempotent register · 6 scrum fixes
Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory
registry with the ADR-020 idempotency contract (same name+fingerprint
reuses dataset_id; different fingerprint → 409 Conflict), HTTP client
to storaged for persistence, and rehydration on startup. Acceptance
smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the
load-bearing test that the catalog/storaged service split actually
preserves state.

dataset_id derivation diverges from Rust: UUIDv5(namespace, name)
instead of v4 surrogate. Same name on any box generates the same
dataset_id; rehydrate after disk loss converges to the same identity
rather than silently re-issuing. Namespace pinned at
a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued
depends on these bytes.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       1 BLOCK + 5 WARN + 3 INFO
  - Kimi K2-0905 (openrouter, validated D2):   2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed:
  C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds
  C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern
  S1 split-brain on persist failure → candidate-then-swap
  S2 brittle string-match for 400 vs 500 → ErrEmptyName/ErrEmptyFingerprint sentinels
  S3 Get/List shallow-copy aliasing → cloneManifest deep copy
  S4 keep-alive socket leak on error paths → drainAndClose helper

Dismissed (false positives, all single-reviewer):
  Kimi BLOCK "Decode crashes on empty Parquet" — already handled
  Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required
  Qwen INFO "rb.NewRecord() error unchecked" — API returns no error

Deferred to G1+: name validation regex, per-call deadlines, Snappy
compression, list pagination continuation tokens (storaged caps at
10k with sentinel for now).

Build clean, vet clean, all tests pass, smoke 6/6 PASS after every
fix round. arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced
by arrow-go's minimum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:36:57 -05:00
root
8cfcdb8e5f G0 D2: storaged S3 GET/PUT/LIST/DELETE · 3-lineage scrum · 4 fixes applied
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.

Cross-lineage scrum on the shipped code:
  - Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
  - Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
  - Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
    cap and the direct adapter's empty-content reasoning bug):
    1 BLOCK + 2 WARN + 1 INFO

Fixed:
  C1 buildRegistry ctx cancel footgun → context.Background()
     (Opus + Kimi convergent; future credential refresh chains)
  C2 MaxBytesReader unwrap through manager.Uploader multipart
     goroutines → Content-Length up-front 413 + string-suffix fallback
     (Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
  C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
     (Opus + Kimi convergent; OOM guard)
  S1 PUT response Content-Type: application/json (Opus single-reviewer)

Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters.
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.

Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).

Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:23:03 -05:00
Claw
ad2ec1aca9 G0 D1 hardened: 3-lineage scrum review on shipped code · 7 fixes applied
Code-review pass after D1 shipped, all three model lineages running
in parallel against the actual Go source (not docs):

Convergent findings (≥2 reviewers — high confidence):
- C1 BLOCK · Run() errCh/select race could silently drop fast bind
  errors. Fixed: net.Listen() now runs synchronously before the
  goroutine; bind errors surface as Run()'s return value.
- C2 BLOCK · scripts/d1_smoke.sh sleep 0.5 races bind on cold boxes.
  Fixed: replaced with poll_health() loop, 5s/svc budget, 50ms poll.
- C3 WARN · LoadConfig silent fallback when file missing. Fixed:
  emits slog.Warn with path + hint when path given but file absent.

Single-reviewer fixes:
- S1 WARN · slog.SetDefault inside Run() mutated global state from a
  library function. Fixed: Run() no longer calls SetDefault.
- S2 WARN · os.IsNotExist → errors.Is(err, fs.ErrNotExist) idiom.
- S6 WARN · smoke double-curl collapsed to single curl -i parse.

Second-pass Opus review on post-fix code caught one more:
- head -1 on curl -i fragile against 1xx interim lines. Fixed:
  awk picks the last HTTP/* status line (robust to 100 Continue).

Accepted with rationale (deferred or planned):
- S3 secrets-in-lakehouse.toml: D2.3 SecretsProvider already planned
- S4 5x cmd/*/main.go duplication: defer until D2 reveals real
  per-service config consumption
- S5 /health log volume: defer post-G0, not on k8s yet
- 2nd-pass theoreticals: clean-exit-no-Shutdown path doesn't trigger,
  defensive defer ln.Close() aspirational, etc.

Verification:
- go build ./cmd/...  exit 0
- go vet ./...         clean
- ./scripts/d1_smoke.sh  D1 acceptance gate: PASSED
- 3-lineage code review · 14 findings · 7 fixed · 0 deferred · 5
  accepted with rationale

Total D1 review coverage across the phase:
- 3 doc-review passes (Opus + Kimi + Qwen) — 13 findings, 10 fixed
- 1 runtime smoke — 1 finding (port 3100 collision), fixed
- 1 code-review parallel pass — 14 findings, 7 fixed
- 1 code-review second pass (Opus) — 1 actionable, fixed
- Cumulative: 29 findings · 19 fixed inline · 5 accepted · 5 deferred

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:07:50 -05:00
Claw
1142f54f23 G0 D1 ships: skeleton + chi + /health × 5 binaries · acceptance gate PASSED
Phase G0 Day 1 executed end-to-end after a third-pass review by
qwen3-coder:480b consolidated all findings across Opus/Kimi/Qwen
lineages.

Cross-lineage review consolidation (3 model passes + 1 runtime pass):
- Opus 4.7: 9 findings · 7 fixed inline · 2 deferred
- Kimi K2.6: 2 BLOCKs (introduced by Opus fixes) · 2 fixed
- Qwen3-coder:480b: 2 WARNs · 1 fixed (D2.4 256 MiB cap + 4-slot
  semaphore on PUTs) · 1 deferred (Q2 view refresh batching)
- Runtime smoke: 1 finding (port 3100 collision with live Rust
  lakehouse) · fixed (Go dev ports shifted to 3110+)
- Total: 14 findings · 11 fixed · 3 deferred to G2

What landed in code:
- internal/shared/server.go — chi factory, slog JSON, /health,
  graceful shutdown via signal.NotifyContext
- internal/shared/config.go — TOML loader, DefaultConfig, -config flag
- cmd/{gateway,storaged,catalogd,ingestd,queryd}/main.go — five
  binaries, each ~30 lines using the shared factory
- lakehouse.toml — G0 dev defaults (3110-3214)
- scripts/d1_smoke.sh — repeatable smoke that exits 0 on PASS
- go.mod / go.sum — chi v5.2.5, pelletier/go-toml/v2 v2.3.0

Verified end-to-end via scripts/d1_smoke.sh:
- All 5 /health endpoints return 200 with correct service name
- Gateway /v1/ingest + /v1/sql stubs return 501 with X-Lakehouse-Stub
- Graceful shutdown logs cleanly on SIGTERM
- DuckDB cgo path verified separately (sql.Open("duckdb","") + ping)

D1 ACCEPTANCE GATE: PASSED.

Next: D2 — storaged S3 GET/PUT/LIST against MinIO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:00:37 -05:00
Claw
a74fdb1204 docs: Phase G0 kickoff — Kimi K2.6 cross-lineage review pass
Second-pass review via opencode/kimi-k2.6 (different lineage than
Opus 4.7 used in the first pass) caught 2 BLOCKs that Opus missed —
and that the Opus-pass fixes themselves introduced:

- K1: D0.6 used `go install pkg@latest` to verify cgo, but that
  command requires a main package; duckdb-go/v2 is a library, so
  the verification fails BEFORE exercising cgo and could pass on a
  broken-cgo box. Replaced with a real compile-and-run smoke
  (tmp module + 5-line main.go that imports + calls sql.Open).
- K2: Gateway stubbed /v1/ingest and /v1/sql in D1.10, but ingestd
  serves /ingest and queryd serves /sql. httputil.NewSingleHostReverseProxy
  preserves the inbound path by default — D6.1 now specifies a
  custom Director that strips the /v1 prefix before forwarding.

Demonstrates the cross-lineage rotation value: one model's review
of the original ≠ different model's review of the post-fix version.
Same dynamic the Rust auditor exploits with Kimi/Haiku/Opus.

Disposition table appended below the Opus pass for full audit trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:50:54 -05:00
Claw
ed3ccf7c53 docs: Phase G0 kickoff plan + scrum-style independent review
7-day day-by-day plan for the smallest end-to-end ingest+query path
in Go: D0 ops setup → D1 skeleton + chi + /health × 5 binaries → D2
storaged S3 → D3 catalogd Parquet manifests → D4 ingestd CSV→Parquet
→ D5 queryd DuckDB → D6 gate-day end-to-end → D7 cleanup + retro.

Plan was reviewed by opencode/claude-opus-4-7 via the gateway
(same path the production overseer correction loop uses post-G0).
9 findings (2 BLOCK + 5 WARN + 2 INFO):

- 2 BLOCK fixed inline:
  - cgo build dependency surfaced on D0 not D5
  - DuckDB CREATE SECRET (S3) plumbed from SecretsProvider on D5.1
- 4 of 5 WARN fixed inline:
  - storaged binds 127.0.0.1 only + 2 GiB body cap
  - queryd uses TTL-cached views + etag invalidation, not refresh-per-call
  - gateway reverse-proxy stubbed on D1.10 (501), promoted on D6
  - ADR stubs go in at start of D4/D5, finalized on D7
- 1 WARN deferred (orphan GC on two-phase write — punted to G2)
- 1 WARN accepted with note (shared-server.go refactor — G1+ follow-up)
- 2 INFO fixed inline (go mod tidy timing, ADR-after-fact inversion)

Disposition table appended to the doc itself for auditability —
matches the human_overrides.jsonl pattern from the Rust auditor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:47:15 -05:00
Claw
29468b1413 docs: 2026-04-28 upstream survey — three SPEC-changing pivots
Pre-Phase-G0 research sweep against current Go ecosystem state. Three
upstream changes that the day-of SPEC missed:

1. DuckDB Go binding ownership transferred. marcboeker/go-duckdb is
   deprecated as of v2.5.0 — official maintainer is now
   github.com/duckdb/duckdb-go/v2 (DuckDB team + Marc Boeker joint
   hand-off). Current v2.10502.0 / DuckDB v1.5.2. SPEC §3.1 +
   component table updated.

2. Official Go MCP SDK exists. Switching from mark3labs/mcp-go
   (community) to github.com/modelcontextprotocol/go-sdk (official,
   Google collaboration, v1.5.0 stable, 4.4k stars, targets MCP spec
   2025-11-25). Component table updated.

3. arrow-go is on v18, not v15. v18.5.2 (March 2026) has parquet
   encryption fixes relevant for PII-masked safe views. PRD locked
   stack + SPEC component table updated.

Validated unchanged: coder/hnsw (220 stars, active), chi (still the
clean-architecture pick over fiber/gin/echo).

Surfaced for future use: anthropics/anthropic-sdk-go (official,
available for direct Claude calls bypassing opencode if ever needed),
duckdb-wasm (browser-side analytics future option), IVF as HNSW
fallback if recall gate fails.

See docs/RESEARCH_LOG_2026-04-28.md for full survey + sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:40:26 -05:00
Claw
f07668064e docs: seed PRD + SPEC for the Go-direction rewrite
Two documents only — no Go code yet. PRD restates the problem and
preserves the Rust PRD's invariants verbatim, then maps the locked
stack to Go libraries and surfaces four hard problems (DuckDB-via-cgo
for the query engine, Lance dropped, Dioxus → HTMX, arrow-go maturity).
SPEC walks each Rust crate + TS surface and tags the port with library
choice / effort estimate / risk + a 5-phase migration plan from
skeleton (Phase G0) to demo parity (Phase G5).

Six open questions remain that gate Phase G0:
- DuckDB cgo OK?
- HTMX vs React for the UI?
- Repo location?
- Distillation v1.0.0 port verbatim or rebuild?
- Pathway memory data — port 88 traces or start clean?
- Auditor lineage — port audit_baselines.jsonl or restart?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:35:23 -05:00