36 Commits

Author SHA1 Message Date
root
e7fc63b216 observerd: /observer/inbox + multi-coord stress phase 1c (priority-ordered events)
Phase 3 ask: real-world inbox-style event injection during the stress
test. Coordinators in production receive emails + SMS that trigger
contract responses; the substrate has to RECORD these signals AND
react with a search using the embedded demand. This commit lands the
endpoint and exercises it end-to-end in the stress harness.

observerd surface:
- New POST /observer/inbox route — accepts {type, sender, subject,
  body, priority, tag} and records as ObservedOp with
  Source=SourceInbox. Type must be email|sms; body required;
  priority defaults to medium. The handler ONLY records — downstream
  triggers (search, ingest, etc.) are the caller's concern, recorded
  separately. Keeps the witness role pure (request shape sketched below).
- New observer.SourceInbox = "inbox" alongside SourceMCP /
  SourceScenario / SourceWorkflow.
- Three contract tests on the new route (happy path / bad type / empty
  body), router-mount test extended, all green.
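
A minimal sketch of the request shape and its defaults — hypothetical Go
types for illustration, not the actual observerd handler structs (imports
elided):

  type inboxEvent struct {
      Type     string `json:"type"`     // must be "email" or "sms"
      Sender   string `json:"sender"`
      Subject  string `json:"subject"`
      Body     string `json:"body"`     // required
      Priority string `json:"priority"` // defaults to "medium" when empty
      Tag      string `json:"tag"`
  }

  func (e *inboxEvent) validate() error {
      if e.Type != "email" && e.Type != "sms" {
          return fmt.Errorf("type must be email|sms, got %q", e.Type)
      }
      if e.Body == "" {
          return errors.New("body is required")
      }
      if e.Priority == "" {
          e.Priority = "medium"
      }
      return nil
  }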

Stress harness phase 1c (Hour 9):
- 6 inbox events fire in priority order (urgent → high → medium):
    2 urgent emails (forklift Cleveland, production Indianapolis)
    1 high email (crane Chicago)
    1 high sms (bilingual safety Indianapolis)
    1 medium sms (drone Chicago)
    1 medium email (warehouse Milwaukee FYI)
- Each event:
    1. POSTs to /v1/observer/inbox (recorded by observerd)
    2. Triggers matrix.search using a parsed demand (the demand
       extraction is hard-coded for now; production needs a small
       LLM to parse from body)
    3. Captures both as events in the run JSON

Run #006 result (with v2-moe embedder + all phases including inbox):

  Diversity:
    Same-role-across-contracts Jaccard = 0.000 (n=9)
    Different-roles-same-contract Jaccard = 0.046 (n=18)
  Determinism: 1.000
  Verbatim handover: 4/4 (100%)
  Paraphrase handover: 4/4 (100%)
  Inbox burst:
    6/6 events accepted by observerd (200 status, all recorded)
    6/6 triggered searches produced distinct top-1 worker IDs
    distance distribution: 0.24 (Indy production) → 0.71 (Chicago
    drone surveyor — honest stretch since drones aren't in the
    5K-worker corpus, system surfaces closest neighbor at high
    distance rather than fabricating)

The drone-Chicago case is the architectural-honesty signal: when
the demand asks for a specialist NOT in the roster, the system
returns the closest semantic neighbor with a distance that flags
"this is a stretch." Coordinators reading distances see "we don't
have a great match here" rather than a confident wrong answer.

Total events captured: 67 (was 61 pre-inbox).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:34:36 -05:00
root
9ce067bd9d observerd: test that locks ADR-005 Decision 5.3
TestWorkflowRun_AllProvenanceRecordedPostRun proves that
handleWorkflowRun records ObservedOps only AFTER runner.Run returns,
not interleaved with node execution.

The test pauses inside a node via a controlled channel, samples
observer.Store mid-run (must be 0), unblocks, then samples again
(must be N). If a future commit adds per-node streaming (e.g.
runner.NodeHook firing as each node finishes), n1's record would
appear before the unblock and the first assertion fires.
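
A sketch of that pause-and-sample shape (postWorkflowRun, store.Recent and
the fixture mode are illustrative stand-ins for the real test plumbing;
imports elided):

  entered := make(chan struct{})
  release := make(chan struct{})
  modes["fixture.block"] = func(context.Context, map[string]any) (map[string]any, error) {
      close(entered) // signal: we are mid-run, inside node n1
      <-release      // hold the node until the test has sampled the store
      return map[string]any{"ok": true}, nil
  }
  done := make(chan struct{})
  go func() { defer close(done); postWorkflowRun(t, wf) }() // fire the HTTP run

  <-entered
  if n := len(store.Recent()); n != 0 {
      t.Fatalf("ObservedOps recorded mid-run: %d (per-node streaming?)", n)
  }
  close(release) // let n1 — and the whole run — finish
  <-done
  if n := len(store.Recent()); n != len(wf.Nodes) {
      t.Fatalf("want %d ObservedOps post-run, got %d", len(wf.Nodes), n)
  }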

This is an intentional test-as-spec lock. Closing the streaming gap is
deferred per the ADR ("acceptable for short workflows; streaming
callback is the right shape when workflows get longer") — but if
someone later adds the streaming callback without updating the ADR,
this test catches it in `go test` instead of leaving the doc and
code drifted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:35:41 -05:00
root
6c02c905c8 scrum lift_001: 4 fixes from cross-lineage review
Cross-lineage scrum on b2e45f7 produced 1 convergent + 3 single-reviewer
findings worth fixing. All apply.

1. (Opus WARN + Qwen INFO convergent) scripts/playbook_lift.sh: replace
   sleep 2.5 in SQL probe with active polling up to 5s. refresh_every=1s
   is a lower bound; under load the manifest may not be visible within a
   fixed sleep window, which would 4xx the probe and abort the reality run.

2. (Opus INFO) scripts/playbook_lift.sh: report template glued
   "env JUDGE_MODEL" + value as "env JUDGE_MODELqwen2.5:latest" with no
   separator. Replaced two :+/:- substitution chains with a single
   JUDGE_SOURCE variable computed once at the top of the harness.

3. (Opus INFO) scripts/staffing_workers/main.go: -id-prefix "" was silently
   allowed, defeating the flag's purpose (cross-corpus collision prevention).
   Now log.Fatal's at startup with an explicit hint.

4. (Opus WARN) cmd/{pathwayd,observerd}/main_test.go: newTestRouter
   returned http.Handler and then re-cast it to chi.Router for chi.Walk.
   Returning chi.Router directly still satisfies http.Handler AND avoids a
   type assertion that would panic if future middleware wraps the router.

Dismissed (with rationale):
- Kimi INFO hardcoded MinIO endpoint: harness is local-by-design.
- Kimi WARN matrixd accepts 502/500: documented; real retriever needs
  real upstreams the test doesn't spin up.
- Qwen INFO queryd string.Contains: brittle but very low risk; rerouting it
  through a typed-error path would add coupling without adding signal.

go test ./cmd/{matrixd,queryd,pathwayd,observerd} all green.

Verdicts at reports/scrum/_evidence/2026-04-30/verdicts/lift_001_*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:27:24 -05:00
root
b2e45f7f26 playbook_lift: harness expansion + reality test #001 (7/8 lift, 87.5%)
The 5-loop substrate's load-bearing gate is verified — playbook +
matrix indexer give the results we're looking for. Per the report's
rubric, lift ≥ 50% of discoveries means matrix is doing real work;
7/8 = 87.5% blew through that.

Harness was structurally hiding bugs behind a 5-daemon stripped boot.
Expanding to the full 10-daemon prod stack surfaced 7 fixes in cascade:

1. driver→matrixd: {"query": ...} → {"query_text": ...} field name
2. harness temp toml missing [s3] → wrong default bucket → catalogd
   rehydrate 500 on first call
3. harness→queryd SQL probe: {"q": ...} → {"sql": ...} field name
4. expand boot from 5 → 10 daemons in dep-ordered launch
5. add SQL surface probe (3-row CSV ingest → COUNT(*)=3 assertion)
6. candidates corpus was synthetic SWE-tech (Swift/iOS, Scala/Spark) —
   wrong domain for staffing queries; replaced with ethereal_workers
   (10K rows, real staffing schema, "e-" id prefix to avoid collision
   with workers' "w-"). staffing_workers driver gains -index-name +
   -id-prefix flags so the same binary serves both corpora
7. local_judge qwen3.5:latest is a vision-SSM 256K-ctx build running
   ~30s per judge call against the lift loop; reverted to
   qwen2.5:latest (~1s/call, 30× faster; the lift thesis still held)

Each contract drift (1, 3) is now locked into a cmd/<bin>/main_test.go
so future drift fires in `go test`, not in a reality run. R-005 closed:

- cmd/matrixd/main_test.go (new) — playbook record drift detector +
  score bounds + 6 routes mounted
- cmd/queryd/main_test.go — wrong-field-name drift detector
- cmd/pathwayd/main_test.go (new) — 9 routes + add round-trip + retire
- cmd/observerd/main_test.go (new) — 4 routes + invalid-op + unknown-mode

`go test ./cmd/{matrixd,queryd,pathwayd,observerd}` all green.

Reality test results (reports/reality-tests/playbook_lift_001.{json,md}):
  Queries              21 (staffing-domain, 7 categories)
  Discoveries          8 (judge ≠ cosine top-1)
  Lifts                7/8 (87.5%)
  Boosts triggered     9
  Mean Δ distance      -0.053 (warm closer than cold)
  OOD honesty          dental/RN/SWE rated 1, no fake matches
  Cross-corpus boosts  confirmed (e- ↔ w- swaps in lifts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:22:21 -05:00
root
0efc7363c5 scrum 2026-04-30: 4 real fixes + 2 INFOs from cross-lineage review
3-lineage scrum (Opus 4.7 / Kimi K2.6 / Qwen3-coder) on today's wave
landed 4 real findings (2 BLOCK + 2 WARN) and 2 INFO touch-ups.
Verbatim verdicts + disposition table at:
  reports/scrum/_evidence/2026-04-30/

B-1 (BLOCK Opus + INFO Kimi convergent) — ResolveKey API:
  collapse from 3-arg (envVar, envFileName, envFilePath) to 2-arg
  (envVar, envFilePath). Pre-fix, every chatd caller passed the env
  var name twice; if an operator renamed *_key_env in lakehouse.toml
  while keeping the canonical KEY= line in the .env file, the fallback
  silently missed it.

B-2 (WARN Opus + WARN Kimi convergent) — handleProviders probe:
  drop the synthesize-then-Resolve probe; look up by name directly
  via Registry.Available(name). Prior probe synthesized "<name>/probe"
  model strings and routed through Resolve, fragile to any future
  routing rule (e.g. cloud-suffix special case).

B-3 (BLOCK Opus single — verified by trace + end-to-end probe) —
  OllamaCloud.Chat StripPrefix used "cloud" but registry routes
  "ollama_cloud/<m>". Result: upstream got the prefixed model name
  and 400'd. Smoke missed it because chatd_smoke runs without
  ollama_cloud registered. Now strips the right prefix; new
  TestOllamaCloud_StripsCorrectPrefix locks both prefix + suffix
  cases. Verified live: ollama_cloud/deepseek-v3.2 round-trips
  cleanly through the real ollama.com endpoint.

B-4 (WARN Opus single) — Ollama finishReason: read done_reason
  field instead of inferring from done bool alone. Newer Ollama
  reports done=true with done_reason="length" on truncation; the
  prior code mapped that to "stop" and lost the truncation signal
  the playbook_lift judge needs to retry. New
  TestFinishReasonFromOllama_PrefersDoneReason covers the fallback
  ladder.
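
A sketch of that fallback ladder (mapping only; the real code lives in the
Ollama provider's response handling):

  // Prefer the explicit done_reason; fall back to the done bool only for
  // older Ollama servers that never send the field.
  func finishReason(done bool, doneReason string) string {
      switch doneReason {
      case "length":
          return "length" // truncation — the signal a retrying judge needs
      case "stop":
          return "stop"
      case "":
          if done {
              return "stop" // older Ollama: infer from done alone
          }
          return ""
      default:
          return doneReason // pass unknown reasons through rather than guessing
      }
  }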

INFOs:
- B-5: replace hand-rolled insertion sort in Registry.Names with
  sort.Strings (Opus called the "avoid sort import" comment a
  false economy — correct).
- A-1: clarify the playbook_lift.sh comment around -judge "" arg
  passing (Opus noted the comment said "env priority" but didn't
  reflect that the empty arg also passes through the Go driver's
  resolution chain).

False positives dismissed (3, documented in disposition.md):
- Kimi: TestMaybeDowngrade_WithConfigList wrong assertion (test IS
  correct per design — model excluded from weak list = strong = downgrade)
- Qwen: nil-deref claim (defensive code already handles nil)
- Opus: qwen3.5:latest doesn't exist on Ollama hub (true on the
  public hub but local install has it)

just verify: PASS. chatd_smoke 6/6 PASS. New regression tests:
3 (B-2, B-3, B-4 each get a focused test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:28:08 -05:00
root
05273ac06b phase 4: chatd — multi-provider LLM dispatcher (ollama / cloud / openrouter / opencode / kimi)
new cmd/chatd on :3220 routes /v1/chat to the right provider based
on model-name prefix or :cloud suffix. closes the architectural gap
named in lakehouse.toml [models]: tiers map to model IDs, but until
phase 4 there was no service that could actually CALL those models
from Go.

routing rules (registry.Resolve):
  ollama/<m>          → local Ollama (prefix stripped)
  ollama_cloud/<m>    → Ollama Cloud
  <m>:cloud           → Ollama Cloud (suffix variant — kimi-k2.6:cloud)
  openrouter/<v>/<m>  → OpenRouter (prefix stripped, OpenAI-compat)
  opencode/<m>        → OpenCode unified Zen+Go
  kimi/<m>            → Kimi For Coding (api.kimi.com/coding/v1)
  bare names          → local Ollama (default)
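
a sketch of the resolution order (provider selection only — prefix
stripping happens inside each provider, and the real registry.Resolve also
carries telemetry; imports elided):

  func resolveProvider(model string) string {
      if strings.HasSuffix(model, ":cloud") {
          return "ollama_cloud" // suffix variant, e.g. kimi-k2.6:cloud
      }
      for _, p := range []string{"ollama_cloud", "openrouter", "opencode", "kimi", "ollama"} {
          if strings.HasPrefix(model, p+"/") {
              return p
          }
      }
      return "ollama" // bare names default to local Ollama
  }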

provider implementations:
- internal/chat/types.go      Provider interface, Request/Response, errors
- internal/chat/registry.go   prefix + :cloud suffix dispatch
- internal/chat/ollama.go     local Ollama via /api/chat (think=false default)
- internal/chat/ollama_cloud.go  Ollama Cloud via /api/generate (Bearer auth)
- internal/chat/openai_compat.go shared OpenAI Chat Completions for the
                                 OpenRouter/OpenCode/Kimi family
- internal/chat/builder.go    BuildRegistry from BuilderInput;
                              ResolveKey reads env then .env file fallback

config:
- ChatdConfig in internal/shared/config.go with bind, ollama_url,
  per-provider key env names + .env fallback paths, timeout
- Gateway gains chatd_url + /v1/chat + /v1/chat/* routes
- lakehouse.toml [chatd] block with /etc/lakehouse/<provider>.env defaults

tests (19 in internal/chat):
- registry: prefix + :cloud + errors + telemetry + provider listing
- ollama: happy path + prefix strip + format=json + 500 mapping +
  flatten_messages
- openai_compat: happy path + format=json + 429 mapping + zero-choices

think=false default in ollama + ollama_cloud — local hot path skips
reasoning, low-budget callers (the playbook_lift judge at max_tokens=10)
get direct answers instead of empty content + done_reason=length.
proven via chatd_smoke acceptance.

acceptance gate: scripts/chatd_smoke.sh — 6/6 PASS:
1. /v1/chat/providers lists exactly registered providers (1 in dev mode)
2. bare model → ollama default with content + token counts + latency
3. explicit ollama/<m> → prefix stripped at upstream
4. <m>:cloud without ollama_cloud registered → 404 (no silent fall-through)
5. unknown/<m> → falls through to default → upstream 502 (no prefix rewrite)
6. missing model field → 400

just verify: PASS (vet + 30 packages × short tests + 9 smokes).
chatd_smoke is a domain smoke (not in just verify, mirrors matrix /
observer / pathway pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:08:29 -05:00
root
622e124b8f phase 2: matrix.downgrade reads WeakModels from config
migrate the strong-model auto-downgrade gate from a hardcoded weak
list to cfg.Models.WeakModels. backward compatible: existing API
preserved, callers that don't migrate keep using DefaultWeakModels.

changes:
- internal/matrix/downgrade.go: split IsWeakModel into rule-based
  base (`:free` suffix/infix) + literal-list lookup. New
  IsWeakModelInList(model, list) takes the config-supplied list.
  DowngradeInput grows a WeakModels field; nil falls back to
  DefaultWeakModels (preserves pre-phase-2 behavior).
- internal/workflow/modes.go: add MatrixDowngradeWithWeakList(list)
  factory mirroring MatrixSearch's pattern. Plain MatrixDowngrade
  kept for backward compat.
- cmd/matrixd/main.go: handlers struct holds weakModels populated
  from cfg.Models.WeakModels at startup; handleDowngrade threads it
  into every DowngradeInput.
- cmd/observerd/main.go: registerBuiltinModes accepts weakModels
  and uses the factory variant. observerd reads cfg.Models.WeakModels
  in main().

end-to-end verified: downgrade + matrix + observer + workflow smokes
all pass. Existing TestMaybeDowngrade_TruthTable + TestIsWeakModel
unchanged (backward compat). Two new tests cover the config path:
- TestIsWeakModelInList — covers rule + literal + empty + nil
- TestMaybeDowngrade_WithConfigList — verifies cfg list overrides
  default

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:52:18 -05:00
root
8278eb9a87 scrum2 cleanup: JSON-marshal in stringifyValue, drop dead detectCycle, name SourceWorkflow
5 small fixes from the §3.8 scrum2 review wave:

- workflow.stringifyValue now JSON-marshals maps/slices instead of
  fmt.Sprint %v (Opus+Kimi convergent: LLM modes were getting Go's
  map[k:v] syntax, which is unparseable as JSON context).
- workflow.detectCycle removed — duplicate of topoSort that discarded
  the useful node ID. Validate() now calls topoSort directly and
  returns its wrapped ErrCycle.
- observer.SourceWorkflow named constant — was an implicit string
  cast (observer.Source("workflow")) at the cmd/observerd handler.
- Unused context imports + dead silencer comments removed across
  workflow/modes.go and observerd/main.go.
- Unused store parameter dropped from registerBuiltinModes (reserved
  comment removed; can be re-added when a mode actually needs it).

just verify still PASS — these are pure cleanup, no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:16:07 -05:00
root
c7e3124208 §3.8 second slice: real modes wired (matrix.relevance/downgrade/search,
distillation.score, drift.scorer)

Lands the workflow.Mode adapters for the §3.4 components + the
distillation scorer + drift quantifier. Workflows can now compose
real measurement capabilities; the substrate's parallel
capabilities become composable Lego bricks (per the prior commit's
closing insight).

Modes registered (in observerd's registerBuiltinModes):

  Pure-function wrappers (no I/O):
    - matrix.relevance    → matrix.FilterChunks
    - matrix.downgrade    → matrix.MaybeDowngrade
    - distillation.score  → distillation.ScoreRecord
    - drift.scorer        → drift.ComputeScorerDrift

  HTTP-backed:
    - matrix.search       → POST matrixd /matrix/search
                             (registered only when matrixd_url is set)

  Fixture (kept from §3.8 first slice):
    - fixture.echo, fixture.upper

internal/workflow/modes.go:
  Each mode follows the same glue pattern: marshal generic input
  through a typed struct (free schema validation + clear error
  messages), call the underlying capability, return a generic
  output map. Roundtrip-via-JSON gives us schema validation
  without writing custom field-by-field coercion.
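
  A sketch of that glue shape for one mode (relevanceInput and the
  FilterChunks signature are assumed for illustration; imports elided):

    func matrixRelevanceMode(_ context.Context, in map[string]any) (map[string]any, error) {
        raw, err := json.Marshal(in) // generic input ...
        if err != nil {
            return nil, err
        }
        var typed relevanceInput // ... roundtripped through a typed struct
        if err := json.Unmarshal(raw, &typed); err != nil {
            return nil, fmt.Errorf("matrix.relevance: bad input: %w", err)
        }
        kept := matrix.FilterChunks(typed.Focus, typed.Chunks, typed.Threshold)
        return map[string]any{"kept": kept}, nil // generic output map
    }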

internal/workflow/modes_test.go (10 tests, all PASS):
  - matrix.relevance filters adjacency pollution (Connector kept,
    catalogd::Registry dropped — same headline as the relevance
    smoke, run through the workflow mode)
  - matrix.downgrade flips lakehouse→isolation on strong model;
    keeps lakehouse on weak (qwen3.5:latest); errors on missing
    fields
  - distillation.score rates scrum_review attempt_1 as accepted;
    rejects empty record
  - drift.scorer reports zero drift on matched inputs; errors on
    empty inputs slice
  - matrix.search HTTP flow round-trips through httptest fake
    matrixd; non-OK status surfaces a clear error

scripts/workflow_smoke.sh (5 assertions PASS, was 4):
  New assertion #5: real-mode chain
    matrix.downgrade (lakehouse + grok-4.1-fast → isolation)
    → distillation.score (scrum_review attempt_1 → accepted)
  Proves §3.4 components compose through the workflow runner with
  no fixture intermediation. Both nodes ran successfully, runner
  recorded provenance, status=succeeded.

  Mode listing assertion now expects 7 modes (5 real + 2 fixture)
  instead of just the fixtures.

17-smoke regression all green. SPEC §3.8 acceptance gate G3.8.D
("Mode catalog dispatches matrix.search invocation to the matrixd
backend without going through HTTP") still pending — current path
goes through HTTP for matrix.search, which is the cleaner service-
mesh shape but slower than direct in-process. In-process dispatch
when matrixd is co-resident is a future optimization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:39:26 -05:00
root
e30da6e5aa §3.8 first slice: workflow runner skeleton + DAG executor + observerd integration
Lands the structural piece of SPEC §3.8 (Observer-KB workflow runner)
documented in 97dd3f8: types + DAG runner + reference substitution +
provenance recording into observerd. Real-mode integrations
(matrix.search, distillation.score, drift.scorer, llm.chat) come in
follow-up commits — this commit proves the mechanics.

internal/workflow/types.go:
  - Workflow / Node / NodeResult / RunResult types matching Archon's
    YAML shape so existing workflows (e.g. lakehouse-architect-review.yaml)
    load directly. Optional `mode` field added — implicit fall-back is
    "llm.chat" matching Archon's convention.
  - Mode signature: func(Context, map[string]any) (map[string]any, error)
  - 5 sentinel errors: ErrCycle, ErrMissingDep, ErrUnknownMode,
    ErrDuplicateNodeID, ErrUnresolvedRef
  - Validate enforces structural invariants: unique IDs, every
    depends_on resolves, no cycles

internal/workflow/runner.go:
  - Kahn's-algorithm topological sort, stable for declaration-order
    ties (deterministic execution + JSON output across runs)
  - Reference substitution: $node_id.output.key.path resolves through
    nested maps; $node_id alone resolves to the whole output map (sketch after this list)
  - Skip cascade: a node whose dependency failed/skipped is skipped
    with explicit "upstream node X failed" error in NodeResult, never
    silently dropped
  - Per-node provenance: NodeResult.StartedAt + DurationMs captured
    for every execution
  - Mode pre-validation: every node's mode checked against registry
    BEFORE any node runs — typo catches in 5ms not after 6 nodes
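
  A sketch of the $-ref walk (hypothetical helper; the real resolver also
  handles the bare $node_id form and wraps ErrUnresolvedRef — imports elided):

    // resolveRef walks "$shape.output.upper" through outputs["shape"]["output"]["upper"].
    func resolveRef(ref string, outputs map[string]map[string]any) (any, error) {
        parts := strings.Split(strings.TrimPrefix(ref, "$"), ".")
        node, ok := outputs[parts[0]]
        if !ok {
            return nil, fmt.Errorf("unresolved ref %q: unknown node %q", ref, parts[0])
        }
        var cur any = node
        for _, key := range parts[1:] {
            m, ok := cur.(map[string]any)
            if !ok {
                return nil, fmt.Errorf("unresolved ref %q: %q is not a map", ref, key)
            }
            if cur, ok = m[key]; !ok {
                return nil, fmt.Errorf("unresolved ref %q: missing key %q", ref, key)
            }
        }
        return cur, nil
    }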

internal/workflow/runner_test.go (14 tests, all PASS):
  - Validate: missing name, no nodes, duplicate IDs, missing deps, cycles
  - Run: single node, 3-node DAG with chained $-refs (shape→weakness→improvement),
    failed-node skip cascade with independent siblings still running,
    unknown-mode abort, unresolved-reference error, implicit
    llm.chat fallback, provenance fields populated, inputs (not just
    prompt) honor $-refs, topological-sort stability for ties

cmd/observerd extended:
  - POST /observer/workflow/run executes a workflow, records each
    node's execution as an ObservedOp (source="workflow"), returns
    the full RunResult
  - GET /observer/workflow/modes lists the registered mode names
  - registerBuiltinModes wires fixture.echo + fixture.upper for v0;
    real modes register here in follow-up commits

scripts/workflow_smoke.sh (4 assertions PASS):
  - GET /modes lists fixture.echo + fixture.upper
  - 3-node DAG executes: shape (uppercase "hello world") → weakness
    (sees "HELLO WORLD" via $shape.output.upper ref) → improvement
    (sees "HELLO WORLD" propagated through 2-hop $weakness.output.prompt)
  - /observer/stats shows by_source.workflow == 3 (one per node) and
    total == 3 — provenance lands as expected
  - Unknown mode → 400 with "unknown mode" in error body

17-smoke regression all green. Acceptance gates G3.8.A (Archon-shape
workflow loads + executes topologically) + G3.8.B (per-node ObservedOps)
+ G3.8.C ($prior_node.output ref resolves, error on missing ref) all
satisfied. G3.8.D (in-process matrix.search dispatch) deferred until
a real mode is wired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:34:30 -05:00
root
bc9ab93afe H: observerd — autonomous-iteration witness loop (SPEC §2 port)
Port of the load-bearing pieces of mcp-server/observer.ts (Rust
system, 852 lines TS) per SPEC §2's named target. Implements PRD
loop 3 ("Observer loop — watches each run, refines configs").

Routes (all under /v1/observer/* via gateway):
  GET  /observer/health   — liveness
  GET  /observer/stats    — total / successes / failures /
                             by_source / recent_scenario_ops
                             (matches Rust JSON shape exactly)
  POST /observer/event    — record one ObservedOp; auto-defaults
                             timestamp + source, validates required
                             fields (endpoint), persists to JSONL,
                             appends to ring buffer

Architecture:
  - internal/observer/types.go — ObservedOp model + Source taxonomy
    (mcp / scenario / langfuse / overseer_correction). Mirrors the
    Rust shape so JSON round-trips during cutover.
  - internal/observer/store.go — Store + Persistor. Ring buffer cap
    matches Rust's 2000; recent_scenarios cap matches Rust's 10.
    Same persist-then-apply order as pathwayd; same corruption-
    tolerant replay (skip malformed lines + warn).
  - cmd/observerd — :3219 HTTP service, fronted by gateway as
    /v1/observer/*.
  - lakehouse.toml + DefaultConfig — [observerd] block matches the
    pathwayd pattern (Bind + PersistPath; empty path = ephemeral).

Tests + smoke (all PASS):
  - 7 unit tests in store_test.go: validation, default fields,
    stats aggregation, recent-scenarios cap + ordering, ring-buffer
    rollover at cap, JSONL round-trip persistence, corruption-
    tolerant replay (1 valid + 1 corrupt + 1 valid → 2 applied)
  - scripts/observer_smoke.sh: 4 assertions through gateway —
    record 5 events (3 ok / 2 fail across 2 sources), stats
    aggregates correctly, empty-endpoint→400, kill+restart preserves
    via JSONL replay (5 ops, 3 ok, 2 err survive)

Deferred (named in package + cmd doc, not in this commit):
  - POST /observer/review (cloud-LLM hand-review fall-back). The
    heuristic-only path could land cheaply but the productized
    cloud path (qwen3-coder fall-back) is multi-day port.
  - Background loops: analyzeErrors, consolidatePlaybooks,
    tailOverseerCorrections (read overseer_corrections.jsonl into
    the ring buffer once per cycle).
  - escalateFailureClusterToLLMTeam (failure clustering trigger
    that posts to LLM Team's /api/run with code_review mode).

/relevance is NOT duplicated — already ported in 9588bd8 to
internal/matrix/relevance.go (component 3 of SPEC §3.4).

16-smoke regression all green (D1-D6, G1, G1P, G2, storaged_cap,
pathway, matrix, relevance, downgrade, playbook, observer).
13 binaries now: gateway, storaged, catalogd, ingestd, queryd,
vectord, embedd, pathwayd, matrixd, observerd, mcpd, fake_ollama
(plus catalogd-only test build).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:18:02 -05:00
root
6392772f41 C: bulk playbook record — operational rating wiring
POST /v1/matrix/playbooks/bulk accepts an array of playbook entries
and records each independently — failures per-entry don't abort the
batch. Designed for two operational use cases:

  1. Backfilling historical placement data into the playbook
     substrate (the Rust system has 4,701 fill operations recorded
     with embeddings; that data deserves to feed the Go learning
     loop without a 4,701-call procedural script).
  2. Batched click-tracking from a session's worth of coordinator
     interactions, posted once at idle rather than per-click.

Per-entry response shape: {index, playbook_id} on success or
{index, error} on failure. Caller can inspect failures without
diffing.
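
A sketch of the per-entry loop (bulkResult and the Record signature follow
the commit text but are illustrative; imports elided):

  type bulkResult struct {
      Index      int    `json:"index"`
      PlaybookID string `json:"playbook_id,omitempty"`
      Error      string `json:"error,omitempty"`
  }

  results := make([]bulkResult, 0, len(entries))
  for i, entry := range entries {
      id, err := retriever.Record(entry, corpus)
      if err != nil {
          results = append(results, bulkResult{Index: i, Error: err.Error()})
          continue // a bad entry never aborts the rest of the batch
      }
      results = append(results, bulkResult{Index: i, PlaybookID: id})
  }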

Smoke (scripts/playbook_smoke.sh, new assertion #4):
  Bulk POST 3 entries: 2 valid (alpha→widget-a, bravo→widget-b) +
  1 invalid (empty query_text). Verifies recorded=2, failed=1,
  the 2 valid ones get playbook_ids back, and the invalid one
  surfaces its validation error in-line.

Single-record /matrix/playbooks/record from 06e7152 still works
unchanged; bulk is additive. The corpus field can be set per-
entry or once at the batch level (entry-level wins on collision).

Per the small-model autonomous pipeline framing: this is the
"the playbook gets denser with each iteration" mechanism. Click
tracking → bulk POST → playbook entries → future similar queries
get those answers boosted via the existing /matrix/search
use_playbook path. The learning loop now has both inflows wired
(single + bulk) — what remains is the demo UI shim that calls
/feedback on result interaction (deferred — no Go demo UI yet).

15-smoke regression all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:10:13 -05:00
root
a730fc2016 scrum fixes: 4 real findings landed, 4 false positives dismissed
Cross-lineage scrum review on the 12 commits of this session
(afbb506..06e7152) via Rust gateway :3100 with Opus + Kimi +
Qwen3-coder. Results:

  Real findings landed:
    1. Opus BLOCK — intra-batch duplicates in vectord BatchAdd trip
       coder/hnsw's "node not added" length-invariant panic. Fixed with
       last-write-wins dedup inside BatchAdd before the pre-pass.
       Regression test TestBatchAdd_IntraBatchDedup added.
    2. Opus + Kimi convergent WARN — strings.Contains(err.Error(),
       "status 404") was brittle string-matching to detect cold-
       start playbook state. Fixed: ErrCorpusNotFound sentinel
       returned by searchCorpus on HTTP 404; fetchPlaybookHits
       uses errors.Is.
    3. Opus WARN — corpusingest.Run returned nil on total batch
       failure, masking broken pipelines as "empty corpora." Fixed:
       Stats.FailedBatches counter, ErrPartialFailure sentinel
       returned when nonzero. New regression test
       TestRun_NonzeroFailedBatchesReturnsError.
    4. Opus WARN — dead var _ = io.EOF in staffing_500k/main.go
       was justified by a fictional comment. Removed.
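
  A sketch of finding 2's sentinel shape (searchCorpus and the PlaybookHit
  slice are stand-ins for the real matrix internals; imports elided):

    var ErrCorpusNotFound = errors.New("playbook corpus not found") // wraps the vectord 404

    func fetchPlaybookHits(ctx context.Context, corpus string, vec []float32) ([]PlaybookHit, error) {
        hits, err := searchCorpus(ctx, corpus, vec)
        if errors.Is(err, ErrCorpusNotFound) {
            return nil, nil // cold start before any Record call: no-op, not an error
        }
        if err != nil {
            return nil, err
        }
        return hits, nil
    }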

  Drivers (staffing_500k, staffing_candidates, staffing_workers)
  updated to handle ErrPartialFailure gracefully — print warn, keep
  running queries — rather than fatal'ing on transient hiccups
  while still surfacing the failure clearly in the output.

  Documented (no code change):
    - Opus WARN: matrixd /matrix/downgrade reads
      LH_FORCE_FULL_ENRICHMENT from process env when body omits
      it. Comment now explains the opinionated default and points
      callers wanting deterministic behavior to pass the field
      explicitly.

  False positives dismissed (caught and verified, NOT acted on):
    A. Kimi BLOCK on errors.Is + wrapped error in cmd/matrixd:223.
       Verified false: Search wraps with %w (fmt.Errorf("%w: %v",
       ErrEmbed, err)), so errors.Is matches the chain correctly.
    B. Kimi INFO "BatchAdd has no unit tests." Verified false:
       batch_bench_test.go has BenchmarkBatchAdd; the new dedup
       test TestBatchAdd_IntraBatchDedup adds another.
    C. Opus BLOCK on missing finite/zero-norm pre-validation in
       cmd/vectord:280-291. Verified false: line 272 already calls
       vectord.ValidateVector before BatchAdd, so finite + zero-
       norm IS checked. Pre-validation is exhaustive.
    D. Opus WARN on relevance.go tokenRe (Opus self-corrected
       mid-finding when realizing leading char counts toward token
       length).

  Qwen3-coder returned NO FINDINGS — known issue with very long
  diffs through the OpenRouter free tier; lineage rotation worked
  as designed (Opus + Kimi between them caught everything Qwen
  would have).

15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).
Unit tests all green (corpusingest +1, vectord +1).

Per feedback_cross_lineage_review.md: convergent finding #2 (404
detection) is the highest-signal one — both Opus and Kimi
flagged it independently. The other Opus findings stand on
single-reviewer signal but each one verified against the actual
code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:42:39 -05:00
root
06e71520c4 matrix: playbook memory + boost — SPEC §3.4 component 5 of 5 (LEARNING LOOP)
Closes SPEC §3.4. The matrix indexer is now a learning meta-index per
feedback_meta_index_vision.md — every successful (query → answer)
pair recorded via /matrix/playbooks/record boosts that answer for
future similar queries.

This is the architectural piece that lifts vectord from "static
hybrid search" to the meta-index J originally framed in Phase 19 of
the Rust system.

What's new:
  - internal/matrix/playbook.go — PlaybookEntry, PlaybookHit,
    ApplyPlaybookBoost. Pure-function boost math:
      distance' = distance * (1 - 0.5 * score)
    Score 0 = no boost (factor 1.0); score 1 = halve distance
    (factor 0.5). Capped at 0.5 deliberately so a single high-
    confidence playbook can't dominate the base ranking forever
    (runaway-feedback-loop guard).
  - Retriever.Record(entry, corpus) — embeds query_text, ensures
    playbook corpus exists (idempotent), upserts via deterministic
    sha256-derived ID (last score wins on re-record of same triple).
  - Retriever.Search extended with UsePlaybook + PlaybookCorpus +
    PlaybookTopK + PlaybookMaxDistance. Reuses the query vector —
    no extra embed call. Missing-corpus 404 = no-op (cold-start
    state before any Record call), not an error.
  - POST /v1/matrix/playbooks/record (matrixd) — caller submits
    {query_text, answer_id, answer_corpus, score, tags?}; gets
    {playbook_id} back.

Storage: a vectord index named "playbook_memory" (configurable per
request) with embed(query_text) as the vector and the
PlaybookEntry JSON as metadata. Just another corpus — observable
from /vectors/index, persistable through G1P, etc.

Match key for boost: (AnswerID, AnswerCorpus). Cross-corpus ID
collisions don't false-match — verified by
TestApplyPlaybookBoost_CorpusAttributionRespected.
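
A sketch of the boost factor and match key together (names follow the
commit text; the real ApplyPlaybookBoost also keeps the highest score when
duplicate playbook hits match the same answer):

  // BoostFactor per the math above: score 0 → 1.0 (no boost), score 1 → 0.5.
  // Assumes score has already passed Validate's [0, 1] bound.
  func BoostFactor(score float64) float64 {
      return 1 - 0.5*score // capped at halving — the runaway-feedback-loop guard
  }

  // Boost match key: cross-corpus ID collisions never false-match.
  type matchKey struct{ AnswerID, AnswerCorpus string }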

End-to-end smoke (scripts/playbook_smoke.sh, all assertions PASS):
  - Baseline search: widget-c at distance 0.6566 (rank 3)
  - Record playbook: query → widget-c, score=1.0
  - Re-search with use_playbook=true:
      widget-c distance: 0.3283 (rank 2)
      ratio: 0.5 EXACTLY (matches boost math precisely)
      playbook_boosted: 1
  - widget-c jumped from #3 to #2 — learning loop visible

Tests:
  - 8 unit tests in internal/matrix/playbook_test.go covering
    Validate, BoostFactor (5 cases), the no-boost identity, the
    boost-moves-result-up scenario, highest-score wins on duplicate
    matches, cross-corpus attribution, JSON round-trip, and
    rejection of empty metadata
  - scripts/playbook_smoke.sh integration test (3 assertions PASS)

15-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade, playbook).

SPEC §3.4 NOW COMPLETE: 5 of 5 components shipped. The matrix
indexer's port is done as a substrate; remaining work is operational
(rating signal sources, telemetry, eventual structured filtering for
staffing data — none in §3.4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:34:24 -05:00
root
3968ec8a7b matrix: strong-model downgrade gate — SPEC §3.4 component 4 of 5
Pure-Go port of mode.rs::execute's pass5 downgrade gate (Rust
2026-04-26). Adds POST /v1/matrix/downgrade endpoint via matrixd.

The gate captures the pass5 finding: composing matrix corpora into
codereview_lakehouse on a strong model LOST 5/5 head-to-head reps
against matrix-free codereview_isolation on grok-4.1-fast (p=0.031).
Strong models have enough native capacity that bug fingerprints +
adversarial framing + file content carry them; matrix chunks
displace depth-of-analysis.

Logic (matches Rust mode.rs:614-632):
  if mode == codereview_lakehouse
     && !forced_mode
     && !LH_FORCE_FULL_ENRICHMENT
     && !is_weak_model(model)
  → flip to codereview_isolation, record downgraded_from

is_weak_model captures the empirical weak-list:
  - `:free` suffix or `:free/` infix (OpenRouter free tier)
  - qwen3.5:latest, qwen3:latest (local last-resort rungs)
  - everything else → strong by default
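
A sketch of the gate (names follow the commit; the weak list shown is the
hardcoded one this component ships with — imports elided):

  func isWeakModel(model string) bool {
      if strings.HasSuffix(model, ":free") || strings.Contains(model, ":free/") {
          return true // OpenRouter free tier
      }
      switch model {
      case "qwen3.5:latest", "qwen3:latest": // local last-resort rungs
          return true
      }
      return false // everything else is strong by default
  }

  func maybeDowngrade(mode, model string, forced, forceFullEnrichment bool) (effective, downgradedFrom string) {
      if mode == "codereview_lakehouse" && !forced && !forceFullEnrichment && !isWeakModel(model) {
          return "codereview_isolation", mode
      }
      return mode, ""
  }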

Tests:
  - 3 unit tests in internal/matrix/downgrade_test.go: IsWeakModel
    coverage, MaybeDowngrade truth table (5 rows), forced-mode
    precedence (forced beats every other bypass)
  - scripts/downgrade_smoke.sh: 6 assertions through gateway covering
    all 5 truth-table rows + empty-mode 400

14-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance, downgrade).

SPEC §3.4 progress: 4 of 5 components shipped (corpus builders,
multi-corpus retrieve+merge, relevance filter, downgrade gate).
Last component is learning-loop integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:17:55 -05:00
root
9588bd82ae matrix: relevance filter — SPEC §3.4 component 3 of 5
Faithful port of mcp-server/relevance.ts (Rust observer's adjacency-
pollution filter). Same 5-signal scoring, same default threshold 0.3.
Adds POST /v1/matrix/relevance endpoint via matrixd.

Scoring signals (additive, can sign-flip):
  path_match     +1.0  chunk source/doc_id encodes focus.path
  filename_match +0.6  chunk text mentions focus's filename
  defined_match  +0.6  chunk text mentions focus.defined_symbols
  token_overlap  +0.4  jaccard of non-stopword tokens
  prefix_match   +0.3  chunk source shares first-2-segment prefix
  import_penalty -0.5  mentions ONLY imported symbols, no defined ones
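
A sketch of the additive scoring over a hypothetical per-chunk signals
struct (signal detection itself is elided; weights as listed above):

  func scoreRelevance(s signals) float64 {
      score := 0.0
      if s.PathMatch {
          score += 1.0
      }
      if s.FilenameMatch {
          score += 0.6
      }
      if s.DefinedMatch {
          score += 0.6
      }
      score += 0.4 * s.TokenJaccard // jaccard of non-stopword tokens, 0..1
      if s.PrefixMatch {
          score += 0.3
      }
      if s.ImportOnly { // mentions ONLY imported symbols, no defined ones
          score -= 0.5
      }
      return score // chunk kept when score >= threshold (default 0.3)
  }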

What this does and doesn't do:
  - DOES filter code-aware corpora (eventually lakehouse_arch_v1,
    lakehouse_symbols_v1, scrum_findings_v1) — drops chunks about
    code the focus file IMPORTS rather than DEFINES, the
    "adjacency pollution" pattern that makes a reviewer LLM
    hallucinate imported-crate internals as belonging to the focus
  - DOES NOT meaningfully filter staffing data — the candidates
    reality test 2026-04-29 had "exact skill match buried at #3"
    which is a different problem (semantic-only ranking dominated
    by secondary text). Staffing needs structured filtering
    (status gates, location gates) that lives outside this
    package — future work, not in SPEC §3.4 yet

Headline smoke assertion: focus = crates/queryd/src/db.go which
defines Connector and imports catalogd::Registry. The filter
scores:
  Connector chunk: +0.68  (defined_match fires, kept)
  Registry chunk: -0.46  (import_only penalty fires, dropped)
  unrelated junk:  0.00  (no signals, dropped)

That's a 1.14-point gap between what we ARE and what we IMPORT —
the entire purpose of the filter.

Tests:
  - 9 unit tests in internal/matrix/relevance_test.go covering
    Tokenize, Jaccard, ExtractDefinedSymbols (Rust + TS),
    ExtractImportedSymbols, FilePrefix, ScoreRelevance per-signal,
    FilterChunks threshold splitting, and the headline
    AdjacencyPollutionScenario
  - scripts/relevance_smoke.sh integration smoke (3 assertions PASS):
    adjacency-pollution scenario, empty-chunks 400, threshold honored

13-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix, relevance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:13:22 -05:00
root
c1d96b7b60 matrixd: multi-corpus retrieve+merge — SPEC §3.4 component 2 of 5
Lands the matrix indexer's first piece per docs/SPEC.md §3.4:
multi-corpus retrieve+merge with corpus attribution per result.
Future components (relevance filter, downgrade gate, learning-loop
integration) layer on top of this surface.

Architecture:
  - internal/matrix/retrieve.go — Retriever takes (query, corpora,
    k, per_corpus_k), parallel-fans across vectord indexes, merges
    by distance ascending, preserves corpus origin per hit
  - cmd/matrixd — HTTP service on :3217, fronts /v1/matrix/*
  - gateway proxy + [matrixd] config + lakehouse.toml entry
  - Either query_text (matrix calls embedd) or query_vector
    (caller pre-embedded) — vector takes precedence if both set

Error policy: fail-loud on any corpus error. Silent partial returns
would lie about coverage, defeating the matrix's whole purpose.
Bubbles vectord errors as 502 (upstream), validation as 400.

Smoke (scripts/matrix_smoke.sh, 6 assertions PASS first try):
  - /matrix/corpora lists indexes
  - Multi-corpus search returns hits from BOTH corpora
  - Top hit is the globally-closest across all corpora
    (b-near beats a-near at distance 0.05 vs 0.1 — proves merge)
  - Metadata round-trips through the merge
  - Distances ascending in result list
  - Negative paths: empty corpora → 400, missing corpus → 502,
    no query → 400

12-smoke regression sweep all green (D1-D6, G1, G1P, G2,
storaged_cap, pathway, matrix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:39:17 -05:00
root
f1c188323c vectord: BatchAdd — single-lock variadic batch (Option A)
Replaces the per-item Add loop in the HTTP handler with one call to
Index.BatchAdd, which acquires the write-lock once and pushes the
whole batch through coder/hnsw's variadic Graph.Add. Pre-validation
stays in the handler so per-item error messages keep their item-index
precision.

Microbench (internal/vectord/batch_bench_test.go) at d=768 cosine:

  N=16    SingleAdd 283µs/op  →  BatchAdd 170µs/op   1.66×
  N=128   SingleAdd 7.9ms/op  →  BatchAdd 7.5ms/op   1.05×
  N=1024  SingleAdd 87.5ms/op →  BatchAdd 83.4ms/op  1.05×

Win is biggest at staffing-driver batch sizes (N=16) where
per-call lock + validation overhead is a meaningful fraction. At
larger N the inner HNSW neighborhood search per insert dominates,
which is the load-bearing finding for Option B (sharded indexes):
the throughput ceiling lives inside the library, not at the lock,
so sharding to N parallel Graphs is the only path to true
concurrent-Add throughput.

g1, g1p, g2 smokes all PASS post-change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:05:48 -05:00
root
afbb506dbc pathwayd: HTTP service over internal/pathway · 11/11 smoke gate
Network-callable Mem0-style trace memory at :3217, fronted by gateway
/v1/pathway/*. Closes the ADR-004 wire-up: store substrate landed in
2a6234f, this lands the HTTP surface + [pathwayd] config + acceptance
gate.

Smoke proves the architecturally distinctive properties: Revise →
History walks the predecessor chain backward (audit trail), Retire
excludes from Search default but stays Get-able, AddIdempotent bumps
replay_count without replacing — and all survive kill+restart via
JSONL log replay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:49:42 -05:00
root
fa56134b90 ADR-003 wiring: Bearer token + IP allowlist middleware
Implements the auth posture from ADR-003 (commit 0d18ffa). Two
independent layers — Bearer token (constant-time compare via
crypto/subtle) and IP allowlist (CIDR set) — composed in shared.Run
so every binary inherits the same gate without per-binary wiring.

Together with the bind-gate from commit 6af0520, this mechanically
closes audit risks R-001 + R-007:
  - non-loopback bind without auth.token = startup refuse
  - non-loopback bind WITH auth.token + override env = allowed
  - loopback bind = all gates open (G0 dev unchanged)

internal/shared/auth.go (NEW)
  RequireAuth(cfg AuthConfig) returns chi-compatible middleware.
  Empty Token + empty AllowedIPs → pass-through (G0 dev mode).
  Token-only → 401 Bearer mismatch.
  AllowedIPs-only → 403 source IP not in CIDR set.
  Both → both gates apply.
  /health bypasses both layers (load-balancer / liveness probes
  shouldn't carry tokens).

  CIDR parsing pre-runs at boot; bare IP (no /N) treated as /32 (or
  /128 for IPv6). Invalid entries log warn and drop, fail-loud-but-
  not-fatal so a typo doesn't kill the binary.

  Token comparison: subtle.ConstantTimeCompare on the full
  "Bearer <token>" wire-format string. Length-mismatch returns 0
  (per stdlib spec), so wrong-length tokens reject without timing
  leak. Pre-encoded comparison slice stored in the middleware
  closure — one allocation per request.

  Source-IP extraction prefers net.SplitHostPort, falling back to
  RemoteAddr-as-is for httptest compatibility. X-Forwarded-For
  support is a follow-up when a trusted proxy fronts the gateway
  (config knob TBD per ADR-003 §"Future").
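
  A sketch of the Bearer layer alone (the real RequireAuth also composes the
  CIDR allowlist and the /health bypass; imports elided):

    func requireBearer(token string, next http.Handler) http.Handler {
        want := []byte("Bearer " + token) // pre-encoded once, stored in the closure
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            got := []byte(r.Header.Get("Authorization"))
            if subtle.ConstantTimeCompare(got, want) != 1 { // 0 on any mismatch, incl. length
                http.Error(w, "unauthorized", http.StatusUnauthorized)
                return
            }
            next.ServeHTTP(w, r)
        })
    }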

internal/shared/server.go
  Run signature: gained AuthConfig parameter (4th arg).
  /health stays mounted on the outer router (public).
  Registered routes go inside chi.Group with RequireAuth applied —
  empty config = transparent group.
  Added requireAuthOnNonLoopback startup check: non-loopback bind
  with empty Token = refuse to start (cites R-001 + R-007 by name).

internal/shared/config.go
  AuthConfig type added with TOML tags. Fields: Token, AllowedIPs.
  Composed into Config under [auth].

cmd/<svc>/main.go × 7 (catalogd, embedd, gateway, ingestd, queryd,
storaged, vectord; mcpd is unaffected — stdio doesn't bind a port)
  Each call site adds cfg.Auth as the 4th arg to shared.Run. No
  other changes — middleware applies via shared.Run uniformly.

internal/shared/auth_test.go (12 test funcs)
  Empty config pass-through, missing-token 401, wrong-token 401,
  correct-token 200, raw-token-without-Bearer-prefix 401, /health
  always public, IP allowlist allow + reject, bare IP /32, both
  layers when both configured, invalid CIDR drop-with-warn, RemoteAddr
  shape extraction. The constant-time comparison is verified by
  inspection (comments in auth.go) plus the existence of the
  passthrough test (length-mismatch case).

Verified:
  go test -count=1 ./internal/shared/  — all green (was 21, now 33 funcs)
  just verify                            — vet + test + 9 smokes 33s
  just proof contract                    — 53/0/1 unchanged

Smokes + proof harness keep working without any token configuration:
default Auth is empty struct → middleware is no-op → existing tests
pass unchanged. To exercise the gate, operators set [auth].token in
lakehouse.toml (or, per the "future" note in the ADR, via env var).

Closes audit findings:
  R-001 HIGH — fully mechanically closed (was: partial via bind gate)
  R-007 MED  — fully mechanically closed (was: design-only ADR-003)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:11:34 -05:00
root
8f4c16fab1 mcpd: Go MCP SDK port — replaces Bun mcp-server tool surface
New cmd/mcpd binary using github.com/modelcontextprotocol/go-sdk
v1.5.0 over stdio transport. Exposes Lakehouse capabilities as MCP
tools: list_datasets, get_manifest, query_sql, embed_text,
search_vectors. Each tool proxies to the gateway via HTTP.

Replaces the MCP-tool subset of the Rust system's Bun mcp-server.ts
(the audit's "split this 2520-line empire" finding from R-005). HTTP
demo routes (the staffing co-pilot UI at /api/intelligence/*,
/headshots/*, etc.) stay Bun until G5 cutover — those are demo-
specific and depend on matrix-indexer signals not yet ported.

Architecture:
  cmd/mcpd/main.go (235 LoC)
    main() reads --gateway flag, builds server via buildServer(),
    runs on StdioTransport. Each tool's args is a typed struct with
    jsonschema tags (the SDK's canonical pattern); reflection
    generates the JSON Schema automatically.

    gatewayClient: thin HTTP wrapper over the configured gateway URL.
    30s per-request timeout. 16 MiB tool-response cap. Non-2xx
    surfaces as IsError CallToolResult (NOT as transport error) so
    the LLM caller sees the error text and can decide how to react.

    proxy() handles GET + POST + JSON body uniformly. errorResult()
    + jsonResult() helpers normalize CallToolResult shape.

  cmd/mcpd/main_test.go (13 test funcs)
    Tests the full MCP wire end-to-end without a subprocess: spin
    up a fake gateway via httptest, build the MCP server pointed at
    it, connect a client via in-memory transports (NewInMemoryTransports),
    call each tool. Each tool gets:
      - happy path (gateway returns 200 → tool returns content)
      - input validation (missing required fields → IsError)
      - upstream error (gateway 4xx → tool returns IsError)
    Plus TestListTools verifies all 5 tools register; TestGatewayUnreachable
    verifies network-level failures surface as IsError, not panics.

Setup for Claude Desktop / Code documented in README:
  {
    "mcpServers": {
      "lakehouse": {
        "command": "/path/to/bin/mcpd",
        "args": ["--gateway", "http://127.0.0.1:3110"]
      }
    }
  }

Verified:
  go test -count=1 ./cmd/mcpd/  — 13/13 green
  just verify                    — vet + test + 9 smokes 35s

Out of scope for this commit:
  - Resources (mcp.AddResource): not needed yet; tools cover the
    interactive surface. Add when an LLM-side use case shows up.
  - Prompts (mcp.AddPrompt): same.
  - Streamable transports (HTTP, SSE): stdio is the universal one;
    streamable can be added with srv.Run(ctx, &mcp.StreamableHTTPHandler{})
    swap if a daemon-mode deploy makes sense.
  - mcpd inside the daemon-supervised stack: it's stdio-only and
    spawned by the MCP client, not run as a service. Adding a
    daemon-mode (HTTP transport on a port) is a follow-up if MCP
    consumers want long-lived sessions.

This is a tool-surface only port. The Bun mcp-server.ts also serves
HTTP demo routes (/api/catalog/datasets, /intelligence/*, /headshots/*)
that depend on the matrix-indexer signals from the Rust system; those
stay Bun until G5 cutover when the staffing co-pilot service ports
to Go.

Direct deps added:
  github.com/modelcontextprotocol/go-sdk v1.5.0

Transitive (resolved by go mod tidy):
  github.com/google/jsonschema-go        v0.4.2
  github.com/yosida95/uritemplate/v3     v3.0.2
  golang.org/x/oauth2                    v0.35.0
  github.com/segmentio/encoding          v0.5.4
  github.com/golang-jwt/jwt/v5           v5.3.1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:00:38 -05:00
root
56844c3f31 embed cache — LRU at /v1/embed for repeat-query elimination
Adds CachedProvider wrapping the embedding Provider with a thread-safe
LRU keyed on (effective_model, sha256(text)) → []float32. Repeat
queries return the stored vector without round-tripping to Ollama.

Why this matters: the staffing 500K test (memory project_golang_lakehouse)
documented that the staffing co-pilot replays many of the same query
texts ("forklift driver IL", "welder Chicago", "warehouse safety", etc).
Each repeat paid the ~50ms Ollama round-trip. Cached repeats now serve
in <1µs (LRU lookup + sha256 of input).

Memory budget: ~3 KiB per entry at d=768. Default 10K entries ≈ 30 MiB.
Configurable via [embedd].cache_size; 0 disables (pass-through mode).

Per-text caching, not per-batch — a batch with mixed hits/misses only
fetches the misses upstream, then merges the result preserving caller
input order. Three-text batch with one miss = one upstream call for
that one text instead of three.

Implementation:
  internal/embed/cached.go (NEW, 150 LoC)
    CachedProvider implements Provider; uses hashicorp/golang-lru/v2.
    Key shape: "<model>:<sha256-hex>". Empty model resolves to
    defaultModel (request-derived) for the key — NOT res.Model
    (upstream-derived), so future requests with same input shape
    hit the same key. Caught by TestCachedProvider_EmptyModelResolvesToDefault.
    Atomic hit/miss counters + Stats() + HitRate() + Len().
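
    A sketch of the key derivation (the cache itself is
    lru.New[string, []float32](cacheSize) from hashicorp/golang-lru/v2;
    imports elided):

      func cacheKey(requestModel, defaultModel, text string) string {
          model := requestModel
          if model == "" {
              model = defaultModel // request-derived, never the upstream res.Model
          }
          sum := sha256.Sum256([]byte(text))
          return model + ":" + hex.EncodeToString(sum[:])
      }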

  internal/embed/cached_test.go (NEW, 12 test funcs)
    Pass-through-when-zero, hit-on-repeat, mixed-batch only fetches
    misses, model-key isolation, empty-model resolves to default,
    LRU eviction at cap, error propagation, all-hits synthesized
    without upstream call, hit-rate accumulation, empty-texts
    rejected, concurrent-safe (50 goroutines × 100 calls), key
    stability + distinctness.

  internal/shared/config.go
    EmbeddConfig.CacheSize (toml: cache_size). Default 10000.

  cmd/embedd/main.go
    Wraps Ollama Provider with CachedProvider on startup. Adds
    /embed/stats endpoint exposing hits / misses / hit_rate / size.
    Operators check the rate to confirm the cache is working
    (high rate = good) or sized wrong (low rate + many misses on a
    workload that should have repeats).

  cmd/embedd/main_test.go
    Stats endpoint tests — disabled mode shape, enabled mode tracks
    hits + misses across repeat calls.

One real bug caught by my own test:
  Initial implementation cached under res.Model (upstream-resolved)
  rather than effectiveModel (request-resolved). A request with
  model="" would cache under "test-model" (Ollama's default), and a later
  request with model="the-default" (our config default) would miss
  the cache. Fix: always use the request-derived effectiveModel
  for keys; that's the predictable side. Locked by
  TestCachedProvider_EmptyModelResolvesToDefault.

Verified:
  go test -count=1 ./internal/embed/  — all 12 cached tests + 6 ollama tests green
  go test -count=1 ./cmd/embedd/      — stats endpoint tests green
  just verify                          — vet + test + 9 smokes 33s

Production benefit:
  ~50ms Ollama round-trip → <1µs cache lookup for cached entries.
  At 10K-entry default + ~30% repeat rate (typical staffing co-pilot
  workload), saves several seconds per staffer-query session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:54:30 -05:00
root
fb08232f58 Batch 4: embed fixture-mode — partial R-006 closure
Adds cmd/fake_ollama, a minimal Ollama-API-compatible fake that
implements just enough surface for embedd to drive end-to-end
without a real Ollama install:

  GET  /api/tags        — fixed model list including nomic-embed-text
  POST /api/embeddings  — deterministic dim-D vector from sha256(prompt)
  GET  /health          — for the smoke's poll_health helper

Same prompt → bit-identical vector across runs, machines, and CI
nodes. Vectors are NOT semantically meaningful; the fake validates
the embed CONTRACT (dimension echo, response shape, status codes,
deterministic round-trip), not real semantic ranking. Real ranking
still requires real Ollama and lives in scripts/g2_smoke.sh + the
integration tier of the proof harness.
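
A sketch of the deterministic-vector idea (the real fake may expand the
digest differently; imports elided):

  // Same prompt → same sha256 → bit-identical vector, with no semantic meaning.
  func fakeEmbedding(prompt string, dim int) []float32 {
      digest := sha256.Sum256([]byte(prompt))
      vec := make([]float32, dim)
      for i := range vec {
          vec[i] = float32(digest[i%len(digest)])/255.0 - 0.5 // cycle the 32 digest bytes
      }
      return vec
  }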

scripts/g2_smoke_fixtures.sh — full chain smoke against the fake:
  - Build fake_ollama + embedd + vectord + gateway
  - Start fake on :11435 (distinct from real Ollama at :11434)
  - Generate temp lakehouse.toml with provider_url override
  - Boot embedd/vectord/gateway with --config <override>
  - 4 assertions: dim=768, deterministic same-text, different-text
    divergence, bad-model → 4xx/5xx (fake 404 → embedd 502)
  - Trap-cleanup tears down all 4 binaries + tmp config

Wired into the task runner:
  just smoke-g2-fixtures

Closes R-006 partially:
  - Embed half: ✓ — CI / fresh-clone reviewers without Ollama can
    now run the embed contract smoke
  - Storage half: deferred — mocking S3 protocol is non-trivial
    (multipart, signed URLs, etc.) and MinIO itself is lightweight
    enough to install via Docker in any CI environment. Documented
    as Sprint 0 follow-up if a CI system without Docker shows up.

What this DOESN'T cover:
  - Real semantic similarity (use scripts/g2_smoke.sh + real Ollama)
  - Real Ollama API quirks (timeouts, version-specific shapes,
    /api/embed batch endpoint that newer versions support)

Verified:
  bash scripts/g2_smoke_fixtures.sh — 4/4 assertions PASS, ~3s wall
  just verify                       — vet + test + 9 smokes still green

Doesn't replace the existing g2_smoke.sh (which still requires real
Ollama and exercises the actual embed semantics). Adds an alternate
mode for portability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:22:07 -05:00
root
0f79bce948 Batch 3: cmd/<bin>/main_test.go × 6 — closes R-005
Adds main_test.go for each of the 6 cmd binaries that lacked them
(storaged already had main_test.go; that's where the pattern came
from). Each test file focuses on the cmd-specific surface — route
mounts, body caps, decode/validation paths — without re-testing
internal package logic that's covered elsewhere.

cmd/catalogd/main_test.go — 6 funcs
  TestRoutesMounted: chi.Walk asserts /catalog/{register,manifest/*,list}
  TestHandleRegister_BodyTooLarge: 5 MiB body → 4xx
  TestHandleRegister_MalformedJSON: 400
  TestHandleRegister_EmptyName_400: ErrEmptyName surfaces as 400
  TestHandleGetManifest_404 + TestHandleList_EmptyShape

cmd/embedd/main_test.go — 8 funcs
  stubProvider implements embed.Provider deterministically
  TestRoutesMounted, MalformedJSON_400, EmptyTextRejected_400 (per
    scrum O-W3), UpstreamError_502 (provider error → 502, not 500),
    HappyPath_ProviderEcho, BodyTooLarge (4xx range), TestItoa
    (covers the no-strconv helper)

cmd/gateway/main_test.go — 4 funcs
  TestMustParseUpstream_HappyPaths: 3 valid URLs
  TestMustParseUpstream_FailureExits: re-execs the test binary in a
    subprocess with env flag (standard pattern for testing os.Exit
    callers); subprocess invokes mustParseUpstream("127.0.0.1:3211")
    [missing scheme]; expects exit non-zero. Same pattern for garbage.
  TestUpstreamConfigKeys_DocumentedShape: locks the 6 _url keys
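
  A sketch of that re-exec pattern (the env-flag name is illustrative;
  imports elided):

    func TestMustParseUpstream_FailureExits(t *testing.T) {
        if os.Getenv("GATEWAY_CRASH_TEST") == "1" {
            mustParseUpstream("127.0.0.1:3211") // missing scheme — calls os.Exit
            return                              // unreachable if it exits as expected
        }
        cmd := exec.Command(os.Args[0], "-test.run=TestMustParseUpstream_FailureExits")
        cmd.Env = append(os.Environ(), "GATEWAY_CRASH_TEST=1")
        var exitErr *exec.ExitError
        if err := cmd.Run(); !errors.As(err, &exitErr) {
            t.Fatalf("want non-zero exit from subprocess, got %v", err)
        }
    }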

cmd/ingestd/main_test.go — 7 funcs
  Stubs both storaged and catalogd via httptest.Server so the cmd
  layer can be exercised without bringing the full chain up.
  TestHandleIngest_MissingNameQueryParam: 400 with "name" in body
  TestHandleIngest_MalformedMultipart: 400
  TestHandleIngest_MissingFormFile: 400 (valid multipart, wrong field)
  TestHandleIngest_BodyTooLarge: 4xx
  TestEscapeKeyPath: 6-case URL-escape table (apostrophe, space, etc.)
  TestParquetKeyPath_Format: locks the datasets/<n>/<fp>.parquet shape
    per scrum C-DRIFT (any rename breaks idempotent re-ingest)

cmd/queryd/main_test.go — 6 funcs
  Tests pre-DB paths (decode, body cap, empty SQL); db.QueryContext
  itself needs DuckDB so it's covered by GOLAKE-040 in the proof
  harness, not unit tests. handlers.db = nil here is intentional.
  TestHandleSQL_EmptySQL_400: 3 cases (empty, whitespace, mixed-WS)
  TestMaxSQLBodyBytes_Reasonable: locks the 64 KiB constant in a
    sane range so a refactor can't blow it open
  TestPrimaryBucket_Constant: locks "primary" — secrets lookup uses
    this; rename = silent secret-resolution failure at boot

cmd/vectord/main_test.go — 14 funcs
  All 6 routes verified mounted. handlers.persist = nil = pure
  in-memory mode; persistence is GOLAKE-070 in the proof harness.
  Coverage of every error branch in handleCreate/Add/Search/Delete:
    missing index → 404, dim mismatch → 400, empty items → 400,
    empty id → 400, malformed JSON → 400, body too large → 4xx,
    happy create → 201, happy list → 200.

One real finding caught while writing these tests:
  Body-cap rejection is sometimes 413 (typed MaxBytesError survives
  unwrap) and sometimes 400 (decoder wraps it as a generic decode
  error). Both are valid client-error contracts; the contract isn't
  "exactly 413" but "fails loud as 4xx, never silent 200 or 5xx."
  Tests assert 4xx range. The proof harness's
  proof_assert_status_4xx already had this shape — just bringing
  the unit tests in line with it.

Verified:
  go test -count=1 -short ./cmd/...  — all 7 packages green
  just verify                         — vet + test + 9 smokes 35s

Closes audit risk R-005 (6/7 cmd/main.go untested). Combined with
the proof harness's wiring coverage, every cmd-level handler now
has both unit-test and integration-test coverage of the wiring
layer. R-005 → CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:18:46 -05:00
root
423a3817c5 D: storaged per-prefix PUT cap — vectord _vectors/ → 4 GiB
Closes the documented 500K-test gap (memory project_golang_lakehouse:
"storaged 256 MiB PUT cap blocks single-file LHV1 persistence above
~150K vectors at d=768"). Vectord persistence under "_vectors/" now
gets a 4 GiB cap; everything else (parquets, manifests, ingest)
keeps the 256 MiB default.

Why per-prefix and not "raise globally":
  - The 256 MiB cap is real DoS protection: runaway clients can't
    exhaust the daemon's memory. Raising it for ALL traffic would expand
    the attack surface for routine paths that have no need of it.
  - Per-prefix preserves existing protection while opening the one
    documented production-scale path.

Why not split LHV1 across multiple keys (the alternative):
  - G1P shipped a single-Put framed format SPECIFICALLY to eliminate
    the torn-write class (memory: "Single Put eliminates the torn-
    write class that the 3-way convergent scrum finding identified").
  - Multi-key LHV1 would re-introduce the half-saved-state failure
    mode we just paid to fix. Streaming via existing manager.Uploader
    is the better architectural answer.

Why not bump the cap operationally via env/config:
  - Future operator-driven cap can drop in cleanly via the
    maxPutBytesFor function. Started with hardcoded 4 GiB to keep
    this commit small; config knob is a follow-up if production
    workloads diverge from the documented 500K-vector ceiling.

manager.Uploader is already streaming-multipart on the outbound
S3 side; the inbound MaxBytesReader cap is a safety gate, not a
memory bottleneck. So raising it for vectord just lets the
existing streaming path actually flow, without introducing new
memory pressure (4-slot semaphore × 4 GiB worst case = 16 GiB
only if all slots simultaneously max out — vanishingly unlikely).

Implementation:
  cmd/storaged/main.go:
    new constant maxPutBytesVectors = 4 GiB (covers >700K vectors @ d=768)
    new constant vectorsPrefix = "_vectors/" (synced with vectord.VectorPrefix)
    new function maxPutBytesFor(key) → cap-by-prefix
    handlePut: ContentLength check + MaxBytesReader use the per-key cap

  cmd/storaged/main_test.go (3 new test funcs):
    TestMaxPutBytesFor: 7 cases incl. nested prefix, substring-but-not-
      prefix, empty key, parquet/manifest paths.
    TestVectorPrefixSyncWithVectord: regression test that asserts
      vectorsPrefix == vectord.VectorPrefix. A future rename surfaces
      here instead of silently bypassing the larger cap.
    TestVectorCapAccommodates500KStaffingTest: bounds the cap above
      the documented production workload (~700 MiB conservative).
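
The cap-by-prefix helper, sketched with the names and values listed
above (actual handler wiring elided):

  // imports: strings
  const (
      maxPutBytesDefault = 256 << 20 // 256 MiB, the existing global cap
      maxPutBytesVectors = 4 << 30   // 4 GiB, vectord persistence only
      vectorsPrefix      = "_vectors/"
  )

  // maxPutBytesFor picks the PUT cap by key prefix.
  func maxPutBytesFor(key string) int64 {
      if strings.HasPrefix(key, vectorsPrefix) {
          return maxPutBytesVectors
      }
      return maxPutBytesDefault
  }

  // in handlePut, roughly:
  //   r.Body = http.MaxBytesReader(w, r.Body, maxPutBytesFor(key))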

Verified:
  go test ./cmd/storaged/ — all green (was 1 func, now 4)
  just verify             — 9 smokes still pass · 32s wall
  just proof contract     — 53/0/1 unchanged

Out of scope for this commit (deserves its own):
  - Heavy integration smoke: 200K dim=768 synthetic vectors → ~700
    MiB LHV1 → kill+restart vectord → recall=1. ~5-10 min wall;
    follow-up if you want production-scale persistence verified
    end-to-end. Unit tests + existing g1p_smoke cover the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:00:09 -05:00
root
9ee7fc5550 G2: embedd — text → vector via Ollama · 2 scrum fixes
Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).

Wire format:
  POST /v1/embed {"texts":[...], "model":"..."}  // model optional
  → 200 {"model","dimension","vectors":[[...]]}

Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.
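
A hedged sketch of that boundary conversion (the function name is
illustrative, not the actual provider code):

  // convert Ollama's float64 embeddings to float32 once, at the boundary
  func toFloat32(in []float64) []float32 {
      out := make([]float32, len(in))
      for i, v := range in {
          out[i] = float32(v)
      }
      return out
  }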

Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.

All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                    0 BLOCK + 4 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):              0 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):               "No BLOCKs" (3 tokens)

Fixed (2 — 1 convergent + 1 single-reviewer):
  C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
    batch was up to N×60s with no batch-level cap. One stuck Ollama
    call would stall the whole handler indefinitely. Fix:
    context.WithTimeout(r.Context(), 60s) wraps the entire batch.
  O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
    producing version-dependent garbage. Fix: reject "" with 400 at
    the handler boundary so callers get a deterministic answer
    instead of an upstream-conditional 502.
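
The C1 fix, sketched as a shape (the per-text call signature is
illustrative, not the real Provider interface):

  // imports: context, time
  // one batch-level deadline instead of N independent per-text timeouts
  func embedBatch(ctx context.Context, texts []string,
      embedOne func(context.Context, string) ([]float32, error)) ([][]float32, error) {

      ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
      defer cancel()

      out := make([][]float32, 0, len(texts))
      for _, t := range texts {
          vec, err := embedOne(ctx, t) // a stuck call now fails at the batch deadline
          if err != nil {
              return nil, err
          }
          out = append(out, vec)
      }
      return out, nil
  }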

Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).

Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.

Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:42:27 -05:00
root
8b92518d21 G1P: vectord persistence to storaged + scrum (3 fixes incl. 3-way convergent)
Adds optional persistence to vectord (G1's HNSW vector search). Single-
file framed format per index — eliminates the torn-write class that
the 3-way convergent scrum finding identified:

  _vectors/<name>.lhv1  — single binary blob:
      [4 bytes magic "LHV1"]
      [4 bytes envelope_len uint32 BE]
      [envelope bytes — JSON params + metadata + version]
      [graph bytes — raw hnsw.Graph.Export]
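
Writing that frame, sketched (function name illustrative; the real Save
path sends the blob to storaged):

  // imports: encoding/binary
  // encodeLHV1 frames an already-encoded JSON envelope plus the raw graph
  // export into the single-blob layout above.
  func encodeLHV1(envelope, graph []byte) []byte {
      buf := make([]byte, 0, 8+len(envelope)+len(graph))
      buf = append(buf, "LHV1"...)
      var n [4]byte
      binary.BigEndian.PutUint32(n[:], uint32(len(envelope)))
      buf = append(buf, n[:]...)
      buf = append(buf, envelope...)
      buf = append(buf, graph...)
      return buf
  }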

Pre-extraction: internal/catalogd/store_client.go → internal/storeclient/
shared package, since both catalogd and vectord need it. Same pattern as
the pre-D5 catalogclient extraction.

Optional via [vectord].storaged_url config (empty = ephemeral mode).
On startup: List + Load each persisted index. After Create / batch Add /
DELETE: Save (or Delete from storaged). Save failures are logged-not-
fatal — in-memory state is the source of truth in flight.

Acceptance smoke G1P 8/8 PASS — kill+restart preserves state, post-
restart search returns dist=0 (graph round-trips exactly), DELETE
removes the file, post-delete restart shows count=0.

All 8 smokes (D1-D6 + G1 + G1P) PASS deterministically. The g1_smoke
gained scripts/g1_smoke.toml that disables persistence so the
in-memory API test stays decoupled from any rehydrate-from-storaged
state contamination.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                     1 BLOCK + 5 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):               1 BLOCK + 2 WARN
  - Qwen3-coder (openrouter):                2 BLOCK + 2 WARN + 1 INFO

Fixed (3 — 1 convergent + 2 single-reviewer):
  C1 (Opus + Kimi + Qwen 3-WAY CONVERGENT WARN): Save was non-atomic
    across two PUTs — envelope-succeeds + graph-fails left a half-
    saved index that passed the "both present" List filter and
    silently mismatched metadata against vectors on Load.
    Fix: collapse to single framed file (no torn-write window
    possible).
  O-B1 (Opus BLOCK): isNotFound substring-matched "key not found"
    against the wrapped error message — brittle, any 5xx body
    containing that text would silently misclassify as missing.
    Fix: errors.Is(err, storeclient.ErrKeyNotFound).
  O-I3 (Opus INFO): handleAdd pre-validation only covered id+dim;
    NaN/Inf/zero-norm could still fail mid-batch leaving partial
    commits. Fix: extend pre-validation to call ValidateVector
    (newly exported) per item before any commit.

Dismissed (3 false positives):
  K-B1 + Q-B1 ("safeKey double-escapes %2F segments") — false
    convergent. Wire-protocol escape is decoded by storaged's chi
    router on the way in; on-disk key is the original literal.
    %2F round-trips correctly through PathEscape → URL → chi decode
    → S3 key.
  Q-B2 ("List vulnerable to race conditions") — vectord is single-
    process; no concurrent Save against List in the same vectord.

Deferred (3): rehydrate per-index timeout (G2+ multi-index scale),
saveAfter request ctx (matches G0 timeout deferral), Encode RLock
during slow writer (documented as buffer-only API).

The C1 finding is the strongest signal of the cross-lineage filter:
three independent reviewers all flagged the same torn-write hazard.
Single-file framing eliminates the class — there's now no Persistor
state where envelope and graph can disagree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:33:23 -05:00
root
b8c072cf0b G1: vectord — HNSW vector search via coder/hnsw · 6 scrum fixes applied
First G1+ piece. Standalone vectord service with in-memory HNSW
indexes keyed by string IDs and optional opaque JSON metadata.
Wraps github.com/coder/hnsw v0.6.1 (pure Go, no cgo). New port
:3215 with /v1/vectors/* routed through gateway.

API:
  POST   /v1/vectors/index            create
  GET    /v1/vectors/index            list
  GET    /v1/vectors/index/{name}     get info
  DELETE /v1/vectors/index/{name}
  POST   /v1/vectors/index/{name}/add (batch)
  POST   /v1/vectors/index/{name}/search

Acceptance smoke 7/7 PASS — including recall=1 on inserted vector
w-042 (cosine distance 5.96e-8, float32 precision noise), 200-
vector batch round-trip, dim mismatch → 400, missing index → 404,
duplicate create → 409.

Two upstream library quirks worked around in the wrapper:
  1. coder/hnsw.Add panics with "node not added" on re-adding an
     existing key (the length invariant fires because the internal
     delete+re-add doesn't change Len). Deleting the key first before
     re-adding works around it for n>1.
  2. Delete of the LAST node leaves layers[0] non-empty but
     entryless; next Add SIGSEGVs in Dims(). Workaround: when
     re-adding to a 1-node graph, recreate the underlying graph
     fresh via resetGraphLocked().

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                 0 BLOCK + 4 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):           2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):            "No BLOCKs" (4 tokens)

Fixed (4 real + 2 cleanup):
  O-W1: Lookup returned the raw []float32 from coder/hnsw — caller
    mutation would corrupt index. Now copies before return.
  O-W3: NaN/Inf vectors poison HNSW (distance comparisons return
    false for both < and >, breaking heap invariants). Zero-norm
    under cosine produces NaN. Now validated at Add time.
  K-B1: Re-adding with nil metadata silently cleared the existing
    entry — JSON-omitted "metadata" field deserializes as nil,
    making upsert non-idempotent. Now nil = "leave alone"; explicit
    {} or Delete to clear.
  O-W4: Batch Add with mid-batch failure left items 0..N-1
    committed and item N rejected. Now pre-validates all IDs+dims
    before any Add.
  O-I1: jsonItoa hand-roll replaced with strconv.Itoa — no
    measured allocation win.
  O-I2: distanceFn re-resolved per Search → use stored i.g.Distance.
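
The O-W3 check, sketched (names and error text illustrative):

  // imports: errors, fmt, math
  // reject NaN/Inf components and zero-norm vectors at Add time; either one
  // breaks HNSW distance comparisons or cosine normalization downstream
  func validateVector(v []float32) error {
      var sumSq float64
      for i, x := range v {
          f := float64(x)
          if math.IsNaN(f) || math.IsInf(f, 0) {
              return fmt.Errorf("component %d is NaN or Inf", i)
          }
          sumSq += f * f
      }
      if sumSq == 0 {
          return errors.New("zero-norm vector is undefined under cosine distance")
      }
      return nil
  }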

Dismissed (2 false positives):
  K-B2 "MaxBytesReader applied after full read" — false, applied
    BEFORE Decode in decodeJSON
  K-W1 "Search distances under read lock might see invalidated
    slices from concurrent Add" — false, RWMutex serializes
    write-lock during Add against read-lock during Search

Deferred (3): HTTP server timeouts (consistent G0 punt),
Content-Type validation (internal service behind gateway), Lookup
dim assertion (in-memory state can't drift).

The K-B1 finding is worth pausing on: nil metadata on re-add is
the kind of API ergonomics bug only a code-reading reviewer
catches — smoke would never detect it because the smoke always
sends explicit metadata. Three lines changed in Add; the resulting
API matches what callers actually expect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:50:28 -05:00
root
d023b07b30 Real-scale validation post-G0: configurable ingest cap + workers_500k metrics
Validated G0 substrate against the production workers_500k.parquet
dataset (18 cols × 500,000 rows). Findings + one applied fix:

Finding #1 (FIXED): ingestd's hardcoded 256 MiB cap rejected the 500K
CSV (344 MiB) with 413. Cap fired correctly, no OOM. Extracted to
[ingestd].max_ingest_bytes config field; default 256 MiB, override
per deployment for known-large workloads. With cap bumped to 512 MiB,
500K ingest succeeds in 3.12s with ingestd peak RSS 209 MiB.

Finding #2 (deferred): ingestd doesn't release memory between
ingests. The Go runtime is conservative about returning freed memory to
the OS; acceptable for a long-running daemon.

Finding #3: DuckDB-via-httpfs is healthy at 500K. GROUP BY 45ms,
count(*) 24ms, AVG 47ms, schema introspection 25ms. Sub-linear
scaling vs 100K — the s3:// read path is not a bottleneck.

Finding #4: ADR-010 type inference correctly handled real staffing
data. worker_id → BIGINT, numeric scores → DOUBLE, multi-line
resume_text → VARCHAR. 1000-row sample sufficient.

Finding #5: Go's encoding/csv handles RFC 4180 quoted-comma fields
and multi-line quoted text without LazyQuotes — confirming the D4
scrum's dismissal of Qwen's BLOCK on this point.
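
A quick standalone illustration of that default behavior (not project
code):

  // imports: encoding/csv, strings
  r := csv.NewReader(strings.NewReader(
      "worker_id,resume_text\n" +
          "42,\"line one\nline two, with a comma\"\n"))
  recs, err := r.ReadAll() // default settings, no LazyQuotes
  // err == nil; recs[1][1] holds the full two-line value including the comma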

Net: substrate handles production-scale data with one config knob.
No correctness issues, no OOMs, no silent type errors.
All 6 G0 smokes still PASS after the cap-config change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:32:08 -05:00
root
b1d52306ad G0 D6: gateway reverse proxy fronting all 4 backing services · 2 scrum fixes · G0 COMPLETE
Last day of Phase G0. Gateway promotes the D1 stub endpoints into
real reverse-proxies on :3110 fronting storaged + catalogd + ingestd
+ queryd. /v1 prefix lives at the edge — internal services route on
/storage, /catalog, /ingest, /sql, with the prefix stripped by a
custom Director per Kimi K2's D1-plan finding.

Routes:
  /v1/storage/*  → storaged
  /v1/catalog/*  → catalogd
  /v1/ingest     → ingestd
  /v1/sql        → queryd

Acceptance smoke 6/6 PASS — every assertion goes through :3110, none
direct to backing services. Full ingest → storage → catalog → query
round-trip verified end-to-end. The smoke's "rows[0].name=Alice"
assertion is the architectural payoff: five binaries, six HTTP
routes, one round-trip through one edge.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 2 WARN + 2 INFO
  - Kimi K2-0905 (openrouter):                1 BLOCK + 3 WARN + 1 INFO (3 false positives, all from one wrong TrimPrefix theory)
  - Qwen3-coder (openrouter):                 5 completion tokens — "No BLOCKs."

Fixed (2, both Opus single-reviewer):
  O-BLOCK: Director path stripping fails if upstream URL has a
    non-empty path. The default Director's singleJoiningSlash runs
    BEFORE the custom code, so an upstream like http://host/api
    produces /api/v1/storage/... after the join — then TrimPrefix("/v1")
    is a no-op because the string starts with /api. Fix: strip /v1
    BEFORE calling origDirector. New TestProxy_SubPathUpstream regression
    locks this in. Today only bare-host upstream URLs are configured, so
    the bug is dormant, but moving the gateway behind a sub-path in prod
    would have silently 404'd.
  O-WARN2: url.Parse is permissive — typo "127.0.0.1:3211" (no scheme)
    parses fine, produces an empty Host, and every request 502s. Fix:
    mustParseUpstream now fails fast at startup with a clear message
    naming the offending config field.
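
The O-BLOCK ordering fix, sketched (upstream is the parsed *url.URL; the
rest of the proxy wiring is stock httputil):

  // imports: net/http, net/http/httputil, strings
  proxy := httputil.NewSingleHostReverseProxy(upstream)
  origDirector := proxy.Director
  proxy.Director = func(r *http.Request) {
      // strip the edge prefix BEFORE the default Director joins paths, so an
      // upstream like http://host/api still ends up at /api/storage/...
      r.URL.Path = strings.TrimPrefix(r.URL.Path, "/v1")
      origDirector(r)
  }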

Dismissed (3, all Kimi, same false TrimPrefix theory):
  K-BLOCK "TrimPrefix loops forever on //v1storage" — false, single
    check-and-trim, no loop
  K-WARN "no upper bound on repeated // removal" — same false theory
  K-WARN "goroutines leak if upstream parse fails while binaries
    running" — confused scope; binaries are separate OS processes
    launched by the smoke script

D1 smoke updated (post-D6): the 501 stub probes are gone (gateway no
longer stubs /v1/ingest and /v1/sql). Replaced with proxy probes that
verify gateway forwards malformed requests to ingestd and queryd. Launch
order changed from parallel to dep-ordered (storaged → catalogd →
ingestd → queryd → gateway) since catalogd's rehydrate now needs
storaged, queryd's initial Refresh needs catalogd.

All six G0 smokes (D1 through D6) PASS end-to-end after every fix
round. Phase G0 substrate is complete: 5 binaries, 6 routes, 25 fixes
applied across 6 days from cross-lineage review.

G1+ next: gRPC adapters, Lance/HNSW vector indices, Go MCP SDK port,
distillation rebuild, observer + Langfuse integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:21:54 -05:00
root
9e9e4c26a4 G0 D5: queryd DuckDB SELECT over Parquet via httpfs · 4 scrum fixes
Phase G0 Day 5 ships queryd: in-memory DuckDB with custom Connector
that runs INSTALL httpfs / LOAD httpfs / CREATE OR REPLACE SECRET
(TYPE S3) on every new connection, sourced from SecretsProvider +
shared.S3Config. SetMaxOpenConns(1) so registrar's CREATE VIEWs and
handler's SELECTs serialize through one connection (avoids cross-
connection MVCC visibility edge cases).
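
A rough sketch of that bootstrap (the shipped code hooks the statements
into a custom Connector so they run on every new connection; the secret
values here are placeholders for what SecretsProvider supplies):

  // imports: database/sql, plus a blank import of the duckdb driver
  func openQueryDB() (*sql.DB, error) {
      db, err := sql.Open("duckdb", "") // in-memory database
      if err != nil {
          return nil, err
      }
      db.SetMaxOpenConns(1) // registrar DDL and handler SELECTs share one connection
      boot := []string{
          "INSTALL httpfs",
          "LOAD httpfs",
          "CREATE OR REPLACE SECRET s3 (TYPE S3, KEY_ID 'key', SECRET 'secret'," +
              " ENDPOINT '127.0.0.1:9000', URL_STYLE 'path', USE_SSL false)",
      }
      for _, stmt := range boot {
          if _, err := db.Exec(stmt); err != nil {
              db.Close()
              return nil, err
          }
      }
      return db, nil
  }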

Registrar.Refresh reads catalogd /catalog/list, runs CREATE OR
REPLACE VIEW "name" AS SELECT * FROM read_parquet('s3://bucket/key')
per manifest, drops views for removed manifests, skips on unchanged
updated_at (the implicit etag). Drop pass runs BEFORE create pass so
a poison manifest can't block other manifest refreshes (post-scrum
C1 fix).

POST /sql with JSON body {"sql":"…"} returns
{"columns":[{"name":"id","type":"BIGINT"},…], "rows":[[…]],
"row_count":N}. []byte → string conversion so VARCHAR rows
JSON-encode as text. 30s default refresh ticker, configurable via
[queryd].refresh_every.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                      1 BLOCK + 4 WARN + 4 INFO
  - Kimi K2-0905 (openrouter):                2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                 2 BLOCK + 1 WARN + 1 INFO

Fixed (4):
  C1 (Opus + Kimi convergent): Refresh aborts on first per-view error
    → drop pass first, collect errors, errors.Join. Poison manifest
    no longer blocks the rest of the catalog from re-syncing.
  B-CTX (Opus BLOCK): bootstrap closure captured OpenDB's ctx →
    cancelled-ctx silently fails every reconnect. context.Background()
    inside closure; passed ctx only for initial Ping.
  B-LEAK (Kimi BLOCK): firstLine(stmt) truncated CREATE SECRET to 80
    chars but those 80 chars contained KEY_ID + SECRET prefix → log
    aggregator captures credentials. Stable per-statement labels +
    redactCreds() filter on wrapped DuckDB errors.
  JSON-ERR (Opus WARN): swallowed json.Encode error → silent
    truncated 200 on unsupported column types. slog.Warn the failure.

Dismissed (4 false positives):
  Qwen BLOCK "bootstrap not transactional" — DuckDB DDL is auto-commit
  Qwen BLOCK "MaxBytesReader after Decode" — false, applied before
  Kimi BLOCK "concurrent Refresh + user SELECT deadlock" — not a
    deadlock, just serialization, by design with 10s timeout retry
  Kimi WARN "dropView leaves r.known inconsistent" — current code
    returns before the delete; the entry persists for retry

Critical reviewer behavior: 1 convergent BLOCK between Opus + Kimi
on the per-view error blocking, plus two independent single-reviewer
BLOCKs (B-CTX, B-LEAK) that smoke could never have caught. The
B-LEAK fix uses defense-in-depth: never pass SQL into the error
path AND redact known cred values from DuckDB's own error message.

DuckDB cgo path: github.com/duckdb/duckdb-go/v2 v2.10502.0 (per
ADR-001 §1) on Go 1.25 + arrow-go. Smoke 6/6 PASS after every
fix round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:10:55 -05:00
root
4205ecd0f0 Pre-D5: extract CatalogClient to internal/catalogclient/ + add List
queryd (D5) needs the same HTTP client to catalogd that ingestd uses,
but the client lived in internal/ingestd — having queryd import from
ingestd would invert the data-flow direction (ingestd is upstream of
queryd; the package dep should not point back). Extract to a shared
internal/catalogclient/ package now, before D5 forces it under
implementation pressure.

Adds the List(ctx) method queryd will need for view registration.
Unit tests cover Register success/conflict and List success/error
paths against an httptest.Server fake.

ingestd's import flips from internal/ingestd → internal/catalogclient;
the wire format and behavior are unchanged. All four smokes (D1/D2/D3/
D4) PASS unchanged. DuckDB cgo path re-verified with the official
github.com/duckdb/duckdb-go/v2 (per ADR-001) on Go 1.25 + arrow-go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:58:34 -05:00
root
c1e411347a G0 D4: ingestd CSV → Parquet → catalogd register · 2 scrum fixes
Phase G0 Day 4 ships ingestd: multipart CSV upload, Arrow schema
inference per ADR-010 (default-to-string on ambiguity), single-pass
streaming CSV → Parquet via pqarrow batched writer (Snappy compressed,
8192 rows per batch), PUT to storaged at content-addressed key
datasets/<name>/<fp_hex>.parquet, register manifest with catalogd.
Acceptance smoke 6/6 PASS including idempotent re-ingest (proves
inference is deterministic — same CSV always produces same fingerprint)
and schema-drift → 409 (proves catalogd's gate fires on ingest traffic).

Schema fingerprint is SHA-256 over (name, type) tuples in header order
using ASCII record/unit separators (0x1e/0x1f) so column names with
commas can't collide. Nullability intentionally NOT in the fingerprint
— a column gaining nulls isn't a schema change.
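
The fingerprint, sketched (the column struct is illustrative):

  // imports: crypto/sha256, encoding/hex
  type column struct{ Name, Type string }

  // schemaFingerprint hashes (name, type) pairs in header order; 0x1f separates
  // the two fields of a column and 0x1e separates columns, so a comma (or any
  // printable character) in a column name can't produce a colliding byte stream.
  func schemaFingerprint(cols []column) string {
      h := sha256.New()
      for _, c := range cols {
          h.Write([]byte(c.Name))
          h.Write([]byte{0x1f}) // unit separator
          h.Write([]byte(c.Type))
          h.Write([]byte{0x1e}) // record separator
      }
      return hex.EncodeToString(h.Sum(nil))
  }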

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       4 WARN + 3 INFO (after 2 self-retracted BLOCKs)
  - Kimi K2-0905 (openrouter):                 1 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed (2, both Opus single-reviewer):
  C-DRIFT: PUT-then-register on fixed datasets/<name>/data.parquet
    meant a schema-drift ingest overwrote the live parquet BEFORE
    catalogd's 409 fired → storaged inconsistent with manifest.
    Fix: content-addressed key datasets/<name>/<fp_hex>.parquet.
    Drift writes to a different file (orphan in G2 GC scope); the
    live data is never corrupted.
  C-WCLOSE: pqarrow.NewFileWriter not Closed on error paths leaks
    buffered column data + OS resources per failed ingest.
    Fix: deferred guarded close with wClosed flag.

Dismissed (5, all false positives):
  Qwen BLOCK "csv.Reader needs LazyQuotes=true for multi-line" — false,
    Go csv handles RFC 4180 multi-line quoted fields by default
  Qwen BLOCK "row[i] OOB" — already bounds-checked at schema.go:73
    and csv.go:201
  Kimi BLOCK "type assertion panic if pqarrow reorders fields" —
    speculative, no real path
  Kimi WARN + Qwen WARN×2 "RecordBuilder leak on early error" —
    false convergent. Outer defer rb.Release() captures the current
    builder; in-loop release runs before reassignment. No leak.

Deferred (6 INFO + accepted-with-rationale on 3 WARN): sample
boundary type mismatch (G0 cap bounds peak), string-match
paranoia on http.MaxBytesError, multipart double-buffer (G2 spool-
to-disk), separator validation, body close ordering, etc.

The D4 scrum produced fewer real findings than D3 (2 vs 6) — both
were architectural hazards smoke wouldn't catch because the smoke's
"schema drift → 409" assertion was passing even in the corrupted-
state world. The 409 fires correctly; what was wrong was the PUT
having already mutated the live parquet before the validation check.
Opus's PUT-then-register read of the order is exactly the kind of
architectural insight the cross-lineage scrum is designed to surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:50:10 -05:00
root
66a704ca3e G0 D3: catalogd Parquet manifests + ADR-020 idempotent register · 6 scrum fixes
Phase G0 Day 3 ships catalogd: Arrow Parquet manifest codec, in-memory
registry with the ADR-020 idempotency contract (same name+fingerprint
reuses dataset_id; different fingerprint → 409 Conflict), HTTP client
to storaged for persistence, and rehydration on startup. Acceptance
smoke 6/6 PASSES end-to-end including rehydrate-across-restart — the
load-bearing test that the catalog/storaged service split actually
preserves state.

dataset_id derivation diverges from Rust: UUIDv5(namespace, name)
instead of v4 surrogate. Same name on any box generates the same
dataset_id; rehydrate after disk loss converges to the same identity
rather than silently re-issuing. Namespace pinned at
a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e — every dataset_id ever issued
depends on these bytes.
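
The derivation, sketched with google/uuid (added as a dependency in this
commit; the function name is illustrative):

  // import: github.com/google/uuid
  // datasetNamespace is the pinned namespace above; changing these bytes would
  // change every dataset_id ever issued.
  var datasetNamespace = uuid.MustParse("a8f3c1d2-4e5b-5a6c-9d8e-7f0a1b2c3d4e")

  // datasetID derives a deterministic RFC 4122 version-5 (SHA-1) UUID from the
  // dataset name, so the same name converges to the same identity on any box.
  func datasetID(name string) uuid.UUID {
      return uuid.NewSHA1(datasetNamespace, []byte(name))
  }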

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                       1 BLOCK + 5 WARN + 3 INFO
  - Kimi K2-0905 (openrouter, validated D2):   2 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):                  2 BLOCK + 2 WARN + 2 INFO

Fixed:
  C1 list-offsets BLOCK (3-way convergent) → ValueOffsets(0) + bounds
  C2 Rehydrate mutex held across I/O → swap-under-brief-lock pattern
  S1 split-brain on persist failure → candidate-then-swap
  S2 brittle string-match for 400 vs 500 → ErrEmptyName/ErrEmptyFingerprint sentinels
  S3 Get/List shallow-copy aliasing → cloneManifest deep copy
  S4 keep-alive socket leak on error paths → drainAndClose helper

Dismissed (false positives, all single-reviewer):
  Kimi BLOCK "Decode crashes on empty Parquet" — already handled
  Kimi INFO "safeKey double-escapes" — wrong, splitting before escape is required
  Qwen INFO "rb.NewRecord() error unchecked" — API returns no error

Deferred to G1+: name validation regex, per-call deadlines, Snappy
compression, list pagination continuation tokens (storaged caps at
10k with sentinel for now).

Build clean, vet clean, all tests pass, smoke 6/6 PASS after every
fix round. arrow-go/v18 + google/uuid added; Go 1.24 → 1.25 forced
by arrow-go's minimum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:36:57 -05:00
root
8cfcdb8e5f G0 D2: storaged S3 GET/PUT/LIST/DELETE · 3-lineage scrum · 4 fixes applied
Phase G0 Day 2 ships storaged: aws-sdk-go-v2 wrapper + chi routes
binding 127.0.0.1:3211 with 256 MiB MaxBytesReader, Content-Length
up-front 413, and a 4-slot non-blocking semaphore returning 503 +
Retry-After:5 when full. Acceptance smoke (6/6 probes) PASSES against
the dedicated MinIO bucket lakehouse-go-primary, isolated from the
Rust system's lakehouse bucket during coexistence.
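
The admission gate, sketched (handler body elided; names illustrative):

  // import: net/http
  // a 4-slot non-blocking semaphore: acquire with a non-blocking send, release
  // by receiving; when full, shed load with 503 + Retry-After instead of queuing
  var putSlots = make(chan struct{}, 4)

  func handlePut(w http.ResponseWriter, r *http.Request) {
      select {
      case putSlots <- struct{}{}:
          defer func() { <-putSlots }()
      default:
          w.Header().Set("Retry-After", "5")
          http.Error(w, "too many concurrent PUTs", http.StatusServiceUnavailable)
          return
      }
      // Content-Length check, MaxBytesReader cap, and the S3 upload are elided
  }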

Cross-lineage scrum on the shipped code:
  - Opus 4.7 (opencode): 1 BLOCK + 3 WARN + 3 INFO
  - Qwen3-coder (openrouter): 2 BLOCK + 1 WARN + 1 INFO (3 false positives)
  - Kimi K2-0905 (openrouter, after route-shopping past opencode's 4k
    cap and the direct adapter's empty-content reasoning bug):
    1 BLOCK + 2 WARN + 1 INFO

Fixed:
  C1 buildRegistry ctx cancel footgun → context.Background()
     (Opus + Kimi convergent; future credential refresh chains)
  C2 MaxBytesReader unwrap through manager.Uploader multipart
     goroutines → Content-Length up-front 413 + string-suffix fallback
     (Opus + Kimi convergent; latent 500-instead-of-413 in 5-256 MiB range)
  C3 Bucket.List unbounded accumulation → MaxListResults=10_000 cap
     (Opus + Kimi convergent; OOM guard)
  S1 PUT response Content-Type: application/json (Opus single-reviewer)

Strict validateKey policy (J approved): rejects empty, >1024B, NUL,
leading "/", ".." path components, CR/LF/tab control characters.
DELETE exposed at HTTP layer (J approved option A) for symmetry +
smoke ergonomics.
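
That policy, sketched (error messages illustrative):

  // imports: errors, strings
  func validateKey(key string) error {
      switch {
      case key == "":
          return errors.New("empty key")
      case len(key) > 1024:
          return errors.New("key exceeds 1024 bytes")
      case strings.HasPrefix(key, "/"):
          return errors.New("key must not start with /")
      case strings.ContainsAny(key, "\x00\r\n\t"):
          return errors.New("key contains NUL or control characters")
      }
      for _, seg := range strings.Split(key, "/") {
          if seg == ".." {
              return errors.New("key contains a .. path component")
          }
      }
      return nil
  }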

Build clean, vet clean, all unit tests pass, smoke 6/6 PASS after
every fix round. go.mod 1.23 → 1.24 (required by aws-sdk-go-v2).

Process finding worth recording: opencode caps non-streaming Kimi at
max_tokens=4096; the direct kimi.com adapter consumed 8192 tokens of
reasoning but surfaced empty content; openrouter/moonshotai/kimi-k2-0905
delivered structured output in ~33s. Future Kimi scrums should default
to that route.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:23:03 -05:00
Claw
1142f54f23 G0 D1 ships: skeleton + chi + /health × 5 binaries · acceptance gate PASSED
Phase G0 Day 1 executed end-to-end after a third-pass review by
qwen3-coder:480b that consolidated all findings across the
Opus/Kimi/Qwen lineages.

Cross-lineage review consolidation (3 model passes + 1 runtime pass):
- Opus 4.7: 9 findings · 7 fixed inline · 2 deferred
- Kimi K2.6: 2 BLOCKs (introduced by Opus fixes) · 2 fixed
- Qwen3-coder:480b: 2 WARNs · 1 fixed (D2.4 256 MiB cap + 4-slot
  semaphore on PUTs) · 1 deferred (Q2 view refresh batching)
- Runtime smoke: 1 finding (port 3100 collision with live Rust
  lakehouse) · fixed (Go dev ports shifted to 3110+)
- Total: 14 findings · 11 fixed · 3 deferred to G2

What landed in code:
- internal/shared/server.go — chi factory, slog JSON, /health,
  graceful shutdown via signal.NotifyContext
- internal/shared/config.go — TOML loader, DefaultConfig, -config flag
- cmd/{gateway,storaged,catalogd,ingestd,queryd}/main.go — five
  binaries, each ~30 lines using the shared factory
- lakehouse.toml — G0 dev defaults (3110-3214)
- scripts/d1_smoke.sh — repeatable smoke that exits 0 on PASS
- go.mod / go.sum — chi v5.2.5, pelletier/go-toml/v2 v2.3.0

Verified end-to-end via scripts/d1_smoke.sh:
- All 5 /health endpoints return 200 with correct service name
- Gateway /v1/ingest + /v1/sql stubs return 501 with X-Lakehouse-Stub
- Graceful shutdown logs cleanly on SIGTERM
- DuckDB cgo path verified separately (sql.Open("duckdb","") + ping)

D1 ACCEPTANCE GATE: PASSED.

Next: D2 — storaged S3 GET/PUT/LIST against MinIO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:00:37 -05:00