golangLAKEHOUSE/docs/CLAUDE_REFACTOR_GUARDRAILS.md
root a81291e38c proof harness Phase A: scaffolding + canary case green
Per docs/TEST_PROOF_SCOPE.md, building the claims-verification tier
above the smoke chain. This commit lays the scaffolding and proves
the orchestrator end-to-end with one canary case (00_health).

What landed:

  tests/proof/
    README.md             how to read a report, layout, modes
    claims.yaml           24 claims enumerated (GOLAKE-001..100)
    run_proof.sh          orchestrator with --mode {contract|integration|performance}
                          and --no-bootstrap / --regenerate-{rankings,baseline}
    lib/
      env.sh              service URLs, report dir, mode, git context
      http.sh             curl wrappers writing per-probe JSON + body + headers
      assert.sh           proof_assert_{eq,ne,contains,lt,gt,status,json_eq} +
                          proof_skip — each emits one JSONL record per call
      metrics.sh          start/stop timers, value capture, RSS sampling,
                          percentile compute (for Phase D)
    cases/
      00_health.sh        canary — gateway + 6 services /health → 200,
                          body identifies service, latency < 500ms (21 assertions)
    fixtures/
      csv/workers.csv     spec's 5-row deterministic CSV
      text/docs.txt       4 deterministic vector docs
      expected/queries.json  expected results for the 5 SQL assertions

Wired into the task runner:

  just proof contract       # canary only this commit
  just proof integration    # Phase C
  just proof performance    # Phase D

.gitignore: /tests/proof/reports/* with !.gitkeep — same pattern as
reports/scrum/_evidence/. Per-run output is a runtime artifact.

Specs landed alongside (J's drops):
  docs/TEST_PROOF_SCOPE.md           the harness contract this implements
  docs/CLAUDE_REFACTOR_GUARDRAILS.md process discipline this harness obeys

Verified end-to-end (cached binaries):
  just proof contract        wall < 2s, 21 pass / 0 fail / 0 skip
  just verify                wall 31s, vet + test + 9 smokes still green

Two bugs fixed during canary run, both in run_proof.sh aggregation:
- grep -c prints "0" but still exits 1 on zero matches; the `|| echo 0`
  fallback therefore emitted "0\n0", which broke jq --argjson and the
  integer comparison. Fixed via a _count helper that captures
  count-or-zero cleanly.
- per-case table iterated case scripts (filename-based) but cases
  write evidence under CASE_ID. Switched to JSONL-file iteration so
  multi-case scripts work and the mapping is faithful.

Phase B (contract cases) lands next: 05_embedding, 06_vector_add,
08_gateway_contracts, 09_failure_modes. Each sources the same lib
helpers and writes to the same report shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:08:51 -05:00


Claude Refactor Guardrails — Go Lakehouse

Mission

Continue the Go refactor without recreating the Rust-era complexity.

The Go rewrite exists to make the lakehouse operationally legible:

  • small binaries
  • clear service boundaries
  • gateway-fronted APIs
  • smoke-testable behavior
  • fast build/run/debug loop
  • no accidental framework bureaucracy

The Rust repo had maturity, but also accumulated control-plane weight: validator, auditor, provider routing, iteration loops, UI, truth layers, and agent-era scaffolding. Do not blindly port all of that back.

Prime Directive

Preserve the Go spine:

gateway
  → storaged
  → catalogd
  → ingestd
  → queryd
  → vectord
  → embedd
Only add complexity when there is measured evidence, a failing smoke, or a documented feature-parity requirement.

Refactor Rules
1. No silent behavior changes

Before changing any service behavior, identify:

current route
current request schema
current response schema
current status-code behavior
current smoke test covering it

If no smoke exists, add one before or with the refactor.

2. Keep service boundaries hard

Do not let services reach into each other's internals.

Allowed:

HTTP client calls
shared request/response structs when stable
small internal packages for config/secrets/logging

Avoid:

importing another service's implementation package
hidden global state
“just for now” shared mutable registries
circular service knowledge
3. Go is not Rust

Do not imitate Rust patterns mechanically.

Prefer Go-native clarity:

simple structs
explicit errors
small interfaces at the consumer side
context-aware HTTP handlers
table-driven tests
boring package names
no abstraction tax unless repeated 3+ times
4. Validation replaces the borrow checker

Rust caught many problems at compile time. Go will not.

Therefore every refactor must preserve or improve:

input validation
dimension checks
duplicate handling
restart persistence checks
schema drift detection
error status mapping
smoke coverage
5. Performance work must be measured

Do not optimize by vibes.

For each performance change, record:

baseline command
baseline result
changed code path
new result
regression risk
rollback plan

Current known bottleneck:

vectord Add is RWMutex-serialized.
500K vectors: ~35m36s, ~234/sec avg.
GPU utilization around 65%, so embedding is not the only bottleneck.

Do not claim concurrency improvements unless the HNSW library's thread-safety has been audited or writes are safely batched/sharded.

File/Package Expectations
cmd/

One binary per service. Keep main files thin.

Main should only:

load config
construct dependencies
wire routes
start server
handle shutdown
internal/

Shared code belongs here only when it is genuinely shared.

Good internal packages:

config
secrets
storeclient
catalogclient
gateway routing helpers
logging
request/response contracts

Bad internal packages:

vague “utils”
giant “common”
cross-service god objects
hidden dependency containers
scripts/

Every major behavior needs a runnable smoke.

Smokes are not decoration. They are the replacement nervous system.

Existing smoke pattern must remain:

d1 skeleton/health/gateway
d2 storaged
d3 catalogd
d4 ingestd
d5 queryd
d6 full ingest/query
g1 vectord
g1p vectord persistence
g2 embed → vector add → search

New functionality needs a new smoke or an extension to the closest existing one.

Refactor Checklist

Before editing:

 Read README.md
 Read docs/PRD.md
 Read docs/SPEC.md
 Read docs/DECISIONS.md
 Read docs/PHASE_G0_KICKOFF.md
 Identify affected services
 Identify affected smokes

During editing:

 Keep public API stable unless explicitly changing it
 Keep errors explicit
 Keep logs useful but not noisy
 Avoid package sprawl
 Avoid premature generic interfaces
 Preserve restart behavior
 Preserve gateway-only acceptance path

After editing:

 Run go test ./...
 Run relevant smoke script
 Run full smoke loop when service contracts changed
 Record evidence in a short refactor note

Full smoke loop:

for s in scripts/{d1,d2,d3,d4,d5,d6,g1,g1p,g2}_smoke.sh; do
  "$s" || break
done
Refactor Note Format

Create or update:

docs/refactor-notes/YYYYMMDD-<short-name>.md

Use this structure:

# Refactor Note: <name>

## Goal

## Files changed

## Behavior changed

## Behavior preserved

## Tests run

## Smoke results

## Performance before/after

## Risks

## Rollback
Anti-Patterns To Reject
Reject these unless specifically requested:

porting Rust modules 1:1
adding orchestration before service parity
adding AI/agent logic inside core services
making gateway business-aware
hiding failures behind retries
swallowing errors
“temporary” global maps
changing route contracts without smoke updates
adding dependencies for trivial code
optimizing vector ingestion without measurement
rebuilding the Rust bureaucracy in Go clothing
Preferred Next Targets

Prioritize in this order:

Stability of existing Go service contracts
Better smoke coverage
Persistence limits and large object handling
vectord ingestion bottleneck analysis
gateway observability
feature parity with Rust only where needed
UI/agent/auditor layers later, not now
Architectural Position

The Go rewrite should remain the production spine.

The Rust system remains historical reference and possible source for:

validation ideas
audit semantics
provider-routing lessons
prior acceptance criteria
edge cases

But Rust is not the shape to copy.

Go owns the clean operational path.
Rust owns historical scar tissue and high-performance lessons.

Do not confuse the two.


A shorter command prompt for when you hand this to Claude Code:

```md
Read `docs/CLAUDE_REFACTOR_GUARDRAILS.md` first.

Then inspect the current Go lakehouse repo and produce a refactor plan only. Do not edit code yet.

Your plan must identify:
1. affected services
2. affected routes
3. affected request/response contracts
4. affected smoke scripts
5. risks of accidentally reintroducing Rust-era complexity
6. exact tests/smokes you will run after changes

Do not port Rust structure blindly. Preserve the Go service spine.
```