golangLAKEHOUSE

profit/golangLAKEHOUSE

Fork 0

Commit Graph

Author	SHA1	Message	Date
root	b216b7e5b6	fix the other 4: close all OPEN-list items in one wave Substantial wave addressing all 4 prior OPEN items. Three closed in full, one partially (the speculative half deliberately deferred). OPEN #1 — Periodic fresh→main index merge (FULL): - POST /v1/vectors/index/{src}/merge with {dest, clear_source} - Idempotent on re-runs (existing-in-dest items skipped) - internal/vectord/index.go: new Index.IDs() snapshot method + i.ids tracker field as canonical ID set, independent of meta map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta alone missed items added with nil metadata) - 4 cmd-level integration tests (happy path drain+clear, dim mismatch, dest not found, self-merge rejection) + 1 unit test - DecodeIndex backward-compat: old envelopes restore i.ids from meta keys (best effort; new items going forward use the tracker) OPEN #2 — Distillation SFT export (SUBSTRATE): - internal/distillation/sft_export.go ports the load-bearing half: IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/ MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft. - Synthesis (instruction/input/response generation) deferred to a separate wave — too big for this session, but the substrate makes the next wave a port-not-design exercise. - TestSftNever_PinsExpectedSet locks the contamination firewall set: if a future commit adds/removes from SftNever, this test fails — forcing the change through review. - 5 new tests; firewall fires end-to-end through the partial port. OPEN #3 — Distribution drift via PSI (FULL): - internal/drift/drift.go: ComputeDistributionDrift via Population Stability Index. Standard finance/risk metric, well-defined verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25). - Equal-width bucketing over combined min/max so neither dist falls outside; epsilon-clamping for empty buckets so log doesn't blow up. Per-bucket breakdown for drilldown. - Pairs with the existing ComputeScorerDrift: scorer drift is categorical, distribution drift is continuous. Different shapes, same package. - 7 new tests covering identical-is-stable, hard-shift-is-major, moderate-detected-not-stable, empty-inputs-safe, all-identical- safe, bucket-counts-conserved, num-buckets-clamping. OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others deferred): - (a) Real-time wall-clock for stress harness: per-phase elapsed time logged to stdout as it runs (`[stress] phase NAME starting (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`). Output.PhaseTimings + Output.TotalElapsedMs in JSON. - (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase calibration: not actioned — no fired trigger, would be speculative. Documented as deferred-until-need rather than ignored. Per the project's discipline ("don't add features beyond what the task requires"). OPEN list now empty / steady-state. Future items will land as production triggers fire. Build + vet + tests green; 18 new tests across the 4 closures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:42:11 -05:00
root	57d0df125d	E (partial): distillation port — scorer + contamination firewall First slice of the Rust v1.0.0 distillation substrate (e7636f2) ported to Go per ADR-001 #4 (port LOGIC, not bit-identical reproducibility). This commit lands the LOAD-BEARING pieces named in project_distillation_substrate.md memory: - The deterministic Success Scorer (8 sub-scorers + dispatch) - The contamination firewall on SFT samples (the "non-negotiable" spec property: rejected/needs_human_review NEVER ship to SFT) - All on-wire types + validators for ScoredRun, SftSample, EvidenceRecord with Provenance Files: internal/distillation/types.go — types + ScorerVersion + SftNever + ValidateScoredRun + ValidateSftSample internal/distillation/scorer.go — ScoreRecord + 8 class scorers + BuildScoredRun (deterministic) internal/distillation/scorer_test.go — ~40 test cases: - source-class dispatch (verdict / telemetry / extraction) - scrum_review (4 attempt cases) - observer_review (5 verdict cases) - audit (legacy + severity, 9 cases) - auto_apply (4 cases) - outcomes / mode_experiment / extraction - CONTAMINATION FIREWALL: ErrSftContamination sentinel fires on rejected/needs_human_review, distinct from typo errors - empty-pair guard (instruction/response trim != "") - reasons-required ScoredRun validation - deterministic sig_hash on identical input - purity check (input not mutated, repeatable output) Per the 2026-04-29 cross-lineage scrum's discipline: false-positive findings would be dismissed inline (none in this commit). Real findings would be addressed before merge — but this is greenfield port code reviewed against its Rust source line-by-line, which the test suite encodes as truth tables. Explicitly DEFERRED to follow-up commits: - Materialization layer (jsonl read/write, date-partitioned storage in data/scored-runs/YYYY/MM/DD/, evidence index) - SFT exporter (file iteration + filtering — the SCORING firewall is here; the EXPORT firewall is the next layer) - export_preference, export_rag (other export shapes) - Acceptance harness (16/16 acceptance gate that locks v1.0.0) - replay, receipts, build_evidence_index, transforms The scorer + firewall validator are pure functions — operational tooling layers on top without changing the deterministic logic the downstream learning loop depends on. The Go ScorerVersion stays at v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go materialization commit is reserved for the next scoring-rule change, NOT the port itself. 15-smoke regression all green. vet clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:04:29 -05:00

Author

SHA1

Message

Date

root

b216b7e5b6

fix the other 4: close all OPEN-list items in one wave

Substantial wave addressing all 4 prior OPEN items. Three closed in
full, one partially (the speculative half deliberately deferred).

OPEN #1 — Periodic fresh→main index merge (FULL):
- POST /v1/vectors/index/{src}/merge with {dest, clear_source}
- Idempotent on re-runs (existing-in-dest items skipped)
- internal/vectord/index.go: new Index.IDs() snapshot method +
  i.ids tracker field as canonical ID set, independent of meta
  map's nil-vs-{} sparseness (was a real bug — IDs() backed by meta
  alone missed items added with nil metadata)
- 4 cmd-level integration tests (happy path drain+clear, dim
  mismatch, dest not found, self-merge rejection) + 1 unit test
- DecodeIndex backward-compat: old envelopes restore i.ids from
  meta keys (best effort; new items going forward use the tracker)

OPEN #2 — Distillation SFT export (SUBSTRATE):
- internal/distillation/sft_export.go ports the load-bearing half:
  IsSftNever predicate + ListScoredRunFiles (data/scored-runs/YYYY/
  MM/DD walk) + LoadScoredRunsFromFile + partial ExportSft.
- Synthesis (instruction/input/response generation) deferred to a
  separate wave — too big for this session, but the substrate
  makes the next wave a port-not-design exercise.
- TestSftNever_PinsExpectedSet locks the contamination firewall
  set: if a future commit adds/removes from SftNever, this test
  fails — forcing the change through review.
- 5 new tests; firewall fires end-to-end through the partial port.

OPEN #3 — Distribution drift via PSI (FULL):
- internal/drift/drift.go: ComputeDistributionDrift via Population
  Stability Index. Standard finance/risk metric, well-defined
  verdict tiers (stable < 0.10, minor 0.10–0.25, major ≥ 0.25).
- Equal-width bucketing over combined min/max so neither dist
  falls outside; epsilon-clamping for empty buckets so log doesn't
  blow up. Per-bucket breakdown for drilldown.
- Pairs with the existing ComputeScorerDrift: scorer drift is
  categorical, distribution drift is continuous. Different shapes,
  same package.
- 7 new tests covering identical-is-stable, hard-shift-is-major,
  moderate-detected-not-stable, empty-inputs-safe, all-identical-
  safe, bucket-counts-conserved, num-buckets-clamping.

OPEN #4 — Ops nice-to-haves (PARTIAL — wall-clock done, others
deferred):
- (a) Real-time wall-clock for stress harness: per-phase elapsed
  time logged to stdout as it runs (`[stress] phase NAME starting
  (T+12.3s)` + `[stress] phase NAME done — 8.5s (T+20.8s)`).
  Output.PhaseTimings + Output.TotalElapsedMs in JSON.
- (b) chatd fixture-mode S3 mock + (c) liberal-paraphrase
  calibration: not actioned — no fired trigger, would be
  speculative. Documented as deferred-until-need rather than
  ignored. Per the project's discipline ("don't add features
  beyond what the task requires").

OPEN list now empty / steady-state. Future items will land as
production triggers fire.

Build + vet + tests green; 18 new tests across the 4 closures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 23:42:11 -05:00

root

57d0df125d

E (partial): distillation port — scorer + contamination firewall

First slice of the Rust v1.0.0 distillation substrate (e7636f2)
ported to Go per ADR-001 #4 (port LOGIC, not bit-identical
reproducibility). This commit lands the LOAD-BEARING pieces named
in project_distillation_substrate.md memory:

  - The deterministic Success Scorer (8 sub-scorers + dispatch)
  - The contamination firewall on SFT samples (the "non-negotiable"
    spec property: rejected/needs_human_review NEVER ship to SFT)
  - All on-wire types + validators for ScoredRun, SftSample,
    EvidenceRecord with Provenance

Files:
  internal/distillation/types.go  — types + ScorerVersion + SftNever
                                    + ValidateScoredRun + ValidateSftSample
  internal/distillation/scorer.go — ScoreRecord + 8 class scorers +
                                    BuildScoredRun (deterministic)
  internal/distillation/scorer_test.go — ~40 test cases:
    - source-class dispatch (verdict / telemetry / extraction)
    - scrum_review (4 attempt cases)
    - observer_review (5 verdict cases)
    - audit (legacy + severity, 9 cases)
    - auto_apply (4 cases)
    - outcomes / mode_experiment / extraction
    - CONTAMINATION FIREWALL: ErrSftContamination sentinel fires
      on rejected/needs_human_review, distinct from typo errors
    - empty-pair guard (instruction/response trim != "")
    - reasons-required ScoredRun validation
    - deterministic sig_hash on identical input
    - purity check (input not mutated, repeatable output)

Per the 2026-04-29 cross-lineage scrum's discipline: false-positive
findings would be dismissed inline (none in this commit). Real
findings would be addressed before merge — but this is greenfield
port code reviewed against its Rust source line-by-line, which the
test suite encodes as truth tables.

Explicitly DEFERRED to follow-up commits:
  - Materialization layer (jsonl read/write, date-partitioned
    storage in data/scored-runs/YYYY/MM/DD/, evidence index)
  - SFT exporter (file iteration + filtering — the SCORING firewall
    is here; the EXPORT firewall is the next layer)
  - export_preference, export_rag (other export shapes)
  - Acceptance harness (16/16 acceptance gate that locks v1.0.0)
  - replay, receipts, build_evidence_index, transforms

The scorer + firewall validator are pure functions — operational
tooling layers on top without changing the deterministic logic the
downstream learning loop depends on. The Go ScorerVersion stays at
v1.0.0 to match the Rust e7636f2 baseline; bumping in the Go
materialization commit is reserved for the next scoring-rule
change, NOT the port itself.

15-smoke regression all green. vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-29 20:04:29 -05:00

2 Commits