docs: 2026-04-28 upstream survey — three SPEC-changing pivots
Pre-Phase-G0 research sweep against current Go ecosystem state. Three
upstream changes that the day-of SPEC missed:

1. DuckDB Go binding ownership transferred. marcboeker/go-duckdb is
   deprecated as of v2.5.0 — official maintainer is now
   github.com/duckdb/duckdb-go/v2 (DuckDB team + Marc Boeker joint
   hand-off). Current v2.10502.0 / DuckDB v1.5.2. SPEC §3.1 + component
   table updated.

2. Official Go MCP SDK exists. Switching from mark3labs/mcp-go
   (community) to github.com/modelcontextprotocol/go-sdk (official,
   Google collaboration, v1.5.0 stable, 4.4k stars, targets MCP spec
   2025-11-25). Component table updated.

3. arrow-go is on v18, not v15. v18.5.2 (March 2026) has parquet
   encryption fixes relevant for PII-masked safe views. PRD locked
   stack + SPEC component table updated.

Validated unchanged: coder/hnsw (220 stars, active), chi (still the
clean-architecture pick over fiber/gin/echo).

Surfaced for future use: anthropics/anthropic-sdk-go (official,
available for direct Claude calls bypassing opencode if ever needed),
duckdb-wasm (browser-side analytics future option), IVF as HNSW
fallback if recall gate fails.

See docs/RESEARCH_LOG_2026-04-28.md for full survey + sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent f07668064e
commit 29468b1413
@@ -39,7 +39,7 @@ is an explicit re-platforming, not a refactor.
 | HTTP gateway | Axum + Tokio | `net/http` + `chi` (or `gin`) | High — Go's bread and butter |
 | gRPC | tonic | `google.golang.org/grpc` | High — Go is the reference impl |
 | Object store | Apache Arrow `object_store` | `aws-sdk-go-v2/service/s3` + thin wrapper | High |
-| Parquet I/O | parquet-rs (arrow-rs) | `apache/arrow-go/v15/parquet` | Medium — arrow-go lags arrow-rs but covers our needs |
+| Parquet I/O | parquet-rs (arrow-rs) | `apache/arrow-go/v18/parquet` | Medium — arrow-go lags arrow-rs but v18 covers our needs |
 | Query engine | DataFusion | **Hard problem** (see §Hard problems) | Low — no like-for-like Go equivalent |
 | Vector index (HNSW) | `hora` / hand-rolled | `coder/hnsw` or `Bithack/go-hnsw` (in-process) | High — HNSW is a self-contained algorithm |
 | Vector backend (Lance) | `lance` (Rust) | **Hard problem** — likely dropped, Parquet-only | Medium |
@@ -76,7 +76,7 @@ binaries built from one workspace.
 | gRPC | `google.golang.org/grpc` | Reference implementation |
 | Protobuf | `protoc-gen-go` + `buf` | Standard tooling |
 | Object store | `aws-sdk-go-v2` | Mature, covers S3 + MinIO + RustFS |
-| Parquet | `apache/arrow-go/v15` | Columnar I/O + Arrow interop |
+| Parquet | `apache/arrow-go/v18` | Columnar I/O + Arrow interop (v18.5.2 — March 2026) |
 | SQL engine | **Open** — see §Hard problems §1 | Biggest open decision |
 | Vector index | `coder/hnsw` | Pure-Go HNSW, in-process, no external service |
 | TOML config | `pelletier/go-toml/v2` | Standard |
docs/RESEARCH_LOG_2026-04-28.md (new file, 133 lines)
@@ -0,0 +1,133 @@
# Research Log — 2026-04-28

Survey of upstream Go ecosystem state before Phase G0 begins. Goal:
verify the SPEC's library choices reflect the *currently maintained*
upstream and surface anything that has shifted since the SPEC was
drafted earlier today.

## Sources used

- Context7 — `resolve-library-id` for `go-duckdb` and `arrow-go`
- WebSearch — six targeted queries on Go HNSW, MCP SDK, DuckDB
  bindings, Anthropic SDK, HTTP frameworks, parquet maturity
- WebFetch — direct GitHub READMEs for `coder/hnsw`,
  `modelcontextprotocol/go-sdk`, `duckdb/duckdb-go`

## Findings — three SPEC-changing pivots

### 1. DuckDB Go binding ownership transferred

The original SPEC named `marcboeker/go-duckdb` as the query-engine
library. The package has been formally **deprecated** in favor of the
official maintainer transfer.

| | Before (SPEC v1) | After |
|---|---|---|
| Repo | `github.com/marcboeker/go-duckdb` | `github.com/duckdb/duckdb-go/v2` |
| Maintainer | Marc Boeker (community) | DuckDB team + Marc Boeker (joint, official) |
| Latest | (varies) | v2.10502.0 (April 2026) |
| DuckDB engine | varies | v1.5.2 |
| Migration | n/a | `gofmt -w -r '"github.com/marcboeker/go-duckdb/v2" -> "github.com/duckdb/duckdb-go/v2"' .` |

**Why this matters:** the upstream migration happened at v2.5.0; using
the old import means we'd be on a deprecated branch from day one.
SPEC §3.1 now names the official path.

### 2. Official Go MCP SDK exists (Google collaboration)

The original SPEC named `mark3labs/mcp-go` (community implementation).
An official SDK now exists.

| | Before (SPEC v1) | After |
|---|---|---|
| Repo | `mark3labs/mcp-go` | `github.com/modelcontextprotocol/go-sdk` |
| Maintainer | mark3labs (community) | MCP org + Google (official) |
| Stars | (smaller) | 4.4k |
| Version | n/a | v1.5.0 stable |
| Spec compat | various | targets MCP 2025-11-25, backward-compat to 2024-11-05 |
| OSSF Scorecard | n/a | yes |

**Why this matters:** the MCP server (`cmd/mcp`) is one of the most
visible Go binaries we'll ship — it's what AI agents talk to. Using
the official SDK aligns with the MCP spec's evolution and gets us the
Google-tested code path.
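
To make the "Spec compat" row concrete, here is a rough sketch of how protocol-version negotiation works at the MCP initialize step: the server echoes the client's requested revision if it supports it, and otherwise answers with the newest revision it knows. The SDK handles this internally, so none of this is SDK API; `supportedVersions` and `negotiateVersion` are hypothetical names, and the assumption that every intermediate revision is accepted is mine, not the survey's.

```go
package main

import "fmt"

// supportedVersions is a hypothetical list of MCP protocol revisions a
// server accepts, ordered oldest to newest. The endpoints (2024-11-05 and
// 2025-11-25) come from the survey; the intermediate entries are assumed.
var supportedVersions = []string{
	"2024-11-05",
	"2025-03-26",
	"2025-06-18",
	"2025-11-25",
}

// negotiateVersion returns the requested revision when the server supports
// it, and otherwise falls back to the newest supported revision.
func negotiateVersion(requested string) string {
	for _, v := range supportedVersions {
		if v == requested {
			return requested
		}
	}
	return supportedVersions[len(supportedVersions)-1]
}

func main() {
	for _, req := range []string{"2024-11-05", "2025-11-25", "2023-01-01"} {
		fmt.Printf("client asked for %s, server answers %s\n", req, negotiateVersion(req))
	}
}
```

A client that receives a revision it cannot speak is expected to disconnect, which is why the backward-compat floor matters for older agents talking to `cmd/mcp`.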

### 3. arrow-go is on v18, not v15

The original SPEC referenced `arrow-go/v15`. Apache Arrow Go has
shipped through **v18.5.2** (March 2026).

| | Before (SPEC v1) | After |
|---|---|---|
| Module path | `apache/arrow-go/v15` | `apache/arrow-go/v18` |
| Latest | n/a | v18.5.2 |
| Recent fixes | n/a | parquet decryption, large string handling, complex type read |

**Why this matters:** v18 has parquet encryption fixes that are
relevant for our PII-masked safe views (per Rust ADR-017's federation
and the production cutover work). Skipping three major versions is
unnecessary risk.

## Findings — validations (SPEC unchanged)

### `coder/hnsw` — keep

220 stars, 45 commits, active CI, recent PRs/issues. Documented
import speed of 796.85 MB/s. No deprecation signals. In-memory
alternative to Pinecone/Weaviate, fits the no-external-vector-DB
constraint.

### `chi` for HTTP routing — keep

Confirmed as the "clean architecture, stdlib `net/http`, zero deps"
pick. Fiber is faster (36k req/s vs ~34k for chi/gin/echo) but uses
fasthttp, which is off the standard library path — wrong fit for our
"boring is good" Go ethos. Fiber stays as a documented alternative if
hot-path performance ever proves chi insufficient.

### `marcboeker/go-duckdb` → `duckdb/duckdb-go/v2`

Already covered above as pivot #1.

## New things worth noting (not SPEC-changing yet)

### Anthropic Go SDK is official

`github.com/anthropics/anthropic-sdk-go` is the official Anthropic Go
client library. We currently route Claude calls through OpenCode/Zen,
so this isn't on the Phase G0 critical path. **Worth knowing for**:
direct Claude API calls in `aibridge` if we ever want to bypass
opencode (e.g., for the overseer correction loop in dev mode without
a Zen subscription).

Added as a noted option in the SPEC `aibridge` row.

### DuckDB-Wasm exists

`github.com/duckdb/duckdb-wasm` brings DuckDB to the browser via
WebAssembly with Arrow / Parquet / CSV / JSON support. **Not in scope**
for Phase G0 but a future option if the UI ever needs client-side
analytical queries against fetched parquet (offline analytics over a
permit cache, etc.).

### IVF as an HNSW alternative

If during Phase G3 the HNSW recall validation gate (G3.2.C) shows
problems, IVF (Inverted File Index) is the next-best alternative:
faster index builds, lower memory, better filtered-search performance
than HNSW. No first-class Go IVF library was found — would need to
wrap FAISS via cgo or hand-roll. Documented as a fallback only.
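
For context on what a recall gate measures, a minimal recall@k sketch follows: compute the exact top-k neighbours by brute force, then check what fraction of them the approximate index returned. This is an assumption about the general technique, not the actual G3.2.C harness; `exactTopK`, `recallAtK`, the toy vectors, and the stand-in approximate result are all hypothetical, and the gate's pass threshold is not stated in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// exactTopK returns the indices of the k vectors closest to query by squared
// Euclidean distance. Brute force, so it serves as ground truth.
func exactTopK(vectors [][]float32, query []float32, k int) []int {
	idx := make([]int, len(vectors))
	dist := make([]float32, len(vectors))
	for i, v := range vectors {
		idx[i] = i
		for j := range v {
			d := v[j] - query[j]
			dist[i] += d * d
		}
	}
	sort.SliceStable(idx, func(a, b int) bool { return dist[idx[a]] < dist[idx[b]] })
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

// recallAtK returns the fraction of exact neighbours that also appear in the
// approximate result set.
func recallAtK(exact, approx []int) float64 {
	if len(exact) == 0 {
		return 0
	}
	want := make(map[int]struct{}, len(exact))
	for _, id := range exact {
		want[id] = struct{}{}
	}
	hits := 0
	for _, id := range approx {
		if _, ok := want[id]; ok {
			hits++
			delete(want, id) // count each true neighbour at most once
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	vectors := [][]float32{{0, 0}, {1, 0}, {0, 1}, {5, 5}}
	exact := exactTopK(vectors, []float32{0.1, 0}, 2)
	approx := []int{0, 2} // stand-in for what an HNSW search might return
	fmt.Printf("recall@2 = %.2f\n", recallAtK(exact, approx))
}
```

The same two functions would apply unchanged to an IVF fallback, since recall only compares returned IDs against the brute-force ground truth.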

## What I checked but found nothing actionable

- HTMX-specific Go libraries (`htmx-go`, `gotempl`, etc.) — none has
  emerged as a clear standard. Sticking with `html/template` + raw
  HTMX as the SPEC plans.
- Langfuse Go client — OSS support varies. SPEC's "hand-roll if
  needed" stays.

## Outcome

SPEC §1 + §2 + §3.1 updated to reflect the three pivots. PRD's locked
stack table updated to match. No phase changes, no acceptance gate
changes, no new hard problems.
docs/SPEC.md (22 lines changed)
@@ -23,15 +23,15 @@ Effort scale (one engineer-week = ~40h focused work):
 | Crate | Rust deps that mattered | Go target | Library | Effort | Risk |
 |---|---|---|---|---|---|
 | `gateway` | axum, tokio, tonic, tower | `cmd/gateway` | `chi` + stdlib `net/http` + `google.golang.org/grpc` | **L** | low — Go's strongest domain |
-| `catalogd` | parquet-rs, arrow, sqlite | `cmd/catalogd` | `arrow-go/v15`, `mattn/go-sqlite3` | **L** | low |
+| `catalogd` | parquet-rs, arrow, sqlite | `cmd/catalogd` | `apache/arrow-go/v18`, `mattn/go-sqlite3` | **L** | low |
 | `storaged` | object_store, aws-sdk | `cmd/storaged` | `aws-sdk-go-v2`, `minio-go` for MinIO-specific paths | **M** | low |
-| `queryd` | datafusion, arrow | `cmd/queryd` | `marcboeker/go-duckdb` (cgo) | **HARD** | high — see §3 |
+| `queryd` | datafusion, arrow | `cmd/queryd` | **`duckdb/duckdb-go/v2`** (cgo, official) | **HARD** | high — see §3 |
 | `ingestd` | csv, json, lopdf, postgres | `cmd/ingestd` | stdlib `encoding/csv`, `encoding/json`, `pdfcpu/pdfcpu`, `jackc/pgx/v5` | **L** | low |
-| `vectord` | hora, arrow, hnsw | `cmd/vectord` | `coder/hnsw`, `arrow-go/v15` | **L** | medium — re-validate HNSW recall |
+| `vectord` | hora, arrow, hnsw | `cmd/vectord` | `coder/hnsw`, `apache/arrow-go/v18` | **L** | medium — re-validate HNSW recall |
 | `vectord-lance` | lance | **DROPPED** | n/a | n/a | n/a — Parquet+HNSW only |
-| `journald` | parquet, arrow | `cmd/journald` | `arrow-go/v15` | **M** | low |
-| `aibridge` | reqwest | library | `net/http` + connection pool | **S** | low |
-| `validator` | parquet, custom | library | `arrow-go/v15` parquet reader | **M** | low — port the 24 unit tests as gates |
+| `journald` | parquet, arrow | `cmd/journald` | `apache/arrow-go/v18` | **M** | low |
+| `aibridge` | reqwest | library | `net/http` + connection pool · `anthropics/anthropic-sdk-go` available for direct Claude calls (currently routed via opencode) | **S** | low |
+| `validator` | parquet, custom | library | `apache/arrow-go/v18` parquet reader | **M** | low — port the 24 unit tests as gates |
 | `truth` | tomli, custom DSL | library | `pelletier/go-toml/v2` | **M** | low |
 | `proto` | tonic-build | `proto/` + `protoc-gen-go` | `buf` + `protoc-gen-go-grpc` | **S** | low |
 | `shared` | serde, anyhow | library | stdlib `encoding/json`, `errors` | **S** | low |
@@ -47,7 +47,7 @@ one engineer; 6–8 weeks for two).

 | TS surface | Current location | Go target | Library | Effort | Risk |
 |---|---|---|---|---|---|
-| `mcp-server/index.ts` | Bun, :3700 | `cmd/mcp` | `mark3labs/mcp-go` (Go MCP SDK) | **L** | medium — MCP semantics |
+| `mcp-server/index.ts` | Bun, :3700 | `cmd/mcp` | **`modelcontextprotocol/go-sdk`** (official Go SDK, v1.5.0, Google-collab) | **L** | medium — MCP semantics |
 | `mcp-server/observer.ts` | Bun, :3800 | `cmd/observer` | stdlib `net/http`, `slog` | **M** | low |
 | `mcp-server/tracing.ts` | Bun, Langfuse client | library | `go.opentelemetry.io/otel` + Langfuse Go client (or hand-roll) | **M** | low — Langfuse Go OSS support varies |
 | `auditor/*.ts` | TS, runs as systemd | `cmd/auditor` | stdlib + `gitea API client` | **L** | medium — auditor cross-lineage logic is intricate |
@@ -64,7 +64,13 @@ one engineer; 6–8 weeks for two).

 ### §3.1 — Query engine (DuckDB via cgo)

-**Library:** `marcboeker/go-duckdb` — Go bindings via cgo.
+**Library:** `github.com/duckdb/duckdb-go/v2` — official Go bindings via
+cgo. (Replaces the legacy `marcboeker/go-duckdb`, which was deprecated
+when the DuckDB team and Marc Boeker jointly relocated maintenance to
+the DuckDB org at v2.5.0. Migration is a one-line `gofmt -r` rewrite of
+import paths.) Current version v2.10502.0 (April 2026), DuckDB v1.5.2
+compat. Statically links default extensions: ICU, JSON, Parquet,
+Autocomplete.

 **API shape** (replaces the DataFusion `SessionContext` pattern):
 ```go