Fills the missing piece for the staffing co-pilot: turning text inputs
into vectord-shaped vectors. Standalone cmd/embedd on :3216, fronted by the
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).
Wire format:
POST /v1/embed {"texts":[...], "model":"..."} // model optional
→ 200 {"model","dimension","vectors":[[...]]}
Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.
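
The boundary conversion itself is the obvious element-wise narrowing;
a minimal sketch (the helper name is illustrative, not necessarily the
shipped one):

// toFloat32 narrows an Ollama float64 embedding to the float32
// representation used by vectord/HNSW.
func toFloat32(in []float64) []float32 {
	out := make([]float32, len(in))
	for i, v := range in {
		out[i] = float32(v)
	}
	return out
}
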
Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.
All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 0 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (3 tokens)
Fixed (2 — 1 convergent + 1 single-reviewer):
C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
batch was up to N×60s with no batch-level cap. One stuck Ollama
call would stall the whole handler indefinitely. Fix:
context.WithTimeout(r.Context(), 60s) wraps the entire batch.
O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
producing version-dependent garbage. Fix: reject "" with 400 at
the handler boundary so callers get a deterministic answer
instead of an upstream-conditional 502. Both fixes are sketched below.
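
A sketch of how both fixes sit in the handler. The server scaffolding
(package name, struct, field names) is illustrative; the shipped
cmd/embedd handler differs in detail:

package embedd

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

// result mirrors the /v1/embed response shape.
type result struct {
	Model     string      `json:"model"`
	Dimension int         `json:"dimension"`
	Vectors   [][]float32 `json:"vectors"`
}

// provider is a stand-in for the embed.Provider interface listed below.
type provider interface {
	Embed(ctx context.Context, texts []string, model string) (result, error)
}

type server struct{ p provider }

func (s *server) handleEmbed(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Texts []string `json:"texts"`
		Model string   `json:"model"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad json", http.StatusBadRequest)
		return
	}

	// O-W3: reject "" at the boundary so callers get a deterministic
	// 400 instead of version-dependent Ollama output.
	for _, t := range req.Texts {
		if t == "" {
			http.Error(w, "empty text in batch", http.StatusBadRequest)
			return
		}
	}

	// C1: one 60s deadline caps the entire batch, replacing the
	// per-text timeout that allowed up to N×60s with no batch cap.
	ctx, cancel := context.WithTimeout(r.Context(), 60*time.Second)
	defer cancel()

	res, err := s.p.Embed(ctx, req.Texts, req.Model)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(res)
}
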
Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).
Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.
Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.
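
For reference, a sketch of the per-text request shape against Ollama's
/api/embeddings endpoint ({"model","prompt"} in, {"embedding"} out,
float64 values). Function and type names are illustrative; the shipped
provider wraps this behind embed.Provider and converts to float32 at
its boundary:

package ollama

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type embeddingsRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embeddingsResponse struct {
	Embedding []float64 `json:"embedding"`
}

// embedOne sends one text to /api/embeddings; baseURL is
// http://localhost:11434 for the local setup above.
func embedOne(ctx context.Context, baseURL, model, text string) ([]float64, error) {
	body, err := json.Marshal(embeddingsRequest{Model: model, Prompt: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/embeddings", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama: unexpected status %d", resp.StatusCode)
	}

	var out embeddingsResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}
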
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// Package embed turns text into vectors. G2 ships an Ollama-backed
// Provider (local model on :11434, no network call to a hosted
// service); the Provider interface keeps the door open for OpenAI /
// Voyage / etc. without per-call switching at the HTTP boundary.
//
// Vectors are float32 throughout the system (matches vectord +
// coder/hnsw types). Ollama returns float64; we convert at this
// package's boundary so callers don't have to think about it.
package embed

import (
	"context"
	"errors"
)

// Provider is the embed boundary. Embed takes a list of input
// texts and returns the same number of vectors, in input order,
// using the named model. Empty model = use the Provider's default.
type Provider interface {
	Embed(ctx context.Context, texts []string, model string) (Result, error)
}

// Result is the per-call response shape. Model echoes which model
// was actually used (after default-resolution); Dimension is the
// vector dimension reported by the model. The Provider guarantees
// every Vector in Vectors has Dimension components.
type Result struct {
	Model     string      `json:"model"`
	Dimension int         `json:"dimension"`
	Vectors   [][]float32 `json:"vectors"`
}

// ErrEmptyTexts is returned when the caller passed no inputs.
// Empty-batch is an error because returning an empty result
// silently wastes the round trip and hides caller bugs.
var ErrEmptyTexts = errors.New("embed: empty texts")

// ErrModelMismatch is returned when one call to the upstream
// returns a different vector dimension than the previous call in
// the same batch — only possible if the model actually switched
// mid-batch (load swap on the server). Surfaced as a 502 by
// callers because the upstream's behavior was inconsistent.
var ErrModelMismatch = errors.New("embed: dimension changed mid-batch")