Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).
Wire format:
POST /v1/embed {"texts":[...], "model":"..."} // model optional
→ 200 {"model","dimension","vectors":[[...]]}
Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.
Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.
All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.
Cross-lineage scrum on shipped code:
- Opus 4.7 (opencode): 0 BLOCK + 4 WARN + 3 INFO
- Kimi K2-0905 (openrouter): 0 BLOCK + 2 WARN + 1 INFO
- Qwen3-coder (openrouter): "No BLOCKs" (3 tokens)
Fixed (2 — 1 convergent + 1 single-reviewer):
C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
batch was up to N×60s with no batch-level cap. One stuck Ollama
call would stall the whole handler indefinitely. Fix:
context.WithTimeout(r.Context(), 60s) wraps the entire batch.
O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
producing version-dependent garbage. Fix: reject "" with 400 at
the handler boundary so callers get a deterministic answer
instead of an upstream-conditional 502.
Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).
Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.
Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
166 lines
6.4 KiB
Bash
Executable File
#!/usr/bin/env bash
# G2 smoke — embedd service. All assertions go through gateway :3110.
#
# Validates:
# - POST /v1/embed with 2 texts → 200, dim=768 (nomic-embed-text),
#   vectors[0] != vectors[1] (different texts → different vectors)
# - Same text twice → byte-identical vectors (deterministic)
# - Empty texts → 400
# - Bad model → 502 from upstream Ollama
# - End-to-end with vectord: embed text → store → search by text →
#   same text round-trips at distance ≈ 0 (proves embed→vectord
#   pipeline works)
#
# Requires Ollama running at :11434 with nomic-embed-text loaded.
#
# Usage: ./scripts/g2_smoke.sh

set -euo pipefail
cd "$(dirname "$0")/.."

export PATH="$PATH:/usr/local/go/bin"

echo "[g2-smoke] building embedd + vectord + gateway..."
go build -o bin/ ./cmd/embedd ./cmd/vectord ./cmd/gateway

pkill -f "bin/(embedd|vectord|gateway)" 2>/dev/null || true
sleep 0.3

PIDS=()
TMP="$(mktemp -d)"
cleanup() {
  echo "[g2-smoke] cleanup"
  for p in "${PIDS[@]}"; do [ -n "$p" ] && kill "$p" 2>/dev/null || true; done
  rm -rf "$TMP"
}
trap cleanup EXIT INT TERM

poll_health() {
  local port="$1" deadline=$(($(date +%s) + 5))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -sS --max-time 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then return 0; fi
    sleep 0.05
  done
  return 1
}

# Verify Ollama is up before the test even starts — otherwise every
# embed call would 502 and the smoke would be misleading.
if ! curl -sS --max-time 3 http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "[g2-smoke] Ollama not reachable on :11434 — skipping"
  exit 0
fi

echo "[g2-smoke] launching embedd → vectord (no persist) → gateway..."
./bin/embedd > /tmp/embedd.log 2>&1 &
PIDS+=($!)
poll_health 3216 || { echo "embedd failed"; tail /tmp/embedd.log; exit 1; }

# vectord with persistence disabled (matches g1_smoke pattern —
# this smoke doesn't touch storaged).
./bin/vectord -config scripts/g1_smoke.toml > /tmp/vectord.log 2>&1 &
PIDS+=($!)
poll_health 3215 || { echo "vectord failed"; tail /tmp/vectord.log; exit 1; }

./bin/gateway > /tmp/gateway.log 2>&1 &
PIDS+=($!)
poll_health 3110 || { echo "gateway failed"; tail /tmp/gateway.log; exit 1; }

FAILED=0

echo "[g2-smoke] /v1/embed — two distinct texts:"
RESP="$(curl -sS -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts":["forklift operator with OSHA-30","CNC machinist precision parts"]}')"
DIM="$(echo "$RESP" | jq -r '.dimension')"
N="$(echo "$RESP" | jq -r '.vectors | length')"
MODEL="$(echo "$RESP" | jq -r '.model')"
SAME="$(echo "$RESP" | jq -r '.vectors[0][0] == .vectors[1][0]')"
if [ "$DIM" = "768" ] && [ "$N" = "2" ] && [ "$MODEL" = "nomic-embed-text" ] && [ "$SAME" = "false" ]; then
  echo " ✓ dim=768, model=nomic-embed-text, 2 distinct vectors"
else
  echo " ✗ resp: dim=$DIM n=$N model=$MODEL same=$SAME"; FAILED=1
fi

echo "[g2-smoke] determinism — same text twice → byte-identical vector:"
RESP1="$(curl -sS -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts":["determinism check"]}' | jq -c '.vectors[0]')"
RESP2="$(curl -sS -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts":["determinism check"]}' | jq -c '.vectors[0]')"
if [ "$RESP1" = "$RESP2" ]; then
  echo " ✓ identical text → identical vector"
else
  echo " ✗ deterministic mismatch"; FAILED=1
fi

echo "[g2-smoke] empty texts → 400:"
HTTP="$(curl -sS -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' -d '{"texts":[]}')"
if [ "$HTTP" = "400" ]; then echo " ✓ empty → 400"; else echo " ✗ empty → $HTTP"; FAILED=1; fi

echo "[g2-smoke] bad model → 502:"
HTTP="$(curl -sS -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' -d '{"texts":["x"],"model":"definitely-not-loaded"}')"
if [ "$HTTP" = "502" ]; then echo " ✓ unknown model → 502"; else echo " ✗ unknown → $HTTP"; FAILED=1; fi

echo "[g2-smoke] end-to-end: embed → vectord add → search by embed → recall:"
NAME="g2_demo"
# Create index. Default M/EfSearch; cosine distance.
curl -sS -o /dev/null -X POST http://127.0.0.1:3110/v1/vectors/index \
  -H 'Content-Type: application/json' \
  -d "{\"name\":\"$NAME\",\"dimension\":768,\"distance\":\"cosine\"}"

# Embed a few staffing-ish texts and add them.
TEXTS='["forklift operator with OSHA-30","CNC machinist precision parts","warehouse picker night shift","dental hygienist 3 years experience"]'
EMBEDS="$(curl -sS -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' \
  -d "{\"texts\":$TEXTS}")"
# Build the add payload — id-i + vector from embeds[i].
python3 - "$EMBEDS" <<'EOF' > "$TMP/add.json"
import json, sys
embeds = json.loads(sys.argv[1])
items = [
    {"id": f"w-{i}", "vector": v, "metadata": {"text": t}}
    for i, (v, t) in enumerate(zip(embeds["vectors"], [
        "forklift operator with OSHA-30",
        "CNC machinist precision parts",
        "warehouse picker night shift",
        "dental hygienist 3 years experience",
    ]))
]
print(json.dumps({"items": items}))
EOF
curl -sS -o /dev/null -X POST "http://127.0.0.1:3110/v1/vectors/index/$NAME/add" \
  -H 'Content-Type: application/json' -d @"$TMP/add.json"

# Search by embedding the FIRST text again — should retrieve w-0 at dist≈0
QUERY_VEC="$(curl -sS -X POST http://127.0.0.1:3110/v1/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts":["forklift operator with OSHA-30"]}' | jq -c '.vectors[0]')"
SEARCH="$(curl -sS -X POST "http://127.0.0.1:3110/v1/vectors/index/$NAME/search" \
  -H 'Content-Type: application/json' \
  -d "{\"vector\":$QUERY_VEC,\"k\":3}")"
TOP_ID="$(echo "$SEARCH" | jq -r '.results[0].id')"
TOP_DIST="$(echo "$SEARCH" | jq -r '.results[0].distance')"
DIST_OK="$(python3 -c "import sys; sys.exit(0 if abs($TOP_DIST) < 1e-4 else 1)" && echo y || echo n)"
if [ "$TOP_ID" = "w-0" ] && [ "$DIST_OK" = "y" ]; then
  echo " ✓ embed → store → search round-trip: w-0 at dist=$TOP_DIST"
else
  echo " ✗ recall: top=$TOP_ID dist=$TOP_DIST"
  echo " full: $SEARCH"
  FAILED=1
fi

# Clean up the index.
curl -sS -o /dev/null -X DELETE "http://127.0.0.1:3110/v1/vectors/index/$NAME" || true

if [ "$FAILED" -eq 0 ]; then
  echo "[g2-smoke] G2 acceptance gate: PASSED"
  exit 0
else
  echo "[g2-smoke] G2 acceptance gate: FAILED"
  exit 1
fi