root 9ee7fc5550 G2: embedd — text → vector via Ollama · 2 scrum fixes
Bridges the missing piece for the staffing co-pilot: text inputs to
vectord-shaped vectors. Standalone cmd/embedd on :3216 fronted by
gateway at /v1/embed. Pluggable embed.Provider interface (G2 ships
Ollama; OpenAI/Voyage swap in via the same interface in G3+).

Wire format:
  POST /v1/embed {"texts":[...], "model":"..."}  // model optional
  → 200 {"model","dimension","vectors":[[...]]}

Default model: nomic-embed-text (768-d). Ollama returns float64;
provider converts to float32 at the boundary so vectors flow through
vectord/HNSW without re-conversion.

Acceptance smoke 5/5 PASS — including the architectural payoff:
end-to-end embed → vectord add → search by re-embedded text returns
recall=1 at distance 5.96e-8 (float32 precision noise on identical
unit vectors). The staffing co-pilot pipeline (text → vector →
similarity search) is now functional end-to-end.

All 9 smokes (D1-D6 + G1 + G1P + G2) PASS deterministically.

Cross-lineage scrum on shipped code:
  - Opus 4.7 (opencode):                    0 BLOCK + 4 WARN + 3 INFO
  - Kimi K2-0905 (openrouter):              0 BLOCK + 2 WARN + 1 INFO
  - Qwen3-coder (openrouter):               "No BLOCKs" (3 tokens)

Fixed (2 — 1 convergent + 1 single-reviewer):
  C1 (Opus + Kimi convergent WARN): per-text 60s timeout × N-text
    batch was up to N×60s with no batch-level cap. One stuck Ollama
    call would stall the whole handler indefinitely. Fix:
    context.WithTimeout(r.Context(), 60s) wraps the entire batch.
  O-W3 (Opus WARN): empty strings in texts went to Ollama unchecked,
    producing version-dependent garbage. Fix: reject "" with 400 at
    the handler boundary so callers get a deterministic answer
    instead of an upstream-conditional 502.

Deferred (4): drainAndClose 64KiB cap (matches G0 pattern), no
concurrency limit on /embed (single-tenant G2), missing Accept
header (exotic-proxy concern), MaxBytesError string-match
redundancy (paranoia layer kept consistent across codebase).

Zero false positives this round — Qwen returned 3 tokens "No BLOCKs"
and the other two reviewers' findings were all real.

Setup confirmed: Ollama 0.21.0 on :11434 with nomic-embed-text loaded.
Per-text /api/embeddings used (forward-compat with 0.21+); newer
0.4+ /api/embed batch endpoint can swap in via the Provider interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:42:27 -05:00

golangLAKEHOUSE

Go reimplementation of the Lakehouse — a versioned knowledge substrate for staffing analytics + local AI workloads.

Status

Pre-Phase G0. Documents seeded; Go module declared; implementation has not started. See docs/PRD.md for direction and docs/SPEC.md for the component-by-component port plan.

Phase G0 prerequisites (must be done before any code lands)

  1. Install Go 1.23+ on the dev box. Not currently present at /usr/local/go or elsewhere on the build machine. Standard install:
    curl -L https://go.dev/dl/go1.23.linux-amd64.tar.gz | sudo tar -C /usr/local -xz
    echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
    
  2. Ensure cgo toolchain is present (gcc + libc-dev) — required by the DuckDB binding per ADR-001 §1.1. apt install build-essential on Debian-based systems.
  3. Initialize the dependency tree with go mod tidy once cmd/gateway/main.go declares its first imports.

Layout

docs/                         Direction + spec + ADRs
cmd/                          (forthcoming) main packages — one per service
internal/                     (forthcoming) shared packages
web/                          (forthcoming) HTMX templates + static
scripts/                      (forthcoming) cold-start, smoke, distill
tests/                        (forthcoming) golden files, integration tests

Reading order

  1. docs/PRD.md — what we're building and why
  2. docs/SPEC.md — how, per-component
  3. docs/DECISIONS.md — ADRs, starting with ADR-001 (foundational)
  4. docs/RUST_PATHWAY_MEMORY_NOTE.md — historical reference for the Rust era's pathway memory state (not migrated)

Predecessor

The Rust Lakehouse this rewrite supersedes lives at git.agentview.dev/profit/lakehouse. It remains the live system until this Go implementation reaches feature parity (per docs/SPEC.md §7).

Description
Go reimplementation of the Lakehouse — versioned knowledge substrate for staffing analytics + local AI workloads
Readme 3.2 MiB
Languages
Go 79.4%
Shell 20.1%
Just 0.3%
Dockerfile 0.2%