embedd: bump default to nomic-embed-text-v2-moe (475M MoE, 768d drop-in)

Local Ollama has three embedding models loaded:

- nomic-embed-text:latest: 137M params, 768d (previous default)
- nomic-embed-text-v2-moe:latest: 475M params, 768d (this commit's default)
- qwen3-embedding:latest: 7.6B params, 4096d (would require a dimension change)

v2-moe is a drop-in upgrade: same 768 dims, ~3.5× more params, MoE architecture. The workers index needs no dimension change; future ingests simply embed with the stronger model. (v1 and v2 vectors live in different embedding spaces, so anything already stored with v1 has to be re-embedded before v2-embedded queries can be compared against it.)

Run #005 results on the multi-coord stress suite:

- Diversity (same role across contracts): 0.080 → 0.000 (n=9). MoE is more discriminating: zero worker overlap across Milwaukee / Indianapolis / Chicago for shared role names. The geo + cert + skill context fully separates the worker pools.
- Different roles, same contract: 0.013 → 0.036 (still ~96% non-overlapping).
- Determinism: 1.000 (unchanged).
- Verbatim handover: 4/4 (100%).
- Paraphrase handover: 4/4 (100%).
- 200-worker swap: Jaccard 0.000 (unchanged; still perfect).

Fresh-resume verify: fresh workers STILL don't surface in the top-8. With v2-moe, distances increased (top-1 = 0.43–0.65 vs v1's 0.25–0.39), so the embedder is MORE discriminating. Even so, the fresh worker's vector still doesn't outrank the 8th-best existing worker. The prime suspect is now HNSW post-build adds rather than the embedder, because coder/hnsw incremental adds can land in hard-to-reach graph regions. A better embedder didn't fix it, so a different strategy is needed. Options are a full index rebuild after fresh adds, an explicit playbook-layer score boost for fresh workers, or hybrid (keyword + semantic) retrieval. Left for the Phase 3 investigation.

Cost: ingest is ~3–5× slower (workers 20s → 100s; ethereal 35s → 112s). That is acceptable for the quality jump on diversity. Production ingests incrementally, so it won't pay this full-corpus cost on every deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
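The diversity, determinism and swap numbers above are mean Jaccard overlaps between top-K worker-ID sets. The sketch below assumes the metric is plain set Jaccard over the returned IDs, averaged across pairs. The harness's own implementation isn't part of this commit, and `jaccard` / `meanJaccard` are hypothetical names.

```go
package main

import "fmt"

// jaccard returns |A ∩ B| / |A ∪ B| for two top-K lists of worker IDs,
// treating each list as a set (duplicate IDs are ignored).
func jaccard(a, b []string) float64 {
	setA := make(map[string]struct{}, len(a))
	for _, id := range a {
		setA[id] = struct{}{}
	}
	setB := make(map[string]struct{}, len(b))
	for _, id := range b {
		setB[id] = struct{}{}
	}
	if len(setA) == 0 && len(setB) == 0 {
		return 1 // assumption: two empty results count as identical
	}
	inter := 0
	for id := range setA {
		if _, ok := setB[id]; ok {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

// meanJaccard averages jaccard over result pairs, e.g. the same role
// queried under two different contracts.
func meanJaccard(pairs [][2][]string) float64 {
	if len(pairs) == 0 {
		return 0
	}
	sum := 0.0
	for _, p := range pairs {
		sum += jaccard(p[0], p[1])
	}
	return sum / float64(len(pairs))
}

func main() {
	milwaukee := []string{"w1", "w2", "w3", "w4"}
	chicago := []string{"w3", "w4", "w5", "w6"}
	fmt.Printf("%.3f\n", jaccard(milwaukee, chicago)) // → 0.333 (2 shared of 6 distinct)
	fmt.Printf("%.3f\n", meanJaccard([][2][]string{{milwaukee, chicago}, {milwaukee, milwaukee}})) // → 0.667
}
```

Under this definition, a mean of 0.036 for different roles in the same contract means that, on average, about 96% of the combined IDs in a pair are not shared.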
commit 4da32ad102 (parent 84a32f0d29)
```diff
@@ -43,7 +43,7 @@ bind = "127.0.0.1:3216"
 # G2: Ollama local. G3+ may swap in OpenAI/Voyage by changing
 # this URL + the wire format inside the provider.
 provider_url = "http://localhost:11434"
-default_model = "nomic-embed-text"
+default_model = "nomic-embed-text-v2-moe"
 
 [queryd]
 bind = "127.0.0.1:3214"
```
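The drop-in claim rests on the new model emitting the same 768 dimensions as the old one. The sketch below checks that before flipping `default_model`, assuming Ollama's `POST /api/embed` endpoint at `provider_url`. `embedDim` and `newStubOllama` are hypothetical helpers, and the stub server exists only so the example runs without Ollama.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// embedDim requests one embedding and returns its length. It assumes
// Ollama's POST /api/embed, which takes {"model", "input"} and returns
// {"embeddings": [[...]]}.
func embedDim(hc *http.Client, baseURL, model string) (int, error) {
	body, err := json.Marshal(map[string]any{"model": model, "input": "dimension probe"})
	if err != nil {
		return 0, err
	}
	resp, err := hc.Post(baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("embed %s: HTTP %d", model, resp.StatusCode)
	}
	var out struct {
		Embeddings [][]float64 `json:"embeddings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	if len(out.Embeddings) == 0 {
		return 0, fmt.Errorf("embed %s: no embeddings returned", model)
	}
	return len(out.Embeddings[0]), nil
}

// newStubOllama stands in for a real Ollama server: for any request it
// returns one zero vector of the given dimension.
func newStubOllama(dim int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{"embeddings": [][]float64{make([]float64, dim)}})
	}))
}

func main() {
	// For a live check, pass http.DefaultClient and the [embedd] provider_url instead.
	srv := newStubOllama(768)
	defer srv.Close()
	const indexDim = 768 // dimension the existing workers index was built with
	dim, err := embedDim(srv.Client(), srv.URL, "nomic-embed-text-v2-moe")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("drop-in dimension match:", dim == indexDim)
}
```

Against qwen3-embedding:latest the same probe would report 4096, which is why that model would need a dimension change.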
```diff
@@ -129,7 +129,7 @@ level = "info"
 [models]
 # Tier 1 — local hot path
 local_fast = "qwen3.5:latest"
-local_embed = "nomic-embed-text"
+local_embed = "nomic-embed-text-v2-moe" # 475M MoE, drop-in upgrade from 137M v1 — verified 2026-04-30 same 768-dim
 # local_judge stays on qwen2.5:latest — qwen3.5:latest is a vision-SSM
 # build with 256K context that runs ~30s per judge call against the
 # playbook_lift loop (verified 2026-04-30). qwen2.5:latest at ~1s/call
```
`reports/reality-tests/multi_coord_stress_005.md` (new file, +82 lines)

```diff
@@ -0,0 +1,82 @@
+# Multi-Coordinator Stress Test — Run 005
+
+**Generated:** 2026-04-30T13:25:15.497712275Z
+**Coordinators:** alice / bob / carol (each with own playbook namespace: `playbook_alice` / `playbook_bob` / `playbook_carol`)
+**Contracts:** alpha_milwaukee_distribution / beta_indianapolis_manufacturing / gamma_chicago_construction
+**Corpora:** `workers,ethereal_workers`
+**K per query:** 8
+**Total events captured:** 61
+**Evidence:** `reports/reality-tests/multi_coord_stress_005.json`
+
+---
+
+## Diversity — is the system locking into scenarios or cycling?
+
+| Metric | Mean Jaccard | n pairs | Interpretation |
+|---|---:|---:|---|
+| Same role across different contracts | 0 | 9 | Lower = more diverse (different region/cert mix → different workers) |
+| Different roles within same contract | 0.03610093610093609 | 18 | Should be near-zero (different roles = different worker pools) |
+
+**Healthy ranges:**
+- Same role across contracts: < 0.30 means the system is genuinely picking different workers per region/contract.
+- Different roles same contract: < 0.10 means role-specific retrieval is working.
+- If either is > 0.50, the system is "cycling" the same handful of workers regardless of query intent.
+
+---
+
+## Determinism — same query reissued, top-K stability
+
+| Metric | Value |
+|---|---:|
+| Mean Jaccard on retrieval-only reissue | 1 |
+| Number of reissue pairs | 12 |
+
+**Interpretation:**
+- ≥ 0.95: HNSW retrieval is highly deterministic; reissues land on near-identical top-K. Good — system locks into a stable view of "best workers for this query."
+- 0.80 – 0.95: Some HNSW or embed variance, acceptable.
+- < 0.80: Retrieval is unstable — reissues see substantially different results, suggesting either embed nondeterminism (Ollama returning slightly different vectors) or vectord nondeterminism (HNSW insertion order affecting recall).
+
+---
+
+## Learning — handover hit rate
+
+Bob takes Alice's contract using Alice's playbook namespace. Did Alice's recorded answers surface in Bob's results?
+
+| Metric | Value |
+|---|---:|
+| Verbatim handover queries run | 4 |
+| Alice's recorded answer at Bob's top-1 (verbatim) | 4 |
+| Alice's recorded answer in Bob's top-K (verbatim) | 4 |
+| **Verbatim handover hit rate (top-1)** | **1** |
+| Paraphrase handover queries run | 4 |
+| Alice's recorded answer at Bob's top-1 (paraphrase) | 4 |
+| Alice's recorded answer in Bob's top-K (paraphrase) | 4 |
+| **Paraphrase handover hit rate (top-1)** | **1** |
+
+**Interpretation:**
+- Verbatim hit rate ≈ 1.0: trivial case — Bob runs identical queries; should always hit.
+- Paraphrase hit rate ≥ 0.5: institutional memory survives wording change — the harder learning property.
+- Paraphrase hit rate ≈ 0.0: Bob's paraphrases drift past the inject threshold, so Alice's recordings don't activate. Same caveat as the playbook_lift paraphrase pass.
+
+---
+
+## Per-event capture
+
+All matrix.search responses live in the JSON — top-K with worker IDs, distances, and per-corpus counts. Search by phase:
+
+```bash
+jq '.events[] | select(.phase == "merge")' reports/reality-tests/multi_coord_stress_005.json
+jq '.events[] | select(.coordinator == "alice" and .phase == "baseline")' reports/reality-tests/multi_coord_stress_005.json
+jq '.events[] | select(.role == "warehouse worker") | {phase, contract, top_k_ids: [.top_k[].id]}' reports/reality-tests/multi_coord_stress_005.json
+```
+
+---
+
+## What's NOT in this run (Phase 1 deliberately defers)
+
+- **48-hour clock.** Events fire as discrete steps, not on a timeline.
+- **Email / SMS ingest.** No endpoints exist on the Go side yet.
+- **New-resume injection mid-run.** The corpus is fixed at the start.
+- **Langfuse traces.** Need Go-side wiring.
+
+These are Phase 2/3. The Phase 1 substrate is what the time-based runner will mount on top of.
```
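The report's healthy ranges and interpretation bands are simple thresholds, so they could be checked mechanically after each run. The sketch below assumes the four headline numbers have already been pulled out of the run's JSON evidence by some other step. `runMetrics`, `assess` and `determinismBand` are hypothetical names, and the cut-offs are the ones the report states.

```go
package main

import "fmt"

// runMetrics holds the headline numbers of one multi-coord stress run.
type runMetrics struct {
	SameRoleAcrossContracts float64 // mean Jaccard; lower = more diverse
	DiffRolesSameContract   float64 // mean Jaccard; should be near zero
	ReissueJaccard          float64 // determinism: same query reissued
	ParaphraseTop1HitRate   float64 // paraphrase hits at Bob's top-1 / queries run
}

// determinismBand maps the reissue Jaccard onto the report's three bands.
func determinismBand(j float64) string {
	switch {
	case j >= 0.95:
		return "highly deterministic"
	case j >= 0.80:
		return "acceptable variance"
	default:
		return "unstable"
	}
}

// assess returns every threshold the run misses; an empty slice means healthy.
func assess(m runMetrics) []string {
	var issues []string
	if m.SameRoleAcrossContracts > 0.50 || m.DiffRolesSameContract > 0.50 {
		issues = append(issues, "cycling: same workers regardless of query intent")
	}
	if m.SameRoleAcrossContracts >= 0.30 {
		issues = append(issues, "same role across contracts not < 0.30")
	}
	if m.DiffRolesSameContract >= 0.10 {
		issues = append(issues, "different roles same contract not < 0.10")
	}
	if determinismBand(m.ReissueJaccard) == "unstable" {
		issues = append(issues, "reissue Jaccard < 0.80: retrieval unstable")
	}
	if m.ParaphraseTop1HitRate < 0.5 {
		issues = append(issues, "paraphrase handover hit rate < 0.5")
	}
	return issues
}

func main() {
	// Run 005 values; 0.0361 is rounded from the report's 0.03610093610093609.
	run005 := runMetrics{
		SameRoleAcrossContracts: 0,
		DiffRolesSameContract:   0.0361,
		ReissueJaccard:          1,
		ParaphraseTop1HitRate:   1,
	}
	fmt.Println(determinismBand(run005.ReissueJaccard), assess(run005))
}
```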
```diff
@@ -100,7 +100,7 @@ refresh_every = "1s"
 [embedd]
 bind = "127.0.0.1:3216"
 provider_url = "http://localhost:11434"
-default_model = "nomic-embed-text"
+default_model = "nomic-embed-text-v2-moe"
 
 [vectord]
 bind = "127.0.0.1:3215"
```
```diff
@@ -795,7 +795,7 @@ func matrixSearch(hc *http.Client, gw, query string, corpora []string, k int, us
 func ingestFreshWorker(hc *http.Client, gw, id, text string, metadata map[string]any) error {
 	embedBs, _ := json.Marshal(map[string]any{
 		"texts": []string{text},
-		"model": "nomic-embed-text",
+		"model": "nomic-embed-text-v2-moe",
 	})
 	req, _ := http.NewRequest("POST", gw+"/v1/embed", bytes.NewReader(embedBs))
 	req.Header.Set("Content-Type", "application/json")
```
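The commit message attributes the fresh-resume miss to HNSW incremental adds rather than to the embedder. One way to test that suspicion is to rank the freshly ingested vector by brute force against the same query. If exact search puts it inside the top-K while HNSW does not return it, the graph is losing it. If exact search also ranks it below K, the problem is in scoring. The sketch below assumes cosine distance and that the stored vectors can be read back out of vectord; neither is shown in this commit. `exactRank` and `diagnose` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineDistance returns 1 - cos(a, b); assumes equal-length, non-zero vectors.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// exactRank scores every stored vector against the query by brute force
// (no HNSW graph involved) and returns targetID's 1-based rank, or -1.
func exactRank(query []float64, vectors map[string][]float64, targetID string) int {
	type scored struct {
		id   string
		dist float64
	}
	all := make([]scored, 0, len(vectors))
	for id, v := range vectors {
		all = append(all, scored{id, cosineDistance(query, v)})
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].dist != all[j].dist {
			return all[i].dist < all[j].dist
		}
		return all[i].id < all[j].id // deterministic tie-break
	})
	for i, s := range all {
		if s.id == targetID {
			return i + 1
		}
	}
	return -1
}

// diagnose compares the exact rank with whether HNSW returned the worker.
func diagnose(rank, k int, inHNSWTopK bool) string {
	switch {
	case rank < 0:
		return "fresh worker not in the vector set"
	case rank <= k && !inHNSWTopK:
		return "HNSW recall miss: exact search ranks it in top-K"
	case rank > k:
		return "ranking issue: even exact search places it below top-K"
	default:
		return "ok: found by both"
	}
}

func main() {
	query := []float64{1, 0}
	vectors := map[string][]float64{
		"existing-1": {0.9, 0.1},
		"existing-2": {0.5, 0.5},
		"fresh-1":    {0.95, 0.05},
	}
	r := exactRank(query, vectors, "fresh-1")
	fmt.Println(r, diagnose(r, 2, false))
}
```

Brute force is O(n) per query. That is fine as a one-off diagnostic over a few thousand workers, but it is not a replacement for the index.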
```diff
@@ -161,7 +161,7 @@ refresh_every = "1s"
 [embedd]
 bind = "127.0.0.1:3216"
 provider_url = "http://localhost:11434"
-default_model = "nomic-embed-text"
+default_model = "nomic-embed-text-v2-moe"
 
 [vectord]
 bind = "127.0.0.1:3215"
```
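This commit edits the same `[embedd] default_model` value in several config files (their paths aren't shown on this page), so the copies can drift apart. The sketch below checks that they agree. It assumes the configs keep the simple `key = "value"` form seen in these hunks. It is a line scanner, not a TOML parser, and `embedModelIn` / `sameModel` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// embedModelIn returns the quoted default_model value under [embedd].
// Naive on purpose: handles `key = "value"` lines (optionally followed by a
// # comment) and `[section]` headers, not the full TOML grammar.
func embedModelIn(toml string) (string, bool) {
	section := ""
	sc := bufio.NewScanner(strings.NewReader(toml))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.Trim(line, "[]")
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok || section != "embedd" || strings.TrimSpace(key) != "default_model" {
			continue
		}
		val = strings.TrimSpace(val)
		if !strings.HasPrefix(val, `"`) {
			continue
		}
		if end := strings.Index(val[1:], `"`); end >= 0 {
			return val[1 : end+1], true
		}
	}
	return "", false
}

// sameModel returns the shared model name, or an error naming a config
// that is missing the key or disagrees with the others.
func sameModel(configs map[string]string) (string, error) {
	want := ""
	for name, text := range configs {
		got, ok := embedModelIn(text)
		if !ok {
			return "", fmt.Errorf("%s: no [embedd] default_model", name)
		}
		if want == "" {
			want = got
		} else if got != want {
			return "", fmt.Errorf("%s: default_model %q, expected %q", name, got, want)
		}
	}
	return want, nil
}

func main() {
	cfg := "[embedd]\nbind = \"127.0.0.1:3216\"\ndefault_model = \"nomic-embed-text-v2-moe\"\n"
	model, err := sameModel(map[string]string{"config-a.toml": cfg, "config-b.toml": cfg})
	fmt.Println(model, err)
}
```

A real TOML library would be the sturdier choice. The line scan keeps the example dependency-free.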