root 59379c624d Fix Ollama timeout: set num_ctx dynamically, truncate oversized prompts
Root cause: query_ollama() sent no num_ctx option, so Ollama fell back
to its 2048-token default context window. Research mode with 15
questions builds prompts that exceed the models' context windows,
causing Ollama to hang until the 300s timeout.
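
For reference, Ollama's /api/generate endpoint only enlarges the
context window when num_ctx is passed in the options field. A minimal
sketch of the failing shape of request (model name and prompt are
illustrative, not the repo's exact code):

    import requests

    long_prompt = "question\n" * 4000  # well past 2048 tokens

    # No "options" field, so Ollama applies its 2048-token default
    # num_ctx; input beyond that window is clipped, and oversized
    # prompts can stall until the client-side timeout fires.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": long_prompt,
              "stream": False},
        timeout=300,
    )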

Fix (sketched in the code below):
- Calculate num_ctx from the prompt size plus a 1024-token response buffer
- Cap num_ctx at the model's actual context limit
- Truncate prompts that exceed the context window minus 512 response tokens
- Use smart_truncate() to preserve the start and end of the prompt
- Update the MODEL_CONTEXT map with accurate limits for all local models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:29:11 -05:00
Description: LLM Team UI - Full-stack local AI orchestration platform