Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted to 2048 tokens. Research mode with 15 questions builds prompts that exceed model context windows, causing Ollama to hang until the 300s timeout.

Fix:
- Calculate num_ctx from prompt size plus a 1024-token response buffer
- Cap num_ctx at the model's actual context limit
- Truncate prompts that exceed the context window minus 512 response tokens
- Use smart_truncate() to preserve the start and end of the prompt
- Update the MODEL_CONTEXT map with accurate limits for all local models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
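A minimal sketch of the fixed request path follows. The names query_ollama(), smart_truncate(), and MODEL_CONTEXT come from the commit itself; the chars-per-token estimator, the specific model entries and limits, and the HTTP details are assumptions for illustration, not the project's exact code.

```python
import requests

# Context limits for local models (entries and values here are illustrative).
MODEL_CONTEXT = {
    "llama3.1:8b": 131072,
    "qwen2.5:7b": 32768,
    "mistral:7b": 32768,
}

RESPONSE_BUFFER = 1024   # tokens reserved for the model's reply
TRUNCATE_RESERVE = 512   # tokens reserved when truncating oversized prompts

def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return len(text) // 4 + 1

def smart_truncate(prompt: str, max_tokens: int) -> str:
    # Keep the start and end of the prompt and drop the middle, so the
    # instructions and the final question both survive truncation.
    max_chars = max_tokens * 4
    if len(prompt) <= max_chars:
        return prompt
    half = max_chars // 2
    return prompt[:half] + "\n...[truncated]...\n" + prompt[-half:]

def query_ollama(model: str, prompt: str, timeout: int = 300) -> str:
    limit = MODEL_CONTEXT.get(model, 8192)
    # Truncate prompts that would not fit alongside the response reserve.
    prompt = smart_truncate(prompt, limit - TRUNCATE_RESERVE)
    # Size num_ctx to the prompt plus a response buffer, capped at the
    # model's actual limit. Previously no num_ctx was sent, so Ollama
    # silently fell back to its 2048-token default.
    num_ctx = min(estimate_tokens(prompt) + RESPONSE_BUFFER, limit)
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx},
        },
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

With this shape, a 15-question research prompt either fits in a context window sized to hold it, or is trimmed from the middle so the request can still complete instead of hanging to the 300s timeout.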