63 Commits

Author · SHA1 · Message · Date
root
ae53ffe451 UI overhaul + adaptive pipeline + response cache + image generation + ComfyUI
Layout & UX:
- Full-screen composer on load, full-screen output on run (no split view)
- Output scrolls within container (flex-shrink:0 fix), no page scroll
- Progress panel: fixed position above output, collapses to thin bar, fades on completion
- Output cards edge-to-edge in output mode (no panel border/padding/max-width)
- Modern theme as default, theme persists across all pages (CSS injection fix)
- Reddit theme renamed to Coral (to avoid copyright issues)

Markdown & Typography:
- Marked.js + DOMPurify for proper markdown rendering in response cards
- Editorial typography: custom numbered list counters, gold bullet dots, blockquote styling
- Table striping, code blocks with purple syntax, link underlines
- Text normalization: strips excessive indentation, collapses blank lines between list items
- Removed mermaid.js (unreliable with LLM output, caused visible errors)

Response Cache:
- DB tables: response_cache + response_cache_history
- Cache key: SHA256(prompt + mode + sorted models)
- Instant return on cache hit, auto-upgrade when better score arrives
- Full version history for training data export
- "Fresh Run" button to bypass cache
- API: /api/cache/stats, /cache/history, /cache/export, /cache/clear
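A minimal sketch of the cache key described above (helper name and field serialization are assumptions; the real code may join the fields differently):

```python
import hashlib
import json

def cache_key(prompt, mode, models):
    """Deterministic cache key: SHA256 over prompt, mode, and the
    sorted model list, so model order doesn't fragment the cache."""
    payload = json.dumps({"prompt": prompt, "mode": mode,
                          "models": sorted(models)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the models is what makes `["qwen", "llama"]` and `["llama", "qwen"]` hit the same cache row.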

Adaptive Pipeline (new mode):
- Self-evaluating models: answer + confidence score + limitations
- Quality gates: below threshold triggers RAG retrieval + model escalation
- Knowledge base: successful responses embedded and stored for future retrieval
- Flywheel: system gets smarter with every run
- DB tables: knowledge_base + adaptive_runs
- API: /api/knowledge-base/stats, /search, /entries, /adaptive-runs

Model Error Handling:
- Full HTTP error code coverage (429, 402, 404, 401, 403, 500-504, timeouts)
- safe_query_with_fallback: failed model's role taken by next available model
- Runners updated: debate, pipeline, validator use fallback system
- UI shows "model X failed, model Y took over" notices
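The fallback behavior can be sketched like this (signature hypothetical; the real `safe_query_with_fallback` also inspects specific HTTP status codes):

```python
def safe_query_with_fallback(models, query_fn):
    """Try each model in order; if one raises (429, timeout, etc.),
    the next available model takes over its role. Returns the answer
    plus notices the UI can render as 'model X failed, Y took over'."""
    notices = []
    for i, model in enumerate(models):
        try:
            return query_fn(model), notices
        except Exception as exc:
            nxt = models[i + 1] if i + 1 < len(models) else None
            if nxt is not None:
                notices.append(f"model {model} failed ({exc}), "
                               f"model {nxt} took over")
    raise RuntimeError("all models failed")
```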

Image Generation:
- SDXL Turbo via diffusers (fallback) + ComfyUI + DreamShaper XL Turbo (primary)
- ComfyUI on :8188 with DPM++ SDE Karras sampler, 8 steps
- Abstract editorial style: 8 rotating prompts, forced no-people/no-text
- Disk cache for generated images
- "Illustrate" button on every response card
- Imagegen proxy at :3600, API endpoint /api/imagegen
- 1024x512 at ~3.5s per image (ComfyUI) or ~1s (diffusers fallback)

Prompt Effects:
- Sample chip animations: click-to-swap with exit/enter transitions
- Shuffle icon on hover to cycle prompts without typing
- Typewriter spam-click protection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 09:33:46 -05:00
root
a484c05189 Major feature batch: optimization UX, security hardening, prompt effects, Pipeline Lab
Optimization & History:
- Fix optimization history display on both /history and main page slide-out panel
- Full card layout with score bars, "Use This" on all variations, A/B compare, export
- Deep Optimize: chain 2-3 rounds, feed each winner to the next
- Prompt template library: save winners, browse as quick-start chips
- Mode recommendation engine from historical scores
- Score calibration: strict anchor examples (scores now spread 4-8, not 7-9)

Security Hardening:
- Auto-escalation: 3 violations in 60s triggers instant ban + high-alert mode (30s scans)
- Sentinel prompt injection defense: sanitize log data, adversarial boundary instruction
- XSS fixes: escapeHtml on model names, mode labels in history panel
- Log redaction: passwords/tokens/secrets auto-redacted from log display
- Rate-limited /api/admin/logs endpoint (10 req/min)
- HSTS + COOP headers, persistent session secret, HttpOnly+SameSite cookies
- Concurrent ban execution via ThreadPoolExecutor
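The 3-violations-in-60s escalation is a sliding-window count; a sketch (names hypothetical, thresholds taken from the commit message):

```python
import time
from collections import defaultdict, deque

VIOLATION_LIMIT = 3    # from the commit message
WINDOW_SECONDS = 60

_violations = defaultdict(deque)  # ip -> timestamps of recent violations

def record_violation(ip, now=None):
    """Record one violation; return True when the IP crosses 3
    violations inside 60s and should trigger the instant ban plus
    high-alert mode (30s scans)."""
    now = time.time() if now is None else now
    window = _violations[ip]
    window.append(now)
    # Drop violations that have aged out of the 60s window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= VIOLATION_LIMIT
```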

Prompt Window (pretext integration):
- Canvas particle system: keystroke particles, focus sparkle, paste explosion
- Ghost text typewriter: cycling placeholder with animated typing
- Pretext-powered line measurement for accurate metrics
- Mode-colored particle cascade on mode switch
- Sample prompt typewriter effect with spam-click protection
- Live metrics bar: chars, words, lines, est. tokens

Showcase mode now allows /optimize, /deep-optimize, /score endpoints.
CSP updated for Google Fonts + esm.sh (pretext).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 04:42:24 -05:00
root
ac54743f54 Fix lost runs on disconnect + separate public/admin runs in history
Server-side save (survives page refresh/close):
- Moved save_run() from generator (client-dependent) into pipeline thread
- Pipeline thread collects responses server-side independently
- save_run() executes in pipeline thread's finally block — ALWAYS runs
- Even if user closes browser mid-run, the run completes and saves to DB
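The shape of the fix, reduced to its core (a sketch; the real pipeline thread also stores `_run_config` up front and emits a `run_saved` SSE event after saving):

```python
def run_pipeline(prompt, query_models, save_run):
    """Pipeline thread: collect responses server-side, then save in a
    `finally` block so the run persists even if the client-dependent
    generator dies when the browser closes mid-run."""
    collected = []
    try:
        for resp in query_models(prompt):
            collected.append(resp)
    finally:
        save_run(prompt, collected)  # ALWAYS runs, client or no client
```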

Public user tracking:
- Runs from demo/showcase users tagged with config.owner = "public"
- Admin runs tagged with actual username
- History list shows orange "PUB" badge on public user runs
- owner column added to history list query for fast filtering

Architecture change:
- _pipeline_collected[] built by pipeline thread (not generator loop)
- _run_config stored before generator starts, accessible by pipeline thread
- run_saved SSE event emitted from pipeline thread after save
- Generator's collected[] still tracks for display, but save is independent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:57:10 -05:00
root
39c421806d Add mass selection in history: select-all, shift-click range, count badge
Select All checkbox in header row:
- Toggles all visible checkboxes at once
- Shows indeterminate state when partially selected
- Syncs with individual checkbox changes

Shift-click range selection:
- Hold Shift + click to select/deselect a range of rows
- Tracks last clicked index for range calculation

Selection count badge:
- Shows "N selected / M runs" in the run count badge when items selected
- Updates on every checkbox change

Header updated to include Score column matching the data grid.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:53:51 -05:00
root
d4d114b2fd Fix demo/showcase banner: auto-fade, no nav blocking, click to dismiss
Banner no longer covers navigation:
- Pushes content down with body padding instead of overlapping
- Fades out automatically after 10 seconds
- Click to dismiss immediately
- Remembered per session via sessionStorage (won't reappear after dismissal)
- Smooth transition: opacity fade + slide up

Demo mode runs ARE saved to database (confirmed /api/run is in
DEMO_ALLOWED_POSTS and save_run executes in the SSE generator).
The "read-only" restriction only applies to admin actions like
archiving, tagging, banning — not to running prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:48:46 -05:00
root
ef68f5b9f7 Fix optimize crash: normalize LLM strategies that return dicts instead of strings
The analysis LLM sometimes returns strategies as objects like
[{"name": "clarity"}] instead of plain strings ["clarity"]. The
', '.join(strategies) call then fails with "expected str, got dict".

Fix: normalize each strategy to a string regardless of format —
handles str, dict with name/strategy keys, or fallback to str().
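The normalization can be sketched as (function name hypothetical):

```python
def normalize_strategies(strategies):
    """Coerce each LLM-returned strategy to a plain string: handles
    str, dicts like {"name": "clarity"} or {"strategy": "depth"},
    and anything else via str() fallback."""
    out = []
    for s in strategies:
        if isinstance(s, str):
            out.append(s)
        elif isinstance(s, dict):
            out.append(str(s.get("name") or s.get("strategy") or s))
        else:
            out.append(str(s))
    return out
```

After this, `', '.join(...)` is safe regardless of what shape the analysis LLM returns.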

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:24:53 -05:00
root
462d81868f Overhaul history: scores visible, optimization history, full run tracking
Archive list improvements:
- Score column added to run list (color-coded: green 7+, amber 5+, red)
- OPT badge on runs generated by optimization (shows parent run)
- quality_score, score_method, tags, source, parent_run now in list query

Detail panel score display:
- Large color-coded score badge in header (e.g., "8.0/10" in green)
- Shows scoring method (auto/thumbs up/thumbs down)
- Persistent — visible every time you open the detail panel, not just once
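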

Full optimization history section:
- Shows all optimization runs with timestamps, scores, call counts
- Each run lists ranked variations with strategy, mode, score
- Winner highlighted with star and green border
- "View" button opens any variation's full detail
- "Use" button on winner sends prompt to composer via sessionStorage
- Always loads from /api/optimize-history — no stale data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:32:50 -05:00
root
856f584666 Fix "Use This" button: transfer prompt across pages via sessionStorage
The optimization "Use This" button was on the /history page but tried
to set document.getElementById('prompt') which only exists on /. The
JS value was lost on navigation.

Fix: store prompt in sessionStorage, pick it up on main page load.
Also opens the composer overlay so the user sees the loaded prompt
immediately instead of landing on an empty output view.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:24:59 -05:00
root
7b9b7f6641 Add optimization history, reconnect, and duplicate prevention
History detail panel now shows optimization results:
- If a run has been optimized, shows results section with best score,
  original score, and link to view the winning variation
- Fetches full optimization history via GET /api/optimize-history/<id>
- Shows count of optimizations run and child variation count
- Button changes to "Re-Optimize" for already-optimized runs

Reconnect to active optimizations:
- If optimization is already running, returns job_id in error response
- Frontend detects this and reconnects to the SSE stream
- No more losing progress when navigating away and coming back
- Refactored startOptimize() into startOptimize() + _showOptimizeStream()

New endpoint: GET /api/optimize-history/<run_id>
- Returns all pipeline_runs where pipeline='optimize' for that parent
- Returns all child team_runs created by optimization
- Includes scores, strategies, rankings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:20:01 -05:00
root
bc2ad7c1a9 Fix Lab UX: visual selection, auto-navigate, live status, stuck detection
Lab experiment selection:
- Selected experiment now highlighted with accent border + glow
- Clicking auto-navigates to relevant tab (config if idle, monitor if running)
- No more silent toast-only feedback

Live status display:
- SSE "status" events now rendered in monitor (were silently dropped before)
- Shows real-time: "Proposing change... (trial 3/50)" during execution
- Error messages displayed inline instead of just toast

Stuck experiment fix:
- On app startup, reset all "running" experiments to "paused"
- Prevents ghost "running" status after service restart
- Fixed experiments 2, 3, 4 that showed running but had dead threads

Trial cap fix:
- Changed from lifetime cap (trial_num < 50) to per-run cap (trials_this_run < 50)
- Prevents runaway experiments like #1 that accumulated 3762 trials
- Shows trial progress in status: "trial 3/50"
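The per-run cap vs. lifetime cap distinction, as a sketch (loop body simplified):

```python
MAX_TRIALS_PER_RUN = 50  # per-run cap from the commit message

def ratchet_loop(run_trial, trial_num):
    """Count trials for THIS start cycle instead of comparing the
    lifetime trial_num against 50, so an experiment that already
    logged thousands of trials can still be started again."""
    trials_this_run = 0
    while trials_this_run < MAX_TRIALS_PER_RUN:
        trial_num += 1
        trials_this_run += 1
        run_trial(trial_num)  # status can show f"trial {trials_this_run}/50"
    return trial_num
```

With the old lifetime check (`trial_num < 50`), experiment #1 at 3762 trials could never have been capped in the first place and would never run again after a fix.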

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:14:12 -05:00
root
3b4fa449f1 Add Auto-Optimize: AI agent for history-driven prompt improvement
When viewing any past run in History, click "Optimize" to trigger an
automated workflow that:

1. Analyzes the original prompt + responses + score
2. Identifies improvement strategies (clarity, depth, specificity, etc.)
3. Generates 3-5 improved prompt variations
4. Tests each variation across original mode + brainstorm
5. Auto-scores all results via background judge
6. Ranks results and highlights the winner
7. "Use This" button loads winning prompt into composer

Architecture:
- _run_optimize(job_id, run_id): background thread, 5-phase engine
- POST /api/runs/<id>/optimize: starts optimization job
- GET /api/optimize/<job_id>/stream: SSE for live progress
- Budget-capped at 15 model calls per optimization
- Child runs saved as real team_runs (source: "optimize")
- Auto-scored → feeds into analytics + routing table automatically
- Results saved to pipeline_runs (pipeline: "optimize")

Frontend:
- "Optimize" button in history detail panel (accent-colored)
- startOptimize(runId): replaces detail view with live optimization stream
- Phase cards: Analysis → Variations → Testing → Ranked Results
- Score bars with color coding (green/amber/red)
- Winner row highlighted with star + "Use This" button

Closes the learning loop: system studies its own history → generates
better prompts → tests them → scores results → routing table improves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:03:27 -05:00
root
8ad221b41f Add self-improving pipeline: auto-scoring, analytics, reactive refine, routing intelligence
Phase 1 — Run Quality Scoring:
- Auto-score every run in background via qwen2.5 judge (1-10)
- Thumbs up/down vote buttons on output cards
- POST /api/runs/<id>/score for user feedback
- run_saved SSE event enables vote buttons after run completes
- User votes override auto-scores (race-condition safe)
- DB: quality_score, score_method, score_metadata on team_runs

Phase 1 — Analytics Dashboard:
- GET /api/admin/analytics: score-by-mode, score-by-model, heatmap, trend
- New Analytics tab on Admin page with bar charts, heatmap table, trend sparkline
- Scoring coverage tracker (scored vs total runs)
- Model × Mode heatmap with color-coded cells

Phase 2 — Reactive Pipeline:
- _assess_stage(): orchestrator evaluates each stage's output mid-run
- _reactive_decide(): can insert/skip stages based on assessment
- Dynamic stage loop replaces fixed iteration in run_refine()
- Budget tracking prevents infinite loops (max_stages hard cap)
- Reactive decisions render as dashed notification bars between cards
- Pipeline adjusts in real-time: "Inserting VALIDATE — high severity gaps found"

Phase 3 — Cross-Run Learning:
- _build_routing_table(): queries historical scores for model×mode performance
- Best stage sequences per content_type from pipeline_runs
- Routing table cached with 30-min TTL
- Auto-Refine strategist prompt augmented with historical data
- GET /api/suggest-models?mode=X returns top 3 models for that mode
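The 30-min TTL cache around the routing table, sketched (here `build_fn` stands in for the real `_build_routing_table` query):

```python
import time

ROUTING_TTL = 30 * 60  # 30-minute TTL from the commit message
_cache = {"table": None, "built_at": 0.0}

def get_routing_table(build_fn, now=None):
    """Return the cached routing table, rebuilding from historical
    scores only when the TTL has expired."""
    now = time.time() if now is None else now
    if _cache["table"] is None or now - _cache["built_at"] > ROUTING_TTL:
        _cache["table"] = build_fn()
        _cache["built_at"] = now
    return _cache["table"]
```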

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:18:32 -05:00
root
c2cc211f21 Expand sample prompts to 5 per tier across all 21 modes (315 total)
Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5
prompts per difficulty level. Renderer picks one random prompt from
each tier on every mode switch, so users see fresh examples each time.

315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:22:35 -05:00
root
0d09bb5293 Add Auto-Refine mode, composer UX, select dropdown fixes
Auto-Refine mode (21st mode):
- AI strategist analyzes content type and quality
- Selects 3-5 optimal refinement stages from 8 available
  (validate, critique, expand, structure, stakeholder, clarity,
  edge_cases, align)
- Executes stages sequentially with output chaining
- Final synthesis produces polished version
- Stages are content-aware — PRD gets different pipeline than essay
- Saved to pipeline_runs DB

Composer UX overhaul:
- Initial state: full-screen centered composer overlay
- Mode grid + models + prompt front-and-center for new users
- On Run: composer closes, output takes full screen width
- "New Prompt" button in header nav bar (not floating)
- Close button (×) on composer overlay
- Works across all 4 themes + mobile

Dropdown fixes:
- Dark theme: select options get solid #1a1d23 bg
- Modern theme: select options get solid #18181b bg
- Light/Reddit: select options get white bg with dark text
- Native <option> elements now readable in all themes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:12:35 -05:00
root
713f18a65f Add 4-theme system, fix enrichment panel layout, enable Docker on boot
Theme system (Dark/Light/Reddit/Modern):
- Injectable CSS/JS via after_request — zero template changes
- Dark: original gold accent on black
- Light: warm off-white with indigo accent, readable buttons
- Reddit: bluish-gray bg, orange accent, pill buttons, 8px corners
- Modern: glassmorphism dark, blue accent, frosted cards, 16px corners
- Toggle cycles all 4 themes, persists via localStorage
- Button injected into every page header automatically

Enrichment panel fix:
- threat-card changed from display:flex to display:grid
- enrich-panel now spans full width via grid-column:1/-1
- Added .enrich-section/.enrich-title/.enrich-grid CSS classes
- Sections (Geo, Deep Scan, AI) visually separated with dividers

Iterate/repipe modal themed for all modes:
- Light themes get white modal bg, proper contrast
- Reddit gets rounded corners + orange accent
- Modern gets glassmorphism modal with blue glow

Scrollbar styling across all themes:
- Rounded, properly sized (6-8px), theme-colored thumbs
- macOS-style inset look via background-clip

Layout improvements:
- Output area min-height 400px, padding-bottom 40px
- Empty state centered with more breathing room
- Docker + containerd enabled at boot for web-check survival

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 04:31:01 -05:00
root
411040f206 Fix IP banning: nginx deny list + connection kill for instant enforcement
fail2ban was using nftables action while UFW uses iptables-nft, so bans
were recorded but never enforced. Added three-layer ban enforcement:
1. nginx deny list (/etc/nginx/banned_ips.conf) for instant 403
2. ss -K to kill existing TCP connections on ban
3. Auto-sync nginx deny file on ban/unban (manual, mass, AI sentinel)
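Layer 1 reduces to rendering one `deny <ip>;` line per banned address; a sketch (the real sync also reloads nginx and runs `ss -K` per newly banned IP):

```python
def render_deny_conf(banned_ips):
    """Render the nginx deny list for /etc/nginx/banned_ips.conf:
    deduplicated, sorted, one `deny <ip>;` directive per line."""
    return "".join(f"deny {ip};\n" for ip in sorted(set(banned_ips)))
```

nginx answers 403 for any listed IP as soon as the file is re-included and reloaded, independent of whatever firewall backend fail2ban is fighting with.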

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:05:49 -05:00
root
eea8ff46db Three-tier access: Off → Demo → Showcase
Off: login required for everything

Demo: public gets Team UI + run modes + admin page (browse only)
  Blocked: /logs, /admin/monitor, /history, threat intel APIs,
  sentinel, wall-of-shame, meta-pipelines, self-reports, vectors

Showcase: public gets full read-only access to ALL pages
  Allowed: admin, monitor, logs, threat intel, enrichment,
  lab, history, self-analysis, meta-pipelines
  Blocked: config changes, bans, deletes, bulk operations

Admin (logged in): full access to everything always

SHOWCASE_ONLY_ROUTES set defines which pages/APIs are
blocked in basic demo but allowed in showcase mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:29:39 -05:00
root
ffd5e43709 Fix demo/showcase toggle: separate buttons, distinct modes
Problem: plain toggle set showcase=true, so demo always became showcase.
No way to enable basic demo mode separately.

Fix:
- Three explicit buttons: [Demo] [Showcase] [Off]
- Demo mode: active=true, showcase=false (team UI only)
- Showcase mode: active=true, showcase=true (full read-only admin)
- Off: both false
- Plain toggle cycles demo on/off without touching showcase
- Clear status text shows which mode and what it means

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:26:48 -05:00
root
732f29d836 Fix showcase toggle: remove /api/demo/toggle from blocked POSTs
The demo toggle route was in DEMO_BLOCKED_POSTS, so once showcase
was enabled, the before_request handler blocked the toggle POST
even for admins (the before_request check ran before the route's
own admin check could verify the session).

Fix: removed /api/demo/toggle from blocked list. The route already
has its own admin-only check (line 460).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:24:45 -05:00
root
f0cf69b4bd Fix NameError: ADMIN_WRITE_ROUTES renamed to DEMO_BLOCKED_POSTS
before_request handler still referenced old variable name.
Updated to use DEMO_BLOCKED_POSTS with simpler path-in-set check.
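The simpler path-in-set check looks roughly like this (set contents hypothetical; note the lesson from the later toggle fix, that `/api/demo/toggle` must stay OUT of the set because this check runs before the route's own admin check):

```python
DEMO_BLOCKED_POSTS = {"/api/admin/config", "/api/admin/ban"}  # example entries

def is_blocked(path, method, demo_active):
    """before_request gate: in demo mode, block a POST only when its
    exact path is in DEMO_BLOCKED_POSTS. GETs always pass through."""
    return demo_active and method == "POST" and path in DEMO_BLOCKED_POSTS
```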

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:23:01 -05:00
root
9f48a050c8 Showcase Mode: full read-only admin access for client demos
New mode: Showcase (replaces basic demo mode for client demos)
- Visitors see EVERYTHING: Admin, Monitor, Logs, Threat Intel,
  Lab, History, Meta-Pipelines — all without logging in
- Read-only: all GET requests allowed on all routes
- Allowed POSTs: team runs, self-analysis, IP enrichment
  (read-like operations that don't modify system config)
- Blocked POSTs: config changes, bans, deletes, bulk archive

Admin UI (Security tab):
- "Enable Showcase" button (magenta) — one click to activate
- "Turn Off" button appears when active
- Clear description of what visitors can and can't do
- Status shows "SHOWCASE MODE" with magenta styling

Banner:
- Magenta gradient banner on all pages when showcase is active
- Shows: "Showcase Mode — Full Read-Only Access — Admin · Monitor · Logs · Lab · History"
- Demo button in nav shows "Showcase" in magenta

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:19:41 -05:00
root
dfab02f114 Fix meta-pipeline detail panel collapsing on auto-refresh
Auto-refresh now skips when any detail panel is open (checks for
meta-detail-* elements). Panel stays stable while reading results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:07:47 -05:00
root
c9901dbc94 Meta-pipeline UI: add Stop/Restart/Results controls per pipeline
Each pipeline card now shows:
- Status dot + name + status tag + best score
- Stop button (red) when running
- Restart button (green) when stopped/completed
- Results button (magenta) to drill into iterations
- Live progress text when running
- Stages and iteration count on info line

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:06:24 -05:00
root
28df789745 Fix runaway experiments: cap at 50 trials, fix DB permissions
Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
  unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
  for kbuser — fixed permissions for all tables and sequences.

All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:56:37 -05:00
root
4dc561af12 Meta-Pipeline: self-improving multi-mode chains on real system data
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
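The stage-chaining core is just a fold over the stage list; a sketch (here `run_stage` stands in for one mode execution):

```python
def run_meta_pipeline(stages, run_stage, source_data):
    """Chain modes in sequence: each stage's output becomes the
    next stage's input. Returns (stage, output) pairs for storage
    in the meta_runs table."""
    data = source_data
    results = []
    for stage in stages:
        data = run_stage(stage, data)
        results.append((stage, data))
    return results
```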

4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate

Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models

API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results

UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:54:08 -05:00
root
804898b658 Auto-save self-analysis reports to DB with browsable history
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)

API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text

UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:49:17 -05:00
root
28e641f939 Self-Analysis: AI reports from system's own data + Lab experiments API
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
   attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
   model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
   usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
   sentinel verdicts, fail2ban, threat intel DB → risk rating

API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data

Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
  expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation

Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:42:07 -05:00
root
ca660cbd10 Lab: add 3 experiment templates with auto-fill
Templates section below experiment list:

BASIC — Better Summaries (3 eval cases)
  Optimize summarization quality. Tests across biology, history,
  and technical content. Shows the simplest Lab workflow.

INTERMEDIATE — Code Explainer (4 eval cases)
  Find the best prompt+model to explain code to non-programmers.
  Tests loops, recursion, error handling, comprehensions.
  Shows how the ratchet evolves system prompts.

ADVANCED — Security Analyst Persona (5 eval cases)
  Evolve a cybersecurity AI across threat classification, executive
  summaries, developer education, incident response, and forensics.
  Tests multi-audience adaptation and domain expertise.

Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.

Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:32:39 -05:00
root
f34e05168b Retheme Lab page: retro-brutalist matching all other pages
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wraps: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:28:29 -05:00
root
efa547bb68 Full history page with tags, notes, vector API, and bulk ops
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses

Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)

Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations

APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
  Returns: mode, prompt, models, date, tags, notes + all response text
  Filters: ?mode=, ?tag=, ?limit=
  Each doc includes token estimate for embedding planning

Main UI: History button now links to /history page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:19:22 -05:00
root
aeab1f0194 Archive/restore history: soft-delete with toggle and bulk ops
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default

API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date

History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:13:33 -05:00
root
7948089f04 Fix sentinel countdown: sync to actual scan schedule, not page load
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
  target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
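The offset arithmetic behind the stable countdown (a sketch of the frontend math, in Python for consistency with the rest of these notes):

```python
def server_clock_offset(server_now, client_now):
    """Offset added to client time to approximate server time,
    measured once when the API is polled."""
    return server_now - client_now

def seconds_until_scan(next_scan_ts, offset, client_now):
    """Count down to the actual target timestamp, so a page refresh
    shows the same remaining time instead of restarting the timer."""
    return max(0.0, next_scan_ts - (client_now + offset))
```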

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:07:15 -05:00
root
357918013d Compact sentinel card: single-line with mini ring + collapsible verdicts
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:03:57 -05:00
root
3cdfc01835 Sentinel countdown ring timer with live stats
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:01:09 -05:00
root
418da99fa7 Wall of Shame: persistent threat intel database with drill-down table
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at
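The re-enrich-updates-existing behavior maps to SQLite's `ON CONFLICT` UPSERT keyed on the unique IP; the column set here is a trimmed-down assumption:

```python
import json
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS threat_intel (
    ip TEXT PRIMARY KEY, threat_level TEXT, summary TEXT,
    raw_json TEXT, enriched_at REAL)"""

def save_threat_intel(db, ip, threat_level, summary, raw):
    """UPSERT keyed on IP: re-enriching updates the existing row
    instead of inserting a duplicate."""
    db.execute("""
        INSERT INTO threat_intel (ip, threat_level, summary, raw_json, enriched_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(ip) DO UPDATE SET
            threat_level = excluded.threat_level,
            summary      = excluded.summary,
            raw_json     = excluded.raw_json,
            enriched_at  = excluded.enriched_at
    """, (ip, threat_level, summary, json.dumps(raw), time.time()))
```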

Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record

Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
  ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
  Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed

API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:52:34 -05:00
root
e7f12a6d93 Tighten AI security prompts — aggressive stance for private server
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
  exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore

Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — private server has no public scanning excuse
- Tighter rules for automated UA detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:49:17 -05:00
root
3c4846d52c Expand web-check enrichment: traceroute, headers, status, full rendering
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, CloudFlare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time

Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)

AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:46:43 -05:00
root
51ffd2b82c Fix enrichment: run web-check before AI analysis so data is available
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:42:23 -05:00
root
e816e81820 Integrate web-check Docker for deep IP enrichment
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP

Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context
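Layer 1 can be sketched against ip-api.com's JSON endpoint (the proxy/hosting flags must be requested explicitly via the `fields` parameter); the field picking here is an assumption about what the panel shows:

```python
def ip_api_url(ip: str) -> str:
    # ip-api.com free JSON endpoint; proxy/hosting/mobile flags are only
    # returned when asked for via the fields parameter.
    fields = "status,country,city,isp,org,as,proxy,hosting,mobile"
    return f"http://ip-api.com/json/{ip}?fields={fields}"

def geo_fields(record: dict) -> dict:
    """Pick the fields the enrichment panel renders from an ip-api.com
    JSON record; proxy/hosting drive the red warning styling."""
    return {"country": record.get("country"),
            "isp": record.get("isp"),
            "org": record.get("org"),
            "asn": record.get("as"),
            "flagged": bool(record.get("proxy") or record.get("hosting"))}
```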

Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:39:42 -05:00
root
472a5d0917 IP threat intel: sorting, mass ban, enrichment with geo + AI analysis
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber

Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"

IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
  - Threat level, classification, confidence score
  - Attack pattern description
  - Specific indicators list
  - Automated detection flag
  - Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)

Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs

Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple UA detection (fingerprint: rotating scanner)
- Sort parameter support (?sort=threat|hits|recent|banned)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:24:32 -05:00
root
de4ca533dd AI Security Sentinel: local LLM scans logs every 5 minutes
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events

API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts

Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline

Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing
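The batch call shape can be sketched as an Ollama `/api/generate` payload with `format: "json"` and temperature 0.1, plus a tolerant verdict parser; the prompt wording and helper names are assumptions:

```python
import json

def build_sentinel_payload(entries_by_ip: dict, model="qwen2.5") -> dict:
    """Sketch of the batch request to a local Ollama /api/generate endpoint.
    Low temperature for consistent classification; format=json asks for
    JSON-only output (payload shape per Ollama's generate API)."""
    lines = [f"{ip}: {len(es)} new log entries" for ip, es in entries_by_ip.items()]
    prompt = ("Classify each IP. Reply with a JSON list of "
              '{"ip","threat_level","action","attack_type","reason"}:\n'
              + "\n".join(lines))
    return {"model": model, "prompt": prompt, "stream": False,
            "format": "json", "options": {"temperature": 0.1}}

def parse_verdicts(response_text: str) -> list:
    # Tolerant parse for reliable downstream handling: malformed model
    # output yields an empty verdict list instead of an exception.
    try:
        data = json.loads(response_text)
        return data if isinstance(data, list) else data.get("verdicts", [])
    except json.JSONDecodeError:
        return []
```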

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:08:02 -05:00
root
f1bb2a92e7 Interactive threat intelligence dashboard with one-click ban
Security API:
- /api/admin/security — aggregates security log into per-IP threat intel
  (hit count, exploit scans, login fails, paths probed, threat level)
- /api/admin/security/ban — manual ban/unban via fail2ban
  (logs MANUAL_BAN/MANUAL_UNBAN to security log)

Threat Intel tab in /logs:
- Summary stats: Critical IPs, High Threat, Currently Banned
- Per-IP cards showing: threat level, hit count, scan count, paths probed
- Critical IPs have red border, high threat amber
- One-click "Ban 24h" button per IP (calls fail2ban-client banip)
- One-click "Unban" for currently banned IPs
- Banned IPs shown at reduced opacity
- LAN IPs (192.168.*) filtered out

fail2ban tuning:
- llm-team-exploit findtime: 600s → 3600s (catch slow scanners)
- llm-team-exploit maxretry: 3 → 2 (more aggressive)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:05:01 -05:00
root
21c8c2a3e5 Monitor: drill-down pipeline view with step timeline
Highlander pattern — one view at each level, clean transitions:

Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)

Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors

Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition

Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:58:34 -05:00
root
9af071df6c Retheme admin page, improve save feedback, add monitor nav link
Admin UI:
- Full retro-brutalist theme matching main UI
- JetBrains Mono headings, amber accent, 2px borders
- Animated dot-grid background + scanlines
- Square toggles (was rounded)
- Backdrop-filter blur on cards
- Nav bar with links to Team, Lab, Logs, Monitor

Save feedback:
- Every save now verifies the API response (checks d.ok)
- Toast shows what was saved: "ollama provider saved / Enabled"
- Toast shows details: "Cloud models saved / 3 models configured"
- Toast shows timeout details: "Timeouts saved / Global: 300s, 2 overrides"
- Failed saves show red toast with error message
- Toast fade-out animation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:44:58 -05:00
root
344e11f4b2 Replace GoAccess with built-in log viewer, clickable error links
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)

Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)

Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:35:17 -05:00
root
59379c624d Fix Ollama timeout: set num_ctx dynamically, truncate oversized prompts
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts that
exceed model context windows, causing Ollama to hang until the 300s
timeout.

Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
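The sizing logic above reduces to a few lines; this sketch uses the ~4 chars/token heuristic and a start-plus-end truncation as an approximation of `smart_truncate` (exact behavior assumed):

```python
def approx_tokens(text: str) -> int:
    # ~4 characters per token heuristic.
    return len(text) // 4

def smart_truncate(prompt: str, max_tokens: int) -> str:
    """Keep the start and end of an oversized prompt, dropping the middle."""
    max_chars = max_tokens * 4
    if len(prompt) <= max_chars:
        return prompt
    half = max_chars // 2
    return prompt[:half] + "\n...[truncated]...\n" + prompt[-half:]

def size_for_ollama(prompt: str, model_ctx: int, response_buffer=1024):
    """num_ctx sized to the prompt plus a response buffer, capped at the
    model's real context limit; prompt trimmed to leave 512 response tokens."""
    num_ctx = min(approx_tokens(prompt) + response_buffer, model_ctx)
    prompt = smart_truncate(prompt, model_ctx - 512)
    return prompt, {"num_ctx": num_ctx}
```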

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:29:11 -05:00
root
1ac7a436e6 Add live metrics dashboard to progress panel
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)

Metrics update every 500ms during run. On completion, all
metric values turn green. Magenta/purple theme for metric
values, micro labels underneath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:55:29 -05:00
root
c507ba1016 Progress bar: magenta→cyan gradient with green completion
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:35:00 -05:00
root
9eaac813df Sticky progress bar, phase labels, auto-scroll
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
  between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Root-caused previous errors: all 'input stream' errors were caused by
  service restarts during in-flight runs, not code bugs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:30:53 -05:00
root
c124b01681 Fix SSE stream reliability: threaded server, async keepalive, streaming responses
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
  - Runner executes in background thread, pushes events to queue
  - Heartbeat thread sends keepalive every 10s independently
  - Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
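The producer-consumer pattern above can be sketched as a generator fed by two daemon threads (function names and SSE framing here are illustrative, not the actual implementation):

```python
import queue
import threading

def sse_stream(runner, heartbeat_every=10):
    """Runner pushes events into a queue from a background thread, a second
    thread injects keepalives, and the generator drains the queue, so the
    stream never goes silent during long model queries."""
    q = queue.Queue()
    done = threading.Event()

    def produce():
        try:
            for event in runner():
                q.put(("data", event))
        finally:
            done.set()
            q.put(("end", None))

    def heartbeat():
        # wait() returns False on timeout (send keepalive), True once done.
        while not done.wait(heartbeat_every):
            q.put(("heartbeat", None))

    threading.Thread(target=produce, daemon=True).start()
    threading.Thread(target=heartbeat, daemon=True).start()

    while True:
        kind, payload = q.get()
        if kind == "end":
            break
        yield ": keepalive\n\n" if kind == "heartbeat" else f"data: {payload}\n\n"
```

In Flask this generator would be wrapped in a `Response` with the `text/event-stream` mimetype, with `threaded=True` enabled so concurrent runs don't block each other.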

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:27:42 -05:00