63 Commits

Author · SHA1 · Message · Date
root
ae53ffe451 UI overhaul + adaptive pipeline + response cache + image generation + ComfyUI
Layout & UX:
- Full-screen composer on load, full-screen output on run (no split view)
- Output scrolls within container (flex-shrink:0 fix), no page scroll
- Progress panel: fixed position above output, collapses to thin bar, fades on completion
- Output cards edge-to-edge in output mode (no panel border/padding/max-width)
- Modern theme as default, theme persists across all pages (CSS injection fix)
- Reddit theme renamed to Coral (to avoid copyright issues)

Markdown & Typography:
- Marked.js + DOMPurify for proper markdown rendering in response cards
- Editorial typography: custom numbered list counters, gold bullet dots, blockquote styling
- Table striping, code blocks with purple syntax, link underlines
- Text normalization: strips excessive indentation, collapses blank lines between list items
- Removed mermaid.js (unreliable with LLM output, caused visible errors)

Response Cache:
- DB tables: response_cache + response_cache_history
- Cache key: SHA256(prompt + mode + sorted models)
- Instant return on cache hit, auto-upgrade when better score arrives
- Full version history for training data export
- "Fresh Run" button to bypass cache
- API: /api/cache/stats, /cache/history, /cache/export, /cache/clear
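A minimal sketch of the cache key described above (helper name and field serialization are assumptions; the real code may join the fields differently):

```python
import hashlib
import json

def cache_key(prompt, mode, models):
    """Deterministic cache key: SHA256 over prompt, mode, and the
    sorted model list, so model order doesn't fragment the cache."""
    payload = json.dumps({"prompt": prompt, "mode": mode,
                          "models": sorted(models)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the models is what makes `["qwen", "llama"]` and `["llama", "qwen"]` hit the same cache row.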

Adaptive Pipeline (new mode):
- Self-evaluating models: answer + confidence score + limitations
- Quality gates: below threshold triggers RAG retrieval + model escalation
- Knowledge base: successful responses embedded and stored for future retrieval
- Flywheel: system gets smarter with every run
- DB tables: knowledge_base + adaptive_runs
- API: /api/knowledge-base/stats, /search, /entries, /adaptive-runs

Model Error Handling:
- Full HTTP error code coverage (429, 402, 404, 401, 403, 500-504, timeouts)
- safe_query_with_fallback: failed model's role taken by next available model
- Runners updated: debate, pipeline, validator use fallback system
- UI shows "model X failed, model Y took over" notices
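The fallback behavior can be sketched like this (signature hypothetical; the real `safe_query_with_fallback` also inspects specific HTTP status codes):

```python
def safe_query_with_fallback(models, query_fn):
    """Try each model in order; if one raises (429, timeout, etc.),
    the next available model takes over its role. Returns the answer
    plus notices the UI can render as 'model X failed, Y took over'."""
    notices = []
    for i, model in enumerate(models):
        try:
            return query_fn(model), notices
        except Exception as exc:
            nxt = models[i + 1] if i + 1 < len(models) else None
            if nxt is not None:
                notices.append(f"model {model} failed ({exc}), "
                               f"model {nxt} took over")
    raise RuntimeError("all models failed")
```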

Image Generation:
- SDXL Turbo via diffusers (fallback) + ComfyUI + DreamShaper XL Turbo (primary)
- ComfyUI on :8188 with DPM++ SDE Karras sampler, 8 steps
- Abstract editorial style: 8 rotating prompts, forced no-people/no-text
- Disk cache for generated images
- "Illustrate" button on every response card
- Imagegen proxy at :3600, API endpoint /api/imagegen
- 1024x512 at ~3.5s per image (ComfyUI) or ~1s (diffusers fallback)

Prompt Effects:
- Sample chip animations: click-to-swap with exit/enter transitions
- Shuffle icon on hover to cycle prompts without typing
- Typewriter spam-click protection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 09:33:46 -05:00
root
a484c05189 Major feature batch: optimization UX, security hardening, prompt effects, Pipeline Lab
Optimization & History:
- Fix optimization history display on both /history and main page slide-out panel
- Full card layout with score bars, "Use This" on all variations, A/B compare, export
- Deep Optimize: chain 2-3 rounds, feed each winner to the next
- Prompt template library: save winners, browse as quick-start chips
- Mode recommendation engine from historical scores
- Score calibration: strict anchor examples (scores now spread 4-8, not 7-9)

Security Hardening:
- Auto-escalation: 3 violations in 60s triggers instant ban + high-alert mode (30s scans)
- Sentinel prompt injection defense: sanitize log data, adversarial boundary instruction
- XSS fixes: escapeHtml on model names, mode labels in history panel
- Log redaction: passwords/tokens/secrets auto-redacted from log display
- Rate-limited /api/admin/logs endpoint (10 req/min)
- HSTS + COOP headers, persistent session secret, HttpOnly+SameSite cookies
- Concurrent ban execution via ThreadPoolExecutor
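The 3-violations-in-60s escalation is a sliding-window count; a sketch (names hypothetical, thresholds taken from the commit message):

```python
import time
from collections import defaultdict, deque

VIOLATION_LIMIT = 3    # from the commit message
WINDOW_SECONDS = 60

_violations = defaultdict(deque)  # ip -> timestamps of recent violations

def record_violation(ip, now=None):
    """Record one violation; return True when the IP crosses 3
    violations inside 60s and should trigger the instant ban plus
    high-alert mode (30s scans)."""
    now = time.time() if now is None else now
    window = _violations[ip]
    window.append(now)
    # Drop violations that have aged out of the 60s window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= VIOLATION_LIMIT
```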

Prompt Window (pretext integration):
- Canvas particle system: keystroke particles, focus sparkle, paste explosion
- Ghost text typewriter: cycling placeholder with animated typing
- Pretext-powered line measurement for accurate metrics
- Mode-colored particle cascade on mode switch
- Sample prompt typewriter effect with spam-click protection
- Live metrics bar: chars, words, lines, est. tokens

Showcase mode now allows /optimize, /deep-optimize, /score endpoints.
CSP updated for Google Fonts + esm.sh (pretext).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 04:42:24 -05:00
root
ac54743f54 Fix lost runs on disconnect + separate public/admin runs in history
Server-side save (survives page refresh/close):
- Moved save_run() from generator (client-dependent) into pipeline thread
- Pipeline thread collects responses server-side independently
- save_run() executes in pipeline thread's finally block — ALWAYS runs
- Even if user closes browser mid-run, the run completes and saves to DB
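The shape of the fix, reduced to its core (a sketch; the real pipeline thread also stores `_run_config` up front and emits a `run_saved` SSE event after saving):

```python
def run_pipeline(prompt, query_models, save_run):
    """Pipeline thread: collect responses server-side, then save in a
    `finally` block so the run persists even if the client-dependent
    generator dies when the browser closes mid-run."""
    collected = []
    try:
        for resp in query_models(prompt):
            collected.append(resp)
    finally:
        save_run(prompt, collected)  # ALWAYS runs, client or no client
```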

Public user tracking:
- Runs from demo/showcase users tagged with config.owner = "public"
- Admin runs tagged with actual username
- History list shows orange "PUB" badge on public user runs
- owner column added to history list query for fast filtering

Architecture change:
- _pipeline_collected[] built by pipeline thread (not generator loop)
- _run_config stored before generator starts, accessible by pipeline thread
- run_saved SSE event emitted from pipeline thread after save
- Generator's collected[] still tracks for display, but save is independent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:57:10 -05:00
root
39c421806d Add mass selection in history: select-all, shift-click range, count badge
Select All checkbox in header row:
- Toggles all visible checkboxes at once
- Shows indeterminate state when partially selected
- Syncs with individual checkbox changes

Shift-click range selection:
- Hold Shift + click to select/deselect a range of rows
- Tracks last clicked index for range calculation

Selection count badge:
- Shows "N selected / M runs" in the run count badge when items selected
- Updates on every checkbox change

Header updated to include Score column matching the data grid.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:53:51 -05:00
root
d4d114b2fd Fix demo/showcase banner: auto-fade, no nav blocking, click to dismiss
Banner no longer covers navigation:
- Pushes content down with body padding instead of overlapping
- Fades out automatically after 10 seconds
- Click to dismiss immediately
- Remembered per session via sessionStorage (won't reappear after dismissal)
- Smooth transition: opacity fade + slide up

Demo mode runs ARE saved to database (confirmed /api/run is in
DEMO_ALLOWED_POSTS and save_run executes in the SSE generator).
The "read-only" restriction only applies to admin actions like
archiving, tagging, banning — not to running prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:48:46 -05:00
root
ef68f5b9f7 Fix optimize crash: normalize LLM strategies that return dicts instead of strings
The analysis LLM sometimes returns strategies as objects like
[{"name": "clarity"}] instead of plain strings ["clarity"]. The
', '.join(strategies) call then fails with "expected str, got dict".

Fix: normalize each strategy to a string regardless of format —
handles str, dict with name/strategy keys, or fallback to str().
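The normalization can be sketched as (function name hypothetical):

```python
def normalize_strategies(strategies):
    """Coerce each LLM-returned strategy to a plain string: handles
    str, dicts like {"name": "clarity"} or {"strategy": "depth"},
    and anything else via str() fallback."""
    out = []
    for s in strategies:
        if isinstance(s, str):
            out.append(s)
        elif isinstance(s, dict):
            out.append(str(s.get("name") or s.get("strategy") or s))
        else:
            out.append(str(s))
    return out
```

After this, `', '.join(...)` is safe regardless of what shape the analysis LLM returns.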

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:24:53 -05:00
root
462d81868f Overhaul history: scores visible, optimization history, full run tracking
Archive list improvements:
- Score column added to run list (color-coded: green 7+, amber 5+, red)
- OPT badge on runs generated by optimization (shows parent run)
- quality_score, score_method, tags, source, parent_run now in list query

Detail panel score display:
- Large color-coded score badge in header (e.g., "8.0/10" in green)
- Shows scoring method (auto/thumbs up/thumbs down)
- Persistent — visible every time you open the detail panel, not just once
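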

Full optimization history section:
- Shows all optimization runs with timestamps, scores, call counts
- Each run lists ranked variations with strategy, mode, score
- Winner highlighted with star and green border
- "View" button opens any variation's full detail
- "Use" button on winner sends prompt to composer via sessionStorage
- Always loads from /api/optimize-history — no stale data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:32:50 -05:00
root
856f584666 Fix "Use This" button: transfer prompt across pages via sessionStorage
The optimization "Use This" button was on the /history page but tried
to set document.getElementById('prompt') which only exists on /. The
JS value was lost on navigation.

Fix: store prompt in sessionStorage, pick it up on main page load.
Also opens the composer overlay so the user sees the loaded prompt
immediately instead of landing on an empty output view.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:24:59 -05:00
root
7b9b7f6641 Add optimization history, reconnect, and duplicate prevention
History detail panel now shows optimization results:
- If a run has been optimized, shows results section with best score,
  original score, and link to view the winning variation
- Fetches full optimization history via GET /api/optimize-history/<id>
- Shows count of optimizations run and child variation count
- Button changes to "Re-Optimize" for already-optimized runs

Reconnect to active optimizations:
- If optimization is already running, returns job_id in error response
- Frontend detects this and reconnects to the SSE stream
- No more losing progress when navigating away and coming back
- Refactored startOptimize() into startOptimize() + _showOptimizeStream()

New endpoint: GET /api/optimize-history/<run_id>
- Returns all pipeline_runs where pipeline='optimize' for that parent
- Returns all child team_runs created by optimization
- Includes scores, strategies, rankings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:20:01 -05:00
root
bc2ad7c1a9 Fix Lab UX: visual selection, auto-navigate, live status, stuck detection
Lab experiment selection:
- Selected experiment now highlighted with accent border + glow
- Clicking auto-navigates to relevant tab (config if idle, monitor if running)
- No more silent toast-only feedback

Live status display:
- SSE "status" events now rendered in monitor (were silently dropped before)
- Shows real-time: "Proposing change... (trial 3/50)" during execution
- Error messages displayed inline instead of just toast

Stuck experiment fix:
- On app startup, reset all "running" experiments to "paused"
- Prevents ghost "running" status after service restart
- Fixed experiments 2, 3, 4 that showed running but had dead threads

Trial cap fix:
- Changed from lifetime cap (trial_num < 50) to per-run cap (trials_this_run < 50)
- Prevents runaway experiments like #1 that accumulated 3762 trials
- Shows trial progress in status: "trial 3/50"
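The per-run cap vs. lifetime cap distinction, as a sketch (loop body simplified):

```python
MAX_TRIALS_PER_RUN = 50  # per-run cap from the commit message

def ratchet_loop(run_trial, trial_num):
    """Count trials for THIS start cycle instead of comparing the
    lifetime trial_num against 50, so an experiment that already
    logged thousands of trials can still be started again."""
    trials_this_run = 0
    while trials_this_run < MAX_TRIALS_PER_RUN:
        trial_num += 1
        trials_this_run += 1
        run_trial(trial_num)  # status can show f"trial {trials_this_run}/50"
    return trial_num
```

With the old lifetime check (`trial_num < 50`), experiment #1 at 3762 trials could never have been capped in the first place and would never run again after a fix.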

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:14:12 -05:00
root
3b4fa449f1 Add Auto-Optimize: AI agent for history-driven prompt improvement
When viewing any past run in History, click "Optimize" to trigger an
automated workflow that:

1. Analyzes the original prompt + responses + score
2. Identifies improvement strategies (clarity, depth, specificity, etc.)
3. Generates 3-5 improved prompt variations
4. Tests each variation across original mode + brainstorm
5. Auto-scores all results via background judge
6. Ranks results and highlights the winner
7. "Use This" button loads winning prompt into composer

Architecture:
- _run_optimize(job_id, run_id): background thread, 5-phase engine
- POST /api/runs/<id>/optimize: starts optimization job
- GET /api/optimize/<job_id>/stream: SSE for live progress
- Budget-capped at 15 model calls per optimization
- Child runs saved as real team_runs (source: "optimize")
- Auto-scored → feeds into analytics + routing table automatically
- Results saved to pipeline_runs (pipeline: "optimize")

Frontend:
- "Optimize" button in history detail panel (accent-colored)
- startOptimize(runId): replaces detail view with live optimization stream
- Phase cards: Analysis → Variations → Testing → Ranked Results
- Score bars with color coding (green/amber/red)
- Winner row highlighted with star + "Use This" button

Closes the learning loop: system studies its own history → generates
better prompts → tests them → scores results → routing table improves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:03:27 -05:00
root
8ad221b41f Add self-improving pipeline: auto-scoring, analytics, reactive refine, routing intelligence
Phase 1 — Run Quality Scoring:
- Auto-score every run in background via qwen2.5 judge (1-10)
- Thumbs up/down vote buttons on output cards
- POST /api/runs/<id>/score for user feedback
- run_saved SSE event enables vote buttons after run completes
- User votes override auto-scores (race-condition safe)
- DB: quality_score, score_method, score_metadata on team_runs

Phase 1 — Analytics Dashboard:
- GET /api/admin/analytics: score-by-mode, score-by-model, heatmap, trend
- New Analytics tab on Admin page with bar charts, heatmap table, trend sparkline
- Scoring coverage tracker (scored vs total runs)
- Model × Mode heatmap with color-coded cells

Phase 2 — Reactive Pipeline:
- _assess_stage(): orchestrator evaluates each stage's output mid-run
- _reactive_decide(): can insert/skip stages based on assessment
- Dynamic stage loop replaces fixed iteration in run_refine()
- Budget tracking prevents infinite loops (max_stages hard cap)
- Reactive decisions render as dashed notification bars between cards
- Pipeline adjusts in real-time: "Inserting VALIDATE — high severity gaps found"

Phase 3 — Cross-Run Learning:
- _build_routing_table(): queries historical scores for model×mode performance
- Best stage sequences per content_type from pipeline_runs
- Routing table cached with 30-min TTL
- Auto-Refine strategist prompt augmented with historical data
- GET /api/suggest-models?mode=X returns top 3 models for that mode
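The 30-min TTL cache around the routing table, sketched (here `build_fn` stands in for the real `_build_routing_table` query):

```python
import time

ROUTING_TTL = 30 * 60  # 30-minute TTL from the commit message
_cache = {"table": None, "built_at": 0.0}

def get_routing_table(build_fn, now=None):
    """Return the cached routing table, rebuilding from historical
    scores only when the TTL has expired."""
    now = time.time() if now is None else now
    if _cache["table"] is None or now - _cache["built_at"] > ROUTING_TTL:
        _cache["table"] = build_fn()
        _cache["built_at"] = now
    return _cache["table"]
```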

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:18:32 -05:00
root
c2cc211f21 Expand sample prompts to 5 per tier across all 21 modes (315 total)
Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5
prompts per difficulty level. Renderer picks one random prompt from
each tier on every mode switch, so users see fresh examples each time.

315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:22:35 -05:00
root
0d09bb5293 Add Auto-Refine mode, composer UX, select dropdown fixes
Auto-Refine mode (21st mode):
- AI strategist analyzes content type and quality
- Selects 3-5 optimal refinement stages from 8 available
  (validate, critique, expand, structure, stakeholder, clarity,
  edge_cases, align)
- Executes stages sequentially with output chaining
- Final synthesis produces polished version
- Stages are content-aware — PRD gets different pipeline than essay
- Saved to pipeline_runs DB

Composer UX overhaul:
- Initial state: full-screen centered composer overlay
- Mode grid + models + prompt front-and-center for new users
- On Run: composer closes, output takes full screen width
- "New Prompt" button in header nav bar (not floating)
- Close button (×) on composer overlay
- Works across all 4 themes + mobile

Dropdown fixes:
- Dark theme: select options get solid #1a1d23 bg
- Modern theme: select options get solid #18181b bg
- Light/Reddit: select options get white bg with dark text
- Native <option> elements now readable in all themes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:12:35 -05:00
root
713f18a65f Add 4-theme system, fix enrichment panel layout, enable Docker on boot
Theme system (Dark/Light/Reddit/Modern):
- Injectable CSS/JS via after_request — zero template changes
- Dark: original gold accent on black
- Light: warm off-white with indigo accent, readable buttons
- Reddit: bluish-gray bg, orange accent, pill buttons, 8px corners
- Modern: glassmorphism dark, blue accent, frosted cards, 16px corners
- Toggle cycles all 4 themes, persists via localStorage
- Button injected into every page header automatically

Enrichment panel fix:
- threat-card changed from display:flex to display:grid
- enrich-panel now spans full width via grid-column:1/-1
- Added .enrich-section/.enrich-title/.enrich-grid CSS classes
- Sections (Geo, Deep Scan, AI) visually separated with dividers

Iterate/repipe modal themed for all modes:
- Light themes get white modal bg, proper contrast
- Reddit gets rounded corners + orange accent
- Modern gets glassmorphism modal with blue glow

Scrollbar styling across all themes:
- Rounded, properly sized (6-8px), theme-colored thumbs
- macOS-style inset look via background-clip

Layout improvements:
- Output area min-height 400px, padding-bottom 40px
- Empty state centered with more breathing room
- Docker + containerd enabled at boot for web-check survival

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 04:31:01 -05:00
root
411040f206 Fix IP banning: nginx deny list + connection kill for instant enforcement
fail2ban was using nftables action while UFW uses iptables-nft, so bans
were recorded but never enforced. Added three-layer ban enforcement:
1. nginx deny list (/etc/nginx/banned_ips.conf) for instant 403
2. ss -K to kill existing TCP connections on ban
3. Auto-sync nginx deny file on ban/unban (manual, mass, AI sentinel)
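Layer 1 reduces to rendering one `deny <ip>;` line per banned address; a sketch (the real sync also reloads nginx and runs `ss -K` per newly banned IP):

```python
def render_deny_conf(banned_ips):
    """Render the nginx deny list for /etc/nginx/banned_ips.conf:
    deduplicated, sorted, one `deny <ip>;` directive per line."""
    return "".join(f"deny {ip};\n" for ip in sorted(set(banned_ips)))
```

nginx answers 403 for any listed IP as soon as the file is re-included and reloaded, independent of whatever firewall backend fail2ban is fighting with.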

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:05:49 -05:00
root
eea8ff46db Three-tier access: Off → Demo → Showcase
Off: login required for everything

Demo: public gets Team UI + run modes + admin page (browse only)
  Blocked: /logs, /admin/monitor, /history, threat intel APIs,
  sentinel, wall-of-shame, meta-pipelines, self-reports, vectors

Showcase: public gets full read-only access to ALL pages
  Allowed: admin, monitor, logs, threat intel, enrichment,
  lab, history, self-analysis, meta-pipelines
  Blocked: config changes, bans, deletes, bulk operations

Admin (logged in): full access to everything always

SHOWCASE_ONLY_ROUTES set defines which pages/APIs are
blocked in basic demo but allowed in showcase mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:29:39 -05:00
root
ffd5e43709 Fix demo/showcase toggle: separate buttons, distinct modes
Problem: plain toggle set showcase=true, so demo always became showcase.
No way to enable basic demo mode separately.

Fix:
- Three explicit buttons: [Demo] [Showcase] [Off]
- Demo mode: active=true, showcase=false (team UI only)
- Showcase mode: active=true, showcase=true (full read-only admin)
- Off: both false
- Plain toggle cycles demo on/off without touching showcase
- Clear status text shows which mode and what it means

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:26:48 -05:00
root
732f29d836 Fix showcase toggle: remove /api/demo/toggle from blocked POSTs
The demo toggle route was in DEMO_BLOCKED_POSTS, so once showcase
was enabled, the before_request handler blocked the toggle POST
even for admins (the before_request check ran before the route's
own admin check could verify the session).

Fix: removed /api/demo/toggle from blocked list. The route already
has its own admin-only check (line 460).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:24:45 -05:00
root
f0cf69b4bd Fix NameError: ADMIN_WRITE_ROUTES renamed to DEMO_BLOCKED_POSTS
before_request handler still referenced old variable name.
Updated to use DEMO_BLOCKED_POSTS with simpler path-in-set check.
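The simpler path-in-set check looks roughly like this (set contents hypothetical; note the lesson from the later toggle fix, that `/api/demo/toggle` must stay OUT of the set because this check runs before the route's own admin check):

```python
DEMO_BLOCKED_POSTS = {"/api/admin/config", "/api/admin/ban"}  # example entries

def is_blocked(path, method, demo_active):
    """before_request gate: in demo mode, block a POST only when its
    exact path is in DEMO_BLOCKED_POSTS. GETs always pass through."""
    return demo_active and method == "POST" and path in DEMO_BLOCKED_POSTS
```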

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:23:01 -05:00
root
9f48a050c8 Showcase Mode: full read-only admin access for client demos
New mode: Showcase (replaces basic demo mode for client demos)
- Visitors see EVERYTHING: Admin, Monitor, Logs, Threat Intel,
  Lab, History, Meta-Pipelines — all without logging in
- Read-only: all GET requests allowed on all routes
- Allowed POSTs: team runs, self-analysis, IP enrichment
  (read-like operations that don't modify system config)
- Blocked POSTs: config changes, bans, deletes, bulk archive

Admin UI (Security tab):
- "Enable Showcase" button (magenta) — one click to activate
- "Turn Off" button appears when active
- Clear description of what visitors can and can't do
- Status shows "SHOWCASE MODE" with magenta styling

Banner:
- Magenta gradient banner on all pages when showcase is active
- Shows: "Showcase Mode — Full Read-Only Access — Admin · Monitor · Logs · Lab · History"
- Demo button in nav shows "Showcase" in magenta

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:19:41 -05:00
root
dfab02f114 Fix meta-pipeline detail panel collapsing on auto-refresh
Auto-refresh now skips when any detail panel is open (checks for
meta-detail-* elements). Panel stays stable while reading results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:07:47 -05:00
root
c9901dbc94 Meta-pipeline UI: add Stop/Restart/Results controls per pipeline
Each pipeline card now shows:
- Status dot + name + status tag + best score
- Stop button (red) when running
- Restart button (green) when stopped/completed
- Results button (magenta) to drill into iterations
- Live progress text when running
- Stages and iteration count on info line

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 05:06:24 -05:00
root
28df789745 Fix runaway experiments: cap at 50 trials, fix DB permissions
Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
  unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
  for kbuser — fixed permissions for all tables and sequences.

All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:56:37 -05:00
root
4dc561af12 Meta-Pipeline: self-improving multi-mode chains on real system data
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
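The stage-chaining core is just a fold over the stage list; a sketch (here `run_stage` stands in for one mode execution):

```python
def run_meta_pipeline(stages, run_stage, source_data):
    """Chain modes in sequence: each stage's output becomes the
    next stage's input. Returns (stage, output) pairs for storage
    in the meta_runs table."""
    data = source_data
    results = []
    for stage in stages:
        data = run_stage(stage, data)
        results.append((stage, data))
    return results
```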

4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate

Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models

API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results

UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:54:08 -05:00
root
804898b658 Auto-save self-analysis reports to DB with browsable history
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)

API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text

UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:49:17 -05:00
root
28e641f939 Self-Analysis: AI reports from system's own data + Lab experiments API
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
   attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
   model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
   usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
   sentinel verdicts, fail2ban, threat intel DB → risk rating

API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data

Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
  expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation

Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:42:07 -05:00
root
ca660cbd10 Lab: add 3 experiment templates with auto-fill
Templates section below experiment list:

BASIC — Better Summaries (3 eval cases)
  Optimize summarization quality. Tests across biology, history,
  and technical content. Shows the simplest Lab workflow.

INTERMEDIATE — Code Explainer (4 eval cases)
  Find the best prompt+model to explain code to non-programmers.
  Tests loops, recursion, error handling, comprehensions.
  Shows how the ratchet evolves system prompts.

ADVANCED — Security Analyst Persona (5 eval cases)
  Evolve a cybersecurity AI across threat classification, executive
  summaries, developer education, incident response, and forensics.
  Tests multi-audience adaptation and domain expertise.

Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.

Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:32:39 -05:00
root
f34e05168b Retheme Lab page: retro-brutalist matching all other pages
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wraps: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:28:29 -05:00
root
efa547bb68 Full history page with tags, notes, vector API, and bulk ops
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses

Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)

Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations

APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
  Returns: mode, prompt, models, date, tags, notes + all response text
  Filters: ?mode=, ?tag=, ?limit=
  Each doc includes token estimate for embedding planning

Main UI: History button now links to /history page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:19:22 -05:00
root
aeab1f0194 Archive/restore history: soft-delete with toggle and bulk ops
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default

API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date

History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:13:33 -05:00
root
7948089f04 Fix sentinel countdown: sync to actual scan schedule, not page load
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
  target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
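The offset arithmetic behind the stable countdown (a sketch of the frontend math, in Python for consistency with the rest of these notes):

```python
def server_clock_offset(server_now, client_now):
    """Offset added to client time to approximate server time,
    measured once when the API is polled."""
    return server_now - client_now

def seconds_until_scan(next_scan_ts, offset, client_now):
    """Count down to the actual target timestamp, so a page refresh
    shows the same remaining time instead of restarting the timer."""
    return max(0.0, next_scan_ts - (client_now + offset))
```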

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:07:15 -05:00
root
357918013d Compact sentinel card: single-line with mini ring + collapsible verdicts
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:03:57 -05:00
root
3cdfc01835 Sentinel countdown ring timer with live stats
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 04:01:09 -05:00
root
418da99fa7 Wall of Shame: persistent threat intel database with drill-down table
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at
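The re-enrich-updates-existing behavior maps to SQLite's `ON CONFLICT` UPSERT keyed on the unique IP; the column set here is a trimmed-down assumption:

```python
import json
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS threat_intel (
    ip TEXT PRIMARY KEY, threat_level TEXT, summary TEXT,
    raw_json TEXT, enriched_at REAL)"""

def save_threat_intel(db, ip, threat_level, summary, raw):
    """UPSERT keyed on IP: re-enriching updates the existing row
    instead of inserting a duplicate."""
    db.execute("""
        INSERT INTO threat_intel (ip, threat_level, summary, raw_json, enriched_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(ip) DO UPDATE SET
            threat_level = excluded.threat_level,
            summary      = excluded.summary,
            raw_json     = excluded.raw_json,
            enriched_at  = excluded.enriched_at
    """, (ip, threat_level, summary, json.dumps(raw), time.time()))
```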

Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record

Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
  ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
  Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed

API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:52:34 -05:00
root
e7f12a6d93 Tighten AI security prompts — aggressive stance for private server
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
  exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore

Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — private server has no public scanning excuse
- Tighter rules for automated UA detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:49:17 -05:00
root
3c4846d52c Expand web-check enrichment: traceroute, headers, status, full rendering
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, CloudFlare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time

Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)

AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:46:43 -05:00
root
51ffd2b82c Fix enrichment: run web-check before AI analysis so data is available
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:42:23 -05:00
root
e816e81820 Integrate web-check Docker for deep IP enrichment
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP

Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context
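Layer 1 can be sketched against ip-api.com's JSON endpoint (the proxy/hosting flags must be requested explicitly via the `fields` parameter); the field picking here is an assumption about what the panel shows:

```python
def ip_api_url(ip: str) -> str:
    # ip-api.com free JSON endpoint; proxy/hosting/mobile flags are only
    # returned when asked for via the fields parameter.
    fields = "status,country,city,isp,org,as,proxy,hosting,mobile"
    return f"http://ip-api.com/json/{ip}?fields={fields}"

def geo_fields(record: dict) -> dict:
    """Pick the fields the enrichment panel renders from an ip-api.com
    JSON record; proxy/hosting drive the red warning styling."""
    return {"country": record.get("country"),
            "isp": record.get("isp"),
            "org": record.get("org"),
            "asn": record.get("as"),
            "flagged": bool(record.get("proxy") or record.get("hosting"))}
```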

Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:39:42 -05:00
root
472a5d0917 IP threat intel: sorting, mass ban, enrichment with geo + AI analysis
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber

Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"

IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
  - Threat level, classification, confidence score
  - Attack pattern description
  - Specific indicators list
  - Automated detection flag
  - Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)

Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs

Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple UA detection (fingerprint: rotating scanner)
- Sort parameter support (?sort=threat|hits|recent|banned)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:24:32 -05:00
root
de4ca533dd AI Security Sentinel: local LLM scans logs every 5 minutes
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events

API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts

Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline

Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing
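The batch call shape can be sketched as an Ollama `/api/generate` payload with `format: "json"` and temperature 0.1, plus a tolerant verdict parser; the prompt wording and helper names are assumptions:

```python
import json

def build_sentinel_payload(entries_by_ip: dict, model="qwen2.5") -> dict:
    """Sketch of the batch request to a local Ollama /api/generate endpoint.
    Low temperature for consistent classification; format=json asks for
    JSON-only output (payload shape per Ollama's generate API)."""
    lines = [f"{ip}: {len(es)} new log entries" for ip, es in entries_by_ip.items()]
    prompt = ("Classify each IP. Reply with a JSON list of "
              '{"ip","threat_level","action","attack_type","reason"}:\n'
              + "\n".join(lines))
    return {"model": model, "prompt": prompt, "stream": False,
            "format": "json", "options": {"temperature": 0.1}}

def parse_verdicts(response_text: str) -> list:
    # Tolerant parse for reliable downstream handling: malformed model
    # output yields an empty verdict list instead of an exception.
    try:
        data = json.loads(response_text)
        return data if isinstance(data, list) else data.get("verdicts", [])
    except json.JSONDecodeError:
        return []
```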

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:08:02 -05:00
root
f1bb2a92e7 Interactive threat intelligence dashboard with one-click ban
Security API:
- /api/admin/security — aggregates security log into per-IP threat intel
  (hit count, exploit scans, login fails, paths probed, threat level)
- /api/admin/security/ban — manual ban/unban via fail2ban
  (logs MANUAL_BAN/MANUAL_UNBAN to security log)

Threat Intel tab in /logs:
- Summary stats: Critical IPs, High Threat, Currently Banned
- Per-IP cards showing: threat level, hit count, scan count, paths probed
- Critical IPs have red border, high threat amber
- One-click "Ban 24h" button per IP (calls fail2ban-client banip)
- One-click "Unban" for currently banned IPs
- Banned IPs shown at reduced opacity
- LAN IPs (192.168.*) filtered out

fail2ban tuning:
- llm-team-exploit findtime: 600s → 3600s (catch slow scanners)
- llm-team-exploit maxretry: 3 → 2 (more aggressive)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 03:05:01 -05:00
root
21c8c2a3e5 Monitor: drill-down pipeline view with step timeline
Highlander pattern — one view at each level, clean transitions:

Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)

Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors

Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition

Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:58:34 -05:00
root
9af071df6c Retheme admin page, improve save feedback, add monitor nav link
Admin UI:
- Full retro-brutalist theme matching main UI
- JetBrains Mono headings, amber accent, 2px borders
- Animated dot-grid background + scanlines
- Square toggles (was rounded)
- Backdrop-filter blur on cards
- Nav bar with links to Team, Lab, Logs, Monitor

Save feedback:
- Every save now verifies the API response (checks d.ok)
- Toast shows what was saved: "ollama provider saved / Enabled"
- Toast shows details: "Cloud models saved / 3 models configured"
- Toast shows timeout details: "Timeouts saved / Global: 300s, 2 overrides"
- Failed saves show red toast with error message
- Toast fade-out animation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:44:58 -05:00
root
344e11f4b2 Replace GoAccess with built-in log viewer, clickable error links
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)

Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)

Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:35:17 -05:00
root
59379c624d Fix Ollama timeout: set num_ctx dynamically, truncate oversized prompts
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts that
exceed model context windows, causing Ollama to hang until the 300s
timeout.

Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
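The sizing logic above reduces to a few lines; this sketch uses the ~4 chars/token heuristic and a start-plus-end truncation as an approximation of `smart_truncate` (exact behavior assumed):

```python
def approx_tokens(text: str) -> int:
    # ~4 characters per token heuristic.
    return len(text) // 4

def smart_truncate(prompt: str, max_tokens: int) -> str:
    """Keep the start and end of an oversized prompt, dropping the middle."""
    max_chars = max_tokens * 4
    if len(prompt) <= max_chars:
        return prompt
    half = max_chars // 2
    return prompt[:half] + "\n...[truncated]...\n" + prompt[-half:]

def size_for_ollama(prompt: str, model_ctx: int, response_buffer=1024):
    """num_ctx sized to the prompt plus a response buffer, capped at the
    model's real context limit; prompt trimmed to leave 512 response tokens."""
    num_ctx = min(approx_tokens(prompt) + response_buffer, model_ctx)
    prompt = smart_truncate(prompt, model_ctx - 512)
    return prompt, {"num_ctx": num_ctx}
```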

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 02:29:11 -05:00
root
1ac7a436e6 Add live metrics dashboard to progress panel
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)

Metrics update every 500ms during run. On completion, all
metric values turn green. Magenta/purple theme for metric
values, micro labels underneath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:55:29 -05:00
root
c507ba1016 Progress bar: magenta→cyan gradient with green completion
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:35:00 -05:00
root
9eaac813df Sticky progress bar, phase labels, auto-scroll
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
  between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Root-caused previous errors: all 'input stream' errors were caused by
  service restarts during in-flight runs, not code bugs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:30:53 -05:00
root
c124b01681 Fix SSE stream reliability: threaded server, async keepalive, streaming responses
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
  - Runner executes in background thread, pushes events to queue
  - Heartbeat thread sends keepalive every 10s independently
  - Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
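The producer-consumer pattern above can be sketched as a generator fed by two daemon threads (function names and SSE framing here are illustrative, not the actual implementation):

```python
import queue
import threading

def sse_stream(runner, heartbeat_every=10):
    """Runner pushes events into a queue from a background thread, a second
    thread injects keepalives, and the generator drains the queue, so the
    stream never goes silent during long model queries."""
    q = queue.Queue()
    done = threading.Event()

    def produce():
        try:
            for event in runner():
                q.put(("data", event))
        finally:
            done.set()
            q.put(("end", None))

    def heartbeat():
        # wait() returns False on timeout (send keepalive), True once done.
        while not done.wait(heartbeat_every):
            q.put(("heartbeat", None))

    threading.Thread(target=produce, daemon=True).start()
    threading.Thread(target=heartbeat, daemon=True).start()

    while True:
        kind, payload = q.get()
        if kind == "end":
            break
        yield ": keepalive\n\n" if kind == "heartbeat" else f"data: {payload}\n\n"
```

In Flask this generator would be wrapped in a `Response` with the `text/event-stream` mimetype, with `threaded=True` enabled so concurrent runs don't block each other.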

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:27:42 -05:00