Optimization & History:
- Fix optimization history display on both /history and main page slide-out panel
- Full card layout with score bars, "Use This" on all variations, A/B compare, export
- Deep Optimize: chain 2-3 rounds, feed each winner to the next
- Prompt template library: save winners, browse as quick-start chips
- Mode recommendation engine from historical scores
- Score calibration: strict anchor examples (scores now spread 4-8, not 7-9)
Security Hardening:
- Auto-escalation: 3 violations in 60s triggers instant ban + high-alert mode (30s scans)
- Sentinel prompt injection defense: sanitize log data and add an adversarial boundary instruction
- XSS fixes: escapeHtml on model names, mode labels in history panel
- Log redaction: passwords/tokens/secrets auto-redacted from log display
- Rate-limited /api/admin/logs endpoint (10 req/min; sketched after this list)
- HSTS + COOP headers, persistent session secret, HttpOnly+SameSite cookies
- Concurrent ban execution via ThreadPoolExecutor
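A minimal sketch of the rate limit, assuming a per-IP in-memory sliding window (the endpoint path is from this list; the decorator and counters are illustrative, not the real implementation):

```python
import time
from collections import defaultdict, deque
from functools import wraps

from flask import Flask, jsonify, request

app = Flask(__name__)
_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def rate_limit(max_requests=10, window_s=60):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.time()
            q = _hits[request.remote_addr]
            while q and now - q[0] > window_s:
                q.popleft()                 # drop hits that left the window
            if len(q) >= max_requests:
                return jsonify(error="rate limit exceeded"), 429
            q.append(now)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@app.route("/api/admin/logs")
@rate_limit(max_requests=10, window_s=60)
def admin_logs():
    return jsonify(lines=[])  # real handler would return log data
```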
Prompt Window (pretext integration):
- Canvas particle system: keystroke particles, focus sparkle, paste explosion
- Ghost text typewriter: cycling placeholder with animated typing
- Pretext-powered line measurement for accurate metrics
- Mode-colored particle cascade on mode switch
- Sample prompt typewriter effect with spam-click protection
- Live metrics bar: chars, words, lines, est. tokens
Showcase mode now allows /optimize, /deep-optimize, /score endpoints.
CSP updated for Google Fonts + esm.sh (pretext).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server-side save (survives page refresh/close):
- Moved save_run() from generator (client-dependent) into pipeline thread
- Pipeline thread collects responses server-side independently
- save_run() executes in pipeline thread's finally block — ALWAYS runs
- Even if user closes browser mid-run, the run completes and saves to DB
Public user tracking:
- Runs from demo/showcase users tagged with config.owner = "public"
- Admin runs tagged with actual username
- History list shows orange "PUB" badge on public user runs
- owner column added to history list query for fast filtering
Architecture change:
- _pipeline_collected[] built by pipeline thread (not generator loop)
- _run_config stored before generator starts, accessible by pipeline thread
- run_saved SSE event emitted from pipeline thread after save
- Generator's collected[] still tracks for display, but save is independent
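A minimal sketch of the save pattern described above; execute_stages and save_run are stand-ins for the real pipeline and DB write:

```python
import queue
import threading

def execute_stages(config):            # stand-in for the real pipeline
    for i in range(3):
        yield {"stage": i, "text": f"output {i}"}

def save_run(config, collected):       # stand-in for the real DB write
    print(f"saving {len(collected)} responses for mode={config['mode']}")
    return 42                          # run id

def start_run(config):
    q = queue.Queue()
    collected = []                     # _pipeline_collected: built here,
                                       # not in the client-facing generator
    def pipeline():
        try:
            for event in execute_stages(config):
                collected.append(event)      # server-side copy for the save
                q.put(event)                 # SSE generator relays this
        finally:
            run_id = save_run(config, collected)         # always runs, even
            q.put({"event": "run_saved", "id": run_id})  # if the browser closed
    threading.Thread(target=pipeline, daemon=True).start()
    return q
```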
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Select All checkbox in header row:
- Toggles all visible checkboxes at once
- Shows indeterminate state when partially selected
- Syncs with individual checkbox changes
Shift-click range selection:
- Hold Shift + click to select/deselect a range of rows
- Tracks last clicked index for range calculation
Selection count badge:
- Shows "N selected / M runs" in the run count badge when items selected
- Updates on every checkbox change
Header updated to include Score column matching the data grid.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Banner no longer covers navigation:
- Pushes content down with body padding instead of overlapping
- Fades out automatically after 10 seconds
- Click to dismiss immediately
- Remembered per session via sessionStorage (won't reappear after dismissal)
- Smooth transition: opacity fade + slide up
Demo mode runs ARE saved to the database (confirmed: /api/run is in
DEMO_ALLOWED_POSTS and save_run executes in the SSE generator).
The "read-only" restriction only applies to admin actions like
archiving, tagging, banning — not to running prompts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The analysis LLM sometimes returns strategies as objects like
[{"name": "clarity"}] instead of plain strings ["clarity"]. The
', '.join(strategies) call then fails with "expected str, got dict".
Fix: normalize each strategy to a string regardless of format —
handles str, dict with name/strategy keys, or fallback to str().
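A sketch of the normalizer, following exactly the cases named above:

```python
def normalize_strategy(s):
    """Coerce one strategy entry to a plain string regardless of shape."""
    if isinstance(s, str):
        return s
    if isinstance(s, dict):
        return s.get("name") or s.get("strategy") or str(s)
    return str(s)

strategies = [{"name": "clarity"}, "depth", {"strategy": "specificity"}]
print(", ".join(normalize_strategy(s) for s in strategies))
# clarity, depth, specificity
```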
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Archive list improvements:
- Score column added to run list (color-coded: green 7+, amber 5+, red)
- OPT badge on runs generated by optimization (shows parent run)
- quality_score, score_method, tags, source, parent_run now in list query
Detail panel score display:
- Large color-coded score badge in header (e.g., "8.0/10" in green)
- Shows scoring method (auto/thumbs up/thumbs down)
- Persistent: visible every time you open the detail panel, not just once
Full optimization history section:
- Shows all optimization runs with timestamps, scores, call counts
- Each run lists ranked variations with strategy, mode, score
- Winner highlighted with star and green border
- "View" button opens any variation's full detail
- "Use" button on winner sends prompt to composer via sessionStorage
- Always loads from /api/optimize-history — no stale data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The optimization "Use This" button lived on the /history page but tried to
write to document.getElementById('prompt'), an element that only exists
on /. Any value set in JS was lost on navigation.
Fix: store prompt in sessionStorage, pick it up on main page load.
Also opens the composer overlay so the user sees the loaded prompt
immediately instead of landing on an empty output view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
History detail panel now shows optimization results:
- If a run has been optimized, shows results section with best score,
original score, and link to view the winning variation
- Fetches full optimization history via GET /api/optimize-history/<id>
- Shows count of optimizations run and child variation count
- Button changes to "Re-Optimize" for already-optimized runs
Reconnect to active optimizations:
- If an optimization is already running, the endpoint returns its job_id in the error response (sketched below)
- Frontend detects this and reconnects to the SSE stream
- No more losing progress when navigating away and coming back
- Refactored startOptimize() into startOptimize() + _showOptimizeStream()
New endpoint: GET /api/optimize-history/<run_id>
- Returns all pipeline_runs where pipeline='optimize' for that parent
- Returns all child team_runs created by optimization
- Includes scores, strategies, rankings
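Hedged sketch of both endpoints; the in-memory job registry and handler bodies are stand-ins for the real implementation:

```python
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
_active_jobs = {}   # run_id -> job_id of a live optimization

@app.post("/api/runs/<int:run_id>/optimize")
def start_optimize(run_id):
    if run_id in _active_jobs:
        # frontend sees the job_id and reconnects to the SSE stream
        return jsonify(error="already running",
                       job_id=_active_jobs[run_id]), 409
    job_id = uuid.uuid4().hex
    _active_jobs[run_id] = job_id
    # _run_optimize(job_id, run_id) would launch in a background thread here
    return jsonify(job_id=job_id)

@app.get("/api/optimize-history/<int:run_id>")
def optimize_history(run_id):
    # real handler returns pipeline_runs where pipeline='optimize' for this
    # parent, plus child team_runs with scores and rankings; stubbed here
    return jsonify(optimizations=[], children=[])
```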
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lab experiment selection:
- Selected experiment now highlighted with accent border + glow
- Clicking auto-navigates to relevant tab (config if idle, monitor if running)
- No more silent toast-only feedback
Live status display:
- SSE "status" events now rendered in monitor (were silently dropped before)
- Shows real-time: "Proposing change... (trial 3/50)" during execution
- Error messages displayed inline instead of just toast
Stuck experiment fix:
- On app startup, reset all "running" experiments to "paused"
- Prevents ghost "running" status after service restart
- Fixed experiments 2, 3, 4 that showed running but had dead threads
Trial cap fix:
- Changed from lifetime cap (trial_num < 50) to per-run cap (trials_this_run < 50)
- Prevents runaway experiments like #1 that accumulated 3762 trials
- Shows trial progress in status: "trial 3/50"
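A sketch of both fixes; the table name, status values, and loop shape are illustrative:

```python
def reset_stuck_experiments(db):
    # at startup no worker threads exist yet, so any 'running' row is a ghost
    db.execute("UPDATE experiments SET status = 'paused' WHERE status = 'running'")

def ratchet_loop(experiment, run_trial, max_trials=50):
    trials_this_run = 0                       # per-start cap, not lifetime
    while experiment["status"] == "running" and trials_this_run < max_trials:
        run_trial(experiment)
        trials_this_run += 1
        experiment["status_text"] = f"trial {trials_this_run}/{max_trials}"
```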
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When viewing any past run in History, click "Optimize" to trigger an
automated workflow that:
1. Analyzes the original prompt + responses + score
2. Identifies improvement strategies (clarity, depth, specificity, etc.)
3. Generates 3-5 improved prompt variations
4. Tests each variation across original mode + brainstorm
5. Auto-scores all results via background judge
6. Ranks results and highlights the winner
7. "Use This" button loads winning prompt into composer
Architecture:
- _run_optimize(job_id, run_id): background thread, 5-phase engine
- POST /api/runs/<id>/optimize: starts optimization job
- GET /api/optimize/<job_id>/stream: SSE for live progress
- Budget-capped at 15 model calls per optimization
- Child runs saved as real team_runs (source: "optimize")
- Auto-scored → feeds into analytics + routing table automatically
- Results saved to pipeline_runs (pipeline: "optimize")
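A minimal budget-cap sketch: routing every model call through one shared counter enforces the 15-call limit (class and function names are illustrative):

```python
class CallBudget:
    """Single counter shared by every model call in one optimization."""
    def __init__(self, limit=15):
        self.limit = limit
        self.used = 0

    def spend(self):
        if self.used >= self.limit:
            raise RuntimeError("optimization budget exhausted")
        self.used += 1

def call_model(budget, model, prompt):
    budget.spend()                      # raises before the 16th call
    return f"[{model}] response to: {prompt[:40]}"
```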
Frontend:
- "Optimize" button in history detail panel (accent-colored)
- startOptimize(runId): replaces detail view with live optimization stream
- Phase cards: Analysis → Variations → Testing → Ranked Results
- Score bars with color coding (green/amber/red)
- Winner row highlighted with star + "Use This" button
Closes the learning loop: system studies its own history → generates
better prompts → tests them → scores results → routing table improves.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 — Run Quality Scoring:
- Auto-score every run in background via qwen2.5 judge (1-10)
- Thumbs up/down vote buttons on output cards
- POST /api/runs/<id>/score for user feedback
- run_saved SSE event enables vote buttons after run completes
- User votes override auto-scores (race-condition safe)
- DB: quality_score, score_method, score_metadata on team_runs
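A sketch of the race-safe override, assuming PostgreSQL and the columns listed above: the auto-score UPDATE is conditional, so a late-arriving judge result can never overwrite a user vote:

```python
def apply_score(db, run_id, score, method):
    if method == "user":                # thumbs up/down always wins
        db.execute(
            "UPDATE team_runs SET quality_score = %s, score_method = 'user' "
            "WHERE id = %s", (score, run_id))
    else:                               # background judge
        db.execute(
            "UPDATE team_runs SET quality_score = %s, score_method = 'auto' "
            "WHERE id = %s AND score_method IS DISTINCT FROM 'user'",
            (score, run_id))
```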
Phase 1 — Analytics Dashboard:
- GET /api/admin/analytics: score-by-mode, score-by-model, heatmap, trend
- New Analytics tab on Admin page with bar charts, heatmap table, trend sparkline
- Scoring coverage tracker (scored vs total runs)
- Model × Mode heatmap with color-coded cells
Phase 2 — Reactive Pipeline:
- _assess_stage(): orchestrator evaluates each stage's output mid-run
- _reactive_decide(): can insert/skip stages based on assessment
- Dynamic stage loop replaces fixed iteration in run_refine()
- Budget tracking prevents infinite loops (max_stages hard cap)
- Reactive decisions render as dashed notification bars between cards
- Pipeline adjusts in real-time: "Inserting VALIDATE — high severity gaps found"
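A sketch of the dynamic stage loop; the assessor, decider, and stage executor are passed in as stand-ins for _assess_stage, _reactive_decide, and the real runner:

```python
def run_refine(prompt, plan, execute_stage, assess, decide, max_stages=10):
    stages = list(plan)                       # mutable working queue
    output, executed = prompt, 0
    while stages and executed < max_stages:   # hard cap: no infinite loops
        stage = stages.pop(0)
        output = execute_stage(stage, output)
        decision = decide(assess(stage, output))
        if decision.get("insert"):
            stages.insert(0, decision["insert"])   # e.g. add VALIDATE next
        if decision.get("skip") in stages:
            stages.remove(decision["skip"])
        executed += 1
    return output
```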
Phase 3 — Cross-Run Learning:
- _build_routing_table(): queries historical scores for model×mode performance
- Best stage sequences per content_type from pipeline_runs
- Routing table cached with 30-min TTL
- Auto-Refine strategist prompt augmented with historical data
- GET /api/suggest-models?mode=X returns top 3 models for that mode
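A minimal sketch of the 30-minute TTL cache around the routing-table build:

```python
import time

ROUTING_TTL_S = 30 * 60
_cache = {"table": None, "built_at": 0.0}

def get_routing_table(build):           # build = _build_routing_table
    stale = time.time() - _cache["built_at"] > ROUTING_TTL_S
    if _cache["table"] is None or stale:
        _cache["table"] = build()       # expensive scan of historical scores
        _cache["built_at"] = time.time()
    return _cache["table"]
```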
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each mode now has {basic: [...], mid: [...], advanced: [...]} with 5
prompts per difficulty level. Renderer picks one random prompt from
each tier on every mode switch, so users see fresh examples each time.
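A Python sketch of the selection logic (the real renderer is client-side; the data shown is abridged and invented):

```python
import random

SAMPLE_PROMPTS = {
    "brainstorm": {
        "basic":    ["Name five unusual uses for a brick."],
        "mid":      ["Redesign the morning commute for a mid-size city."],
        "advanced": ["Plan a decade-long program to halve urban food waste."],
    },
    # ...20 more modes, 5 prompts per tier in the real data
}

def pick_samples(mode):
    tiers = SAMPLE_PROMPTS[mode]
    # one random prompt per tier, re-rolled on every mode switch
    return {tier: random.choice(prompts) for tier, prompts in tiers.items()}
```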
315 hand-crafted prompts designed to highlight each mode's strengths:
- brainstorm: creative problem-solving at increasing scale
- pipeline: multi-step transformations from simple to complex
- debate: ethical dilemmas with escalating nuance
- validator: common myths to complex historical misconceptions
- roundrobin: writing tasks that benefit from iterative refinement
- redteam: security vulnerabilities from obvious to systemic
- consensus: opinion questions from clear to deeply contested
- codereview: coding tasks from functions to distributed systems
- ladder: concepts that scale from kindergarten to PhD
- tournament: creative competitions from one-liners to algorithms
- evolution: optimization targets from names to city infrastructure
- blindassembly: decomposable projects from explanations to systems
- staircase: progressive constraints from party planning to treaties
- drift: factual claims from simple dates to complex event sequences
- mesh: stakeholder analysis from office policies to life-or-death
- hallucination: fact-checkable claims from simple to obscure
- timeloop: cascading failures from restaurants to civilization
- research: deep dives from single topics to geopolitical analysis
- eval: benchmark prompts from trivia to formal proofs
- extract: structured extraction from sentences to legal documents
- refine: documents from product blurbs to architecture specs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-Refine mode (21st mode):
- AI strategist analyzes content type and quality
- Selects 3-5 optimal refinement stages from 8 available
(validate, critique, expand, structure, stakeholder, clarity,
edge_cases, align)
- Executes stages sequentially with output chaining
- Final synthesis produces polished version
- Stages are content-aware — PRD gets different pipeline than essay
- Saved to pipeline_runs DB
Composer UX overhaul:
- Initial state: full-screen centered composer overlay
- Mode grid + models + prompt front-and-center for new users
- On Run: composer closes, output takes full screen width
- "New Prompt" button in header nav bar (not floating)
- Close button (×) on composer overlay
- Works across all 4 themes + mobile
Dropdown fixes:
- Dark theme: select options get solid #1a1d23 bg
- Modern theme: select options get solid #18181b bg
- Light/Reddit: select options get white bg with dark text
- Native <option> elements now readable in all themes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Theme system (Dark/Light/Reddit/Modern):
- Injectable CSS/JS via after_request — zero template changes
- Dark: original gold accent on black
- Light: warm off-white with indigo accent, readable buttons
- Reddit: bluish-gray bg, orange accent, pill buttons, 8px corners
- Modern: glassmorphism dark, blue accent, frosted cards, 16px corners
- Toggle cycles all 4 themes, persists via localStorage
- Button injected into every page header automatically
Enrichment panel fix:
- threat-card changed from display:flex to display:grid
- enrich-panel now spans full width via grid-column:1/-1
- Added .enrich-section/.enrich-title/.enrich-grid CSS classes
- Sections (Geo, Deep Scan, AI) visually separated with dividers
Iterate/repipe modal themed for all modes:
- Light themes get white modal bg, proper contrast
- Reddit gets rounded corners + orange accent
- Modern gets glassmorphism modal with blue glow
Scrollbar styling across all themes:
- Rounded, properly sized (6-8px), theme-colored thumbs
- macOS-style inset look via background-clip
Layout improvements:
- Output area min-height 400px, padding-bottom 40px
- Empty state centered with more breathing room
- Docker + containerd enabled at boot for web-check survival
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fail2ban was using the nftables action while UFW uses iptables-nft, so bans
were recorded but never enforced. Added three-layer ban enforcement:
1. nginx deny list (/etc/nginx/banned_ips.conf) for instant 403
2. ss -K to kill existing TCP connections on ban
3. Auto-sync nginx deny file on ban/unban (manual, mass, AI sentinel)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Off: login required for everything
Demo: public gets Team UI + run modes + admin page (browse only)
Blocked: /logs, /admin/monitor, /history, threat intel APIs,
sentinel, wall-of-shame, meta-pipelines, self-reports, vectors
Showcase: public gets full read-only access to ALL pages
Allowed: admin, monitor, logs, threat intel, enrichment,
lab, history, self-analysis, meta-pipelines
Blocked: config changes, bans, deletes, bulk operations
Admin (logged in): full access to everything always
SHOWCASE_ONLY_ROUTES set defines which pages/APIs are
blocked in basic demo but allowed in showcase mode.
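A hedged sketch of the gating order, using the set names above with abridged contents; the Flask wiring and PUBLIC_MODE config key are illustrative:

```python
from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = "dev"  # illustrative only
SHOWCASE_ONLY_ROUTES = {"/logs", "/admin/monitor", "/history"}   # abridged
DEMO_ALLOWED_POSTS = {"/api/run", "/api/self-analyze"}           # abridged

@app.before_request
def gate_public_access():
    if session.get("admin"):
        return                            # logged-in admin: full access
    mode = app.config.get("PUBLIC_MODE")  # None | "demo" | "showcase"
    if mode is None:
        abort(401)                        # Off: login required for everything
    if request.method == "POST" and request.path not in DEMO_ALLOWED_POSTS:
        abort(403)                        # public is read-only
    if mode == "demo" and request.path in SHOWCASE_ONLY_ROUTES:
        abort(403)                        # only showcase opens these pages
```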
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem: the plain toggle set showcase=true, so demo always became
showcase; there was no way to enable basic demo mode separately.
Fix:
- Three explicit buttons: [Demo] [Showcase] [Off]
- Demo mode: active=true, showcase=false (team UI only)
- Showcase mode: active=true, showcase=true (full read-only admin)
- Off: both false
- Plain toggle cycles demo on/off without touching showcase
- Clear status text shows which mode and what it means
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The demo toggle route was in DEMO_BLOCKED_POSTS, so once showcase
was enabled, the before_request handler blocked the toggle POST
even for admins (the before_request check ran before the route's
own admin check could verify the session).
Fix: removed /api/demo/toggle from blocked list. The route already
has its own admin-only check (line 460).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The before_request handler still referenced the old variable name.
Updated to use DEMO_BLOCKED_POSTS with simpler path-in-set check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New mode: Showcase (replaces basic demo mode for client demos)
- Visitors see EVERYTHING: Admin, Monitor, Logs, Threat Intel,
Lab, History, Meta-Pipelines — all without logging in
- Read-only: all GET requests allowed on all routes
- Allowed POSTs: team runs, self-analysis, IP enrichment
(read-like operations that don't modify system config)
- Blocked POSTs: config changes, bans, deletes, bulk archive
Admin UI (Security tab):
- "Enable Showcase" button (magenta) — one click to activate
- "Turn Off" button appears when active
- Clear description of what visitors can and can't do
- Status shows "SHOWCASE MODE" with magenta styling
Banner:
- Magenta gradient banner on all pages when showcase is active
- Shows: "Showcase Mode — Full Read-Only Access — Admin · Monitor · Logs · Lab · History"
- Demo button in nav shows "Showcase" in magenta
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-refresh now skips when any detail panel is open (checks for
meta-detail-* elements). Panel stays stable while reading results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each pipeline card now shows:
- Status dot + name + status tag + best score
- Stop button (red) when running
- Restart button (green) when stopped/completed
- Results button (magenta) to drill into iterations
- Live progress text when running
- Stages and iteration count on info line
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bugs fixed:
- Ratchet loop had no trial cap — experiment #1 ran 3762 trials
unchecked. Now capped at max_trials=50 per start cycle.
- meta_pipelines, meta_runs, self_reports tables had no GRANT
for kbuser — fixed permissions for all tables and sequences.
All 4 running experiments auto-paused on restart.
Stress test confirms all tables accessible, all models responding,
meta-pipeline creation working, self-report save/retrieve working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Engine:
- Chains modes in sequence: extract → research → validate → debate → synthesize
- Each stage feeds its output to the next as input
- Runs same pipeline with different model sets (one model per iteration)
- Auto-scores final output using judge model (1-10)
- Keeps best result across all iterations
- All stage results + final outputs saved to meta_runs table
4 preset pipelines:
1. Security Deep Dive — security logs through 5-stage analysis
2. Run History Insights — team run data through 4-stage extraction
3. Threat Intel Enrichment — profiled IPs through 5-stage analysis
4. Cross-Report Synthesis — past self-reports through 4-stage debate
Database:
- meta_pipelines: name, source, stages, status, best_score, iterations
- meta_runs: per-iteration stage results, final output, score, models
API:
- POST /api/meta-pipeline — create pipeline from preset
- POST /api/meta-pipeline/:id/start — run in background
- POST /api/meta-pipeline/:id/stop — halt execution
- GET /api/meta-pipelines — list all with live status
- GET /api/meta-pipeline/:id — full detail with all iteration results
UI (Lab page):
- Magenta-bordered Meta-Pipeline card with 4 clickable presets
- Click preset → creates + auto-starts pipeline
- Pipeline list with live status dots, progress, scores
- Click pipeline → drill-down with per-iteration results
- Each stage expandable (click to show output)
- Best output highlighted in green border
- Auto-refreshes every 5 seconds during runs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- self_reports table: report_type, model, report text, data_size, timestamp
- Reports auto-saved on generation (no extra step needed)
API:
- GET /api/self-reports — list all past reports (id, type, model, size, date)
- GET /api/self-reports/:id — full report text
UI:
- "✓ Saved as report #N" indicator after generation
- "Past Reports (N)" section below self-analysis buttons
- Click any past report → expands inline (toggle on/off)
- Shows: type, model, timestamp for each saved report
- Reports persist across page refreshes and restarts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 one-click self-analysis reports in Lab:
1. Threat Intelligence Report — security logs → attack taxonomy,
attacker profiling, predictive analysis, recommendations
2. Model Performance Analysis — 96 team runs → usage patterns,
model workload, response efficiency, optimization opportunities
3. Usage Analytics — nginx access logs → traffic patterns, feature
usage, user journey mapping, UX recommendations
4. Security Posture Assessment — combined audit of security logs,
sentinel verdicts, fail2ban, threat intel DB → risk rating
API: POST /api/self-analyze
- type: threat_intel|model_performance|access_patterns|security_posture
- model: which local model to use (default qwen2.5)
- Returns structured report from real system data
Lab UI:
- Green-bordered Self-Analysis card above experiment templates
- Click any report → runs analysis in background → result panel
expands inline with full report (scrollable, closeable)
- Loading state shows "Analyzing..." during generation
Each report analyzes REAL data: actual security logs, actual run
history, actual nginx access patterns — not synthetic test data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Templates section below experiment list:
BASIC — Better Summaries (3 eval cases)
Optimize summarization quality. Tests across biology, history,
and technical content. Shows the simplest Lab workflow.
INTERMEDIATE — Code Explainer (4 eval cases)
Find the best prompt+model to explain code to non-programmers.
Tests loops, recursion, error handling, comprehensions.
Shows how the ratchet evolves system prompts.
ADVANCED — Security Analyst Persona (5 eval cases)
Evolve a cybersecurity AI across threat classification, executive
summaries, developer education, incident response, and forensics.
Tests multi-audience adaptation and domain expertise.
Click any template → auto-fills the create form with name, objective,
metric, all eval cases, and selects all available models. User can
modify before creating.
Each template card shows: level badge (green/amber/red), name,
eval case count, and a description explaining what the experiment
does and why it matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Full theme swap: amber accents, JetBrains Mono, 2px borders
- Animated dot-grid background + scanlines
- Backdrop-filter blur on cards
- Status pills: square with borders (was rounded)
- Model chips: square with 2px borders
- Chart wraps: dark background with 2px borders
- Trial items: monospace numbers and scores
- Best config box: monospace with green border
- Nav bar with links to Team, History, Admin, Logs
- Toast: monospace with fade-out animation
- Config textarea: monospace font with dark background
- Responsive: tabs compact on mobile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /history page (replaces slide-out panel):
- Full-page data table: ID, Mode, Prompt, Models, Tags, Date
- Active/Archived/All view toggle
- Filter by mode, tag, or search text
- Checkbox select for bulk archive/restore
- Click any row → detail panel with full responses
Per-run detail:
- Inline tag editor: add tags (Enter), remove tags (click ✕)
- Notes textarea with auto-save (1s debounce)
- Archive/Restore/Delete buttons
- Collapsible response cards (click header to expand)
Database:
- tags TEXT[] column with GIN index for fast tag queries
- notes TEXT column for freeform annotations
APIs:
- POST /api/runs/:id/tags — update tags and/or notes
- GET /api/runs/tags — list all unique tags in use
- GET /api/runs/vectors — structured text documents for AI/embedding
Returns: mode, prompt, models, date, tags, notes + all response text
Filters: ?mode=, ?tag=, ?limit=
Each doc includes token estimate for embedding planning
Main UI: History button now links to /history page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- Added 'archived' boolean column to team_runs (indexed)
- Active runs filtered by archived=false by default
API:
- GET /api/runs?show=active|archived|all
- POST /api/runs/:id/archive — archive single run
- POST /api/runs/:id/restore — restore single run
- POST /api/runs/bulk-archive — archive/restore by IDs or date
History panel UI:
- Active/Archived toggle tabs at top
- Per-run Archive button (magenta) in detail view
- Per-run Restore button (green) in detail view for archived runs
- "Archive All" bulk button when viewing active runs
- "Restore All" bulk button when viewing archived runs
- Archived runs hidden from active view, accessible anytime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sentinel thread sets next_scan_ts = time.time() + interval BEFORE sleeping
- API returns next_scan_in derived from real next_scan_ts, not estimated
- Frontend calculates server clock offset and counts down to the actual
target timestamp — refresh shows the same remaining time, not a reset
- Shows ✓ in green when scan fires, resumes countdown on next poll
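A sketch of the timestamp-first pattern: publish the absolute target before sleeping, then derive the remaining time per request (state layout is illustrative; run_scan is a stand-in):

```python
import time

state = {"next_scan_ts": 0.0}

def sentinel_loop(run_scan, interval=300):
    while True:
        state["next_scan_ts"] = time.time() + interval  # set BEFORE sleeping
        time.sleep(interval)
        run_scan()

def sentinel_status():
    now = time.time()
    return {
        "server_time": now,   # client uses this to compute its clock offset
        "next_scan_in": max(0.0, state["next_scan_ts"] - now),
    }
```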
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Entire sentinel status fits in one header row now
- Mini 28px countdown ring (was 64px) inline with title
- Scans/bans counts inline as text, not grid boxes
- Verdicts collapsed by default — click to expand
- Card padding reduced (8px vs 14px)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SVG progress ring shows time until next scan (magenta arc)
- Countdown ticks every second: "245s → 244s → ... → scanning..."
- Ring fills as time progresses, resets on scan
- Turns green and shows "scanning..." when timer hits 0
- Stats grid: Scans count, AI Bans count, Last Run time, Interval
- Backend API returns elapsed_since_scan and next_scan_in
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Database:
- threat_intel table with full enrichment data per IP
- UPSERT on IP — re-enriching updates existing record
- Stores: geo, AI analysis, web-check results, indicators, raw JSON
- Indexed on IP (unique), threat_level, enriched_at
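A sketch of the UPSERT, assuming PostgreSQL with the unique IP index above (column list abridged):

```python
def save_enrichment(db, ip, threat_level, raw_json):
    db.execute("""
        INSERT INTO threat_intel (ip, threat_level, raw, enriched_at)
        VALUES (%s, %s, %s, now())
        ON CONFLICT (ip) DO UPDATE
           SET threat_level = EXCLUDED.threat_level,
               raw          = EXCLUDED.raw,
               enriched_at  = now()   -- re-enrich updates, never duplicates
    """, (ip, threat_level, raw_json))
```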
Auto-save:
- Every enrichment auto-saves to DB (step 5 in enrichment pipeline)
- "Saved to Wall of Shame database" indicator in enrichment panel
- No duplicate scans — re-enrich updates the existing record
Wall of Shame tab (/logs):
- Stats bar: Total Profiled, Critical, High, Proxies, Automated
- Sortable table: IP, Threat, Type, Summary, Country, Ports
- Click any row to expand full detail:
ISP, Org, ASN, City, Proxy/Hosting flags, Confidence,
Blocklist count, Pattern, Recommendation, Indicators
- All data persists across restarts — no re-scanning needed
API:
- /api/admin/wall-of-shame — list all enriched IPs with sorting/filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enrichment AI prompt:
- Explicitly states this is a PRIVATE application
- Strict threat level rules: 10+ blocklists = always critical,
exploit scans = always critical, SSH-only = suspicious
- Added "compromised_host" classification option
- Recommendation options: ban permanently, ban 24h, monitor, ignore
Sentinel batch prompt:
- "Err on the side of banning" directive
- .env.production/.env.local probing = targeted recon, instant ban
- When in doubt, BAN — private server has no public scanning excuse
- Tighter rules for automated UA detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now queries 6 web-check endpoints per IP:
- ports — open port scan
- dns — reverse DNS / PTR records
- block-lists — DNS blocklist check (AdGuard, Cloudflare, etc.)
- trace-route — full network path with per-hop latency
- headers — HTTP response headers (server, powered-by, etc.)
- status — HTTP status code and response time
Frontend rendering:
- Traceroute displayed as hop chips with latency: IP (45ms) → IP (56ms)
- HTTP status with response time
- Server headers inline
- Errors silently skipped (many endpoints fail on raw IPs)
AI analysis now includes:
- Blocklist count and names in prompt
- Traceroute hops in prompt for network path analysis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Web-check (ports, DNS, blocklists) now runs as step 3, AI analysis
as step 4. AI prompt includes open ports and blocklist status for
richer threat verdicts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup:
- lissy93/web-check running in Docker on port 3000
- Queries ports, DNS, and blocklist endpoints per IP
Enrichment now includes 4 layers:
1. Geolocation (ip-api.com) — country, ISP, proxy/hosting flags
2. Web-Check deep scan — open ports, DNS/PTR, blocklist status
3. Security log aggregation — all activity for that IP
4. AI analysis (qwen2.5) — gets ALL above data as context
Frontend rendering:
- Open ports displayed in red (security risk indicators)
- Blocklist status: "3/8 blocked (AdGuard, AdGuard Family, ...)"
- Reverse DNS (PTR records)
- All data feeds into AI analysis prompt for richer verdicts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sorting:
- Sort by: hits, threat level, recent activity, banned status
- Active sort button highlighted in amber
Mass operations:
- Checkbox per IP for multi-select
- "Ban Selected" / "Unban Selected" buttons with confirmation
- /api/admin/security/mass-ban endpoint handles batch operations
- Selection counter shows "N selected"
IP Enrichment (click "Enrich" button per IP):
- Geolocation via ip-api.com (country, city, ISP, org, AS number)
- Proxy/hosting/mobile detection flags (red for proxy/hosting)
- AI threat analysis via local qwen2.5:
- Threat level, classification, confidence score
- Attack pattern description
- Specific indicators list
- Automated detection flag
- Actionable recommendation
- Enrichment panel expands inline below the IP card (toggle)
Per-IP drill-down:
- Expandable raw log lines per IP (click to show/hide)
- User agent listing with count
- First seen / last seen timestamps
- HTTP method breakdown (GET:5 POST:2)
- AI sentinel verdicts shown inline
- Jail information for banned IPs
Enhanced backend:
- Security API returns per-IP log lines, first_seen, methods, event_types
- AI verdicts attached to IP records
- Multiple UA detection (fingerprint: rotating scanner)
- Sort parameter support (?sort=threat|hits|recent|banned)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background thread runs qwen2.5 to analyze new security log entries:
- Aggregates new entries by IP since last scan
- Sends batch to local LLM with security analysis prompt
- LLM classifies each IP: threat level, action, attack type, reason
- Auto-bans IPs the AI recommends banning (via fail2ban)
- Logs all verdicts and bans to /var/log/llm-team-sentinel.log
- Logs AI bans to security log as AI_BAN events
API:
- /api/admin/sentinel — sentinel status, stats, recent verdicts
Threat Intel tab enhancement:
- Sentinel status card with magenta accent (distinct from threat cards)
- Shows: model, scan count, ban count, last run, interval
- Recent AI verdicts table: action, IP, attack type, reason
- Errors displayed inline
Security prompt tuning:
- Explicit rules for common attack patterns
- Low temperature (0.1) for consistent classification
- JSON-only response format for reliable parsing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Highlander pattern — one view at each level, clean transitions:
Level 1 - Run List:
- Active runs (live, with progress bars)
- Recent runs (in-memory session runs)
- History from DB (all saved runs, click to drill down)
Level 2 - Pipeline Detail (click any DB run):
- Breadcrumb nav: Monitor → mode #id
- Header card with mode, models, timestamp, full prompt
- Step timeline with dot indicators on a vertical line
- Each step shows: model, role tag, character count, token estimate
- Green dots for completed, red for errors
Level 3 - Response Text (click any step):
- Accordion expand/collapse on click
- Full response text in monospace scrollable container
- Smooth max-height transition
Architecture ready for Level 4 (future AI comparison):
- Responses are individually addressable by step index
- Role-based grouping visible in timeline
- Side-by-side view can be added per-step
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /logs page with 5 tabs:
- App Log (journalctl for llm-team-ui service)
- Run History (all completed runs with errors inline)
- Nginx Errors (with red highlighting)
- Nginx Access (with color-coded status codes)
- Security Log (fail2ban/exploit detection)
Features:
- Live text filter (grep-style)
- Configurable line limit (50-500)
- Auto-refresh every 10s
- Run history shows mode, user, duration, response count, errors
- Error lines highlighted red, warnings amber
- Status codes color-coded (2xx green, 3xx blue, 4xx amber, 5xx red)
Error linking:
- Stream errors in main UI link to /admin/monitor
- Error response cards have "View error details" link
- Error cards styled with red border and monospace body
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: query_ollama() sent no num_ctx option, so Ollama defaulted
to 2048 tokens. Research mode with 15 questions builds prompts that
exceed model context windows, causing Ollama to hang until the 300s
timeout.
Fix:
- Calculate num_ctx from prompt size + 1024 token response buffer
- Cap at model's actual context limit
- Truncate prompts that exceed context window minus 512 response tokens
- Uses smart_truncate() to preserve start + end of prompt
- Updated MODEL_CONTEXT map with accurate limits for all local models
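A sketch of the sizing logic, using the ~chars/4 token estimate this log uses elsewhere; the context limits shown are illustrative, not the real MODEL_CONTEXT values:

```python
MODEL_CONTEXT = {"qwen2.5": 32768, "llama3.2": 131072}   # illustrative limits

def smart_truncate(text, max_chars):
    # keep the start and end of the prompt, drop the middle
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

def fit_context(model, prompt, response_buffer=1024):
    limit = MODEL_CONTEXT.get(model, 8192)
    prompt_tokens = len(prompt) // 4                 # rough chars/4 estimate
    num_ctx = min(prompt_tokens + response_buffer, limit)
    max_prompt_chars = (limit - 512) * 4             # leave room to respond
    return smart_truncate(prompt, max_prompt_chars), num_ctx

# num_ctx then goes into the Ollama request options, e.g.
# {"model": model, "prompt": prompt, "options": {"num_ctx": num_ctx}}
```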
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 real-time metrics in the progress panel:
- Elapsed time (updates every 500ms)
- Models active/total (tracks unique models as they respond)
- Responses received (count)
- Estimated tokens (~chars/4)
- Data received (formatted KB)
- SSE events (total protocol events)
- Errors (turns red if > 0)
- Heartbeats (keepalive count)
Metrics update every 500ms during run. On completion, all
metric values turn green. Magenta/purple theme for metric
values, micro labels underneath.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Border: magenta (#d946ef) with purple glow
- Fill: gradient from magenta → purple → cyan
- Shimmer animation sweeps across the fill
- Step indicators: magenta active pulse with glow
- Completed steps: magenta→green gradient
- Phase labels: bright green with gradient fade line
- Completion: green→cyan gradient with green glow
- 8px height track (was 6px) for better visibility
- All text in progress panel uses purple/pink tones
- Clearly distinct from the amber UI elements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Progress panel is now position:sticky at top of output — always visible
- Phase labels (─── scouting ───, ─── researching ───, etc.) appear
between response cards when the pipeline role changes
- Auto-scroll to latest response card as they arrive
- Completion state shows response count and fades after 5s
- Previous errors explained: all 'input stream' errors were caused by
  service restarts during in-flight runs, not code bugs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Enable Flask threaded=True for concurrent request handling
- Refactor generate() to use producer-consumer queue pattern:
- Runner executes in background thread, pushes events to queue
- Heartbeat thread sends keepalive every 10s independently
- Generator reads from queue — stream never goes silent
- Brainstorm mode: stream responses as they arrive (was waiting for all)
- Prevents nginx/browser timeout during long model queries
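A minimal sketch of the queue pattern with a stubbed runner: the worker and heartbeat threads both feed one queue, so the generator always has something to yield:

```python
import json
import queue
import threading
import time

def execute_run(config):                 # stand-in for the real runner
    for i in range(3):
        yield {"event": "response", "n": i}

def sse_stream(config):
    q = queue.Queue()
    done = threading.Event()

    def worker():
        for event in execute_run(config):    # slow model calls live here
            q.put(event)
        q.put({"event": "done"})
        done.set()

    def heartbeat():
        while not done.is_set():
            q.put({"event": "heartbeat"})    # keepalive every 10s
            time.sleep(10)

    threading.Thread(target=worker, daemon=True).start()
    threading.Thread(target=heartbeat, daemon=True).start()
    while True:                              # generator reads from the queue,
        event = q.get()                      # so the stream never goes silent
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("event") == "done":
            break
```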
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend:
- Active run tracking with step/substep/error state
- SSE keepalive heartbeat every 15s to prevent nginx timeout
- Run log (last 100 completed runs with timing/errors)
- Research mode: per-question progress, context caps, graceful failures
- Hard cap on research questions (15), answer truncation (8K chars)
Frontend:
- Real progress bar with step segments, elapsed time, event counter
- Progress shimmer animation, step completion indicators
- Improved error display with timing context
- Green completion state with fade
Admin:
- /admin/monitor — live process dashboard
- Stats: active runs, completed, errors, avg duration
- Active run cards with live progress, substep detail, errors
- Recent run history with error traces
- Auto-polls every 3 seconds
- Full retro-brutalist theme matching main UI
Nginx:
- proxy_read_timeout 600s, proxy_send_timeout 600s
- proxy_buffering off for SSE streaming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>