UX Reference: Dashboard Health Windows
Feature Branch: 026-dashboard-health-windows
Created: 2026-03-10
Status: Phase 6 Verified
1. User Persona & Context
- Who is the user?: Data Engineer, System Administrator, or Data Analyst (Data Team).
- What is their goal?: Schedule automated LLM checks for many dashboards without overloading the database, and quickly monitor which dashboards are currently broken.
- Context: Managing dozens or hundreds of dashboards across environments (e.g., production, dev) via a web interface, and needing a high-level overview of system health every morning.
2. The "Happy Path" Narrative
The user navigates to "Settings -> Automation" and creates a new validation policy for 15 critical production dashboards. Instead of writing complex cron jobs, they simply set an "Execution Window" from 01:00 to 05:00. They check the box to "Auto-notify Owners". The system automatically spaces out the 15 checks within those 4 hours.
During the night, a dashboard owned by the user fails. The user immediately receives a Telegram message: "🚨 Validation Failed: Sales Executive (Broken Chart). [View in ss-tools]".
In the morning, the user opens the app and immediately notices a red badge [2🔴] next to "Dashboards" in the sidebar. They click it to open the "Health Center", which displays a clear "traffic light" summary: 13 Green, 0 Yellow, 2 Red. The two broken dashboards are listed at the top with a summary of the issues. They can also just ask the AI Assistant: "Which dashboards failed last night?" and get a direct link.
3. Interface Mockups
UI Layout & Flow
Screen/Component: Automation Settings (Validation Policies)
- Layout: List view of existing policies with a "Create Rule" modal.
- Key Elements:
- Policy List: Shows Active/Paused status, target count, schedule window, and active notification channels.
- Create Modal:
- Scope selection (Environment dropdown, Filter by selected/tags/all).
- Schedule configuration (Days of week checkboxes, Start/End time for Execution Window).
- Alerts configuration:
- Trigger condition (e.g., "Only FAIL").
- Toggle: "Auto-notify Owners (Email / Messenger)".
- Add custom Channels (Slack webhook, General Email).
- States:
- Info Message: When selecting times, a helpful hint appears: "💡 System will automatically distribute 15 checks within this 4-hour window to avoid peak database load."
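The even spacing promised by the hint above could be sketched roughly as follows. This is a minimal illustration, not the actual scheduler: the function name `distribute_checks` and the choice to start the first check exactly at the window start are assumptions, and windows that cross midnight are left to the caller.

```python
from datetime import datetime, timedelta


def distribute_checks(window_start: datetime, window_end: datetime, count: int) -> list[datetime]:
    """Return `count` start times spaced evenly inside [window_start, window_end).

    Hypothetical helper: the real scheduler may add jitter or weighting.
    """
    if count <= 0:
        return []
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    # Each check gets an equal slice of the window; checks start at the slice boundaries.
    step = (window_end - window_start) / count
    return [window_start + step * i for i in range(count)]


# 15 checks in the 01:00 to 05:00 window from the happy-path narrative.
slots = distribute_checks(datetime(2026, 3, 10, 1, 0), datetime(2026, 3, 10, 5, 0), 15)
print(slots[0].time(), slots[1].time(), slots[-1].time())  # 01:00:00 01:16:00 04:44:00
```

With 15 checks in a 4-hour window, this spacing puts one check every 16 minutes, so no two validations hit the database at the same moment.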
Screen/Component: Dashboard Health Center
- Layout: Top summary cards (Traffic lights), followed by a data table.
- Key Elements:
- Environment Selector: Dropdown to switch between ss-prod, ss-dev, etc.
- Health Matrix Summary: 🟢 145 Pass | 🟡 5 Warn | 🔴 2 Fail
- Data Table: Columns for Dashboard Name, Status (Badge), Detected Issues (Text), Checked At (Time), and a "View Report" action button.
- Sidebar Badge: The main navigation sidebar has a badge [2🔴] next to the Dashboards menu item.
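The summary cards, the sidebar badge, and the "failures first" ordering of the table could all be derived from one list of latest check results. The sketch below assumes a hypothetical `CheckResult` record and the status strings "PASS", "WARN", and "FAIL"; neither is confirmed by this document.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckResult:
    dashboard: str
    status: str  # assumed values: "PASS", "WARN", "FAIL"


def summarize(results: list[CheckResult]) -> dict[str, int]:
    """Count the latest results per status for the traffic-light cards."""
    counts = Counter(r.status for r in results)
    return {"pass": counts["PASS"], "warn": counts["WARN"], "fail": counts["FAIL"]}


def sidebar_badge(summary: dict[str, int]) -> str:
    """Return the badge text, or an empty string when nothing is failing."""
    return f"[{summary['fail']}🔴]" if summary["fail"] else ""


def sort_for_table(results: list[CheckResult]) -> list[CheckResult]:
    """Order rows so FAIL comes first, then WARN, then PASS."""
    severity = {"FAIL": 0, "WARN": 1, "PASS": 2}
    return sorted(results, key=lambda r: (severity.get(r.status, 3), r.dashboard))


results = [CheckResult(f"Dashboard {i}", "PASS") for i in range(13)]
results += [CheckResult("Sales Executive", "FAIL"), CheckResult("Marketing Funnel", "FAIL")]
summary = summarize(results)
print(summary, sidebar_badge(summary))
```

Deriving the badge from the same summary as the cards keeps the sidebar count and the Health Center count from drifting apart.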
Screen/Component: Global Settings -> Notifications
- Layout: Form fields grouped by provider type (Email, Slack, Telegram).
- Key Elements:
- Email (SMTP): Host, Port, Login, Password, "From" Address.
- Slack: Webhook URL list.
- Telegram: Bot Token.
Notification Payloads
Email Example (Rich HTML):
- Subject: 🔴 [ss-tools] Validation failed: Sales Executive Dashboard
- Body: 🔴 FAIL: Sales Executive (Env: ss-prod). "The Revenue YTD chart is broken. Error: column profit not found." [Embedded Screenshot] -> [Button: View Full Report].
Messenger Example (Compact Text):
🚨 **Dashboard Validation Failed**
**Dashboard:** [Sales Executive](link)
**Env:** ss-prod | **Time:** 03:15 AM
🤖 **LLM Summary:**
SQL error detected in the "Revenue" chart. Data is not displayed.
🔗 [View in ss-tools]
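A compact message like the one above could be assembled from a handful of fields. The function name `format_messenger_alert`, its parameters, and the placeholder URL are assumptions made for illustration; the real dispatcher may structure this differently.

```python
def format_messenger_alert(dashboard: str, report_url: str, env: str,
                           checked_at: str, llm_summary: str) -> str:
    """Build the compact Markdown-style text for a failed validation (hypothetical helper)."""
    lines = [
        "🚨 **Dashboard Validation Failed**",
        f"**Dashboard:** [{dashboard}]({report_url})",
        f"**Env:** {env} | **Time:** {checked_at}",
        "🤖 **LLM Summary:**",
        llm_summary,
        f"🔗 [View in ss-tools]({report_url})",
    ]
    return "\n".join(lines)


message = format_messenger_alert(
    dashboard="Sales Executive",
    report_url="https://ss-tools.example/reports/1",  # placeholder URL, not a real endpoint
    env="ss-prod",
    checked_at="03:15 AM",
    llm_summary='SQL error detected in the "Revenue" chart. Data is not displayed.',
)
print(message)
```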
Chat Assistant Interaction
User: "Show dashboards that failed last night's check"
Assistant: "I found 2 dashboards that failed their last validation in production:
1. 🔴 Sales Executive (Issues: Broken Chart, Timeout)
2. 🔴 Marketing Funnel (Issues: SQL Syntax Error)
[View detailed reports in Health Center]"
4. The "Error" Experience
Philosophy: Guide the user to resolve failing dashboard validations quickly and prevent misconfiguration of schedules.
Scenario A: Execution Window Too Small
- User Action: Selects an execution window of 5 minutes for 100 dashboards.
- System Response:
- (UI) Warning message appears below the time selection: "⚠️ The selected window is too narrow for 100 dashboards. This may cause database spikes. Recommended window: at least 60 minutes."
- Recovery: User adjusts the end time to broaden the window before saving.
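One way to produce the Scenario A warning is to compare the window length against a minimum spacing per check. The document does not state how the recommended window is calculated, so the constant `MIN_SECONDS_PER_CHECK = 36` below is purely an assumption, chosen only because it yields 60 minutes for 100 dashboards as in the example message.

```python
import math
from datetime import timedelta
from typing import Optional

# Assumption for illustration only: the real threshold is not given in this document.
MIN_SECONDS_PER_CHECK = 36


def recommended_window_minutes(count: int) -> int:
    """Smallest whole number of minutes that leaves MIN_SECONDS_PER_CHECK per dashboard."""
    return math.ceil(count * MIN_SECONDS_PER_CHECK / 60)


def window_warning(window: timedelta, count: int) -> Optional[str]:
    """Return the warning text if the window is too narrow, otherwise None."""
    recommended = recommended_window_minutes(count)
    if window >= timedelta(minutes=recommended):
        return None
    return (
        f"⚠️ The selected window is too narrow for {count} dashboards. "
        f"This may cause database spikes. Recommended window: at least {recommended} minutes."
    )


print(window_warning(timedelta(minutes=5), 100))
print(window_warning(timedelta(hours=4), 15))  # None: the happy-path window is wide enough
```

Returning the message rather than raising keeps this a soft warning, which matches the recovery step: the user can still adjust the end time before saving.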
Scenario B: Dashboard Fails Validation
- User Action: Navigates to Health Center to view a failed dashboard.
- System Response: The row is highlighted in red. Clicking "View Report" takes the user directly to the detailed LLM analysis page, which shows the screenshot and the specific errors.
5. Tone & Voice
- Style: Professional, monitoring-focused, reassuring.
- Terminology: Use "Execution Window" for time ranges, "Health Center" for the monitoring view, and "Policies" for automated rules. Avoid overly technical scheduling terms like "Cron expressions" in the UI.
6. Phase 6 Implementation Verification
- T029: Profile UI updated with Telegram ID and Email Override. Matches "Smart Notifications" section in User Preferences.
- T030/T031: Backend routing logic implemented. Supports owner routing and custom channels.
- T032: Dispatch wired into LLM analysis plugin.
- T033: Global Notification Settings UI implemented with SMTP, Telegram, and Slack sections.
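The owner routing and custom channels noted under T030/T031 could be resolved along the following lines. This is a sketch under assumptions, not the implemented backend: the `Owner` and `Policy` shapes, the `resolve_targets` name, and the `provider:address` target format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Owner:
    name: str
    email: Optional[str] = None
    telegram_id: Optional[str] = None


@dataclass
class Policy:
    trigger: str = "FAIL"            # mirrors the "Only FAIL" trigger condition
    notify_owners: bool = True       # mirrors the "Auto-notify Owners" toggle
    custom_channels: list[str] = field(default_factory=list)


def resolve_targets(policy: Policy, status: str, owners: list[Owner]) -> list[str]:
    """Return the notification targets for one check result (hypothetical format)."""
    if status != policy.trigger:
        return []
    targets: list[str] = []
    if policy.notify_owners:
        for owner in owners:
            # An owner is reached on every channel they have configured in their profile.
            if owner.telegram_id:
                targets.append(f"telegram:{owner.telegram_id}")
            if owner.email:
                targets.append(f"email:{owner.email}")
    targets.extend(policy.custom_channels)
    return targets


policy = Policy(custom_channels=["slack:data-alerts"])
owners = [Owner("Alex", email="alex@example.com", telegram_id="12345")]
print(resolve_targets(policy, "FAIL", owners))
```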