UX Reference: Dashboard Health Windows

Feature Branch: 026-dashboard-health-windows | Created: 2026-03-10 | Status: Phase 6 Verified

1. User Persona & Context

Who is the user?: Data Engineer, System Administrator, or Data Analyst (Data Team).
What is their goal?: Schedule automated LLM checks for many dashboards without overloading the database, and quickly monitor which dashboards are currently broken.
Context: Managing dozens or hundreds of dashboards across environments (e.g., production, dev) via a web interface, and needing a high-level overview of system health every morning.

2. The "Happy Path" Narrative

The user navigates to "Settings -> Automation" and creates a new validation policy for 15 critical production dashboards. Instead of writing complex cron jobs, they set an "Execution Window" from 01:00 to 05:00 and enable the "Auto-notify Owners" toggle. The system automatically spaces the 15 checks across that 4-hour window.

During the night, a dashboard owned by the user fails. The user immediately receives a Telegram message: "🚨 Validation Failed: Sales Executive (Broken Chart). [View in ss-tools]".

In the morning, the user opens the app and immediately notices a red badge [2🔴] next to "Dashboards" in the sidebar. They click it to open the "Health Center", which displays a clear "traffic light" summary: 13 Green, 0 Yellow, 2 Red. The two broken dashboards are listed at the top with a summary of the issues. They can also just ask the AI Assistant: "Which dashboards failed last night?" and get a direct link.

3. Interface Mockups

UI Layout & Flow

Screen/Component: Automation Settings (Validation Policies)

Layout: List view of existing policies with a "Create Rule" modal.
Key Elements:
- Policy List: Shows Active/Paused status, target count, schedule window, and active notification channels.
- Create Modal:
  - Scope selection (Environment dropdown, Filter by selected/tags/all).
  - Schedule configuration (Days of week checkboxes, Start/End time for Execution Window).
  - Alerts configuration:
    - Trigger condition (e.g., "Only FAIL").
    - Toggle: "Auto-notify Owners (Email / Messenger)".
    - Add custom Channels (Slack webhook, General Email).
States:
- Info Message: When selecting times, a helpful hint appears: "💡 System will automatically distribute 15 checks within this 4-hour window to avoid peak database load."
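
The hint above promises that checks are spread across the window. The actual scheduling algorithm is not specified in this document; the following is a minimal sketch that assumes simple even spacing. The function name `distribute_checks` and the sample date are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def distribute_checks(start: datetime, end: datetime, count: int) -> List[datetime]:
    """Return `count` start times spaced evenly across [start, end).

    Assumption: even spacing. The real scheduler may use jitter or load data.
    """
    if count <= 0:
        return []
    if end <= start:
        raise ValueError("end must be after start")
    step = (end - start) / count  # timedelta divided by an int
    return [start + step * i for i in range(count)]


# 15 checks in the 01:00-05:00 window from the happy-path narrative.
window_start = datetime(2026, 3, 10, 1, 0)
window_end = datetime(2026, 3, 10, 5, 0)
slots = distribute_checks(window_start, window_end, 15)
```

With 15 checks in a 4-hour window this yields one check every 16 minutes. A window that crosses midnight would need the end time moved to the following day before calling the function.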

Screen/Component: Dashboard Health Center

Layout: Top summary cards (Traffic lights), followed by a data table.
Key Elements:
- Environment Selector: Dropdown to switch between ss-prod, ss-dev, etc.
- Health Matrix Summary: 🟢 145 Pass | 🟡 5 Warn | 🔴 2 Fail
- Data Table: Columns for Dashboard Name, Status (Badge), Detected Issues (Text), Checked At (Time), and a "View Report" action button.
- Sidebar Badge: The main navigation sidebar has a badge [2🔴] next to the Dashboards menu item.
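
The summary cards, table ordering, and sidebar badge can all be derived from the latest check result per dashboard. The sketch below illustrates one way to do that; the `CheckResult` type, the status names "PASS"/"WARN"/"FAIL", and both function names are assumptions for illustration, not the project's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckResult:
    dashboard: str
    status: str  # assumed values: "PASS", "WARN", "FAIL"


def summarize(results: List[CheckResult]) -> Dict[str, int]:
    """Count results per status for the traffic-light summary cards."""
    counts = Counter(r.status for r in results)
    return {"PASS": counts["PASS"], "WARN": counts["WARN"], "FAIL": counts["FAIL"]}


def sort_for_table(results: List[CheckResult]) -> List[CheckResult]:
    """Order rows so failures come first, then warnings, then passes."""
    order = {"FAIL": 0, "WARN": 1, "PASS": 2}
    return sorted(results, key=lambda r: (order[r.status], r.dashboard))


# Sample data matching the happy path: 13 green, 0 yellow, 2 red.
results = [CheckResult(f"Dashboard {i}", "PASS") for i in range(13)]
results += [CheckResult("Sales Executive", "FAIL"), CheckResult("Marketing Funnel", "FAIL")]

summary = summarize(results)
badge_count = summary["FAIL"]  # number shown in the sidebar badge
rows = sort_for_table(results)
```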

Screen/Component: Global Settings -> Notifications

Layout: Form fields grouped by provider type (Email, Slack, Telegram).
Key Elements:
- Email (SMTP): Host, Port, Login, Password, "From" Address.
- Slack: Webhook URL list.
- Telegram: Bot Token.

Notification Payloads

Email Example (Rich HTML):

Subject: 🔴 [ss-tools] Validation error: Sales Executive Dashboard
Body: 🔴 FAIL: Sales Executive (Env: ss-prod). "The Revenue YTD chart is broken. Error: column profit not found." [Embedded Screenshot] -> [Button: View Full Report].

Messenger Example (Compact Text):

🚨 **Dashboard Validation Failed**
**Dashboard:** [Sales Executive](link)
**Env:** ss-prod | **Time:** 03:15 AM

🤖 **LLM Summary:**
SQL error detected in the "Revenue" chart. Data is not displayed.
🔗 [View in ss-tools]
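
A compact messenger payload like the one above could be assembled from a handful of fields. This is a sketch only: the function name `build_messenger_alert`, its parameters, and the example URL are hypothetical placeholders, and the real dispatch code may build the message differently.

```python
from datetime import datetime


def build_messenger_alert(dashboard: str, url: str, env: str,
                          checked_at: datetime, llm_summary: str) -> str:
    """Build the compact text alert in the layout shown in the mockup."""
    lines = [
        "🚨 **Dashboard Validation Failed**",
        f"**Dashboard:** [{dashboard}]({url})",
        # 12-hour clock with AM/PM, as in the mockup
        f"**Env:** {env} | **Time:** {checked_at.strftime('%I:%M %p')}",
        "",
        "🤖 **LLM Summary:**",
        llm_summary,
        f"🔗 [View in ss-tools]({url})",
    ]
    return "\n".join(lines)


message = build_messenger_alert(
    dashboard="Sales Executive",
    url="https://ss-tools.example/reports/1",  # placeholder URL
    env="ss-prod",
    checked_at=datetime(2026, 3, 10, 3, 15),
    llm_summary="Short LLM-generated description of the failure.",
)
```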

Chat Assistant Interaction

User: "Show dashboards that failed last night's check"
Assistant: "I found 2 dashboards that failed their last validation in production:
1. 🔴 Sales Executive (Issues: Broken Chart, Timeout)
2. 🔴 Marketing Funnel (Issues: SQL Syntax Error)
[View detailed reports in Health Center]"

4. The "Error" Experience

Philosophy: Guide the user to resolve failing dashboard validations quickly and prevent misconfiguration of schedules.

Scenario A: Execution Window Too Small

User Action: Selects an execution window of 5 minutes for 100 dashboards.
System Response:
- (UI) Warning message appears below the time selection: "⚠️ The selected window is too narrow for 100 dashboards. This may cause database spikes. Recommended window: at least 60 minutes."
Recovery: User adjusts the end time to broaden the window before saving.
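
The narrow-window warning can be driven by a minimum spacing between checks. The sketch below assumes a threshold of 36 seconds per dashboard, which is inferred solely from the example above (60 minutes recommended for 100 dashboards); the real threshold and the name `window_warning` are assumptions.

```python
import math
from datetime import timedelta
from typing import Optional

# Assumption: derived from "100 dashboards -> at least 60 minutes".
MIN_SPACING = timedelta(seconds=36)


def window_warning(window: timedelta, dashboards: int) -> Optional[str]:
    """Return a warning string if the window is too narrow, else None."""
    required = MIN_SPACING * dashboards
    if window >= required:
        return None
    # Round the recommendation up to whole minutes.
    minutes = math.ceil(required / timedelta(minutes=1))
    return (
        f"⚠️ The selected window is too narrow for {dashboards} dashboards. "
        "This may cause database spikes. "
        f"Recommended window: at least {minutes} minutes."
    )


narrow = window_warning(timedelta(minutes=5), 100)  # Scenario A
fine = window_warning(timedelta(hours=4), 15)       # happy-path policy
```

Under this assumed threshold, the happy-path policy (15 dashboards in 4 hours) produces no warning, while Scenario A does.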

Scenario B: Dashboard Fails Validation

User Action: Navigates to Health Center to view a failed dashboard.
System Response: Row is highlighted in red. Clicking "View Report" takes them directly to the detailed LLM analysis page showing the screenshot and specific errors.

5. Tone & Voice

Style: Professional, monitoring-focused, reassuring.
Terminology: Use "Execution Window" for time ranges, "Health Center" for the monitoring view, and "Policies" for automated rules. Avoid overly technical scheduling terms like "Cron expressions" in the UI.

6. Phase 6 Implementation Verification

T029: Profile UI updated with Telegram ID and Email Override. Matches "Smart Notifications" section in User Preferences.
T030/T031: Backend routing logic implemented. Supports owner routing and custom channels.
T032: Dispatch wired into LLM analysis plugin.
T033: Global Notification Settings UI implemented with SMTP, Telegram, and Slack sections.
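
For orientation, the owner routing and custom channel support noted in T029 to T031 might resolve notification targets roughly as follows. This is an illustrative sketch, not the implemented backend: `OwnerProfile`, `Policy`, `resolve_targets`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Target = Tuple[str, str]  # (channel kind, address), e.g. ("email", "a@b.c")


@dataclass
class OwnerProfile:
    email: str
    email_override: Optional[str] = None  # profile setting from T029
    telegram_id: Optional[str] = None     # profile setting from T029


@dataclass
class Policy:
    auto_notify_owners: bool = False
    custom_channels: List[Target] = field(default_factory=list)


def resolve_targets(policy: Policy, owners: List[OwnerProfile]) -> List[Target]:
    """Collect custom channels plus owner contacts, without duplicates."""
    targets = list(policy.custom_channels)
    if policy.auto_notify_owners:
        for owner in owners:
            # Prefer the override address when the owner has set one.
            targets.append(("email", owner.email_override or owner.email))
            if owner.telegram_id:
                targets.append(("telegram", owner.telegram_id))
    return list(dict.fromkeys(targets))  # de-duplicate, keep order


policy = Policy(auto_notify_owners=True,
                custom_channels=[("slack", "https://hooks.example/abc")])
owners = [OwnerProfile("ann@example.com", telegram_id="1001"),
          OwnerProfile("bob@example.com", email_override="oncall@example.com")]
targets = resolve_targets(policy, owners)
```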
