# Phase 0: Research & Clarifications
## Needs Clarification Resolution
### 1. ThrottledScheduler Architecture
**Context**: We need to schedule N tasks evenly across a time window (e.g., 100 tasks between 01:00 and 05:00) rather than at a single exact time, to avoid database overload.
**Decision**: Instead of a completely standalone orchestrator, we will enhance the existing `SchedulerService` (which wraps `APScheduler`) with a specific policy type for "Windowed Execution". When the scheduler evaluates a `ValidationPolicy`, it will dynamically generate N distinct job triggers spread across the configured window using `CronTrigger` or `DateTrigger`.
**Rationale**: `APScheduler` is already running as a reliable background process in our FastAPI app. Building a custom orchestrator loop would duplicate persistence and recovery logic. By calculating the distributed execution times at the point of policy evaluation (or via a daily setup job), we can feed those exact times into the existing robust scheduler.
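The "distributed execution times" calculation reduces to slicing the window into N equal slots. A minimal sketch, assuming the window bounds and task count come from the policy (the `scheduler`, `validate_dashboard`, and `dashboard_ids` names in the trailing comment are illustrative, not the real service API):

```python
from datetime import datetime
from typing import List


def spread_times(window_start: datetime, window_end: datetime, n: int) -> List[datetime]:
    """Return n run times spread evenly across [window_start, window_end)."""
    if n <= 0:
        return []
    step = (window_end - window_start) / n
    # Centre each run in its slot so the first run does not fire at the exact window start
    return [window_start + step * i + step / 2 for i in range(n)]


# Feeding the computed times into the existing APScheduler instance (sketch):
#
#   from apscheduler.triggers.date import DateTrigger
#   for dash_id, run_at in zip(dashboard_ids,
#                              spread_times(start, end, len(dashboard_ids))):
#       scheduler.add_job(validate_dashboard, DateTrigger(run_date=run_at),
#                         args=[dash_id])
```

For 100 tasks between 01:00 and 05:00, this yields one run every 144 seconds, each persisted as an ordinary one-shot job so APScheduler's existing recovery logic applies.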
**Alternatives considered**:
1. **Queue-based throttling**: Push all 100 tasks to a queue and use a rate-limited worker. *Rejected* because we want users to predictably know *when* a task will run (e.g., "Sometime between 1am and 5am"), not just randomly delay it.
2. **Standalone orchestrator thread**: A loop that sleeps and triggers tasks. *Rejected* due to complexity in managing state if the server restarts.
### 2. Health Center Data Aggregation
**Context**: The Health Center needs to display the *latest* validation status for each dashboard.
**Decision**: We will extend the existing `ResourceService.get_dashboards_with_status` to include the aggregated LLM validation outcome (derived from the most recent `ValidationRecord` for each dashboard). The frontend `DashboardHub` already has grid capabilities; we will create a specialized "Health View" projection of this grid, optimized for showing the `ValidationRecord` structured issues and statuses.
**Rationale**: Reusing the existing dashboard hub fetching logic (`get_dashboards`) ensures consistency with RBAC, environment filtering, and Git status. It prevents duplicating the heavy lifting of joining Superset dashboards with local SQLite metadata.
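The aggregation step itself is a straightforward "latest record per key" reduction. A sketch of that logic, with a hypothetical `ValidationRecord` shape standing in for the real SQLite model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ValidationRecord:  # hypothetical shape; the real model lives in SQLite metadata
    dashboard_id: int
    status: str           # e.g. "passed" / "failed"
    checked_at: datetime


def latest_record_per_dashboard(
    records: List[ValidationRecord],
) -> Dict[int, ValidationRecord]:
    """Reduce a history of validation runs to the most recent record per dashboard."""
    latest: Dict[int, ValidationRecord] = {}
    for rec in records:
        current = latest.get(rec.dashboard_id)
        if current is None or rec.checked_at > current.checked_at:
            latest[rec.dashboard_id] = rec
    return latest
```

In practice this would run inside `get_dashboards_with_status` (or as an equivalent SQL `GROUP BY ... MAX(checked_at)` query), with the result merged onto the dashboard list that Superset already returned, so missing keys simply mean "never validated".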
**Alternatives considered**:
1. **Dedicated `/health` endpoint**: Querying only `ValidationRecord` and joining backward to Superset. *Rejected* because Superset is the source of truth for dashboard existence and ownership; querying SQLite first might show deleted dashboards.
### 3. Policy Execution Scope
**Context**: How do we define which dashboards are in a policy?
**Decision**: A `ValidationPolicy` will store an `environment_id` plus a JSON list of either explicit `dashboard_id`s or dynamic tags (e.g., `tag:production`, `owner:data-team`). For V1, to simplify the UI and keep the scheduling math predictable, we will support only explicit selection of dashboards (saving an array of IDs).
**Rationale**: Explicit IDs map directly to the requirement "Select 15 dashboards", and they let the scheduler know exactly that N = 15 when it computes the window intervals.
**Alternatives considered**: Purely dynamic tag evaluation at runtime. *Rejected for V1* because a tag matching, say, 1,000 dashboards would leave N unknown until the moment of execution, making it much harder to pre-calculate the execution window intervals.
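A sketch of the V1 policy shape under the decision above (field names are assumptions for illustration, not the final schema):

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationPolicy:  # hypothetical V1 shape
    name: str
    environment_id: int
    dashboard_ids: List[int] = field(default_factory=list)

    @property
    def n_tasks(self) -> int:
        # Explicit IDs make N known before the window is scheduled
        return len(self.dashboard_ids)

    def dashboard_ids_json(self) -> str:
        # Serialized form for the SQLite column (a JSON array of IDs)
        return json.dumps(self.dashboard_ids)
```

Because `n_tasks` is known at save time, the scheduler can compute the window intervals when the policy is created or evaluated, rather than at execution time as the rejected tag-based approach would require.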