# Phase 0: Research & Clarifications
## Needs Clarification Resolution
### 1. ThrottledScheduler Architecture
**Context**: We need to schedule N tasks evenly across a time window (e.g., 100 tasks between 01:00 and 05:00) rather than at a single exact time, to avoid database overload.
**Decision**: Instead of a completely standalone orchestrator, we will enhance the existing `SchedulerService` (which wraps `APScheduler`) with a specific policy type for "Windowed Execution". When the scheduler evaluates a `ValidationPolicy`, it will dynamically generate N distinct job triggers spread across the configured window using `CronTrigger` or `DateTrigger`.
**Rationale**: `APScheduler` is already running as a reliable background process in our FastAPI app. Building a custom orchestrator loop would duplicate persistence and recovery logic. By calculating the distributed execution times at the point of policy evaluation (or via a daily setup job), we can feed those exact times into the existing robust scheduler.
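The "distributed execution times" calculation reduces to slicing the window into N equal slots. A minimal sketch, assuming the window bounds and task count come from the policy (the `scheduler`, `validate_dashboard`, and `dashboard_ids` names in the trailing comment are illustrative, not the real service API):

```python
from datetime import datetime
from typing import List


def spread_times(window_start: datetime, window_end: datetime, n: int) -> List[datetime]:
    """Return n run times spread evenly across [window_start, window_end)."""
    if n <= 0:
        return []
    step = (window_end - window_start) / n
    # Centre each run in its slot so the first run does not fire at the exact window start
    return [window_start + step * i + step / 2 for i in range(n)]


# Feeding the computed times into the existing APScheduler instance (sketch):
#
#   from apscheduler.triggers.date import DateTrigger
#   for dash_id, run_at in zip(dashboard_ids,
#                              spread_times(start, end, len(dashboard_ids))):
#       scheduler.add_job(validate_dashboard, DateTrigger(run_date=run_at),
#                         args=[dash_id])
```

For 100 tasks between 01:00 and 05:00, this yields one run every 144 seconds, each persisted as an ordinary one-shot job so APScheduler's existing recovery logic applies.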
**Alternatives considered**:
1. **Queue-based throttling**: Push all 100 tasks to a queue and use a rate-limited worker. *Rejected* because we want users to predictably know *when* a task will run (e.g., "Sometime between 1am and 5am"), not just randomly delay it.
2. **Standalone orchestrator thread**: A loop that sleeps and triggers tasks. *Rejected* due to complexity in managing state if the server restarts.
### 2. Health Center Data Aggregation
**Context**: The Health Center needs to display the *latest* validation status for each dashboard.
**Decision**: We will extend the existing `ResourceService.get_dashboards_with_status` to include the aggregated LLM validation outcome (derived from the most recent `ValidationRecord` for each dashboard). The frontend `DashboardHub` already has grid capabilities; we will create a specialized "Health View" projection of this grid, optimized for showing the `ValidationRecord` structured issues and statuses.
**Rationale**: Reusing the existing dashboard hub fetching logic (`get_dashboards`) ensures consistency with RBAC, environment filtering, and Git status. It prevents duplicating the heavy lifting of joining Superset dashboards with local SQLite metadata.
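The aggregation step itself is a straightforward "latest record per key" reduction. A sketch of that logic, with a hypothetical `ValidationRecord` shape standing in for the real SQLite model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ValidationRecord:  # hypothetical shape; the real model lives in SQLite metadata
    dashboard_id: int
    status: str           # e.g. "passed" / "failed"
    checked_at: datetime


def latest_record_per_dashboard(
    records: List[ValidationRecord],
) -> Dict[int, ValidationRecord]:
    """Reduce a history of validation runs to the most recent record per dashboard."""
    latest: Dict[int, ValidationRecord] = {}
    for rec in records:
        current = latest.get(rec.dashboard_id)
        if current is None or rec.checked_at > current.checked_at:
            latest[rec.dashboard_id] = rec
    return latest
```

In practice this would run inside `get_dashboards_with_status` (or as an equivalent SQL `GROUP BY ... MAX(checked_at)` query), with the result merged onto the dashboard list that Superset already returned, so missing keys simply mean "never validated".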
**Alternatives considered**:
1. **Dedicated `/health` endpoint**: Querying only `ValidationRecord` and joining backward to Superset. *Rejected* because Superset is the source of truth for dashboard existence and ownership; querying SQLite first might show deleted dashboards.
### 3. Policy Execution Scope
**Context**: How do we define which dashboards are in a policy?
**Decision**: A `ValidationPolicy` will store an `environment_id` plus a JSON list of either explicit `dashboard_id`s or dynamic tags (e.g., `tag:production`, `owner:data-team`). For V1, to simplify the UI and keep the scheduling math predictable, we will support only explicit selection of dashboards (saving an array of IDs).
**Rationale**: Explicit IDs map directly to the requirement "Select 15 dashboards", and they let the scheduler know exactly that N = 15 when it computes the window intervals.
**Alternatives considered**: Purely dynamic tag evaluation at runtime. *Rejected for V1* because a tag matching, say, 1,000 dashboards would leave N unknown until the moment of execution, making it much harder to pre-calculate the execution window intervals.
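A sketch of the V1 policy shape under the decision above (field names are assumptions for illustration, not the final schema):

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationPolicy:  # hypothetical V1 shape
    name: str
    environment_id: int
    dashboard_ids: List[int] = field(default_factory=list)

    @property
    def n_tasks(self) -> int:
        # Explicit IDs make N known before the window is scheduled
        return len(self.dashboard_ids)

    def dashboard_ids_json(self) -> str:
        # Serialized form for the SQLite column (a JSON array of IDs)
        return json.dumps(self.dashboard_ids)
```

Because `n_tasks` is known at save time, the scheduler can compute the window intervals when the policy is created or evaluated, rather than at execution time as the rejected tag-based approach would require.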