Phase 0: Research & Clarifications

Needs Clarification Resolution

1. ThrottledScheduler Architecture

Context: We need to schedule N tasks evenly across a time window (e.g., 100 tasks between 01:00 and 05:00) rather than at a single exact time, to avoid database overload.

Decision: Instead of a completely standalone orchestrator, we will enhance the existing SchedulerService (which wraps APScheduler) with a specific policy type for "Windowed Execution". When the scheduler evaluates a ValidationPolicy, it will dynamically generate N distinct job triggers spread across the configured window using CronTrigger or DateTrigger.

Rationale: APScheduler already runs as a reliable background process in our FastAPI app. Building a custom orchestrator loop would duplicate persistence and recovery logic. By calculating the distributed execution times at the point of policy evaluation (or via a daily setup job), we can feed those exact times into the existing, robust scheduler.

Alternatives considered:

  1. Queue-based throttling: Push all 100 tasks to a queue and use a rate-limited worker. Rejected because users should be able to predict when a task will run (e.g., "sometime between 01:00 and 05:00"), not have it delayed unpredictably behind a queue.
  2. Standalone orchestrator thread: A loop that sleeps and triggers tasks. Rejected due to complexity in managing state if the server restarts.
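
The windowed distribution above can be sketched as a small helper that computes N evenly spaced run times, which SchedulerService would then register as individual DateTrigger jobs. The function name `spread_runs` and the even-spacing formula are illustrative assumptions, not existing code:

```python
from datetime import datetime, timedelta

def spread_runs(start: datetime, end: datetime, n: int) -> list[datetime]:
    """Distribute n run times evenly across [start, end).

    Task i runs at start + i * (window / n), so 100 tasks between
    01:00 and 05:00 land 2.4 minutes (144 seconds) apart.
    """
    if n <= 0:
        return []
    step = (end - start) / n
    return [start + i * step for i in range(n)]

# Example: 100 tasks between 01:00 and 05:00.
window_start = datetime(2026, 3, 11, 1, 0)
window_end = datetime(2026, 3, 11, 5, 0)
runs = spread_runs(window_start, window_end, 100)
# runs[0] -> 01:00:00, runs[1] -> 01:02:24, runs[99] -> 04:57:36.
# Each datetime would then be handed to the existing scheduler, e.g.:
# scheduler.add_job(validate_dashboard, DateTrigger(run_date=run), args=[dash_id])
```

Because the times are computed once, up front, they can also be persisted through APScheduler's job store and survive a restart, which is exactly the recovery logic the standalone-thread alternative would have had to reimplement.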

2. Health Center Data Aggregation

Context: The Health Center needs to display the latest validation status for each dashboard.

Decision: We will extend the existing ResourceService.get_dashboards_with_status to include the aggregated LLM validation outcome (derived from the most recent ValidationRecord for each dashboard). The frontend DashboardHub already has grid capabilities; we will create a specialized "Health View" projection of this grid, optimized for showing the structured issues and statuses from each ValidationRecord.

Rationale: Reusing the existing dashboard hub fetching logic (get_dashboards) ensures consistency with RBAC, environment filtering, and Git status, and avoids duplicating the heavy lifting of joining Superset dashboards with local SQLite metadata.

Alternatives considered:

  1. Dedicated /health endpoint: Querying only ValidationRecord and joining backward to Superset. Rejected because Superset is the source of truth for dashboard existence and ownership; querying SQLite first might show deleted dashboards.
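
The aggregation step can be sketched as a simple fold: keep the newest ValidationRecord per dashboard, then let the Superset-derived dashboard list drive the join so deleted dashboards never surface. The dict shapes and field names below are illustrative assumptions, not the actual schema:

```python
def latest_records(records: list[dict]) -> dict[int, dict]:
    """Keep only the newest validation record per dashboard_id."""
    latest: dict[int, dict] = {}
    for rec in records:
        current = latest.get(rec["dashboard_id"])
        if current is None or rec["created_at"] > current["created_at"]:
            latest[rec["dashboard_id"]] = rec
    return latest

def with_health(dashboards: list[dict], records: list[dict]) -> list[dict]:
    """Attach the aggregated validation outcome to each dashboard.

    The dashboard list (sourced from Superset) drives the join, so
    records for deleted dashboards are silently dropped.
    """
    by_id = latest_records(records)
    return [
        {**d, "validation": by_id.get(d["id"], {"status": "never_validated"})}
        for d in dashboards
    ]

dashboards = [{"id": 1, "title": "Sales"}, {"id": 2, "title": "Ops"}]
records = [
    {"dashboard_id": 1, "created_at": "2026-03-09", "status": "passed"},
    {"dashboard_id": 1, "created_at": "2026-03-10", "status": "failed"},
]
merged = with_health(dashboards, records)
# merged[0]["validation"]["status"] == "failed" (newest record wins)
# merged[1]["validation"]["status"] == "never_validated"
```

Note the direction of the join mirrors the rejected alternative: starting from ValidationRecord and joining backward would have surfaced the orphaned records instead.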

3. Policy Execution Scope

Context: How do we define which dashboards are in a policy?

Decision: A ValidationPolicy will store an environment_id plus either a JSON list of explicit dashboard_ids or dynamic tags (e.g., "tag:production", "owner:data-team"). For V1, to simplify the UI and keep the scheduling math predictable, we will support only explicit selection of dashboards (saving an array of IDs).

Rationale: Explicit IDs map directly to the requirement "Select 15 dashboards" and let the scheduler know exactly that N = 15.

Alternatives considered: Purely dynamic tag evaluation at runtime. Rejected for V1 because if a tag applied to 1,000 dashboards, the scheduler would not know N until the moment of execution, making it harder to pre-calculate the execution window intervals.
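
A minimal sketch of how a V1 policy might serialize its explicit selection into a SQLite TEXT column holding JSON. The ValidationPolicy fields and helper names shown here are assumptions based on the decision above, not the actual schema:

```python
import json
from dataclasses import dataclass, field

@dataclass
class ValidationPolicy:
    name: str
    environment_id: int
    dashboard_ids: list[int] = field(default_factory=list)

    def to_row(self) -> dict:
        """Serialize for a SQLite row; the ID list becomes a JSON string."""
        return {
            "name": self.name,
            "environment_id": self.environment_id,
            "dashboard_ids": json.dumps(self.dashboard_ids),
        }

    @classmethod
    def from_row(cls, row: dict) -> "ValidationPolicy":
        return cls(
            name=row["name"],
            environment_id=row["environment_id"],
            dashboard_ids=json.loads(row["dashboard_ids"]),
        )

policy = ValidationPolicy("Nightly prod check", environment_id=1,
                          dashboard_ids=[3, 7, 11])
# With explicit IDs, N is known at policy-save time, so the window
# math from section 1 can be pre-computed:
n_tasks = len(policy.dashboard_ids)  # N = 3
restored = ValidationPolicy.from_row(policy.to_row())
```

Keeping the tag syntax ("tag:production") in the stored field format leaves room to add dynamic evaluation in a later version without a schema migration.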