# Feature Specification: LLM Table Translation Service

**Feature Branch**: `028-llm-datasource-supeset`
**Created**: 2026-05-08
**Status**: Draft
**Input**: User description (translated from Russian): "I want to add an LLM service for translating the data in tables. The mechanism should use a Superset datasource to fetch the data (rows to translate plus context) and INSERT VALUES into materialized tables (this can be executed in SQL Lab) for the finished rows. It must be possible to choose the column to translate, the context columns, and the key columns (there may be several) by which the data will be inserted into the target table."

## Clarifications

### Session 2026-05-08

- Q: Should translation jobs support scheduled/periodic execution? → A: Yes, translation jobs can be placed on a schedule (cron-like or interval-based), with each scheduled trigger creating a new Translation Run and optionally auto-executing the INSERT via Superset SQL Lab API.
- Q: Should the system support a terminology dictionary passed as LLM context? → A: Yes, a user-maintained terminology dictionary (source_term → target_translation pairs) is passed as additional context to the LLM during translation to ensure consistent, domain-accurate translations. Dictionaries can be created, edited, populated manually, and attached to translation jobs.
- Q: Can users feed corrections from translation results back into the dictionary? → A: Yes, in the run results view users can select a specific incorrectly translated word/phrase, provide the corrected translation, and submit it to a chosen terminology dictionary for future runs.
- Q: What access control model should govern translation jobs, dictionaries, and run execution? → A: Fine-grained configurable permissions through the existing role-based access control (RBAC) model.
- Q: How should the system handle large dictionaries (10K+ terms) that would exceed LLM context window limits if injected in full? → A: Per-batch filtering: before each LLM call, the system scans the rows in the current batch and includes only those dictionary entries whose source_term appears as a substring (case-insensitive, word-boundary-aware) in at least one row of the batch. The dictionary itself has no hard size limit; the prompt grows proportionally to batch content, not total dictionary size.
- Q: How should scheduled runs detect which source rows need translation (change detection strategy)? → A: New-key-only: the system translates only rows whose key-column values are absent from the most recent run with `insert_status = succeeded` of the same job. Source data is append-only (INSERT-only, no UPDATEs), so existing rows are never retranslated. If the last successful run is older than 90 days and its key data has been pruned, the system falls back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation. Manual runs remain available for full retranslation.
- Q: What level of observability (logging, metrics, alerting) is required for production operation? → A: Full observability: structured event log for every run, batch, and schedule trigger; latency metrics per batch; success/failure counters per job; token usage and cost trends; and an admin dashboard aggregating these signals. Failure notifications via existing notification infrastructure.
- Q: How long should detailed translation run data (source snapshots, translations, INSERT statements) be retained? → A: 90 days of full detail, then aggregation: detailed run snapshots are retained for 90 days after run completion. Beyond 90 days, only the run metadata record and aggregated metrics (row count, status, token usage, cost) are preserved; source row snapshots and generated INSERT statements are pruned. Cumulative metrics are persisted in a metric snapshot table before event pruning.
- Q: How is INSERT SQL executed against the target table? → A: All INSERT/UPSERT execution goes through Superset SQL Lab API `/api/v1/sqllab/execute/`. The system submits generated SQL, polls execution status, and records the Superset query reference and outcome. Manual copy/paste into SQL Lab UI is not a supported workflow; generated SQL may be exposed for audit/debugging only.
- Q: What is the preview model — quality gate or row-level approval? → A: Preview is a quality gate for prompt, settings, and dictionary. After preview is accepted, the full run processes all eligible source rows. Approve/edit/reject actions in preview apply only to the preview sample and serve as quality feedback; they do not gate individual unseen rows. Rejected preview rows are excluded from the full run only if they were part of the sample.
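The per-batch dictionary filtering described above can be sketched in a few lines. This is an illustrative Python sketch (the function and variable names are hypothetical); a production matcher would also want Unicode-aware word boundaries:

```python
import re

def filter_dictionary(batch_rows, dictionary):
    """Keep only entries whose source_term occurs (case-insensitive,
    word-boundary-aware) in at least one row of the current batch."""
    haystack = "\n".join(batch_rows).lower()
    selected = {}
    for source_term, target_translation in dictionary.items():
        # \b = word boundary; re.escape guards regex metacharacters in terms.
        if re.search(r"\b" + re.escape(source_term.lower()) + r"\b", haystack):
            selected[source_term] = target_translation
    return selected

glossary = {"gross margin": "валовая маржа", "churn": "отток клиентов"}
batch = ["Gross Margin by region", "Net revenue, Q3"]
prompt_glossary = filter_dictionary(batch, glossary)  # only "gross margin" is included
```

The prompt thus grows with batch content rather than with total dictionary size, as the clarification requires.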

### Session 2026-05-08 (post-review)

- Q: Should the system support "keep both" as a dictionary conflict resolution option? → A: No. Dictionary entries are unique per (dictionary, source_term). Conflict options are: overwrite, keep existing, or cancel. Variant support is deferred.
- Q: What database dialect does SQL generation target? → A: The dialect is determined dynamically from the Superset datasource's database connection type. Supported dialects for MVP: PostgreSQL (including Greenplum) and ClickHouse. The system queries Superset for the database backend and generates dialect-appropriate SQL (identifier quoting, UPSERT syntax, value encoding).
- Q: How does the system handle the case where the last successful run's key data has been pruned (90-day retention) and a scheduled new-key-only run triggers? → A: The system falls back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation run, recording the usual terminal event upon completion.
- Q: How are cumulative metrics preserved beyond the 90-day event/record retention window? → A: A metric snapshot is persisted at pruning time, capturing cumulative token count, cost, and run counts. The metrics dashboard reads from both live events and snapshots.
- Q: What happens when an in-progress run exists and the job configuration is edited? → A: In-progress runs are NOT invalidated. They continue using their config snapshot taken at run start. Configuration changes apply to future runs only (snapshot isolation).
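The new-key-only detection and its `baseline_expired` fallback reduce to a set-difference over key tuples. A minimal sketch, assuming the baseline key set of the last succeeded run has already been loaded (all names are illustrative):

```python
def select_new_rows(source_rows, baseline_keys, key_columns):
    """New-key-only detection: translate only rows whose key tuple is absent
    from the last run with insert_status = succeeded (source is append-only)."""
    return [
        row for row in source_rows
        if tuple(row[c] for c in key_columns) not in baseline_keys
    ]

baseline = {("SKU-1", "en"), ("SKU-2", "en")}   # keys from the last succeeded run
rows = [
    {"sku": "SKU-1", "lang": "en", "name": "Red chair"},
    {"sku": "SKU-3", "lang": "en", "name": "Oak table"},
]
fresh = select_new_rows(rows, baseline, ["sku", "lang"])  # only the SKU-3 row
# If the baseline is older than 90 days and pruned, pass an empty set:
# every key counts as new, i.e. the `baseline_expired` full-translation fallback.
```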
## User Scenarios & Testing *(mandatory)*

### User Story 1 - Configure a translation job from a Superset datasource (Priority: P1)

An analytics engineer or localization specialist selects a Superset datasource, picks the column whose values need translation, optionally selects context columns that help the LLM understand the meaning, specifies one or more key columns that uniquely identify rows for target-table insertion (with an explicit source-to-target key column mapping), and designates the insertable physical target table and column where translated values will be written.

**Why this priority**: Without a correctly configured translation job, no data can flow from source to target. Configuration is the critical prerequisite that gates all downstream value.

**Independent Test**: Can be fully tested by opening the translation job configuration interface, selecting a Superset datasource, specifying translation, context, and key columns with source→target mapping, defining the target table and target column, saving the configuration, and verifying that the system detects the database dialect from the Superset connection, validates column existence and key mapping, and warns if the dialect is unsupported.

**Acceptance Scenarios**:

1. **Given** the user opens the translation configuration interface, **When** they select a Superset datasource, **Then** the system displays available columns with their types and allows the user to designate one translation column, zero or more context columns, and at least one key column with explicit mapping to target key columns.
2. **Given** the user selects columns from the datasource, **When** they specify a target table and target column name, **Then** the system validates that the mapped key columns exist in both the source datasource schema and the target table schema.
3. **Given** the user configures multiple key columns (composite key), **When** the configuration is saved, **Then** the system stores the composite key definition with source→target column mapping and uses it for matching rows during INSERT generation.
4. **Given** the user attempts to save a configuration with no translation column selected, **When** save is triggered, **Then** the system blocks the action and highlights the missing required field.
5. **Given** the user selects a translation column and context columns, **When** the datasource has computed or virtual columns, **Then** the system distinguishes physical columns from virtual columns and warns if a virtual column is selected as a key column (virtual columns as translation/context columns are allowed if Superset can query them).
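The key-mapping check in scenarios 2–3 can be illustrated as follows (a hypothetical helper; the real validator would additionally run the type-compatibility check against both schemas):

```python
def validate_key_mapping(source_columns, target_columns, key_mapping):
    """Hypothetical validator: every mapped key column must exist in the
    source datasource schema and in the target table schema."""
    errors = []
    if not key_mapping:
        errors.append("at least one key column mapping is required")
    for src, tgt in key_mapping.items():
        if src not in source_columns:
            errors.append(f"source key column not found: {src}")
        if tgt not in target_columns:
            errors.append(f"target key column not found: {tgt}")
    return errors

errors = validate_key_mapping(
    source_columns={"id", "lang", "name"},
    target_columns={"product_id", "lang", "name_ru"},
    key_mapping={"id": "product_id", "lang": "lang"},   # composite key
)
# An empty error list lets the configuration be saved (scenario 2).
```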

---

### User Story 2 - Preview translated output as quality gate (Priority: P2)

Before committing translated data into a target table, the user previews a sample of source rows alongside their LLM-generated translations, reviews translation quality against the provided context and attached dictionaries, adjusts the LLM prompt or target language if needed, and confirms the preview as a quality gate. Preview is a quality check for the prompt, settings, and dictionary, not a row-level approval of all dataset rows. After preview is accepted, the full run processes all eligible source rows.

**Why this priority**: Translation quality assurance is essential; blindly inserting machine-translated content without preview creates a data quality risk that undermines the entire feature.

**Independent Test**: Can be fully tested by running a preview on a translation job with a small sample (e.g., 5–10 rows), verifying that the system shows source values, context values, and LLM translations side by side, adjusting the language or prompt, and confirming that preview acceptance gates the full run.

**Acceptance Scenarios**:

1. **Given** a saved translation job configuration, **When** the user requests a preview, **Then** the system fetches a configurable number of source rows, sends them to the LLM with the configured context columns and per-batch filtered dictionary, and displays source text, context, and translation in a side-by-side view.
2. **Given** the preview results are displayed, **When** the user finds an unsatisfactory translation, **Then** they can mark it for retranslation, edit it manually, or reject it as quality feedback. These actions apply only to the preview sample.
3. **Given** the user adjusts the translation prompt or target language, **When** they re-run the preview, **Then** the system re-fetches the same sample rows and applies the updated prompt/language settings.
4. **Given** the user is satisfied with preview quality, **When** they confirm preview acceptance, **Then** the system records the preview session as accepted and enables full execution. The full run will process all eligible source rows, not only the preview sample.
5. **Given** the source table contains a large number of rows, **When** the user requests a full batch execution, **Then** the system warns about the estimated row count, token usage, and cost before proceeding, and allows the user to set a row limit.
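The pre-execution warning in scenario 5 needs only coarse arithmetic. A sketch, with illustrative chars-per-token and pricing constants that a real deployment would replace with the configured provider's rates:

```python
def estimate_run(row_count, avg_chars_per_row, batch_size,
                 chars_per_token=4.0, price_per_1k_tokens=0.0005):
    """Coarse pre-run estimate for the confirmation dialog. The chars/token
    ratio and the price are placeholders, not real provider figures."""
    batches = -(-row_count // batch_size)                 # ceiling division
    prompt_tokens = row_count * avg_chars_per_row / chars_per_token
    total_tokens = int(prompt_tokens * 2)                 # prompt + completion, rough
    return {
        "batches": batches,
        "estimated_tokens": total_tokens,
        "estimated_cost": round(total_tokens / 1000 * price_per_1k_tokens, 2),
    }

estimate = estimate_run(row_count=100_000, avg_chars_per_row=80, batch_size=200)
```

A row limit set by the user simply caps `row_count` before the estimate and the run.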

---

### User Story 3 - Execute translation and insert results via Superset SQL Lab API (Priority: P3)

The user initiates the full translation batch. The system processes rows through the LLM in configurable batches, generates safe INSERT/UPSERT SQL for the target table keyed by the configured key columns, submits the SQL to Superset via `/api/v1/sqllab/execute/`, polls the execution status, and records the Superset query reference for full traceability.

**Why this priority**: Execution is the final value-delivery step; once configuration and quality preview are sound, the user needs reliable, auditable insertion of translated data.

**Independent Test**: Can be fully tested by executing a full batch on a configured job with preview accepted, verifying that the system generates correct INSERT SQL, submits it to Superset SQL Lab API, and records the execution outcome with Superset query reference.

**Acceptance Scenarios**:

1. **Given** a translation job with preview accepted, **When** the user triggers execution, **Then** the system processes source rows in configurable batches, calls the LLM for each batch, generates safe INSERT/UPSERT SQL, and submits it to Superset SQL Lab API `/api/v1/sqllab/execute/`.
2. **Given** the SQL is submitted to Superset, **When** the system polls execution status, **Then** the run result shows the Superset execution status, query reference, rows affected (if available), and any errors.
3. **Given** a row already has a translation in the target table (matched by key columns via UNIQUE/PRIMARY KEY constraint), **When** the user triggers execution, **Then** the system applies the configured UPSERT strategy: skip existing (ON CONFLICT DO NOTHING), overwrite (ON CONFLICT DO UPDATE), or plain INSERT (relies on the user ensuring key uniqueness).
4. **Given** the LLM fails to translate a batch (timeout, rate limit, API error), **When** the batch fails, **Then** the system records the failure in the TranslationBatch record with error details, and allows the user to retry only the failed batch without reprocessing successful batches.
5. **Given** execution completes, **When** the user reviews the run result, **Then** the system shows the number of rows translated, rows skipped, batches processed, Superset execution reference, target table name, and the generated SQL (for audit/debugging).
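A sketch of the body submitted to `/api/v1/sqllab/execute/`. The endpoint comes from this spec, but the field names (e.g. `runAsync`) are an assumption about Superset's SQL Lab API and should be verified against the deployed version's OpenAPI schema; polling of the returned query reference is only indicated in comments:

```python
import json

# Assumed endpoint (from this spec); verify payload fields against your
# Superset version before relying on them.
SQLLAB_EXECUTE = "/api/v1/sqllab/execute/"

def build_execute_payload(database_id, sql, run_async=True):
    """Body for the SQL Lab execute call. With async execution the response
    carries a query reference that the runner polls until a terminal status,
    then stores on the TranslationRun with rows affected and any error."""
    return {"database_id": database_id, "sql": sql, "runAsync": run_async}

payload = build_execute_payload(12, "INSERT INTO i18n.product_names (...) VALUES (...)")
request_body = json.dumps(payload)
```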

---

### User Story 4 - Review translation history and audit trail (Priority: P4)

A data steward or auditor reviews past translation runs, inspects which rows were translated with which prompts, traces INSERT executions back to their source rows via Superset query references, and verifies that translation decisions (approvals, edits, rejections) are preserved for compliance.

**Why this priority**: Auditability is important for enterprise use but does not block the core translation workflow. It can be delivered after the primary flow is functional.

**Independent Test**: Can be fully tested by opening the translation history view, selecting a past run, verifying that source rows, translations, prompts, key values, Superset execution references, and INSERT SQL are displayed, and confirming that filtered views by datasource, target table, and date range work.

**Acceptance Scenarios**:

1. **Given** multiple translation runs exist, **When** the user opens translation history, **Then** the system lists runs with datasource name, target table, row count, execution date, status (translation + insert), and the user who triggered them.
2. **Given** a specific translation run is selected, **When** the user inspects its details, **Then** the system shows the configuration snapshot, the prompt template, the sample of source rows with their translations, the generated SQL, and the Superset execution reference with status.
3. **Given** a translation run contains rows that were manually edited during preview, **When** the user inspects those rows, **Then** the system clearly marks the original LLM translation and the user-edited final value separately.
4. **Given** the user wants to reuse a previous configuration, **When** they duplicate a past translation job, **Then** the system creates a new job pre-filled with the previous datasource, columns, keys, and target table configuration.
5. **Given** a run's detailed data has been pruned (older than 90 days), **When** the user views it, **Then** the system shows run metadata and aggregated metrics; source row snapshots and SQL are marked as unavailable.

---

### User Story 5 - Build and manage a terminology dictionary for consistent translations (Priority: P2)

A localization specialist or domain expert creates a terminology dictionary containing source-term → target-translation pairs, populates it manually or via bulk import, and attaches it to a translation job so the LLM respects these fixed translations rather than guessing domain-specific terms. The dictionary content is injected into the LLM prompt as authoritative context alongside the regular context columns. Dictionary terms are matched against batch rows using case-insensitive, word-boundary-aware substring comparison.

**Why this priority**: Without a terminology dictionary, domain-specific terms will be translated inconsistently or incorrectly by the LLM, undermining trust in the entire translation pipeline. The dictionary must be available before preview and execution to deliver acceptable quality.

**Independent Test**: Can be fully tested by creating a dictionary with 5–10 term pairs, attaching it to a translation job, running preview, and verifying that the LLM output consistently uses the dictionary translations for matched terms rather than generating alternative translations.

**Acceptance Scenarios**:

1. **Given** the user navigates to the dictionary management section, **When** they create a new dictionary, **Then** the system provides an empty table with "Source Term" and "Target Translation" columns and allows adding rows one by one.
2. **Given** a dictionary has been created, **When** the user opens it, **Then** they can add new term pairs inline, edit existing pairs, delete individual entries, or clear the entire dictionary with confirmation.
3. **Given** the user has an external list of terms (CSV, TSV, or pasted text), **When** they import it into the dictionary, **Then** the system parses the file, shows a preview of detected term pairs, flags duplicates or conflicts, and allows the user to confirm or adjust before saving. Duplicate source_term entries offer: overwrite, keep existing, or skip the new entry.
4. **Given** a translation job configuration is open, **When** the user selects a dictionary from the list of available dictionaries (filtered to those matching the job's target language), **Then** the system attaches it to the job and the dictionary content will be injected into every LLM translation request for that job. Dictionaries with a mismatched target language are not offered.
5. **Given** a dictionary is attached to a job, **When** the LLM processes a batch, **Then** the system includes the per-batch filtered dictionary content in the prompt as an authoritative glossary, instructing the LLM to use the provided translations for exact matches and to consider them for partial or contextual matches.
6. **Given** multiple dictionaries exist, **When** the user attaches them to a job, **Then** the system merges them into the prompt context in priority order (lower priority number = higher precedence). When the same source_term appears in multiple dictionaries, the highest-priority entry is used; lower-priority duplicates are omitted and surfaced as non-blocking validation notes.
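The priority-order merge in scenario 6 can be sketched as follows (hypothetical names; priorities are integers where lower wins, per the scenario):

```python
def merge_dictionaries(dicts_by_priority):
    """Merge attached dictionaries; a lower priority number wins. Shadowed
    duplicates are reported as non-blocking validation notes."""
    merged, notes = {}, []
    for priority in sorted(dicts_by_priority):
        for source_term, translation in dicts_by_priority[priority].items():
            if source_term in merged:
                notes.append(f"'{source_term}' in priority-{priority} dictionary is shadowed")
            else:
                merged[source_term] = translation
    return merged, notes

merged, notes = merge_dictionaries({
    1: {"invoice": "счёт"},
    2: {"invoice": "инвойс", "ledger": "книга учёта"},
})
# merged keeps the priority-1 "invoice"; the priority-2 duplicate becomes a note
```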

---

### User Story 6 - Correct translations and feed back into the dictionary (Priority: P3)

After a translation run completes, the user reviews the results and notices that a specific word or phrase was translated incorrectly. They select the problematic source term (from the source column value) and the incorrect target translation, provide the correct target translation, and submit it to a chosen terminology dictionary so that future runs use the corrected term. If the same source term already exists in the dictionary, the system asks whether to overwrite or keep the existing entry.

**Why this priority**: The feedback loop turns one-time corrections into permanent improvements. Without it, the same translation mistakes would recur across runs, forcing the user to manually edit the same terms repeatedly. It is valuable but depends on the dictionary (Story 5) already existing.

**Independent Test**: Can be fully tested by completing a translation run, identifying an incorrect translation, selecting the source term and incorrect target term, providing a corrected target term, submitting it to a dictionary, re-running the same job, and verifying that the new translation output uses the corrected term.

**Acceptance Scenarios**:

1. **Given** a completed translation run with results displayed, **When** the user selects a source term and its incorrect target translation within a translated value, **Then** the system shows a pop-up "Correct this term" with the source term and incorrect translation pre-filled, and an empty input for the corrected target translation.
2. **Given** the user provides a corrected target translation in the pop-up, **When** they choose a target dictionary (matching the job's target language) from a dropdown and submit, **Then** the system adds the term pair to the selected dictionary and records the origin (which run, which row, which user, timestamp) for audit.
3. **Given** the corrected source term already exists in the selected dictionary, **When** the user submits, **Then** the system shows a conflict dialog "Term already exists with translation 'X'. Overwrite with 'Y'?" with options to overwrite, keep existing, or cancel.
4. **Given** the user selects multiple incorrect translations across different rows in the result view, **When** they use bulk correction mode, **Then** the system collects all selected terms, allows mass editing of the corrected values, and submits them to the dictionary in one atomic operation (all succeed or all fail with conflicts listed).
5. **Given** a dictionary was updated via the feedback loop, **When** the user re-runs the same translation job, **Then** the system includes the newly added terms in the LLM prompt context and the translation output reflects the corrections.
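The atomic bulk submission in scenario 4 reduces to a check-then-write step. A sketch against an in-memory dictionary (illustrative names; a real implementation would run the write inside a database transaction):

```python
def bulk_submit_corrections(dictionary, corrections):
    """All-or-nothing: if any corrected source_term conflicts with an existing
    entry, nothing is written and the conflicts are returned for the dialog."""
    conflicts = sorted(
        term for term, fixed in corrections.items()
        if term in dictionary and dictionary[term] != fixed
    )
    if conflicts:
        return False, conflicts          # caller shows the conflict dialog
    dictionary.update(corrections)       # in a real system: one DB transaction
    return True, []

glossary = {"churn": "отток"}
ok, conflicts = bulk_submit_corrections(glossary, {"churn": "уход клиентов"})
# ok is False and "churn" is listed; the user chooses overwrite, keep, or cancel
```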

---

### User Story 7 - Schedule translation jobs for periodic execution (Priority: P3)

A localization manager configures a translation job to run automatically on a schedule, for example every Monday at 06:00 Europe/Moscow to translate new product names that appeared during the week. Each scheduled execution creates a new Translation Run with the job's configuration snapshot, generates INSERT SQL, submits it to Superset SQL Lab API, and records the outcome. Manual runs require a preview quality gate; scheduled runs may bypass preview only after the job has passed at least one successful manual run with the same effective configuration.

**Why this priority**: Scheduling eliminates the manual overhead of re-running translation jobs when source data changes. It is valuable for operational efficiency but depends on the core execution flow (Stories 1–3) already being stable.

**Independent Test**: Can be fully tested by configuring a schedule for a translation job (e.g., "every 5 minutes" for testing), waiting for the scheduled trigger, and verifying that a new Translation Run was created with the correct configuration, source rows translated (new-key-only), SQL submitted to Superset API, and execution outcome recorded.

**Acceptance Scenarios**:

1. **Given** a saved translation job, **When** the user opens the schedule configuration, **Then** the system offers schedule types: one-time future run, interval-based (every N minutes/hours/days), and cron-based (e.g., `0 6 * * 1`). All schedules include a timezone selector.
2. **Given** the user configures a schedule, **When** they enable it, **Then** the system validates the cron expression or interval, shows the next 3 planned execution times with timezone, and verifies that the job has at least one prior successful manual run before allowing scheduled execution.
3. **Given** a scheduled job reaches its trigger time, **When** the scheduler fires, **Then** the system creates a new Translation Run from the job's configuration snapshot, fetches the current source data, and executes the full translation pipeline. Preview is bypassed; the INSERT SQL is submitted to Superset SQL Lab API.
4. **Given** a scheduled run completes successfully, **When** the user reviews the run, **Then** the generated INSERT SQL is available for audit, and the Superset execution reference is recorded.
5. **Given** a scheduled run fails (LLM unavailable, datasource inaccessible, Superset API error), **When** the failure occurs, **Then** the system records the failed run with error details, leaves the schedule enabled for the next trigger, and notifies the user via the existing notification infrastructure.
6. **Given** a translation job has an active schedule, **When** the user edits the job configuration, **Then** the system warns that the schedule will use the updated configuration from the next trigger onward. In-progress runs are NOT invalidated; they continue using their config snapshot.
7. **Given** a scheduled job should be paused, **When** the user disables the schedule, **Then** the system stops triggering new runs but preserves the schedule configuration for later re-enabling.
8. **Given** a scheduled run triggers but no new source rows exist (all keys already translated), **When** the system detects this, **Then** a `run_noop` event is recorded with reason `no_new_rows` and no INSERT SQL is generated.
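For interval-based schedules, the "next 3 planned execution times" preview of scenario 2 is straightforward with the standard library; cron expressions would additionally need a cron parser (an illustrative sketch, not the scheduler itself):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_triggers(last_fire, every, count=3, tz="Europe/Moscow"):
    """Preview the next `count` interval-based triggers in the job's timezone."""
    base = last_fire.astimezone(ZoneInfo(tz))
    return [base + every * i for i in range(1, count + 1)]

monday = datetime(2026, 5, 11, 6, 0, tzinfo=ZoneInfo("Europe/Moscow"))
plan = next_triggers(monday, timedelta(days=7))   # three upcoming Mondays, 06:00
```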

---

### Edge Cases

- What happens when the source datasource contains NULL values in the translation column?
  → System MUST skip NULL translation values and log them, continuing with the next row.
- What happens when a context column value is NULL or empty?
  → System MUST send the available context to the LLM, marking NULL context fields as empty with a clear placeholder.
- How does the system handle a key column value that does not exist in the target table?
  → System MUST generate INSERT statements (not UPDATE), treating all rows as new insertions. The key columns serve as identifiers but the target table may not have the row yet. If a UNIQUE/PRIMARY KEY constraint exists and a duplicate is inserted, the UPSERT strategy controls behavior.
- What happens when the target table does not exist or is inaccessible in Superset?
  → System MUST warn the user at configuration time and block execution with a clear explanation.
- How does the system handle very large source tables (100k+ rows)?
  → System MUST enforce configurable batch sizes, show progress, estimate token count and cost before execution, and allow cancellation mid-run.
- What happens when the LLM provider returns a response in an unexpected format or language?
  → System MUST request structured JSON output from the LLM keyed by stable row identifiers. The system MUST validate that each requested row has exactly one translation. Missing, duplicate, malformed, or extra outputs are marked as failed.
- How does the system handle concurrent translation runs on the same target table?
  → System MUST warn if another run targets the same table and key range, and provide guidance to avoid data conflicts.
- What happens when the user changes the translation column or key columns after a run has started?
  → In-progress runs are NOT invalidated. They continue using their config snapshot taken at run start (snapshot isolation). Configuration changes apply to future runs only.
- What happens when all rows in a batch fail to translate (LLM unavailable, quota exhausted)?
  → System MUST preserve the batch state in the TranslationBatch record and allow retry with the same or different LLM provider settings.
- How does the system handle composite keys where one key component is NULL?
  → System MUST reject rows with NULL key values during INSERT generation and report them as unprocessable.
- What happens when a terminology dictionary contains duplicate source terms?
  → System MUST detect duplicates at entry time and require explicit resolution (overwrite or keep existing) before saving.
- How does the system handle dictionary updates while a translation run is in progress?
  → System MUST snapshot the dictionary content at the start of each run so the run uses a consistent dictionary state throughout. Mid-run dictionary edits do not affect the in-progress run.
- What happens when an attached dictionary is deleted while a job references it?
  → System MUST warn the user and prevent deletion of dictionaries that are attached to active or scheduled jobs. Dictionaries attached only to historical runs can be deleted.
- How does the system handle a scheduled run overlapping with a still-running previous scheduled run?
  → System MUST detect overlap (same job, previous run still in progress) and either skip the new trigger (with a log event) or queue it for execution after the previous run completes, depending on the job's configured concurrency policy. The queue holds at most one pending run; additional triggers are skipped.
- What happens when a scheduled job's datasource becomes unavailable between triggers?
  → System MUST record the failure for that trigger, leave the schedule enabled, and attempt the next trigger as planned. After N consecutive failures (configurable, default 3), the system optionally disables the schedule and notifies the user.
- How does the system handle feedback-loop corrections that reference a different base language than the dictionary's target language?
  → System MUST validate that the target language of the dictionary matches the translation job's target language before allowing submission, and reject cross-language corrections with a clear message.
- What happens when a scheduled new-key-only run triggers but the last successful run is older than 90 days (key data pruned)?
  → System MUST fall back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation run and records the usual terminal event.
- What happens when the Superset SQL Lab API execution returns an error?
  → System MUST record the error in `TranslationRun.insert_error_message` and mark `insert_status = failed`. The translation data remains available for retry or manual inspection.
- How does the system handle SQL identifier injection through user-provided table/column names?
  → System MUST validate table and column identifiers against Superset datasource metadata and quote them using the detected database dialect rules. Raw user-provided identifiers are never interpolated directly into SQL.
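The structured-output validation edge case above can be sketched as follows (an illustrative response shape — a JSON array of `{"id": ..., "translation": ...}` objects; the actual contract is whatever the prompt requests):

```python
import json
from collections import Counter

def validate_llm_batch(requested_ids, raw_response):
    """Accept exactly one well-formed translation per requested row id;
    missing, duplicate, malformed, or extra outputs are marked as failed."""
    try:
        items = json.loads(raw_response)   # assumed shape: [{"id": ..., "translation": ...}]
    except json.JSONDecodeError:
        return {}, set(requested_ids)      # unparseable response fails the whole batch
    counts = Counter(item.get("id") for item in items)
    accepted = {}
    for item in items:
        rid = item.get("id")
        if (rid in requested_ids and counts[rid] == 1
                and isinstance(item.get("translation"), str)):
            accepted[rid] = item["translation"]
    return accepted, set(requested_ids) - set(accepted)
```

Failed ids stay in the batch record for the targeted retry described in Story 3.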

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: The system MUST allow users to create a translation job by selecting a Superset datasource as the source of data.
- **FR-002**: The system MUST display available columns from the selected datasource and allow the user to designate exactly one column as the translation source column.
- **FR-003**: The system MUST allow the user to select zero or more context columns whose values are sent to the LLM alongside the translation text to improve translation quality.
- **FR-004**: The system MUST require the user to select at least one key column with explicit source→target column mapping (supports composite keys) that uniquely identifies each row for INSERT into the target table.
- **FR-005**: The system MUST allow the user to specify a target insertable physical table name and target column name where translated values will be inserted. Views and materialized views are not supported as targets.
- **FR-006**: The system MUST validate that the mapped key columns exist in both the source datasource schema and the target table schema, and are type-compatible.
- **FR-007**: The system MUST support configurable batch sizes for LLM processing to control throughput, token usage, and cost.
- **FR-008**: The system MUST provide a preview mode that fetches a limited sample of source rows, sends them to the LLM with filtered dictionary context, and displays source values, context, and translations side-by-side as a quality gate before full execution.
- **FR-009**: The system MUST allow the user to adjust the LLM translation prompt, target language, and provider settings within the translation job configuration.
- **FR-010**: The system MUST allow the user to mark preview rows as approved, manually edited, or rejected as quality feedback for the preview sample.
- **FR-011**: The system MUST require preview acceptance before allowing full execution. Rejected preview sample rows are excluded from the full run; approved/edited preview sample rows are included. Unseen rows (not in preview sample) are processed normally.
- **FR-012**: The system MUST generate safe INSERT/UPSERT SQL for the configured target table and target column, using the dialect detected from the Superset datasource's database connection (supported: PostgreSQL/Greenplum, ClickHouse). Identifier quoting, UPSERT syntax, and value encoding MUST follow dialect-specific rules. Raw user-provided identifiers MUST NOT be interpolated directly.
- **FR-013**: The system MUST submit generated SQL to Superset via `/api/v1/sqllab/execute/`, poll execution status, and record the Superset query reference, execution status, and error details. Generated SQL MAY be exposed for audit/debugging but is not the primary execution mechanism.
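
The submit-and-poll flow in FR-013 can be sketched as below. The payload field names are assumptions about Superset's `/api/v1/sqllab/execute/` endpoint and should be verified against the target Superset version; the status fetcher is injected so the loop stays testable without a live instance.

```python
import time

def build_sqllab_payload(database_id: int, sql: str) -> dict:
    # Field names are assumptions about /api/v1/sqllab/execute/;
    # verify them against the target Superset version.
    return {"database_id": database_id, "sql": sql, "runAsync": True}

def poll_until_terminal(fetch_status, timeout_s: float = 300.0,
                        interval_s: float = 2.0) -> str:
    """Poll an injected status callable until a terminal state or timeout.

    `fetch_status` hides the HTTP GET against Superset's query-status
    endpoint; the terminal state names here are illustrative.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("success", "failed", "stopped"):
            return status
        time.sleep(interval_s)
    return "timed_out"
```

On `failed` or `timed_out`, the run records `insert_status = failed` with the error details, per the edge case above.
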
- **FR-014**: The system MUST estimate and display token count and approximate cost before executing a full translation batch.
- **FR-015**: The system MUST handle LLM failures (timeout, rate limit, API error) gracefully by recording the failed batch in TranslationBatch and allowing retry of only the failed rows.
- **FR-016**: The system MUST skip source rows where the translation column value is NULL and log them.
- **FR-017**: The system MUST reject rows where any key column value is NULL during INSERT generation.
- **FR-018**: The system MUST support an UPSERT strategy: `skip_existing` (ON CONFLICT DO NOTHING), `overwrite` (ON CONFLICT DO UPDATE), or `insert` (plain INSERT — user guarantees key uniqueness). The system MUST document that `insert` strategy does not handle duplicates.
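
For the PostgreSQL/Greenplum dialect, the three strategies map onto `ON CONFLICT` clauses roughly as sketched below (ClickHouse has no `ON CONFLICT`, so its skip/overwrite variants need a different shape, such as anti-join inserts or a ReplacingMergeTree target). Identifiers are assumed already validated and quoted per FR-012; `%s` placeholders stand in for bound values. A sketch, not the actual generator:

```python
def upsert_sql(table: str, key_cols: list[str], target_col: str,
               strategy: str) -> str:
    """Render a PostgreSQL/Greenplum statement skeleton for one strategy."""
    cols = ", ".join(key_cols + [target_col])
    placeholders = ", ".join(["%s"] * (len(key_cols) + 1))
    base = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    if strategy == "insert":
        return base  # plain INSERT: user guarantees key uniqueness
    conflict = ", ".join(key_cols)
    if strategy == "skip_existing":
        return f"{base} ON CONFLICT ({conflict}) DO NOTHING"
    if strategy == "overwrite":
        return (f"{base} ON CONFLICT ({conflict}) "
                f"DO UPDATE SET {target_col} = EXCLUDED.{target_col}")
    raise ValueError(f"unknown strategy: {strategy}")
```

`ON CONFLICT` requires a unique constraint or index on the key columns in the target table, which is worth surfacing during FR-006 validation.
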
- **FR-019**: The system MUST record each translation run with its configuration snapshot (including config_hash), dictionary snapshot, source rows, translations, prompt used, key values, generated SQL, and Superset execution outcome.
- **FR-020**: The system MUST provide a translation history view listing past runs with datasource, target table, row count, translation status, insert status, date, and triggering user.
- **FR-021**: The system MUST allow the user to duplicate an existing translation job configuration as a starting point for a new job.
- **FR-022**: The system MUST warn the user if a concurrent run targets the same target table and overlapping key range.
- **FR-023**: The system MUST use snapshot isolation: in-progress runs continue using their config snapshot taken at run start. Configuration changes apply to future runs only and do not invalidate in-progress runs.
- **FR-024**: The system MUST allow users to create, edit, and delete terminology dictionaries, each containing source-term → target-translation pairs.
- **FR-025**: The system MUST allow users to populate a dictionary by manual inline entry, bulk text paste, or file import (CSV, TSV).
- **FR-026**: The system MUST detect duplicate source terms within a dictionary at entry time and require the user to resolve conflicts (overwrite or keep existing) before saving.
- **FR-027**: The system MUST allow users to attach one or more terminology dictionaries to a translation job, with configurable priority ordering (lower priority number = higher precedence). Only dictionaries matching the job's target language are offered for attachment.
- **FR-028**: The system MUST inject the per-batch filtered content of all attached dictionaries into the LLM translation prompt as an authoritative glossary, instructing the LLM to use provided translations for exact matches and to consider them for partial matches.
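
The glossary injection in FR-028 amounts to formatting the batch-filtered, priority-ordered entries into a prompt fragment. The wording and delimiter below are assumptions about the prompt template, not the shipped text:

```python
def glossary_block(entries: list[tuple[str, str]]) -> str:
    """Format dictionary entries as an authoritative glossary section.

    `entries` is assumed to be already batch-filtered (FR-044) and
    priority-ordered (FR-027).
    """
    lines = "\n".join(f"- {src} => {tgt}" for src, tgt in entries)
    return ("Glossary (authoritative for exact matches; "
            "prefer it for partial matches):\n" + lines)
```
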
- **FR-029**: The system MUST snapshot the dictionary content at the start of each translation run so the run uses a consistent dictionary state throughout.
- **FR-030**: The system MUST prevent deletion of dictionaries that are attached to active or scheduled translation jobs.
- **FR-031**: The system MUST allow users to identify a mistranslated term by selecting the source term and its incorrect target translation within a run result, and submit a corrected target translation to a chosen terminology dictionary.
- **FR-032**: The system MUST detect when a submitted correction conflicts with an existing dictionary entry and prompt the user to overwrite or keep the existing entry.
- **FR-033**: The system MUST record the origin of each dictionary entry added via the feedback loop, including source run identifier, source row, submitting user, and timestamp.
- **FR-034**: The system MUST support bulk correction mode where users select multiple incorrectly translated terms and submit them to a dictionary in one atomic operation (all succeed or all fail with conflicts listed).
- **FR-035**: The system MUST allow users to configure a schedule for a translation job, supporting one-time future execution, interval-based recurrence, and cron-based recurrence with timezone.
- **FR-036**: The system MUST display the next N planned execution times (with timezone) when a schedule is configured, so the user can verify the schedule before enabling it.
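
The next-N preview in FR-036 is straightforward for the interval case with the standard library (the cron case needs a cron parser and is omitted here). A sketch, with all names as assumptions:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_interval_runs(anchor: datetime, every: timedelta,
                       n: int, tz: str) -> list[datetime]:
    """Preview the next n interval-based triggers, rendered in the
    schedule's timezone so the user can verify them before enabling."""
    zone = ZoneInfo(tz)
    return [(anchor + every * i).astimezone(zone) for i in range(n)]
```
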
- **FR-037**: The system MUST, on each scheduled trigger, create a new Translation Run from the job's configuration snapshot and the current source data state.
- **FR-038**: The system MUST submit generated INSERT SQL to Superset SQL Lab API for every run (both manual and scheduled). Scheduled runs execute automatically; manual runs execute on user trigger.
- **FR-039**: The system MUST detect overlapping scheduled runs for the same job and handle them according to a configurable concurrency policy (skip new trigger or queue at most one run).
- **FR-040**: The system MUST allow users to pause (disable) and resume (re-enable) a schedule without losing the schedule configuration.
- **FR-041**: The system MUST optionally notify users of scheduled run failures via the existing notification infrastructure.
- **FR-042**: The system MUST warn the user when editing a job configuration that has an active schedule, confirming that the updated configuration will apply to future triggers without affecting in-progress runs.
- **FR-043**: The system MUST enforce granular access control on translation resources through the existing RBAC model (see Access Control Matrix below).
- **FR-044**: The system MUST filter dictionary entries per batch before sending to the LLM: only entries whose source_term appears as a case-insensitive, word-boundary-aware substring in at least one translation-column value within the current batch are included. Dictionaries have no hard size limit.
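
The per-batch filter in FR-044 can be expressed with a case-insensitive, word-boundary regex per source term. A sketch under those assumptions (note that `\b` may need refinement for terms that start or end with punctuation):

```python
import re

def filter_entries(entries: dict[str, str],
                   batch_texts: list[str]) -> dict[str, str]:
    """Keep only entries whose source term occurs as a case-insensitive,
    word-boundary-delimited substring in at least one batch value."""
    kept = {}
    for term, translation in entries.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if any(pattern.search(text) for text in batch_texts):
            kept[term] = translation
    return kept
```

Because the boundary check is word-aware, `margin` matches "Gross margin by region" but not "marginal cost", which keeps irrelevant glossary noise out of the prompt.
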
- **FR-045**: The system MUST, for scheduled runs, translate only source rows whose key-column values are absent from the most recent run with `insert_status = succeeded` (new-key-only strategy). If that run's key data has been pruned (>90 days), the system falls back to full translation with a `baseline_expired` event.
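
The new-key-only selection in FR-045 is a set difference over key hashes. A sketch with the hash function injected (the real key derivation is covered by the Translation Record's `key_hash` field); an empty baseline, as after the 90-day pruning fallback, makes every row count as new:

```python
def rows_to_translate(source_rows: list[dict], baseline_hashes: set[str],
                      key_hash) -> list[dict]:
    """Keep only rows whose composite-key hash is absent from the most
    recent run with insert_status = succeeded."""
    return [row for row in source_rows if key_hash(row) not in baseline_hashes]
```
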
- **FR-046**: The system MUST emit structured events for every significant lifecycle transition: run started, batch started/completed/failed, run succeeded/partial/failed/cancelled/skipped, schedule triggered/skipped/failed, insert submitted/succeeded/failed. Events MUST be queryable for audit and trend analysis.
- **FR-047**: The system MUST track per-job cumulative metrics: total runs, success/failure ratio, cumulative token usage, cumulative estimated cost, average batch latency. Metrics MUST be exposed in an admin-accessible dashboard. Cumulative metrics MUST be persisted in a metric snapshot table before event pruning to survive the 90-day retention window.
- **FR-048**: The system MUST send a notification via the existing notification infrastructure when a scheduled run fails, including the job name, failure reason, and a link to the failed run details.
- **FR-049**: The system MUST retain detailed translation run data for 90 days. Beyond 90 days, the system MUST persist a metric snapshot (cumulative token count, cost, run counts) and prune detailed data (source row snapshots, TranslationRecord rows, TranslationEvent rows, generated SQL). Run metadata (row count, status, Superset reference) is preserved.

### Access Control Matrix

| Action | Required Permission | Ownership Constraint |
|--------|-------------------|---------------------|
| List jobs | `translate.job.view` | Scoped to owned jobs unless admin |
| View job | `translate.job.view` | Owner OR admin |
| Create job | `translate.job.create` | — |
| Edit job | `translate.job.edit` | Owner OR admin |
| Delete job | `translate.job.delete` | Owner OR admin |
| Execute job (manual run) | `translate.job.execute` | Owner OR admin; also requires Superset datasource read access |
| List dictionaries | `translate.dictionary.view` | Scoped to owned unless admin |
| Create dictionary | `translate.dictionary.create` | — |
| Edit dictionary | `translate.dictionary.edit` | Owner OR admin |
| Delete dictionary | `translate.dictionary.delete` | Owner OR admin |
| Use dictionary in job | Implicit: dictionary must be visible to user | — |
| View schedule | `translate.schedule.view` | Owner OR admin |
| Manage schedule | `translate.schedule.manage` | Owner OR admin |
| Auto-INSERT on schedule | `translate.schedule.manage` | Owner OR admin; also requires Superset target write access |
| View history | `translate.history.view` | Scoped to owned runs unless admin |
| View metrics | `translate.metrics.view` | Admin only by default |

### Key Entities *(include if feature involves data)*

- **Translation Job**: A persistent configuration binding a Superset datasource, source→target column mappings (translation, context, key columns), target insertable physical table/column, LLM settings, prompt template, attached dictionaries with priority ordering, and optional schedule.
- **Translation Run**: A single execution (manual or scheduled) with translation_status (pending|running|completed|partial|failed|cancelled|skipped) and insert_status (not_started|submitted|running|succeeded|failed|skipped). Contains config snapshot, dictionary snapshot, config_hash, Superset query reference, statistics.
- **Translation Batch**: A group of TranslationRecord rows processed in one LLM API call. Tracks batch_index, status, row counts, token_count, estimated_cost, latency_ms, error details.
- **Translation Record**: An individual row containing source text, context, key values, key_hash, LLM translation, optional user edit, final INSERT value, and status.
- **Preview Session**: A persistent record of a preview quality gate: job_id, user_id, sample_size, config_hash, dictionary_snapshot_hash, status (pending|accepted|rejected), timestamps.
- **Terminology Dictionary**: A named, language-specific (source_language optional, target_language required) collection of source_term → target_translation pairs with audit origin metadata.
- **Dictionary Entry**: A single term pair within a dictionary, unique per (dictionary_id, source_term). Stores origin metadata (run_id, row_key, user_id, timestamp) for feedback-loop entries.
- **Translation Schedule**: Scheduling configuration: type (cron|interval|once), expression, timezone, enabled state, concurrency policy, next_run_at.
- **Translation Event**: Immutable lifecycle event: run_id (nullable for pre-run events), job_id, event_type, timestamp, type-specific JSON payload.
- **Metric Snapshot**: Persistent cumulative metrics per job, saved at pruning time: job_id, snapshot_date, cumulative_tokens, cumulative_cost, total_runs, success_runs, failed_runs.
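
The `key_hash` field on Translation Record could be derived as below. The sha256-over-canonical-JSON scheme is an illustrative assumption; the point is that column order and representation must never change the hash, or the new-key-only strategy (FR-045) would re-translate unchanged rows:

```python
import hashlib
import json

def key_hash(key_values: dict[str, object]) -> str:
    """Stable hash of composite key values: sorted keys plus canonical
    JSON, so dict ordering and whitespace never affect the result."""
    canonical = json.dumps(key_values, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```
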

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Users can configure a complete translation job (datasource → columns → keys → target) in under 3 minutes without external documentation.
- **SC-002**: Preview mode returns translation results for a sample of 10 rows within 30 seconds for standard LLM providers.
- **SC-003**: 100% of generated SQL for supported dialects (PostgreSQL/Greenplum, ClickHouse) is syntactically valid when tested against validated schemas for each dialect.
- **SC-004**: Users can recover from a failed batch (LLM timeout, rate limit) and retry only the failed rows in under 2 minutes.
- **SC-005**: Translation run audit records contain all required traceability information (configuration snapshot, prompt, source rows, translations, INSERT SQL, Superset execution reference) for 100% of completed runs within the 90-day retention window.
- **SC-006**: At least 80% of pilot users successfully complete the end-to-end flow (configure → preview → execute → verify) on their first attempt during moderated usability review.
- **SC-007**: NULL translation values are correctly skipped and logged without blocking the remaining rows in 100% of test cases.
- **SC-008**: Domain-specific terms covered by an attached dictionary are translated consistently (exactly matching the dictionary entry) in at least 95% of cases where the source term appears verbatim in the translation column.
- **SC-009**: Users can populate a 50-term dictionary via file import in under 1 minute, with duplicate detection completing in under 5 seconds.
- **SC-010**: Feedback-loop corrections submitted to a dictionary are reflected in the next run of the same job in 100% of cases where the corrected source term reappears.
- **SC-011**: Scheduled translation runs trigger within ±60 seconds of the planned execution time for at least 98% of triggers under normal operating conditions.
- **SC-012**: A scheduled run that overlaps with a still-running previous run is correctly skipped or queued (per the configured policy) in 100% of overlap scenarios.
- **SC-013**: Structured events for 100% of run lifecycle transitions are recorded and queryable within 10 seconds of occurrence.
- **SC-014**: Per-job cumulative metrics remain accurate (±5%) after pruning events older than 90 days, as verified by comparing pre-prune metric snapshots with post-prune dashboard values.
- **SC-015**: Detailed run data is pruned within 24 hours after exceeding the 90-day retention window, with metric snapshots and run metadata preserved intact in 100% of cases.

## Assumptions

- Users already have access to Superset datasources and permission to read data from them.
- The Superset instance supports `/api/v1/sqllab/execute/` and the user's Superset credentials have permission to execute SQL against the target database.
- The LLM provider is already configured in ss-tools (provider selection, API key, model selection are handled by the existing LLM infrastructure).
- The target table is a physical insertable table in a database backing the Superset datasource. The database dialect (PostgreSQL/Greenplum or ClickHouse for MVP) is detected from the Superset connection configuration. Views and materialized views are not supported as targets. Unsupported dialects are rejected at configuration time with a clear message.
- Translation quality is ultimately the user's responsibility; the system provides tools for preview, editing, and approval but does not guarantee translation accuracy.
- The primary use case is batch translation of static or slowly-changing reference data (not real-time streaming data).
- Multiple key columns (composite keys) are supported with explicit source→target column mapping.
- Preview is a mandatory quality gate before manual execution. Scheduled runs may bypass preview only after at least one successful manual run with the same effective configuration.
- Source data is append-only: new rows are INSERTed over time but existing rows are never UPDATEd in place. Scheduled runs use a new-key-only strategy — only previously unseen key values trigger translation.
- The feature is intended for internal operational use where data volume is measured in thousands to tens of thousands of rows per run.
- Terminology dictionaries are language-specific; a dictionary's target_language must match the job's target_language for attachment.
- The scheduling infrastructure builds on existing scheduler foundations already present in the ss-tools backend.
- Dictionary content is treated as authoritative by the LLM for exact matches. The LLM may deviate for terms not present in the dictionary or for partial matches.
- Dictionaries have no hard size limit; per-batch case-insensitive, word-boundary-aware filtering ensures only relevant terms are injected into each LLM prompt.
- The feedback-loop correction flow requires the user to identify both the source term and the incorrect target translation.
- Concurrency policies for scheduled runs default to `skip`; queuing holds at most one pending run per job.
- Access control for translation resources uses the existing RBAC infrastructure with the permission matrix defined above.
- Snapshot isolation: in-progress runs use their config snapshot; configuration edits affect only future runs.
- Cumulative metrics survive the 90-day retention window via metric snapshots persisted at pruning time.