# Semantic Contracts: LLM Table Translation Service

**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08 (updated post-review)
**Protocol**: GRACE-Poly v2.4

---

## Backend Contracts (Python / FastAPI)

### [DEF:TranslatePlugin:Module]

`backend/src/plugins/translate/plugin.py`

@COMPLEXITY 3
@PURPOSE Plugin entry point implementing PluginBase for the translation service. Registers API routes, ORM models, and scheduler integration.
@RELATION INHERITS -> [PluginBase:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION BINDS_TO -> [TranslateRoutes:Module]
[/DEF:TranslatePlugin:Module]

### [DEF:TranslationOrchestrator:Class]

`backend/src/plugins/translate/orchestrator.py`

@COMPLEXITY 5
@PURPOSE Central coordinator for translation run lifecycle: validates preconditions, manages preview quality gate, dispatches to executor, generates safe SQL, submits to Superset SQL Lab API, manages retry, records events, enforces retention.
@PRE Job configuration is saved and valid. Superset datasource is accessible. LLM provider is configured and reachable. For manual runs: preview session must be accepted. For scheduled runs: at least one prior successful manual run exists.
@POST TranslationRun created with translation_status and insert_status. INSERT SQL generated (safe PostgreSQL dialect) and submitted to Superset `/api/v1/sqllab/execute/`. Superset query reference recorded. Structured events recorded for all lifecycle transitions.
@SIDE_EFFECT Creates TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, MetricSnapshot rows. Calls LLM provider API (token consumption). Calls Superset SQL Lab API.
@DATA_CONTRACT Input: TranslationJob (config snapshot) + datasource rows → Output: TranslationRun (result) + Superset query reference
@INVARIANT A run must transition through states: pending → running → (completed|partial|failed|cancelled|skipped).
insert_status must transition through one of: not_started → skipped (if no insert needed); not_started → submitted → (succeeded|failed) (Superset may complete immediately); not_started → submitted → running → (succeeded|failed) (async polling). No other state transitions are allowed. Snapshot isolation: in-progress runs use config snapshot; config edits affect future runs only.
@RELATION CALLS -> [TranslationPreview:Class]
@RELATION CALLS -> [TranslationExecutor:Class]
@RELATION CALLS -> [SQLGenerator:Class]
@RELATION CALLS -> [SupersetSqlLabExecutor:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Centralized orchestrator is needed because preview gating, execution, SQL generation, Superset API submission, event logging, and retry share state (run_id, config snapshot) and must coordinate within a single transaction boundary for consistency.
@REJECTED Distributed actor model (Celery tasks per batch) was rejected because it introduces eventual-consistency challenges for run status tracking without proportional benefit at the expected scale. Synchronous batch processing provides simpler debugging and deterministic retry. UPDATE statements are never generated; all modifications use INSERT/UPSERT per PostgreSQL dialect.
[/DEF:TranslationOrchestrator:Class]

### [DEF:TranslationPreview:Class]

`backend/src/plugins/translate/preview.py`

@COMPLEXITY 4
@PURPOSE Fetch a sample of source rows, send to LLM with configured context and per-batch filtered dictionary, return side-by-side preview for quality gate acceptance. Creates persistent PreviewSession and PreviewRecord rows.
@PRE Job configuration is saved. Datasource is accessible. Preview row count is configured (default: 10).
@POST PreviewSession created with config_hash and dict_snapshot_hash. PreviewRecord rows returned with source_text, context, key_values, llm_translation, and status.
Accepting the preview session gates full execution. No data is persisted to target table.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates PreviewSession and PreviewRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationPreview:Class]

### [DEF:TranslationExecutor:Class]

`backend/src/plugins/translate/executor.py`

@COMPLEXITY 4
@PURPOSE Process source rows in batches through LLM, collect translations, handle batch-level retry, and produce TranslationBatch and TranslationRecord rows. Requests structured JSON output from LLM keyed by stable row identifiers; validates row alignment.
@PRE Run exists with translation_status `running`. Source rows fetched. Batch size configured.
@POST All processable rows have TranslationRecord entries. Each batch has a TranslationBatch record with statistics and timing. Run statistics updated.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates TranslationBatch and TranslationRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationExecutor:Class]

### [DEF:SQLGenerator:Class]

`backend/src/plugins/translate/sql_generator.py`

@COMPLEXITY 3
@PURPOSE Generate safe dialect-appropriate INSERT/UPSERT SQL from TranslationRecord rows, keyed by configured target key columns. Detects dialect from Superset connection (PostgreSQL/Greenplum, ClickHouse supported for MVP). Validates and quotes identifiers per dialect rules; safely encodes values.
@PRE TranslationRecord rows exist with status translated or edited and final_value is non-null. Target table schema validated at configuration time. Target database dialect is PostgreSQL.
@POST Returns a syntactically valid, injection-safe PostgreSQL INSERT (or INSERT ... ON CONFLICT) statement string.
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Separate contract because SQL generation reused by both manual and scheduled runs, and independently testable for SQL syntax correctness (SC-003) and injection safety.
@REJECTED UPDATE statements not generated because source is append-only (new-key-only strategy). UPSERT covers the overwrite case without separate UPDATE logic.
[/DEF:SQLGenerator:Class]

### [DEF:SupersetSqlLabExecutor:Class]

`backend/src/plugins/translate/superset_executor.py`

@COMPLEXITY 3
@PURPOSE Submit generated SQL to Superset SQL Lab API `/api/v1/sqllab/execute/`, poll execution status, record Superset query reference, status, and error details in the TranslationRun.
@PRE TranslationRun exists with translation_status completed/partial. SQL is generated and syntactically valid.
@POST TranslationRun.insert_status updated (submitted→running→succeeded|failed). Superset query reference, error details, rows_affected (if available) stored.
@SIDE_EFFECT Calls Superset API (SQL execution, database write). Updates TranslationRun row.
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [TranslationEventLog:Class]
[/DEF:SupersetSqlLabExecutor:Class]

### [DEF:DictionaryManager:Class]

`backend/src/plugins/translate/dictionary.py`

@COMPLEXITY 4
@PURPOSE CRUD for TerminologyDictionary and DictionaryEntry (unique per dictionary_id + source_term). CSV/TSV import with conflict detection (overwrite/keep existing). Per-batch term filtering (case-insensitive, word-boundary-aware substring matching). Language validation on job attachment.
@PRE Dictionary model exists. User has required permissions. For attachment: dictionary target_language matches job target_language.
@POST CRUD operations reflected in database. Filtered term list returned for batch. Entries with mismatched language rejected on attachment.
@SIDE_EFFECT Creates/updates/deletes DictionaryEntry rows. May read TranslationRun for origin tracking.
@RELATION CALLS -> [DictionaryEntry:Class]
@RELATION CALLS -> [TranslationJobDictionary:Class]
[/DEF:DictionaryManager:Class]

### [DEF:TranslationScheduler:Class]

`backend/src/plugins/translate/scheduler.py`

@COMPLEXITY 4
@PURPOSE Manage translation job schedules with timezone support: create/update/delete, register with APScheduler, handle trigger dispatch with concurrency policy (skip/queue at most one), enforce new-key-only strategy with baseline_expired fallback.
@PRE SchedulerService is running. Job has at least one prior successful manual run. Job configuration exists.
@POST Schedule registered with APScheduler or removed. On trigger: new TranslationRun created and dispatched to TranslationOrchestrator. New-key-only filter applied; if baseline expired (>90 days since last successful run), full translation with baseline_expired event.
@SIDE_EFFECT Creates TranslationRun rows. Calls SchedulerService. Emits schedule_triggered/schedule_skipped/schedule_failed events. Calls NotificationService on failure.
@RELATION CALLS -> [SchedulerService:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [NotificationService:Module]
[/DEF:TranslationScheduler:Class]

### [DEF:TranslationEventLog:Class]

`backend/src/plugins/translate/events.py`

@COMPLEXITY 5
@PURPOSE Structured event logging: write immutable events (run_id nullable for pre-run events), query events for audit/dashboard, enforce 90-day retention pruning with MetricSnapshot persistence before deletion.
@PRE TranslationEvent table exists. Event type recognized. For run-scoped events: run exists. For pre-run events (schedule_*, run_noop): run_id is NULL.
@POST Event row created with type-specific payload. Pruning job: persists MetricSnapshot, then removes events and records older than 90 days.
@SIDE_EFFECT Creates TranslationEvent rows. Creates MetricSnapshot rows at pruning time. Deletes expired rows.
@DATA_CONTRACT Input: (run_id?, job_id, event_type, payload: dict) → Output: TranslationEvent
@INVARIANT Every created run MUST have exactly one run_started event and exactly one terminal event among: run_succeeded, run_partial, run_failed, run_cancelled, run_skipped. Events are immutable after creation. Cumulative metrics survive pruning via MetricSnapshot.
@RELATION CALLS -> [TranslationEvent:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
@RATIONALE C5 warranted: event log is single source of truth for observability, metrics, and audit. Immutability, retention, and metric continuity invariants must be enforced to prevent data loss or tampering.
@REJECTED stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant. Event-sourced metrics without snapshots would lose cumulative data after pruning.
[/DEF:TranslationEventLog:Class]

### [DEF:TranslationMetrics:Class]

`backend/src/plugins/translate/metrics.py`

@COMPLEXITY 3
@PURPOSE Aggregate per-job metrics from live TranslationEvent log AND persistent MetricSnapshot table. For recent data (<90 days): compute from events. For cumulative totals: read latest MetricSnapshot + recent events.
@PRE TranslationEvent rows or MetricSnapshot rows exist for target job.
@POST Returns MetricsResponse DTO with accurate cumulative values spanning both live and pruned data.
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
[/DEF:TranslationMetrics:Class]

### [DEF:TranslateRoutes:Module]

`backend/src/api/routes/translate.py`

@COMPLEXITY 3
@PURPOSE FastAPI route handlers: CRUD jobs/dictionaries, preview, run trigger, cancel run, schedule, history, metrics, feedback-loop submission. All endpoints enforce RBAC per access-control matrix.
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [TranslationMetrics:Class]
@RELATION BINDS_TO -> [PermissionChecker:Dependency]
[/DEF:TranslateRoutes:Module]

### [DEF:TranslateModels:Module]

`backend/src/models/translate.py`

@COMPLEXITY 2
@PURPOSE SQLAlchemy ORM models: TranslationJob, TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, TranslationPreviewSession, TranslationPreviewRecord, TerminologyDictionary, DictionaryEntry, TranslationSchedule, TranslationJobDictionary, MetricSnapshot.
@RELATION INHERITS -> [Base:Class]
[/DEF:TranslateModels:Module]

### [DEF:TranslateSchemas:Module]

`backend/src/schemas/translate.py`

@COMPLEXITY 2
@PURPOSE Pydantic v2 request/response schemas for translation API endpoints.
[/DEF:TranslateSchemas:Module]

---

## Frontend Contracts (Svelte 5 / SvelteKit)

### [DEF:TranslateJobList:Component]

`frontend/src/routes/translate/+page.svelte`

@COMPLEXITY 3
@PURPOSE SvelteKit page listing all translation jobs with status/schedule indicators.
@UX_STATE idle, loading, empty, populated, error
[/DEF:TranslateJobList:Component]

### [DEF:TranslationJobConfig:Component]

`frontend/src/routes/translate/[id]/+page.svelte`

@COMPLEXITY 3
@PURPOSE SvelteKit page for job configuration: datasource selection, column mapping with source→target key mapping, target table/column, LLM settings, dictionary attachment (language-filtered), schedule tab with timezone.
@UX_STATE idle, loading, configured, saving, validation_error, datasource_unavailable
@UX_REACTIVITY Column list $derived from datasource; dictionary list filtered by target_language
[/DEF:TranslationJobConfig:Component]

### [DEF:TranslationPreview:Component]

`frontend/src/lib/components/translate/TranslationPreview.svelte`

@COMPLEXITY 4
@PURPOSE Side-by-side preview of source rows with LLM translations, approve/edit/reject as quality feedback, accept preview as quality gate. Shows config_hash to detect stale previews.
@UX_STATE idle, loading, preview_loaded, preview_error, accepted, stale_config
@UX_FEEDBACK Spinner during LLM call; visual distinction for LLM-generated vs user-edited values
@UX_RECOVERY Retry preview; re-fetch with updated config
[/DEF:TranslationPreview:Component]

### [DEF:TranslationRunProgress:Component]

`frontend/src/lib/components/translate/TranslationRunProgress.svelte`

@COMPLEXITY 4
@PURPOSE Live progress display: progress bar, batch counter, success/failure counts, cancel button. WebSocket-driven. Shows both translation and insert execution phases.
@UX_STATE idle, running, cancelling, cancelled, completed, partial, failed, insert_pending, insert_running, insert_failed
@UX_FEEDBACK Progress percentage $derived; real-time batch/insert status; Superset execution reference on completion
@UX_RECOVERY Retry failed batches; cancel run; view generated SQL (audit/debug)
[/DEF:TranslationRunProgress:Component]

### [DEF:TranslationRunResult:Component]

`frontend/src/lib/components/translate/TranslationRunResult.svelte`

@COMPLEXITY 4
@PURPOSE Completion summary with statistics, Superset execution status/reference, generated SQL (audit/debug), inline feedback-loop correction controls.
@UX_STATE completed, partial, failed, insert_failed
@UX_FEEDBACK Superset execution status badge; SQL block for audit
@UX_RECOVERY Retry failed rows; retry insert; submit corrections
@RELATION CALLS -> [TermCorrectionPopup:Component]
[/DEF:TranslationRunResult:Component]

### [DEF:TermCorrectionPopup:Component]

`frontend/src/lib/components/translate/TermCorrectionPopup.svelte`

@COMPLEXITY 3
@PURPOSE Popup for selecting source term + incorrect target term in run results, providing corrected target term, submitting to dictionary. Conflict: overwrite or keep existing.
@UX_STATE closed, selecting, editing, submitting, conflict_detected, submitted
[/DEF:TermCorrectionPopup:Component]

### [DEF:BulkCorrectionSidebar:Component]

`frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte`

@COMPLEXITY 3
@PURPOSE Sidebar for bulk correction: collect multiple terms, mass edit, submit atomically.
@UX_STATE closed, collecting, reviewing, submitting, submitted
[/DEF:BulkCorrectionSidebar:Component]

### [DEF:DictionaryEditor:Component]

`frontend/src/lib/components/translate/DictionaryEditor.svelte`

@COMPLEXITY 3
@PURPOSE Inline editor: add/edit/delete entries, CSV/TSV import with conflict preview (overwrite/keep existing), export.
@UX_STATE idle, loading, editing, importing, import_preview, import_conflict, saving
[/DEF:DictionaryEditor:Component]

### [DEF:DictionaryList:Component]

`frontend/src/routes/translate/dictionaries/+page.svelte`

@COMPLEXITY 3
@PURPOSE SvelteKit page listing dictionaries with language, term count, attachment info.
@UX_STATE idle, loading, empty, populated, delete_blocked
[/DEF:DictionaryList:Component]

### [DEF:ScheduleConfig:Component]

`frontend/src/lib/components/translate/ScheduleConfig.svelte`

@COMPLEXITY 3
@PURPOSE Schedule panel: type selector, cron/interval with timezone, next-N-executions preview, concurrency policy, enable/disable. Warns if no prior successful manual run.
@UX_STATE idle, editing, validating, enabled, disabled, no_prior_run_warning
@UX_REACTIVITY Next execution times $derived with timezone display
[/DEF:ScheduleConfig:Component]

### [DEF:TranslationHistory:Component]

`frontend/src/routes/translate/history/+page.svelte`

@COMPLEXITY 3
@PURPOSE Filterable run history: datasource, target table, row count, translation_status, insert_status, date. Detail view with config snapshot, Superset reference, SQL. Pruned runs show metadata only.
@UX_STATE idle, loading, empty, populated, detail_open, pruned
[/DEF:TranslationHistory:Component]

### [DEF:TranslateApiClient:Module]

`frontend/src/lib/api/translate.js`

@COMPLEXITY 2
@PURPOSE API client wrapping requestApi/fetchApi for all translate endpoints.
[/DEF:TranslateApiClient:Module]

### [DEF:translateStore:Store]

`frontend/src/lib/stores/translate.js`

@COMPLEXITY 3
@PURPOSE Svelte 5 rune store for translation feature state.
@RELATION BINDS_TO -> [TranslateApiClient:Module]
@RELATION BINDS_TO -> [TaskWebSocket:Module]
[/DEF:translateStore:Store]

---

## Integration Contracts (Existing System)

| Contract ID | What It Provides | How Translation Uses It |
|-------------|-----------------|------------------------|
| `[LLMProviderService:Module]` | LLM API call with provider selection, key encryption | Sends batches with constructed prompts; receives structured JSON translations |
| `[SupersetClient:Module]` | Superset API: datasource schema, SQL Lab execution | Fetch column metadata, submit SQL to `/api/v1/sqllab/execute/`, poll status |
| `[SchedulerService:Class]` | APScheduler lifecycle, add_job/remove_job | Translation schedules registered as APScheduler jobs |
| `[NotificationService:Module]` | Email/in-app notification dispatch | Scheduled run failure notifications |
| `[PermissionChecker:Dependency]` | FastAPI dependency for RBAC enforcement | Route handlers annotated per access-control matrix |
| `[TaskWebSocket:Module]` | WebSocket for real-time task progress | Translation run progress events streamed to frontend |
| `[TaskContext:Class]` | Background task lifecycle context | Orchestrator runs as async background task |
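
The status invariants in the `[TranslationOrchestrator:Class]` contract can be read as two small transition tables. The sketch below is one possible way to encode and check them. It is not taken from the project code: the table names, `check_transition`, `is_valid_path`, and `InvalidTransitionError` are hypothetical, and only the state names and arrows come from the @INVARIANT text.

```python
from typing import AbstractSet, Mapping, Sequence

# Allowed transitions, transcribed from the TranslationOrchestrator @INVARIANT.
# All identifiers in this block are illustrative assumptions, not project API.
TRANSLATION_STATUS_TRANSITIONS: Mapping[str, AbstractSet[str]] = {
    "pending": frozenset({"running"}),
    "running": frozenset({"completed", "partial", "failed", "cancelled", "skipped"}),
}

INSERT_STATUS_TRANSITIONS: Mapping[str, AbstractSet[str]] = {
    # skipped: no insert needed; submitted: SQL handed to Superset SQL Lab
    "not_started": frozenset({"skipped", "submitted"}),
    # Superset may finish immediately, or report running while being polled
    "submitted": frozenset({"running", "succeeded", "failed"}),
    "running": frozenset({"succeeded", "failed"}),
}


class InvalidTransitionError(ValueError):
    """Raised when a status change is not listed in the transition table."""


def check_transition(
    table: Mapping[str, AbstractSet[str]], current: str, new: str
) -> str:
    """Return `new` if current -> new is allowed, otherwise raise."""
    # States without an entry (terminal states) allow no further transitions.
    if new not in table.get(current, frozenset()):
        raise InvalidTransitionError(f"{current!r} -> {new!r} is not allowed")
    return new


def is_valid_path(table: Mapping[str, AbstractSet[str]], path: Sequence[str]) -> bool:
    """True if every consecutive pair of states in `path` is an allowed transition."""
    return all(b in table.get(a, frozenset()) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # Immediate completion path for the insert phase
    print(is_valid_path(INSERT_STATUS_TRANSITIONS,
                        ["not_started", "submitted", "succeeded"]))  # True
    # A run may not jump from pending straight to a terminal state
    print(is_valid_path(TRANSLATION_STATUS_TRANSITIONS,
                        ["pending", "completed"]))  # False
```

Keeping the tables as plain data makes the "no other state transitions are allowed" clause a lookup rather than scattered conditionals, and the same tables could back unit tests for the orchestrator and the SQL Lab executor.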