Semantic Contracts: LLM Table Translation Service
Feature Branch: 028-llm-datasource-supeset
Date: 2026-05-08 (updated post-review)
Protocol: GRACE-Poly v2.4
Backend Contracts (Python / FastAPI)
[DEF:TranslatePlugin:Module]
backend/src/plugins/translate/plugin.py
@COMPLEXITY 3
@PURPOSE Plugin entry point implementing PluginBase for the translation service. Registers API routes, ORM models, and scheduler integration.
@RELATION INHERITS -> [PluginBase:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION BINDS_TO -> [TranslateRoutes:Module]
[/DEF:TranslatePlugin:Module]
[DEF:TranslationOrchestrator:Class]
backend/src/plugins/translate/orchestrator.py
@COMPLEXITY 5
@PURPOSE Central coordinator for translation run lifecycle: validates preconditions, manages preview quality gate, dispatches to executor, generates safe SQL, submits to Superset SQL Lab API, manages retry, records events, enforces retention.
@PRE Job configuration is saved and valid. Superset datasource is accessible. LLM provider is configured and reachable. For manual runs: preview session must be accepted. For scheduled runs: at least one prior successful manual run exists.
@POST TranslationRun created with translation_status and insert_status. INSERT SQL generated (safe PostgreSQL dialect) and submitted to Superset /api/v1/sqllab/execute/. Superset query reference recorded. Structured events recorded for all lifecycle transitions.
@SIDE_EFFECT Creates TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, MetricSnapshot rows. Calls LLM provider API (token consumption). Calls Superset SQL Lab API.
@DATA_CONTRACT Input: TranslationJob (config snapshot) + datasource rows → Output: TranslationRun (result) + Superset query reference
@INVARIANT A run must transition through states: pending → running → (completed|partial|failed|cancelled|skipped). insert_status must transition through one of: not_started → skipped (if no insert needed); not_started → submitted → (succeeded|failed) (Superset may complete immediately); not_started → submitted → running → (succeeded|failed) (async polling). No other state transitions are allowed. Snapshot isolation: in-progress runs use config snapshot; config edits affect future runs only.
@RELATION CALLS -> [TranslationPreview:Class]
@RELATION CALLS -> [TranslationExecutor:Class]
@RELATION CALLS -> [SQLGenerator:Class]
@RELATION CALLS -> [SupersetSqlLabExecutor:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Centralized orchestrator is needed because preview gating, execution, SQL generation, Superset API submission, event logging, and retry share state (run_id, config snapshot) and must coordinate within a single transaction boundary for consistency.
@REJECTED Distributed actor model (Celery tasks per batch) was rejected because it introduces eventual-consistency challenges for run status tracking without proportional benefit at the expected scale. Synchronous batch processing provides simpler debugging and deterministic retry. UPDATE statements are never generated — all modifications use INSERT/UPSERT per PostgreSQL dialect.
[/DEF:TranslationOrchestrator:Class]
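The state-transition rules in the @INVARIANT above can be sketched as lookup tables. This is a minimal illustration, not the orchestrator's actual code: the names `TRANSLATION_TRANSITIONS`, `INSERT_TRANSITIONS`, and `can_transition` are hypothetical, and only the state names come from the contract.

```python
# Hypothetical transition tables derived from the @INVARIANT text.
# Keys are current states; values are the states that may follow.
TRANSLATION_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "partial", "failed", "cancelled", "skipped"},
}

INSERT_TRANSITIONS = {
    "not_started": {"skipped", "submitted"},
    # Superset may complete immediately, so "running" is optional.
    "submitted": {"running", "succeeded", "failed"},
    "running": {"succeeded", "failed"},
}


def can_transition(table, current, new):
    """Return True only if current -> new is listed in the table.

    Terminal states have no entry, so nothing may follow them.
    """
    return new in table.get(current, set())
```

A caller would check `can_transition(INSERT_TRANSITIONS, run.insert_status, new_status)` before writing the new status and reject the update otherwise.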
[DEF:TranslationPreview:Class]
backend/src/plugins/translate/preview.py
@COMPLEXITY 4
@PURPOSE Fetch a sample of source rows, send to LLM with configured context and per-batch filtered dictionary, return side-by-side preview for quality gate acceptance. Creates persistent PreviewSession and PreviewRecord rows.
@PRE Job configuration is saved. Datasource is accessible. Preview row count is configured (default: 10).
@POST PreviewSession created with config_hash and dict_snapshot_hash. PreviewRecord rows returned with source_text, context, key_values, llm_translation, and status. Accepting the preview session gates full execution. No data is persisted to target table.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates PreviewSession and PreviewRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationPreview:Class]
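The contract does not say how config_hash is computed. One plausible approach, shown here purely as an assumption, is a SHA-256 digest over a canonical JSON encoding of the job configuration, so that key order does not affect the result. The function names `config_hash` and `is_preview_stale` are hypothetical.

```python
import hashlib
import json


def config_hash(config):
    """Hash a job config deterministically (assumed scheme, not from the contract).

    Sorting keys and fixing separators makes the encoding canonical, so two
    dicts with the same content always produce the same digest.
    """
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_preview_stale(session_hash, current_config):
    """A preview is stale when the config changed after the session was created."""
    return session_hash != config_hash(current_config)
```

The same comparison could back the stale_config state on the frontend: if the stored hash no longer matches, the accepted preview should not gate a run.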
[DEF:TranslationExecutor:Class]
backend/src/plugins/translate/executor.py
@COMPLEXITY 4
@PURPOSE Process source rows in batches through LLM, collect translations, handle batch-level retry, and produce TranslationBatch and TranslationRecord rows. Requests structured JSON output from LLM keyed by stable row identifiers; validates row alignment.
@PRE Run exists with translation_status running. Source rows fetched. Batch size configured.
@POST All processable rows have TranslationRecord entries. Each batch has a TranslationBatch record with statistics and timing. Run statistics updated.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates TranslationBatch and TranslationRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationExecutor:Class]
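Row alignment validation, as described in the @PURPOSE above, might look like the sketch below. It assumes the LLM is asked to return a flat JSON object mapping each row identifier to its translation; that response shape and the name `validate_alignment` are assumptions, not part of the contract.

```python
import json


def validate_alignment(expected_ids, llm_response):
    """Parse a JSON object keyed by row id and return (accepted, missing_ids).

    Ids the batch did not ask for are ignored, and non-string values are
    treated as missing rather than trusted.
    """
    try:
        payload = json.loads(llm_response)
    except json.JSONDecodeError:
        # Unparseable output: every row in the batch counts as missing.
        return {}, list(expected_ids)
    if not isinstance(payload, dict):
        return {}, list(expected_ids)

    accepted = {
        row_id: payload[row_id]
        for row_id in expected_ids
        if isinstance(payload.get(row_id), str)
    }
    missing = [row_id for row_id in expected_ids if row_id not in accepted]
    return accepted, missing
```

The `missing` list is a natural input for batch-level retry: only rows without a valid translation need to be resent.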
[DEF:SQLGenerator:Class]
backend/src/plugins/translate/sql_generator.py
@COMPLEXITY 3
@PURPOSE Generate safe dialect-appropriate INSERT/UPSERT SQL from TranslationRecord rows, keyed by configured target key columns. Detects dialect from Superset connection (PostgreSQL/Greenplum, ClickHouse supported for MVP). Validates and quotes identifiers per dialect rules; safely encodes values.
@PRE TranslationRecord rows exist with status translated or edited and final_value is non-null. Target table schema validated at configuration time. Target database dialect is PostgreSQL.
@POST Returns a syntactically valid, injection-safe PostgreSQL INSERT (or INSERT ... ON CONFLICT) statement string.
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Separate contract because SQL generation is reused by both manual and scheduled runs and is independently testable for SQL syntax correctness (SC-003) and injection safety.
@REJECTED UPDATE statements are not generated because the source is append-only (new-key-only strategy). UPSERT covers the overwrite case without separate UPDATE logic.
[/DEF:SQLGenerator:Class]
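A minimal sketch of injection-safe INSERT ... ON CONFLICT generation for PostgreSQL follows. It assumes the server runs with standard_conforming_strings enabled (the PostgreSQL default), that all values are text or NULL, and that the table name is a single unqualified identifier. The helper names `quote_ident`, `quote_literal`, and `build_upsert` are hypothetical.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Quote an identifier by wrapping in double quotes and doubling inner ones."""
    if not name or "\x00" in name:
        raise ValueError("invalid identifier")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: Optional[str]) -> str:
    """Encode a text value as a SQL literal; None becomes NULL."""
    if value is None:
        return "NULL"
    if "\x00" in value:
        raise ValueError("NUL byte not allowed in text literal")
    # With standard_conforming_strings on, only the single quote needs escaping.
    return "'" + value.replace("'", "''") + "'"


def build_upsert(table, key_columns, value_columns, rows) -> str:
    """Build one multi-row INSERT ... ON CONFLICT statement."""
    if not rows or not key_columns:
        raise ValueError("rows and key_columns must be non-empty")
    columns = [*key_columns, *value_columns]
    col_sql = ", ".join(quote_ident(c) for c in columns)
    values_sql = ",\n  ".join(
        "(" + ", ".join(quote_literal(row.get(c)) for c in columns) + ")"
        for row in rows
    )
    conflict_sql = ", ".join(quote_ident(c) for c in key_columns)
    if value_columns:
        updates = ", ".join(
            f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in value_columns
        )
        action = f"DO UPDATE SET {updates}"
    else:
        action = "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(table)} ({col_sql})\n"
        f"VALUES\n  {values_sql}\n"
        f"ON CONFLICT ({conflict_sql}) {action};"
    )
```

String building is used here because the contract submits a finished SQL string to SQL Lab; where a driver connection is available, parameter binding would normally be preferable.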
[DEF:SupersetSqlLabExecutor:Class]
backend/src/plugins/translate/superset_executor.py
@COMPLEXITY 3
@PURPOSE Submit generated SQL to Superset SQL Lab API /api/v1/sqllab/execute/, poll execution status, record Superset query reference, status, and error details in the TranslationRun.
@PRE TranslationRun exists with translation_status completed/partial. SQL is generated and syntactically valid.
@POST TranslationRun.insert_status updated (submitted → succeeded|failed when Superset completes immediately, or submitted → running → succeeded|failed under async polling). Superset query reference, error details, rows_affected (if available) stored.
@SIDE_EFFECT Calls Superset API (SQL execution, database write). Updates TranslationRun row.
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [TranslationEventLog:Class]
[/DEF:SupersetSqlLabExecutor:Class]
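The polling step can be sketched independently of the Superset client. In the sketch below, `fetch_status` is a hypothetical callable that returns the current insert_status as a string; how it maps a Superset query state onto these values is left open, as are the interval and attempt limits, which are illustrative defaults only.

```python
import time


def poll_insert_status(fetch_status, *, interval_s=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal insert_status.

    Raises TimeoutError if no terminal status is seen within max_attempts.
    The sleep function is injectable so tests do not have to wait.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("insert did not reach a terminal status")
```

Raising on timeout, rather than silently returning failed, leaves the caller free to decide whether an unresolved query should be recorded as a failure or polled again later.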
[DEF:DictionaryManager:Class]
backend/src/plugins/translate/dictionary.py
@COMPLEXITY 4
@PURPOSE CRUD for TerminologyDictionary and DictionaryEntry (unique per dictionary_id + source_term). CSV/TSV import with conflict detection (overwrite/keep existing). Per-batch term filtering (case-insensitive, word-boundary-aware substring matching). Language validation on job attachment.
@PRE Dictionary model exists. User has required permissions. For attachment: dictionary target_language matches job target_language.
@POST CRUD operations reflected in database. Filtered term list returned for batch. Entries with mismatched language rejected on attachment.
@SIDE_EFFECT Creates/updates/deletes DictionaryEntry rows. May read TranslationRun for origin tracking.
@RELATION CALLS -> [DictionaryEntry:Class]
@RELATION CALLS -> [TranslationJobDictionary:Class]
[/DEF:DictionaryManager:Class]
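Per-batch term filtering with case-insensitive, word-boundary-aware matching could be sketched as follows. The function name `filter_terms` and the dict-based entry representation are assumptions. Lookarounds are used instead of `\b` so that terms beginning or ending with punctuation (for example "C++") still match.

```python
import re


def filter_terms(entries, batch_texts):
    """Return only the dictionary entries whose source term occurs in the batch.

    entries maps source_term -> target_term; batch_texts is the list of
    source strings in the current batch.
    """
    haystack = "\n".join(batch_texts)
    matched = {}
    for source_term, target_term in entries.items():
        if not source_term:
            continue  # an empty term would match everywhere
        # The term must not be directly preceded or followed by a word character.
        pattern = r"(?<!\w)" + re.escape(source_term) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            matched[source_term] = target_term
    return matched
```

With this rule, "net" does not match inside "Internet", while "order" matches "Order total".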
[DEF:TranslationScheduler:Class]
backend/src/plugins/translate/scheduler.py
@COMPLEXITY 4
@PURPOSE Manage translation job schedules with timezone support: create/update/delete, register with APScheduler, handle trigger dispatch with concurrency policy (skip/queue at most one), enforce new-key-only strategy with baseline_expired fallback.
@PRE SchedulerService is running. Job has at least one prior successful manual run. Job configuration exists.
@POST Schedule registered with APScheduler or removed. On trigger: new TranslationRun created and dispatched to TranslationOrchestrator. New-key-only filter applied; if the baseline has expired (more than 90 days since the last successful run), a full translation is performed and a baseline_expired event is emitted.
@SIDE_EFFECT Creates TranslationRun rows. Calls SchedulerService. Emits schedule_triggered/schedule_skipped/schedule_failed events. Calls NotificationService on failure.
@RELATION CALLS -> [SchedulerService:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [NotificationService:Module]
[/DEF:TranslationScheduler:Class]
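The new-key-only strategy with its baseline_expired fallback can be illustrated with a small selection function. The name `select_rows`, the `key_column` parameter, and the in-memory set of already translated keys are assumptions made for the sketch; only the 90-day threshold comes from the contract.

```python
from datetime import datetime, timedelta, timezone

BASELINE_MAX_AGE = timedelta(days=90)


def select_rows(source_rows, translated_keys, last_success_at, now, key_column="id"):
    """Return (rows_to_translate, baseline_expired).

    If the last successful run is older than 90 days, every row is selected
    and the flag tells the caller to emit a baseline_expired event.
    Otherwise only rows whose key has not been translated yet are selected.
    """
    if now - last_success_at > BASELINE_MAX_AGE:
        return list(source_rows), True
    new_rows = [row for row in source_rows if row[key_column] not in translated_keys]
    return new_rows, False
```

For example, `select_rows(rows, {1, 2}, last_success_at, datetime.now(timezone.utc))` returns only rows with unseen keys while the baseline is fresh.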
[DEF:TranslationEventLog:Class]
backend/src/plugins/translate/events.py
@COMPLEXITY 5
@PURPOSE Structured event logging: write immutable events (run_id nullable for pre-run events), query events for audit/dashboard, enforce 90-day retention pruning with MetricSnapshot persistence before deletion.
@PRE TranslationEvent table exists. Event type recognized. For run-scoped events: run exists. For pre-run events (schedule_*, run_noop): run_id is NULL.
@POST Event row created with type-specific payload. Pruning job: persists MetricSnapshot, then removes events and records older than 90 days.
@SIDE_EFFECT Creates TranslationEvent rows. Creates MetricSnapshot rows at pruning time. Deletes expired rows.
@DATA_CONTRACT Input: (run_id?, job_id, event_type, payload: dict) → Output: TranslationEvent
@INVARIANT Every created run MUST have exactly one run_started event and exactly one terminal event among: run_succeeded, run_partial, run_failed, run_cancelled, run_skipped. Events are immutable after creation. Cumulative metrics survive pruning via MetricSnapshot.
@RELATION CALLS -> [TranslationEvent:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
@RATIONALE C5 warranted: event log is single source of truth for observability, metrics, and audit. Immutability, retention, and metric continuity invariants must be enforced to prevent data loss or tampering.
@REJECTED stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant. Event-sourced metrics without snapshots would lose cumulative data after pruning.
[/DEF:TranslationEventLog:Class]
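The terminal-event invariant above lends itself to a simple audit check over the event types recorded for one finished run. The function name `check_run_events` is hypothetical; the event type names are taken from the @INVARIANT. A run that is still in progress has no terminal event yet, so the check is meant for completed runs only.

```python
from collections import Counter

TERMINAL_EVENTS = frozenset(
    {"run_succeeded", "run_partial", "run_failed", "run_cancelled", "run_skipped"}
)


def check_run_events(event_types):
    """Return a list of invariant violations for one finished run (empty if valid)."""
    counts = Counter(event_types)
    violations = []
    if counts["run_started"] != 1:
        violations.append(f"expected 1 run_started, found {counts['run_started']}")
    terminal_total = sum(counts[name] for name in TERMINAL_EVENTS)
    if terminal_total != 1:
        violations.append(f"expected 1 terminal event, found {terminal_total}")
    return violations
```

Such a check could run as a test assertion or as a periodic audit query; enforcing the invariant at write time would need additional logic in the event writer.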
[DEF:TranslationMetrics:Class]
backend/src/plugins/translate/metrics.py
@COMPLEXITY 3
@PURPOSE Aggregate per-job metrics from live TranslationEvent log AND persistent MetricSnapshot table. For recent data (<90 days): compute from events. For cumulative totals: read latest MetricSnapshot + recent events.
@PRE TranslationEvent rows or MetricSnapshot rows exist for target job.
@POST Returns MetricsResponse DTO with accurate cumulative values spanning both live and pruned data.
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
[/DEF:TranslationMetrics:Class]
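Combining a MetricSnapshot with live events can be sketched as a counter merge. This assumes that the snapshot holds totals only for events that have already been pruned, so adding live events does not double count; the contract does not state this explicitly. The event payload shape (a `metrics` dict of integer increments) and the name `cumulative_metrics` are also assumptions.

```python
from collections import Counter


def cumulative_metrics(snapshot, recent_events):
    """Add per-event metric increments on top of the latest snapshot totals.

    snapshot: dict of metric name -> cumulative total for pruned data.
    recent_events: iterable of event dicts, each with an optional "metrics" dict.
    """
    totals = Counter(snapshot)
    for event in recent_events:
        for name, value in event.get("metrics", {}).items():
            totals[name] += value
    return dict(totals)
```

If snapshots instead include events that are still live, the merge would need a cutoff timestamp so that only events newer than the snapshot are added.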
[DEF:TranslateRoutes:Module]
backend/src/api/routes/translate.py
@COMPLEXITY 3
@PURPOSE FastAPI route handlers: CRUD jobs/dictionaries, preview, run trigger, cancel run, schedule, history, metrics, feedback-loop submission. All endpoints enforce RBAC per access-control matrix.
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [TranslationMetrics:Class]
@RELATION BINDS_TO -> [PermissionChecker:Dependency]
[/DEF:TranslateRoutes:Module]
[DEF:TranslateModels:Module]
backend/src/models/translate.py
@COMPLEXITY 2
@PURPOSE SQLAlchemy ORM models: TranslationJob, TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, TranslationPreviewSession, TranslationPreviewRecord, TerminologyDictionary, DictionaryEntry, TranslationSchedule, TranslationJobDictionary, MetricSnapshot.
@RELATION INHERITS -> [Base:Class]
[/DEF:TranslateModels:Module]
[DEF:TranslateSchemas:Module]
backend/src/schemas/translate.py
@COMPLEXITY 2
@PURPOSE Pydantic v2 request/response schemas for translation API endpoints.
[/DEF:TranslateSchemas:Module]
Frontend Contracts (Svelte 5 / SvelteKit)
[DEF:TranslateJobList:Component]
frontend/src/routes/translate/+page.svelte
@COMPLEXITY 3
@PURPOSE SvelteKit page listing all translation jobs with status/schedule indicators.
@UX_STATE idle, loading, empty, populated, error
[/DEF:TranslateJobList:Component]
[DEF:TranslationJobConfig:Component]
frontend/src/routes/translate/[id]/+page.svelte
@COMPLEXITY 3
@PURPOSE SvelteKit page for job configuration: datasource selection, column mapping with source→target key mapping, target table/column, LLM settings, dictionary attachment (language-filtered), schedule tab with timezone.
@UX_STATE idle, loading, configured, saving, validation_error, datasource_unavailable
@UX_REACTIVITY Column list $derived from datasource; dictionary list filtered by target_language
[/DEF:TranslationJobConfig:Component]
[DEF:TranslationPreview:Component]
frontend/src/lib/components/translate/TranslationPreview.svelte
@COMPLEXITY 4
@PURPOSE Side-by-side preview of source rows with LLM translations, approve/edit/reject as quality feedback, accept preview as quality gate. Shows config_hash to detect stale previews.
@UX_STATE idle, loading, preview_loaded, preview_error, accepted, stale_config
@UX_FEEDBACK Spinner during LLM call; visual distinction for LLM-generated vs user-edited values
@UX_RECOVERY Retry preview; re-fetch with updated config
[/DEF:TranslationPreview:Component]
[DEF:TranslationRunProgress:Component]
frontend/src/lib/components/translate/TranslationRunProgress.svelte
@COMPLEXITY 4
@PURPOSE Live progress display: progress bar, batch counter, success/failure counts, cancel button. WebSocket-driven. Shows both translation and insert execution phases.
@UX_STATE idle, running, cancelling, cancelled, completed, partial, failed, insert_pending, insert_running, insert_failed
@UX_FEEDBACK Progress percentage $derived; real-time batch/insert status; Superset execution reference on completion
@UX_RECOVERY Retry failed batches; cancel run; view generated SQL (audit/debug)
[/DEF:TranslationRunProgress:Component]
[DEF:TranslationRunResult:Component]
frontend/src/lib/components/translate/TranslationRunResult.svelte
@COMPLEXITY 4
@PURPOSE Completion summary with statistics, Superset execution status/reference, generated SQL (audit/debug), inline feedback-loop correction controls.
@UX_STATE completed, partial, failed, insert_failed
@UX_FEEDBACK Superset execution status badge; SQL block for audit
@UX_RECOVERY Retry failed rows; retry insert; submit corrections
@RELATION CALLS -> [TermCorrectionPopup:Component]
[/DEF:TranslationRunResult:Component]
[DEF:TermCorrectionPopup:Component]
frontend/src/lib/components/translate/TermCorrectionPopup.svelte
@COMPLEXITY 3
@PURPOSE Popup for selecting source term + incorrect target term in run results, providing corrected target term, submitting to dictionary. Conflict: overwrite or keep existing.
@UX_STATE closed, selecting, editing, submitting, conflict_detected, submitted
[/DEF:TermCorrectionPopup:Component]
[DEF:BulkCorrectionSidebar:Component]
frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte
@COMPLEXITY 3
@PURPOSE Sidebar for bulk correction: collect multiple terms, mass edit, submit atomically.
@UX_STATE closed, collecting, reviewing, submitting, submitted
[/DEF:BulkCorrectionSidebar:Component]
[DEF:DictionaryEditor:Component]
frontend/src/lib/components/translate/DictionaryEditor.svelte
@COMPLEXITY 3
@PURPOSE Inline editor: add/edit/delete entries, CSV/TSV import with conflict preview (overwrite/keep existing), export.
@UX_STATE idle, loading, editing, importing, import_preview, import_conflict, saving
[/DEF:DictionaryEditor:Component]
[DEF:DictionaryList:Component]
frontend/src/routes/translate/dictionaries/+page.svelte
@COMPLEXITY 3
@PURPOSE SvelteKit page listing dictionaries with language, term count, attachment info.
@UX_STATE idle, loading, empty, populated, delete_blocked
[/DEF:DictionaryList:Component]
[DEF:ScheduleConfig:Component]
frontend/src/lib/components/translate/ScheduleConfig.svelte
@COMPLEXITY 3
@PURPOSE Schedule panel: type selector, cron/interval with timezone, next-N-executions preview, concurrency policy, enable/disable. Warns if no prior successful manual run.
@UX_STATE idle, editing, validating, enabled, disabled, no_prior_run_warning
@UX_REACTIVITY Next execution times $derived with timezone display
[/DEF:ScheduleConfig:Component]
[DEF:TranslationHistory:Component]
frontend/src/routes/translate/history/+page.svelte
@COMPLEXITY 3
@PURPOSE Filterable run history: datasource, target table, row count, translation_status, insert_status, date. Detail view with config snapshot, Superset reference, SQL. Pruned runs show metadata only.
@UX_STATE idle, loading, empty, populated, detail_open, pruned
[/DEF:TranslationHistory:Component]
[DEF:TranslateApiClient:Module]
frontend/src/lib/api/translate.js
@COMPLEXITY 2
@PURPOSE API client wrapping requestApi/fetchApi for all translate endpoints.
[/DEF:TranslateApiClient:Module]
[DEF:translateStore:Store]
frontend/src/lib/stores/translate.js
@COMPLEXITY 3
@PURPOSE Svelte 5 rune store for translation feature state.
@RELATION BINDS_TO -> [TranslateApiClient:Module]
@RELATION BINDS_TO -> [TaskWebSocket:Module]
[/DEF:translateStore:Store]
Integration Contracts (Existing System)
| Contract ID | What It Provides | How Translation Uses It |
|---|---|---|
| [LLMProviderService:Module] | LLM API call with provider selection, key encryption | Sends batches with constructed prompts; receives structured JSON translations |
| [SupersetClient:Module] | Superset API: datasource schema, SQL Lab execution | Fetch column metadata, submit SQL to /api/v1/sqllab/execute/, poll status |
| [SchedulerService:Class] | APScheduler lifecycle, add_job/remove_job | Translation schedules registered as APScheduler jobs |
| [NotificationService:Module] | Email/in-app notification dispatch | Scheduled run failure notifications |
| [PermissionChecker:Dependency] | FastAPI dependency for RBAC enforcement | Route handlers annotated per access-control matrix |
| [TaskWebSocket:Module] | WebSocket for real-time task progress | Translation run progress events streamed to frontend |
| [TaskContext:Class] | Background task lifecycle context | Orchestrator runs as async background task |