
Semantic Contracts: LLM Table Translation Service

Feature Branch: 028-llm-datasource-supeset
Date: 2026-05-08 (updated post-review)
Protocol: GRACE-Poly v2.4


Backend Contracts (Python / FastAPI)

[DEF:TranslatePlugin:Module]

backend/src/plugins/translate/plugin.py @COMPLEXITY 3 @PURPOSE Plugin entry point implementing PluginBase for the translation service. Registers API routes, ORM models, and scheduler integration. @RELATION INHERITS -> [PluginBase:Class] @RELATION CALLS -> [TranslationOrchestrator:Class] @RELATION CALLS -> [DictionaryManager:Class] @RELATION CALLS -> [TranslationScheduler:Class] @RELATION BINDS_TO -> [TranslateRoutes:Module] [/DEF:TranslatePlugin:Module]

[DEF:TranslationOrchestrator:Class]

backend/src/plugins/translate/orchestrator.py @COMPLEXITY 5 @PURPOSE Central coordinator for translation run lifecycle: validates preconditions, manages preview quality gate, dispatches to executor, generates safe SQL, submits to Superset SQL Lab API, manages retry, records events, enforces retention. @PRE Job configuration is saved and valid. Superset datasource is accessible. LLM provider is configured and reachable. For manual runs: preview session must be accepted. For scheduled runs: at least one prior successful manual run exists. @POST TranslationRun created with translation_status and insert_status. INSERT SQL generated (safe PostgreSQL dialect) and submitted to Superset /api/v1/sqllab/execute/. Superset query reference recorded. Structured events recorded for all lifecycle transitions. @SIDE_EFFECT Creates TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, MetricSnapshot rows. Calls LLM provider API (token consumption). Calls Superset SQL Lab API. @DATA_CONTRACT Input: TranslationJob (config snapshot) + datasource rows → Output: TranslationRun (result) + Superset query reference @INVARIANT A run must transition through states: pending → running → (completed|partial|failed|cancelled|skipped). insert_status must transition through one of: not_started → skipped (if no insert needed); not_started → submitted → (succeeded|failed) (Superset may complete immediately); not_started → submitted → running → (succeeded|failed) (async polling). No other state transitions are allowed. Snapshot isolation: in-progress runs use config snapshot; config edits affect future runs only. 
@RELATION CALLS -> [TranslationPreview:Class] @RELATION CALLS -> [TranslationExecutor:Class] @RELATION CALLS -> [SQLGenerator:Class] @RELATION CALLS -> [SupersetSqlLabExecutor:Class] @RELATION CALLS -> [TranslationEventLog:Class] @RELATION CALLS -> [LLMProviderService:Module] @RELATION CALLS -> [SupersetClient:Module] @RATIONALE Centralized orchestrator is needed because preview gating, execution, SQL generation, Superset API submission, event logging, and retry share state (run_id, config snapshot) and must coordinate within a single transaction boundary for consistency. @REJECTED Distributed actor model (Celery tasks per batch) was rejected because it introduces eventual-consistency challenges for run status tracking without proportional benefit at the expected scale. Synchronous batch processing provides simpler debugging and deterministic retry. UPDATE statements are never generated — all modifications use INSERT/UPSERT per PostgreSQL dialect. [/DEF:TranslationOrchestrator:Class]
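
The two state machines in the @INVARIANT above can be written as lookup tables. The following is a minimal sketch, not the orchestrator's actual code: the names `RUN_TRANSITIONS`, `INSERT_TRANSITIONS`, `InvalidTransition`, and `advance` are hypothetical, and only the state names come from the contract.

```python
# Allowed transitions, transcribed from the @INVARIANT of TranslationOrchestrator.
# All identifiers in this sketch are hypothetical.
RUN_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "partial", "failed", "cancelled", "skipped"},
}

INSERT_TRANSITIONS = {
    "not_started": {"skipped", "submitted"},
    # Superset may complete immediately, or report "running" under async polling.
    "submitted": {"running", "succeeded", "failed"},
    "running": {"succeeded", "failed"},
}


class InvalidTransition(ValueError):
    """Raised when a status change is not listed in the transition table."""


def advance(table: dict[str, set[str]], current: str, new: str) -> str:
    """Return the new status if current -> new is allowed, otherwise raise."""
    if new not in table.get(current, set()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new
```

Terminal states have no entry in either table, so any attempt to leave them raises `InvalidTransition`.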

[DEF:TranslationPreview:Class]

backend/src/plugins/translate/preview.py @COMPLEXITY 4 @PURPOSE Fetch a sample of source rows, send to LLM with configured context and per-batch filtered dictionary, return side-by-side preview for quality gate acceptance. Creates persistent PreviewSession and PreviewRecord rows. @PRE Job configuration is saved. Datasource is accessible. Preview row count is configured (default: 10). @POST PreviewSession created with config_hash and dict_snapshot_hash. PreviewRecord rows returned with source_text, context, key_values, llm_translation, and status. Accepting the preview session gates full execution. No data is persisted to target table. @SIDE_EFFECT Calls LLM provider API (token consumption). Creates PreviewSession and PreviewRecord rows. @RELATION CALLS -> [LLMProviderService:Module] @RELATION CALLS -> [SupersetClient:Module] @RELATION CALLS -> [DictionaryManager:Class] [/DEF:TranslationPreview:Class]
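
The contract does not say how config_hash is computed. One plausible approach, sketched here under the assumption that the job configuration is JSON-serializable, is to hash a canonical JSON encoding with SHA-256. The function names `config_hash` and `is_stale` are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(session_hash: str, current_config: dict) -> bool:
    """A preview is stale when the job config changed after the preview ran."""
    return session_hash != config_hash(current_config)
```

The same function could produce dict_snapshot_hash from the attached dictionary entries, if those are serialized the same way.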

[DEF:TranslationExecutor:Class]

backend/src/plugins/translate/executor.py @COMPLEXITY 4 @PURPOSE Process source rows in batches through LLM, collect translations, handle batch-level retry, and produce TranslationBatch and TranslationRecord rows. Requests structured JSON output from LLM keyed by stable row identifiers; validates row alignment. @PRE Run exists with translation_status running. Source rows fetched. Batch size configured. @POST All processable rows have TranslationRecord entries. Each batch has a TranslationBatch record with statistics and timing. Run statistics updated. @SIDE_EFFECT Calls LLM provider API (token consumption). Creates TranslationBatch and TranslationRecord rows. @RELATION CALLS -> [LLMProviderService:Module] @RELATION CALLS -> [DictionaryManager:Class] [/DEF:TranslationExecutor:Class]
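
Batching and row-alignment validation might look like the sketch below. It assumes the LLM is asked to return a single JSON object mapping each stable row identifier to its translation; the contract only says the output is "keyed by stable row identifiers", so that exact shape is an assumption. The helper names `batches` and `validate_alignment` are hypothetical.

```python
import json
from typing import Iterator


def batches(rows: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def validate_alignment(batch_ids: list[str], llm_output: str) -> dict[str, str]:
    """Parse the LLM response and check it covers exactly the rows that were sent.

    Assumed response shape: {"<row_id>": "<translation>", ...}.
    """
    parsed = json.loads(llm_output)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by row id")
    expected, returned = set(batch_ids), set(parsed)
    missing, unexpected = expected - returned, returned - expected
    if missing or unexpected:
        raise ValueError(
            f"row mismatch: missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    # Return translations in the order the rows were sent.
    return {row_id: parsed[row_id] for row_id in batch_ids}
```

A `ValueError` here would be a natural trigger for the batch-level retry that the contract mentions.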

[DEF:SQLGenerator:Class]

backend/src/plugins/translate/sql_generator.py @COMPLEXITY 3 @PURPOSE Generate safe dialect-appropriate INSERT/UPSERT SQL from TranslationRecord rows, keyed by configured target key columns. Detects dialect from Superset connection (PostgreSQL/Greenplum, ClickHouse supported for MVP). Validates and quotes identifiers per dialect rules; safely encodes values. @PRE TranslationRecord rows exist with status translated or edited and final_value is non-null. Target table schema validated at configuration time. Target database dialect is PostgreSQL. @POST Returns a syntactically valid, injection-safe PostgreSQL INSERT (or INSERT ... ON CONFLICT) statement string. @RELATION CALLS -> [SupersetClient:Module] @RATIONALE Separate contract because SQL generation is reused by both manual and scheduled runs and is independently testable for SQL syntax correctness (SC-003) and injection safety. @REJECTED UPDATE statements are not generated because the source is append-only (new-key-only strategy). UPSERT covers the overwrite case without separate UPDATE logic. [/DEF:SQLGenerator:Class]
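
Because the statement is submitted to SQL Lab as text, identifiers and values have to be encoded into the SQL string itself. The sketch below shows one way to do that for PostgreSQL only. It assumes an unqualified table name, a unique constraint on the key columns, and that every value may be sent as a text literal. `quote_ident`, `quote_literal`, and `build_upsert` are hypothetical names, not the module's API.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    if not name or "\x00" in name:
        raise ValueError("invalid identifier")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: object) -> str:
    """Encode a value as a PostgreSQL string literal (None becomes NULL)."""
    if value is None:
        return "NULL"
    text = str(value)
    if "\x00" in text:
        raise ValueError("NUL characters are not allowed in text values")
    escaped = text.replace("'", "''")
    if "\\" in escaped:
        # E'' syntax makes backslash handling explicit, independent of the
        # server's standard_conforming_strings setting.
        return "E'" + escaped.replace("\\", "\\\\") + "'"
    return "'" + escaped + "'"


def build_upsert(table: str, key_columns: list[str],
                 value_columns: list[str], rows: list[dict]) -> str:
    """Build INSERT ... ON CONFLICT (keys) DO UPDATE for the given rows."""
    if not rows:
        raise ValueError("no rows to insert")
    columns = [*key_columns, *value_columns]
    column_sql = ", ".join(quote_ident(c) for c in columns)
    values_sql = ",\n  ".join(
        "(" + ", ".join(quote_literal(row[c]) for c in columns) + ")"
        for row in rows
    )
    conflict_sql = ", ".join(quote_ident(c) for c in key_columns)
    update_sql = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in value_columns
    )
    return (
        f"INSERT INTO {quote_ident(table)} ({column_sql})\n"
        f"VALUES\n  {values_sql}\n"
        f"ON CONFLICT ({conflict_sql}) DO UPDATE SET {update_sql};"
    )
```

For example, `build_upsert("products_ru", ["id"], ["name_ru"], [{"id": 1, "name_ru": "O'Brien"}])` yields a statement whose value tuple is `('1', 'O''Brien')`; the table and column names here are made up for illustration.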

[DEF:SupersetSqlLabExecutor:Class]

backend/src/plugins/translate/superset_executor.py @COMPLEXITY 3 @PURPOSE Submit generated SQL to Superset SQL Lab API /api/v1/sqllab/execute/, poll execution status, record Superset query reference, status, and error details in the TranslationRun. @PRE TranslationRun exists with translation_status completed/partial. SQL is generated and syntactically valid. @POST TranslationRun.insert_status updated along an allowed path: submitted → (succeeded|failed) when Superset completes immediately, or submitted → running → (succeeded|failed) under async polling. Superset query reference, error details, rows_affected (if available) stored. @SIDE_EFFECT Calls Superset API (SQL execution, database write). Updates TranslationRun row. @RELATION CALLS -> [SupersetClient:Module] @RELATION CALLS -> [TranslationEventLog:Class] [/DEF:SupersetSqlLabExecutor:Class]
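
The polling half of this contract can be sketched without committing to any particular Superset status endpoint. In the example below, `fetch_status` is a hypothetical callable that returns the current insert_status for the submitted query; the interval, the timeout, and the choice to raise `TimeoutError` are assumptions, not values from the spec.

```python
import time
from typing import Callable

TERMINAL_INSERT_STATUSES = {"succeeded", "failed"}


def poll_insert_status(
    fetch_status: Callable[[], str],
    *,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it reports a terminal status or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_INSERT_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"insert still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A caller would presumably map a `TimeoutError` to insert_status failed and record the error details.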

[DEF:DictionaryManager:Class]

backend/src/plugins/translate/dictionary.py @COMPLEXITY 4 @PURPOSE CRUD for TerminologyDictionary and DictionaryEntry (unique per dictionary_id + source_term). CSV/TSV import with conflict detection (overwrite/keep existing). Per-batch term filtering (case-insensitive, word-boundary-aware substring matching). Language validation on job attachment. @PRE Dictionary model exists. User has required permissions. For attachment: dictionary target_language matches job target_language. @POST CRUD operations reflected in database. Filtered term list returned for batch. Dictionaries whose target_language does not match the job's target_language are rejected on attachment. @SIDE_EFFECT Creates/updates/deletes DictionaryEntry rows. May read TranslationRun for origin tracking. @RELATION CALLS -> [DictionaryEntry:Class] @RELATION CALLS -> [TranslationJobDictionary:Class] [/DEF:DictionaryManager:Class]
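
Per-batch term filtering, as described above, could be implemented with a regular expression per term. This is a minimal sketch that assumes entries are available as a plain mapping from source_term to target term; `filter_terms` is a hypothetical name. It uses lookarounds rather than `\b` so that terms beginning or ending with punctuation still match.

```python
import re


def filter_terms(entries: dict[str, str], batch_texts: list[str]) -> dict[str, str]:
    """Keep only dictionary entries whose source term occurs in the batch.

    Matching is case-insensitive and requires that the term is not embedded
    inside a longer word.
    """
    haystack = "\n".join(batch_texts)
    selected = {}
    for source_term, target_term in entries.items():
        pattern = r"(?<!\w)" + re.escape(source_term) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            selected[source_term] = target_term
    return selected
```

With entries `{"cat": "кот", "order": "заказ"}` and batch texts `["Concatenate values", "Order total"]`, only the `order` entry is kept, because `cat` appears only inside a longer word.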

[DEF:TranslationScheduler:Class]

backend/src/plugins/translate/scheduler.py @COMPLEXITY 4 @PURPOSE Manage translation job schedules with timezone support: create/update/delete, register with APScheduler, handle trigger dispatch with concurrency policy (skip/queue at most one), enforce new-key-only strategy with baseline_expired fallback. @PRE SchedulerService is running. Job has at least one prior successful manual run. Job configuration exists. @POST Schedule registered with APScheduler or removed. On trigger: new TranslationRun created and dispatched to TranslationOrchestrator. New-key-only filter applied; if baseline expired (>90 days since last successful run), full translation with baseline_expired event. @SIDE_EFFECT Creates TranslationRun rows. Calls SchedulerService. Emits schedule_triggered/schedule_skipped/schedule_failed events. Calls NotificationService on failure. @RELATION CALLS -> [SchedulerService:Class] @RELATION CALLS -> [TranslationOrchestrator:Class] @RELATION CALLS -> [TranslationEventLog:Class] @RELATION CALLS -> [NotificationService:Module] [/DEF:TranslationScheduler:Class]
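
The new-key-only strategy with its baseline_expired fallback reduces to one decision per trigger. The sketch below assumes rows are dicts with a single key column and that the set of already-translated keys is known; `select_rows` and its parameters are hypothetical. Only the 90-day threshold and the event name come from the contract.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BASELINE_MAX_AGE = timedelta(days=90)


def select_rows(
    rows: list[dict],
    known_keys: set,
    last_success: datetime,
    now: datetime,
    key: str = "id",
) -> tuple[list[dict], Optional[str]]:
    """Return the rows to translate and the name of an event to emit, if any."""
    if now - last_success > BASELINE_MAX_AGE:
        # Baseline too old: translate everything and flag it.
        return list(rows), "baseline_expired"
    return [row for row in rows if row[key] not in known_keys], None
```

For instance, with `now = datetime(2026, 5, 8, tzinfo=timezone.utc)` and a last successful run 10 days earlier, only rows whose key is absent from `known_keys` are returned. At 91 days, all rows come back together with the `baseline_expired` event name.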

[DEF:TranslationEventLog:Class]

backend/src/plugins/translate/events.py @COMPLEXITY 5 @PURPOSE Structured event logging: write immutable events (run_id nullable for pre-run events), query events for audit/dashboard, enforce 90-day retention pruning with MetricSnapshot persistence before deletion. @PRE TranslationEvent table exists. Event type recognized. For run-scoped events: run exists. For pre-run events (schedule_*, run_noop): run_id is NULL. @POST Event row created with type-specific payload. Pruning job: persists MetricSnapshot, then removes events and records older than 90 days. @SIDE_EFFECT Creates TranslationEvent rows. Creates MetricSnapshot rows at pruning time. Deletes expired rows. @DATA_CONTRACT Input: (run_id?, job_id, event_type, payload: dict) → Output: TranslationEvent @INVARIANT Every created run MUST have exactly one run_started event and exactly one terminal event among: run_succeeded, run_partial, run_failed, run_cancelled, run_skipped. Events are immutable after creation. Cumulative metrics survive pruning via MetricSnapshot. @RELATION CALLS -> [TranslationEvent:Class] @RELATION CALLS -> [MetricSnapshot:Class] @RATIONALE C5 warranted: event log is single source of truth for observability, metrics, and audit. Immutability, retention, and metric continuity invariants must be enforced to prevent data loss or tampering. @REJECTED stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant. Event-sourced metrics without snapshots would lose cumulative data after pruning. [/DEF:TranslationEventLog:Class]
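
The @INVARIANT above is straightforward to check for a finished run. The following sketch assumes the event types of one run are available as a list of strings; `check_run_events` is a hypothetical helper. A run still in progress legitimately has no terminal event yet, so the check applies only after the run ends.

```python
from collections import Counter

TERMINAL_EVENTS = {
    "run_succeeded", "run_partial", "run_failed", "run_cancelled", "run_skipped",
}


def check_run_events(event_types: list[str]) -> list[str]:
    """Return a list of invariant violations for one finished run (empty if none)."""
    counts = Counter(event_types)
    problems = []
    if counts["run_started"] != 1:
        problems.append(f"expected 1 run_started event, found {counts['run_started']}")
    terminal = sum(counts[name] for name in TERMINAL_EVENTS)
    if terminal != 1:
        problems.append(f"expected 1 terminal event, found {terminal}")
    return problems
```

A check like this could run in tests or in the pruning job before events are deleted.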

[DEF:TranslationMetrics:Class]

backend/src/plugins/translate/metrics.py @COMPLEXITY 3 @PURPOSE Aggregate per-job metrics from live TranslationEvent log AND persistent MetricSnapshot table. For recent data (<90 days): compute from events. For cumulative totals: read latest MetricSnapshot + recent events. @PRE TranslationEvent rows or MetricSnapshot rows exist for target job. @POST Returns MetricsResponse DTO with accurate cumulative values spanning both live and pruned data. @RELATION CALLS -> [TranslationEventLog:Class] @RELATION CALLS -> [MetricSnapshot:Class] [/DEF:TranslationMetrics:Class]
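
Combining a MetricSnapshot with live events amounts to adding the two sources. The sketch below uses two made-up counters, `rows_translated` and `tokens_used`, because the contract does not list the actual metric fields. `Totals` and `cumulative_totals` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Totals:
    # Hypothetical metric fields; the real MetricSnapshot columns are not
    # specified in this contract.
    rows_translated: int = 0
    tokens_used: int = 0


def cumulative_totals(snapshot: Totals, live_events: list[dict]) -> Totals:
    """Add counters from live (unpruned) events to the latest snapshot."""
    rows = snapshot.rows_translated
    tokens = snapshot.tokens_used
    for event in live_events:
        payload = event.get("payload", {})
        rows += payload.get("rows_translated", 0)
        tokens += payload.get("tokens_used", 0)
    return Totals(rows_translated=rows, tokens_used=tokens)
```

This is only correct if the snapshot covers exactly the events that have already been pruned. If a snapshot also included events that are still live, they would be counted twice.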

[DEF:TranslateRoutes:Module]

backend/src/api/routes/translate.py @COMPLEXITY 3 @PURPOSE FastAPI route handlers: CRUD jobs/dictionaries, preview, run trigger, cancel run, schedule, history, metrics, feedback-loop submission. All endpoints enforce RBAC per access-control matrix. @RELATION CALLS -> [TranslationOrchestrator:Class] @RELATION CALLS -> [DictionaryManager:Class] @RELATION CALLS -> [TranslationScheduler:Class] @RELATION CALLS -> [TranslationEventLog:Class] @RELATION CALLS -> [TranslationMetrics:Class] @RELATION BINDS_TO -> [PermissionChecker:Dependency] [/DEF:TranslateRoutes:Module]

[DEF:TranslateModels:Module]

backend/src/models/translate.py @COMPLEXITY 2 @PURPOSE SQLAlchemy ORM models: TranslationJob, TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, TranslationPreviewSession, TranslationPreviewRecord, TerminologyDictionary, DictionaryEntry, TranslationSchedule, TranslationJobDictionary, MetricSnapshot. @RELATION INHERITS -> [Base:Class] [/DEF:TranslateModels:Module]

[DEF:TranslateSchemas:Module]

backend/src/schemas/translate.py @COMPLEXITY 2 @PURPOSE Pydantic v2 request/response schemas for translation API endpoints. [/DEF:TranslateSchemas:Module]


Frontend Contracts (Svelte 5 / SvelteKit)

[DEF:TranslateJobList:Component]

frontend/src/routes/translate/+page.svelte @COMPLEXITY 3 @PURPOSE SvelteKit page listing all translation jobs with status/schedule indicators. @UX_STATE idle, loading, empty, populated, error [/DEF:TranslateJobList:Component]

[DEF:TranslationJobConfig:Component]

frontend/src/routes/translate/[id]/+page.svelte @COMPLEXITY 3 @PURPOSE SvelteKit page for job configuration: datasource selection, column mapping with source→target key mapping, target table/column, LLM settings, dictionary attachment (language-filtered), schedule tab with timezone. @UX_STATE idle, loading, configured, saving, validation_error, datasource_unavailable @UX_REACTIVITY Column list $derived from datasource; dictionary list filtered by target_language [/DEF:TranslationJobConfig:Component]

[DEF:TranslationPreview:Component]

frontend/src/lib/components/translate/TranslationPreview.svelte @COMPLEXITY 4 @PURPOSE Side-by-side preview of source rows with LLM translations, approve/edit/reject as quality feedback, accept preview as quality gate. Shows config_hash to detect stale previews. @UX_STATE idle, loading, preview_loaded, preview_error, accepted, stale_config @UX_FEEDBACK Spinner during LLM call; visual distinction for LLM-generated vs user-edited values @UX_RECOVERY Retry preview; re-fetch with updated config [/DEF:TranslationPreview:Component]

[DEF:TranslationRunProgress:Component]

frontend/src/lib/components/translate/TranslationRunProgress.svelte @COMPLEXITY 4 @PURPOSE Live progress display: progress bar, batch counter, success/failure counts, cancel button. WebSocket-driven. Shows both translation and insert execution phases. @UX_STATE idle, running, cancelling, cancelled, completed, partial, failed, insert_pending, insert_running, insert_failed @UX_FEEDBACK Progress percentage $derived; real-time batch/insert status; Superset execution reference on completion @UX_RECOVERY Retry failed batches; cancel run; view generated SQL (audit/debug) [/DEF:TranslationRunProgress:Component]

[DEF:TranslationRunResult:Component]

frontend/src/lib/components/translate/TranslationRunResult.svelte @COMPLEXITY 4 @PURPOSE Completion summary with statistics, Superset execution status/reference, generated SQL (audit/debug), inline feedback-loop correction controls. @UX_STATE completed, partial, failed, insert_failed @UX_FEEDBACK Superset execution status badge; SQL block for audit @UX_RECOVERY Retry failed rows; retry insert; submit corrections @RELATION CALLS -> [TermCorrectionPopup:Component] [/DEF:TranslationRunResult:Component]

[DEF:TermCorrectionPopup:Component]

frontend/src/lib/components/translate/TermCorrectionPopup.svelte @COMPLEXITY 3 @PURPOSE Popup for selecting source term + incorrect target term in run results, providing corrected target term, submitting to dictionary. Conflict: overwrite or keep existing. @UX_STATE closed, selecting, editing, submitting, conflict_detected, submitted [/DEF:TermCorrectionPopup:Component]

[DEF:BulkCorrectionSidebar:Component]

frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte @COMPLEXITY 3 @PURPOSE Sidebar for bulk correction: collect multiple terms, mass edit, submit atomically. @UX_STATE closed, collecting, reviewing, submitting, submitted [/DEF:BulkCorrectionSidebar:Component]

[DEF:DictionaryEditor:Component]

frontend/src/lib/components/translate/DictionaryEditor.svelte @COMPLEXITY 3 @PURPOSE Inline editor: add/edit/delete entries, CSV/TSV import with conflict preview (overwrite/keep existing), export. @UX_STATE idle, loading, editing, importing, import_preview, import_conflict, saving [/DEF:DictionaryEditor:Component]

[DEF:DictionaryList:Component]

frontend/src/routes/translate/dictionaries/+page.svelte @COMPLEXITY 3 @PURPOSE SvelteKit page listing dictionaries with language, term count, attachment info. @UX_STATE idle, loading, empty, populated, delete_blocked [/DEF:DictionaryList:Component]

[DEF:ScheduleConfig:Component]

frontend/src/lib/components/translate/ScheduleConfig.svelte @COMPLEXITY 3 @PURPOSE Schedule panel: type selector, cron/interval with timezone, next-N-executions preview, concurrency policy, enable/disable. Warns if no prior successful manual run. @UX_STATE idle, editing, validating, enabled, disabled, no_prior_run_warning @UX_REACTIVITY Next execution times $derived with timezone display [/DEF:ScheduleConfig:Component]

[DEF:TranslationHistory:Component]

frontend/src/routes/translate/history/+page.svelte @COMPLEXITY 3 @PURPOSE Filterable run history: datasource, target table, row count, translation_status, insert_status, date. Detail view with config snapshot, Superset reference, SQL. Pruned runs show metadata only. @UX_STATE idle, loading, empty, populated, detail_open, pruned [/DEF:TranslationHistory:Component]

[DEF:TranslateApiClient:Module]

frontend/src/lib/api/translate.js @COMPLEXITY 2 @PURPOSE API client wrapping requestApi/fetchApi for all translate endpoints. [/DEF:TranslateApiClient:Module]

[DEF:translateStore:Store]

frontend/src/lib/stores/translate.js @COMPLEXITY 3 @PURPOSE Svelte 5 rune store for translation feature state. @RELATION BINDS_TO -> [TranslateApiClient:Module] @RELATION BINDS_TO -> [TaskWebSocket:Module] [/DEF:translateStore:Store]


Integration Contracts (Existing System)

| Contract ID | What It Provides | How Translation Uses It |
|---|---|---|
| [LLMProviderService:Module] | LLM API call with provider selection, key encryption | Sends batches with constructed prompts; receives structured JSON translations |
| [SupersetClient:Module] | Superset API: datasource schema, SQL Lab execution | Fetch column metadata, submit SQL to /api/v1/sqllab/execute/, poll status |
| [SchedulerService:Class] | APScheduler lifecycle, add_job/remove_job | Translation schedules registered as APScheduler jobs |
| [NotificationService:Module] | Email/in-app notification dispatch | Scheduled run failure notifications |
| [PermissionChecker:Dependency] | FastAPI dependency for RBAC enforcement | Route handlers annotated per access-control matrix |
| [TaskWebSocket:Module] | WebSocket for real-time task progress | Translation run progress events streamed to frontend |
| [TaskContext:Class] | Background task lifecycle context | Orchestrator runs as async background task |