# Implementation Plan: LLM Table Translation Service

**Branch**: `028-llm-datasource-supeset` | **Date**: 2026-05-08 | **Spec**: [spec.md](./spec.md)

**Input**: Feature specification from `/specs/028-llm-datasource-supeset/spec.md`

## Summary

Implement an LLM-powered table translation service as a new backend plugin (`TranslationPlugin`) with companion API routes, ORM models, Pydantic schemas, and Svelte frontend components. The service reads rows from a Superset datasource (translation column + context columns), sends batches to a configured LLM provider with optional per-batch filtered terminology dictionary context, generates safe PostgreSQL INSERT/UPSERT SQL, and submits it to Superset via `/api/v1/sqllab/execute/` with status polling and reference recording. Supporting capabilities: terminology dictionary CRUD with language validation, translation preview as persistent quality gate, scheduled execution via APScheduler (new-key-only with baseline_expired fallback), structured event logging with MetricSnapshot persistence, RBAC-gated access control matrix, and 90-day retention with metric continuity.

**Total planned modules**: 25 contracts across backend (Python/FastAPI) and frontend (Svelte 5/SvelteKit).

**Complexity distribution**: 2× C5 (orchestrator, event-log), 8× C4 (preview, execute, dictionary, scheduler, SQL-gen, routes), 10× C3 (models, schemas, components), 5× C2 (helpers).

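The "safe PostgreSQL INSERT/UPSERT SQL" step in the summary could look roughly like the sketch below. It is an illustration under stated assumptions, not the planned `sql_generator.py`: the helper names (`quote_ident`, `quote_literal`, `build_upsert`) and the two-column key/translation layout are hypothetical, the target table is assumed to have a unique constraint on the key column, and the server is assumed to run with `standard_conforming_strings = on` so that backslashes in literals are not escape characters.

```python
from typing import Optional, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    if "\x00" in name:
        raise ValueError("identifier contains a NUL byte")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: Optional[str]) -> str:
    """Render a text value as a SQL string literal; None becomes NULL.

    Assumes standard_conforming_strings = on (backslashes are literal).
    """
    if value is None:
        return "NULL"
    if "\x00" in value:
        raise ValueError("value contains a NUL byte")
    return "'" + value.replace("'", "''") + "'"


def build_upsert(
    table: str,
    key_column: str,
    value_column: str,
    rows: Sequence[Tuple[str, Optional[str]]],
) -> str:
    """Build one multi-row INSERT ... ON CONFLICT DO UPDATE statement.

    Hypothetical layout: each row is (key, translated_text).
    """
    if not rows:
        raise ValueError("at least one row is required")
    values = ",\n".join(
        f"  ({quote_literal(key)}, {quote_literal(text)})" for key, text in rows
    )
    key_sql = quote_ident(key_column)
    val_sql = quote_ident(value_column)
    return (
        f"INSERT INTO {quote_ident(table)} ({key_sql}, {val_sql})\n"
        f"VALUES\n{values}\n"
        f"ON CONFLICT ({key_sql}) DO UPDATE\n"
        f"SET {val_sql} = EXCLUDED.{val_sql};"
    )


if __name__ == "__main__":
    print(build_upsert("labels_ru", "label_key", "label_text",
                       [("greeting", "it's ok"), ("empty", None)]))
```

Because the statement is submitted as text through SQL Lab rather than through a parameterized driver call, every identifier and literal has to be escaped before it is concatenated, which is what the two quoting helpers are for.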
## Technical Context

**Language/Version**: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)

**Primary Dependencies**: FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)

**Storage**: PostgreSQL 16 (shared with existing ss-tools schema)

**Testing**: pytest 8.x + pytest-asyncio (backend); vitest 4.x + @testing-library/svelte 5.x (frontend)

**Target Platform**: Linux server (Docker Compose: backend + frontend + PostgreSQL)

**Project Type**: web application (FastAPI REST + WebSocket backend, SvelteKit SPA frontend)

**Performance Goals**: Preview of 10 rows within 30s (LLM-dependent); INSERT generation <5s for 1000 rows; schedule trigger precision ±60s; event recording <10s after occurrence

**Constraints**: RBAC enforcement via access-control matrix; dictionary per-batch filtering (case-insensitive, word-boundary-aware); 90-day detailed data retention with MetricSnapshot persistence; plugin lifecycle compatible with `PluginBase`; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse supported for MVP); snapshot isolation for in-progress runs; structured JSON LLM output required

**Scale/Scope**: Tens of thousands of rows per run; dictionaries of any size; 50 concurrent users; 5–10 active translation jobs per deployment

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

| Principle | Status | Evidence |
|-----------|--------|----------|
| **I. Plugin Architecture** | ✅ PASS | Feature implemented as a new plugin inheriting from `PluginBase` (`backend/src/plugins/translate/`), consistent with existing `llm_analysis`, `git`, `storage` plugin pattern. |
| **II. API-First Design** | ✅ PASS | All operations exposed via REST endpoints under `/api/translate/` (jobs, dictionaries, runs, schedules, history) and WebSocket for run progress. Follows existing FastAPI route conventions. |
| **III. Test-First** | ✅ PASS | Acceptance criteria per user story defined in spec; test strategy in research.md covers unit (pytest), integration (API + DB), and component (vitest + Svelte Testing Library) layers. |
| **IV. RBAC & Security** | ✅ PASS | Granular permissions (`translate.job.*`, `translate.dictionary.*`, `translate.schedule.manage`, `translate.history.view`) via existing RBAC model. API keys encrypted via `EncryptionManager`. PII masking for LLM-facing context per existing patterns. |
| **V. Observability & Retention** | ✅ PASS | Structured Translation Events (FR-046), per-job metrics (FR-047), notification integration (FR-048), 90-day retention with pruning (FR-049). C4+ flows instrumented with `belief_scope`/`reason`/`reflect`/`explore` markers. |
| **Semantic Protocol Compliance** | ✅ PASS | All planned modules assigned `@COMPLEXITY` 2–5 with appropriate tag density. `@RATIONALE`/`@REJECTED` reserved for C5 contracts only. Canonical `@RELATION PREDICATE -> TARGET_ID` syntax. Comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup. |
| **Fractal Limit (INV_7)** | ✅ PASS | Planned modules kept under 400 lines each. No planned contract exceeds 150 lines or CC>10. Decomposition strategy documented in contracts/modules.md. |

**Gate Result**: ✅ ALL PASS: no blocking constitutional or semantic conflicts.

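The per-batch dictionary filtering constraint listed under Technical Context (case-insensitive, word-boundary-aware) can be illustrated with a small sketch. This is one possible reading of that constraint, not the planned `dictionary.py`: the function name `filter_terms` and the plain `dict[str, str]` dictionary shape are assumptions made for the example.

```python
import re
from typing import Dict, Iterable


def filter_terms(batch_texts: Iterable[str], dictionary: Dict[str, str]) -> Dict[str, str]:
    """Return only the dictionary entries whose source term occurs in the batch.

    Matching is case-insensitive and respects word boundaries, so the term
    "cat" matches "The CAT sat" but not "category".
    """
    haystack = "\n".join(batch_texts)
    matched: Dict[str, str] = {}
    for term, translation in dictionary.items():
        if not term:
            continue  # an empty term would match everywhere; skip it
        # Lookarounds instead of \b so that terms starting or ending with
        # punctuation (e.g. "C++") are still bounded correctly.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            matched[term] = translation
    return matched


if __name__ == "__main__":
    terms = {"cat": "gato", "Order": "pedido", "C++": "C++"}
    print(filter_terms(["The CAT sat", "c++ code"], terms))
```

Filtering per batch keeps the prompt small: only terms that actually appear in the rows being translated are sent to the LLM. For very large dictionaries a single compiled alternation or an Aho-Corasick index would scale better than one regex search per term; the loop here favors clarity.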
## Project Structure

### Documentation (this feature)

```text
specs/028-llm-datasource-supeset/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
│   └── modules.md       # Semantic contract design
├── spec.md              # Feature specification
├── ux_reference.md      # Interaction reference
└── checklists/
    └── requirements.md  # Quality validation
```

### Source Code (repository root)

```text
backend/
├── src/
│   ├── api/routes/
│   │   └── translate.py              # New: translation REST endpoints
│   ├── core/
│   │   ├── scheduler.py              # Existing: APScheduler (extend for translation schedules)
│   │   └── superset_client/          # Existing: Superset API client (reuse)
│   ├── models/
│   │   └── translate.py              # New: SQLAlchemy ORM models
│   ├── plugins/
│   │   └── translate/                # New: TranslationPlugin
│   │       ├── __init__.py
│   │       ├── plugin.py             # Plugin entry (inherits PluginBase)
│   │       ├── orchestrator.py       # Run lifecycle orchestration (C5)
│   │       ├── preview.py            # Preview engine (C4)
│   │       ├── executor.py           # Batch execution + INSERT gen (C4)
│   │       ├── dictionary.py         # Dictionary CRUD + filtering (C4)
│   │       ├── sql_generator.py      # INSERT/UPSERT SQL generation (C3)
│   │       ├── scheduler.py          # Schedule management (C4)
│   │       ├── events.py             # Structured event logging (C5)
│   │       ├── metrics.py            # Per-job metrics aggregation (C3)
│   │       └── __tests__/            # Plugin-level tests
│   ├── schemas/
│   │   └── translate.py              # New: Pydantic request/response schemas
│   └── services/
│       ├── llm_provider.py           # Existing: LLM provider management (reuse)
│       └── llm_prompt_templates.py   # Existing: prompt rendering (reuse)
└── tests/                            # Integration tests

frontend/
├── src/
│   ├── routes/
│   │   └── translate/                # New: SvelteKit route
│   │       ├── +page.svelte          # Translation job list
│   │       ├── [id]/
│   │       │   └── +page.svelte      # Job config + preview + run
│   │       ├── dictionaries/
│   │       │   ├── +page.svelte      # Dictionary list
│   │       │   └── [id]/
│   │       │       └── +page.svelte  # Dictionary editor
│   │       └── history/
│   │           └── +page.svelte      # Run history
│   └── lib/
│       ├── components/
│       │   └── translate/            # New: reusable translation components
│       │       ├── TranslationJobConfig.svelte
│       │       ├── TranslationPreview.svelte
│       │       ├── TranslationRunProgress.svelte
│       │       ├── TranslationRunResult.svelte
│       │       ├── DictionaryEditor.svelte
│       │       ├── TermCorrectionPopup.svelte
│       │       ├── ScheduleConfig.svelte
│       │       └── BulkCorrectionSidebar.svelte
│       ├── stores/
│       │   └── translate.js          # New: Svelte 5 rune stores
│       └── api/
│           └── translate.js          # New: API client module
```

## Semantic Contract Guidance

- All planned contracts follow GRACE-Poly v2.4 protocol with `[DEF:id:Type]...[/DEF:id:Type]` anchors.
- Python backend uses `# [DEF:...]` comment-anchor syntax.
- Svelte components use `<!-- [DEF:...] -->` in markup and `// [DEF:...]` in script blocks.
- Complexity assignments and required tag coverage detailed in `contracts/modules.md`.
- C4+ Python modules instrumented with `belief_scope()` + `reason()`/`reflect()`/`explore()` markers.
- C5 contracts carry `@RATIONALE` and `@REJECTED` decision memory.
- `@RELATION` targets reference existing contracts where applicable (e.g., `[LLMProviderService:Module]`, `[SchedulerService:Class]`, `[SupersetClient:Module]`, `[NotificationService:Module]`).

## Complexity Tracking

No constitutional violations detected.
All complexity assignments are justified within the semantic protocol's complexity scale:

| Contract | Complexity | Justification |
|----------|-----------|---------------|
| `TranslationOrchestrator` | C5 | Stateful lifecycle with PRE/POST, multi-step coordination, decision memory for retry/concurrency policies |
| `TranslationScheduler` | C4 | Stateful schedule CRUD with APScheduler integration, conflict detection; no decision memory needed |
| `TranslationEventLog` | C5 | Immutable event log with retention enforcement, audit invariants, decision memory for pruning strategy |
| `TranslationPreview` | C4 | Stateful preview with LLM calls, approve/edit/reject lifecycle |
| `TranslationExecutor` | C4 | Batch execution with retry, INSERT generation, progress tracking |
| `DictionaryManager` | C4 | CRUD with import/export, per-batch filtering, conflict resolution |
| Svelte components | C3 | UI state machines with UX_STATE/FEEDBACK/RECOVERY/REACTIVITY bindings |