Implementation Plan: LLM Table Translation Service
Branch: 028-llm-datasource-supeset | Date: 2026-05-08 | Spec: spec.md
Input: Feature specification from /specs/028-llm-datasource-supeset/spec.md
Summary
Implement an LLM-powered table translation service as a new backend plugin (TranslationPlugin) with companion API routes, ORM models, Pydantic schemas, and Svelte frontend components. The service reads rows from a Superset datasource (translation column + context columns), sends batches to a configured LLM provider with optional per-batch filtered terminology dictionary context, generates safe PostgreSQL INSERT/UPSERT SQL, and submits it to Superset via /api/v1/sqllab/execute/ with status polling and reference recording. Supporting capabilities: terminology dictionary CRUD with language validation, translation preview as persistent quality gate, scheduled execution via APScheduler (new-key-only with baseline_expired fallback), structured event logging with MetricSnapshot persistence, RBAC-gated access control matrix, and 90-day retention with metric continuity.
Total planned modules: 25 contracts across backend (Python/FastAPI) and frontend (Svelte 5/SvelteKit).
Complexity distribution: 2× C5 (orchestrator, event-log), 8× C4 (preview, execute, dictionary, scheduler, SQL-gen, routes), 10× C3 (models, schemas, components), 5× C2 (helpers).
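The "safe INSERT/UPSERT" step in the summary can be illustrated with a minimal sketch. Because the SQL is submitted to Superset as text, values cannot be bound as parameters and must be rendered as escaped literals. The function names (quote_ident, quote_literal, build_upsert) and the example table and column names are hypothetical and not taken from the spec; the sketch targets PostgreSQL only and assumes standard_conforming_strings is on (the PostgreSQL default).

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a single (unqualified) identifier, doubling embedded double quotes."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: object) -> str:
    """Render a Python value as a SQL literal: NULL, boolean, integer, or quoted text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "\x00" in text:
        raise ValueError("NUL characters are not allowed in text literals")
    # Assumption: standard_conforming_strings = on, so only single quotes need doubling.
    return "'" + text.replace("'", "''") + "'"


def build_upsert(
    table: str,
    key_columns: Sequence[str],
    value_columns: Sequence[str],
    rows: Sequence[Sequence[object]],
) -> str:
    """Build one multi-row INSERT ... ON CONFLICT ... DO UPDATE statement."""
    columns = [*key_columns, *value_columns]
    if not rows or not key_columns or not value_columns:
        raise ValueError("rows, key_columns and value_columns must be non-empty")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
    column_sql = ", ".join(quote_ident(c) for c in columns)
    values_sql = ",\n".join(
        "  (" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    conflict_sql = ", ".join(quote_ident(c) for c in key_columns)
    update_sql = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in value_columns
    )
    return (
        f"INSERT INTO {quote_ident(table)} ({column_sql})\n"
        f"VALUES\n{values_sql}\n"
        f"ON CONFLICT ({conflict_sql}) DO UPDATE SET {update_sql};"
    )
```

Calling build_upsert("product_i18n", ["id"], ["name_de"], [(1, "O'Reilly"), (2, None)]) yields a statement in which the apostrophe is doubled and the missing translation is written as NULL. Other dialects named in the constraints (Greenplum, ClickHouse) would need their own conflict handling, which this sketch does not attempt.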
Technical Context
Language/Version: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)
Primary Dependencies: FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)
Storage: PostgreSQL 16 (shared with existing ss-tools schema)
Testing: pytest 8.x + pytest-asyncio (backend); vitest 4.x + @testing-library/svelte 5.x (frontend)
Target Platform: Linux server (Docker Compose: backend + frontend + PostgreSQL)
Project Type: web application — FastAPI REST + WebSocket backend, SvelteKit SPA frontend
Performance Goals: Preview of 10 rows within 30s (LLM-dependent); INSERT generation <5s for 1000 rows; schedule trigger precision ±60s; event recording <10s after occurrence
Constraints: RBAC enforcement via access-control matrix; dictionary per-batch filtering (case-insensitive, word-boundary-aware); 90-day detailed data retention with MetricSnapshot persistence; plugin lifecycle compatible with PluginBase; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse supported for MVP); snapshot isolation for in-progress runs; structured JSON LLM output required
Scale/Scope: Tens of thousands of rows per run; dictionaries of any size; 50 concurrent users; 5–10 active translation jobs per deployment
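The per-batch dictionary filtering constraint (case-insensitive, word-boundary-aware) can be sketched as follows. This is an illustration under assumptions, not the planned DictionaryManager: the function name filter_dictionary and the plain dict[str, str] representation of a dictionary are hypothetical.

```python
import re
from typing import Iterable


def filter_dictionary(terms: dict[str, str], batch_texts: Iterable[str]) -> dict[str, str]:
    """Return only the dictionary entries whose source term occurs in the batch."""
    haystack = "\n".join(batch_texts)
    selected: dict[str, str] = {}
    for source, target in terms.items():
        if not source:
            continue  # an empty term would match everywhere
        # Lookarounds are used instead of \b so that terms which start or end
        # with punctuation (for example "C++") are still matched as whole terms.
        pattern = r"(?<!\w)" + re.escape(source) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            selected[source] = target
    return selected
```

With the terms "invoice", "voice", and "C++" and a batch containing "Your INVOICE is ready" and "Written in C++.", only "invoice" and "C++" are selected; "voice" is rejected because it appears only inside a longer word. Filtering this way keeps the prompt small when dictionaries are large, since only entries relevant to the current batch are sent to the LLM.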
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Principle | Status | Evidence |
|---|---|---|
| I. Plugin Architecture | ✅ PASS | Feature implemented as a new plugin inheriting from PluginBase (backend/src/plugins/translate/), consistent with existing llm_analysis, git, storage plugin pattern. |
| II. API-First Design | ✅ PASS | All operations exposed via REST endpoints under /api/translate/ (jobs, dictionaries, runs, schedules, history) and WebSocket for run progress. Follows existing FastAPI route conventions. |
| III. Test-First | ✅ PASS | Acceptance criteria per user story defined in spec; test strategy in research.md covers unit (pytest), integration (API + DB), and component (vitest + Svelte Testing Library) layers. |
| IV. RBAC & Security | ✅ PASS | Granular permissions (translate.job.*, translate.dictionary.*, translate.schedule.manage, translate.history.view) via existing RBAC model. API keys encrypted via EncryptionManager. PII masking for LLM-facing context per existing patterns. |
| V. Observability & Retention | ✅ PASS | Structured Translation Events (FR-046), per-job metrics (FR-047), notification integration (FR-048), 90-day retention with pruning (FR-049). C4+ flows instrumented with belief_scope/reason/reflect/explore markers. |
| Semantic Protocol Compliance | ✅ PASS | All planned modules assigned @COMPLEXITY 2–5 with appropriate tag density. @RATIONALE/@REJECTED reserved for C5 contracts only. Canonical @RELATION PREDICATE -> TARGET_ID syntax. Comment-anchor syntax: # [DEF:...] for Python, <!-- [DEF:...] --> for Svelte markup. |
| Fractal Limit (INV_7) | ✅ PASS | Planned modules kept under 400 lines each. No planned contract exceeds 150 lines or CC>10. Decomposition strategy documented in contracts/modules.md. |
Gate Result: ✅ ALL PASS — no blocking constitutional or semantic conflicts.
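The retention evidence under Principle V (90-day retention with pruning, with metrics that survive pruning) can be illustrated with a minimal sketch. The Event dataclass, the prune function, and the use of a Counter as a stand-in for MetricSnapshot are assumptions for illustration; the actual models are defined in data-model.md and are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a structured translation event."""
    job_id: str
    kind: str
    occurred_at: datetime


def prune(events: list[Event], now: datetime) -> tuple[list[Event], Counter]:
    """Split events into those kept and a per-(job, kind) count of those pruned.

    The returned counter plays the role of a metric snapshot: it preserves
    aggregate counts for events whose detailed records are removed.
    """
    cutoff = now - RETENTION
    kept = [e for e in events if e.occurred_at >= cutoff]
    snapshot = Counter(
        (e.job_id, e.kind) for e in events if e.occurred_at < cutoff
    )
    return kept, snapshot
```

In this sketch an event exactly 90 days old is still retained; only strictly older events are folded into the snapshot. Whether the real boundary is inclusive is not stated in this plan.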
Project Structure
Documentation (this feature)
specs/028-llm-datasource-supeset/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/ # Phase 1 output
│ └── modules.md # Semantic contract design
├── spec.md # Feature specification
├── ux_reference.md # Interaction reference
└── checklists/
└── requirements.md # Quality validation
Source Code (repository root)
backend/
├── src/
│ ├── api/routes/
│ │ └── translate.py # New: translation REST endpoints
│ ├── core/
│ │ ├── scheduler.py # Existing: APScheduler (extend for translation schedules)
│ │ └── superset_client/ # Existing: Superset API client (reuse)
│ ├── models/
│ │ └── translate.py # New: SQLAlchemy ORM models
│ ├── plugins/
│ │ └── translate/ # New: TranslationPlugin
│ │ ├── __init__.py
│ │ ├── plugin.py # Plugin entry (inherits PluginBase)
│ │ ├── orchestrator.py # Run lifecycle orchestration (C5)
│ │ ├── preview.py # Preview engine (C4)
│ │ ├── executor.py # Batch execution + INSERT gen (C4)
│ │ ├── dictionary.py # Dictionary CRUD + filtering (C4)
│ │ ├── sql_generator.py # INSERT/UPSERT SQL generation (C3)
│ │ ├── scheduler.py # Schedule management (C4)
│ │ ├── events.py # Structured event logging (C5)
│ │ ├── metrics.py # Per-job metrics aggregation (C3)
│ │ └── __tests__/ # Plugin-level tests
│ ├── schemas/
│ │ └── translate.py # New: Pydantic request/response schemas
│ └── services/
│ ├── llm_provider.py # Existing: LLM provider management (reuse)
│ └── llm_prompt_templates.py # Existing: prompt rendering (reuse)
└── tests/ # Integration tests
frontend/
├── src/
│ ├── routes/
│ │ └── translate/ # New: SvelteKit route
│ │ ├── +page.svelte # Translation job list
│ │ ├── [id]/
│ │ │ └── +page.svelte # Job config + preview + run
│ │ ├── dictionaries/
│ │ │ ├── +page.svelte # Dictionary list
│ │ │ └── [id]/
│ │ │ └── +page.svelte # Dictionary editor
│ │ └── history/
│ │ └── +page.svelte # Run history
│ └── lib/
│ ├── components/
│ │ └── translate/ # New: reusable translation components
│ │ ├── TranslationJobConfig.svelte
│ │ ├── TranslationPreview.svelte
│ │ ├── TranslationRunProgress.svelte
│ │ ├── TranslationRunResult.svelte
│ │ ├── DictionaryEditor.svelte
│ │ ├── TermCorrectionPopup.svelte
│ │ ├── ScheduleConfig.svelte
│ │ └── BulkCorrectionSidebar.svelte
│ ├── stores/
│ │ └── translate.js # New: Svelte 5 rune stores
│ └── api/
│ └── translate.js # New: API client module
Semantic Contract Guidance
- All planned contracts follow GRACE-Poly v2.4 protocol with [DEF:id:Type]...[/DEF:id:Type] anchors.
- Python backend uses # [DEF:...] comment-anchor syntax.
- Svelte components use <!-- [DEF:...] --> in markup and // [DEF:...] in script blocks.
- Complexity assignments and required tag coverage detailed in contracts/modules.md.
- C4+ Python modules instrumented with belief_scope() + reason()/reflect()/explore() markers.
- C5 contracts carry @RATIONALE and @REJECTED decision memory.
- @RELATION targets reference existing contracts where applicable (e.g., [LLMProviderService:Module], [SchedulerService:Class], [SupersetClient:Module], [NotificationService:Module]).
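The anchor convention above lends itself to a simple balance check. The sketch below is a hypothetical helper, not part of the GRACE-Poly tooling: it assumes anchors nest like brackets and that neither the id nor the Type contains colons, whitespace, or a closing bracket. It works on Python and Svelte sources alike because it ignores the surrounding comment syntax.

```python
import re

# Matches [DEF:id:Type] and [/DEF:id:Type]; group 1 is "/" for closing anchors.
ANCHOR = re.compile(r"\[(/?)DEF:([^\]\s:]+):([^\]\s:]+)\]")


def check_anchors(source: str) -> list[str]:
    """Return a list of problems; an empty list means all anchors are balanced."""
    stack: list[tuple[tuple[str, str], int]] = []
    problems: list[str] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for closing, ident, kind in ANCHOR.findall(line):
            key = (ident, kind)
            if not closing:
                stack.append((key, lineno))
            elif stack and stack[-1][0] == key:
                stack.pop()
            else:
                problems.append(f"line {lineno}: unexpected [/DEF:{ident}:{kind}]")
    problems.extend(
        f"line {lineno}: unclosed [DEF:{ident}:{kind}]"
        for (ident, kind), lineno in stack
    )
    return problems
```

A check of this kind could run in CI alongside the unit tests so that a missing closing anchor is caught before review.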
Complexity Tracking
No constitutional violations detected. All complexity assignments are justified within the semantic protocol's complexity scale:
| Contract | Complexity | Justification |
|---|---|---|
| TranslationOrchestrator | C5 | Stateful lifecycle with PRE/POST, multi-step coordination, decision memory for retry/concurrency policies |
| TranslationScheduler | C4 | Stateful schedule CRUD with APScheduler integration, conflict detection; no decision memory needed |
| TranslationEventLog | C5 | Immutable event log with retention enforcement, audit invariants, decision memory for pruning strategy |
| TranslationPreview | C4 | Stateful preview with LLM calls, approve/edit/reject lifecycle |
| TranslationExecutor | C4 | Batch execution with retry, INSERT generation, progress tracking |
| DictionaryManager | C4 | CRUD with import/export, per-batch filtering, conflict resolution |
| Svelte components | C3 | UI state machines with UX_STATE/FEEDBACK/RECOVERY/REACTIVITY bindings |