# Implementation Plan: LLM Table Translation Service

**Branch**: `028-llm-datasource-supeset` | **Date**: 2026-05-08 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/028-llm-datasource-supeset/spec.md`

## Summary

Implement an LLM-powered table translation service as a new backend plugin (`TranslationPlugin`) with companion API routes, ORM models, Pydantic schemas, and Svelte frontend components. The service reads rows from a Superset datasource (a translation column plus context columns), sends them in batches to a configured LLM provider, optionally adding terminology-dictionary context filtered per batch, generates safe PostgreSQL INSERT/UPSERT SQL, and submits it to Superset via `/api/v1/sqllab/execute/` with status polling and reference recording. Supporting capabilities: terminology dictionary CRUD with language validation, translation preview as a persistent quality gate, scheduled execution via APScheduler (new-key-only with `baseline_expired` fallback), structured event logging with MetricSnapshot persistence, an RBAC-gated access-control matrix, and 90-day retention with metric continuity.

**Total planned modules**: 25 contracts across backend (Python/FastAPI) and frontend (Svelte 5/SvelteKit).
**Complexity distribution**: 2× C5 (orchestrator, event-log), 8× C4 (preview, execute, dictionary, scheduler, SQL-gen, routes), 10× C3 (models, schemas, components), 5× C2 (helpers).

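Because the generated SQL is submitted to SQL Lab as plain text, values have to be inlined as literals rather than bound as parameters. The following is a minimal sketch of what "safe" INSERT/UPSERT generation could look like for PostgreSQL. The names `quote_ident`, `quote_literal`, and `build_upsert` are hypothetical and not part of the plan, and the literal escaping assumes the server runs with `standard_conforming_strings = on`.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: object) -> str:
    """Render a Python value as a SQL literal (assumes standard_conforming_strings = on)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "\x00" in text:
        raise ValueError("NUL bytes are not allowed in PostgreSQL text values")
    return "'" + text.replace("'", "''") + "'"


def build_upsert(
    table: str,
    columns: Sequence[str],
    key_columns: Sequence[str],
    rows: Sequence[Sequence[object]],
) -> str:
    """Build one multi-row INSERT ... ON CONFLICT statement (hypothetical helper)."""
    if not rows:
        raise ValueError("at least one row is required")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")

    col_list = ", ".join(quote_ident(c) for c in columns)
    values = ",\n".join(
        "  (" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    keys = ", ".join(quote_ident(c) for c in key_columns)
    # Non-key columns are overwritten with the incoming (EXCLUDED) values.
    updates = [c for c in columns if c not in key_columns]
    if updates:
        action = "DO UPDATE SET " + ", ".join(
            f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in updates
        )
    else:
        action = "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(table)} ({col_list})\n"
        f"VALUES\n{values}\n"
        f"ON CONFLICT ({keys}) {action};"
    )
```

For example, `build_upsert("translations", ["id", "text_ru"], ["id"], [(1, "O'Reilly")])` yields a statement whose value list contains `'O''Reilly'`. `ON CONFLICT` is PostgreSQL syntax, so any other dialect would need its own generator.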
## Technical Context

**Language/Version**: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)
**Primary Dependencies**: FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)
**Storage**: PostgreSQL 16 (shared with existing ss-tools schema)
**Testing**: pytest 8.x + pytest-asyncio (backend); vitest 4.x + @testing-library/svelte 5.x (frontend)
**Target Platform**: Linux server (Docker Compose: backend + frontend + PostgreSQL)
**Project Type**: web application (FastAPI REST + WebSocket backend, SvelteKit SPA frontend)
**Performance Goals**: Preview of 10 rows within 30s (LLM-dependent); INSERT generation <5s for 1000 rows; schedule trigger precision ±60s; event recording <10s after occurrence
**Constraints**: RBAC enforcement via access-control matrix; dictionary per-batch filtering (case-insensitive, word-boundary-aware); 90-day detailed data retention with MetricSnapshot persistence; plugin lifecycle compatible with `PluginBase`; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse supported for MVP); snapshot isolation for in-progress runs; structured JSON LLM output required
**Scale/Scope**: Tens of thousands of rows per run; dictionaries of any size; 50 concurrent users; 5–10 active translation jobs per deployment

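The per-batch dictionary filtering constraint (case-insensitive, word-boundary-aware) can be illustrated with a short sketch. The `Term` type and `filter_terms` function are hypothetical names chosen for illustration. The sketch uses lookarounds instead of `\b` so that terms ending in non-word characters, such as `C++`, still match.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Term:
    """Hypothetical dictionary entry: a source term and its required translation."""
    source: str
    target: str


def filter_terms(terms: Iterable[Term], batch_texts: Iterable[str]) -> List[Term]:
    """Return only the terms that occur as whole words in the batch."""
    haystack = "\n".join(batch_texts)
    selected = []
    for term in terms:
        # (?<!\w) and (?!\w) require that the match is not glued to a word
        # character on either side; re.escape keeps the term literal.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term.source) + r"(?!\w)", re.IGNORECASE
        )
        if pattern.search(haystack):
            selected.append(term)
    return selected


terms = [
    Term("dashboard", "tableau de bord"),
    Term("board", "planche"),
    Term("C++", "C++"),
]
batch = ["Open the Dashboard settings", "Written in c++."]
matched = [t.source for t in filter_terms(terms, batch)]
```

Here `board` is skipped because it only appears inside `Dashboard`, while `dashboard` and `C++` are kept despite the case difference. Compiling one pattern per term per batch is the simplest form; a large dictionary would likely want precompiled patterns or a combined alternation.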
## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

| Principle | Status | Evidence |
|-----------|--------|----------|
| **I. Plugin Architecture** | ✅ PASS | Feature implemented as a new plugin inheriting from `PluginBase` (`backend/src/plugins/translate/`), consistent with existing `llm_analysis`, `git`, `storage` plugin pattern. |
| **II. API-First Design** | ✅ PASS | All operations exposed via REST endpoints under `/api/translate/` (jobs, dictionaries, runs, schedules, history) and WebSocket for run progress. Follows existing FastAPI route conventions. |
| **III. Test-First** | ✅ PASS | Acceptance criteria per user story defined in spec; test strategy in research.md covers unit (pytest), integration (API + DB), and component (vitest + Svelte Testing Library) layers. |
| **IV. RBAC & Security** | ✅ PASS | Granular permissions (`translate.job.*`, `translate.dictionary.*`, `translate.schedule.manage`, `translate.history.view`) via existing RBAC model. API keys encrypted via `EncryptionManager`. PII masking for LLM-facing context per existing patterns. |
| **V. Observability & Retention** | ✅ PASS | Structured Translation Events (FR-046), per-job metrics (FR-047), notification integration (FR-048), 90-day retention with pruning (FR-049). C4+ flows instrumented with `belief_scope`/`reason`/`reflect`/`explore` markers. |
| **Semantic Protocol Compliance** | ✅ PASS | All planned modules assigned `@COMPLEXITY` 2–5 with appropriate tag density. `@RATIONALE`/`@REJECTED` reserved for C5 contracts only. Canonical `@RELATION PREDICATE -> TARGET_ID` syntax. Comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup. |
| **Fractal Limit (INV_7)** | ✅ PASS | Planned modules kept under 400 lines each. No planned contract exceeds 150 lines or CC>10. Decomposition strategy documented in contracts/modules.md. |

**Gate Result**: ✅ ALL PASS: no blocking constitutional or semantic conflicts.

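The retention requirement under Principle V (90-day pruning with metrics that survive it) suggests a roll-up-before-delete step. The sketch below shows that idea on in-memory data only. The `TranslationEvent` shape, the `prune_with_snapshot` name, and the count-per-(job, kind) aggregate are assumptions for illustration, since the real `MetricSnapshot` schema is defined elsewhere in the spec.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class TranslationEvent:
    """Hypothetical event record; the real model lives in the ORM layer."""
    job_id: int
    kind: str  # e.g. "row_translated" or "row_failed" (illustrative values)
    occurred_at: datetime


def prune_with_snapshot(
    events: Iterable[TranslationEvent], now: datetime
) -> Tuple[List[TranslationEvent], Dict[Tuple[int, str], int]]:
    """Split events at the retention cutoff and aggregate the expired ones.

    Returns the events to keep plus per-(job_id, kind) counts of the expired
    events, so totals can be persisted before the detailed rows are deleted.
    """
    cutoff = now - RETENTION
    kept: List[TranslationEvent] = []
    expired: Counter = Counter()
    for event in events:
        if event.occurred_at >= cutoff:
            kept.append(event)
        else:
            expired[(event.job_id, event.kind)] += 1
    return kept, dict(expired)


now = datetime(2026, 5, 8, tzinfo=timezone.utc)
events = [
    TranslationEvent(1, "row_translated", now - timedelta(days=100)),
    TranslationEvent(1, "row_translated", now - timedelta(days=10)),
]
kept, snapshot = prune_with_snapshot(events, now)
```

In a database-backed version the aggregate insert and the delete would presumably run in one transaction, so that a failure cannot drop events without recording their totals.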
## Project Structure

### Documentation (this feature)

```text
specs/028-llm-datasource-supeset/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
│   └── modules.md       # Semantic contract design
├── spec.md              # Feature specification
├── ux_reference.md      # Interaction reference
└── checklists/
    └── requirements.md  # Quality validation
```

### Source Code (repository root)

```text
backend/
├── src/
│   ├── api/routes/
│   │   └── translate.py              # New: translation REST endpoints
│   ├── core/
│   │   ├── scheduler.py              # Existing: APScheduler (extend for translation schedules)
│   │   └── superset_client/          # Existing: Superset API client (reuse)
│   ├── models/
│   │   └── translate.py              # New: SQLAlchemy ORM models
│   ├── plugins/
│   │   └── translate/                # New: TranslationPlugin
│   │       ├── __init__.py
│   │       ├── plugin.py             # Plugin entry (inherits PluginBase)
│   │       ├── orchestrator.py       # Run lifecycle orchestration (C5)
│   │       ├── preview.py            # Preview engine (C4)
│   │       ├── executor.py           # Batch execution + INSERT gen (C4)
│   │       ├── dictionary.py         # Dictionary CRUD + filtering (C4)
│   │       ├── sql_generator.py      # INSERT/UPSERT SQL generation (C3)
│   │       ├── scheduler.py          # Schedule management (C4)
│   │       ├── events.py             # Structured event logging (C5)
│   │       ├── metrics.py            # Per-job metrics aggregation (C3)
│   │       └── __tests__/            # Plugin-level tests
│   ├── schemas/
│   │   └── translate.py              # New: Pydantic request/response schemas
│   └── services/
│       ├── llm_provider.py           # Existing: LLM provider management (reuse)
│       └── llm_prompt_templates.py   # Existing: prompt rendering (reuse)
└── tests/                            # Integration tests

frontend/
├── src/
│   ├── routes/
│   │   └── translate/                # New: SvelteKit route
│   │       ├── +page.svelte          # Translation job list
│   │       ├── [id]/
│   │       │   └── +page.svelte      # Job config + preview + run
│   │       ├── dictionaries/
│   │       │   ├── +page.svelte      # Dictionary list
│   │       │   └── [id]/
│   │       │       └── +page.svelte  # Dictionary editor
│   │       └── history/
│   │           └── +page.svelte      # Run history
│   └── lib/
│       ├── components/
│       │   └── translate/            # New: reusable translation components
│       │       ├── TranslationJobConfig.svelte
│       │       ├── TranslationPreview.svelte
│       │       ├── TranslationRunProgress.svelte
│       │       ├── TranslationRunResult.svelte
│       │       ├── DictionaryEditor.svelte
│       │       ├── TermCorrectionPopup.svelte
│       │       ├── ScheduleConfig.svelte
│       │       └── BulkCorrectionSidebar.svelte
│       ├── stores/
│       │   └── translate.js          # New: Svelte 5 rune stores
│       └── api/
│           └── translate.js          # New: API client module
```

## Semantic Contract Guidance

- All planned contracts follow GRACE-Poly v2.4 protocol with `[DEF:id:Type]...[/DEF:id:Type]` anchors.
- Python backend uses `# [DEF:...]` comment-anchor syntax.
- Svelte components use `<!-- [DEF:...] -->` in markup and `// [DEF:...]` in script blocks.
- Complexity assignments and required tag coverage detailed in `contracts/modules.md`.
- C4+ Python modules instrumented with `belief_scope()` + `reason()`/`reflect()`/`explore()` markers.
- C5 contracts carry `@RATIONALE` and `@REJECTED` decision memory.
- `@RELATION` targets reference existing contracts where applicable (e.g., `[LLMProviderService:Module]`, `[SchedulerService:Class]`, `[SupersetClient:Module]`, `[NotificationService:Module]`).

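Since every contract is delimited by paired `[DEF:id:Type]` and `[/DEF:id:Type]` anchors, a simple balance check can catch unclosed or mismatched anchors early. The following is only a sketch under that pairing assumption. `check_anchors` is a hypothetical helper, not part of the GRACE-Poly tooling, and it checks nesting only, not tag coverage or complexity rules.

```python
import re
from typing import List, Tuple

# Matches both opening "[DEF:id:Type]" and closing "[/DEF:id:Type]" anchors.
ANCHOR_RE = re.compile(r"\[(/?)DEF:([^:\]]+):([^\]]+)\]")


def check_anchors(source: str) -> List[str]:
    """Return a list of problems; an empty list means all anchors are balanced."""
    problems: List[str] = []
    stack: List[Tuple[str, str, int]] = []  # (id, type, line number) of open anchors
    for lineno, line in enumerate(source.splitlines(), start=1):
        for closing, ident, kind in ANCHOR_RE.findall(line):
            if not closing:
                stack.append((ident, kind, lineno))
            elif stack and stack[-1][:2] == (ident, kind):
                stack.pop()
            else:
                problems.append(f"line {lineno}: unexpected [/DEF:{ident}:{kind}]")
    # Anything still on the stack was opened but never closed.
    problems.extend(
        f"line {lineno}: unclosed [DEF:{ident}:{kind}]" for ident, kind, lineno in stack
    )
    return problems


sample = (
    "# [DEF:TranslationPreview:Module]\n"
    "def preview():\n"
    "    return []\n"
    "# [/DEF:TranslationPreview:Module]\n"
)
result = check_anchors(sample)
```

Because the regular expression ignores the surrounding comment syntax, the same check applies to `# [DEF:...]`, `// [DEF:...]`, and `<!-- [DEF:...] -->` lines.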
## Complexity Tracking

No constitutional violations detected. All complexity assignments are justified within the semantic protocol's complexity scale:

| Contract | Complexity | Justification |
|----------|-----------|---------------|
| `TranslationOrchestrator` | C5 | Stateful lifecycle with PRE/POST, multi-step coordination, decision memory for retry/concurrency policies |
| `TranslationScheduler` | C4 | Stateful schedule CRUD with APScheduler integration, conflict detection; no decision memory needed |
| `TranslationEventLog` | C5 | Immutable event log with retention enforcement, audit invariants, decision memory for pruning strategy |
| `TranslationPreview` | C4 | Stateful preview with LLM calls, approve/edit/reject lifecycle |
| `TranslationExecutor` | C4 | Batch execution with retry, INSERT generation, progress tracking |
| `DictionaryManager` | C4 | CRUD with import/export, per-batch filtering, conflict resolution |
| Svelte components | C3 | UI state machines with UX_STATE/FEEDBACK/RECOVERY/REACTIVITY bindings |