tasks ready

2026-05-08 18:01:49 +03:00
parent d8df1fff59
commit bdd376595c
32 changed files with 3243 additions and 229 deletions

@@ -0,0 +1,143 @@
# Implementation Plan: LLM Table Translation Service
**Branch**: `028-llm-datasource-supeset` | **Date**: 2026-05-08 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/028-llm-datasource-supeset/spec.md`
## Summary
Implement an LLM-powered table translation service as a new backend plugin (`TranslationPlugin`) with companion API routes, ORM models, Pydantic schemas, and Svelte frontend components. The service reads rows from a Superset datasource (a translation column plus context columns), sends them in batches to a configured LLM provider, optionally with terminology dictionary context filtered per batch, generates safe PostgreSQL INSERT/UPSERT SQL, and submits it to Superset via `/api/v1/sqllab/execute/` with status polling and reference recording. Supporting capabilities: terminology dictionary CRUD with language validation, translation preview as a persistent quality gate, scheduled execution via APScheduler (new-key-only with `baseline_expired` fallback), structured event logging with MetricSnapshot persistence, an RBAC-gated access-control matrix, and 90-day retention with metric continuity.
**Total planned modules**: 25 contracts across backend (Python/FastAPI) and frontend (Svelte 5/SvelteKit).
**Complexity distribution**: 2× C5 (orchestrator, event-log), 8× C4 (preview, execute, dictionary, scheduler, SQL-gen, routes), 10× C3 (models, schemas, components), 5× C2 (helpers).
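To make the "safe INSERT/UPSERT SQL" step in the summary concrete, here is a minimal sketch of how an upsert statement could be rendered as plain text for SQL Lab. It is not the planned `sql_generator.py`: the function names (`quote_ident`, `quote_literal`, `build_upsert`) and the sample table and columns are assumptions made for illustration, and it covers only the PostgreSQL `ON CONFLICT` form (ClickHouse would need a different statement).

```python
from typing import Mapping, Sequence


def quote_ident(name: str) -> str:
    """Quote one PostgreSQL identifier, doubling embedded double quotes.

    A schema-qualified name must be passed part by part; "a.b" is treated
    as a single identifier here.
    """
    if not name or "\x00" in name:
        raise ValueError("invalid identifier")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: object) -> str:
    """Render a value as a PostgreSQL literal (assumes standard_conforming_strings=on)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # bool must be checked before int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "\x00" in text:
        raise ValueError("NUL byte not allowed in a text literal")
    return "'" + text.replace("'", "''") + "'"


def build_upsert(
    table: str,
    key_columns: Sequence[str],
    rows: Sequence[Mapping[str, object]],
) -> str:
    """Build one multi-row INSERT ... ON CONFLICT statement."""
    if not rows:
        raise ValueError("no rows to insert")
    columns = list(rows[0])
    missing = [k for k in key_columns if k not in columns]
    if missing:
        raise ValueError(f"key columns missing from rows: {missing}")
    col_list = ", ".join(quote_ident(c) for c in columns)
    values = ",\n".join(
        "  (" + ", ".join(quote_literal(row[c]) for c in columns) + ")"
        for row in rows
    )
    conflict = ", ".join(quote_ident(k) for k in key_columns)
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in key_columns
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(table)} ({col_list})\n"
        f"VALUES\n{values}\n"
        f"ON CONFLICT ({conflict}) {action};"
    )


sql = build_upsert(
    "product_names_ru",  # hypothetical target table
    ["id"],
    [{"id": 1, "name_ru": "O'Brien"}],
)
```

Literal escaping is used instead of bound parameters because the statement is submitted as text; whether the real generator takes the same approach is left to `contracts/modules.md`.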
## Technical Context
**Language/Version**: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)
**Primary Dependencies**: FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)
**Storage**: PostgreSQL 16 (shared with existing ss-tools schema)
**Testing**: pytest 8.x + pytest-asyncio (backend); vitest 4.x + @testing-library/svelte 5.x (frontend)
**Target Platform**: Linux server (Docker Compose: backend + frontend + PostgreSQL)
**Project Type**: web application — FastAPI REST + WebSocket backend, SvelteKit SPA frontend
**Performance Goals**: Preview of 10 rows within 30s (LLM-dependent); INSERT generation <5s for 1000 rows; schedule trigger precision ±60s; event recording <10s after occurrence
**Constraints**: RBAC enforcement via access-control matrix; dictionary per-batch filtering (case-insensitive, word-boundary-aware); 90-day detailed data retention with MetricSnapshot persistence; plugin lifecycle compatible with `PluginBase`; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse supported for MVP); snapshot isolation for in-progress runs; structured JSON LLM output required
**Scale/Scope**: Tens of thousands of rows per run; dictionaries of any size; 50 concurrent users; 5-10 active translation jobs per deployment
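The per-batch dictionary filtering constraint (case-insensitive, word-boundary-aware) could be sketched as follows. The function name `filter_terms` and the flat `dict[str, str]` dictionary shape are assumptions for illustration only; the planned `DictionaryManager` may store terms differently.

```python
import re
from typing import Iterable


def filter_terms(dictionary: dict[str, str], batch_texts: Iterable[str]) -> dict[str, str]:
    """Return only the dictionary entries whose source term occurs in the batch.

    Matching is case-insensitive and respects word boundaries. Lookarounds are
    used instead of \\b so that terms starting or ending with punctuation
    (for example "C++") still match correctly.
    """
    haystack = "\n".join(batch_texts)
    selected: dict[str, str] = {}
    for source_term, target_term in dictionary.items():
        pattern = r"(?<!\w)" + re.escape(source_term) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            selected[source_term] = target_term
    return selected


terms = {
    "cat": "кот",
    "Data Source": "источник данных",
    "art": "искусство",
}
batch = ["The CATalog of data source types", "smart chart"]
relevant = filter_terms(terms, batch)
```

Here only `Data Source` is selected: `cat` is rejected because it appears only inside `CATalog`, and `art` only inside `smart` and `chart`. For very large dictionaries, a single precompiled alternation or an Aho-Corasick index would likely scale better than one regex search per term.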
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| Principle | Status | Evidence |
|-----------|--------|----------|
| **I. Plugin Architecture** | PASS | Feature implemented as a new plugin inheriting from `PluginBase` (`backend/src/plugins/translate/`), consistent with existing `llm_analysis`, `git`, `storage` plugin pattern. |
| **II. API-First Design** | PASS | All operations exposed via REST endpoints under `/api/translate/` (jobs, dictionaries, runs, schedules, history) and WebSocket for run progress. Follows existing FastAPI route conventions. |
| **III. Test-First** | PASS | Acceptance criteria per user story defined in spec; test strategy in research.md covers unit (pytest), integration (API + DB), and component (vitest + Svelte Testing Library) layers. |
| **IV. RBAC & Security** | PASS | Granular permissions (`translate.job.*`, `translate.dictionary.*`, `translate.schedule.manage`, `translate.history.view`) via existing RBAC model. API keys encrypted via `EncryptionManager`. PII masking for LLM-facing context per existing patterns. |
| **V. Observability & Retention** | PASS | Structured Translation Events (FR-046), per-job metrics (FR-047), notification integration (FR-048), 90-day retention with pruning (FR-049). C4+ flows instrumented with `belief_scope`/`reason`/`reflect`/`explore` markers. |
| **Semantic Protocol Compliance** | PASS | All planned modules assigned `@COMPLEXITY` 2-5 with appropriate tag density. `@RATIONALE`/`@REJECTED` reserved for C5 contracts only. Canonical `@RELATION PREDICATE -> TARGET_ID` syntax. Comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup. |
| **Fractal Limit (INV_7)** | PASS | Planned modules kept under 400 lines each. No planned contract exceeds 150 lines or CC>10. Decomposition strategy documented in contracts/modules.md. |
**Gate Result**: ✅ ALL PASS — no blocking constitutional or semantic conflicts.
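The "Observability & Retention" row pairs 90-day pruning with metric continuity. One way this could work is to fold expired events into aggregate counts before deleting them. The sketch below assumes a simplified in-memory `Event` record and hypothetical event kinds; the real models and the MetricSnapshot schema are defined in `data-model.md`, not here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # retention window named in the plan


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified stand-in for a translation event row."""
    job_id: int
    kind: str
    occurred_at: datetime


def prune_with_snapshot(
    events: list[Event], now: datetime
) -> tuple[list[Event], Counter]:
    """Split events at the retention cutoff.

    Returns the events to keep, plus per-(job, kind) counts of the expired
    events so that long-term metrics survive the deletion of detail rows.
    """
    cutoff = now - RETENTION
    kept = [e for e in events if e.occurred_at >= cutoff]
    expired = [e for e in events if e.occurred_at < cutoff]
    snapshot = Counter((e.job_id, e.kind) for e in expired)
    return kept, snapshot


now = datetime(2026, 5, 8, tzinfo=timezone.utc)
events = [
    Event(1, "run_failed", now - timedelta(days=120)),
    Event(1, "run_failed", now - timedelta(days=91)),
    Event(1, "run_completed", now - timedelta(days=5)),
]
kept, snapshot = prune_with_snapshot(events, now)
```

In a database-backed version, the aggregation and the delete would presumably run in one transaction so that a failure cannot lose counts.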
## Project Structure
### Documentation (this feature)
```text
specs/028-llm-datasource-supeset/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/ # Phase 1 output
│ └── modules.md # Semantic contract design
├── spec.md # Feature specification
├── ux_reference.md # Interaction reference
└── checklists/
└── requirements.md # Quality validation
```
### Source Code (repository root)
```text
backend/
├── src/
│ ├── api/routes/
│ │ └── translate.py # New: translation REST endpoints
│ ├── core/
│ │ ├── scheduler.py # Existing: APScheduler (extend for translation schedules)
│ │ └── superset_client/ # Existing: Superset API client (reuse)
│ ├── models/
│ │ └── translate.py # New: SQLAlchemy ORM models
│ ├── plugins/
│ │ └── translate/ # New: TranslationPlugin
│ │ ├── __init__.py
│ │ ├── plugin.py # Plugin entry (inherits PluginBase)
│ │ ├── orchestrator.py # Run lifecycle orchestration (C5)
│ │ ├── preview.py # Preview engine (C4)
│ │ ├── executor.py # Batch execution + INSERT gen (C4)
│ │ ├── dictionary.py # Dictionary CRUD + filtering (C4)
│ │ ├── sql_generator.py # INSERT/UPSERT SQL generation (C3)
│ │ ├── scheduler.py # Schedule management (C4)
│ │ ├── events.py # Structured event logging (C5)
│ │ ├── metrics.py # Per-job metrics aggregation (C3)
│ │ └── __tests__/ # Plugin-level tests
│ ├── schemas/
│ │ └── translate.py # New: Pydantic request/response schemas
│ └── services/
│ ├── llm_provider.py # Existing: LLM provider management (reuse)
│ └── llm_prompt_templates.py # Existing: prompt rendering (reuse)
└── tests/ # Integration tests
frontend/
├── src/
│ ├── routes/
│ │ └── translate/ # New: SvelteKit route
│ │ ├── +page.svelte # Translation job list
│ │ ├── [id]/
│ │ │ └── +page.svelte # Job config + preview + run
│ │ ├── dictionaries/
│ │ │ ├── +page.svelte # Dictionary list
│ │ │ └── [id]/
│ │ │ └── +page.svelte # Dictionary editor
│ │ └── history/
│ │ └── +page.svelte # Run history
│ └── lib/
│ ├── components/
│ │ └── translate/ # New: reusable translation components
│ │ ├── TranslationJobConfig.svelte
│ │ ├── TranslationPreview.svelte
│ │ ├── TranslationRunProgress.svelte
│ │ ├── TranslationRunResult.svelte
│ │ ├── DictionaryEditor.svelte
│ │ ├── TermCorrectionPopup.svelte
│ │ ├── ScheduleConfig.svelte
│ │ └── BulkCorrectionSidebar.svelte
│ ├── stores/
│ │ └── translate.js # New: Svelte 5 rune stores
│ └── api/
│ └── translate.js # New: API client module
```
## Semantic Contract Guidance
- All planned contracts follow GRACE-Poly v2.4 protocol with `[DEF:id:Type]...[/DEF:id:Type]` anchors.
- Python backend uses `# [DEF:...]` comment-anchor syntax.
- Svelte components use `<!-- [DEF:...] -->` in markup and `// [DEF:...]` in script blocks.
- Complexity assignments and required tag coverage detailed in `contracts/modules.md`.
- C4+ Python modules instrumented with `belief_scope()` + `reason()`/`reflect()`/`explore()` markers.
- C5 contracts carry `@RATIONALE` and `@REJECTED` decision memory.
- `@RELATION` targets reference existing contracts where applicable (e.g., `[LLMProviderService:Module]`, `[SchedulerService:Class]`, `[SupersetClient:Module]`, `[NotificationService:Module]`).
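As an illustration of the anchor conventions listed above, the sketch below shows a hypothetical annotated Python module and a small checker that verifies `[DEF:id:Type]` anchors are properly closed. The contract ID `TranslationSqlGenerator`, the `DEPENDS_ON` predicate, and the exact `@COMPLEXITY` value syntax are assumptions; the authoritative tag grammar is the GRACE-Poly protocol itself, which this plan only references.

```python
import re

# Matches both opening "[DEF:id:Type]" and closing "[/DEF:id:Type]" anchors.
ANCHOR = re.compile(r"\[(/?)DEF:([^:\]]+):([^\]]+)\]")

# Hypothetical annotated module text, following the comment-anchor style above.
SAMPLE = """\
# [DEF:TranslationSqlGenerator:Module]
# @COMPLEXITY 3
# @RELATION DEPENDS_ON -> [SupersetClient:Module]
def generate():
    ...
# [/DEF:TranslationSqlGenerator:Module]
"""


def unbalanced_anchors(source: str) -> list[str]:
    """Report DEF anchors that are unclosed or closed out of order."""
    stack: list[tuple[str, str]] = []
    problems: list[str] = []
    for match in ANCHOR.finditer(source):
        closing, ident, kind = match.groups()
        key = (ident, kind)
        if not closing:
            stack.append(key)
        elif stack and stack[-1] == key:
            stack.pop()
        else:
            problems.append(f"unexpected close {ident}:{kind}")
    problems.extend(f"unclosed {ident}:{kind}" for ident, kind in stack)
    return problems


issues = unbalanced_anchors(SAMPLE)
```

The `@RELATION` target `[SupersetClient:Module]` is not mistaken for an anchor because the pattern requires the `DEF:` prefix.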
## Complexity Tracking
No constitutional violations detected. All complexity assignments are justified within the semantic protocol's complexity scale:
| Contract | Complexity | Justification |
|----------|-----------|---------------|
| `TranslationOrchestrator` | C5 | Stateful lifecycle with PRE/POST, multi-step coordination, decision memory for retry/concurrency policies |
| `TranslationScheduler` | C4 | Stateful schedule CRUD with APScheduler integration, conflict detection; no decision memory needed |
| `TranslationEventLog` | C5 | Immutable event log with retention enforcement, audit invariants, decision memory for pruning strategy |
| `TranslationPreview` | C4 | Stateful preview with LLM calls, approve/edit/reject lifecycle |
| `TranslationExecutor` | C4 | Batch execution with retry, INSERT generation, progress tracking |
| `DictionaryManager` | C4 | CRUD with import/export, per-batch filtering, conflict resolution |
| Svelte components | C3 | UI state machines with UX_STATE/FEEDBACK/RECOVERY/REACTIVITY bindings |
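The `TranslationExecutor` row combines batching, retry, and progress tracking. A minimal sketch of that control flow is shown below, under stated assumptions: `run_batches`, its parameters, and the retry policy (exponential backoff, three attempts) are hypothetical and not taken from the spec, and the `translate` callable stands in for the real LLM call.

```python
import time
from typing import Callable, Iterator, Optional, Sequence


def batched(rows: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` rows."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_batches(
    rows: Sequence[str],
    size: int,
    translate: Callable[[Sequence[str]], list[str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,  # seconds; 0 keeps this sketch instant to run
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[str]:
    """Translate rows batch by batch, retrying a failed batch with backoff."""
    batches = list(batched(rows, size))
    results: list[str] = []
    for index, batch in enumerate(batches, start=1):
        for attempt in range(1, max_attempts + 1):
            try:
                results.extend(translate(batch))
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up on the run after the final attempt
                time.sleep(base_delay * 2 ** (attempt - 1))
        if on_progress is not None:
            on_progress(index, len(batches))
    return results


calls = {"n": 0}


def flaky_translate(batch: Sequence[str]) -> list[str]:
    """Fake provider: fails once, then upper-cases the input."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient provider error")
    return [text.upper() for text in batch]


progress: list[tuple[int, int]] = []
translated = run_batches(
    ["a", "b", "c"], 2, flaky_translate,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

A production executor would likely catch only provider-specific errors rather than bare `Exception`, and would record each retry as a structured event.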