ss-tools/specs/028-llm-datasource-supeset/tasks.md
2026-05-08 18:01:49 +03:00

# Tasks: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Input**: Design documents from `/specs/028-llm-datasource-supeset/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/modules.md
**Tests**: Test tasks are included for all C4/C5 backend contracts, new API endpoints, and Svelte components with `@UX_STATE` contracts. Test work traces to contract `@PRE`/`@POST` guarantees and spec acceptance scenarios.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US5)
- Include exact file paths in descriptions
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Create plugin directory structure and register the new route module in the lazy-import registry.
- [ ] T001 Create translation plugin directory structure: `backend/src/plugins/translate/__init__.py`, `backend/src/plugins/translate/plugin.py` (empty skeleton), plus `backend/src/plugins/translate/__tests__/__init__.py`
- [ ] T002 Register `translate` route module in `backend/src/api/routes/__init__.py` — add `"translate"` to `__all__` list inside `[DEF:Route_Group_Contracts:Block]`
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: ORM models, Pydantic schemas, plugin boilerplate, route skeleton, and database migration. ALL user stories depend on these artifacts.
**⚠️ CRITICAL**: No user story work can begin until this phase is complete.
### ORM Models
- [ ] T003 [P] Create all SQLAlchemy ORM models in `backend/src/models/translate.py`: `TranslationJob`, `TranslationRun`, `TranslationBatch`, `TranslationRecord`, `TranslationEvent`, `TranslationPreviewSession`, `TranslationPreviewRecord`, `TerminologyDictionary`, `DictionaryEntry`, `TranslationSchedule`, `TranslationJobDictionary`, `MetricSnapshot`. Follow patterns from `backend/src/models/llm.py` (UUID PKs, `generate_uuid`, `Base` inheritance, JSON columns, `UniqueConstraint`, indexes, timezone-aware DateTime with callable defaults). Include `source_term_normalized` column on `DictionaryEntry` with unique constraint for case-insensitive matching.
- [ ] T004 [P] Create Pydantic v2 request/response schemas in `backend/src/schemas/translate.py`: `TranslateJobCreate`, `TranslateJobUpdate`, `TranslateJobResponse`, `DictionaryCreate`, `DictionaryImport`, `DictionaryResponse`, `TermCorrectionSubmit`, `ScheduleConfig`, `TranslationRunResponse`, `TranslationPreviewResponse` (with `PreviewRow`), `MetricsResponse`. Follow existing `backend/src/schemas/` patterns (use `BaseModel`, `Field` with defaults/validation)
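The case-insensitive unique constraint on `DictionaryEntry` hinges on how `source_term_normalized` is derived from `source_term`. A minimal sketch of that derivation (the helper name is an assumption for illustration, not part of the model spec):

```python
import unicodedata

def normalize_source_term(term: str) -> str:
    """Hypothetical helper feeding DictionaryEntry.source_term_normalized:
    Unicode-normalize, trim, and casefold so the unique constraint matches
    terms case-insensitively regardless of input form."""
    return unicodedata.normalize("NFC", term).strip().casefold()
```

Computing the normalized value in application code (rather than a DB expression index) keeps the constraint portable across SQLite tests and the production database.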
### Plugin Skeleton
- [ ] T005 Create `TranslatePlugin` class in `backend/src/plugins/translate/plugin.py` inheriting from `PluginBase`. Implement `id`, `name`, `description` properties. Wire `@RELATION INHERITS -> [PluginBase:Class]` in contract header. (RATIONALE: separate plugin avoids bloating `LLMAnalysisPlugin` beyond fractal limit; REJECTED: extending LLMAnalysisPlugin would conflate domains)
### Route Skeleton
- [ ] T006 Create `backend/src/api/routes/translate.py` with FastAPI `APIRouter` (prefix=`/api/translate`, tags=`["translate"]`). Define all endpoint stubs with `pass` bodies for now: CRUD jobs, CRUD dictionaries, preview trigger, run trigger, retry, schedule CRUD, run history, metrics, correction submission, dictionary import. Attach `Depends(require_permission(...))` annotations. Register router in `backend/src/app.py` alongside existing routers.
### Database Migration
- [ ] T007 Generate Alembic migration for all `translate_*` tables (one table per ORM model from T003): `translation_jobs`, `translation_runs`, `translation_batches`, `translation_records`, `translation_events`, `translation_preview_sessions`, `translation_preview_records`, `terminology_dictionaries`, `dictionary_entries`, `translation_schedules`, `translation_job_dictionaries`, `metric_snapshots`. Run `cd backend && alembic revision --autogenerate -m "add translation tables"` and `alembic upgrade head`.
### RBAC Registration
- [ ] T008 Register 13 permission strings in the RBAC seed/permission store: `translate.job.view`, `translate.job.create`, `translate.job.edit`, `translate.job.delete`, `translate.job.execute`, `translate.dictionary.view`, `translate.dictionary.create`, `translate.dictionary.edit`, `translate.dictionary.delete`, `translate.schedule.view`, `translate.schedule.manage`, `translate.history.view`, `translate.metrics.view`. Ensure admin role gets all; analyst role gets `translate.job.view`, `translate.job.execute`, `translate.dictionary.view`, `translate.history.view`. Update role seeding script if needed.
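The seed data above can be expressed as a small table; a sketch under the assumption that the permission store is a flat set of strings keyed by role (the container names `TRANSLATE_PERMISSIONS`/`ROLE_GRANTS` and the `has_permission` helper are illustrative, not the project's actual RBAC API):

```python
TRANSLATE_PERMISSIONS = {
    "translate.job.view", "translate.job.create", "translate.job.edit",
    "translate.job.delete", "translate.job.execute",
    "translate.dictionary.view", "translate.dictionary.create",
    "translate.dictionary.edit", "translate.dictionary.delete",
    "translate.schedule.view", "translate.schedule.manage",
    "translate.history.view", "translate.metrics.view",
}

ROLE_GRANTS = {
    "admin": set(TRANSLATE_PERMISSIONS),  # admin role gets all 13
    "analyst": {
        "translate.job.view", "translate.job.execute",
        "translate.dictionary.view", "translate.history.view",
    },
}

def has_permission(role: str, permission: str) -> bool:
    # Guard against typos: only registered permission strings are checkable.
    if permission not in TRANSLATE_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return permission in ROLE_GRANTS.get(role, set())
```

Rejecting unknown permission strings at check time catches a misspelled `require_permission(...)` annotation in tests rather than silently denying access.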
**Checkpoint**: Foundation ready — models, schemas, plugin, routes, migration, and RBAC all in place. User story implementation can now begin.
---
## Phase 3: User Story 1 — Configure Translation Job (Priority: P1) 🎯 MVP
**Goal**: User can create, edit, delete, and list translation jobs with datasource selection, column mapping, key columns, target table configuration, LLM settings, and dictionary attachment.
**Independent Test**: Open Configuration form → select Superset datasource → pick translation/context/key columns → specify target table → save → verify job appears in list with correct settings.
### Backend — Job CRUD
- [ ] T009 [P] [US1] Implement job CRUD service in `backend/src/plugins/translate/plugin.py` as methods on the `TranslatePlugin` class: `create_job()`, `update_job()`, `delete_job()`, `get_job()`, `list_jobs()`, `duplicate_job()`. Validate column existence via `SupersetClient` on create/update (FR-001, FR-002, FR-006). Enforce composite key support (FR-004). Detect virtual columns and warn (US1 acceptance scenario 5).
- [ ] T010 [US1] Implement `/api/translate/jobs` endpoints in `backend/src/api/routes/translate.py`: `POST /` (create), `GET /` (list), `GET /{job_id}` (get), `PUT /{job_id}` (update), `DELETE /{job_id}` (delete), `POST /{job_id}/duplicate` (duplicate — FR-021). Inject `Depends(require_permission("translate.job.*"))` per operation.
- [ ] T011 [US1] Implement `/api/translate/datasources/{datasource_id}/columns` endpoint that queries Superset for column metadata (name, type, is_physical flag) and the database dialect (backend/engine) from the connection configuration. Returns column list AND `database_dialect` field for the frontend. Cache dialect on `TranslationJob.database_dialect` at save time. Reject unsupported dialects at configuration time (FR-002, dialect detection).
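Dialect detection in T011 can key off the SQLAlchemy URI scheme of the Superset connection. A sketch, assuming URI-based detection and a supported set of PostgreSQL/Greenplum/ClickHouse (both the function name and the exact set are assumptions to be confirmed against the plan):

```python
SUPPORTED_DIALECTS = {"postgresql", "greenplum", "clickhouse"}  # assumed set

def detect_dialect(sqlalchemy_uri: str) -> str:
    """Extract the dialect from a Superset connection URI, e.g.
    'postgresql+psycopg2://host/db' -> 'postgresql'. Rejects anything
    outside the supported set at configuration time (FR-002)."""
    scheme = sqlalchemy_uri.split("://", 1)[0]
    dialect = scheme.split("+", 1)[0].lower()  # drop the driver suffix
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"Unsupported database dialect: {dialect}")
    return dialect
```

Caching the result on `TranslationJob.database_dialect` at save time means later runs do not depend on the Superset connection still being reachable just to pick a SQL flavor.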
### Frontend — Job Config UI
- [ ] T012 [P] [US1] Create `TranslateApiClient` module in `frontend/src/lib/api/translate.js`: `fetchJobs()`, `createJob()`, `updateJob()`, `deleteJob()`, `duplicateJob()`, `fetchDatasourceColumns()`. Use existing `requestApi`/`fetchApi` wrapper pattern.
- [ ] T013 [US1] Create `TranslationJobList` SvelteKit page in `frontend/src/routes/translate/+page.svelte`: list all jobs with name, datasource, status/schedule indicators, create button, duplicate action. `@UX_STATE`: idle, loading, empty, populated, error.
- [ ] T014 [US1] Create `TranslationJobConfig` SvelteKit page in `frontend/src/routes/translate/[id]/+page.svelte`: datasource dropdown → column selectors (translation column, context columns, key columns with [+ Add key] for composite), target table/column inputs, LLM provider selector, target language, batch size, prompt template editor, dictionary attachment (multi-select with priority ordering). `@UX_STATE`: idle, loading, configured, saving, validation_error, datasource_unavailable. `@UX_REACTIVITY`: column list `$derived` from datasource selection.
### Verification — US1
- [ ] T015 [US1] Write pytest integration tests for job CRUD API in `backend/tests/test_translate_jobs.py`: test create with valid config, create with missing translation column (expect 422), create with virtual key column (expect warning), update job, delete job, duplicate job. Mock `SupersetClient` for column metadata.
- [ ] T016 [US1] Verify US1 acceptance scenarios against `specs/028-llm-datasource-supeset/spec.md` User Story 1 (5 scenarios). Run `cd backend && pytest tests/test_translate_jobs.py -v`.
**Checkpoint**: Job CRUD fully functional — user can create, edit, list, and duplicate translation jobs with validated column mappings.
---
## Phase 4: User Story 5 — Terminology Dictionary Management (Priority: P2)
**Goal**: User can create, edit, delete dictionaries; add terms inline; import CSV/TSV with duplicate detection; attach dictionaries to jobs with priority ordering.
**Independent Test**: Create dictionary with 5 terms → import CSV with 50 terms → verify duplicates flagged → attach dictionary to job → verify dictionary appears in job config.
### Backend — Dictionary CRUD + Import
- [ ] T017 [P] [US5] Implement `DictionaryManager` class in `backend/src/plugins/translate/dictionary.py`: `create_dictionary()`, `update_dictionary()`, `delete_dictionary()`, `get_dictionary()`, `list_dictionaries()`, `add_entry()`, `edit_entry()`, `delete_entry()`, `clear_entries()`. Enforce unique `source_term` per dictionary with conflict resolution (FR-026). Prevent deletion if attached to active/scheduled jobs (FR-030). `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` markers at mutation boundaries. (RATIONALE: C4 warranted because dictionary CRUD is stateful and must enforce referential integrity on deletion; REJECTED: pure C3 CRUD without state guards would allow orphaned job-dictionary links)
- [ ] T018 [US5] Implement CSV/TSV import in `DictionaryManager`: parse uploaded content, detect delimiter, create `DictionaryEntry` rows, preview with duplicate detection, return parse errors with line numbers for malformed rows (FR-025). Add `DictionaryImport` schema validation.
- [ ] T019 [US5] Implement `/api/translate/dictionaries` endpoints in `backend/src/api/routes/translate.py`: `POST /` (create), `GET /` (list), `GET /{dict_id}` (get with entries), `PUT /{dict_id}` (update), `DELETE /{dict_id}` (delete — blocked if attached), `POST /{dict_id}/entries` (add entry), `PUT /{dict_id}/entries/{entry_id}` (edit), `DELETE /{dict_id}/entries/{entry_id}` (delete), `POST /{dict_id}/import` (CSV/TSV import with preview).
- [ ] T020 [US5] Implement per-batch dictionary filtering logic in `DictionaryManager.filter_for_batch(source_texts: list[str]) -> list[dict]`: scan batch texts for substrings matching dictionary `source_term` values; return matched entries in priority order across all attached dictionaries (FR-044). This is consumed by US2 (preview) and US3 (executor).
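The per-batch filtering in T020 is essentially a priority-ordered substring scan. A sketch, assuming attached dictionaries arrive as `(priority, {source_term: target_translation})` pairs with the lowest priority number winning conflicts (that input shape is an assumption about how `TranslationJobDictionary` ordering is materialized):

```python
def filter_for_batch(source_texts: list[str],
                     attached_dicts: list[tuple[int, dict[str, str]]]) -> list[dict]:
    """Sketch of DictionaryManager.filter_for_batch (FR-044): only terms that
    actually occur as substrings in the batch texts are returned, so the LLM
    prompt carries a small, relevant glossary instead of whole dictionaries."""
    haystack = "\n".join(t.casefold() for t in source_texts)
    matched, seen = [], set()
    for priority, entries in sorted(attached_dicts, key=lambda d: d[0]):
        for term, translation in entries.items():
            needle = term.casefold()
            if needle in seen or needle not in haystack:
                continue
            seen.add(needle)  # higher-priority dictionary wins on duplicate terms
            matched.append({"source_term": term,
                            "target_translation": translation,
                            "priority": priority})
    return matched
```

Keeping this pure (texts in, matched entries out) makes it easy for US2 and US3 to share it and for tests to cover priority conflicts without a database.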
### Frontend — Dictionary UI
- [ ] T021 [P] [US5] Add dictionary API methods to `frontend/src/lib/api/translate.js`: `fetchDictionaries()`, `createDictionary()`, `updateDictionary()`, `deleteDictionary()`, `fetchDictionaryEntries()`, `addEntry()`, `editEntry()`, `deleteEntry()`, `importDictionary()`.
- [ ] T022 [US5] Create `DictionaryList` SvelteKit page in `frontend/src/routes/translate/dictionaries/+page.svelte`: list dictionaries with name, language, term count, attached job count, create/delete actions. `@UX_STATE`: idle, loading, empty, populated, delete_blocked.
- [ ] T023 [US5] Create `DictionaryEditor` SvelteKit page in `frontend/src/routes/translate/dictionaries/[id]/+page.svelte`: inline term editor (source_term → target_translation), add/delete rows, CSV/TSV import with conflict preview, export. `@UX_STATE`: idle, loading, editing, importing, import_preview, import_conflict, saving. `@UX_FEEDBACK`: import preview with duplicate flags; toast on save.
### Verification — US5
- [ ] T024 [US5] Write pytest tests for DictionaryManager in `backend/src/plugins/translate/__tests__/test_dictionary.py`: test create/update/delete, add entry with duplicate detection (expect conflict), import CSV with valid/invalid rows, delete dictionary blocked by active job, per-batch filtering returns matched terms.
- [ ] T025 [US5] Verify US5 acceptance scenarios against spec User Story 5 (6 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_dictionary.py -v`.
**Checkpoint**: Dictionary management fully functional — CRUD, import, filtering, and job attachment all work.
---
## Phase 5: User Story 2 — Preview Translated Output (Priority: P2)
**Goal**: User triggers preview on a saved job → system fetches sample rows → sends to LLM with context + dictionary → displays source/context/translation side-by-side → user approves/edits/rejects → preview state saved for execution gate.
**Independent Test**: Create job + dictionary → click Preview → verify 10 rows shown with LLM translations → approve 8, edit 1, reject 1 → verify state preserved.
### Backend — Preview Engine
- [ ] T026 [US2] Implement `TranslationPreview` class in `backend/src/plugins/translate/preview.py`: `preview_rows(job_id, sample_size)`. Fetch source rows from Superset via `SupersetClient`; construct LLM prompt using `LLMProviderService` + `llm_prompt_templates.render_prompt()` + `DictionaryManager.filter_for_batch()`; call LLM; return `PreviewRow` list. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` at LLM call boundaries. (RATIONALE: C4 because preview is stateful (approve/edit/reject lifecycle) and calls external LLM API with side effects; REJECTED: making preview purely read-only without approval state would degrade UX by losing user decisions between preview and execution)
- [ ] T027 [US2] Implement token count and cost estimation in preview response: compute estimated tokens from sample → extrapolate to full dataset row count → apply provider pricing → return `estimated_total_rows`, `estimated_tokens`, `estimated_cost` in `TranslationPreviewResponse` (FR-014).
- [ ] T028 [US2] Implement preview quality gate: create persistent `TranslationPreviewSession` and `TranslationPreviewRecord` rows with `config_hash` and `dict_snapshot_hash`. Preview acceptance gates full execution: rejected preview sample rows are excluded from the full run, while rows never shown in preview are processed normally.
- [ ] T029 [US2] Implement `/api/translate/jobs/{job_id}/preview` endpoint: `POST` triggers preview, returns preview rows with `status=pending`. Add `PUT /api/translate/jobs/{job_id}/preview/rows/{row_key}` for approve/edit/reject actions. Add `POST /api/translate/jobs/{job_id}/preview/approve-all` for bulk approve.
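The FR-014 estimate in T027 is a straight extrapolation from the sample. A sketch, assuming per-1k-token pricing (function name and pricing model are illustrative; actual provider pricing comes from `LLMProviderService`):

```python
def estimate_run_cost(sample_tokens: int, sample_rows: int,
                      total_rows: int, price_per_1k_tokens: float) -> dict:
    """Extrapolate sampled token usage to the full dataset and apply
    provider pricing. Field names mirror TranslationPreviewResponse."""
    if sample_rows <= 0:
        raise ValueError("preview sample must contain at least one row")
    estimated_tokens = round(sample_tokens / sample_rows * total_rows)
    return {
        "estimated_total_rows": total_rows,
        "estimated_tokens": estimated_tokens,
        "estimated_cost": round(estimated_tokens / 1000 * price_per_1k_tokens, 4),
    }
```

Because the estimate is linear in row count, the preview UI can recompute it reactively when the user changes the row limit without another LLM call.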
### Frontend — Preview UI
- [ ] T030 [P] [US2] Add preview API methods to `frontend/src/lib/api/translate.js`: `fetchPreview()`, `approveRow()`, `editRow()`, `rejectRow()`, `approveAll()`.
- [ ] T031 [US2] Create `TranslationPreview` component in `frontend/src/lib/components/translate/TranslationPreview.svelte`: side-by-side table (source, context, LLM translation), approve/edit/reject buttons per row, bulk approve, cost estimate card before full run, row limit input. `@UX_STATE`: idle, loading, preview_loaded, preview_error, retrying. `@UX_FEEDBACK`: spinner during LLM call; visual distinction for LLM-generated vs user-edited values; cost estimate reactivity. `@UX_RECOVERY`: retry preview button; individual row re-translate.
- [ ] T032 [US2] Integrate `TranslationPreview` into `TranslationJobConfig` page (`frontend/src/routes/translate/[id]/+page.svelte`) as a tab or collapsible section that appears after job is saved.
### Verification — US2
- [ ] T033 [US2] Write pytest tests for preview in `backend/src/plugins/translate/__tests__/test_preview.py`: test preview with valid job, preview with dictionary (verify glossary terms in prompt), preview row approve/edit/reject state transitions, cost estimation accuracy. Mock LLM provider responses.
- [ ] T034 [US2] Write vitest component test for `TranslationPreview` in `frontend/src/lib/components/translate/__tests__/TranslationPreview.test.js`: test rendering of preview rows, approve/reject/edit interactions, bulk approve behavior. Mock API client.
- [ ] T035 [US2] Verify US2 acceptance scenarios against spec User Story 2 (5 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_preview.py -v && cd ../frontend && npm run test -- --run`.
**Checkpoint**: Preview flows complete — LLM translation with context + dictionary, approve/edit/reject lifecycle, cost estimation.
---
## Phase 6: User Story 3 — Execute Translation & Insert Results (Priority: P3)
**Goal**: User triggers full batch execution → system processes rows in batches → generates INSERT SQL → user copies to SQL Lab or auto-executes → failed batches retryable.
**Independent Test**: Create job → preview + approve → execute → verify INSERT SQL generated with correct key columns → execute in SQL Lab → verify rows in target table.
### Backend — Executor + SQL Generator + Orchestrator
- [ ] T036 [US3] Implement `SQLGenerator` class in `backend/src/plugins/translate/sql_generator.py`: `generate_insert(records: list[TranslationRecord], job: TranslationJob) -> str`. Detect dialect from `job.database_dialect` (cached from Superset connection at save time). Produce safe dialect-appropriate SQL: for PostgreSQL/Greenplum — `INSERT INTO "target_table" ("key_cols"..., "target_col") VALUES (...)` with quoted identifiers; support `upsert_strategy`: `insert` (plain INSERT), `skip_existing` (ON CONFLICT DO NOTHING), `overwrite` (ON CONFLICT DO UPDATE). For ClickHouse — `INSERT INTO target_table (key_cols..., target_col) VALUES (...)`; `skip_existing` warns user (not natively supported); `overwrite` documented limitation. `@COMPLEXITY 3`. (RATIONALE: dialect-aware because Superset connections may use ClickHouse or PostgreSQL; REJECTED: PostgreSQL-only would break ClickHouse users; raw identifier interpolation rejected)
- [ ] T037 [US3] Implement `TranslationExecutor` class in `backend/src/plugins/translate/executor.py`: `execute_run(run: TranslationRun, job: TranslationJob)`. Fetch all source rows from Superset; split into batches; for each batch: call `DictionaryManager.filter_for_batch()`, construct prompt via `LLMProviderService`, call LLM, create `TranslationRecord` rows with status `translated`/`failed`/`skipped`; handle batch-level retry on LLM failure (FR-015); skip NULL translation values (FR-016); reject NULL key values (FR-017); update run statistics. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` at batch boundaries and error paths.
- [ ] T038 [US3] Implement `TranslationOrchestrator` class in `backend/src/plugins/translate/orchestrator.py`: `start_run(job_id, trigger_type)`. Validate preconditions (job config valid, datasource accessible, LLM provider reachable); create `TranslationRun` with status `running` and config/dict snapshots (FR-019, FR-029); dispatch to `TranslationExecutor`; on completion call `SQLGenerator`; record `TranslationEvent` rows via `TranslationEventLog` (FR-046); enforce state transitions: pending → running → (completed | partial | failed) — no skipping. `@COMPLEXITY 5` — full `@PRE`/`@POST`/`@DATA_CONTRACT`/`@INVARIANT` enforcement with `@RATIONALE`/`@REJECTED`. (RATIONALE: central coordinator is C5 because preview, execution, event logging, and retry share run state and must coordinate within a single transaction boundary; REJECTED: distributed actor model would introduce eventual-consistency challenges for status tracking at current scale)
- [ ] T039 [US3] Implement `TranslationEventLog` class in `backend/src/plugins/translate/events.py`: `log_event(run_id, job_id, event_type, payload)`. Create immutable `TranslationEvent` row. `query_events(job_id, filters)` for audit/dashboard. `prune_expired()` for 90-day retention enforcement (FR-049) — scheduled via APScheduler cleanup job. `@COMPLEXITY 5`. `@INVARIANT`: every run must have exactly one `run_started` and one terminal event. (RATIONALE: C5 warranted because event log is single source of truth for observability, metrics, and audit; REJECTED: stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant)
- [ ] T040 [US3] Implement execution endpoints in `backend/src/api/routes/translate.py`: `POST /api/translate/jobs/{job_id}/runs` (trigger manual run — creates run, dispatches orchestrator which translates AND submits to Superset API), `GET /api/translate/runs/{run_id}` (status + statistics + insert_status + superset_query_id), `GET /api/translate/runs/{run_id}/records` (paginated TranslationRecord list), `POST /api/translate/runs/{run_id}/retry` (retry failed batches only — FR-015), `POST /api/translate/runs/{run_id}/retry-insert` (retry Superset insert only without re-translating). Inject `Depends(require_permission("translate.job.execute"))`.
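A minimal sketch of the PostgreSQL/Greenplum branch of T036, showing identifier quoting and the three `upsert_strategy` values. Helper names `_q`/`_lit` are hypothetical, and literal escaping is shown only for illustration (production code should prefer driver-bound parameters where the execution path allows):

```python
def _q(ident: str) -> str:
    # Quote a PostgreSQL identifier, doubling any embedded double quotes.
    return '"' + ident.replace('"', '""') + '"'

def _lit(value) -> str:
    # Literal-escape a value; shown for sketch purposes only.
    return "NULL" if value is None else "'" + str(value).replace("'", "''") + "'"

def generate_insert_pg(table: str, key_cols: list[str], target_col: str,
                       rows: list[dict], upsert_strategy: str = "insert") -> str:
    """insert -> plain INSERT; skip_existing -> ON CONFLICT DO NOTHING;
    overwrite -> ON CONFLICT DO UPDATE on the key columns."""
    cols = [*key_cols, target_col]
    values = ",\n".join(
        "(" + ", ".join(_lit(r[c]) for c in cols) + ")" for r in rows)
    sql = (f"INSERT INTO {_q(table)} ({', '.join(_q(c) for c in cols)})\n"
           f"VALUES\n{values}")
    keys = ", ".join(_q(c) for c in key_cols)
    if upsert_strategy == "skip_existing":
        sql += f"\nON CONFLICT ({keys}) DO NOTHING"
    elif upsert_strategy == "overwrite":
        sql += (f"\nON CONFLICT ({keys}) DO UPDATE SET "
                f"{_q(target_col)} = EXCLUDED.{_q(target_col)}")
    return sql + ";"
```

The ClickHouse branch would drop the `ON CONFLICT` clauses entirely (plain `INSERT` only), which is why `skip_existing` must surface a warning there.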
### Frontend — Execution UI
- [ ] T041 [P] [US3] Add execution API methods to `frontend/src/lib/api/translate.js`: `triggerRun()`, `fetchRunStatus()`, `fetchRunRecords()`, `retryFailedBatches()`, `retryInsert()` (for T040's retry-insert endpoint).
- [ ] T042 [US3] Create `TranslationRunProgress` component in `frontend/src/lib/components/translate/TranslationRunProgress.svelte`: live progress bar (WebSocket-driven from `TaskWebSocket`), batch counter (N/M), success/failure/skip counts, cancel button. `@UX_STATE`: idle, running, pausing, cancelled, completed, partial, failed. `@UX_FEEDBACK`: progress percentage `$derived` from translated/total; real-time counts. `@UX_RECOVERY`: retry failed batches button; cancel run; download skipped rows.
- [ ] T043 [US3] Create `TranslationRunResult` component in `frontend/src/lib/components/translate/TranslationRunResult.svelte`: completion summary (rows translated/failed/skipped, token count, cost, insert_status), Superset execution reference with status badge, generated SQL block for audit/debugging (collapsed by default), retry-insert button. `@UX_STATE`: completed, partial, failed, insert_failed. `@UX_FEEDBACK`: Superset execution status badge; SQL block for audit.
- [ ] T044 [US3] Integrate `TranslationRunProgress` and `TranslationRunResult` into `TranslationJobConfig` page as the "Run" tab/section.
### Verification — US3
- [ ] T045 [US3] Write pytest tests for `SQLGenerator` in `backend/src/plugins/translate/__tests__/test_sql_generator.py`: test INSERT with single key, composite key — for PostgreSQL dialect AND ClickHouse dialect. Test PostgreSQL UPSERT (ON CONFLICT DO NOTHING, ON CONFLICT DO UPDATE). Test ClickHouse plain INSERT and skip_existing warning. Test NULL key rejection, NULL translation value skipping, identifier quoting per dialect, injection safety. Validate SQL syntax correctness against each dialect.
- [ ] T046 [US3] Write pytest tests for executor + orchestrator in `backend/src/plugins/translate/__tests__/test_orchestrator.py`: test full run lifecycle (pending→running→completed), partial failure (one batch fails, rest succeed), batch retry, event log invariants, NULL handling. Mock LLM provider and SupersetClient.
- [ ] T047 [US3] Verify US3 acceptance scenarios against spec User Story 3 (5 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_orchestrator.py src/plugins/translate/__tests__/test_sql_generator.py -v`.
**Checkpoint**: Execution pipeline complete — batch processing, INSERT generation, retry, event logging. User can translate data and insert into target table.
---
## Phase 7: User Story 6 — Feedback Loop (Correct → Dictionary) (Priority: P3)
**Goal**: In run results, user selects incorrect translation → submits correction to dictionary → dictionary updated with origin tracking → next run uses corrected term.
**Independent Test**: Complete a run → find incorrect translation → open correction popup → submit to dictionary → re-run preview → verify corrected term used.
### Backend — Correction Submission
- [ ] T048 [US6] Implement correction submission endpoint in `backend/src/api/routes/translate.py`: `POST /api/translate/corrections` accepting `TermCorrectionSubmit` body. Validate target language match between dictionary and job (FR language validation edge case); detect existing entry conflict → return conflict response (FR-032); create `DictionaryEntry` with origin tracking (`origin_run_id`, `origin_row_key`, `origin_user_id`) per FR-033. Inject `Depends(require_permission("translate.dictionary.edit"))`.
- [ ] T049 [US6] Implement bulk correction endpoint: `POST /api/translate/corrections/bulk` accepting array of `TermCorrectionSubmit` objects (FR-034). Process atomically — if any conflict is detected, return all conflicts for user resolution before partial apply.
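The FR-034 atomicity rule in T049 amounts to detecting every conflict before applying anything. A sketch (the function name and the assumption that `existing_terms` holds normalized source terms from the target dictionary are illustrative):

```python
def plan_bulk_corrections(corrections: list[dict],
                          existing_terms: set[str]) -> dict:
    """All-or-nothing bulk apply: if any submitted term already exists,
    return every conflict and apply nothing until the user resolves them."""
    conflicts = [c for c in corrections
                 if c["source_term"].casefold() in existing_terms]
    if conflicts:
        return {"applied": [], "conflicts": conflicts}
    return {"applied": corrections, "conflicts": []}
```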
### Frontend — Correction UI
- [ ] T050 [P] [US6] Add correction API methods to `frontend/src/lib/api/translate.js`: `submitCorrection()`, `submitBulkCorrections()`.
- [ ] T051 [US6] Create `TermCorrectionPopup` component in `frontend/src/lib/components/translate/TermCorrectionPopup.svelte`: text selection on source term and incorrect target translation → popup with source term (pre-filled from source column), incorrect target translation (pre-filled from selection), corrected target translation input, dictionary selector dropdown (filtered by target language), submit button, conflict dialog (overwrite/keep existing/cancel). `@UX_STATE`: closed, selecting, editing, submitting, conflict_detected, submitted. `@UX_FEEDBACK`: "Added to Dictionary" badge on corrected row.
- [ ] T052 [US6] Create `BulkCorrectionSidebar` component in `frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte`: sidebar collecting selected terms across rows, per-term correction inputs, submit all to dictionary. `@UX_STATE`: closed, collecting, reviewing, submitting, submitted. `@UX_REACTIVITY`: selected terms list `$state`.
- [ ] T053 [US6] Integrate feedback-loop components into `TranslationRunResult` (T043) — add selection highlight behavior and correction triggers.
### Verification — US6
- [ ] T054 [US6] Write pytest tests for correction endpoints in `backend/tests/test_translate_corrections.py`: test single correction, bulk correction, conflict detection (existing term), cross-language rejection, origin tracking fields populated. Verify corrected term appears in next preview's dictionary filter.
- [ ] T055 [US6] Verify US6 acceptance scenarios against spec User Story 6 (5 scenarios). Run `cd backend && pytest tests/test_translate_corrections.py -v`.
**Checkpoint**: Feedback loop complete — corrections flow from results → dictionary → next run.
---
## Phase 8: User Story 7 — Schedule Translation Jobs (Priority: P3)
**Goal**: User configures schedule → system triggers runs → new-key-only translation → optional auto-INSERT → failure notification → pause/resume.
**Independent Test**: Configure schedule (every 5 min for test) → wait for trigger → verify new TranslationRun created → verify only new keys translated → disable schedule → verify no more triggers.
### Backend — Schedule Management + Trigger Dispatch
- [ ] T056 [US7] Implement `TranslationScheduler` class in `backend/src/plugins/translate/scheduler.py`: `create_schedule()`, `update_schedule()`, `delete_schedule()`, `enable_schedule()`, `disable_schedule()`, `get_next_executions(schedule, n=3)` (FR-036). Register schedule with existing `SchedulerService` via `add_job()` with cron/interval/date trigger. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect`. (RATIONALE: C4 because schedule management is stateful with APScheduler integration, concurrency policy enforcement, and trigger dispatch side effects)
- [ ] T057 [US7] Implement schedule trigger handler: `_execute_scheduled_translation(job_id)`. Enforce concurrency policy: if the previous run for the same job is still `running`, either `skip` (log + event) or `queue` (start after previous completes) per FR-039. If proceeding: create new `TranslationRun` with `trigger_type=scheduled`; fetch source rows; apply new-key-only filter (FR-045) — compare current key values against previous successful run's key values; dispatch to `TranslationOrchestrator`. On failure, send notification via `NotificationService` (FR-041, FR-048). Schedule remains enabled for next trigger (US7 acceptance scenario 6).
- [ ] T058 [US7] Implement Superset SQL Lab API submission for all runs: create `SupersetSqlLabExecutor` class in `backend/src/plugins/translate/superset_executor.py`. Submit generated SQL to `/api/v1/sqllab/execute/`, poll execution status, update `TranslationRun.insert_status`, `superset_query_id`, `rows_affected`, error fields. For scheduled runs, this happens automatically; for manual runs, this happens on user trigger. Record `insert_submitted`/`insert_succeeded`/`insert_failed` events.
- [ ] T059 [US7] Implement schedule endpoints in `backend/src/api/routes/translate.py`: `PUT /api/translate/jobs/{job_id}/schedule` (create/update), `DELETE /api/translate/jobs/{job_id}/schedule` (remove), `POST /api/translate/jobs/{job_id}/schedule/enable` (FR-040), `POST /api/translate/jobs/{job_id}/schedule/disable` (FR-040). Inject `Depends(require_permission("translate.schedule.manage"))`. Add schedule warning when editing job with active schedule (FR-042).
- [ ] T060 [US7] Extend `SchedulerService.load_schedules()` in `backend/src/core/scheduler.py` to discover and register active `TranslationSchedule` rows alongside existing backup schedules (R4).
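The FR-045 new-key-only filter used by scheduled runs reduces to a set-membership check on (possibly composite) key tuples. A sketch, with the function name as an assumption:

```python
def filter_new_keys(current_rows: list[dict], key_cols: list[str],
                    previous_keys: set[tuple]) -> list[dict]:
    """Keep only rows whose composite key tuple was not seen in the
    previous successful run, so scheduled runs translate incrementally."""
    return [row for row in current_rows
            if tuple(row[k] for k in key_cols) not in previous_keys]
```

Building key tuples in the declared `key_cols` order matters: `("a", "b")` and `("b", "a")` are different set members, so the column order must match the one recorded with the previous run.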
### Frontend — Schedule UI
- [ ] T061 [P] [US7] Add schedule API methods to `frontend/src/lib/api/translate.js`: `updateSchedule()`, `deleteSchedule()`, `enableSchedule()`, `disableSchedule()`.
- [ ] T062 [US7] Create `ScheduleConfig` component in `frontend/src/lib/components/translate/ScheduleConfig.svelte`: type selector (cron/interval/once), cron expression input with validation, interval input, timezone selector, run-at datetime picker, next-3-executions preview (with timezone), concurrency policy selector (skip/queue), enable/disable toggle with status indicator. Warns if no prior successful manual run exists. `@UX_STATE`: idle, editing, validating, enabled, disabled, no_prior_run_warning. `@UX_REACTIVITY`: next execution times `$derived` from schedule config with timezone display.
- [ ] T063 [US7] Integrate `ScheduleConfig` into `TranslationJobConfig` page as the "Schedule" tab.
### Verification — US7
- [ ] T064 [US7] Write pytest tests for scheduler in `backend/src/plugins/translate/__tests__/test_scheduler.py`: test schedule CRUD, cron expression validation, next-N-executions calculation, trigger dispatch with skip/queue concurrency, new-key-only filter (verify only unseen keys processed), auto-INSERT execution, failure notification, pause/resume, load on SchedulerService start.
- [ ] T065 [US7] Verify US7 acceptance scenarios against spec User Story 7 (8 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_scheduler.py -v`.
**Checkpoint**: Scheduling complete — jobs can run automatically on schedule with new-key-only incremental translation and failure recovery.
---
## Phase 9: User Story 4 — Translation History & Audit Trail (Priority: P4)
**Goal**: User views past runs with filterable list; inspects run details (config snapshot, prompt, translations, INSERT SQL); sees edit marks; duplicates job. Admin views metrics dashboard.
**Independent Test**: Run several translations → open history → filter by datasource → click run → verify config snapshot, prompt, translations with edit marks, INSERT SQL all shown.
### Backend — History + Metrics Endpoints
- [ ] T066 [US4] Implement history endpoints in `backend/src/api/routes/translate.py`: `GET /api/translate/runs` (list with filters: `job_id`, `datasource_id`, `target_table`, `status`, `date_from`, `date_to`, pagination per FR-020), `GET /api/translate/runs/{run_id}` (detail with `config_snapshot`, `prompt_used`, `records` with `llm_translation` and `user_edit` fields visible — FR showing original vs user-edited).
- [ ] T067 [US4] Implement `TranslationMetrics` class in `backend/src/plugins/translate/metrics.py`: `get_job_metrics(job_id) -> MetricsResponse`. Aggregate from `TranslationEvent` table: total runs, success/failure counts, cumulative tokens, cumulative cost, average batch latency (FR-047). `@COMPLEXITY 3`.
- [ ] T068 [US4] Implement metrics endpoint: `GET /api/translate/jobs/{job_id}/metrics`. Inject `Depends(require_permission("translate.history.view"))`.
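The aggregation behind T067 can be sketched independent of the ORM. The event field names used here (`event_type`, `tokens`, `cost`, `latency_ms`) are illustrative assumptions, not the confirmed `TranslationEvent` schema, and in the real `TranslationMetrics.get_job_metrics` this would be a grouped query over the table rather than an in-memory fold:

```python
from statistics import mean

def aggregate_job_metrics(events):
    """Fold TranslationEvent-like dicts into a MetricsResponse-shaped dict.

    Field names are assumptions for illustration; the production code
    would aggregate with SQL over the TranslationEvent table instead.
    """
    events = list(events)  # allow multiple passes over an iterable
    succeeded = sum(1 for e in events if e["event_type"] == "run_succeeded")
    failed = sum(1 for e in events if e["event_type"] == "run_failed")
    latencies = [e["latency_ms"] for e in events
                 if e["event_type"] == "batch_completed"]
    return {
        "total_runs": succeeded + failed,
        "success_count": succeeded,
        "failure_count": failed,
        "total_tokens": sum(e.get("tokens", 0) for e in events),
        "total_cost": sum(e.get("cost", 0.0) for e in events),
        "avg_batch_latency_ms": mean(latencies) if latencies else None,
    }
```

Keeping the fold pure like this makes the FR-047 aggregation easy to unit-test (T072) before wiring it to the database.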
### Frontend — History + Metrics UI
- [ ] T069 [P] [US4] Add history API methods to `frontend/src/lib/api/translate.js`: `fetchRunHistory()`, `fetchRunDetail()`, `fetchJobMetrics()`.
- [ ] T070 [US4] Create `TranslationHistory` SvelteKit page in `frontend/src/routes/translate/history/+page.svelte`: filterable table (datasource, target table, row count, status, date, user), click-to-expand detail with config snapshot, prompt, translation rows with edit marks, INSERT SQL. `@UX_STATE`: idle, loading, empty, populated, detail_open. `@UX_REACTIVITY`: filtered list `$derived` from filters.
- [ ] T071 [US4] Create admin metrics dashboard section (integrated into existing admin pages or standalone) displaying per-job metrics: run counts, success/failure ratio, cumulative tokens, cumulative cost, average latency. Use `MetricsResponse` schema.
### Verification — US4
- [ ] T072 [US4] Write pytest tests for history + metrics in `backend/tests/test_translate_history.py`: test run list with filters, run detail with snapshots, metrics aggregation accuracy, `TranslationEvent` queryability.
- [ ] T073 [US4] Verify US4 acceptance scenarios against spec User Story 4 (4 scenarios). Run `cd backend && pytest tests/test_translate_history.py -v`.
**Checkpoint**: History and audit complete — all runs traceable, metrics dashboard populated.
---
## Phase 10: Polish & Cross-Cutting Concerns
**Purpose**: Retention enforcement, notification wiring, semantic audit, quickstart validation, and rejected-path regression protection.
- [ ] T074 [P] Implement 90-day retention pruning in `TranslationEventLog.prune_expired()`: run as an APScheduler daily cleanup job. BEFORE pruning events/records, persist cumulative metrics as a `MetricSnapshot` row (tokens, cost, run counts). Then prune `TranslationRecord`, `TranslationPreviewRecord`, and `TranslationEvent` rows older than 90 days, and clear the `insert_sql`/`config_snapshot` fields on runs older than 90 days. Preserve `TranslationRun` metadata, `MetricSnapshot` rows, and `superset_query_id`. Verify metrics remain accurate post-prune (SC-014). (RATIONALE: metric snapshots prevent cumulative data loss from event pruning; REJECTED: indefinite retention would violate storage constraints)
- [ ] T075 [P] Wire scheduled-run failure notification: ensure `TranslationScheduler` trigger handler calls `NotificationService.send()` when a scheduled run fails (FR-041, FR-048). Test with mock notification provider.
- [ ] T076 [P] Instrument remaining C4/C5 Python flows with `belief_scope`/`reason`/`reflect`/`explore` markers where missing: `TranslationOrchestrator.start_run()` (entry/exit), `TranslationExecutor.execute_run()` (batch boundaries + error paths), `DictionaryManager` mutation boundaries, `TranslationScheduler` trigger dispatch. Verify via `axiom_semantic_validation` belief-runtime audit.
- [ ] T077 Run full semantic audit via axiom MCP tools:
- `axiom_semantic_validation audit_contracts --file_path backend/src/plugins/translate/` — verify all `[DEF]` anchors are closed, `@RELATION` targets resolve, no orphan contracts, C4+ contracts have required tag density
- `axiom_semantic_validation audit_belief_protocol --file_path backend/src/plugins/translate/` — verify `@RATIONALE`/`@REJECTED` present on all C5 contracts
- `axiom_semantic_validation audit_belief_runtime --file_path backend/src/plugins/translate/` — verify `belief_scope`/`reason`/`reflect`/`explore` markers exist in all C4+ module bodies
- `axiom_semantic_validation impact_analysis --contract_id TranslationOrchestrator:Class` — verify no rejected path is accidentally re-enabled
- [ ] T078 Run quickstart validation: follow `specs/028-llm-datasource-supeset/quickstart.md` end-to-end — create dictionary → create job → preview → execute → verify INSERT SQL → submit correction → schedule → view history → verify metrics. Run `cd backend && pytest -v`, `cd frontend && npm run test -- --run`, `cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py`.
- [ ] T079 Rejected-path regression guard: add one test case per rejected design path:
  - `backend/src/plugins/translate/__tests__/test_orchestrator.py`: verify snapshot isolation — changing job config mid-run does NOT invalidate the running TranslationRun.
  - `backend/src/plugins/translate/__tests__/test_sql_generator.py`: verify that UPDATE statements are never generated (only INSERT/UPSERT per PostgreSQL dialect).
  - `backend/src/plugins/translate/__tests__/test_dictionary.py`: verify that duplicate source_term entries cannot coexist (UniqueConstraint enforced) and that conflict resolution only offers overwrite/keep-existing.
  - `backend/src/plugins/translate/__tests__/test_retention.py`: verify metric snapshots are persisted before event pruning and cumulative metrics remain accurate post-prune.
- [ ] T080 [P] Implement cancel run endpoint: `POST /api/translate/runs/{run_id}/cancel` in `backend/src/api/routes/translate.py`. Set `translation_status=cancelled`, mark in-progress batches as failed, do NOT submit INSERT SQL. Emit `run_cancelled` event. Inject `Depends(require_permission("translate.job.execute"))`.
- [ ] T081 [P] Implement download skipped rows endpoint: `GET /api/translate/runs/{run_id}/skipped.csv` returning CSV of rows skipped due to NULL keys or translation failures. Use `key_hash` for efficient lookup.
- [ ] T082 [P] Compute `key_hash` for TranslationRecord and TranslationPreviewRecord as a stable digest of `canonical_json(key_values)` at creation time (stable across processes, unlike Python's randomized built-in `hash()`). Add `config_hash` for TranslationRun and TranslationPreviewSession: a digest of the effective config (columns, keys, target, prompt, dictionaries). Use for idempotency checks, new-key-only filtering, and stale preview detection.
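A minimal sketch of the `key_hash`/`config_hash` computation in T082, assuming SHA-256 over JSON serialized with sorted keys; the exact canonicalization scheme is an implementation choice, not dictated by the spec:

```python
import hashlib
import json

def canonical_hash(values: dict) -> str:
    """Stable digest of a dict, insensitive to key order.

    sort_keys + compact separators yield one canonical byte string per
    logical value; SHA-256 keeps the digest stable across processes,
    which Python's built-in hash() (randomized per run) would not.
    default=str is a pragmatic fallback for dates/UUIDs in key_values.
    """
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the digest ignores key order, new-key-only filtering and stale-preview detection reduce to simple set membership and equality checks on stored hashes.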
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — can start immediately
- **Foundational (Phase 2)**: Depends on Setup — BLOCKS all user stories
- **US1 (Phase 3)**: Depends on Foundational — no dependencies on other stories. **Recommended start after Foundational.**
- **US5 (Phase 4)**: Depends on Foundational — can run in **parallel with US1**. Dictionary filtering (T020) will be consumed later by US2/US3 but is self-contained.
- **US2 (Phase 5)**: Depends on US1 (needs saved job) + US5 (needs dictionary filtering). Can start integration once US1 backend is stable.
- **US3 (Phase 6)**: Depends on US1 (needs job config) + US2 (preview decisions feed executor). Sequential after US2.
- **US6 (Phase 7)**: Depends on US3 (needs run results) + US5 (needs dictionary). Can run in **parallel with US7** after US3.
- **US7 (Phase 8)**: Depends on US1 (needs job) + US3 (needs execution pipeline). Can run in **parallel with US6** after US3.
- **US4 (Phase 9)**: Depends on US3 (needs run records). Can run in **parallel with US6/US7** after US3.
- **Polish (Phase 10)**: Depends on all desired user stories being complete.
### Parallel Opportunities
| Phase | Parallel Tasks | Notes |
|-------|---------------|-------|
| 1 | — | Sequential (only 2 tasks) |
| 2 | T003 ∥ T004 | Models + Schemas in parallel |
| 3 (US1) | T009 ∥ T012 | Backend CRUD ∥ API client |
| 4 (US5) | T017 ∥ T021 | DictionaryManager ∥ API client |
| 5 (US2) | T030 ∥ T031 | API client ∥ Preview component |
| 6 (US3) | T036 ∥ T041 | SQLGenerator ∥ API client |
| 7 (US6) | T050 ∥ T051 ∥ T052 | API client ∥ Popup ∥ Sidebar |
| 8 (US7) | T061 ∥ T062 | API client ∥ ScheduleConfig |
| 9 (US4) | T069 ∥ T070 | API client ∥ History page |
| 10 | T074 ∥ T075 ∥ T076 | Retention, notifications, belief instrumentation |
### Cross-Story Parallelism
After Foundational (Phase 2):
- **US1 and US5** can proceed in parallel by different developers
- After US3 completes: **US6, US7, and US4** can proceed in parallel
---
## Implementation Strategy
### MVP First (US1 Only)
1. Phase 1 + Phase 2 → Foundation
2. Phase 3 (US1) → Job configuration CRUD
3. **STOP and VALIDATE**: User can create, list, edit, delete translation jobs
4. Deploy/demo — partial value (configuration ready, no translation yet)
### Minimum Viable Feature (US1 + US5 + US2 + US3)
1. Foundation → US1 + US5 (parallel) → US2 → US3
2. **STOP and VALIDATE**: End-to-end translation flow works: configure → preview → execute → INSERT
3. This is the core feature — all remaining stories add automation (US7), quality improvement (US6), and visibility (US4)
### Full Feature (All Stories)
1. MVP → US6 + US7 + US4 (parallel after US3) → Polish
2. Scheduled automation, feedback loop, and audit trail all functional
---
## Notes
- All file paths reference the actual repository structure (`backend/src/`, `frontend/src/`).
- `@COMPLEXITY 4/5` backend contracts require `belief_scope`/`reason`/`reflect` markers — verified in T076.
- `@RATIONALE`/`@REJECTED` tags appear only in C5 contracts (`TranslationOrchestrator`, `TranslationEventLog`) per INV_7.
- Rejected paths are explicitly protected by regression tests in T079.
- `[NEED_CONTEXT]` markers: none — all contract targets resolve to existing or planned modules within this feature.
- The existing `LLMProviderService`, `SupersetClient`, `SchedulerService`, `NotificationService`, and `TaskWebSocket` contracts are reused without modification.
- Quickstart.md (T078) serves as the human-verifiable acceptance test for the full feature.