Tasks: LLM Table Translation Service

Feature Branch: 028-llm-datasource-supeset
Input: Design documents from /specs/028-llm-datasource-supeset/
Prerequisites: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/modules.md

Tests: Test tasks are included for all C4/C5 backend contracts, new API endpoints, and Svelte components with @UX_STATE contracts. Test work traces to contract @PRE/@POST guarantees and spec acceptance scenarios.

Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel (different files, no dependencies)
  • [Story]: Which user story this task belongs to (e.g., US1, US5)
  • Include exact file paths in descriptions

Phase 1: Setup (Shared Infrastructure)

Purpose: Create plugin directory structure and register the new route module in the lazy-import registry.

  • T001 Create translation plugin directory structure: backend/src/plugins/translate/__init__.py, backend/src/plugins/translate/plugin.py (empty skeleton), plus backend/src/plugins/translate/__tests__/__init__.py
  • T002 Register translate route module in backend/src/api/routes/__init__.py — add "translate" to __all__ list inside [DEF:Route_Group_Contracts:Block]

Phase 2: Foundational (Blocking Prerequisites)

Purpose: ORM models, Pydantic schemas, plugin boilerplate, route skeleton, and database migration. ALL user stories depend on these artifacts.

⚠️ CRITICAL: No user story work can begin until this phase is complete.

ORM Models

  • T003 [P] Create all SQLAlchemy ORM models in backend/src/models/translate.py: TranslationJob, TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, TranslationPreviewSession, TranslationPreviewRecord, TerminologyDictionary, DictionaryEntry, TranslationSchedule, TranslationJobDictionary, MetricSnapshot. Follow patterns from backend/src/models/llm.py (UUID PKs, generate_uuid, Base inheritance, JSON columns, UniqueConstraint, indexes, timezone-aware DateTime with callable defaults). Include source_term_normalized column on DictionaryEntry with unique constraint for case-insensitive matching.
  • T004 [P] Create Pydantic v2 request/response schemas in backend/src/schemas/translate.py: TranslateJobCreate, TranslateJobUpdate, TranslateJobResponse, DictionaryCreate, DictionaryImport, DictionaryResponse, TermCorrectionSubmit, ScheduleConfig, TranslationRunResponse, TranslationPreviewResponse (with PreviewRow), MetricsResponse. Follow existing backend/src/schemas/ patterns (use BaseModel, Field with defaults/validation)
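The model conventions named in T003 can be sketched with plain helpers (the helper names are illustrative, not the actual backend/src/models code):

```python
import uuid
from datetime import datetime, timezone

def generate_uuid() -> str:
    """Callable default for UUID primary keys (stored as strings)."""
    return str(uuid.uuid4())

def utc_now() -> datetime:
    """Timezone-aware DateTime default; passed to the column as a callable,
    never called at class-definition time."""
    return datetime.now(timezone.utc)

def normalize_source_term(term: str) -> str:
    """Value stored in DictionaryEntry.source_term_normalized: casefolded and
    whitespace-collapsed, so the unique constraint matches case-insensitively."""
    return " ".join(term.split()).casefold()
```

Writing the normalized value at insert/update time lets the database-level unique constraint do the case-insensitive duplicate detection instead of application code.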

Plugin Skeleton

  • T005 Create TranslatePlugin class in backend/src/plugins/translate/plugin.py inheriting from PluginBase. Implement id, name, description properties. Wire @RELATION INHERITS -> [PluginBase:Class] in contract header. (RATIONALE: separate plugin avoids bloating LLMAnalysisPlugin beyond fractal limit; REJECTED: extending LLMAnalysisPlugin would conflate domains)

Route Skeleton

  • T006 Create backend/src/api/routes/translate.py with FastAPI APIRouter (prefix=/api/translate, tags=["translate"]). Define all endpoint stubs with pass bodies for now: CRUD jobs, CRUD dictionaries, preview trigger, run trigger, retry, schedule CRUD, run history, metrics, correction submission, dictionary import. Attach Depends(require_permission(...)) annotations. Register router in backend/src/app.py alongside existing routers.

Database Migration

  • T007 Generate Alembic migration for all translate_* tables backing the T003 models: translation_jobs, translation_runs, translation_batches, translation_records, translation_events, translation_preview_sessions, translation_preview_records, terminology_dictionaries, dictionary_entries, translation_schedules, translation_job_dictionaries, metric_snapshots. Run cd backend && alembic revision --autogenerate -m "add translation tables" and alembic upgrade head; verify the generated migration covers every model from T003.

RBAC Registration

  • T008 Register 13 permission strings in the RBAC seed/permission store: translate.job.view, translate.job.create, translate.job.edit, translate.job.delete, translate.job.execute, translate.dictionary.view, translate.dictionary.create, translate.dictionary.edit, translate.dictionary.delete, translate.schedule.view, translate.schedule.manage, translate.history.view, translate.metrics.view. Ensure admin role gets all; analyst role gets translate.job.view, translate.job.execute, translate.dictionary.view, translate.history.view. Update role seeding script if needed.
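A minimal registry sketch for T008 (the registry structure and has_permission helper are assumptions; only the permission strings and role grants come from the task):

```python
# The 13 permission strings registered by T008.
PERMISSIONS = [
    "translate.job.view", "translate.job.create", "translate.job.edit",
    "translate.job.delete", "translate.job.execute",
    "translate.dictionary.view", "translate.dictionary.create",
    "translate.dictionary.edit", "translate.dictionary.delete",
    "translate.schedule.view", "translate.schedule.manage",
    "translate.history.view", "translate.metrics.view",
]

ROLE_GRANTS = {
    "admin": set(PERMISSIONS),  # admin receives every translate.* permission
    "analyst": {
        "translate.job.view", "translate.job.execute",
        "translate.dictionary.view", "translate.history.view",
    },
}

def has_permission(role: str, permission: str) -> bool:
    """Minimal check mirroring require_permission() semantics (assumed)."""
    return permission in ROLE_GRANTS.get(role, set())
```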

Checkpoint: Foundation ready — models, schemas, plugin, routes, migration, and RBAC all in place. User story implementation can now begin.


Phase 3: User Story 1 — Configure Translation Job (Priority: P1) 🎯 MVP

Goal: User can create, edit, delete, and list translation jobs with datasource selection, column mapping, key columns, target table configuration, LLM settings, and dictionary attachment.

Independent Test: Open Configuration form → select Superset datasource → pick translation/context/key columns → specify target table → save → verify job appears in list with correct settings.

Backend — Job CRUD

  • T009 [P] [US1] Implement job CRUD service in backend/src/plugins/translate/plugin.py as methods on the TranslatePlugin class: create_job(), update_job(), delete_job(), get_job(), list_jobs(), duplicate_job(). Validate column existence via SupersetClient on create/update (FR-001, FR-002, FR-006). Enforce composite key support (FR-004). Detect virtual columns and warn (US1 acceptance scenario 5).
  • T010 [US1] Implement /api/translate/jobs endpoints in backend/src/api/routes/translate.py: POST / (create), GET / (list), GET /{job_id} (get), PUT /{job_id} (update), DELETE /{job_id} (delete), POST /{job_id}/duplicate (duplicate — FR-021). Inject Depends(require_permission("translate.job.*")) per operation.
  • T011 [US1] Implement /api/translate/datasources/{datasource_id}/columns endpoint that queries Superset for column metadata (name, type, is_physical flag) and the database dialect (backend/engine) from the connection configuration. Returns column list AND database_dialect field for the frontend. Cache dialect on TranslationJob.database_dialect at save time. Reject unsupported dialects at configuration time (FR-002, dialect detection).
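Dialect detection for T011 can be sketched as URI-scheme parsing on the Superset connection string (the scheme parsing, alias table, and supported-dialect set are assumptions for illustration):

```python
SUPPORTED_DIALECTS = {"postgresql", "clickhouse"}  # Greenplum connections report postgresql

def resolve_dialect(sqlalchemy_uri: str) -> str:
    """Return the dialect cached on TranslationJob.database_dialect at save
    time, or raise for dialects rejected at configuration time."""
    scheme = sqlalchemy_uri.split("://", 1)[0].split("+", 1)[0].lower()
    dialect = {"postgres": "postgresql"}.get(scheme, scheme)
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect}")
    return dialect
```

Caching the result on the job at save time means the executor and SQL generator never re-derive it per run.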

Frontend — Job Config UI

  • T012 [P] [US1] Create TranslateApiClient module in frontend/src/lib/api/translate.js: fetchJobs(), createJob(), updateJob(), deleteJob(), duplicateJob(), fetchDatasourceColumns(). Use existing requestApi/fetchApi wrapper pattern.
  • T013 [US1] Create TranslationJobList SvelteKit page in frontend/src/routes/translate/+page.svelte: list all jobs with name, datasource, status/schedule indicators, create button, duplicate action. @UX_STATE: idle, loading, empty, populated, error.
  • T014 [US1] Create TranslationJobConfig SvelteKit page in frontend/src/routes/translate/[id]/+page.svelte: datasource dropdown → column selectors (translation column, context columns, key columns with [+ Add key] for composite), target table/column inputs, LLM provider selector, target language, batch size, prompt template editor, dictionary attachment (multi-select with priority ordering). @UX_STATE: idle, loading, configured, saving, validation_error, datasource_unavailable. @UX_REACTIVITY: column list $derived from datasource selection.

Verification — US1

  • T015 [US1] Write pytest integration tests for job CRUD API in backend/tests/test_translate_jobs.py: test create with valid config, create with missing translation column (expect 422), create with virtual key column (expect warning), update job, delete job, duplicate job. Mock SupersetClient for column metadata.
  • T016 [US1] Verify US1 acceptance scenarios against specs/028-llm-datasource-supeset/spec.md User Story 1 (5 scenarios). Run cd backend && pytest backend/tests/test_translate_jobs.py -v.

Checkpoint: Job CRUD fully functional — user can create, edit, list, and duplicate translation jobs with validated column mappings.


Phase 4: User Story 5 — Terminology Dictionary Management (Priority: P2)

Goal: User can create, edit, delete dictionaries; add terms inline; import CSV/TSV with duplicate detection; attach dictionaries to jobs with priority ordering.

Independent Test: Create dictionary with 5 terms → import CSV with 50 terms → verify duplicates flagged → attach dictionary to job → verify dictionary appears in job config.

Backend — Dictionary CRUD + Import

  • T017 [P] [US5] Implement DictionaryManager class in backend/src/plugins/translate/dictionary.py: create_dictionary(), update_dictionary(), delete_dictionary(), get_dictionary(), list_dictionaries(), add_entry(), edit_entry(), delete_entry(), clear_entries(). Enforce unique source_term per dictionary with conflict resolution (FR-026). Prevent deletion if attached to active/scheduled jobs (FR-030). @COMPLEXITY 4 — instrument with belief_scope/reason/reflect markers at mutation boundaries. (RATIONALE: C4 warranted because dictionary CRUD is stateful and must enforce referential integrity on deletion; REJECTED: pure C3 CRUD without state guards would allow orphaned job-dictionary links)
  • T018 [US5] Implement CSV/TSV import in DictionaryManager: parse uploaded content, detect delimiter, create DictionaryEntry rows, preview with duplicate detection, return parse errors with line numbers for malformed rows (FR-025). Add DictionaryImport schema validation.
  • T019 [US5] Implement /api/translate/dictionaries endpoints in backend/src/api/routes/translate.py: POST / (create), GET / (list), GET /{dict_id} (get with entries), PUT /{dict_id} (update), DELETE /{dict_id} (delete — blocked if attached), POST /{dict_id}/entries (add entry), PUT /{dict_id}/entries/{entry_id} (edit), DELETE /{dict_id}/entries/{entry_id} (delete), POST /{dict_id}/import (CSV/TSV import with preview).
  • T020 [US5] Implement per-batch dictionary filtering logic in DictionaryManager.filter_for_batch(source_texts: list[str]) -> list[dict]: scan batch texts for substrings matching dictionary source_term values; return matched entries in priority order across all attached dictionaries (FR-044). This is consumed by US2 (preview) and US3 (executor).
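The per-batch filter in T020 reduces to a substring scan in priority order; a minimal sketch with illustrative data shapes (the real method operates on ORM entities):

```python
def filter_for_batch(source_texts, dictionaries):
    """dictionaries: list of (priority, entries) pairs, where entries maps
    source_term -> target_translation. Scan the batch texts for substring
    matches and return matched entries in priority order across all
    attached dictionaries (FR-044)."""
    haystack = "\n".join(source_texts).casefold()
    matched = []
    for priority, entries in sorted(dictionaries, key=lambda pair: pair[0]):
        for term, translation in entries.items():
            if term.casefold() in haystack:
                matched.append({"source_term": term,
                                "target_translation": translation,
                                "priority": priority})
    return matched
```

Filtering per batch keeps the glossary section of the prompt small: only terms that actually occur in the batch are sent to the LLM.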

Frontend — Dictionary UI

  • T021 [P] [US5] Add dictionary API methods to frontend/src/lib/api/translate.js: fetchDictionaries(), createDictionary(), updateDictionary(), deleteDictionary(), fetchDictionaryEntries(), addEntry(), editEntry(), deleteEntry(), importDictionary().
  • T022 [US5] Create DictionaryList SvelteKit page in frontend/src/routes/translate/dictionaries/+page.svelte: list dictionaries with name, language, term count, attached job count, create/delete actions. @UX_STATE: idle, loading, empty, populated, delete_blocked.
  • T023 [US5] Create DictionaryEditor SvelteKit page in frontend/src/routes/translate/dictionaries/[id]/+page.svelte: inline term editor (source_term → target_translation), add/delete rows, CSV/TSV import with conflict preview, export. @UX_STATE: idle, loading, editing, importing, import_preview, import_conflict, saving. @UX_FEEDBACK: import preview with duplicate flags; toast on save.

Verification — US5

  • T024 [US5] Write pytest tests for DictionaryManager in backend/src/plugins/translate/__tests__/test_dictionary.py: test create/update/delete, add entry with duplicate detection (expect conflict), import CSV with valid/invalid rows, delete dictionary blocked by active job, per-batch filtering returns matched terms.
  • T025 [US5] Verify US5 acceptance scenarios against spec User Story 5 (6 scenarios). Run cd backend && pytest backend/src/plugins/translate/__tests__/test_dictionary.py -v.

Checkpoint: Dictionary management fully functional — CRUD, import, filtering, and job attachment all work.


Phase 5: User Story 2 — Preview Translated Output (Priority: P2)

Goal: User triggers preview on a saved job → system fetches sample rows → sends to LLM with context + dictionary → displays source/context/translation side-by-side → user approves/edits/rejects → preview state saved for execution gate.

Independent Test: Create job + dictionary → click Preview → verify 10 rows shown with LLM translations → approve 8, edit 1, reject 1 → verify state preserved.

Backend — Preview Engine

  • T026 [US2] Implement TranslationPreview class in backend/src/plugins/translate/preview.py: preview_rows(job_id, sample_size). Fetch source rows from Superset via SupersetClient; construct LLM prompt using LLMProviderService + llm_prompt_templates.render_prompt() + DictionaryManager.filter_for_batch(); call LLM; return PreviewRow list. @COMPLEXITY 4 — instrument with belief_scope/reason/reflect at LLM call boundaries. (RATIONALE: C4 because preview is stateful (approve/edit/reject lifecycle) and calls external LLM API with side effects; REJECTED: making preview purely read-only without approval state would degrade UX by losing user decisions between preview and execution)
  • T027 [US2] Implement token count and cost estimation in preview response: compute estimated tokens from sample → extrapolate to full dataset row count → apply provider pricing → return estimated_total_rows, estimated_tokens, estimated_cost in TranslationPreviewResponse (FR-014).
  • T028 [US2] Implement preview quality gate: create persistent TranslationPreviewSession and TranslationPreviewRecord rows with config_hash and dict_snapshot_hash. Preview approval gates full execution: rejected preview sample rows are excluded from the full run, while rows never shown in the preview are processed normally (the preview is a quality gate, not an allow-list).
  • T029 [US2] Implement /api/translate/jobs/{job_id}/preview endpoint: POST triggers preview, returns preview rows with status=pending. Add PUT /api/translate/jobs/{job_id}/preview/rows/{row_key} for approve/edit/reject actions. Add POST /api/translate/jobs/{job_id}/preview/approve-all for bulk approve.
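The cost extrapolation in T027 is simple arithmetic; a sketch assuming a flat per-1k-token price (real providers typically price prompt and completion tokens separately, which the actual implementation must account for):

```python
def estimate_cost(sample_tokens: int, sample_rows: int,
                  total_rows: int, price_per_1k_tokens: float) -> dict:
    """Extrapolate sample token usage to the full dataset and apply
    provider pricing (FR-014)."""
    if sample_rows <= 0:
        raise ValueError("preview sample is empty")
    tokens_per_row = sample_tokens / sample_rows
    estimated_tokens = round(tokens_per_row * total_rows)
    estimated_cost = estimated_tokens / 1000 * price_per_1k_tokens
    return {"estimated_total_rows": total_rows,
            "estimated_tokens": estimated_tokens,
            "estimated_cost": round(estimated_cost, 4)}
```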

Frontend — Preview UI

  • T030 [P] [US2] Add preview API methods to frontend/src/lib/api/translate.js: fetchPreview(), approveRow(), editRow(), rejectRow(), approveAll().
  • T031 [US2] Create TranslationPreview component in frontend/src/lib/components/translate/TranslationPreview.svelte: side-by-side table (source, context, LLM translation), approve/edit/reject buttons per row, bulk approve, cost estimate card before full run, row limit input. @UX_STATE: idle, loading, preview_loaded, preview_error, retrying. @UX_FEEDBACK: spinner during LLM call; visual distinction for LLM-generated vs user-edited values; cost estimate reactivity. @UX_RECOVERY: retry preview button; individual row re-translate.
  • T032 [US2] Integrate TranslationPreview into TranslationJobConfig page (frontend/src/routes/translate/[id]/+page.svelte) as a tab or collapsible section that appears after job is saved.

Verification — US2

  • T033 [US2] Write pytest tests for preview in backend/src/plugins/translate/__tests__/test_preview.py: test preview with valid job, preview with dictionary (verify glossary terms in prompt), preview row approve/edit/reject state transitions, cost estimation accuracy. Mock LLM provider responses.
  • T034 [US2] Write vitest component test for TranslationPreview in frontend/src/lib/components/translate/__tests__/TranslationPreview.test.js: test rendering of preview rows, approve/reject/edit interactions, bulk approve behavior. Mock API client.
  • T035 [US2] Verify US2 acceptance scenarios against spec User Story 2 (5 scenarios). Run cd backend && pytest backend/src/plugins/translate/__tests__/test_preview.py -v && cd frontend && npm run test -- --run.

Checkpoint: Preview flows complete — LLM translation with context + dictionary, approve/edit/reject lifecycle, cost estimation.


Phase 6: User Story 3 — Execute Translation & Insert Results (Priority: P3)

Goal: User triggers full batch execution → system processes rows in batches → generates INSERT SQL → user copies to SQL Lab or auto-executes → failed batches retryable.

Independent Test: Create job → preview + approve → execute → verify INSERT SQL generated with correct key columns → execute in SQL Lab → verify rows in target table.

Backend — Executor + SQL Generator + Orchestrator

  • T036 [US3] Implement SQLGenerator class in backend/src/plugins/translate/sql_generator.py: generate_insert(records: list[TranslationRecord], job: TranslationJob) -> str. Detect dialect from job.database_dialect (cached from Superset connection at save time). Produce safe dialect-appropriate SQL: for PostgreSQL/Greenplum — INSERT INTO "target_table" ("key_cols"..., "target_col") VALUES (...) with quoted identifiers; support upsert_strategy: insert (plain INSERT), skip_existing (ON CONFLICT DO NOTHING), overwrite (ON CONFLICT DO UPDATE). For ClickHouse — INSERT INTO target_table (key_cols..., target_col) VALUES (...); skip_existing warns user (not natively supported); overwrite documented limitation. @COMPLEXITY 3. (RATIONALE: dialect-aware because Superset connections may use ClickHouse or PostgreSQL; REJECTED: PostgreSQL-only would break ClickHouse users; raw identifier interpolation rejected)
  • T037 [US3] Implement TranslationExecutor class in backend/src/plugins/translate/executor.py: execute_run(run: TranslationRun, job: TranslationJob). Fetch all source rows from Superset; split into batches; for each batch: call DictionaryManager.filter_for_batch(), construct prompt via LLMProviderService, call LLM, create TranslationRecord rows with status translated/failed/skipped; handle batch-level retry on LLM failure (FR-015); skip NULL translation values (FR-016); reject NULL key values (FR-017); update run statistics. @COMPLEXITY 4 — instrument with belief_scope/reason/reflect at batch boundaries and error paths.
  • T038 [US3] Implement TranslationOrchestrator class in backend/src/plugins/translate/orchestrator.py: start_run(job_id, trigger_type). Validate preconditions (job config valid, datasource accessible, LLM provider reachable); create TranslationRun with status running and config/dict snapshots (FR-019, FR-029); dispatch to TranslationExecutor; on completion call SQLGenerator; record TranslationEvent rows via TranslationEventLog (FR-046); enforce state transitions: pending → running → (completed | partial | failed) — no skipping. @COMPLEXITY 5 — full @PRE/@POST/@DATA_CONTRACT/@INVARIANT enforcement with @RATIONALE/@REJECTED. (RATIONALE: central coordinator is C5 because preview, execution, event logging, and retry share run state and must coordinate within a single transaction boundary; REJECTED: distributed actor model would introduce eventual-consistency challenges for status tracking at current scale)
  • T039 [US3] Implement TranslationEventLog class in backend/src/plugins/translate/events.py: log_event(run_id, job_id, event_type, payload). Create immutable TranslationEvent row. query_events(job_id, filters) for audit/dashboard. prune_expired() for 90-day retention enforcement (FR-049) — scheduled via APScheduler cleanup job. @COMPLEXITY 5. @INVARIANT: every run must have exactly one run_started and one terminal event. (RATIONALE: C5 warranted because event log is single source of truth for observability, metrics, and audit; REJECTED: stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant)
  • T040 [US3] Implement execution endpoints in backend/src/api/routes/translate.py: POST /api/translate/jobs/{job_id}/runs (trigger manual run — creates run, dispatches orchestrator which translates AND submits to Superset API), GET /api/translate/runs/{run_id} (status + statistics + insert_status + superset_query_id), GET /api/translate/runs/{run_id}/records (paginated TranslationRecord list), POST /api/translate/runs/{run_id}/retry (retry failed batches only — FR-015), POST /api/translate/runs/{run_id}/retry-insert (retry Superset insert only without re-translating). Inject Depends(require_permission("translate.job.execute")).
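The dialect- and strategy-aware generation in T036 can be sketched as string assembly with per-dialect identifier quoting. This is a sketch only: the real generator should prefer driver-side parameter binding where possible and must surface the ClickHouse skip_existing warning to the user.

```python
def _ident(name: str, dialect: str) -> str:
    """Quote an identifier per dialect: double quotes for PostgreSQL/Greenplum,
    backticks for ClickHouse."""
    if dialect == "clickhouse":
        return "`" + name.replace("`", "``") + "`"
    return '"' + name.replace('"', '""') + '"'

def _literal(value) -> str:
    """Render a value as a SQL literal; strings are single-quote escaped."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def generate_insert(table, key_cols, target_col, rows,
                    dialect="postgresql", upsert_strategy="insert"):
    cols = list(key_cols) + [target_col]
    col_list = ", ".join(_ident(c, dialect) for c in cols)
    values = ",\n".join(
        "(" + ", ".join(_literal(row[c]) for c in cols) + ")" for row in rows)
    sql = f"INSERT INTO {_ident(table, dialect)} ({col_list}) VALUES\n{values}"
    if dialect != "clickhouse":  # ON CONFLICT is PostgreSQL/Greenplum only
        conflict_keys = ", ".join(_ident(c, dialect) for c in key_cols)
        if upsert_strategy == "skip_existing":
            sql += f"\nON CONFLICT ({conflict_keys}) DO NOTHING"
        elif upsert_strategy == "overwrite":
            tgt = _ident(target_col, dialect)
            sql += (f"\nON CONFLICT ({conflict_keys}) "
                    f"DO UPDATE SET {tgt} = EXCLUDED.{tgt}")
    return sql + ";"
```

Quoting every identifier and escaping every literal is what makes the "raw identifier interpolation rejected" decision in T036 concrete.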

Frontend — Execution UI

  • T041 [P] [US3] Add execution API methods to frontend/src/lib/api/translate.js: triggerRun(), fetchRunStatus(), fetchRunRecords(), retryFailedBatches().
  • T042 [US3] Create TranslationRunProgress component in frontend/src/lib/components/translate/TranslationRunProgress.svelte: live progress bar (WebSocket-driven from TaskWebSocket), batch counter (N/M), success/failure/skip counts, cancel button. @UX_STATE: idle, running, pausing, cancelled, completed, partial, failed. @UX_FEEDBACK: progress percentage $derived from translated/total; real-time counts. @UX_RECOVERY: retry failed batches button; cancel run; download skipped rows.
  • T043 [US3] Create TranslationRunResult component in frontend/src/lib/components/translate/TranslationRunResult.svelte: completion summary (rows translated/failed/skipped, token count, cost, insert_status), Superset execution reference with status badge, generated SQL block for audit/debugging (collapsed by default), retry-insert button. @UX_STATE: completed, partial, failed, insert_failed. @UX_FEEDBACK: Superset execution status badge; SQL block for audit.
  • T044 [US3] Integrate TranslationRunProgress and TranslationRunResult into TranslationJobConfig page as the "Run" tab/section.

Verification — US3

  • T045 [US3] Write pytest tests for SQLGenerator in backend/src/plugins/translate/__tests__/test_sql_generator.py: test INSERT with single key, composite key — for PostgreSQL dialect AND ClickHouse dialect. Test PostgreSQL UPSERT (ON CONFLICT DO NOTHING, ON CONFLICT DO UPDATE). Test ClickHouse plain INSERT and skip_existing warning. Test NULL key rejection, NULL translation value skipping, identifier quoting per dialect, injection safety. Validate SQL syntax correctness against each dialect.
  • T046 [US3] Write pytest tests for executor + orchestrator in backend/src/plugins/translate/__tests__/test_orchestrator.py: test full run lifecycle (pending→running→completed), partial failure (one batch fails, rest succeed), batch retry, event log invariants, NULL handling. Mock LLM provider and SupersetClient.
  • T047 [US3] Verify US3 acceptance scenarios against spec User Story 3 (5 scenarios). Run cd backend && pytest backend/src/plugins/translate/__tests__/test_orchestrator.py backend/src/plugins/translate/__tests__/test_sql_generator.py -v.

Checkpoint: Execution pipeline complete — batch processing, INSERT generation, retry, event logging. User can translate data and insert into target table.


Phase 7: User Story 6 — Feedback Loop (Correct → Dictionary) (Priority: P3)

Goal: In run results, user selects incorrect translation → submits correction to dictionary → dictionary updated with origin tracking → next run uses corrected term.

Independent Test: Complete a run → find incorrect translation → open correction popup → submit to dictionary → re-run preview → verify corrected term used.

Backend — Correction Submission

  • T048 [US6] Implement correction submission endpoint in backend/src/api/routes/translate.py: POST /api/translate/corrections accepting TermCorrectionSubmit body. Validate target language match between dictionary and job (FR language validation edge case); detect existing entry conflict → return conflict response (FR-032); create DictionaryEntry with origin tracking (origin_run_id, origin_row_key, origin_user_id) per FR-033. Inject Depends(require_permission("translate.dictionary.edit")).
  • T049 [US6] Implement bulk correction endpoint: POST /api/translate/corrections/bulk accepting array of TermCorrectionSubmit objects (FR-034). Process atomically — if any conflict is detected, return all conflicts for user resolution before partial apply.

Frontend — Correction UI

  • T050 [P] [US6] Add correction API methods to frontend/src/lib/api/translate.js: submitCorrection(), submitBulkCorrections().
  • T051 [US6] Create TermCorrectionPopup component in frontend/src/lib/components/translate/TermCorrectionPopup.svelte: text selection on source term and incorrect target translation → popup with source term (pre-filled from source column), incorrect target translation (pre-filled from selection), corrected target translation input, dictionary selector dropdown (filtered by target language), submit button, conflict dialog (overwrite/keep existing/cancel). @UX_STATE: closed, selecting, editing, submitting, conflict_detected, submitted. @UX_FEEDBACK: "Added to Dictionary" badge on corrected row.
  • T052 [US6] Create BulkCorrectionSidebar component in frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte: sidebar collecting selected terms across rows, per-term correction inputs, submit all to dictionary. @UX_STATE: closed, collecting, reviewing, submitting, submitted. @UX_REACTIVITY: selected terms list $state.
  • T053 [US6] Integrate feedback-loop components into TranslationRunResult (T043) — add selection highlight behavior and correction triggers.

Verification — US6

  • T054 [US6] Write pytest tests for correction endpoints in backend/tests/test_translate_corrections.py: test single correction, bulk correction, conflict detection (existing term), cross-language rejection, origin tracking fields populated. Verify corrected term appears in next preview's dictionary filter.
  • T055 [US6] Verify US6 acceptance scenarios against spec User Story 6 (5 scenarios). Run cd backend && pytest backend/tests/test_translate_corrections.py -v.

Checkpoint: Feedback loop complete — corrections flow from results → dictionary → next run.


Phase 8: User Story 7 — Schedule Translation Jobs (Priority: P3)

Goal: User configures schedule → system triggers runs → new-key-only translation → optional auto-INSERT → failure notification → pause/resume.

Independent Test: Configure schedule (every 5 min for test) → wait for trigger → verify new TranslationRun created → verify only new keys translated → disable schedule → verify no more triggers.

Backend — Schedule Management + Trigger Dispatch

  • T056 [US7] Implement TranslationScheduler class in backend/src/plugins/translate/scheduler.py: create_schedule(), update_schedule(), delete_schedule(), enable_schedule(), disable_schedule(), get_next_executions(schedule, n=3) (FR-036). Register schedule with existing SchedulerService via add_job() with cron/interval/date trigger. @COMPLEXITY 4 — instrument with belief_scope/reason/reflect. (RATIONALE: C4 because schedule management is stateful with APScheduler integration, concurrency policy enforcement, and trigger dispatch side effects)
  • T057 [US7] Implement schedule trigger handler: _execute_scheduled_translation(job_id). Enforce concurrency policy: check if the previous run for the same job is still running → skip (log + event) or queue (start after the previous run completes) per FR-039. If proceeding: create new TranslationRun with trigger_type=scheduled; fetch source rows; apply new-key-only filter (FR-045) — compare current key values against previous successful run's key values; dispatch to TranslationOrchestrator. On failure, send notification via NotificationService (FR-041, FR-048). Schedule remains enabled for next trigger (US7 acceptance scenario 6).
  • T058 [US7] Implement Superset SQL Lab API submission for all runs: create SupersetSqlLabExecutor class in backend/src/plugins/translate/superset_executor.py. Submit generated SQL to /api/v1/sqllab/execute/, poll execution status, update TranslationRun.insert_status, superset_query_id, rows_affected, error fields. For scheduled runs, this happens automatically; for manual runs, this happens on user trigger. Record insert_submitted/insert_succeeded/insert_failed events.
  • T059 [US7] Implement schedule endpoints in backend/src/api/routes/translate.py: PUT /api/translate/jobs/{job_id}/schedule (create/update), DELETE /api/translate/jobs/{job_id}/schedule (remove), POST /api/translate/jobs/{job_id}/schedule/enable (FR-040), POST /api/translate/jobs/{job_id}/schedule/disable (FR-040). Inject Depends(require_permission("translate.schedule.manage")). Add schedule warning when editing job with active schedule (FR-042).
  • T060 [US7] Extend SchedulerService.load_schedules() in backend/src/core/scheduler.py to discover and register active TranslationSchedule rows alongside existing backup schedules (R4).
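The new-key-only filter in T057 is a set difference over composite keys; a sketch with illustrative row and key-set shapes:

```python
def filter_new_key_rows(current_rows, previous_run_keys, key_cols):
    """Incremental selection per FR-045: keep only rows whose composite key
    did not appear in the last successful run. previous_run_keys is a set of
    key tuples built from that run's TranslationRecord rows."""
    def composite_key(row):
        return tuple(row[c] for c in key_cols)
    return [row for row in current_rows
            if composite_key(row) not in previous_run_keys]
```

Building the previous-run key set once and probing it per row keeps the filter O(n) in the current row count.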

Frontend — Schedule UI

  • T061 [P] [US7] Add schedule API methods to frontend/src/lib/api/translate.js: updateSchedule(), deleteSchedule(), enableSchedule(), disableSchedule().
  • T062 [US7] Create ScheduleConfig component in frontend/src/lib/components/translate/ScheduleConfig.svelte: type selector (cron/interval/once), cron expression input with validation, interval input, timezone selector, run-at datetime picker, next-3-executions preview (with timezone), concurrency policy selector (skip/queue), enable/disable toggle with status indicator. Warns if no prior successful manual run exists. @UX_STATE: idle, editing, validating, enabled, disabled, no_prior_run_warning. @UX_REACTIVITY: next execution times $derived from schedule config with timezone display.
  • T063 [US7] Integrate ScheduleConfig into TranslationJobConfig page as the "Schedule" tab.

Verification — US7

  • T064 [US7] Write pytest tests for scheduler in backend/src/plugins/translate/__tests__/test_scheduler.py: test schedule CRUD, cron expression validation, next-N-executions calculation, trigger dispatch with skip/queue concurrency, new-key-only filter (verify only unseen keys processed), auto-INSERT execution, failure notification, pause/resume, load on SchedulerService start.
  • T065 [US7] Verify US7 acceptance scenarios against spec User Story 7 (8 scenarios). Run cd backend && pytest backend/src/plugins/translate/__tests__/test_scheduler.py -v.

Checkpoint: Scheduling complete — jobs can run automatically on schedule with new-key-only incremental translation and failure recovery.


Phase 9: User Story 4 — Translation History & Audit Trail (Priority: P4)

Goal: User views past runs with filterable list; inspects run details (config snapshot, prompt, translations, INSERT SQL); sees edit marks; duplicates job. Admin views metrics dashboard.

Independent Test: Run several translations → open history → filter by datasource → click run → verify config snapshot, prompt, translations with edit marks, INSERT SQL all shown.

Backend — History + Metrics Endpoints

  • T066 [US4] Implement history endpoints in backend/src/api/routes/translate.py: GET /api/translate/runs (list with filters: job_id, datasource_id, target_table, status, date_from, date_to, pagination per FR-020), GET /api/translate/runs/{run_id} (detail with config_snapshot, prompt_used, records with llm_translation and user_edit fields visible — FR showing original vs user-edited).
  • T067 [US4] Implement TranslationMetrics class in backend/src/plugins/translate/metrics.py: get_job_metrics(job_id) -> MetricsResponse. Aggregate from TranslationEvent table: total runs, success/failure counts, cumulative tokens, cumulative cost, average batch latency (FR-047). @COMPLEXITY 3.
  • T068 [US4] Implement metrics endpoint: GET /api/translate/jobs/{job_id}/metrics. Inject Depends(require_permission("translate.history.view")).
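The aggregation behind get_job_metrics() (T067) folds event rows into the MetricsResponse fields; the event dict shape and event-type names here are illustrative:

```python
def aggregate_job_metrics(events):
    """Fold TranslationEvent rows into the metrics named in FR-047:
    run counts, success/failure, cumulative tokens/cost, average batch latency."""
    runs = [e for e in events if e["event_type"] == "run_started"]
    completed = [e for e in events if e["event_type"] == "run_completed"]
    failed = [e for e in events if e["event_type"] == "run_failed"]
    batches = [e for e in events if e["event_type"] == "batch_completed"]
    latencies = [e["payload"]["latency_ms"] for e in batches]
    return {
        "total_runs": len(runs),
        "succeeded": len(completed),
        "failed": len(failed),
        "total_tokens": sum(e["payload"].get("tokens", 0) for e in events),
        "total_cost": sum(e["payload"].get("cost", 0.0) for e in events),
        "avg_batch_latency_ms": (sum(latencies) / len(latencies)) if latencies else 0.0,
    }
```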

Frontend — History + Metrics UI

  • T069 [P] [US4] Add history API methods to frontend/src/lib/api/translate.js: fetchRunHistory(), fetchRunDetail(), fetchJobMetrics().
  • T070 [US4] Create TranslationHistory SvelteKit page in frontend/src/routes/translate/history/+page.svelte: filterable table (datasource, target table, row count, status, date, user), click-to-expand detail with config snapshot, prompt, translation rows with edit marks, INSERT SQL. @UX_STATE: idle, loading, empty, populated, detail_open. @UX_REACTIVITY: filtered list $derived from filters.
  • T071 [US4] Create admin metrics dashboard section (integrated into existing admin pages or standalone) displaying per-job metrics: run counts, success/failure ratio, cumulative tokens, cumulative cost, average latency. Use MetricsResponse schema.

Verification — US4

  • T072 [US4] Write pytest tests for history + metrics in backend/tests/test_translate_history.py: test run list with filters, run detail with snapshots, metrics aggregation accuracy, TranslationEvent queryability.
  • T073 [US4] Verify US4 acceptance scenarios against spec User Story 4 (4 scenarios). Run cd backend && pytest backend/tests/test_translate_history.py -v.

Checkpoint: History and audit complete — all runs traceable, metrics dashboard populated.


Phase 10: Polish & Cross-Cutting Concerns

Purpose: Retention enforcement, notification wiring, semantic audit, quickstart validation, and rejected-path regression protection.

  • T074 [P] Implement 90-day retention pruning in TranslationEventLog.prune_expired(): run as APScheduler daily cleanup job. BEFORE pruning events/records: persist cumulative metrics as MetricSnapshot row (tokens, cost, run counts). Then prune TranslationRecord, TranslationPreviewRecord, TranslationEvent, and insert_sql/config_snapshot fields older than 90 days. Preserve TranslationRun metadata, MetricSnapshot rows, and superset_query_id. Verify metrics remain accurate post-prune (SC-014). (RATIONALE: metric snapshots prevent cumulative data loss from event pruning; REJECTED: indefinite retention would violate storage constraints)
  • T075 [P] Wire scheduled-run failure notification: ensure TranslationScheduler trigger handler calls NotificationService.send() when a scheduled run fails (FR-041, FR-048). Test with mock notification provider.
  • T076 [P] Instrument remaining C4/C5 Python flows with belief_scope/reason/reflect/explore markers where missing: TranslationOrchestrator.start_run() (entry/exit), TranslationExecutor.execute_run() (batch boundaries + error paths), DictionaryManager mutation boundaries, TranslationScheduler trigger dispatch. Verify via axiom_semantic_validation belief-runtime audit.
  • T077 Run full semantic audit via axiom MCP tools:
    • axiom_semantic_validation audit_contracts --file_path backend/src/plugins/translate/ — verify all [DEF] anchors are closed, @RELATION targets resolve, no orphan contracts, C4+ contracts have required tag density
    • axiom_semantic_validation audit_belief_protocol --file_path backend/src/plugins/translate/ — verify @RATIONALE/@REJECTED present on all C5 contracts
    • axiom_semantic_validation audit_belief_runtime --file_path backend/src/plugins/translate/ — verify belief_scope/reason/reflect/explore markers exist in all C4+ module bodies
    • axiom_semantic_validation impact_analysis --contract_id TranslationOrchestrator:Class — verify no rejected path is accidentally re-enabled
  • T078 Run quickstart validation: follow specs/028-llm-datasource-supeset/quickstart.md end-to-end — create dictionary → create job → preview → execute → verify INSERT SQL → submit correction → schedule → view history → verify metrics. Run cd backend && pytest -v, cd frontend && npm run test -- --run, cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py.
  • T079 Rejected-path regression guard — add test cases protecting explicitly rejected paths:
    • backend/src/plugins/translate/__tests__/test_orchestrator.py: snapshot isolation — changing job config mid-run does NOT invalidate the running TranslationRun.
    • backend/src/plugins/translate/__tests__/test_sql_generator.py: UPDATE statements are never generated (only INSERT/UPSERT per PostgreSQL dialect).
    • backend/src/plugins/translate/__tests__/test_dictionary.py: duplicate source_term entries cannot coexist (UniqueConstraint enforced) and conflict resolution only offers overwrite/keep-existing.
    • backend/src/plugins/translate/__tests__/test_retention.py: metric snapshots are persisted before event pruning and cumulative metrics remain accurate post-prune.
  • T080 [P] Implement cancel run endpoint: POST /api/translate/runs/{run_id}/cancel in backend/src/api/routes/translate.py. Set translation_status=cancelled, mark in-progress batches as failed, do NOT submit INSERT SQL. Emit run_cancelled event. Inject Depends(require_permission("translate.job.execute")).
  • T081 [P] Implement download skipped rows endpoint: GET /api/translate/runs/{run_id}/skipped.csv returning CSV of rows skipped due to NULL keys or translation failures. Use key_hash for efficient lookup.
  • T082 [P] Compute key_hash for TranslationRecord and TranslationPreviewRecord: hash(canonical_json(key_values)) at creation time. Add config_hash for TranslationRun and TranslationPreviewSession: hash of effective config (columns, keys, target, prompt, dictionaries). Use for idempotency checks, new-key-only filtering, and stale preview detection.
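The key_hash/config_hash computation in T082 reduces to a digest over a canonical JSON serialization, so that dict insertion order never changes the hash. A minimal sketch (the helper name and the default=str fallback for non-JSON types are assumptions):

```python
import hashlib
import json


def canonical_hash(payload: dict) -> str:
    """Stable SHA-256 over a JSON-serializable dict.

    sort_keys plus fixed separators make the digest independent of dict
    insertion order, so identical key_values always yield the same key_hash.
    """
    canonical = json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        default=str,  # fallback for dates/UUIDs in config snapshots
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Usage would look like `record.key_hash = canonical_hash(record.key_values)` and, for stale-preview detection, comparing `canonical_hash(effective_config)` against the hash stored on the TranslationPreviewSession.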

Dependencies & Execution Order

Phase Dependencies

  • Setup (Phase 1): No dependencies — can start immediately
  • Foundational (Phase 2): Depends on Setup — BLOCKS all user stories
  • US1 (Phase 3): Depends on Foundational — no dependencies on other stories. Recommended start after Foundational.
  • US5 (Phase 4): Depends on Foundational — can run in parallel with US1. Dictionary filtering (T020) will be consumed later by US2/US3 but is self-contained.
  • US2 (Phase 5): Depends on US1 (needs saved job) + US5 (needs dictionary filtering). Can start integration once US1 backend is stable.
  • US3 (Phase 6): Depends on US1 (needs job config) + US2 (preview decisions feed executor). Sequential after US2.
  • US6 (Phase 7): Depends on US3 (needs run results) + US5 (needs dictionary). Can run in parallel with US7 after US3.
  • US7 (Phase 8): Depends on US1 (needs job) + US3 (needs execution pipeline). Can run in parallel with US6 after US3.
  • US4 (Phase 9): Depends on US3 (needs run records). Can run in parallel with US6/US7 after US3.
  • Polish (Phase 10): Depends on all desired user stories being complete.

Parallel Opportunities

| Phase | Parallel Tasks | Notes |
|-------|----------------|-------|
| 1 | — | Sequential (only 2 tasks) |
| 2 | T003 ∥ T004 | Models + Schemas in parallel |
| 3 (US1) | T009 ∥ T012 | Backend CRUD ∥ API client |
| 4 (US5) | T017 ∥ T021 | DictionaryManager ∥ API client |
| 5 (US2) | T030 ∥ T031 | API client ∥ Preview component |
| 6 (US3) | T036 ∥ T041 | SQLGenerator ∥ API client |
| 7 (US6) | T050 ∥ T051 ∥ T052 | API client ∥ Popup ∥ Sidebar |
| 8 (US7) | T061 ∥ T062 | API client ∥ ScheduleConfig |
| 9 (US4) | T069 ∥ T070 | API client ∥ History page |
| 10 | T074 ∥ T075 ∥ T076 | Retention, notifications, belief instrumentation |

Cross-Story Parallelism

After Foundational (Phase 2):

  • US1 and US5 can proceed in parallel by different developers
  • After US3 completes: US6, US7, and US4 can proceed in parallel

Implementation Strategy

MVP First (US1 Only)

  1. Phase 1 + Phase 2 → Foundation
  2. Phase 3 (US1) → Job configuration CRUD
  3. STOP and VALIDATE: User can create, list, edit, delete translation jobs
  4. Deploy/demo — partial value (configuration ready, no translation yet)

Minimum Viable Feature (US1 + US5 + US2 + US3)

  1. Foundation → US1 + US5 (parallel) → US2 → US3
  2. STOP and VALIDATE: End-to-end translation flow works: configure → preview → execute → INSERT
  3. This is the core feature — all remaining stories add automation (US7), quality improvement (US6), and visibility (US4)

Full Feature (All Stories)

  1. MVP → US6 + US7 + US4 (parallel after US3) → Polish
  2. Scheduled automation, feedback loop, and audit trail all functional

Notes

  • All file paths reference the actual repository structure (backend/src/, frontend/src/).
  • @COMPLEXITY 4/5 backend contracts require belief_scope/reason/reflect markers — verified in T076.
  • @RATIONALE/@REJECTED tags appear only in C5 contracts (TranslationOrchestrator, TranslationEventLog) per INV_7.
  • Rejected paths are explicitly protected by regression tests in T079.
  • [NEED_CONTEXT] markers: none — all contract targets resolve to existing or planned modules within this feature.
  • The existing LLMProviderService, SupersetClient, SchedulerService, NotificationService, and TaskWebSocket contracts are reused without modification.
  • Quickstart.md (T078) serves as the human-verifiable acceptance test for the full feature.