# Quickstart: LLM Table Translation Service

**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08

## Prerequisites

- Running ss-tools instance (Docker Compose or local)
- Superset connection configured in ss-tools settings
- At least one LLM provider configured (Settings → LLM)
- Target insertable PostgreSQL physical table exists in Superset with compatible schema
- User has appropriate RBAC permissions (admin by default)

## 1. Start the Application

```bash
# Docker (recommended)
cd /home/busya/dev/ss-tools
docker compose up --build

# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001

# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173
```

- Frontend: http://localhost:5173
- Backend API: http://localhost:8001
- API Docs: http://localhost:8001/docs

## 2. Create a Terminology Dictionary

### Via UI

1. Navigate to http://localhost:5173/translate/dictionaries
2. Click **[+ New Dictionary]**
3. Enter name: `Product Terms`, language: `ru`
4. Add entries inline or click **[Import CSV]**
5. Save

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/dictionaries \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{
    "name": "Product Terms",
    "target_language": "ru",
    "entries": [
      {"source_term": "invoice", "target_term": "накладная"},
      {"source_term": "widget", "target_term": "виджет"},
      {"source_term": "backorder", "target_term": "предзаказ"}
    ]
  }'
```

**Expected**: 201 Created with dictionary ID and entry count = 3.

## 3. Create a Translation Job

### Via UI

1. Navigate to http://localhost:5173/translate
2. Click **[+ New Translation Job]**
3. Select Superset datasource → columns auto-populate
4. Set:
   - Translation column: `product_name`
   - Context columns: `category_name`, `product_description`
   - Key columns: `product_id`
   - Target table: `products_i18n`
   - Target column: `translated_name`
   - Target language: `Russian`
   - Attach dictionary: `Product Terms`
5. Click **[Save & Preview]**

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{
    "name": "Products RU Translation",
    "datasource_id": "",
    "source_table": "products",
    "translation_col": "product_name",
    "context_cols": ["category_name", "product_description"],
    "source_key_cols": ["product_id"],
    "target_key_cols": ["product_id"],
    "target_table": "products_i18n",
    "target_col": "translated_name",
    "target_language": "ru",
    "batch_size": 50,
    "dictionary_ids": [""]
  }'
```

**Expected**: 201 Created with job ID. Validation passes (columns exist, target table accessible).

**Error case**: 422 if translation column is empty; 400 if target table not found.

## 4. Preview Translations

### Via UI

1. Open the saved job → click **[Preview]**
2. System shows ~10 rows with source, context, and LLM translation
3. Approve good translations, edit or reject bad ones
4. Click **[Approve All]** or handle individually

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs//preview \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{"sample_size": 10}'
```

**Expected**: 200 with array of PreviewRow objects (source_text, context, llm_translation, status=pending).

**Error case**: 503 if LLM provider unreachable; error message includes provider name and reason.

## 5. Execute Full Translation Run

### Via UI

1. After preview approval, click **[Start Full Run]**
2. Confirm cost estimate dialog
3. Watch live progress bar (WebSocket-driven)
4. On completion: view run summary with translation status, insert status, Superset query reference, and generated SQL (audit).
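A full run sends the source rows to the LLM in chunks of the job's `batch_size` (50 in the example job above). A minimal sketch of that chunking, assuming plain row dicts — the function name and row fields are illustrative, not the actual ss-tools internals:

```python
# Illustrative sketch only: split source rows into batches of `batch_size`
# so each batch can be translated in one LLM request. `make_batches` and
# the row fields are hypothetical names, not the real implementation.

def make_batches(rows, batch_size=50):
    """Yield successive slices of `rows`, each at most `batch_size` long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# 120 source rows with batch_size=50 → batches of 50, 50, and 20
rows = [{"product_id": i, "product_name": f"Widget {i}"} for i in range(120)]
batches = list(make_batches(rows, batch_size=50))
```

A smaller `batch_size` gives finer-grained progress updates and retry units at the cost of more LLM round-trips.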
### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs//runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{"upsert_strategy": "insert"}'
```

**Expected**: 202 Accepted with run ID. WebSocket messages stream progress. Final GET returns run with `status=completed`, `translated_rows=N`, `insert_sql=`.

**Partial failure**: `status=partial`, `failed_rows>0`. **[Retry Failed]** available.

## 6. Execute INSERT through Superset SQL Lab API

### Via UI

1. After translation completes, the system automatically submits SQL to Superset
2. Progress indicator shows: "📤 Submitting to Superset..."
3. On success: "✅ Insert succeeded · 1,241 rows affected · Query #a7f3b2c"
4. Click **[View SQL]** to audit the generated statement

### Via API

```bash
# Trigger full run (backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs//runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{"upsert_strategy": "insert"}'

# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/ \
  -H "Authorization: Bearer "
```

**Expected**: Run response includes `insert_status: "succeeded"`, `superset_query_id`, `rows_affected`.

**Insert failure**: `insert_status: "failed"`, `insert_error_message` populated. **[Retry Insert]** re-submits without re-translating.

### Verify in Target Table

```sql
-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
```

## 7. Feedback Loop — Correct a Translation

### Via UI

1. Open run results → find a mistranslated word
2. Highlight the word → **[Correct this term]** popup
3. Enter correction → select dictionary → submit
4. Re-run preview to verify the correction is used

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/corrections \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{
    "record_id": "",
    "source_term": "Monitor Stand",
    "incorrect_target_term": "Мониторная стойка",
    "corrected_target_term": "Подставка для монитора",
    "dictionary_id": ""
  }'
```

**Expected**: 201. Term pair added to dictionary. Conflict dialog if term already exists.

## 8. Configure Schedule

### Via UI

1. Open job → **Schedule** tab
2. Set type: Cron → `0 6 * * 1` (every Monday 06:00)
3. Toggle auto-INSERT: ON
4. Verify next 3 execution times
5. Enable schedule

### Via API

```bash
curl -X PUT http://localhost:8001/api/translate/jobs//schedule \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{
    "schedule_type": "cron",
    "cron_expression": "0 6 * * 1",
    "timezone": "Europe/Moscow",
    "concurrency": "skip"
  }'
```

**Expected**: 200 with schedule config including `next_run_at`.

**Verify**: Check APScheduler jobs (backend log) or wait for next trigger and check run history.

## 9. View History and Metrics

### Via UI

1. Navigate to http://localhost:5173/translate/history
2. Filter by datasource, target table, or date range
3. Click a run for details: config snapshot, prompt, translations, INSERT SQL

### Via API

```bash
# List runs
curl http://localhost:8001/api/translate/runs?job_id= \
  -H "Authorization: Bearer "

# Get metrics
curl http://localhost:8001/api/translate/jobs//metrics \
  -H "Authorization: Bearer "
```

**Expected**: Run list with status and row counts. Metrics with cumulative tokens and cost.

## 10. Verification Checklist

### Backend Tests

```bash
cd backend
source .venv/bin/activate

# Unit tests for translation plugin
pytest src/plugins/translate/__tests__/ -v

# Integration tests for translate API
pytest tests/test_translate_api.py -v

# All backend tests
pytest -v
```

### Frontend Tests

```bash
cd frontend
npm run test -- --run
```

### Linting

```bash
# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py

# Svelte
cd frontend && npm run build  # build includes type checking
```

### Manual Smoke Test

1. Create dictionary with 3 terms → verify in list
2. Import CSV with 50 terms → verify no duplicates (check conflict dialog)
3. Create job → verify column list populates from datasource
4. Preview with empty dictionary → verify LLM still translates
5. Preview with attached dictionary → verify glossary terms used (check `invoice` → `накладная`)
6. Full run with 50 rows → verify INSERT SQL has 50 VALUES tuples
7. Scheduled run (set to every 5 min for test) → verify run appears in history
8. Feedback loop: correct 1 term → re-preview → verify correction reflected
9. Delete dictionary attached to active job → verify blocked
10. Check metrics dashboard → verify run counts and token totals
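Smoke-test item 6 checks that a 50-row run produces an INSERT statement with 50 VALUES tuples. A rough sketch of how such a statement could be assembled from translated rows — this is not the service's actual SQL generator, and the naive quoting below is for illustration only (real code should use parameterized queries):

```python
# Illustrative sketch: build an INSERT ... VALUES statement with one
# tuple per translated row. `build_insert_sql` is a hypothetical helper,
# not the ss-tools implementation; quoting here is deliberately naive.

def build_insert_sql(table, columns, rows):
    """Render INSERT INTO <table> (<columns>) VALUES (...), (...) ...;"""
    def quote(value):
        # Naive single-quote escaping, sufficient for this sketch only
        return "'" + str(value).replace("'", "''") + "'"
    tuples = ",\n  ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)})\nVALUES\n  {tuples};"

sql = build_insert_sql(
    "products_i18n",
    ["product_id", "translated_name"],
    [(1, "накладная"), (2, "виджет")],
)
```

Counting the generated tuples (one `('` per row here) is a quick way to verify the 50-rows-→-50-tuples expectation from the checklist.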