Quickstart: LLM Table Translation Service
Feature Branch: 028-llm-datasource-supeset
Date: 2026-05-08
Prerequisites
- Running ss-tools instance (Docker Compose or local)
- Superset connection configured in ss-tools settings
- At least one LLM provider configured (Settings → LLM)
- Target PostgreSQL physical table exists in Superset, is insertable, and has a compatible schema
- User has appropriate RBAC permissions (admin by default)
1. Start the Application
# Docker (recommended)
cd <path-to-ss-tools>  # repo root
docker compose up --build
# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001
# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173
- Frontend: http://localhost:5173
- Backend API: http://localhost:8001
- API Docs: http://localhost:8001/docs
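To confirm both services are up before continuing, a quick check against the endpoints above (a minimal sketch; adjust ports if you changed them):
# Expect 200 from the API docs page and the frontend dev server
curl -s -o /dev/null -w "backend:  %{http_code}\n" http://localhost:8001/docs
curl -s -o /dev/null -w "frontend: %{http_code}\n" http://localhost:5173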
2. Create a Terminology Dictionary
Via UI
- Navigate to http://localhost:5173/translate/dictionaries
- Click [+ New Dictionary]
- Enter name: Product Terms, language: ru
- Add entries inline or click [Import CSV]
- Save
Via API
curl -X POST http://localhost:8001/api/translate/dictionaries \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "Product Terms",
"target_language": "ru",
"entries": [
{"source_term": "invoice", "target_term": "накладная"},
{"source_term": "widget", "target_term": "виджет"},
{"source_term": "backorder", "target_term": "предзаказ"}
]
}'
Expected: 201 Created with dictionary ID and entry count = 3.
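For scripting the later steps, the dictionary ID can be captured from the response with jq. A sketch, assuming the 201 body exposes the ID as a top-level id field (check /docs for the actual schema):
DICT_ID=$(curl -s -X POST http://localhost:8001/api/translate/dictionaries \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"name": "Product Terms", "target_language": "ru",
       "entries": [{"source_term": "invoice", "target_term": "накладная"}]}' \
  | jq -r '.id')   # "id" field name is an assumption
echo "dictionary: $DICT_ID"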
3. Create a Translation Job
Via UI
- Navigate to http://localhost:5173/translate
- Click [+ New Translation Job]
- Select Superset datasource → columns auto-populate
- Set:
  - Translation column: product_name
  - Context columns: category_name, product_description
  - Key columns: product_id
  - Target table: products_i18n
  - Target column: translated_name
  - Target language: Russian
  - Attach dictionary: Product Terms
- Click [Save & Preview]
Via API
curl -X POST http://localhost:8001/api/translate/jobs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "Products RU Translation",
"datasource_id": "<datasource-uuid>",
"source_table": "products",
"translation_col": "product_name",
"context_cols": ["category_name", "product_description"],
"source_key_cols": ["product_id"],
"target_key_cols": ["product_id"],
"target_table": "products_i18n",
"target_col": "translated_name",
"target_language": "ru",
"batch_size": 50,
"dictionary_ids": ["<dictionary-uuid>"]
}'
Expected: 201 Created with job ID. Validation passes (columns exist, target table accessible).
Error case: 422 if translation column is empty; 400 if target table not found.
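Reading the job back is a quick way to confirm the saved config. A sketch, assuming the API exposes a standard GET on the job resource (not shown elsewhere in this guide; verify the path in /docs):
curl -s http://localhost:8001/api/translate/jobs/<job-id> \
  -H "Authorization: Bearer <token>" \
  | jq '{name, source_table, target_table, batch_size}'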
4. Preview Translations
Via UI
- Open the saved job → click [Preview]
- System shows ~10 rows with source, context, and LLM translation
- Approve good translations, edit or reject bad ones
- Click [Approve All] or handle individually
Via API
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"sample_size": 10}'
Expected: 200 with array of PreviewRow objects (source_text, context, llm_translation, status=pending).
Error case: 503 if LLM provider unreachable; error message includes provider name and reason.
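The preview response can be piped through jq to eyeball source/translation pairs, using the PreviewRow fields listed above (assumes the 200 body is a bare JSON array):
curl -s -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"sample_size": 10}' \
  | jq -r '.[] | "\(.source_text) → \(.llm_translation) [\(.status)]"'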
5. Execute Full Translation Run
Via UI
- After preview approval, click [Start Full Run]
- Confirm cost estimate dialog
- Watch live progress bar (WebSocket-driven)
- On completion: view run summary with translation status, insert status, Superset query reference, and generated SQL (audit).
Via API
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"upsert_strategy": "insert"}'
Expected: 202 Accepted with run ID. WebSocket messages stream progress. Final GET returns run with status=completed, translated_rows=N, insert_sql=<SQL>.
Partial failure: status=partial, failed_rows>0. [Retry Failed] available.
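Outside the UI there is no WebSocket client, so a simple polling loop against the run endpoint from step 6 works as a substitute. A sketch: the "id" field on the 202 body and "failed" as a terminal state are assumptions; "completed"/"partial" match the notes above:
RUN_ID=$(curl -s -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}' | jq -r '.id')   # "id" field name assumed
while sleep 5; do
  STATUS=$(curl -s http://localhost:8001/api/translate/runs/$RUN_ID \
    -H "Authorization: Bearer <token>" | jq -r '.status')
  echo "run $RUN_ID: $STATUS"
  case "$STATUS" in completed|partial|failed) break ;; esac
done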
6. Execute INSERT through Superset SQL Lab API
Via UI
- After translation completes, the system automatically submits SQL to Superset
- Progress indicator shows: "📤 Submitting to Superset..."
- On success: "✅ Insert succeeded · 1,241 rows affected · Query #a7f3b2c"
- Click [View SQL] to audit the generated statement
Via API
# Trigger full run (backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"upsert_strategy": "insert"}'
# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/<run-id> \
-H "Authorization: Bearer <token>"
Expected: Run response includes insert_status: "succeeded", superset_query_id, rows_affected.
Insert failure: insert_status: "failed", insert_error_message populated. [Retry Insert] re-submits without re-translating.
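The API equivalent of [Retry Insert] is not documented in this guide; if the backend follows the same resource layout, it likely looks something like the hypothetical call below (confirm the real route in /docs):
# Hypothetical endpoint — the actual route may differ
curl -X POST http://localhost:8001/api/translate/runs/<run-id>/retry-insert \
  -H "Authorization: Bearer <token>"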
Verify in Target Table
-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
7. Feedback Loop — Correct a Translation
Via UI
- Open run results → find a mistranslated word
- Highlight the word → [Correct this term] popup
- Enter correction → select dictionary → submit
- Re-run preview to verify correction is used
Via API
curl -X POST http://localhost:8001/api/translate/corrections \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"record_id": "<record-uuid>",
"source_term": "Monitor Stand",
"source_term": "Monitor Stand",
"incorrect_target_term": "Мониторная стойка",
"corrected_target_term": "Подставка для монитора",
"dictionary_id": "<dictionary-uuid>"
}'
Expected: 201. Term pair added to dictionary. Conflict dialog if term already exists.
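To confirm the correction actually landed, read the dictionary back. A sketch, assuming a GET on the dictionary resource returning the entries array from step 2 (verify in /docs):
curl -s http://localhost:8001/api/translate/dictionaries/<dictionary-uuid> \
  -H "Authorization: Bearer <token>" \
  | jq '.entries[] | select(.source_term == "Monitor Stand")'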
8. Configure Schedule
Via UI
- Open job → Schedule tab
- Set type: Cron → 0 6 * * 1 (every Monday 06:00)
- Toggle auto-INSERT: ON
- Verify next 3 execution times
- Enable schedule
Via API
curl -X PUT http://localhost:8001/api/translate/jobs/<job-id>/schedule \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"schedule_type": "cron",
"cron_expression": "0 6 * * 1",
"timezone": "Europe/Moscow",
"concurrency": "skip"
}'
Expected: 200 with schedule config including next_run_at.
Verify: Check APScheduler jobs (backend log) or wait for next trigger and check run history.
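Reading the schedule back (assuming GET is supported on the same path used for PUT above) shows the computed next trigger:
curl -s http://localhost:8001/api/translate/jobs/<job-id>/schedule \
  -H "Authorization: Bearer <token>" | jq '.next_run_at'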
9. View History and Metrics
Via UI
- Navigate to http://localhost:5173/translate/history
- Filter by datasource, target table, or date range
- Click a run for details: config snapshot, prompt, translations, INSERT SQL
Via API
# List runs
curl http://localhost:8001/api/translate/runs?job_id=<job-id> \
-H "Authorization: Bearer <token>"
# Get metrics
curl http://localhost:8001/api/translate/jobs/<job-id>/metrics \
-H "Authorization: Bearer <token>"
Expected: Run list with status and row counts. Metrics with cumulative tokens and cost.
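A quick jq tally over the run list gives a status breakdown, assuming the list response is a plain JSON array with a status field on each run:
curl -s "http://localhost:8001/api/translate/runs?job_id=<job-id>" \
  -H "Authorization: Bearer <token>" \
  | jq 'group_by(.status) | map({status: .[0].status, count: length})'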
10. Verification Checklist
Backend Tests
cd backend
source .venv/bin/activate
# Unit tests for translation plugin
pytest src/plugins/translate/__tests__/ -v
# Integration tests for translate API
pytest tests/test_translate_api.py -v
# All backend tests
pytest -v
Frontend Tests
cd frontend
npm run test -- --run
Linting
# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py
# Svelte
cd frontend && npm run build # build includes type checking
Manual Smoke Test
- Create dictionary with 3 terms → verify in list
- Import CSV with 50 terms → verify no duplicates (check conflict dialog)
- Create job → verify column list populates from datasource
- Preview with empty dictionary → verify LLM still translates
- Preview with attached dictionary → verify glossary terms used (check invoice → накладная)
- Full run with 50 rows → verify INSERT SQL has 50 VALUES tuples
- Scheduled run (set to every 5 min for test) → verify run appears in history
- Feedback loop: correct 1 term → re-preview → verify correction reflected
- Delete dictionary attached to active job → verify blocked
- Check metrics dashboard → verify run counts and token totals