# Quickstart: LLM Table Translation Service

**Feature Branch**: `028-llm-datasource-supeset`

**Date**: 2026-05-08

## Prerequisites

- Running ss-tools instance (Docker Compose or local)
- Superset connection configured in ss-tools settings
- At least one LLM provider configured (Settings → LLM)
- An insertable PostgreSQL physical table (the insert target) exists in Superset with a compatible schema
- User has appropriate RBAC permissions (admin by default)

## 1. Start the Application

```bash
# Docker (recommended)
cd /home/busya/dev/ss-tools
docker compose up --build

# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001

# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173
```

- Frontend: http://localhost:5173
- Backend API: http://localhost:8001
- API Docs: http://localhost:8001/docs

|
## 2. Create a Terminology Dictionary

### Via UI

1. Navigate to http://localhost:5173/translate/dictionaries
2. Click **[+ New Dictionary]**
3. Enter name: `Product Terms`, language: `ru`
4. Add entries inline or click **[Import CSV]**
5. Save
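
For **[Import CSV]**, a minimal two-column file should work; the exact header names are an assumption (they mirror the API field names below), so check the import dialog if your build differs:

```csv
source_term,target_term
invoice,накладная
widget,виджет
backorder,предзаказ
```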

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/dictionaries \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Product Terms",
    "target_language": "ru",
    "entries": [
      {"source_term": "invoice", "target_term": "накладная"},
      {"source_term": "widget", "target_term": "виджет"},
      {"source_term": "backorder", "target_term": "предзаказ"}
    ]
  }'
```

**Expected**: 201 Created with dictionary ID and entry count = 3.

## 3. Create a Translation Job

### Via UI

1. Navigate to http://localhost:5173/translate
2. Click **[+ New Translation Job]**
3. Select Superset datasource → columns auto-populate
4. Set:
   - Translation column: `product_name`
   - Context columns: `category_name`, `product_description`
   - Key columns: `product_id`
   - Target table: `products_i18n`
   - Target column: `translated_name`
   - Target language: `Russian`
   - Attach dictionary: `Product Terms`
5. Click **[Save & Preview]**

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Products RU Translation",
    "datasource_id": "<datasource-uuid>",
    "source_table": "products",
    "translation_col": "product_name",
    "context_cols": ["category_name", "product_description"],
    "source_key_cols": ["product_id"],
    "target_key_cols": ["product_id"],
    "target_table": "products_i18n",
    "target_col": "translated_name",
    "target_language": "ru",
    "batch_size": 50,
    "dictionary_ids": ["<dictionary-uuid>"]
  }'
```

**Expected**: 201 Created with job ID. Validation passes (columns exist, target table accessible).

**Error case**: 422 if the translation column is empty; 400 if the target table is not found.

## 4. Preview Translations

### Via UI

1. Open the saved job → click **[Preview]**
2. System shows ~10 rows with source, context, and LLM translation
3. Approve good translations; edit or reject bad ones
4. Click **[Approve All]** or handle rows individually

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"sample_size": 10}'
```

**Expected**: 200 with an array of PreviewRow objects (`source_text`, `context`, `llm_translation`, `status=pending`).

**Error case**: 503 if the LLM provider is unreachable; the error message includes the provider name and reason.

## 5. Execute Full Translation Run

### Via UI

1. After preview approval, click **[Start Full Run]**
2. Confirm the cost estimate dialog
3. Watch the live progress bar (WebSocket-driven)
4. On completion: view the run summary with translation status, insert status, Superset query reference, and generated SQL (for audit)

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'
```

**Expected**: 202 Accepted with run ID. WebSocket messages stream progress. A final GET returns the run with `status=completed`, `translated_rows=N`, `insert_sql=<SQL>`.

**Partial failure**: `status=partial`, `failed_rows>0`. **[Retry Failed]** is available.

## 6. Execute INSERT through Superset SQL Lab API

### Via UI

1. After translation completes, the system automatically submits the SQL to Superset
2. Progress indicator shows: «📤 Submitting to Superset...»
3. On success: «✅ Insert succeeded · 1,241 rows affected · Query #a7f3b2c»
4. Click **[View SQL]** to audit the generated statement

### Via API

```bash
# Trigger a full run (the backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'

# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/<run-id> \
  -H "Authorization: Bearer <token>"
```

**Expected**: The run response includes `insert_status: "succeeded"`, `superset_query_id`, and `rows_affected`.

**Insert failure**: `insert_status: "failed"`, with `insert_error_message` populated. **[Retry Insert]** re-submits without re-translating.
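
Outside the UI, run completion can be awaited by polling the run-status endpoint shown above. A minimal sketch — the in-progress status values `pending`/`running` are assumptions, and `fetch_run` is a hypothetical callable you supply (e.g. wrapping `requests.get` on `/api/translate/runs/<run-id>` with a bearer token):

```python
import time

def wait_for_run(fetch_run, run_id, poll_interval=2.0, timeout=600.0):
    """Poll a translation run until it leaves the in-progress states.

    fetch_run(run_id) must return the parsed JSON run object.
    Assumed in-progress states: pending, running; terminal states
    include completed / partial / failed per the run summary above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = fetch_run(run_id)
        if run["status"] not in ("pending", "running"):
            return run
        time.sleep(poll_interval)
    raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
```

Usage might look like `wait_for_run(lambda rid: requests.get(f"http://localhost:8001/api/translate/runs/{rid}", headers=auth).json(), run_id)`, after which `insert_status` and `rows_affected` can be inspected on the returned object.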

### Verify in Target Table

```sql
-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
```

## 7. Feedback Loop — Correct a Translation

### Via UI

1. Open run results → find a mistranslated word
2. Highlight the word → **[Correct this term]** popup
3. Enter the correction → select a dictionary → submit
4. Re-run the preview to verify the correction is used

### Via API

```bash
curl -X POST http://localhost:8001/api/translate/corrections \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "record_id": "<record-uuid>",
    "source_term": "Monitor Stand",
    "incorrect_target_term": "Мониторная стойка",
    "corrected_target_term": "Подставка для монитора",
    "dictionary_id": "<dictionary-uuid>"
  }'
```

**Expected**: 201. The term pair is added to the dictionary. A conflict dialog appears if the term already exists.

## 8. Configure Schedule

### Via UI

1. Open the job → **Schedule** tab
2. Set type: Cron → `0 6 * * 1` (every Monday 06:00)
3. Toggle auto-INSERT: ON
4. Verify the next 3 execution times
5. Enable the schedule

### Via API

```bash
curl -X PUT http://localhost:8001/api/translate/jobs/<job-id>/schedule \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "schedule_type": "cron",
    "cron_expression": "0 6 * * 1",
    "timezone": "Europe/Moscow",
    "concurrency": "skip"
  }'
```

**Expected**: 200 with the schedule config, including `next_run_at`.

**Verify**: Check APScheduler jobs (backend log) or wait for the next trigger and check the run history.
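
To sanity-check what `0 6 * * 1` means before enabling the schedule, the next fire time can be computed by hand. A stdlib sketch for this one expression only (not a general cron parser — the backend's APScheduler does the real evaluation, including the `Europe/Moscow` timezone handling omitted here):

```python
from datetime import datetime, timedelta

def next_monday_0600(now: datetime) -> datetime:
    """Next Monday 06:00 strictly after `now` (cron `0 6 * * 1`)."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=6, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)  # already past this week's slot
    return candidate

# e.g. from a Friday afternoon, the next run is the following Monday
print(next_monday_0600(datetime(2026, 5, 8, 15, 30)))  # 2026-05-11 06:00:00
```

If the UI's "next 3 execution times" disagree with this kind of back-of-the-envelope check, the timezone setting is the usual suspect.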
## 9. View History and Metrics

### Via UI

1. Navigate to http://localhost:5173/translate/history
2. Filter by datasource, target table, or date range
3. Click a run for details: config snapshot, prompt, translations, INSERT SQL

### Via API

```bash
# List runs (quote the URL so the shell does not interpret ? and future & characters)
curl "http://localhost:8001/api/translate/runs?job_id=<job-id>" \
  -H "Authorization: Bearer <token>"

# Get metrics
curl http://localhost:8001/api/translate/jobs/<job-id>/metrics \
  -H "Authorization: Bearer <token>"
```

**Expected**: Run list with status and row counts. Metrics with cumulative tokens and cost.

## 10. Verification Checklist

### Backend Tests

```bash
cd backend
source .venv/bin/activate

# Unit tests for the translation plugin
pytest src/plugins/translate/__tests__/ -v

# Integration tests for the translate API
pytest tests/test_translate_api.py -v

# All backend tests
pytest -v
```

### Frontend Tests

```bash
cd frontend
npm run test -- --run
```

### Linting

```bash
# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py

# Svelte
cd frontend && npm run build  # build includes type checking
```

### Manual Smoke Test

1. Create a dictionary with 3 terms → verify it appears in the list
2. Import a CSV with 50 terms → verify no duplicates (check the conflict dialog)
3. Create a job → verify the column list populates from the datasource
4. Preview with an empty dictionary → verify the LLM still translates
5. Preview with an attached dictionary → verify glossary terms are used (check `invoice` → `накладная`)
6. Full run with 50 rows → verify the INSERT SQL has 50 VALUES tuples
7. Scheduled run (set to every 5 minutes for the test) → verify the run appears in history
8. Feedback loop: correct 1 term → re-preview → verify the correction is reflected
9. Delete a dictionary attached to an active job → verify the deletion is blocked
10. Check the metrics dashboard → verify run counts and token totals
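
For smoke-test step 6, the generated statement should look roughly like the following. The column names come from the job config in step 3; the exact quoting, value literals, and key handling are assumptions about the SQL generator, so audit the real statement via **[View SQL]**:

```sql
INSERT INTO products_i18n (product_id, translated_name)
VALUES
  (101, 'Подставка для монитора'),
  (102, 'Накладная'),
  -- ... one tuple per translated row, 50 in total
  (150, 'Предзаказ');
```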