
Quickstart: LLM Table Translation Service

Feature Branch: 028-llm-datasource-supeset
Date: 2026-05-08

Prerequisites

  • Running ss-tools instance (Docker Compose or local)
  • Superset connection configured in ss-tools settings
  • At least one LLM provider configured (Settings → LLM)
  • Target insertable PostgreSQL physical table exists in Superset with compatible schema
  • User has appropriate RBAC permissions (admin by default)

1. Start the Application

# Docker (recommended)
cd /home/busya/dev/ss-tools
docker compose up --build

# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001

# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173

2. Create a Terminology Dictionary

Via UI

  1. Navigate to http://localhost:5173/translate/dictionaries
  2. Click [+ New Dictionary]
  3. Enter name: Product Terms, language: ru
  4. Add entries inline or click [Import CSV]
  5. Save

Via API

curl -X POST http://localhost:8001/api/translate/dictionaries \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Product Terms",
    "target_language": "ru",
    "entries": [
      {"source_term": "invoice", "target_term": "накладная"},
      {"source_term": "widget", "target_term": "виджет"},
      {"source_term": "backorder", "target_term": "предзаказ"}
    ]
  }'

Expected: 201 Created with dictionary ID and entry count = 3.
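The [Import CSV] path can be sketched as a small helper that turns CSV text into the same request body shown above. This is a hypothetical client-side sketch (the helper name and a `source_term,target_term` CSV header are assumptions); the entry shape matches the curl example.

```python
import csv
import io
import json

def dictionary_payload_from_csv(name: str, language: str, csv_text: str) -> dict:
    """Build a POST /api/translate/dictionaries body from CSV text.

    Assumes a two-column CSV with a `source_term,target_term` header row.
    Rows missing either column are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = [
        {"source_term": row["source_term"], "target_term": row["target_term"]}
        for row in reader
        if row.get("source_term") and row.get("target_term")
    ]
    return {"name": name, "target_language": language, "entries": entries}

csv_text = "source_term,target_term\ninvoice,накладная\nwidget,виджет\n"
payload = dictionary_payload_from_csv("Product Terms", "ru", csv_text)
print(json.dumps(payload, ensure_ascii=False))
```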

3. Create a Translation Job

Via UI

  1. Navigate to http://localhost:5173/translate
  2. Click [+ New Translation Job]
  3. Select Superset datasource → columns auto-populate
  4. Set:
    • Translation column: product_name
    • Context columns: category_name, product_description
    • Key columns: product_id
    • Target table: products_i18n
    • Target column: translated_name
    • Target language: Russian
    • Attach dictionary: Product Terms
  5. Click [Save & Preview]

Via API

curl -X POST http://localhost:8001/api/translate/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Products RU Translation",
    "datasource_id": "<datasource-uuid>",
    "source_table": "products",
    "translation_col": "product_name",
    "context_cols": ["category_name", "product_description"],
    "source_key_cols": ["product_id"],
    "target_key_cols": ["product_id"],
    "target_table": "products_i18n",
    "target_col": "translated_name",
    "target_language": "ru",
    "batch_size": 50,
    "dictionary_ids": ["<dictionary-uuid>"]
  }'

Expected: 201 Created with job ID. Validation passes (columns exist, target table accessible).

Error cases: 422 if the translation column is empty; 400 if the target table is not found.
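The validation behavior can be mirrored client-side before posting. This is a minimal sketch (the helper and the key-column alignment check are assumptions, not the service's actual implementation); the real API performs these checks server-side and returns 422/400.

```python
def validate_job(job: dict) -> list[str]:
    """Collect obvious errors from a job payload before POSTing it.

    Mirrors the documented server-side checks: empty translation column
    (422) and missing target table (400).
    """
    errors = []
    if not job.get("translation_col"):
        errors.append("translation_col must not be empty")  # server returns 422
    if not job.get("target_table"):
        errors.append("target_table is required")           # server returns 400
    # Assumed invariant: source and target key columns map one-to-one.
    if len(job.get("source_key_cols", [])) != len(job.get("target_key_cols", [])):
        errors.append("source and target key columns must align")
    return errors

bad_job = {"translation_col": "", "target_table": "products_i18n",
           "source_key_cols": ["product_id"], "target_key_cols": ["product_id"]}
print(validate_job(bad_job))
```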

4. Preview Translations

Via UI

  1. Open the saved job → click [Preview]
  2. System shows ~10 rows with source, context, and LLM translation
  3. Approve good translations, edit or reject bad ones
  4. Click [Approve All] or handle individually

Via API

curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"sample_size": 10}'

Expected: 200 with array of PreviewRow objects (source_text, context, llm_translation, status=pending).

Error case: 503 if LLM provider unreachable; error message includes provider name and reason.
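Reviewing preview rows against an attached dictionary can be sketched as a glossary check: flag rows where a dictionary source term appears in the source text but its target term is missing from the LLM output. This is a hypothetical client-side check (the service itself injects dictionary terms into the LLM prompt); field names follow the PreviewRow shape above.

```python
def glossary_hits(preview_rows: list[dict], dictionary: list[dict]) -> list[tuple]:
    """Return (source_text, source_term) pairs where a glossary term was
    present in the source but its expected target term was not used."""
    misses = []
    for row in preview_rows:
        for entry in dictionary:
            if (entry["source_term"].lower() in row["source_text"].lower()
                    and entry["target_term"] not in row["llm_translation"]):
                misses.append((row["source_text"], entry["source_term"]))
    return misses

rows = [{"source_text": "Steel widget", "llm_translation": "Стальной виджет",
         "status": "pending"}]
terms = [{"source_term": "widget", "target_term": "виджет"}]
print(glossary_hits(rows, terms))  # no misses for this row
```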

5. Execute Full Translation Run

Via UI

  1. After preview approval, click [Start Full Run]
  2. Confirm cost estimate dialog
  3. Watch live progress bar (WebSocket-driven)
  4. On completion: view run summary with translation status, insert status, Superset query reference, and generated SQL (audit).

Via API

curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'

Expected: 202 Accepted with run ID. WebSocket messages stream progress. Final GET returns run with status=completed, translated_rows=N, insert_sql=<SQL>.

Partial failure: status=partial, failed_rows>0. [Retry Failed] available.
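The [Retry Failed] behavior can be illustrated as selecting only the failed records by their key columns, so a retry re-translates nothing that already succeeded. The record shape here is an assumption for illustration; only the status filtering is the point.

```python
def rows_to_retry(run_records: list[dict]) -> list[dict]:
    """Select key-column values for records whose translation failed,
    leaving successfully translated rows untouched on retry."""
    return [r["key"] for r in run_records if r["status"] == "failed"]

records = [
    {"key": {"product_id": 1}, "status": "translated"},
    {"key": {"product_id": 2}, "status": "failed"},
]
print(rows_to_retry(records))
```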

6. Execute INSERT through Superset SQL Lab API

Via UI

  1. After translation completes, the system automatically submits SQL to Superset
  2. Progress indicator shows: «📤 Submitting to Superset...»
  3. On success: «Insert succeeded · 1,241 rows affected · Query #a7f3b2c»
  4. Click [View SQL] to audit the generated statement

Via API

# Trigger full run (backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'

# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/<run-id> \
  -H "Authorization: Bearer <token>"

Expected: Run response includes insert_status: "succeeded", superset_query_id, rows_affected.

Insert failure: insert_status: "failed", insert_error_message populated. [Retry Insert] re-submits without re-translating.

Verify in Target Table

-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
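The generated INSERT shown under [View SQL] can be sketched as a multi-row VALUES statement built from key columns plus the translated column. This is a simplified illustration, not the service's actual SQL generator: quoting here just doubles single quotes, and a production implementation should use parameterized SQL.

```python
def build_insert_sql(table: str, key_cols: list[str], target_col: str,
                     rows: list[dict]) -> str:
    """Render a multi-row INSERT for the target table, one VALUES tuple
    per translated row (simplified escaping for illustration only)."""
    cols = ", ".join(key_cols + [target_col])

    def lit(v):
        # Numbers pass through; strings get single quotes doubled.
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    tuples = ",\n".join(
        "(" + ", ".join(lit(r[c]) for c in key_cols + [target_col]) + ")"
        for r in rows
    )
    return f"INSERT INTO {table} ({cols})\nVALUES\n{tuples};"

sql = build_insert_sql("products_i18n", ["product_id"], "translated_name",
                       [{"product_id": 1, "translated_name": "накладная"}])
print(sql)
```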

7. Feedback Loop — Correct a Translation

Via UI

  1. Open run results → find a mistranslated word
  2. Highlight the word → [Correct this term] popup
  3. Enter correction → select dictionary → submit
  4. Re-run preview to verify correction is used

Via API

curl -X POST http://localhost:8001/api/translate/corrections \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "record_id": "<record-uuid>",
    "source_term": "Monitor Stand",
    "incorrect_target_term": "Мониторная стойка",
    "corrected_target_term": "Подставка для монитора",
    "dictionary_id": "<dictionary-uuid>"
  }'

Expected: 201. Term pair added to dictionary. Conflict dialog if term already exists.
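The conflict behavior can be sketched in memory: a correction either adds a new term pair, is a no-op if the pair already matches, or conflicts when the source term maps to a different target (the case that triggers the conflict dialog). This is a hypothetical sketch of the described behavior, not the service's storage layer.

```python
def add_correction(dictionary: dict, source_term: str, corrected_target: str) -> str:
    """Apply a correction to a source→target term map.

    Returns 'added', 'unchanged', or 'conflict'; the UI shows a conflict
    dialog for the last case instead of silently overwriting.
    """
    existing = dictionary.get(source_term)
    if existing is None:
        dictionary[source_term] = corrected_target
        return "added"
    if existing == corrected_target:
        return "unchanged"
    return "conflict"

d = {"invoice": "накладная"}
print(add_correction(d, "Monitor Stand", "Подставка для монитора"))
print(add_correction(d, "invoice", "счёт"))
```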

8. Configure Schedule

Via UI

  1. Open job → Schedule tab
  2. Set type: Cron → 0 6 * * 1 (every Monday 06:00)
  3. Toggle auto-INSERT: ON
  4. Verify next 3 execution times
  5. Enable schedule

Via API

curl -X PUT http://localhost:8001/api/translate/jobs/<job-id>/schedule \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "schedule_type": "cron",
    "cron_expression": "0 6 * * 1",
    "timezone": "Europe/Moscow",
    "concurrency": "skip"
  }'

Expected: 200 with schedule config including next_run_at.

Verify: Check APScheduler jobs (backend log) or wait for next trigger and check run history.
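The "next 3 execution times" preview for `0 6 * * 1` can be checked by hand with stdlib datetime. This is an illustrative calculation only (naive datetimes, timezone handling omitted); the backend uses APScheduler's cron trigger for the real schedule.

```python
from datetime import datetime, timedelta

def next_mondays_at_6(now: datetime, count: int = 3) -> list[datetime]:
    """Next `count` fire times for cron `0 6 * * 1` (Mondays at 06:00),
    strictly after `now`."""
    candidate = now.replace(hour=6, minute=0, second=0, microsecond=0)
    # Advance day by day until we land on a Monday 06:00 after `now`.
    while candidate.weekday() != 0 or candidate <= now:
        candidate += timedelta(days=1)
    return [candidate + timedelta(weeks=i) for i in range(count)]

# 2026-05-08 is a Friday, so the next fire time is Monday 2026-05-11 06:00.
runs = next_mondays_at_6(datetime(2026, 5, 8, 18, 0))
print(runs)
```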

9. View History and Metrics

Via UI

  1. Navigate to http://localhost:5173/translate/history
  2. Filter by datasource, target table, or date range
  3. Click a run for details: config snapshot, prompt, translations, INSERT SQL

Via API

# List runs
curl "http://localhost:8001/api/translate/runs?job_id=<job-id>" \
  -H "Authorization: Bearer <token>"

# Get metrics
curl http://localhost:8001/api/translate/jobs/<job-id>/metrics \
  -H "Authorization: Bearer <token>"

Expected: Run list with status and row counts. Metrics with cumulative tokens and cost.
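The cumulative metrics can be thought of as a roll-up over the run list. The field names below (`translated_rows`, `tokens_used`, `cost_usd`) are assumptions based on the responses shown earlier, not a documented schema; the sketch only illustrates the aggregation.

```python
def aggregate_metrics(runs: list[dict]) -> dict:
    """Roll up run counts, rows, tokens, and cost across a job's runs."""
    return {
        "runs": len(runs),
        "translated_rows": sum(r.get("translated_rows", 0) for r in runs),
        "total_tokens": sum(r.get("tokens_used", 0) for r in runs),
        "total_cost_usd": round(sum(r.get("cost_usd", 0.0) for r in runs), 4),
    }

runs = [
    {"status": "completed", "translated_rows": 1241, "tokens_used": 98000, "cost_usd": 0.49},
    {"status": "partial", "translated_rows": 37, "tokens_used": 3100, "cost_usd": 0.02},
]
print(aggregate_metrics(runs))
```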

10. Verification Checklist

Backend Tests

cd backend
source .venv/bin/activate

# Unit tests for translation plugin
pytest src/plugins/translate/__tests__/ -v

# Integration tests for translate API
pytest tests/test_translate_api.py -v

# All backend tests
pytest -v

Frontend Tests

cd frontend
npm run test -- --run

Linting

# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py

# Svelte
cd frontend && npm run build  # build includes type checking

Manual Smoke Test

  1. Create dictionary with 3 terms → verify in list
  2. Import CSV with 50 terms → verify no duplicates (check conflict dialog)
  3. Create job → verify column list populates from datasource
  4. Preview with empty dictionary → verify LLM still translates
  5. Preview with attached dictionary → verify glossary terms used (check invoice → накладная)
  6. Full run with 50 rows → verify INSERT SQL has 50 VALUES tuples
  7. Scheduled run (set to every 5 min for test) → verify run appears in history
  8. Feedback loop: correct 1 term → re-preview → verify correction reflected
  9. Delete dictionary attached to active job → verify blocked
  10. Check metrics dashboard → verify run counts and token totals
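The batch_size: 50 from the job payload (and smoke-test item 6's "50 VALUES tuples") amounts to simple chunking of the source rows into LLM request batches. A minimal sketch, assuming rows arrive as a flat list:

```python
def batches(rows: list, batch_size: int = 50):
    """Yield consecutive chunks of at most batch_size rows; each chunk
    becomes one LLM translation request."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

chunks = list(batches(list(range(120)), 50))
print([len(c) for c in chunks])
```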