# Quickstart: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08
## Prerequisites
- Running ss-tools instance (Docker Compose or local)
- Superset connection configured in ss-tools settings
- At least one LLM provider configured (Settings → LLM)
- Target insertable PostgreSQL physical table exists in Superset with compatible schema
- User has appropriate RBAC permissions (admin by default)
## 1. Start the Application
```bash
# Docker (recommended)
cd <path-to>/ss-tools
docker compose up --build

# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001

# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173
```
- Frontend: http://localhost:5173
- Backend API: http://localhost:8001
- API Docs: http://localhost:8001/docs
## 2. Create a Terminology Dictionary
### Via UI
1. Navigate to http://localhost:5173/translate/dictionaries
2. Click **[+ New Dictionary]**
3. Enter name: `Product Terms`, language: `ru`
4. Add entries inline or click **[Import CSV]**
5. Save
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/dictionaries \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Product Terms",
    "target_language": "ru",
    "entries": [
      {"source_term": "invoice", "target_term": "накладная"},
      {"source_term": "widget", "target_term": "виджет"},
      {"source_term": "backorder", "target_term": "предзаказ"}
    ]
  }'
```
**Expected**: 201 Created with dictionary ID and entry count = 3.
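The CSV import from the UI flow can also be scripted. A minimal sketch, assuming a two-column CSV laid out as `source_term,target_term` (the helper name is ours; the payload shape mirrors the request body above):

```python
import csv
import io
import json

def csv_to_dictionary_payload(csv_text: str, name: str, target_language: str) -> str:
    """Convert a two-column CSV (source_term,target_term) into the
    JSON body shown above for POST /api/translate/dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = [
        {"source_term": row["source_term"], "target_term": row["target_term"]}
        for row in reader
    ]
    return json.dumps(
        {"name": name, "target_language": target_language, "entries": entries},
        ensure_ascii=False,
    )

sample = "source_term,target_term\ninvoice,накладная\nwidget,виджет\n"
payload = csv_to_dictionary_payload(sample, "Product Terms", "ru")
```

Pipe the result into the `curl -d @-` form, or send it with any HTTP client.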
## 3. Create a Translation Job
### Via UI
1. Navigate to http://localhost:5173/translate
2. Click **[+ New Translation Job]**
3. Select Superset datasource → columns auto-populate
4. Set:
- Translation column: `product_name`
- Context columns: `category_name`, `product_description`
- Key columns: `product_id`
- Target table: `products_i18n`
- Target column: `translated_name`
- Target language: `Russian`
- Attach dictionary: `Product Terms`
5. Click **[Save & Preview]**
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Products RU Translation",
    "datasource_id": "<datasource-uuid>",
    "source_table": "products",
    "translation_col": "product_name",
    "context_cols": ["category_name", "product_description"],
    "source_key_cols": ["product_id"],
    "target_key_cols": ["product_id"],
    "target_table": "products_i18n",
    "target_col": "translated_name",
    "target_language": "ru",
    "batch_size": 50,
    "dictionary_ids": ["<dictionary-uuid>"]
  }'
```
**Expected**: 201 Created with job ID. Validation passes (columns exist, target table accessible).
**Error case**: 422 if translation column is empty; 400 if target table not found.
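The 422 case can be caught before the POST with a trivial pre-flight check; the 400 case (target table existence) can only be verified server-side. A sketch (field names mirror the request body above; the function name is ours):

```python
def preflight_errors(job: dict) -> list[str]:
    """Return the client-side-detectable problems the API would reject.
    Target-table existence (the 400 case) still requires the server."""
    errors = []
    if not job.get("translation_col"):
        errors.append("translation_col must not be empty (API returns 422)")
    if not job.get("source_key_cols"):
        errors.append("source_key_cols must not be empty")
    if not job.get("target_table"):
        errors.append("target_table must be set")
    return errors
```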
## 4. Preview Translations
### Via UI
1. Open the saved job → click **[Preview]**
2. System shows ~10 rows with source, context, and LLM translation
3. Approve good translations, edit or reject bad ones
4. Click **[Approve All]** or handle individually
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"sample_size": 10}'
```
**Expected**: 200 with array of PreviewRow objects (source_text, context, llm_translation, status=pending).
**Error case**: 503 if LLM provider unreachable; error message includes provider name and reason.
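Scripted review of the preview response might partition rows before bulk approval. A sketch: the field names `source_text`, `llm_translation`, and `status` come from the PreviewRow description above, while the flagging heuristic is purely illustrative:

```python
def partition_preview(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split PreviewRow objects into (looks_ok, needs_review).
    Illustrative heuristic only: flag rows whose translation is empty
    or identical to the source text."""
    ok, review = [], []
    for row in rows:
        translation = row.get("llm_translation", "")
        if not translation or translation == row.get("source_text"):
            review.append(row)
        else:
            ok.append(row)
    return ok, review
```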
## 5. Execute Full Translation Run
### Via UI
1. After preview approval, click **[Start Full Run]**
2. Confirm cost estimate dialog
3. Watch live progress bar (WebSocket-driven)
4. On completion: view run summary with translation status, insert status, Superset query reference, and generated SQL (audit).
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'
```
**Expected**: 202 Accepted with run ID. WebSocket messages stream progress. Final GET returns run with `status=completed`, `translated_rows=N`, `insert_sql=<SQL>`.
**Partial failure**: `status=partial`, `failed_rows>0`. **[Retry Failed]** available.
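A client consuming the WebSocket stream only needs to turn each progress message into a display line. A sketch, assuming messages carry `translated_rows`, `total_rows`, and `status` fields (the exact message schema is not specified in this quickstart):

```python
def render_progress(msg: dict) -> str:
    """Format one assumed progress message as a CLI progress line."""
    done = msg.get("translated_rows", 0)
    total = msg.get("total_rows", 0)
    pct = (100 * done // total) if total else 0
    return f"[{msg.get('status', 'running')}] {done}/{total} rows ({pct}%)"
```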
## 6. Execute INSERT through Superset SQL Lab API
### Via UI
1. After translation completes, the system automatically submits SQL to Superset
2. Progress indicator shows: «📤 Submitting to Superset...»
3. On success: «✅ Insert succeeded · 1,241 rows affected · Query #a7f3b2c»
4. Click **[View SQL]** to audit the generated statement
### Via API
```bash
# Trigger full run (backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"upsert_strategy": "insert"}'

# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/<run-id> \
  -H "Authorization: Bearer <token>"
```
**Expected**: Run response includes `insert_status: "succeeded"`, `superset_query_id`, `rows_affected`.
**Insert failure**: `insert_status: "failed"`, `insert_error_message` populated. **[Retry Insert]** re-submits without re-translating.
### Verify in Target Table
```sql
-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
```
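The audited `insert_sql` is a single multi-row `INSERT ... VALUES` statement. A simplified sketch of how such a statement can be assembled; the real service presumably relies on driver-side quoting, and the naive escaping here is for illustration only:

```python
def build_insert_sql(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Build a multi-row INSERT ... VALUES statement, one tuple per row.
    Illustrative only: single quotes are escaped by doubling, which is
    no substitute for proper parameter binding in production code."""
    def lit(value) -> str:
        if value is None:
            return "NULL"
        return "'" + str(value).replace("'", "''") + "'"

    values = ",\n".join(
        "(" + ", ".join(lit(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)})\nVALUES\n{values};"

sql = build_insert_sql(
    "products_i18n",
    ["product_id", "translated_name"],
    [("1", "накладная"), ("2", "виджет")],
)
```

A run over 50 rows would produce 50 such VALUES tuples, matching the smoke-test expectation later in this document.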
## 7. Feedback Loop — Correct a Translation
### Via UI
1. Open run results → find a mistranslated word
2. Highlight the word → **[Correct this term]** popup
3. Enter correction → select dictionary → submit
4. Re-run preview to verify correction is used
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/corrections \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "record_id": "<record-uuid>",
    "source_term": "Monitor Stand",
    "incorrect_target_term": "Мониторная стойка",
    "corrected_target_term": "Подставка для монитора",
    "dictionary_id": "<dictionary-uuid>"
  }'
```
**Expected**: 201. Term pair added to dictionary. Conflict dialog if term already exists.
## 8. Configure Schedule
### Via UI
1. Open job → **Schedule** tab
2. Set type: Cron → `0 6 * * 1` (every Monday 06:00)
3. Toggle auto-INSERT: ON
4. Verify next 3 execution times
5. Enable schedule
### Via API
```bash
curl -X PUT http://localhost:8001/api/translate/jobs/<job-id>/schedule \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "schedule_type": "cron",
    "cron_expression": "0 6 * * 1",
    "timezone": "Europe/Moscow",
    "concurrency": "skip"
  }'
```
**Expected**: 200 with schedule config including `next_run_at`.
**Verify**: Check APScheduler jobs (backend log) or wait for next trigger and check run history.
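The "next 3 execution times" for `0 6 * * 1` can be cross-checked by hand with the standard library. A sketch that handles this one expression only (Mondays at 06:00), not general cron parsing:

```python
from datetime import datetime, timedelta

def next_monday_six(start: datetime, count: int = 3) -> list[datetime]:
    """Next `count` fire times for cron `0 6 * * 1` (Mondays at 06:00),
    strictly after `start`. In Python, Monday is weekday() == 0."""
    runs = []
    candidate = start.replace(hour=6, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        if candidate.weekday() == 0 and candidate > start:
            runs.append(candidate)
        candidate += timedelta(days=1)
    return runs
```

Compare the output against the "next 3 execution times" shown in the Schedule tab.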
## 9. View History and Metrics
### Via UI
1. Navigate to http://localhost:5173/translate/history
2. Filter by datasource, target table, or date range
3. Click a run for details: config snapshot, prompt, translations, INSERT SQL
### Via API
```bash
# List runs (quote the URL so the shell does not interpret ? and <>)
curl "http://localhost:8001/api/translate/runs?job_id=<job-id>" \
  -H "Authorization: Bearer <token>"

# Get metrics
curl http://localhost:8001/api/translate/jobs/<job-id>/metrics \
  -H "Authorization: Bearer <token>"
```
**Expected**: Run list with status and row counts. Metrics with cumulative tokens and cost.
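Cumulative cost in the metrics response is a sum over runs. A sketch of the arithmetic, assuming per-run `prompt_tokens` and `completion_tokens` fields and per-million-token prices (the price figures in the test are placeholders, not the service's actual rates):

```python
def cumulative_cost(runs: list[dict], price_in_per_m: float, price_out_per_m: float) -> float:
    """Total cost across runs, priced per million input/output tokens."""
    total = 0.0
    for run in runs:
        total += run.get("prompt_tokens", 0) / 1e6 * price_in_per_m
        total += run.get("completion_tokens", 0) / 1e6 * price_out_per_m
    return round(total, 4)
```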
## 10. Verification Checklist
### Backend Tests
```bash
cd backend
source .venv/bin/activate
# Unit tests for translation plugin
pytest src/plugins/translate/__tests__/ -v
# Integration tests for translate API
pytest tests/test_translate_api.py -v
# All backend tests
pytest -v
```
### Frontend Tests
```bash
cd frontend
npm run test -- --run
```
### Linting
```bash
# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py
# Svelte
cd frontend && npm run build # build includes type checking
```
### Manual Smoke Test
1. Create dictionary with 3 terms → verify in list
2. Import CSV with 50 terms → verify no duplicates (check conflict dialog)
3. Create job → verify column list populates from datasource
4. Preview with empty dictionary → verify LLM still translates
5. Preview with attached dictionary → verify glossary terms used (check `invoice` → `накладная`)
6. Full run with 50 rows → verify INSERT SQL has 50 VALUES tuples
7. Scheduled run (set to every 5 min for test) → verify run appears in history
8. Feedback loop: correct 1 term → re-preview → verify correction reflected
9. Delete dictionary attached to active job → verify blocked
10. Check metrics dashboard → verify run counts and token totals