# Research: LLM Table Translation Service

**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08
## R1: Plugin Placement — New Plugin vs. Extending LLMAnalysisPlugin

### Decision

Create a new standalone plugin `TranslationPlugin` at `backend/src/plugins/translate/` rather than extending the existing `LLMAnalysisPlugin`.

### Rationale

- `LLMAnalysisPlugin` is focused on dashboard validation and documentation generation — a different domain from batch table translation.
- Translation requires new ORM models (`TranslationJob`, `TranslationRun`, `TranslationRecord`, `TerminologyDictionary`, `TranslationSchedule`, `TranslationEvent`), new API routes (`/api/translate/*`), new Svelte components, and new scheduler integration — scope that warrants a dedicated plugin.
- The existing plugin system (`PluginBase`) already supports multiple independent plugins and lazy discovery via `plugin_loader.py`.
- Separation avoids further bloating `llm_analysis/plugin.py` (already 481 lines, beyond the fractal limit of INV_7: <400 lines per module) and keeps the new code within that limit.
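As a rough sketch of what the decision implies (the `PluginBase` hook names and attributes below are assumptions, so a stand-in base class is defined to keep the example self-contained):

```python
# Hypothetical sketch: the real PluginBase lives in the plugin system and its
# exact hook names may differ. A stand-in base class keeps this example runnable.
class PluginBase:
    name: str = "base"

    def register(self) -> None: ...
    def unregister(self) -> None: ...


class TranslationPlugin(PluginBase):
    """Standalone plugin for batch table translation (assumed shape)."""

    name = "translate"

    def __init__(self) -> None:
        self.routes_prefix = "/api/translate"  # matches the planned route module
        self.registered = False

    def register(self) -> None:
        # Real code would mount routes, ORM models, and scheduler hooks here.
        self.registered = True

    def unregister(self) -> None:
        self.registered = False


plugin = TranslationPlugin()
plugin.register()
```

The point of the sketch is the boundary: the plugin owns its own routes, models, and lifecycle, so nothing leaks into `llm_analysis/plugin.py`.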

### Alternatives Considered

- **Extend LLMAnalysisPlugin**: Rejected because it would conflate two distinct feature domains, increase module size beyond the fractal limit, and complicate RBAC permission boundaries.
- **Create as a standalone service in `backend/src/services/translate/`**: Rejected because the plugin lifecycle (register, unregister, configuration persistence, API exposure) is already standardized via `PluginBase`. A standalone service would duplicate plugin machinery.

### Impact

- New directory: `backend/src/plugins/translate/` with `plugin.py`, `orchestrator.py`, `preview.py`, `executor.py`, `dictionary.py`, `sql_generator.py`, `scheduler.py`, `events.py`, `metrics.py`, `__tests__/`.
- New route module: `backend/src/api/routes/translate.py`.
- New model module: `backend/src/models/translate.py`.
- New schema module: `backend/src/schemas/translate.py`.
- Registered in the `backend/src/api/routes/__init__.py` `__all__` list.

---

## R2: LLM Prompt Construction Strategy

### Decision

Construct prompts using a layered template approach: base system prompt → dictionary glossary (per-batch filtered) → context columns → translation column values. Leverage the existing `llm_prompt_templates.py` for template rendering and `LLMProviderService` for provider selection.

### Rationale

- The existing `llm_prompt_templates.py` already supports `render_prompt()` with Jinja2-like substitution and multimodal detection.
- Per-batch dictionary filtering (FR-044): scan batch rows for substring matches against dictionary `source_term` values; only include matched entries in the prompt. This keeps token usage proportional to batch content, not dictionary size.
- Context columns are appended as structured fields (e.g., `Category: {category_name}\nDescription: {product_description}`) before the translation column value.
- The system prompt explicitly instructs the LLM: "Use the provided glossary for exact matches. For partial matches, prefer glossary translations. For terms not in the glossary, translate naturally."
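The per-batch filtering step above can be sketched as follows (the free-function form and the entry shape `{"source_term": ..., "target_term": ...}` are assumptions; the real logic would live on `DictionaryManager`):

```python
# Sketch of the per-batch glossary filter described above (FR-044).
# Entry shape and case-insensitive matching are assumptions.

def filter_for_batch(rows: list[str], dictionary: list[dict]) -> list[dict]:
    """Return only dictionary entries whose source_term appears in any batch row.

    Substring matching keeps the prompt glossary proportional to batch
    content rather than total dictionary size.
    """
    lowered_rows = [row.lower() for row in rows]
    return [
        entry
        for entry in dictionary
        if any(entry["source_term"].lower() in row for row in lowered_rows)
    ]


dictionary = [
    {"source_term": "ledger", "target_term": "Hauptbuch"},
    {"source_term": "invoice", "target_term": "Rechnung"},
]
batch = ["Quarterly ledger summary", "Shipment manifest"]
matched = filter_for_batch(batch, dictionary)
# → only the "ledger" entry matches this batch
```

A 5000-term dictionary thus contributes only the handful of matched entries to each prompt, bounding token cost per batch.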

### Alternatives Considered

- **Full dictionary injection**: Rejected — would exceed the LLM context window for dictionaries >5000 terms.
- **Semantic embedding search**: Rejected — adds unnecessary complexity (a vector DB dependency) when substring matching is sufficient for glossary use cases.
- **Separate LLM call for glossary matching**: Rejected — doubles API cost and latency without a proportional quality gain.

### Impact

- `DictionaryManager` must implement `filter_for_batch(rows: list[str]) -> list[dict]` returning matched entries.
- The prompt template includes `{{ glossary }}` and `{{ context }}` placeholder blocks.

---

## R3: SQL Generation — Dialect-Aware INSERT/UPSERT for Superset API

### Decision

Detect the target database dialect from the Superset datasource's connection configuration at job save time. Generate dialect-appropriate safe SQL: `INSERT INTO ... VALUES (...)` for ClickHouse; `INSERT INTO ... VALUES (...)` or `INSERT ... ON CONFLICT ...` for PostgreSQL/Greenplum. Submit generated SQL to Superset via `/api/v1/sqllab/execute/`.

### Rationale

- Different databases use different UPSERT syntax: PostgreSQL has `ON CONFLICT`; ClickHouse has no standard UPSERT (use INSERT with deduplication or `ALTER TABLE ... UPDATE`). Greenplum is PostgreSQL-compatible.
- Superset knows the database backend via the connection's `backend`/`engine` field — the system queries this at configuration time and caches the dialect on the `TranslationJob`.
- For ClickHouse, the `insert` strategy generates a plain INSERT; `skip_existing` is not natively supported (the system warns the user); `overwrite` uses `ALTER TABLE ... UPDATE` or INSERT with ReplacingMergeTree semantics (a documented limitation).
- For PostgreSQL/Greenplum, full UPSERT support: `ON CONFLICT DO NOTHING` (skip_existing) and `ON CONFLICT DO UPDATE` (overwrite).
- Identifier quoting: PostgreSQL/Greenplum use `"identifier"`; ClickHouse uses `` `identifier` `` or `"identifier"` depending on settings.
- Values are safely encoded per dialect: strings escaped, NULLs rendered as `NULL`.
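A minimal sketch of the dialect dispatch, under the assumption that `SQLGenerator` reduces to helper functions like these (the names `quote_ident`, `quote_value`, and `build_insert` are illustrative, not the real module's API):

```python
# Sketch of dialect-aware INSERT/UPSERT generation as described above.
# For brevity, all values are rendered as quoted strings; a real
# implementation would render numerics unquoted and handle more types.

def quote_ident(name: str, dialect: str) -> str:
    """Quote an identifier: backticks for ClickHouse, double quotes otherwise."""
    if dialect == "clickhouse":
        return f"`{name}`"
    return '"' + name.replace('"', '""') + '"'


def quote_value(value) -> str:
    """Render a value as a SQL literal: escape single quotes, map None to NULL."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table: str, row: dict, dialect: str, strategy: str = "insert") -> str:
    cols = ", ".join(quote_ident(c, dialect) for c in row)
    vals = ", ".join(quote_value(v) for v in row.values())
    sql = f"INSERT INTO {quote_ident(table, dialect)} ({cols}) VALUES ({vals})"
    if dialect in ("postgresql", "greenplum") and strategy == "skip_existing":
        sql += " ON CONFLICT DO NOTHING"
    # ClickHouse has no native skip_existing; the caller warns the user instead.
    return sql


stmt = build_insert(
    "product_i18n",
    {"product_id": 7, "name_de": "Stuhl"},
    dialect="postgresql",
    strategy="skip_existing",
)
# → INSERT INTO "product_i18n" ("product_id", "name_de")
#   VALUES ('7', 'Stuhl') ON CONFLICT DO NOTHING
```

Caching the dialect on the job means this dispatch is decided once at save time, not re-queried from Superset on every run.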

### Alternatives Considered

- **PostgreSQL-only**: Rejected — the user's Superset instances may use ClickHouse as the primary analytical database. Dialect detection from the connection is the correct source of truth.
- **Manual SQL Lab copy/paste**: Rejected — Superset API execution is the canonical path.
- **UPDATE statements**: Rejected — source data is append-only (new-key-only strategy); UPSERT covers the overwrite case.

### Impact

- A `TranslationJob.database_dialect` field caches the detected dialect at save time.
- `SQLGenerator` dispatches to dialect-specific formatters.
- Dialect-specific SQL syntax tests are required for PostgreSQL and ClickHouse (SC-003).
- Unsupported dialects are rejected at configuration time with a clear error message.

---

## R4: Schedule Execution — APScheduler Integration

### Decision

Extend the existing `SchedulerService` (`backend/src/core/scheduler.py`) with a new job type, `translate_scheduled_run`. Each translation job's schedule configuration is stored in the `TranslationSchedule` model and loaded into APScheduler on service start and on schedule create/update.

### Rationale

- The existing `SchedulerService` already manages `BackgroundScheduler`, cron triggers, start/stop lifecycle, and task manager integration.
- Translation schedules are distinct from backup schedules — stored in `translate` models, loaded via a registration callback pattern.
- Schedule trigger: APScheduler fires → `run_scheduled_translation(job_id)` → creates a `TranslationRun` → the orchestrator processes new-key-only rows → generates INSERT statements.
- The concurrency policy (skip/queue) is enforced in the trigger handler before orchestrator invocation.
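The loading flow can be sketched as below. A stand-in scheduler keeps the example self-contained; in the real system the call would go through `SchedulerService` wrapping APScheduler, roughly `scheduler.add_job(run_scheduled_translation, CronTrigger.from_crontab(expr), id=...)`. The dataclass fields and job-id scheme are assumptions:

```python
# Sketch of loading TranslationSchedule rows into the scheduler on startup
# and on create/update. FakeScheduler stands in for SchedulerService.
from dataclasses import dataclass


@dataclass
class TranslationSchedule:
    job_id: str
    cron: str
    enabled: bool = True
    concurrency_policy: str = "skip"  # or "queue"


class FakeScheduler:
    """Stand-in for SchedulerService; records registrations by job id."""

    def __init__(self) -> None:
        self.jobs: dict[str, str] = {}

    def add_job(self, job_id: str, cron: str) -> None:
        self.jobs[job_id] = cron


def register_translation_schedules(schedules, scheduler) -> int:
    """Load enabled schedules into the scheduler; return how many were registered."""
    count = 0
    for sched in schedules:
        if not sched.enabled:
            continue  # disabled schedules stay out of APScheduler entirely
        scheduler.add_job(f"translate_scheduled_run:{sched.job_id}", sched.cron)
        count += 1
    return count


scheduler = FakeScheduler()
registered = register_translation_schedules(
    [
        TranslationSchedule(job_id="job-1", cron="0 2 * * *"),
        TranslationSchedule(job_id="job-2", cron="*/30 * * * *", enabled=False),
    ],
    scheduler,
)
# → registered == 1; only the enabled schedule is loaded
```

Namespacing job ids (`translate_scheduled_run:<job_id>`) lets schedule updates replace the existing APScheduler entry rather than accumulating duplicates.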

### Alternatives Considered

- **Separate scheduler instance**: Rejected — creates resource contention (two APScheduler instances) and complicates Docker deployment.
- **Celery/Redis-based scheduling**: Rejected — adds an infrastructure dependency; APScheduler is already proven in this codebase.
- **Cron-based external scheduling**: Rejected — requires OS-level cron configuration and loses programmatic control over pause/resume and concurrency policies.

### Impact

- `backend/src/plugins/translate/scheduler.py` registers translation job schedules with `SchedulerService`.
- A new trigger function `_execute_scheduled_translation(job_id: str)` is imported by `SchedulerService`.
- The existing `SchedulerService.load_schedules()` is extended to discover and register translation schedules alongside backup schedules.

---

## R5: Observability — Structured Event Log + MetricSnapshot

### Decision

Implement a dedicated `TranslationEvent` ORM model with a type-specific JSON payload for structured event logging (FR-046). Events are written synchronously within the orchestrator flow. Per-job cumulative metrics (FR-047) are computed from live `TranslationEvent` rows (recent data, <90 days) combined with `MetricSnapshot` rows (historical data, >90 days). At pruning time, a `MetricSnapshot` is persisted capturing cumulative tokens, cost, and run counts before events are deleted.

### Rationale

- Structured events provide queryability for audit, trend analysis, and the admin dashboard without coupling to log-parsing infrastructure.
- The JSON payload allows type-specific data.
- `MetricSnapshot` persistence before pruning ensures cumulative metrics survive the 90-day retention window (SC-014).
- The metrics dashboard reads: latest `MetricSnapshot` + events `WHERE timestamp > snapshot.covers_events_before`.
- Synchronous event writes within the run transaction ensure no event loss during crashes.
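The snapshot-plus-live-events aggregation can be sketched as follows (the field names `covers_events_before`, `tokens`, and `cost` are assumptions about the ORM models, mirrored here as dataclasses):

```python
# Sketch of combining the latest MetricSnapshot with post-cutoff events.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MetricSnapshot:
    covers_events_before: datetime
    total_tokens: int
    total_cost: float


@dataclass
class TranslationEvent:
    timestamp: datetime
    tokens: int
    cost: float


def cumulative_metrics(snapshot: MetricSnapshot, events: list[TranslationEvent]) -> dict:
    """Combine the latest snapshot with events newer than its cutoff.

    Events at or before covers_events_before are already folded into the
    snapshot (and may have been pruned), so they must not be double-counted.
    """
    live = [e for e in events if e.timestamp > snapshot.covers_events_before]
    return {
        "total_tokens": snapshot.total_tokens + sum(e.tokens for e in live),
        "total_cost": snapshot.total_cost + sum(e.cost for e in live),
    }


snap = MetricSnapshot(datetime(2026, 2, 1), total_tokens=10_000, total_cost=1.50)
events = [
    TranslationEvent(datetime(2026, 1, 20), tokens=500, cost=0.05),  # pre-cutoff: ignored
    TranslationEvent(datetime(2026, 3, 1), tokens=2_000, cost=0.25),
]
metrics = cumulative_metrics(snap, events)
# → {"total_tokens": 12000, "total_cost": 1.75}
```

The strict `>` on the cutoff is what makes the daily "persist snapshot, then prune" job safe: no event is counted in both the snapshot and the live window.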

### Alternatives Considered

- **Application log (stdout) only**: Rejected — not queryable for dashboards or audit.
- **Separate metrics table with counters (dual-write)**: Rejected — dual-write consistency risk; event-sourced + snapshot is simpler.
- **Events-only (no snapshots)**: Rejected — cumulative metrics would be lost after 90-day pruning (FR-049).

### Impact

- `TranslationEvent` model with a nullable `run_id` for pre-run events.
- `MetricSnapshot` model with a `covers_events_before` timestamp for a correct cutoff.
- `MetricsService` aggregates metrics from events plus snapshots.
- An APScheduler daily job persists a snapshot, then prunes expired events/records.

---

## R6: Frontend Architecture — Svelte 5 Runes Pattern

### Decision

Use Svelte 5 runes (`$state`, `$derived`, `$effect`) for all reactive state management in translation components. The store layer uses a dedicated `translate.js` Svelte store module with `$state` runes for the job list, current job config, preview state, and run progress. API calls use the existing `requestApi`/`fetchApi` wrapper pattern.

### Rationale

- Svelte 5 runes are the canonical reactivity model for this codebase (Svelte 5.43+ in package.json).
- A dedicated store per feature domain follows existing patterns (`frontend/src/lib/stores/` houses the auth, settings, and task stores).
- WebSocket run progress reuses the existing `TaskManager` WebSocket infrastructure — translation runs emit progress events on the same channel.
- Components follow existing layout patterns: Tailwind CSS, `@UX_STATE`/`@UX_FEEDBACK`/`@UX_RECOVERY`/`@UX_REACTIVITY` contract tags.

### Alternatives Considered

- **Svelte 4 stores (writable/derived)**: Rejected — the codebase has already migrated to Svelte 5 runes; mixing patterns creates inconsistency.
- **Separate WebSocket channel**: Rejected — the existing Task Drawer WebSocket infrastructure handles progress events generically; translation runs fit the same pattern.

### Impact

- New SvelteKit route: `frontend/src/routes/translate/` with sub-routes for job config, dictionaries, and history.
- New component library: `frontend/src/lib/components/translate/` with 8 components.
- New store: `frontend/src/lib/stores/translate.js` with `$state` runes.
- New API client: `frontend/src/lib/api/translate.js`.

---

## R7: RBAC Permission Model Integration

### Decision

Define 13 permission strings per the Access Control Matrix in spec.md and enforce them via the existing `PermissionChecker` dependency in FastAPI route handlers. No new database tables are needed — the existing `permissions` and `role_permissions` tables store string-based permissions.

### Rationale

- The existing RBAC model stores permissions as strings in the `role_permissions.permission` column, checked via dependency injection in route handlers.
- Granular permissions per resource type align with the existing pattern.
- Ownership constraints (owner OR admin) are enforced in route handlers alongside permission checks.
- Missing from the original design: `translate.job.view`, `translate.dictionary.view`, `translate.schedule.view`, and `translate.metrics.view` were added for read-only access scenarios.
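A minimal sketch of string-based permission checking. In the real code this would be a FastAPI dependency, roughly `Depends(require_permission("translate.job.view"))`; the checker shape and the `PermissionDenied` exception below are assumptions, while the permission strings come from this section:

```python
# Sketch of a require_permission factory as used via dependency injection.
class PermissionDenied(Exception):
    pass


def require_permission(permission: str):
    """Return a checker that raises unless the caller's role grants the permission."""

    def checker(user_permissions: set[str]) -> None:
        if permission not in user_permissions:
            raise PermissionDenied(permission)

    return checker


# Default analyst grants, per the Impact list for this section.
ANALYST = {
    "translate.job.view",
    "translate.job.execute",
    "translate.dictionary.view",
    "translate.history.view",
}

require_permission("translate.job.view")(ANALYST)  # passes silently

try:
    require_permission("translate.job.delete")(ANALYST)
    denied = False
except PermissionDenied:
    denied = True
# → denied is True: analysts cannot delete jobs
```

Because permissions are plain strings, adding the four read-only permissions is a seed-data change, not a schema migration.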
### Alternatives Considered

- **Resource-level ownership only (no granular permissions)**: Rejected — the spec explicitly requires granular permissions (FR-043).
- **Separate permission table per resource**: Rejected — over-engineered; string-based permissions are sufficient.

### Impact

- 13 permission strings registered in the RBAC seed.
- Route handlers annotated with `Depends(require_permission(...))` plus ownership checks.
- The admin UI displays the new permission strings for role assignment.
- Default analyst role: `translate.job.view`, `translate.job.execute`, `translate.dictionary.view`, `translate.history.view`.

---

## R8: Testing Strategy

### Decision

Multi-layer testing: (1) pytest unit tests for the orchestrator, executor, dictionary manager, SQL generator, scheduler, and event log; (2) pytest integration tests for API routes against a test database; (3) vitest component tests for Svelte components using @testing-library/svelte; (4) manual verification via `quickstart.md` for the end-to-end flow.

### Rationale

- Unit tests with mocked LLM responses and a mocked Superset client ensure fast feedback for business logic.
- Integration tests verify the API contract, database schema, RBAC enforcement, and schedule trigger behavior.
- Component tests validate Svelte 5 rune reactivity, UX state transitions, and error recovery paths.
- The manual quickstart provides a human-verifiable happy path that catches integration issues between backend and frontend.
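Layer (1) can be sketched as below, with `unittest.mock` standing in for the project's pytest fixtures. The `translate_batch` function is a toy stand-in for the orchestrator's batch step, not the real API:

```python
# Sketch of a unit test with a mocked LLM provider (layer 1 above).
from unittest.mock import Mock


def translate_batch(rows: list[str], llm) -> list[str]:
    """Toy stand-in for the orchestrator's batch step: one LLM call per batch."""
    response = llm.complete("\n".join(rows))
    return response.split("\n")


mock_llm = Mock()
mock_llm.complete.return_value = "Stuhl\nTisch"

result = translate_batch(["chair", "table"], mock_llm)
mock_llm.complete.assert_called_once_with("chair\ntable")
# → result == ["Stuhl", "Tisch"]
```

Asserting on the prompt passed to the mock (not just the parsed output) is what keeps these tests fast while still pinning the orchestrator's contract with the provider.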

### Alternatives Considered

- **E2E tests with Playwright**: Deferred to a future iteration — adds maintenance overhead; manual quickstart verification is sufficient for initial delivery.

### Impact

- Test files: `backend/src/plugins/translate/__tests__/`, `backend/tests/test_translate_api.py`, `frontend/src/lib/components/translate/__tests__/`.
- Fixtures: mock LLM provider responses, a mock Superset client, test dictionary data, and test translation job configuration.