tasks ready

2026-05-08 18:01:49 +03:00
parent d8df1fff59
commit bdd376595c
32 changed files with 3243 additions and 229 deletions

View File

@@ -534,7 +534,7 @@ embedding: null
http_api:
http_enabled: true
http_host: 127.0.0.1
http_port: 8420
http_port: 8421
http_api_key: '123'
doc_mode: null
doc_tag_mapping: null

View File

@@ -0,0 +1,30 @@
# ss-tools Development Guidelines
Auto-generated from all feature plans. Last updated: 2026-05-08
## Active Technologies
- Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5) + FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend) (028-llm-datasource-supeset)
## Project Structure
```text
backend/
frontend/
tests/
```
## Commands
cd backend && pytest
cd frontend && npm run test
ruff check backend/
## Code Style
Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5): Follow standard conventions
## Recent Changes
- 028-llm-datasource-supeset: Added Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5) + FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)
<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->

View File

@@ -1,4 +1,4 @@
---
description: read semantic protocol
---
MANDATORY USE `skill({name="semantics-core"})`, `skill({name="semantics-contracts"})`, `skill({name="semantics-belief"})`
MANDATORY USE `skill({name="semantics-core"})`, `skill({name="semantics-contracts"})`, `skill({name="semantics-belief"})`, `skill({name="semantics-frontend"})`

View File

@@ -1,5 +1,5 @@
---
description: Perform a read-only consistency analysis across spec.md, plan.md, tasks.md, and ADR sources for the active Rust MCP feature.
description: Perform a read-only consistency analysis across spec.md, plan.md, tasks.md, and ADR sources for the active Python/Svelte feature.
---
## User Input
@@ -66,7 +66,7 @@ Identify inconsistencies, ambiguities, coverage gaps, and decision-memory drift
## Analysis Rules
- Treat stale Python/Svelte assumptions in plan/tasks as real defects for this repository.
- Treat stale Rust/MCP assumptions in plan/tasks as real defects for this Python/Svelte repository.
- Treat missing ADR propagation as a real defect, not a documentation nit.
- Prefer repository-real expectations (`src/**/*.rs`, `tests/*.rs`, task-shaped MCP tools/resources, belief runtime, static semantic verification).
- Prefer repository-real expectations (`backend/src/**/*.py`, `frontend/src/**/*.svelte`, `backend/tests/`, `frontend/tests/`, `pytest`, `vitest`, `ruff check`, static semantic verification).
- Do not treat `.kilo/plans/*` as feature artifacts for consistency analysis.

View File

@@ -149,18 +149,19 @@ You **MUST** consider the user input before proceeding (if not empty).
- "Are error handling requirements defined for all API failure modes? [Gap]"
- "Are accessibility requirements specified for all interactive elements? [Completeness]"
- "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
- "Are WebSocket reconnection requirements defined for real-time features? [Gap]"
Clarity:
- "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
- "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
- "Are 'related dashboards' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
- "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
Consistency:
- "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
- "Are card component requirements consistent between landing and detail pages? [Consistency]"
- "Are component requirements consistent between list and detail views? [Consistency]"
Coverage:
- "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
- "Are requirements defined for zero-state scenarios (no dashboards)? [Coverage, Edge Case]"
- "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
- "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
@@ -171,7 +172,7 @@ You **MUST** consider the user input before proceeding (if not empty).
Decision Memory:
- "Do all repo-shaping technical choices have explicit rationale before tasks are generated? [Decision Memory, Plan]"
- "Are rejected alternatives documented for architectural branches that would materially change implementation scope? [Decision Memory, Gap]"
- "Can a coder determine from the planning artifacts which tempting shortcut is forbidden? [Decision Memory, Clarity]"
- "Can a developer determine from the planning artifacts which tempting shortcut is forbidden? [Decision Memory, Clarity]"
**Scenario Classification & Coverage** (Requirements Quality Focus):
- Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
@@ -188,8 +189,8 @@ You **MUST** consider the user input before proceeding (if not empty).
Ask questions about the requirements themselves:
- Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
- Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
- Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
- Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
- Assumptions: "Is the assumption of 'always available API' validated? [Assumption]"
- Dependencies: "Are external API requirements documented? [Dependency, Gap]"
- Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
- Decision-memory drift: "Do tasks inherit the same rejected-path guardrails defined in planning? [Decision Memory, Conflict]"
@@ -254,6 +255,7 @@ Sample items:
- "Are authentication requirements consistent across all endpoints? [Consistency]"
- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
- "Is versioning strategy documented in requirements? [Gap]"
- "Are WebSocket message schemas documented for all event types? [Completeness, Gap]"
**Performance Requirements Quality:** `performance.md`
@@ -289,20 +291,20 @@ Sample items:
**❌ WRONG - These test implementation, not requirements:**
```markdown
- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
- [ ] CHK001 - Verify landing page displays 3 dashboard cards [Spec §FR-001]
- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]
- [ ] CHK004 - Check that related dashboards section shows 3-5 items [Spec §FR-005]
```
**✅ CORRECT - These test requirements quality:**
```markdown
- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
- [ ] CHK001 - Are the number and layout of featured dashboards explicitly specified? [Completeness, Spec §FR-001]
- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
- [ ] CHK004 - Is the selection criteria for related dashboards documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous dashboard data? [Gap]
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
- [ ] CHK007 - Do planning artifacts state why the accepted architecture was chosen and which alternative is rejected? [Decision Memory, ADR]
```

View File

@@ -46,6 +46,7 @@ Execution steps:
- Critical user journeys / sequences
- Error/empty/loading states
- Accessibility or localization notes
- Svelte 5 runes reactivity expectations ($state, $derived, $effect, $props)
Non-Functional Quality Attributes:
- Performance (latency, throughput targets)
@@ -66,7 +67,7 @@ Execution steps:
- Conflict resolution (e.g., concurrent edits)
Constraints & Tradeoffs:
- Technical constraints (language, storage, hosting)
- Technical constraints (Python 3.13+, Svelte 5, PostgreSQL)
- Explicit tradeoffs or rejected alternatives
Terminology & Consistency:

View File

@@ -18,10 +18,11 @@ You **MUST** consider the user input before proceeding (if not empty).
You are updating the local constitution at `.specify/memory/constitution.md`. This file is the workflow-facing constitutional source for the repository and must align with:
- `.kilo/skills/semantics-core/SKILL.md`
- `.kilo/skills/semantics-contracts/SKILL.md`
- `.kilo/skills/semantics-belief/SKILL.md`
- `.kilo/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-core/SKILL.md`
- `.opencode/skills/semantics-contracts/SKILL.md`
- `.opencode/skills/semantics-belief/SKILL.md`
- `.opencode/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-frontend/SKILL.md`
- `README.md`
- `docs/SEMANTIC_PROTOCOL_COMPLIANCE.md`
- `docs/adr/*`
@@ -29,7 +30,7 @@ You are updating the local constitution at `.specify/memory/constitution.md`. Th
Execution flow:
1. Load the existing constitution at `.specify/memory/constitution.md`.
2. Identify placeholders, stale assumptions, or principles that conflict with the current Rust MCP repository.
2. Identify placeholders, stale assumptions, or principles that conflict with the current Python/Svelte repository.
3. Derive concrete constitutional text from user input and repository reality.
4. Version the constitution using semantic versioning:
- MAJOR: incompatible governance/principle change
@@ -42,11 +43,11 @@ Execution flow:
- `.specify/templates/tasks-template.md`
- `.specify/templates/test-docs-template.md`
- `.specify/templates/ux-reference-template.md`
- `.kilo/workflows/speckit.plan.md`
- `.kilo/workflows/speckit.tasks.md`
- `.kilo/workflows/speckit.implement.md`
- `.kilo/workflows/speckit.test.md`
- `.kilo/workflows/speckit.analyze.md`
- `.opencode/command/speckit.plan.md`
- `.opencode/command/speckit.tasks.md`
- `.opencode/command/speckit.implement.md`
- `.opencode/command/speckit.test.md`
- `.opencode/command/speckit.analyze.md`
7. Prepend a sync impact report as an HTML comment at the top of the constitution.
8. Validate:
- no unexplained placeholders remain

View File

@@ -1,5 +1,5 @@
---
description: Execute the implementation plan by processing the active tasks.md for the Rust MCP repository.
description: Execute the implementation plan by processing the active tasks.md for the Python/Svelte repository.
handoffs:
- label: Audit & Verify (Tester)
agent: qa-tester
@@ -39,21 +39,25 @@ You **MUST** consider the user input before proceeding (if not empty).
## Repository Reality Rules
- Default source paths are `src/**/*.rs` and `tests/*.rs`.
- Default source paths are `backend/src/**/*.py`, `frontend/src/**/*.svelte`, `backend/tests/`, and `frontend/tests/`.
- Active feature docs always live under `specs/<feature>/...` and are discovered via the `.specify/scripts/bash/*` helpers.
- Default verification stack is Rust-native and repository-real:
- `cargo test --all-targets --all-features -- --nocapture`
- `cargo clippy --all-targets --all-features -- -D warnings` when applicable
- `python3 scripts/static_verify.py`
- Do not fall back to `backend/`, `frontend/`, `pytest`, `npm`, or `__tests__/` conventions unless the active feature genuinely introduces such a surface.
- Default verification stack is Python/Svelte-native and repository-real:
- `cd backend && pytest` (backend tests)
- `cd frontend && npm run test` (frontend vitest)
- `ruff check backend/` (Python linting)
- `cd frontend && npm run build` (Svelte build check)
- `python3 scripts/static_verify.py` (semantic static verification when available)
- Do not fall back to `cargo`, `cargo test`, `cargo clippy`, `src/**/*.rs`, or Rust/MCP conventions unless the active feature genuinely introduces such a surface.
## Semantic Execution Rules
- Preserve and extend canonical `[DEF]` anchors and metadata.
- Use correct comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup, `// [DEF:...]` for Svelte script blocks (see the sketch after this list).
- Match contract density to effective complexity.
- Keep accepted-path and rejected-path memory intact.
- Do not silently restore an ADR- or contract-rejected branch.
- For C4/C5 Rust orchestration flows, account for the belief runtime where required by repository norms and local contracts.
- For C4/C5 Python orchestration flows, account for the belief runtime where required by repository norms and local contracts.
- For C4/C5 Svelte components, ensure `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY` tags are satisfied.
- Treat pseudo-semantic markup as invalid.
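For orientation, a minimal Python sketch of the anchor syntax above; the identifier, complexity level, and tag values are hypothetical, not taken from the repository:
```python
# Hypothetical example of the Python comment-anchor syntax.
# [DEF:create_report:Function]
# @COMPLEXITY 3
# @PURPOSE Assemble a report envelope from validated task output.
# @RELATION DEPENDS_ON -> [ReportSchema:Class]
def create_report(task_output: dict) -> dict:
    """Wrap validated task output in the canonical report envelope."""
    return {"status": "ok", "payload": task_output}
# [/DEF:create_report:Function]
```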
## Progress and Acceptance

View File

@@ -1,13 +1,13 @@
---
description: Execute the Rust MCP implementation planning workflow and generate research, design, contracts, and quickstart artifacts.
description: Execute the Python/Svelte implementation planning workflow and generate research, design, contracts, and quickstart artifacts.
handoffs:
- label: Create Tasks
agent: speckit.tasks
prompt: Break the Rust MCP plan into executable tasks
prompt: Break the Python/Svelte plan into executable tasks
send: true
- label: Create Checklist
agent: speckit.checklist
prompt: Create a requirements-quality checklist for the active Rust MCP feature
prompt: Create a requirements-quality checklist for the active Python/Svelte feature
---
## User Input
@@ -27,20 +27,19 @@ You **MUST** consider the user input before proceeding (if not empty).
2. **Load canonical planning context**:
- `README.md`
- `Cargo.toml`
- `backend/pyproject.toml` and `backend/requirements.txt`
- `frontend/package.json` and `frontend/svelte.config.js`
- `docs/SEMANTIC_PROTOCOL_COMPLIANCE.md`
- `docs/adr/ADR-0001-semantic-rust-module-layout.md`
- `docs/adr/ADR-0002-belief-state-runtime.md`
- `docs/adr/ADR-0003-comment-anchored-semantic-protocol.md`
- `docs/adr/ADR-0004-task-shaped-server-routing.md`
- `docs/adr/*.md` (Architecture Decision Records)
- `.specify/memory/constitution.md`
- `.kilo/skills/semantics-core/SKILL.md`
- `.kilo/skills/semantics-contracts/SKILL.md`
- `.kilo/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-core/SKILL.md`
- `.opencode/skills/semantics-contracts/SKILL.md`
- `.opencode/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-frontend/SKILL.md`
- `.specify/templates/plan-template.md`
3. **Execute the planning workflow** using the template structure:
- Fill `Technical Context` for the current repository reality: Rust crate, task-shaped MCP server, semantic contracts, belief runtime, and repository-local verification.
- Fill `Technical Context` for the current repository reality: Python 3.13+ backend (FastAPI, SQLAlchemy), Svelte 5 frontend (SvelteKit, Vite, Tailwind CSS), PostgreSQL storage, Docker deployment.
- Fill `Constitution Check` using the local constitution, semantic protocol compliance doc, and ADR set.
- ERROR if a blocking constitutional or semantic conflict is discovered and cannot be justified.
- Phase 0: generate `research.md` in `FEATURE_DIR`, resolving all material unknowns.
@@ -52,14 +51,16 @@ You **MUST** consider the user input before proceeding (if not empty).
## Phase 0: Research
Research must resolve only implementation-shaping unknowns that matter for this Rust MCP repository, such as:
Research must resolve only implementation-shaping unknowns that matter for this Python/Svelte repository, such as:
- crate/module placement under `src/`
- `tests/*.rs` strategy and required fixture coverage
- MCP tool/resource schema design
- runtime evidence and belief-state coverage
- backend module placement under `backend/src/` (api/, core/, models/, services/, schemas/)
- frontend component placement under `frontend/src/` (routes/, lib/components/, lib/stores/, lib/api/)
- `backend/tests/` and `frontend/tests/` strategy and required fixture coverage
- FastAPI endpoint / WebSocket schema design
- Svelte 5 runes reactivity model ($state, $derived, $effect, $props)
- PostgreSQL schema and migration strategy
- belief-state runtime coverage for C4/C5 Python flows
- semantic validation boundaries and static verification workflow
- task-shaped routing, workspace safety, and error-envelope design
Write `research.md` with concise sections:
@@ -74,29 +75,34 @@ Use `[NEED_CONTEXT: target]` instead of inventing relation targets, DTO names, o
### UX / Interaction Validation
Validate the proposed design against `ux_reference.md` as an **interaction reference** for MCP callers, CLI/operator flows, result envelopes, warnings, and recovery guidance.
Validate the proposed design against `ux_reference.md` as an **interaction reference** for:
- API callers (REST endpoints, WebSocket messages)
- CLI/operator flows
- Svelte UI flows (when the feature introduces frontend components)
- Result envelopes, warnings, and recovery guidance
If the planned architecture degrades the promised interaction model, deterministic recovery path, or context-budget behavior, stop and warn the user.
### Data Model Output
Generate `data-model.md` for Rust/MCP domain entities such as:
Generate `data-model.md` for Python/Svelte domain entities such as:
- tool request/response structs
- semantic query payloads
- runtime evidence envelopes
- workspace/checkpoint/index/security entities
- contract and relation traceability data
- FastAPI request/response schemas (Pydantic models)
- SQLAlchemy ORM entities
- Svelte store shapes and component props
- WebSocket message envelopes
- Task/report/artifact entities
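As a hedged illustration of the pairing above, a minimal SQLAlchemy entity plus its Pydantic response schema; `Dashboard` and its fields are invented for this sketch, not repository entities:
```python
from pydantic import BaseModel
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Dashboard(Base):
    """Hypothetical SQLAlchemy ORM entity."""
    __tablename__ = "dashboards"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]


class DashboardOut(BaseModel):
    """Hypothetical Pydantic response schema mirroring the entity."""
    id: int
    title: str
```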
### Global ADR Continuity
Before task decomposition, planning must identify any repo-shaping decisions this feature depends on or extends:
- Rust module layout and decomposition
- task-shaped tool/resource routing
- belief-state runtime behavior
- semantic comment-anchor rules
- payload/schema stability decisions
- Python module layout and decomposition (`backend/src/api/`, `backend/src/core/`, etc.)
- Frontend component architecture (Svelte 5 runes, SvelteKit routing)
- Belief-state runtime behavior for C4/C5 flows
- Semantic comment-anchor rules for Python and Svelte
- RBAC/security constraints (local auth, ADFS SSO)
- Plugin system lifecycle
For each durable choice, ensure the plan references the relevant ADR and explicitly records accepted and rejected paths.
@@ -108,37 +114,35 @@ Generate `contracts/modules.md` as the primary design contract for implementatio
- classify each planned module/component with `@COMPLEXITY` 1-5
- use canonical relation syntax `@RELATION PREDICATE -> TARGET_ID`
- preserve accepted-path and rejected-path memory via `@RATIONALE` and `@REJECTED` where needed
- describe MCP tools/resources, runtime evidence, validation envelopes, and semantic boundaries instead of inventing backend/frontend layers
- describe Python backend modules (api routes, core services, models, plugins) and Svelte frontend components instead of inventing Rust/MCP layers
- use appropriate comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup, `// [DEF:...]` for Svelte script blocks
Complexity guidance for this repository:
- **Complexity 1**: anchors only
- **Complexity 2**: `@PURPOSE`
- **Complexity 3**: `@PURPOSE`, `@RELATION`
- **Complexity 4**: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Rust orchestration paths should account for belief runtime markers before mutation or return
- **Complexity 1**: anchors only (DTOs, simple constants)
- **Complexity 2**: `@PURPOSE` (utility functions, pure helpers)
- **Complexity 3**: `@PURPOSE`, `@RELATION` (multi-step flows with dependencies); Svelte components also `@UX_STATE`
- **Complexity 4**: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python orchestration paths should account for belief runtime markers (`belief_scope`, `reason`, `reflect`, `explore`); Svelte also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- **Complexity 5**: level 4 plus `@DATA_CONTRACT`, `@INVARIANT`, and explicit decision-memory continuity
If a planned contract depends on unknown schema, relation target, or ADR identity, emit `[NEED_CONTEXT: target]` instead of fabricating placeholders.
### Optional Machine-Readable Contracts
You MAY generate machine-readable artifacts in `contracts/` only when they mirror the actual MCP tool/resource payloads of this Rust server. Do **not** default to REST/OpenAPI or frontend-sync artifacts unless the feature truly introduces them.
### Quickstart Output
Generate `quickstart.md` using real repository verification paths, typically:
Generate `quickstart.md` using real repository verification paths:
- start or exercise the MCP server entrypoint
- invoke relevant MCP tools/resources
- validate expected envelopes and recovery flows
- run `cargo test --all-targets --all-features -- --nocapture`
- run `cargo clippy --all-targets --all-features -- -D warnings` when applicable
- run `python3 scripts/static_verify.py`
- start or exercise the FastAPI backend entrypoint: `cd backend && python -m uvicorn src.app:app --reload`
- start or exercise the SvelteKit frontend: `cd frontend && npm run dev`
- invoke relevant API endpoints or CLI commands
- validate expected response envelopes, WebSocket messages, and recovery flows
- run `cd backend && pytest` for backend tests
- run `cd frontend && npm run test` for frontend tests
- run `ruff check backend/` for Python linting
## Key Rules
- Use absolute paths in workflow execution.
- Planning must reflect the current repository structure (`src/**/*.rs`, `tests/*.rs`, `docs/adr/*`) rather than legacy Python/Svelte examples.
- Do not reference `.ai/*` or `.kilocode/*` paths.
- Planning must reflect the current repository structure (`backend/src/`, `frontend/src/`, `docs/adr/`) rather than legacy Rust/MCP examples.
- Do not reference `.ai/*` or `.kilocode/*` paths as feature artifacts.
- Do not write any feature planning artifact outside `specs/<feature>/...`.
- Do not hand off to `speckit.tasks` until blocking ADR continuity and rejected-path guardrails are explicit.

View File

@@ -1,5 +1,5 @@
---
description: Maintain semantic integrity by reindexing, auditing, and reviewing the Rust MCP repository through AXIOM MCP tools.
description: Maintain semantic integrity by reindexing, auditing, and reviewing the Python/Svelte repository through AXIOM MCP tools.
---
## User Input
@@ -19,9 +19,10 @@ Ensure the repository adheres to the active GRACE semantic protocol using AXIOM
1. **ROLE: Orchestrator** — coordinate semantic maintenance at the workflow level.
2. **MCP-FIRST** — use AXIOM task-shaped tools for discovery, context, audit, impact analysis, and safe mutation planning.
3. **STRICT ADHERENCE** — follow the local semantic authorities:
- `.kilo/skills/semantics-core/SKILL.md`
- `.kilo/skills/semantics-contracts/SKILL.md`
- `.kilo/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-core/SKILL.md`
- `.opencode/skills/semantics-contracts/SKILL.md`
- `.opencode/skills/semantics-testing/SKILL.md`
- `.opencode/skills/semantics-frontend/SKILL.md`
- `docs/SEMANTIC_PROTOCOL_COMPLIANCE.md`
- `docs/adr/*`
4. **NON-DESTRUCTIVE** — do not remove business logic; only add or correct semantic markup unless the user requested implementation changes.

View File

@@ -1,9 +1,9 @@
---
description: Create or update the feature specification from a natural-language feature description for the Rust MCP repository.
description: Create or update the feature specification from a natural-language feature description for the Python/Svelte repository.
handoffs:
- label: Build Technical Plan
agent: speckit.plan
prompt: Create a Rust MCP implementation plan for the active feature
prompt: Create a Python/Svelte implementation plan for the active feature
- label: Clarify Spec Requirements
agent: speckit.clarify
prompt: Clarify specification requirements
@@ -39,38 +39,41 @@ The feature description is the text passed to `/speckit.specify`.
- `spec.md`
- `ux_reference.md`
- `checklists/requirements.md`
5. Generate `ux_reference.md` as an **interaction reference** for MCP callers, CLI/operator flows, result envelopes, warnings, and recovery behavior.
6. Write `spec.md` focused on **what** the user/operator needs and **why**, not how the Rust crate will implement it.
5. Generate `ux_reference.md` as an **interaction reference** for API callers, CLI/operator flows, Svelte UX states, result envelopes, warnings, and recovery behavior.
6. Write `spec.md` focused on **what** the user/operator needs and **why**, not how the Python/FastAPI backend or Svelte frontend will implement it.
7. Validate the spec against a requirements-quality checklist and iterate until major issues are resolved.
## Specification Rules
- Use domain language appropriate for this repository: MCP callers, tools, resources, runtime evidence, workspace flows, operator recovery, semantic contracts.
- Avoid leaking implementation details such as module names, crates, file-level refactors, or exact Rust APIs.
- Use domain language appropriate for this repository: API callers (REST/WebSocket), CLI operators, Svelte UI users, task runners, data migration operators, Git integration users.
- Avoid leaking implementation details such as FastAPI route names, SQLAlchemy models, Svelte component names, or exact file-level refactors.
- Use `[NEEDS CLARIFICATION: ...]` only for truly blocking product ambiguities. Maximum 3 markers.
- Prefer informed defaults grounded in repository context over unnecessary clarification.
- Do not assume web-app, backend/frontend, or Svelte UI flows unless the feature actually introduces them.
- The default project structure is a web application with `backend/src/` (Python) and `frontend/src/` (Svelte). Assume this unless the feature explicitly changes it.
- Do not assume Rust, MCP server, cargo, or `src/*.rs` conventions unless the feature actually introduces them.
- Do not write feature outputs to `.kilo/plans/`, `.kilo/reports/`, or any path outside `specs/<feature>/...`.
## UX / Interaction Reference Rules
- `ux_reference.md` is mandatory, but for this repository it is usually an interaction-reference artifact rather than a screen-design artifact.
- `ux_reference.md` is mandatory. For this repository it covers both:
- **API/interaction reference** for backend callers (REST endpoints, WebSocket messages, CLI commands)
- **Svelte UX reference** for frontend flows (when the feature has a UI component)
- Capture:
- caller/operator persona
- happy-path invocation flow
- result envelope expectations
- caller/operator/end-user persona
- happy-path invocation flow (API requests, CLI commands, or UI interactions)
- result envelope expectations (JSON response shapes, CLI output, UI feedback)
- warning/degraded states
- failure recovery guidance
- canonical terminology
- Only include UI-specific `@UX_*` guidance when the feature truly has a user interface component.
- Include `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY` guidance when the feature introduces Svelte components.
## Quality Validation
Generate `FEATURE_DIR/checklists/requirements.md` and ensure it validates:
- no implementation leakage into `spec.md`
- no stale Python/Svelte assumptions unless the feature explicitly needs them
- compatibility with the Rust MCP/task-shaped tool surface
- no stale Rust/MCP assumptions unless the feature explicitly needs them
- compatibility with the Python/FastAPI backend and Svelte frontend surface
- measurable success criteria
- explicit edge cases and recovery paths
- decision-memory readiness for downstream planning

View File

@@ -1,13 +1,13 @@
---
description: Generate an actionable, dependency-ordered tasks.md for the active Rust MCP feature.
description: Generate an actionable, dependency-ordered tasks.md for the active Python/Svelte feature.
handoffs:
- label: Analyze For Consistency
agent: speckit.analyze
prompt: Run a cross-artifact consistency analysis for the Rust MCP feature
prompt: Run a cross-artifact consistency analysis for the Python/Svelte feature
send: true
- label: Implement Project
agent: speckit.implement
prompt: Start implementation in phases for the Rust MCP feature
prompt: Start implementation in phases for the Python/Svelte feature
send: true
---
@@ -31,9 +31,9 @@ You **MUST** consider the user input before proceeding (if not empty).
3. **Build the task model**:
- Extract user stories and priorities from `spec.md`
- Extract repository structure, tool/resource scope, verification stack, and semantic constraints from `plan.md`
- Extract repository structure, backend/frontend scope, verification stack, and semantic constraints from `plan.md`
- Extract accepted-path and rejected-path memory from ADRs and `contracts/modules.md`
- Map entities, tool payloads, runtime evidence, and verification scenarios to stories
- Map entities, API payloads, UI components, and verification scenarios to stories
- Generate tasks grouped by story and ordered by dependency
- Validate that no task schedules an ADR-rejected path
@@ -41,9 +41,9 @@ You **MUST** consider the user input before proceeding (if not empty).
- Phase 1: Setup
- Phase 2: Foundational work
- Phase 3+: one phase per user story in priority order
- Final phase: polish and cross-cutting verification
- Every task must use the strict checklist format and include exact file paths
- Write the final document to `FEATURE_DIR/tasks.md`, never to `.kilo/plans/` or other side folders
- Final phase: polish and cross-cutting verification
- Every task must use the strict checklist format and include exact file paths
- Write the final document to `FEATURE_DIR/tasks.md`, never to `.kilo/plans/` or other side folders
5. **Report** the generated path and summarize:
- total task count
@@ -74,38 +74,52 @@ Rules:
4. `[USx]` required only for user-story phases
5. exact file paths required in the description
### Rust / MCP Pathing
### Python / Svelte Pathing
Prefer real repository paths such as:
- `src/server/*.rs`
- `src/services/**/*.rs`
- `src/models/*.rs`
- `src/semantics/*.rs`
- `tests/*.rs`
**Backend (Python):**
- `backend/src/api/*.py` (FastAPI routes)
- `backend/src/core/**/*.py` (core services, task manager, auth, migration, plugins)
- `backend/src/models/*.py` (SQLAlchemy models)
- `backend/src/services/*.py` (business logic)
- `backend/src/schemas/*.py` (Pydantic schemas)
- `backend/tests/*.py` (pytest tests)
**Frontend (Svelte):**
- `frontend/src/routes/**/*.svelte` (SvelteKit pages)
- `frontend/src/lib/components/**/*.svelte` (reusable Svelte 5 components)
- `frontend/src/lib/stores/*.svelte.js` (Svelte rune stores)
- `frontend/src/lib/api/*.js` (API client modules)
- `frontend/tests/**/*.test.js` (vitest tests)
**Shared/Infrastructure:**
- `docs/adr/*.md`
- `docker/*`
- `specs/<feature>/contracts/*.md`
Do **not** generate default tasks for:
- `backend/` or `frontend/`
- `*.py`
- `.svelte`
- `__tests__/`
- `src/**/*.rs` or `tests/*.rs`
- `Cargo.toml`
- `cargo` commands
- MCP server/tool/resource syntax unless the feature actually introduces them
### Verification Discipline
Each story phase must end with:
- a verification task against `ux_reference.md` interpreted as the caller/operator interaction contract
- a verification task against `ux_reference.md` interpreted as the API caller, CLI operator, or Svelte UX interaction contract
- a semantic audit / verification task tied to repository validators and touched contracts
Typical verification tasks may include:
- focused `cargo test` commands
- `cargo test --all-targets --all-features -- --nocapture`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `python3 scripts/static_verify.py`
- focused `pytest backend/tests/test_<module>.py` commands
- `cd backend && pytest` (full backend suite)
- `cd frontend && npm run test` (vitest suite)
- `ruff check backend/` (Python lint)
- `cd frontend && npm run build` (Svelte build validation)
- `python3 scripts/static_verify.py` (semantic static verification)
Only include the commands that are truly required by the feature scope.
@@ -115,8 +129,9 @@ If a task implements or depends on a guarded contract, append a concise guardrai
Examples:
- `- [ ] T021 [US1] Implement deterministic tool envelope mapping in src/server/tools.rs (RATIONALE: preserve task-shaped MCP parity; REJECTED: ad-hoc per-tool response shapes)`
- `- [ ] T033 [US2] Add runtime evidence verification in tests/server_protocol.rs (RATIONALE: C4/C5 flows must expose belief markers; REJECTED: relying on manual log inspection only)`
- `- [ ] T021 [US1] Implement report generation endpoint in backend/src/api/reports.py (RATIONALE: unified report envelope preserves task-shaped parity; REJECTED: ad-hoc per-endpoint response shapes)`
- `- [ ] T033 [US2] Add WebSocket logging handler in backend/src/core/task_manager/ws_handler.py (RATIONALE: C4/C5 flows must expose real-time log streaming; REJECTED: polling-based log retrieval)`
- `- [ ] T040 [US3] Create DashboardCard component in frontend/src/lib/components/DashboardCard.svelte (RATIONALE: @UX_STATE must cover Idle/Loading/Error/Empty; REJECTED: single-state inline rendering without recovery)`
If no safe executable task wording exists because the accepted path is still unclear, stop and emit `[NEED_CONTEXT: target]`.
@@ -124,8 +139,9 @@ If no safe executable task wording exists because the accepted path is still unc
Tests are optional only when the feature truly has no new verification surface. In this repository, test tasks are usually expected for:
- new MCP tools/resources
- new query/mutation flows
- new FastAPI endpoints / WebSocket handlers
- new plugin or service modules
- new Svelte components with `@UX_STATE` contracts
- C4/C5 semantic contracts
- runtime evidence / belief-state behavior
- rejected-path regression coverage
@@ -136,5 +152,5 @@ Before finalizing `tasks.md`, verify that:
- blocking ADRs are inherited into setup/foundational or downstream story tasks
- no task text schedules a rejected path
- story tasks remain executable within the actual Rust crate structure
- story tasks remain executable within the actual Python/Svelte repository structure
- at least one explicit verification task protects against rejected-path regression

View File

@@ -1,5 +1,5 @@
---
description: Execute semantic audit and Rust-native testing for the active feature batch.
description: Execute semantic audit and Python/Svelte-native testing for the active feature batch.
---
## User Input
@@ -12,14 +12,16 @@ You **MUST** consider the user input before proceeding (if not empty).
## Goal
Run the verification loop for the touched Rust MCP scope: semantic audit, decision-memory audit, executable tests, logic review, and documentation of coverage/results.
Run the verification loop for the touched Python/Svelte scope: semantic audit, decision-memory audit, executable tests (pytest + vitest), logic review, and documentation of coverage/results.
## Operating Constraints
1. **NEVER delete existing tests** unless the user explicitly requests removal.
2. **NEVER duplicate tests** when existing `tests/*.rs` coverage already validates the same contract.
2. **NEVER duplicate tests** when existing `backend/tests/` or `frontend/tests/` coverage already validates the same contract.
3. **Decision-memory regression guard**: tests and audits must not silently normalize any path documented as rejected.
4. **Rust-native structure**: prefer existing integration/protocol test organization under `tests/`.
4. **Python/Svelte-native structure**: prefer existing test organization:
- Backend: `backend/tests/` with pytest
- Frontend: `frontend/tests/` or co-located `__tests__/` with vitest + @testing-library/svelte
## Execution Steps
@@ -29,7 +31,7 @@ Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --inclu
- `FEATURE_DIR`
- touched implementation tasks from `tasks.md`
- affected `.rs` files
- affected `.py` and `.svelte` files
- relevant ADRs, `@RATIONALE`, and `@REJECTED` guardrails
All test documentation emitted by this workflow belongs under `FEATURE_DIR/tests/` or other files inside `specs/<feature>/...`, never under `.kilo/plans/`.
@@ -61,9 +63,10 @@ Before writing or executing tests, perform a semantic audit of the touched scope
1. Use the AXIOM semantic validation path where available.
2. Reject malformed or pseudo-semantic markup.
3. Verify contract density matches effective complexity.
4. Verify C4/C5 Rust flows account for belief runtime markers (`belief_scope`, `reason`, `reflect`, `explore`) when required by the contract and repository norms.
5. Verify no touched code silently restores an ADR- or contract-rejected path.
6. Emulate the algorithm mentally to ensure `@PRE`, `@POST`, `@INVARIANT`, and declared side effects remain coherent.
4. For C4/C5 Python flows: verify belief runtime markers (`belief_scope`, `reason`, `reflect`, `explore`).
5. For C4/C5 Svelte components: verify `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY` coverage.
6. Verify no touched code silently restores an ADR- or contract-rejected path.
7. Emulate the algorithm mentally to ensure `@PRE`, `@POST`, `@INVARIANT`, and declared side effects remain coherent.
If audit fails, emit `[AUDIT_FAIL: semantic_noncompliance | contract_mismatch | logic_mismatch | rejected_path_regression]` with concrete file-based reasons.
@@ -71,24 +74,35 @@ If audit fails, emit `[AUDIT_FAIL: semantic_noncompliance | contract_mismatch |
When test additions are needed:
- prefer `tests/*.rs` integration/protocol coverage
- use deterministic fixtures rather than logic mirrors
- trace tests back to semantic contracts and ADR guardrails
- add explicit rejected-path regression coverage when the touched scope has a forbidden alternative
For non-UI Rust MCP flows, UX verification means validating interaction envelopes, warnings, recovery messaging, and tool/resource discoverability promised by `ux_reference.md`.
- Backend: prefer `backend/tests/` with pytest fixtures, use `unittest.mock` / `pytest-mock` for external dependencies (see the sketch after this list)
- Frontend: prefer vitest with `@testing-library/svelte` for component testing, `jsdom` environment
- Use deterministic fixtures rather than logic mirrors
- Trace tests back to semantic contracts and ADR guardrails
- Add explicit rejected-path regression coverage when the touched scope has a forbidden alternative
- For Svelte UX contracts, validate `@UX_STATE` transitions, `@UX_FEEDBACK` messages, and `@UX_RECOVERY` paths
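A minimal sketch of the backend test shape described above, assuming pytest and `unittest.mock`; the unit under test and its envelope are hypothetical:
```python
from unittest.mock import MagicMock

import pytest


def build_status_envelope(conn) -> dict:
    """Hypothetical unit under test: wraps a status lookup in an envelope."""
    return {"status": conn.fetch_status()}


@pytest.fixture
def fake_connection() -> MagicMock:
    """Deterministic stand-in for an external dependency (no real I/O)."""
    conn = MagicMock()
    conn.fetch_status.return_value = "ready"
    return conn


def test_status_envelope_always_carries_status(fake_connection: MagicMock) -> None:
    # Traces to a hypothetical @POST: the envelope always exposes "status".
    envelope = build_status_envelope(fake_connection)
    assert envelope["status"] == "ready"
```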
### 6. Execute Verifiers
Run the smallest truthful verifier set for the touched scope, typically chosen from:
```bash
cargo test --all-targets --all-features -- --nocapture
cargo clippy --all-targets --all-features -- -D warnings
# Backend tests
cd backend && pytest
# Backend lint
ruff check backend/
# Frontend tests
cd frontend && npm run test
# Frontend build check (catches Svelte compilation errors)
cd frontend && npm run build
# Semantic static verification (when available)
python3 scripts/static_verify.py
```
Use narrower `cargo test <target>` runs when they are sufficient and then widen verification when finalizing the feature batch.
Use narrower test runs when they are sufficient (e.g., `pytest backend/tests/test_auth.py`) and then widen verification when finalizing the feature batch.
### 7. Test Documentation

View File

@@ -33,8 +33,9 @@ Decision memory prevents architectural drift. It records the *Decision Space* (W
2. **Task Guardrails:** Preventative `@REJECTED` tags injected by the Orchestrator to keep you away from known LLM pitfalls.
3. **Reactive Micro-ADR (Your Responsibility):** If you encounter a runtime failure, use `explore()`, and invent a valid workaround, you MUST ascend to the `[DEF]` header and document it via `@RATIONALE [Why]` and `@REJECTED [The failing path]` BEFORE closing the task.
**⚠️ `@RATIONALE`/`@REJECTED` ARE C5-ONLY.**
Decision Memory tags belong exclusively to C5 contracts per Std:Semantics:Core complexity scale. C4 adds `@PRE`/`@POST`/`@SIDE_EFFECT` — not decision memory. Adding them below C5 violates INV_7 (verbosity/erosion). If a C1-C4 contract genuinely needs decision memory, it should be C5.
**`@RATIONALE` / `@REJECTED` are ORTHOGONAL tags.** Per `axiom_config.yaml`, these are `protected: true` and `orthogonal: true` — they may appear at ANY complexity level (C1-C5) when a node records a deliberate architectural choice. They are REQUIRED for `ADR` type contracts. Removal of an existing `@RATIONALE`/`@REJECTED` requires `<ESCALATION>` to the Architect.
If a C1-C4 contract records a workaround after a runtime failure, add `@RATIONALE`/`@REJECTED` at that node's header BEFORE closing the task. This is a Reactive Micro-ADR — it does NOT require bumping the complexity to C5.
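A hypothetical sketch of such a Reactive Micro-ADR on a C2 node; the function, values, and failure story are invented for illustration:
```python
# [DEF:parse_port:Function]
# @COMPLEXITY 2
# @PURPOSE Parse a TCP port from an environment string.
# @RATIONALE Bare int() choked on empty PORT values injected by the
#            container environment; the default keeps startup deterministic.
# @REJECTED Raising on empty PORT (crashed the compose healthcheck loop).
def parse_port(raw: str) -> int:
    return int(raw) if raw.strip() else 8421
# [/DEF:parse_port:Function]
```
Note how the node stays C2: only the decision-memory tags were added.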
**Resurrection Ban:** Silently reintroducing a coding pattern, library, or logic flow previously marked as `@REJECTED` is classified as a fatal regression. If the rejected path is now required, emit `<ESCALATION>` to the Architect.

View File

@@ -42,9 +42,15 @@ Format depends on the execution environment:
- JS/TS: `// [DEF:Id:Type] ... // [/DEF:Id:Type]`
*Allowed Types: Root, Standard, Module, Class, Function, Component, Store, Block, ADR, Tombstone.*
**Module Header Tags (required for `Module` type at ALL complexity levels):**
- `@LAYER` architectural layer: `Domain` (business logic), `UI` (interface), `Infra` (infrastructure), `Test` (tests).
- `@SEMANTICS` orthogonal semantic markers (comma-separated keywords, e.g. `indexing, validation, metadata`).
**ADR Type Override:** `ADR` type has its own contract rules: `@COMPLEXITY` is FORBIDDEN. ADR requires only: `@PURPOSE`, `@RELATION`, `@RATIONALE`, `@REJECTED`. Optional orthogonal tags: `@STATUS` (ACTIVE, DEPRECATED, EXPERIMENTAL).
**Graph Dependencies (GraphRAG):**
`@RELATION PREDICATE -> TARGET_ID`
*Allowed Predicates:* DEPENDS_ON, CALLS, INHERITS, IMPLEMENTS, DISPATCHES, BINDS_TO.
*Allowed Predicates:* DEPENDS_ON, CALLS, INHERITS, IMPLEMENTS, DISPATCHES, BINDS_TO, VERIFIES.
## III. COMPLEXITY SCALE (1-5)
The level of control is defined in the Header via `@COMPLEXITY`. Default is 1 if omitted.
@@ -54,6 +60,10 @@ The level of control is defined in the Header via `@COMPLEXITY`. Default is 1 if
- **C4 (Orchestration):** Adds `@PRE`, `@POST`, `@SIDE_EFFECT`. Requires Belief State runtime logging.
- **C5 (Critical):** Adds `@DATA_CONTRACT`, `@INVARIANT`, and mandatory Decision Memory tracking.
**Module type** additionally requires `@LAYER` and `@SEMANTICS` at EVERY complexity level (C1-C5). These are Module-only, not required for Function, Class, Block, or Component types.
**`@RATIONALE` / `@REJECTED` are orthogonal tags**: they may appear at ANY complexity level (C1-C5) when decision memory is needed. They are `protected: true` (cannot be removed without escalation) and are REQUIRED for `ADR` type. Adding them to lower-complexity nodes does NOT violate INV_7; the tag belongs to the decision space, not the complexity hierarchy.
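A sketch of a Module header combining the required `@LAYER`/`@SEMANTICS` tags with orthogonal decision memory; all identifiers and tag values are illustrative:
```python
# [DEF:indexing_service:Module]
# @COMPLEXITY 3
# @LAYER Domain
# @SEMANTICS indexing, validation, metadata
# @PURPOSE Maintain the semantic index for changed source files.
# @RELATION DEPENDS_ON -> [semantic_map:Module]
# @RATIONALE Incremental reindexing chosen over full rebuilds for latency.
# @REJECTED Full-index rebuild on every commit (minutes-long stalls).
...
# [/DEF:indexing_service:Module]
```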
## IV. DOMAIN SUB-PROTOCOLS (ROUTING)
Depending on your active task, you MUST request and apply the following domain-specific rules:
- For Backend Logic & Architecture: Use `skill({name="semantics-contracts"})` and `skill({name="semantics-belief"})`.
@@ -81,7 +91,7 @@ The complexity scale is NOT a checklist — each level has a STRICT MAXIMUM of a
Do NOT add tags from higher levels. The examples below show the boundary of what is acceptable at each tier.
### C1 (Atomic) — DTOs, simple constants, trivial wrappers
Requires ONLY `[DEF]...[/DEF]`. No `@PURPOSE`, no `@RELATION`, no `@RATIONALE`, no `@PRE`/`@POST`.
Requires ONLY `[DEF]...[/DEF]`. No `@PURPOSE`, no `@RELATION`, no `@PRE`/`@POST`.
```python
# [DEF:UserDTO:Class]
@dataclass
@@ -91,10 +101,11 @@ class UserDTO:
email: str
# [/DEF:UserDTO:Class]
```
Do NOT add: `@PURPOSE`, `@RATIONALE`, `@REJECTED`, `@PRE`, `@POST`, `@SIDE_EFFECT`, `@RELATION`, `@DATA_CONTRACT`, `@INVARIANT`.
Do NOT add: `@PURPOSE`, `@PRE`, `@POST`, `@SIDE_EFFECT`, `@RELATION`, `@DATA_CONTRACT`, `@INVARIANT`.
Note: `@RATIONALE`/`@REJECTED` are orthogonal: they MAY appear at C1 if the node records a deliberate architectural choice (e.g., "why this DTO has field X instead of Y").
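A hypothetical C1 node carrying orthogonal decision memory and nothing else:
```python
from dataclasses import dataclass


# [DEF:RetryPolicy:Class]
# @RATIONALE Delay stored in seconds to match APScheduler trigger units.
# @REJECTED Millisecond granularity (unsupported by the scheduler, easy to misread).
@dataclass
class RetryPolicy:
    max_attempts: int
    delay_seconds: float
# [/DEF:RetryPolicy:Class]
```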
### C2 (Simple) — Utility functions, pure computations
Adds `@PURPOSE`. Still NO `@RELATION`, NO `@RATIONALE`, NO `@PRE`/`@POST`.
Adds `@PURPOSE`. Still NO `@RELATION`, NO `@PRE`/`@POST`.
```python
# [DEF:format_timestamp:Function]
# @COMPLEXITY 2
@@ -105,7 +116,7 @@ def format_timestamp(ts: datetime) -> str:
```
### C3 (Flow) — Multi-step logic with dependencies
Adds `@RELATION` for dependencies. Still NO `@RATIONALE`, NO `@PRE`/`@POST`.
Adds `@RELATION` for dependencies. Still NO `@PRE`/`@POST`.
```python
# [DEF:load_and_validate:Function]
# @COMPLEXITY 3
@@ -121,7 +132,7 @@ def load_and_validate(path: str) -> dict:
### C4 (Orchestration) — Stateful operations with side effects
Adds `@PRE`, `@POST`, `@SIDE_EFFECT`. Add `belief_scope()` + `reason()`/`reflect()` in body.
Still NO `@RATIONALE`, NO `@REJECTED`, NO `@DATA_CONTRACT`, NO `@INVARIANT`.
Still NO `@DATA_CONTRACT`, NO `@INVARIANT`.
```python
# [DEF:migrate_database:Function]
# @COMPLEXITY 4
@@ -149,8 +160,8 @@ def migrate_database(conn: Connection) -> None:
# [/DEF:migrate_database:Function]
```
### C5 (Critical) — Core infrastructure with invariants and decision memory
Adds `@RATIONALE`, `@REJECTED`, `@DATA_CONTRACT`, `@INVARIANT`. Use all belief markers.
### C5 (Critical) — Core infrastructure with invariants and data contracts
Adds `@DATA_CONTRACT`, `@INVARIANT`. Use all belief markers. `@RATIONALE`/`@REJECTED` are expected here for architectural decisions.
```python
# [DEF:rebuild_index:Function]
# @COMPLEXITY 5
@@ -190,12 +201,14 @@ def rebuild_index(root: Path) -> IndexSnapshot:
### Quick reference
| Level | Allowed tags | Forbidden tags |
|-------|-------------|----------------|
| C1 | only `[DEF]` | PURPOSE, RELATION, PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT, RATIONALE, REJECTED |
| C2 | +PURPOSE | RELATION, PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT, RATIONALE, REJECTED |
| C3 | +RELATION | PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT, RATIONALE, REJECTED |
| C4 | +PRE, POST, SIDE_EFFECT | DATA_CONTRACT, INVARIANT, RATIONALE, REJECTED |
| C5 | +DATA_CONTRACT, INVARIANT, RATIONALE, REJECTED | |
| C1 | only `[DEF]` | PURPOSE, RELATION, PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT |
| C2 | +PURPOSE | RELATION, PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT |
| C3 | +RELATION | PRE, POST, SIDE_EFFECT, DATA_CONTRACT, INVARIANT |
| C4 | +PRE, POST, SIDE_EFFECT | DATA_CONTRACT, INVARIANT |
| C5 | +DATA_CONTRACT, INVARIANT | |
**Key rule:** `@RATIONALE`/`@REJECTED` are C5-only. Adding them to C1-C4 violates INV_7 (fractal limit) and dilutes real decision memory.
**Key rule:** `@RATIONALE` / `@REJECTED` are ORTHOGONAL tags. They are NOT gated by complexity; they may appear at ANY level (C1-C5) when a node records a deliberate architectural choice. They are `protected: true` (removal requires `<ESCALATION>`). They are REQUIRED for `ADR` type contracts.
**Module type:** `@LAYER` and `@SEMANTICS` are REQUIRED at EVERY complexity level (C1-C5) in addition to the tags above.
# [/DEF:Std:Semantics:Core]

View File

@@ -17,21 +17,21 @@
the iteration process.
-->
**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
**Project Type**: [e.g., library/cli/web-service/mobile-app/compiler/desktop-app or NEEDS CLARIFICATION]
**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
**Language/Version**: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)
**Primary Dependencies**: FastAPI, SQLAlchemy, APScheduler (backend); SvelteKit, Vite, Tailwind CSS (frontend)
**Storage**: PostgreSQL 16
**Testing**: pytest (backend), vitest + @testing-library/svelte (frontend)
**Target Platform**: Linux server (Docker), modern browsers
**Project Type**: web application (FastAPI REST + WebSocket backend, SvelteKit SPA frontend)
**Performance Goals**: [domain-specific, e.g., <200ms p95 API latency, 60fps UI]
**Constraints**: [domain-specific, e.g., <100MB memory per container, RBAC enforcement, offline-capable Docker bundle]
**Scale/Scope**: [domain-specific, e.g., 50 concurrent users, 1000 dashboards, 10 plugins]
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
[Evaluate against constitution.md and semantics.md. Explicitly confirm semantic protocol compliance, complexity-driven contract coverage, UX-state compatibility, async boundaries, API-wrapper rules, RBAC/security constraints, and any required belief-state/logging constraints for Complexity 4/5 Python modules.]
[Evaluate against constitution.md and semantics.md. Explicitly confirm semantic protocol compliance, complexity-driven contract coverage, UX-state compatibility (Svelte 5 runes), async boundaries (FastAPI), API-wrapper rules, RBAC/security constraints (local auth + ADFS SSO), plugin lifecycle rules, and any required belief-state/logging constraints for Complexity 4/5 Python modules.]
## Project Structure
@@ -48,67 +48,50 @@ specs/[###-feature]/
```
### Source Code (repository root)
<!--
ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
for this feature. Delete unused options and expand the chosen structure with
real paths (e.g., apps/admin, packages/something). The delivered plan must
not include Option labels.
-->
```text
# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
src/
├── models/
├── services/
├── cli/
└── lib/
tests/
├── contract/
├── integration/
└── unit/
# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
# Web application (default for this repository)
backend/
├── src/
│ ├── models/
│ ├── services/
│   └── api/
└── tests/
│ ├── api/ # FastAPI routes
│ ├── core/ # Core services (task_manager, auth, migration, plugins)
│   ├── models/ # SQLAlchemy ORM models
│ ├── services/ # Business logic
│ └── schemas/ # Pydantic schemas
└── tests/ # pytest tests
frontend/
├── src/
│ ├── components/
│ ├── pages/
│   └── services/
└── tests/
│ ├── routes/ # SvelteKit pages
│ ├── lib/
│   │   ├── components/ # Reusable Svelte 5 components
│ │ ├── stores/ # Svelte rune stores ($state, $derived)
│ │ └── api/ # API client modules
│ └── i18n/ # Internationalization
└── tests/ # vitest tests
# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
api/
└── [same as backend above]
ios/ or android/
└── [platform-specific structure: feature modules, UI flows, platform tests]
docker/ # Docker configurations
```
**Structure Decision**: [Document the selected structure and reference the real
directories captured above]
**Structure Decision**: This is a web application with separate `backend/` (Python/FastAPI) and `frontend/` (SvelteKit) directories. Docker Compose orchestrates both services plus PostgreSQL.
## Semantic Contract Guidance
> Use this section to drive Phase 1 artifacts, especially `contracts/modules.md`.
- Classify each planned module/component with `@COMPLEXITY: 1..5` or `@C:`.
- Use `@TIER` only if backward compatibility is needed; never use it as the primary contract rule.
- Classify each planned module/component with `@COMPLEXITY: 1..5`.
- Use comment-anchor syntax appropriate for each context:
- Python: `# [DEF:id:Type]`
- Svelte markup: `<!-- [DEF:id:Type] -->`
- Svelte script: `// [DEF:id:Type]`
- Match contract density to complexity:
- Complexity 1: anchors only, `@PURPOSE` optional
- Complexity 2: `@PURPOSE`
- Complexity 3: `@PURPOSE`, `@RELATION`; UI also `@UX_STATE`
- Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python also meaningful `logger.reason()` / `logger.reflect()` path
- Complexity 5: level 4 + `@DATA_CONTRACT`, `@INVARIANT`; Python also `belief_scope`; UI also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- Write relations only in canonical form: `@RELATION: [PREDICATE] ->[TARGET_ID]`
- Complexity 1: anchors only (DTOs, simple constants)
- Complexity 2: `@PURPOSE` (utility functions, pure helpers)
- Complexity 3: `@PURPOSE`, `@RELATION`; Svelte components also `@UX_STATE`
- Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python also `belief_scope`/`reason`/`reflect` markers; Svelte also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- Complexity 5: level 4 + `@DATA_CONTRACT`, `@INVARIANT` + `@RATIONALE`/`@REJECTED` decision memory
- Write relations only in canonical form: `@RELATION PREDICATE -> TARGET_ID`
- If any relation target, DTO, or contract dependency is unknown, emit `[NEED_CONTEXT: target]` instead of inventing placeholders.
- Preserve medium-appropriate anchor/comment syntax for Python, Svelte markup, and Svelte script contexts (see the sketch after this list).
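An illustrative header showing the canonical relation form and a `[NEED_CONTEXT]` marker for an unresolved target; every identifier here is hypothetical:
```python
# [DEF:sync_dashboards:Function]
# @COMPLEXITY 3
# @PURPOSE Reconcile local dashboard records with the upstream source.
# @RELATION DEPENDS_ON -> [DashboardRepository:Class]
# @RELATION CALLS -> [NEED_CONTEXT: upstream client module id]
def sync_dashboards() -> None:
    ...
# [/DEF:sync_dashboards:Function]
```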
## Complexity Tracking
@@ -116,5 +99,5 @@ directories captured above]
| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
| [e.g., 4th plugin] | [current need] | [why 3 plugins insufficient] |
| [e.g., Service layer pattern] | [specific problem] | [why direct ORM access insufficient] |

View File

@@ -0,0 +1,94 @@
# [DEF:ADR-0001:ADR]
# @STATUS ACTIVE
# @PURPOSE Define the canonical project directory layout, module boundaries, and naming conventions for the ss-tools repository. This ADR is the root structural authority — all other ADRs and feature plans derive their file placement from it.
# @RELATION CALLS -> [ADR-0002:ADR]
# @RELATION CALLS -> [ADR-0003:ADR]
# @RELATION CALLS -> [ADR-0004:ADR]
# @RELATION CALLS -> [ADR-0005:ADR]
# @RELATION CALLS -> [ADR-0006:ADR]
# @RATIONALE A single authoritative layout prevents module sprawl, cyclic dependencies, and agent confusion during long-horizon speckit workflows. Without this, every feature spec must re-debate where to place files, leading to inconsistent structures and broken tooling assumptions.
# @REJECTED Monorepo with per-feature packages (e.g. `packages/llm-plugin/`) — rejected because the repository has two top-level runtime targets (Python backend, Svelte frontend) with shared Docker/configuration infrastructure. Package-style nesting would double the directory depth without adding isolation value.
# @REJECTED Flat `src/` root — rejected because Python and JavaScript toolchains have incompatible project roots (`pyproject.toml` vs `package.json`), and mixing them in one top-level `src/` would force cross-toolchain contamination.
## Decision
The repository uses a **two-platform, top-level separation**:
```
ss-tools/ # Repository root
├── backend/ # Python 3.13+ / FastAPI application
│ ├── src/
│ │ ├── api/ # FastAPI route modules (one file per route group)
│ │ ├── core/ # Core services: task_manager, auth, migration, plugin_loader
│ │ ├── models/ # SQLAlchemy ORM models (one file per entity group)
│ │ ├── services/ # Business logic (orchestration, validators, transformers)
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ └── app.py # FastAPI application factory
│ ├── tests/ # pytest test suite (mirrors src/ structure)
│ ├── pyproject.toml # Build & tool configuration
│ └── requirements.txt # Production dependencies
├── frontend/ # SvelteKit application
│ ├── src/
│   │   ├── routes/ # SvelteKit page routes (file-based routing)
│ │ ├── lib/
│ │ │ ├── components/ # Reusable Svelte 5 components
│ │ │ ├── stores/ # Svelte 5 rune stores ($state)
│ │ │ └── api/ # API client modules (typed fetch wrappers)
│ │ └── i18n/ # Internationalization dictionaries
│ ├── tests/ # vitest test suite
│ ├── package.json # Runtime + dev dependencies
│ ├── svelte.config.js # SvelteKit configuration
│ └── vite.config.ts # Vite build configuration
├── docker/ # Dockerfiles for backend, frontend, nginx
├── docker-compose.yml # Development Docker Compose
├── docs/
│ ├── adr/ # Architecture Decision Records (this file)
│ ├── design/ # Detailed design documents
│ └── ... # Operational docs (installation, settings, etc.)
├── specs/ # Feature specifications (speckit artifacts)
│ └── NNN-feature-name/
│ ├── spec.md
│ ├── plan.md
│ ├── research.md
│ ├── data-model.md
│ ├── quickstart.md
│ ├── contracts/
│ │ └── modules.md
│ └── tasks.md
├── scripts/ # Shell utility scripts
├── .specify/ # Speckit workflow framework
│ ├── memory/constitution.md # Repository constitution
│ ├── templates/ # Speckit artifact templates
│ └── scripts/bash/ # Speckit helper scripts
├── .opencode/ # OpenCode AI agent configuration
│ ├── command/ # Speckit command definitions
│ ├── skills/ # GRACE semantic protocol skills
│ └── agents/ # Specialized agent definitions
└── .github/ # GitHub Actions workflows
```
## Module Boundary Rules
1. **`backend/src/api/`** — route handlers only. Must not contain business logic, ORM queries, or schema definitions. Each file = one FastAPI `APIRouter` group.
2. **`backend/src/core/`** — singleton services with application-scoped lifetime (auth manager, task scheduler, plugin loader, migration engine). Must not import from `api/`.
3. **`backend/src/services/`** — stateless or request-scoped business logic. May depend on `models/`, `schemas/`, and `core/`. Must not depend on `api/`.
4. **`backend/src/models/`** — SQLAlchemy ORM declarations only. No business logic, no API dependencies.
5. **`backend/src/schemas/`** — Pydantic models for validation/serialization. No ORM dependencies, no business logic.
6. **`frontend/src/routes/`** — SvelteKit pages. Import components from `lib/components/`, state from `lib/stores/`, API clients from `lib/api/`.
7. **`frontend/src/lib/components/`** — reusable components. Must not import from `routes/` (prevents circular dependencies).
8. **`frontend/src/lib/stores/`** — Svelte 5 `$state` rune stores. Must not import from `routes/` or `components/`.
## File Naming Conventions
- **Python modules**: `snake_case.py` (e.g., `task_manager.py`, `auth_provider.py`)
- **Svelte components**: `PascalCase.svelte` (e.g., `DashboardGrid.svelte`, `MappingTable.svelte`)
- **Test files**: `test_<module_name>.py` or `<ComponentName>.test.ts`
- **ADR files**: `ADR-NNNN-short-description.md` (zero-padded, 4 digits, kebab-case)
## Enforcement
- Speckit `/speckit.plan` command reads this ADR to validate proposed module placement.
- Speckit `/speckit.implement` rejects files written outside canonical boundaries unless justified in feature-local `research.md`.
- CI static verification (`python3 scripts/static_verify.py`) may optionally enforce module import direction rules; a minimal sketch of such a check follows.
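A minimal sketch of such an import-direction check, assuming AST-level scanning and that in-repo imports start with the layer package names (`src.api`, `src.services`); the real `scripts/static_verify.py` may differ:
```python
# Hypothetical import-direction checker for the boundary rules above.
# The banned-prefix table encodes: core/ and models/ must not import api/;
# models/ must not import services/ either.
import ast
from pathlib import Path

BANNED_IMPORTS = {
    "core": ("src.api",),
    "models": ("src.api", "src.services"),
    "services": ("src.api",),
}

def check_import_direction(backend_src: Path) -> list[str]:
    violations: list[str] = []
    for layer, banned in BANNED_IMPORTS.items():
        for py_file in (backend_src / layer).rglob("*.py"):
            tree = ast.parse(py_file.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    modules = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    modules = [node.module]
                else:
                    continue
                for module in modules:
                    if any(module.startswith(prefix) for prefix in banned):
                        violations.append(f"{py_file}: forbidden import {module}")
    return violations
```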
# [/DEF:ADR-0001:ADR]

View File

@@ -0,0 +1,39 @@
# [DEF:ADR-0002:ADR]
# @STATUS ACTIVE
# @PURPOSE Mandate the GRACE skill set (`.opencode/skills/`) as the single semantic governance authority for all source files and AI-agent workflows in this repository. This ADR is the adoption decision — the protocol mechanics live exclusively in the skills; this ADR records *why* the skills are non-optional for this project.
# @RELATION DEPENDS_ON -> [ADR-0001:ADR]
# @RATIONALE A multi-platform repository (Python/FastAPI + SvelteKit) served by multiple AI agents (backend-coder, frontend-coder, qa-tester, semantic-curator, …) requires a single, machine-parseable contract language that works identically across both platforms. Without it, agents produce inconsistent annotations, the semantic index (`semantic_map.json`) degrades, and long-horizon agent sessions (50+ commits) accumulate invisible architectural drift.
# @RATIONALE GRACE was chosen over lightweight alternatives because this project has specific stressors that only a full protocol addresses: (a) two platforms with different comment syntax — GRACE provides platform-specific anchor forms under one unified graph, (b) long-horizon agent sessions — GRACE Decision Memory (`@RATIONALE`/`@REJECTED`) prevents agents from re-exploring already-rejected paths, (c) complex orchestration flows (plugin execution, backup pipelines) — GRACE belief-state markers make side-effect-heavy code auditable and debuggable, (d) enterprise deployment requirements — fractal limits (module <400 lines, `[DEF]` <150 lines) enforce structural hygiene critical for audit-ready code.
# @RATIONALE Skills are the canonical source rather than this ADR because protocol details (complexity scale, tag inventory, syntax variants) evolve. Duplicating them here creates a fork that will inevitably desynchronise — agents would face two competing versions and end up reading the stale one. Skills are version-controlled in the same repo and loaded dynamically via `skill({name="..."})`, ensuring agents always receive the current protocol.
# @REJECTED Plain docstring conventions (Google/NumPy style) — rejected because free-text docstrings are invisible to the semantic index and cannot express relations, pre/post conditions, or rejected paths in a machine-queryable form.
# @REJECTED Decorator-based contracts (`@contract`, `@pre`, `@post`) — rejected because they are Python-only, cannot annotate Svelte components or TypeScript, and break the unified semantic graph spanning both platforms.
# @REJECTED JSDoc/TSDoc for the frontend — rejected because it would create a second annotation language, fragmenting the semantic graph into two incompatible halves and forcing agents to master two different contract systems.
# @REJECTED Embedding protocol rules directly in ADRs (the previous version of this document) — rejected because it duplicates the skill content and inevitably diverges. Agents receiving both the skill and the ADR would face conflicting versions; the skill is the single source of truth.
## Decision
This repository adopts the **GRACE-Poly v2.4 protocol** as implemented by five skills:
| Skill | File | Role in this project |
|-------|------|---------------------|
| `semantics-core` | `.opencode/skills/semantics-core/SKILL.md` | Anchor syntax, complexity scale (C1C5), global invariants, tag inventory |
| `semantics-contracts` | `.opencode/skills/semantics-contracts/SKILL.md` | Design by Contract, Decision Memory (ADR chain, `@RATIONALE`/`@REJECTED`), anti-erosion rules |
| `semantics-belief` | `.opencode/skills/semantics-belief/SKILL.md` | Belief-state runtime markers (`belief_scope`, `reason`, `reflect`, `explore`) for C4/C5 Python flows |
| `semantics-testing` | `.opencode/skills/semantics-testing/SKILL.md` | Test constraints, invariant traceability, anti-tautology rules |
| `semantics-frontend` | `.opencode/skills/semantics-frontend/SKILL.md` | Svelte 5 UX contracts (`@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`) |
**Key principle:** Skills are the protocol. This ADR is the adoption record. When an agent needs to know *what tags are required at C4*, it reads `semantics-core`. When it needs to know *why this project chose C4 annotations at all*, it reads this ADR.
## Enforcement
All speckit commands (`.opencode/command/speckit.*.md`) load the skill set as part of their execution flow:
- **`/speckit.plan`** — reads skills to validate contract designs against the complexity scale.
- **`/speckit.implement`** — requires `[DEF]` anchors and complexity-appropriate metadata; rejects naked code.
- **`/speckit.test`** — audits semantic compliance and decision-memory continuity per skill rules.
- **`/speckit.semantics`** — reindexes and audits the entire workspace against the skill-defined protocol.
- **`/speckit.analyze`** — checks for decision-memory drift and rejected-path scheduling.
Static verification (`python3 scripts/static_verify.py`) performs offline checks for broken anchors, missing complexity metadata, and orphan relations.
# [/DEF:ADR-0002:ADR]

View File

@@ -0,0 +1,57 @@
# [DEF:ADR-0003:ADR]
# @STATUS ACTIVE
# @PURPOSE Definitively establish ss-tools as a standalone orchestrator service operating *above* Apache Superset environments rather than integrating into them. This is a repo-shaping architectural decision that governs deployment topology, security boundary, release cycle independence, and technology stack freedom.
# @RELATION DEPENDS_ON -> [ADR-0001:ADR]
# @RELATION CALLS -> [ADR-0004:ADR]
# @RELATION CALLS -> [ADR-0005:ADR]
# @RATIONALE The core value proposition of ss-tools is multi-environment orchestration: moving dashboards, datasets, and configurations between independent Superset instances (Dev → Stage → Prod). Embedding this logic inside any single Superset instance would immediately create a "master controller" dependency — if that instance fails, all environment management is lost. An external orchestrator treats all Superset instances as equal endpoints.
# @RATIONALE Technology stack mismatch is irreversible: Superset uses Flask + React, ss-tools uses FastAPI + SvelteKit. Integration would require rewriting the entire Svelte frontend into React components and refactoring FastAPI dependency injection into Flask-AppBuilder patterns — a multi-month effort with zero business value and high regression risk.
# @RATIONALE Release cycle independence is non-negotiable: Superset has a slow, complex release cadence. ss-tools must ship LLM plugins, Git integrations, and backup features on its own schedule without waiting for upstream Superset releases or maintaining a fork with perpetual merge conflicts.
# @RATIONALE Security isolation: ss-tools manages DevOps privileges (deployment, backup, migration) while Superset manages BI privileges (data viewing, chart creation). Mixing these in one system expands the blast radius of any security incident.
# @REJECTED Integration as a Superset plugin/extension — rejected for the four reasons above (orchestrator topology break, 100% frontend rewrite, Superset release cycle coupling, privilege boundary collapse).
# @REJECTED Superset fork with ss-tools baked in — rejected because maintaining a fork requires rebase on every Superset release, creating perpetual merge conflicts and making security patch adoption dangerously slow.
# @REJECTED Shared database with Superset — rejected because it couples ss-tools migrations to Superset's Alembic migration chain, preventing independent schema evolution and creating a single point of failure for both systems.
## Decision
ss-tools operates as a **standalone orchestrator microservice** with these defined boundaries:
### Deployment Topology
```
┌─────────────┐
│ ss-tools │ ← FastAPI + SvelteKit
│ PostgreSQL │ ← Own database
└──────┬──────┘
│ REST API calls
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│Superset │ │Superset │ │Superset │
│ (Dev) │ │ (Stage) │ │ (Prod) │
└──────────┘ └──────────┘ └──────────┘
```
### Key Architectural Properties
| Property | Decision |
|----------|----------|
| **Topology** | External orchestrator above all Superset instances |
| **Communication** | Superset REST API (bearer token auth per environment) |
| **Database** | Dedicated PostgreSQL 16 instance (not Superset metadata DB) |
| **Frontend** | SvelteKit SPA (not integrated into Superset UI) |
| **Release cadence** | Independent Docker image releases |
| **Auth** | Own RBAC system with optional ADFS SSO (see ADR-0005) |
| **Plugins** | Own plugin lifecycle (see ADR-0004) |
### Consequences
- **Positive:** Fast independent development, modern stack, no Superset coupling, clear security boundary.
- **Negative:** Users have two separate UIs (Superset for BI, ss-tools for DevOps). Mitigated by consistent UX design and cross-linking.
- **Risk:** API compatibility if Superset changes its REST API. Mitigated by versioned API client with automated compatibility tests.
## Migration from Existing Document
This ADR supersedes and formalizes `docs/architecture_decision_superset_migration.md`. The original document contained the same recommendation and rationale but lacked the `[DEF:id:ADR]` contract structure, making it invisible to the semantic index and decision-memory audit chain.
# [/DEF:ADR-0003:ADR]

View File

@@ -0,0 +1,78 @@
# [DEF:ADR-0004:ADR]
# @STATUS ACTIVE
# @PURPOSE Define the plugin architecture for ss-tools — the loading mechanism, lifecycle contract, isolation guarantees, and the boundary between core services and pluggable extensions.
# @RELATION DEPENDS_ON -> [ADR-0001:ADR]
# @RELATION DEPENDS_ON -> [ADR-0003:ADR]
# @RELATION CALLS -> [ADR-0005:ADR]
# @RATIONALE Extensibility is a core architectural value: the system must support LLM-driven analysis, custom data transformations, and environment-specific logic without modifying core code. A plugin system prevents the monolith from accumulating every domain-specific feature and enables third-party (or future-self) contributions without forking.
# @RATIONALE Process isolation was chosen over in-process imports because: (a) plugins may use incompatible library versions, (b) a crashing plugin must not take down the orchestrator, (c) security boundary — plugins should not access the orchestrator's database connection directly.
# @REJECTED In-process Python `importlib` plugin loading — rejected because a misbehaving plugin can corrupt global state, exhaust memory, or crash the server. Process isolation provides a hard boundary.
# @REJECTED Docker container per plugin — rejected because it adds excessive orchestration complexity and startup latency for plugins that are mostly lightweight LLM prompt chains. Subprocess isolation is sufficient.
# @REJECTED WebAssembly (WASI) sandbox — rejected because the Python AI/LLM ecosystem (langchain, transformers) does not yet reliably compile to WASM. Premature optimization.
## Decision
### Plugin Architecture
```
ss-tools Core
├── core/plugin_loader.py # Plugin discovery, loading, lifecycle
├── core/plugin_executor.py # Subprocess execution, timeout, error boundary
├── core/plugin_registry.py # Registered plugins, metadata, health
└── plugins/ # Plugin packages (each = one directory)
├── llm_analysis/ # LLM-driven Superset data analysis
├── dataset_orchestration/ # LLM dataset operations
└── git_integration/ # Git-based version control for dashboards
```
### Plugin Contract
Every plugin MUST provide:
1. **`plugin.toml`** — metadata manifest at the plugin root
```toml
[plugin]
id = "llm_analysis"
name = "LLM Data Analysis"
version = "1.0.0"
entrypoint = "plugin.py"
timeout_sec = 300
max_memory_mb = 512
requires = ["superset-api>=1.0", "openai>=1.0"]
```
2. **`plugin.py`** — entrypoint with two required functions (a minimal skeleton follows this list):
- `def register(registry: PluginRegistry) -> PluginInfo` — declare capabilities
- `def execute(task: TaskContext) -> TaskResult` — run the plugin
3. **Task context contract** (Pydantic `TaskContext`):
- `task_id: str`, `plugin_id: str`, `action: str`, `params: dict`, `superset_env: SupersetConnection`, `auth_token: str` (scoped, short-lived)
4. **Result envelope** (Pydantic `TaskResult`):
- `status: Literal["success", "warning", "error"]`, `data: dict | None`, `error_message: str | None`, `execution_time_ms: int`, `artifacts: list[str]` (file paths to saved artifacts)
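A hypothetical minimal `plugin.py` satisfying the two-function contract; the dataclasses below stand in for the core-provided Pydantic types (`TaskResult` fields follow the envelope above, `PluginInfo` fields are assumptions):
```python
# Hypothetical plugin skeleton; in a real plugin these types would be
# imported from the core SDK rather than redefined.
from dataclasses import dataclass, field

@dataclass
class PluginInfo:
    id: str
    actions: list[str]

@dataclass
class TaskResult:
    status: str                        # "success" | "warning" | "error"
    data: dict | None = None
    error_message: str | None = None
    execution_time_ms: int = 0
    artifacts: list[str] = field(default_factory=list)

def register(registry) -> PluginInfo:
    # Declare capabilities; the registry records them in the plugin DB.
    return PluginInfo(id="llm_analysis", actions=["analyze_dataset"])

def execute(task) -> TaskResult:
    # Dispatch on the requested action from the TaskContext.
    if task.action != "analyze_dataset":
        return TaskResult(status="error", error_message=f"unknown action: {task.action}")
    return TaskResult(status="success", data={"params_received": task.params})
```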
### Plugin Lifecycle
```
Discover → Validate → Register → [Execute] → Report
│ │ │ │
│ Check TOML Store in Spawn subprocess
│ schema, registry with timeout +
│ dependencies in DB memory limit
```
### Isolation Guarantees
- **Subprocess**: `subprocess.run(..., timeout=timeout_sec)`, killed on timeout (see the sketch after this list).
- **Memory**: `resource.setrlimit(RLIMIT_AS, max_memory_mb * 1024 * 1024)` before exec.
- **No DB access**: Plugins receive only a scoped REST API token, never a database connection.
- **No filesystem writes outside allowed dirs**: Configurable artifact directory per plugin.
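A sketch combining the subprocess and memory guarantees above, assuming a POSIX host (function and module names are illustrative, not the real `plugin_executor.py` API):
```python
# Illustrative subprocess isolation: hard timeout plus an address-space
# cap applied in the child before the plugin code runs.
import resource
import subprocess
import sys

def run_plugin(entrypoint: str, timeout_sec: int, max_memory_mb: int) -> int:
    def limit_memory() -> None:
        # Runs in the child between fork and exec (POSIX only).
        limit = max_memory_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    try:
        proc = subprocess.run(
            [sys.executable, entrypoint],
            preexec_fn=limit_memory,
            timeout=timeout_sec,
            capture_output=True,
        )
        return proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return -1
```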
### RBAC Integration
Plugin access is governed by ADR-0005 (RBAC):
- Each plugin declares `required_roles: ["admin", "analyst"]` in `plugin.toml`.
- Plugin executor checks the user's role set before allowing execution.
- Forbidden access returns `403` with audit log entry.
# [/DEF:ADR-0004:ADR]

View File

@@ -0,0 +1,84 @@
# [DEF:ADR-0005:ADR]
# @STATUS ACTIVE
# @PURPOSE Define the authentication and authorization architecture for ss-tools: local auth, optional ADFS SSO federation, RBAC model, session management, and the security boundary between ss-tools (DevOps privileges) and Superset (BI privileges).
# @RELATION DEPENDS_ON -> [ADR-0001:ADR]
# @RELATION DEPENDS_ON -> [ADR-0003:ADR]
# @RELATION CALLS -> [ADR-0004:ADR]
# @RATIONALE ss-tools manages operations that can modify production Superset instances (dashboard migration, backup, deployment). Unauthenticated or under-authorized access to these operations is a critical security risk. A dedicated RBAC system ensures that DevOps privileges (manage deployments) are cleanly separated from BI privileges (view dashboards) and that actions are auditable.
# @RATIONALE Local auth (bcrypt + JWT) is the primary path because ss-tools must work in air-gapped enterprise deployments where external identity providers are unavailable. ADFS SSO is an optional federation layer for organizations that already have Active Directory.
# @RATIONALE Role-Based Access Control (RBAC) was chosen over Attribute-Based Access Control (ABAC) because: (a) the system has a small, well-defined set of operations (deploy, backup, migrate, view), (b) RBAC maps naturally to organizational roles (admin, analyst, operator), (c) ABAC adds complexity without proportional benefit for this scope.
# @REJECTED Delegating all auth to Superset — rejected because ss-tools must operate when Superset is down, and Superset's auth model is designed for BI users, not DevOps operators with cross-environment privileges.
# @REJECTED OAuth2 social login (Google, GitHub) as primary path — rejected because enterprise deployments require air-gapped operation. External OAuth providers are unavailable in offline mode.
# @REJECTED Simple API key (no RBAC) — rejected because it cannot express granular permissions (admin vs analyst vs viewer), making auditability and least-privilege impossible.
## Decision
### Authentication Flow
```
┌──────────┐ POST /api/auth/login ┌──────────────┐
│ User │ ──────────────────────────────│ FastAPI │
│ (Browser)│ {username, password} │ Backend │
│ │ ◄──────────────────────────── │ │
│ │ {access_token, refresh} │ bcrypt + │
│ │ │ JWT (HS256) │
└──────────┘ └──────┬───────┘
┌───────▼───────┐
│ PostgreSQL │
│ users table │
│ roles table │
└───────────────┘
```
### ADFS Federation (Optional)
- Enabled via `config.json: auth.adfs_enabled = true`
- SAML 2.0 flow through `python3-saml`
- ADFS users mapped to local roles via `adfs_group → ss_tools_role` mapping table (a mapping sketch follows this list)
- Federated sessions still receive JWTs (stateless after initial SAML handshake)
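A minimal sketch of the group-to-role mapping step after a successful SAML handshake (the fallback role is an assumption, not part of this ADR):
```python
# Illustrative ADFS group mapping; `mapping` mirrors the
# adfs_group -> ss_tools_role table described above.
def map_adfs_groups(adfs_groups: list[str], mapping: dict[str, str]) -> list[str]:
    roles = {mapping[group] for group in adfs_groups if group in mapping}
    # Assumed least-privilege fallback when no group matches.
    return sorted(roles) if roles else ["viewer"]
```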
### RBAC Model
| Role | Permissions |
|------|-------------|
| `admin` | All: manage users, manage roles, manage plugins, deploy, backup, migrate, view dashboards, view logs |
| `analyst` | View dashboards, run LLM analysis plugins, view reports, view logs (own) |
| `operator` | Deploy dashboards, run migrations, manage backups, view logs |
| `viewer` | View dashboards, view reports |
### Token Design
- **Access token**: JWT (HS256), 15-minute expiry, contains `user_id`, `roles: list[str]` (a token factory sketch follows this list)
- **Refresh token**: opaque random string (SHA-256), 7-day expiry, stored hashed in DB
- **Superset API token per environment**: AES-256-GCM encrypted, stored in `connection_configs` table
- **Plugin execution token**: JWT, scoped to a single `task_id`, 15-minute expiry
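An illustrative access-token factory matching the design above, assuming PyJWT (key management and module placement may differ in the real code):
```python
# Sketch only: the signing secret would come from configuration,
# never a literal.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET_KEY = "change-me"

def create_access_token(user_id: str, roles: list[str]) -> str:
    now = datetime.now(timezone.utc)
    payload = {
        "sub": user_id,
        "roles": roles,
        "iat": now,
        "exp": now + timedelta(minutes=15),  # 15-minute expiry per this ADR
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")
```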
### Security Constraints
1. Passwords: bcrypt with cost factor 12 (minimum).
2. Rate limiting: 5 failed login attempts per IP per 15 minutes → temporary IP block.
3. Token revocation: admin can revoke all sessions for a user (delete refresh tokens).
4. Audit log: all auth events (login success/failure, role change, token revoke) written to `audit_log` table.
5. Enterprise clean mode: local auth only, ADFS disabled, no external network calls for identity.
### RBAC Enforcement Pattern
```python
# backend/src/api/dependencies.py
from fastapi import Depends, HTTPException
def require_role(required_role: str):
def dependency(current_user: User = Depends(get_current_user)):
if required_role not in current_user.roles:
raise HTTPException(status_code=403, detail="Insufficient permissions")
return current_user
return dependency
# Usage in route:
@router.post("/api/dashboards/deploy")
async def deploy(..., user: User = Depends(require_role("operator"))):
...
```
# [/DEF:ADR-0005:ADR]

View File

@@ -0,0 +1,79 @@
# [DEF:ADR-0006:ADR]
# @STATUS ACTIVE
# @PURPOSE Define the frontend architecture for ss-tools: Svelte 5 runes reactivity model, SvelteKit routing, component composition rules, state management patterns, API client conventions, and UX contract expectations for C4/C5 components.
# @RELATION DEPENDS_ON -> [ADR-0001:ADR]
# @RELATION DEPENDS_ON -> [ADR-0002:ADR]
# @RATIONALE Svelte 5 introduces a fundamentally different reactivity model (runes: `$state`, `$derived`, `$effect`, `$props`) compared to Svelte 4's `$:` reactive statements. This ADR locks in the Svelte 5 runes approach and prevents backsliding into legacy patterns when new developers or AI agents contribute code.
# @RATIONALE SvelteKit with static adapter (`@sveltejs/adapter-static`) was chosen over SvelteKit SSR because: (a) ss-tools is a Docker-deployed SPA behind nginx — server-side rendering provides no latency benefit, (b) static SPA simplifies Docker deployment (no Node.js server needed, just nginx serving static files), (c) all data is fetched via REST API, not server-side load functions.
# @RATIONALE Svelte 5 runes (`$state`, `$derived`, `$effect`) are mandated over Svelte 4 `$:` reactivity and `writable` stores because: (a) runes are the forward path — Svelte 4 patterns are deprecated, (b) runes provide fine-grained reactivity without subscription boilerplate, (c) `$state` can be used outside `.svelte` files in `.svelte.js` modules, enabling reusable reactive logic.
# @REJECTED React/Next.js — rejected by ADR-0003 (technology stack independence from Superset). Svelte 5 was chosen for its compile-time approach, smaller bundle size, and superior DX for this project's scale.
# @REJECTED Svelte 4 legacy patterns (`$:`, `writable`/`readable` stores) — rejected because Svelte 5 runes are the actively maintained reactivity model. Mixing two models creates confusion and potential reactivity bugs.
# @REJECTED Server-Side Rendering (SSR) with SvelteKit — rejected because the data is entirely API-driven (no server-side page data loading), and SSR adds deployment complexity (Node.js process) without latency benefit for authenticated SPA users.
## Decision
### Technology Stack
| Layer | Choice | Version |
|-------|--------|---------|
| Framework | SvelteKit | 2.x |
| UI Library | Svelte | 5.x (runes mode) |
| Build | Vite | 7.x |
| Styling | Tailwind CSS | 3.x |
| Adapter | @sveltejs/adapter-static | 3.x (SPA mode) |
| Testing | vitest + @testing-library/svelte | 4.x / 5.x |
### Reactivity Model (Svelte 5 Runes)
```
Legacy (FORBIDDEN) Svelte 5 Runes (REQUIRED)
──────────────────────── ──────────────────────────
let count = 0; let count = $state(0);
$: doubled = count * 2; let doubled = $derived(count * 2);
$: { /* side effect */ } $effect(() => { /* side effect */ });
export let name; let { name } = $props();
import { writable } from ... ❌ stores are .svelte.js modules with $state
```
### State Management Pattern
Three tiers of state, strictly separated:
1. **Componentlocal state**: `$state()` inside `.svelte` files. Never exported.
2. **Shared reactive state**: `.svelte.js` modules exporting `$state` objects. Imported by components.
```js
// frontend/src/lib/stores/dashboard.svelte.js
export const dashboardState = $state({
dashboards: [],
selectedId: null,
isLoading: false
});
```
3. **Server state**: Fetched via API client, held in `$state` variables locally. No global cache — each route fetches what it needs.
### API Client Convention
API client modules live under `frontend/src/lib/api/`. Each module wraps `fetch` calls to the FastAPI backend, attaches the JWT access token from the auth store, and normalises errors into a typed `ApiError`. Contract annotations follow the rules defined in the `semantics-core` and `semantics-contracts` skills.
### Component Complexity & UX Contracts
Svelte components with side effects (API calls, WebSocket subscriptions, file operations) MUST carry UX contract tags as defined by the `semantics-frontend` skill (`.opencode/skills/semantics-frontend/SKILL.md`). The skill is the canonical source for tag inventory, syntax, and per-complexity requirements.
### Component File Structure
```
frontend/src/lib/components/
├── DashboardGrid.svelte # C4: complex data grid with WebSocket updates
├── MappingTable.svelte # C3: migration mapping table
├── PluginExecutionCard.svelte # C4: plugin runner with progress feedback
├── RoleBadge.svelte # C1: simple role display
└── ...
```
### Routing
- SvelteKit file-based routing under `frontend/src/routes/`
- Protected routes check auth in `+layout.svelte` (redirect to `/login` if no token)
- Route structure: `/` (dashboard list), `/envs/[id]` (environment detail), `/migration` (migration wizard), `/plugins` (plugin management), `/admin` (user/role management)
# [/DEF:ADR-0006:ADR]

View File

@@ -1,17 +1,73 @@
# [DEF:MergeSpec:Module]
# @PURPOSE: Utility to merge specs
# @TIER: TRIVIAL
# @COMPLEXITY: 1
# @RELATION: [DEPENDS_ON] builtin
# @LAYER: Infra
# [DEF:merge_spec:Module]
# @PURPOSE: Utility script for merge_spec
# @TIER: TRIVIAL
# @COMPLEXITY: 1
import os
# @LAYER: Infra
import sys
from datetime import datetime
from pathlib import Path
REVIEW_PROMPT = (
"Another LLM created this feature package. Your task is to run an independent, "
"orthogonal spec review: assess the specification's readiness, find contradictions, "
"gaps, and implementation risks, and prepare a structured report with corrections. "
"Focus on reviewing the specification package itself, not on rewriting the implementation."
)
CANONICAL_MD_STAGES = (
("exact", "spec.md"),
("exact", "ux_reference.md"),
("prefix", "checklists/"),
("exact", "plan.md"),
("exact", "research.md"),
("exact", "data-model.md"),
("prefix", "contracts/"),
("exact", "quickstart.md"),
("exact", "tasks.md"),
)
def relative_key(path: Path, root: Path) -> str:
return path.relative_to(root).as_posix()
def ordered_markdown_files(target_dir: Path) -> list[Path]:
markdown_files = [path for path in target_dir.rglob("*.md") if path.is_file()]
remaining = {relative_key(path, target_dir): path for path in markdown_files}
ordered: list[Path] = []
for stage_type, stage_value in CANONICAL_MD_STAGES:
if stage_type == "exact":
path = remaining.pop(stage_value, None)
if path is not None:
ordered.append(path)
continue
stage_matches = sorted(
[
path
for relative_path, path in remaining.items()
if relative_path.startswith(stage_value)
],
key=lambda path: relative_key(path, target_dir),
)
ordered.extend(stage_matches)
for path in stage_matches:
remaining.pop(relative_key(path, target_dir), None)
ordered.extend(
sorted(remaining.values(), key=lambda path: relative_key(path, target_dir))
)
return ordered
def merge_specs(feature_number):
specs_dir = Path("specs")
if not specs_dir.exists():
@@ -26,24 +82,30 @@ def merge_specs(feature_number):
break
if not target_dir:
print(f"Error: No directory found for feature number '{feature_number}' in 'specs/'.")
print(
f"Error: No directory found for feature number '{feature_number}' in 'specs/'."
)
return
feature_name = target_dir.name
now = datetime.now().strftime("%Y%m%d-%H%M%S")
output_filename = f"{feature_name}-{now}.md"
content_blocks = ["Мой коллега предложил такую фичу, оцени ее\n"]
# Recursively find all files
files_to_merge = sorted([f for f in target_dir.rglob("*") if f.is_file()])
content_blocks = [
REVIEW_PROMPT,
"",
"Порядок артефактов: spec -> ux_reference -> checklist -> plan -> research -> data-model -> contracts -> quickstart -> tasks -> remaining markdown.",
"",
]
files_to_merge = ordered_markdown_files(target_dir)
for file_path in files_to_merge:
relative_path = file_path.relative_to(target_dir)
try:
with open(file_path, "r", encoding="utf-8") as f:
file_content = f.read()
content_blocks.append(f"--- FILE: {relative_path} ---\n")
content_blocks.append(file_content)
content_blocks.append("\n")
@@ -55,11 +117,12 @@ def merge_specs(feature_number):
print(f"Successfully created: {output_filename}")
if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: python merge_spec.py <feature_number>")
sys.exit(1)
merge_specs(sys.argv[1])
# [/DEF:merge_spec:Module]

View File

@@ -0,0 +1,55 @@
# Specification Quality Checklist: LLM Table Translation Service
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-05-08
**Updated**: 2026-05-08 (post-review)
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs) leak into spec.md
- [x] Focused on user value and business needs — translation workflow, dictionary management, scheduling for operational stakeholders
- [x] Written for non-technical stakeholders — uses business terminology throughout
- [x] All mandatory sections completed (User Scenarios, Clarifications, Requirements, Key Entities, Success Criteria, Assumptions, Access Control Matrix)
## UX Consistency
- [x] Functional requirements fully support all flows in ux_reference.md (Flows A–G, updated for Superset API execution)
- [x] Error handling requirements match all 'Error Experience' scenarios (A–I) in ux_reference.md
- [x] No requirements contradict the defined User Persona or Context
- [x] UX principles (preview gates manual runs, traceability, graceful degradation, cost awareness) are reflected in functional requirements
- [x] INSERT execution via Superset API is consistent across spec, UX, quickstart, and contracts
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain — all post-review ambiguities resolved in Clarifications sessions
- [x] Requirements are testable and unambiguous — each FR describes a specific, verifiable behavior
- [x] Success criteria are measurable — 15 SC items include specific percentages, time bounds, or coverage targets
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined — 7 user stories with 41 total acceptance scenarios
- [x] Edge cases are identified — 19 edge cases covering NULL values, missing tables, large datasets, concurrency, LLM failures, composite keys, dictionary duplicates, schedule overlaps, retention gap, Superset API errors, SQL injection
- [x] Scope is clearly bounded — covers configuration, preview quality gate, execution via Superset API, history, dictionary management, feedback loop, and scheduling; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse)
- [x] Dependencies and assumptions identified — 22 assumptions
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria (mapped to user story acceptance scenarios)
- [x] User scenarios cover primary flows: Configuration (P1) → Preview Quality Gate + Dictionary (P2) → Superset API Execution + Feedback + Scheduling (P3) → Audit (P4)
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
- [x] Feature aligns with existing ss-tools architecture patterns (plugin system, Superset integration, LLM provider, scheduler, notification, RBAC)
- [x] Contradictions resolved: FR-023 (snapshot isolation), dictionary conflict model (overwrite/keep existing), preview model (quality gate), auto-insert for scheduled runs (after first successful manual run)
- [x] New entities added: TranslationBatch, TranslationPreviewSession, TranslationPreviewRecord, MetricSnapshot
- [x] Event model expanded: nullable run_id, terminal states (succeeded/partial/failed/cancelled/skipped), pre-run events
- [x] Access control matrix defined with ownership constraints
- [x] Dialect-aware SQL generation declared (PostgreSQL/Greenplum + ClickHouse, detected from Superset connection)
- [x] Retention gap resolved: MetricSnapshot persistence before event pruning
## Notes
- Post-review session resolved 3 critical contradictions and 10+ medium gaps.
- FR count: 49 | SC count: 15 | Key entities: 10 | Edge cases: 19
- Preview model clarified as quality gate (not row-level approval for all rows).
- INSERT execution standardized on Superset `/api/v1/sqllab/execute/` API.
- All manual copy/paste SQL Lab references removed from spec, UX, quickstart, and contracts.
- Specification is ready for the next phase (`/speckit.plan` re-validation or `/speckit.implement`).

View File

@@ -0,0 +1,277 @@
# Semantic Contracts: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08 (updated post-review)
**Protocol**: GRACE-Poly v2.4
---
## Backend Contracts (Python / FastAPI)
### [DEF:TranslatePlugin:Module]
`backend/src/plugins/translate/plugin.py`
@COMPLEXITY 3
@PURPOSE Plugin entry point implementing PluginBase for the translation service. Registers API routes, ORM models, and scheduler integration.
@RELATION INHERITS -> [PluginBase:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION BINDS_TO -> [TranslateRoutes:Module]
[/DEF:TranslatePlugin:Module]
### [DEF:TranslationOrchestrator:Class]
`backend/src/plugins/translate/orchestrator.py`
@COMPLEXITY 5
@PURPOSE Central coordinator for translation run lifecycle: validates preconditions, manages preview quality gate, dispatches to executor, generates safe SQL, submits to Superset SQL Lab API, manages retry, records events, enforces retention.
@PRE Job configuration is saved and valid. Superset datasource is accessible. LLM provider is configured and reachable. For manual runs: preview session must be accepted. For scheduled runs: at least one prior successful manual run exists.
@POST TranslationRun created with translation_status and insert_status. INSERT SQL generated (safe, dialect-appropriate for the detected target dialect) and submitted to Superset `/api/v1/sqllab/execute/`. Superset query reference recorded. Structured events recorded for all lifecycle transitions.
@SIDE_EFFECT Creates TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, MetricSnapshot rows. Calls LLM provider API (token consumption). Calls Superset SQL Lab API.
@DATA_CONTRACT Input: TranslationJob (config snapshot) + datasource rows → Output: TranslationRun (result) + Superset query reference
@INVARIANT A run must transition through states: pending → running → (completed|partial|failed|cancelled|skipped). insert_status must transition through one of: not_started → skipped (if no insert needed); not_started → submitted → (succeeded|failed) (Superset may complete immediately); not_started → submitted → running → (succeeded|failed) (async polling). No other state transitions are allowed. Snapshot isolation: in-progress runs use config snapshot; config edits affect future runs only.
@RELATION CALLS -> [TranslationPreview:Class]
@RELATION CALLS -> [TranslationExecutor:Class]
@RELATION CALLS -> [SQLGenerator:Class]
@RELATION CALLS -> [SupersetSqlLabExecutor:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Centralized orchestrator is needed because preview gating, execution, SQL generation, Superset API submission, event logging, and retry share state (run_id, config snapshot) and must coordinate within a single transaction boundary for consistency.
@REJECTED Distributed actor model (Celery tasks per batch) was rejected because it introduces eventual-consistency challenges for run status tracking without proportional benefit at the expected scale. Synchronous batch processing provides simpler debugging and deterministic retry. UPDATE statements are never generated; all modifications use INSERT/UPSERT appropriate to the detected dialect.
[/DEF:TranslationOrchestrator:Class]
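The allowed transitions from the invariant above can be expressed as a small lookup table; this sketch is illustrative, not the orchestrator's actual implementation:
```python
# Legal state transitions for translation_status and insert_status.
TRANSLATION_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"running"},
    "running": {"completed", "partial", "failed", "cancelled", "skipped"},
}

INSERT_TRANSITIONS: dict[str, set[str]] = {
    "not_started": {"skipped", "submitted"},
    "submitted": {"running", "succeeded", "failed"},  # Superset may finish immediately
    "running": {"succeeded", "failed"},
}

def can_transition(table: dict[str, set[str]], current: str, target: str) -> bool:
    return target in table.get(current, set())
```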
### [DEF:TranslationPreview:Class]
`backend/src/plugins/translate/preview.py`
@COMPLEXITY 4
@PURPOSE Fetch a sample of source rows, send to LLM with configured context and per-batch filtered dictionary, return side-by-side preview for quality gate acceptance. Creates persistent PreviewSession and PreviewRecord rows.
@PRE Job configuration is saved. Datasource is accessible. Preview row count is configured (default: 10).
@POST PreviewSession created with config_hash and dict_snapshot_hash. PreviewRecord rows returned with source_text, context, key_values, llm_translation, and status. Accepting the preview session gates full execution. No data is persisted to target table.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates PreviewSession and PreviewRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationPreview:Class]
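A sketch of the `config_hash` used to detect stale previews, assuming it is a digest over a canonical JSON dump of the config snapshot (the exact canonicalization is an implementation choice):
```python
import hashlib
import json

def config_hash(config_snapshot: dict) -> str:
    # Canonical form: sorted keys, no whitespace, so equal configs hash equally.
    canonical = json.dumps(config_snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```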
### [DEF:TranslationExecutor:Class]
`backend/src/plugins/translate/executor.py`
@COMPLEXITY 4
@PURPOSE Process source rows in batches through LLM, collect translations, handle batch-level retry, and produce TranslationBatch and TranslationRecord rows. Requests structured JSON output from LLM keyed by stable row identifiers; validates row alignment.
@PRE Run exists with translation_status `running`. Source rows fetched. Batch size configured.
@POST All processable rows have TranslationRecord entries. Each batch has a TranslationBatch record with statistics and timing. Run statistics updated.
@SIDE_EFFECT Calls LLM provider API (token consumption). Creates TranslationBatch and TranslationRecord rows.
@RELATION CALLS -> [LLMProviderService:Module]
@RELATION CALLS -> [DictionaryManager:Class]
[/DEF:TranslationExecutor:Class]
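A sketch of the row-alignment validation described above, assuming the LLM returns a JSON object keyed by stable row identifiers:
```python
import json

def validate_alignment(llm_output: str, expected_row_ids: set[str]) -> dict[str, str]:
    # Any missing or extra key fails the whole batch so it can be retried.
    translations = json.loads(llm_output)
    got = set(translations)
    if got != expected_row_ids:
        missing, extra = expected_row_ids - got, got - expected_row_ids
        raise ValueError(f"row alignment mismatch: missing={missing}, extra={extra}")
    return translations
```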
### [DEF:SQLGenerator:Class]
`backend/src/plugins/translate/sql_generator.py`
@COMPLEXITY 3
@PURPOSE Generate safe dialect-appropriate INSERT/UPSERT SQL from TranslationRecord rows, keyed by configured target key columns. Detects dialect from Superset connection (PostgreSQL/Greenplum, ClickHouse supported for MVP). Validates and quotes identifiers per dialect rules; safely encodes values.
@PRE TranslationRecord rows exist with status translated or edited and final_value is non-null. Target table schema validated at configuration time. Target database dialect is one of the supported dialects (PostgreSQL/Greenplum, ClickHouse).
@POST Returns a syntactically valid, injection-safe INSERT (or dialect-appropriate UPSERT, e.g. INSERT ... ON CONFLICT for PostgreSQL) statement string for the detected dialect.
@RELATION CALLS -> [SupersetClient:Module]
@RATIONALE Separate contract because SQL generation reused by both manual and scheduled runs, and independently testable for SQL syntax correctness (SC-003) and injection safety.
@REJECTED UPDATE statements not generated because source is append-only (new-key-only strategy). UPSERT covers the overwrite case without separate UPDATE logic.
[/DEF:SQLGenerator:Class]
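A minimal sketch of the dialect-aware identifier quoting and value encoding, covering only the two MVP dialects (the real generator also validates identifiers against the target schema):
```python
def quote_identifier(name: str, dialect: str) -> str:
    # Double-quote for PostgreSQL/Greenplum, backtick for ClickHouse,
    # doubling any embedded quote character.
    if dialect in ("postgresql", "greenplum"):
        return '"' + name.replace('"', '""') + '"'
    if dialect == "clickhouse":
        return "`" + name.replace("`", "``") + "`"
    raise ValueError(f"unsupported dialect: {dialect}")

def encode_value(value: str | None) -> str:
    # Literal encoding with single-quote doubling; NULL passes through.
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"
```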
### [DEF:SupersetSqlLabExecutor:Class]
`backend/src/plugins/translate/superset_executor.py`
@COMPLEXITY 3
@PURPOSE Submit generated SQL to Superset SQL Lab API `/api/v1/sqllab/execute/`, poll execution status, record Superset query reference, status, and error details in the TranslationRun.
@PRE TranslationRun exists with translation_status completed/partial. SQL is generated and syntactically valid.
@POST TranslationRun.insert_status updated (submitted→running→succeeded|failed). Superset query reference, error details, rows_affected (if available) stored.
@SIDE_EFFECT Calls Superset API (SQL execution, database write). Updates TranslationRun row.
@RELATION CALLS -> [SupersetClient:Module]
@RELATION CALLS -> [TranslationEventLog:Class]
[/DEF:SupersetSqlLabExecutor:Class]
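An illustrative polling loop around the executor's contract; the client method name is an assumption, not the real `SupersetClient` API:
```python
import time

def poll_until_terminal(client, query_ref: str, interval_sec: float = 2.0,
                        timeout_sec: int = 300) -> str:
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        status = client.get_query_status(query_ref)  # assumed method
        if status in ("succeeded", "failed"):
            return status
        time.sleep(interval_sec)
    # Treat polling timeout as failure for the run record.
    return "failed"
```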
### [DEF:DictionaryManager:Class]
`backend/src/plugins/translate/dictionary.py`
@COMPLEXITY 4
@PURPOSE CRUD for TerminologyDictionary and DictionaryEntry (unique per dictionary_id + source_term). CSV/TSV import with conflict detection (overwrite/keep existing). Per-batch term filtering (case-insensitive, word-boundary-aware substring matching). Language validation on job attachment.
@PRE Dictionary model exists. User has required permissions. For attachment: dictionary target_language matches job target_language.
@POST CRUD operations reflected in database. Filtered term list returned for batch. Entries with mismatched language rejected on attachment.
@SIDE_EFFECT Creates/updates/deletes DictionaryEntry rows. May read TranslationRun for origin tracking.
@RELATION CALLS -> [DictionaryEntry:Class]
@RELATION CALLS -> [TranslationJobDictionary:Class]
[/DEF:DictionaryManager:Class]
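A sketch of the per-batch term filter described above (word-boundary matching via regex; terms that start or end with non-word characters would need extra care):
```python
import re

def filter_terms(batch_texts: list[str], entries: dict[str, str]) -> dict[str, str]:
    """Return only dictionary entries whose source term occurs in the batch."""
    joined = "\n".join(batch_texts)
    matched: dict[str, str] = {}
    for source_term, target_term in entries.items():
        pattern = r"\b" + re.escape(source_term) + r"\b"
        if re.search(pattern, joined, flags=re.IGNORECASE):
            matched[source_term] = target_term
    return matched
```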
### [DEF:TranslationScheduler:Class]
`backend/src/plugins/translate/scheduler.py`
@COMPLEXITY 4
@PURPOSE Manage translation job schedules with timezone support: create/update/delete, register with APScheduler, handle trigger dispatch with concurrency policy (skip/queue at most one), enforce new-key-only strategy with baseline_expired fallback.
@PRE SchedulerService is running. Job has at least one prior successful manual run. Job configuration exists.
@POST Schedule registered with APScheduler or removed. On trigger: new TranslationRun created and dispatched to TranslationOrchestrator. New-key-only filter applied; if baseline expired (>90 days since last successful run), full translation with baseline_expired event.
@SIDE_EFFECT Creates TranslationRun rows. Calls SchedulerService. Emits schedule_triggered/schedule_skipped/schedule_failed events. Calls NotificationService on failure.
@RELATION CALLS -> [SchedulerService:Class]
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [NotificationService:Module]
[/DEF:TranslationScheduler:Class]
### [DEF:TranslationEventLog:Class]
`backend/src/plugins/translate/events.py`
@COMPLEXITY 5
@PURPOSE Structured event logging: write immutable events (run_id nullable for pre-run events), query events for audit/dashboard, enforce 90-day retention pruning with MetricSnapshot persistence before deletion.
@PRE TranslationEvent table exists. Event type recognized. For run-scoped events: run exists. For pre-run events (schedule_*, run_noop): run_id is NULL.
@POST Event row created with type-specific payload. Pruning job: persists MetricSnapshot, then removes events and records older than 90 days.
@SIDE_EFFECT Creates TranslationEvent rows. Creates MetricSnapshot rows at pruning time. Deletes expired rows.
@DATA_CONTRACT Input: (run_id?, job_id, event_type, payload: dict) → Output: TranslationEvent
@INVARIANT Every created run MUST have exactly one run_started event and exactly one terminal event among: run_succeeded, run_partial, run_failed, run_cancelled, run_skipped. Events are immutable after creation. Cumulative metrics survive pruning via MetricSnapshot.
@RELATION CALLS -> [TranslationEvent:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
@RATIONALE C5 warranted: event log is single source of truth for observability, metrics, and audit. Immutability, retention, and metric continuity invariants must be enforced to prevent data loss or tampering.
@REJECTED stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant. Event-sourced metrics without snapshots would lose cumulative data after pruning.
[/DEF:TranslationEventLog:Class]
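The terminal-event invariant is directly checkable; a minimal sketch that a test or pruning audit could use:
```python
TERMINAL_EVENTS = {"run_succeeded", "run_partial", "run_failed",
                   "run_cancelled", "run_skipped"}

def run_events_satisfy_invariant(event_types: list[str]) -> bool:
    # Exactly one run_started and exactly one terminal event per run.
    started = sum(1 for t in event_types if t == "run_started")
    terminal = sum(1 for t in event_types if t in TERMINAL_EVENTS)
    return started == 1 and terminal == 1
```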
### [DEF:TranslationMetrics:Class]
`backend/src/plugins/translate/metrics.py`
@COMPLEXITY 3
@PURPOSE Aggregate per-job metrics from live TranslationEvent log AND persistent MetricSnapshot table. For recent data (<90 days): compute from events. For cumulative totals: read latest MetricSnapshot + recent events.
@PRE TranslationEvent rows or MetricSnapshot rows exist for target job.
@POST Returns MetricsResponse DTO with accurate cumulative values spanning both live and pruned data.
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [MetricSnapshot:Class]
[/DEF:TranslationMetrics:Class]
### [DEF:TranslateRoutes:Module]
`backend/src/api/routes/translate.py`
@COMPLEXITY 3
@PURPOSE FastAPI route handlers: CRUD jobs/dictionaries, preview, run trigger, cancel run, schedule, history, metrics, feedback-loop submission. All endpoints enforce RBAC per access-control matrix.
@RELATION CALLS -> [TranslationOrchestrator:Class]
@RELATION CALLS -> [DictionaryManager:Class]
@RELATION CALLS -> [TranslationScheduler:Class]
@RELATION CALLS -> [TranslationEventLog:Class]
@RELATION CALLS -> [TranslationMetrics:Class]
@RELATION BINDS_TO -> [PermissionChecker:Dependency]
[/DEF:TranslateRoutes:Module]
### [DEF:TranslateModels:Module]
`backend/src/models/translate.py`
@COMPLEXITY 2
@PURPOSE SQLAlchemy ORM models: TranslationJob, TranslationRun, TranslationBatch, TranslationRecord, TranslationEvent, TranslationPreviewSession, TranslationPreviewRecord, TerminologyDictionary, DictionaryEntry, TranslationSchedule, TranslationJobDictionary, MetricSnapshot.
@RELATION INHERITS -> [Base:Class]
[/DEF:TranslateModels:Module]
### [DEF:TranslateSchemas:Module]
`backend/src/schemas/translate.py`
@COMPLEXITY 2
@PURPOSE Pydantic v2 request/response schemas for translation API endpoints.
[/DEF:TranslateSchemas:Module]
---
## Frontend Contracts (Svelte 5 / SvelteKit)
### [DEF:TranslateJobList:Component]
`frontend/src/routes/translate/+page.svelte`
@COMPLEXITY 3
@PURPOSE SvelteKit page listing all translation jobs with status/schedule indicators.
@UX_STATE idle, loading, empty, populated, error
[/DEF:TranslateJobList:Component]
### [DEF:TranslationJobConfig:Component]
`frontend/src/routes/translate/[id]/+page.svelte`
@COMPLEXITY 3
@PURPOSE SvelteKit page for job configuration: datasource selection, column mapping with source→target key mapping, target table/column, LLM settings, dictionary attachment (language-filtered), schedule tab with timezone.
@UX_STATE idle, loading, configured, saving, validation_error, datasource_unavailable
@UX_REACTIVITY Column list $derived from datasource; dictionary list filtered by target_language
[/DEF:TranslationJobConfig:Component]
### [DEF:TranslationPreview:Component]
`frontend/src/lib/components/translate/TranslationPreview.svelte`
@COMPLEXITY 4
@PURPOSE Side-by-side preview of source rows with LLM translations, approve/edit/reject as quality feedback, accept preview as quality gate. Shows config_hash to detect stale previews.
@UX_STATE idle, loading, preview_loaded, preview_error, accepted, stale_config
@UX_FEEDBACK Spinner during LLM call; visual distinction for LLM-generated vs user-edited values
@UX_RECOVERY Retry preview; re-fetch with updated config
[/DEF:TranslationPreview:Component]
### [DEF:TranslationRunProgress:Component]
`frontend/src/lib/components/translate/TranslationRunProgress.svelte`
@COMPLEXITY 4
@PURPOSE Live progress display: progress bar, batch counter, success/failure counts, cancel button. WebSocket-driven. Shows both translation and insert execution phases.
@UX_STATE idle, running, cancelling, cancelled, completed, partial, failed, insert_pending, insert_running, insert_failed
@UX_FEEDBACK Progress percentage $derived; real-time batch/insert status; Superset execution reference on completion
@UX_RECOVERY Retry failed batches; cancel run; view generated SQL (audit/debug)
[/DEF:TranslationRunProgress:Component]
### [DEF:TranslationRunResult:Component]
`frontend/src/lib/components/translate/TranslationRunResult.svelte`
@COMPLEXITY 4
@PURPOSE Completion summary with statistics, Superset execution status/reference, generated SQL (audit/debug), inline feedback-loop correction controls.
@UX_STATE completed, partial, failed, insert_failed
@UX_FEEDBACK Superset execution status badge; SQL block for audit
@UX_RECOVERY Retry failed rows; retry insert; submit corrections
@RELATION CALLS -> [TermCorrectionPopup:Component]
[/DEF:TranslationRunResult:Component]
### [DEF:TermCorrectionPopup:Component]
`frontend/src/lib/components/translate/TermCorrectionPopup.svelte`
@COMPLEXITY 3
@PURPOSE Popup for selecting source term + incorrect target term in run results, providing corrected target term, submitting to dictionary. Conflict: overwrite or keep existing.
@UX_STATE closed, selecting, editing, submitting, conflict_detected, submitted
[/DEF:TermCorrectionPopup:Component]
### [DEF:BulkCorrectionSidebar:Component]
`frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte`
@COMPLEXITY 3
@PURPOSE Sidebar for bulk correction: collect multiple terms, mass edit, submit atomically.
@UX_STATE closed, collecting, reviewing, submitting, submitted
[/DEF:BulkCorrectionSidebar:Component]
### [DEF:DictionaryEditor:Component]
`frontend/src/lib/components/translate/DictionaryEditor.svelte`
@COMPLEXITY 3
@PURPOSE Inline editor: add/edit/delete entries, CSV/TSV import with conflict preview (overwrite/keep existing), export.
@UX_STATE idle, loading, editing, importing, import_preview, import_conflict, saving
[/DEF:DictionaryEditor:Component]
### [DEF:DictionaryList:Component]
`frontend/src/routes/translate/dictionaries/+page.svelte`
@COMPLEXITY 3
@PURPOSE SvelteKit page listing dictionaries with language, term count, attachment info.
@UX_STATE idle, loading, empty, populated, delete_blocked
[/DEF:DictionaryList:Component]
### [DEF:ScheduleConfig:Component]
`frontend/src/lib/components/translate/ScheduleConfig.svelte`
@COMPLEXITY 3
@PURPOSE Schedule panel: type selector, cron/interval with timezone, next-N-executions preview, concurrency policy, enable/disable. Warns if no prior successful manual run.
@UX_STATE idle, editing, validating, enabled, disabled, no_prior_run_warning
@UX_REACTIVITY Next execution times $derived with timezone display
[/DEF:ScheduleConfig:Component]
### [DEF:TranslationHistory:Component]
`frontend/src/routes/translate/history/+page.svelte`
@COMPLEXITY 3
@PURPOSE Filterable run history: datasource, target table, row count, translation_status, insert_status, date. Detail view with config snapshot, Superset reference, SQL. Pruned runs show metadata only.
@UX_STATE idle, loading, empty, populated, detail_open, pruned
[/DEF:TranslationHistory:Component]
### [DEF:TranslateApiClient:Module]
`frontend/src/lib/api/translate.js`
@COMPLEXITY 2
@PURPOSE API client wrapping requestApi/fetchApi for all translate endpoints.
[/DEF:TranslateApiClient:Module]
### [DEF:translateStore:Store]
`frontend/src/lib/stores/translate.js`
@COMPLEXITY 3
@PURPOSE Svelte 5 rune store for translation feature state.
@RELATION BINDS_TO -> [TranslateApiClient:Module]
@RELATION BINDS_TO -> [TaskWebSocket:Module]
[/DEF:translateStore:Store]
---
## Integration Contracts (Existing System)
| Contract ID | What It Provides | How Translation Uses It |
|-------------|-----------------|------------------------|
| `[LLMProviderService:Module]` | LLM API call with provider selection, key encryption | Sends batches with constructed prompts; receives structured JSON translations |
| `[SupersetClient:Module]` | Superset API: datasource schema, SQL Lab execution | Fetch column metadata, submit SQL to `/api/v1/sqllab/execute/`, poll status |
| `[SchedulerService:Class]` | APScheduler lifecycle, add_job/remove_job | Translation schedules registered as APScheduler jobs |
| `[NotificationService:Module]` | Email/in-app notification dispatch | Scheduled run failure notifications |
| `[PermissionChecker:Dependency]` | FastAPI dependency for RBAC enforcement | Route handlers annotated per access-control matrix |
| `[TaskWebSocket:Module]` | WebSocket for real-time task progress | Translation run progress events streamed to frontend |
| `[TaskContext:Class]` | Background task lifecycle context | Orchestrator runs as async background task |

View File

@@ -0,0 +1,334 @@
# Data Model: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08 (updated post-review)
## 1. SQLAlchemy ORM Entities (dialect-aware — PostgreSQL/Greenplum, ClickHouse supported)
### TranslationJob
```python
# [DEF:TranslationJob:Class]
# @COMPLEXITY 2
# @PURPOSE Persisted configuration for a translation job.
class TranslationJob(Base):
__tablename__ = "translation_jobs"
id = Column(String, primary_key=True, default=generate_uuid)
name = Column(String, nullable=False)
owner_id = Column(String, ForeignKey("users.id"), nullable=False)
# Source configuration
datasource_id = Column(String, nullable=False)
database_dialect = Column(String, nullable=False) # Detected from Superset: postgresql, clickhouse, greenplum
source_table = Column(String, nullable=True) # Optional metadata; datasource may be virtual
translation_col = Column(String, nullable=False)
context_cols = Column(JSON, default=list)
source_key_cols = Column(JSON, nullable=False) # [col_name, ...]
target_key_cols = Column(JSON, nullable=False) # [col_name, ...] — mapped to source_key_cols
# Target configuration
target_table = Column(String, nullable=False)
target_col = Column(String, nullable=False)
upsert_strategy = Column(String, default="insert") # insert | skip_existing | overwrite
# LLM configuration
provider_id = Column(String, ForeignKey("llm_providers.id"), nullable=True)
target_language = Column(String, nullable=False) # BCP-47 tag, e.g., "ru", "en"
source_language = Column(String, nullable=True) # Optional BCP-47 tag
prompt_template = Column(Text, nullable=True)
batch_size = Column(Integer, default=50)
# Timestamps
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc), onupdate=lambda: datetime.now(timezone.utc))
owner = relationship("User")
dictionaries = relationship("TranslationJobDictionary", back_populates="job", cascade="all, delete-orphan")
schedule = relationship("TranslationSchedule", back_populates="job", uselist=False, cascade="all, delete-orphan")
runs = relationship("TranslationRun", back_populates="job")
preview_sessions = relationship("TranslationPreviewSession", back_populates="job")
# [/DEF:TranslationJob:Class]
```
### TranslationJobDictionary (M2M join with priority)
```python
class TranslationJobDictionary(Base):
__tablename__ = "translation_job_dictionaries"
id = Column(String, primary_key=True, default=generate_uuid)
job_id = Column(String, ForeignKey("translation_jobs.id"), nullable=False)
dictionary_id = Column(String, ForeignKey("terminology_dictionaries.id"), nullable=False)
priority = Column(Integer, default=0) # Lower = higher priority
job = relationship("TranslationJob", back_populates="dictionaries")
dictionary = relationship("TerminologyDictionary")
```
### TerminologyDictionary
```python
class TerminologyDictionary(Base):
__tablename__ = "terminology_dictionaries"
id = Column(String, primary_key=True, default=generate_uuid)
name = Column(String, nullable=False)
target_language = Column(String, nullable=False) # BCP-47 tag
source_language = Column(String, nullable=True)
owner_id = Column(String, ForeignKey("users.id"), nullable=False)
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc), onupdate=lambda: datetime.now(timezone.utc))
owner = relationship("User")
entries = relationship("DictionaryEntry", back_populates="dictionary", cascade="all, delete-orphan")
```
### DictionaryEntry
```python
class DictionaryEntry(Base):
__tablename__ = "dictionary_entries"
id = Column(String, primary_key=True, default=generate_uuid)
dictionary_id = Column(String, ForeignKey("terminology_dictionaries.id"), nullable=False)
source_term = Column(String, nullable=False)
source_term_normalized = Column(String, nullable=False) # Lowercase, NFC normalized
target_term = Column(String, nullable=False)
origin_run_id = Column(String, ForeignKey("translation_runs.id"), nullable=True)
origin_row_key = Column(JSON, nullable=True)
origin_user_id = Column(String, ForeignKey("users.id"), nullable=True)
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
dictionary = relationship("TerminologyDictionary", back_populates="entries")
__table_args__ = (
UniqueConstraint("dictionary_id", "source_term_normalized", name="uq_dict_source_term_norm"),
)
```
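A sketch of the normalization implied by `source_term_normalized` (lowercase plus NFC, matching the column comment above):
```python
import unicodedata

def normalize_term(term: str) -> str:
    # NFC first so visually identical terms compare equal, then lowercase.
    return unicodedata.normalize("NFC", term).lower().strip()
```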
### TranslationSchedule
```python
class TranslationSchedule(Base):
__tablename__ = "translation_schedules"
id = Column(String, primary_key=True, default=generate_uuid)
job_id = Column(String, ForeignKey("translation_jobs.id"), unique=True, nullable=False)
schedule_type = Column(String, nullable=False) # cron | interval | once
cron_expression = Column(String, nullable=True)
interval_seconds = Column(Integer, nullable=True)
run_at = Column(DateTime(timezone=True), nullable=True)
timezone = Column(String, default="UTC") # e.g., "Europe/Moscow"
is_enabled = Column(Boolean, default=True)
concurrency = Column(String, default="skip") # skip | queue
next_run_at = Column(DateTime(timezone=True), nullable=True)
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc), onupdate=lambda: datetime.now(timezone.utc))
job = relationship("TranslationJob", back_populates="schedule")
```
### TranslationRun
```python
class TranslationRun(Base):
__tablename__ = "translation_runs"
id = Column(String, primary_key=True, default=generate_uuid)
job_id = Column(String, ForeignKey("translation_jobs.id"), nullable=False)
trigger_type = Column(String, nullable=False) # manual | scheduled
translation_status = Column(String, default="pending") # pending|running|completed|partial|failed|cancelled|skipped
insert_status = Column(String, default="not_started") # not_started|submitted|running|succeeded|failed|skipped
# Statistics
total_rows = Column(Integer, default=0)
translated_rows = Column(Integer, default=0)
failed_rows = Column(Integer, default=0)
skipped_rows = Column(Integer, default=0)
token_count = Column(Integer, default=0)
estimated_cost = Column(Float, default=0.0)
# Hashes for idempotency and audit
config_hash = Column(String, nullable=True)
dict_snapshot_hash = Column(String, nullable=True)
# Snapshots
config_snapshot = Column(JSON, nullable=False)
dict_snapshot = Column(JSON, nullable=True)
prompt_used = Column(Text, nullable=True)
# SQL output
insert_sql = Column(Text, nullable=True)
sql_hash = Column(String, nullable=True)
# Superset execution reference
superset_query_id = Column(String, nullable=True)
superset_database_id = Column(String, nullable=True)
insert_error_type = Column(String, nullable=True)
insert_error_message = Column(Text, nullable=True)
rows_affected = Column(Integer, nullable=True)
# Timestamps
started_at = Column(DateTime(timezone=True), nullable=True)
completed_at = Column(DateTime(timezone=True), nullable=True)
insert_started_at = Column(DateTime(timezone=True), nullable=True)
insert_completed_at = Column(DateTime(timezone=True), nullable=True)
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
job = relationship("TranslationJob", back_populates="runs")
records = relationship("TranslationRecord", back_populates="run", cascade="all, delete-orphan")
batches = relationship("TranslationBatch", back_populates="run", cascade="all, delete-orphan")
events = relationship("TranslationEvent", back_populates="run")
```
### TranslationBatch
```python
class TranslationBatch(Base):
__tablename__ = "translation_batches"
id = Column(String, primary_key=True, default=generate_uuid)
run_id = Column(String, ForeignKey("translation_runs.id"), nullable=False)
batch_index = Column(Integer, nullable=False)
status = Column(String, default="pending") # pending|running|completed|failed
row_count = Column(Integer, default=0)
translated_count = Column(Integer, default=0)
failed_count = Column(Integer, default=0)
skipped_count = Column(Integer, default=0)
token_count = Column(Integer, default=0)
estimated_cost = Column(Float, default=0.0)
latency_ms = Column(Integer, nullable=True)
error_type = Column(String, nullable=True)
error_message = Column(Text, nullable=True)
started_at = Column(DateTime(timezone=True), nullable=True)
completed_at = Column(DateTime(timezone=True), nullable=True)
run = relationship("TranslationRun", back_populates="batches")
records = relationship("TranslationRecord", back_populates="batch")
__table_args__ = (
Index("idx_batch_run_idx", "run_id", "batch_index"),
)
```
### TranslationRecord
```python
class TranslationRecord(Base):
__tablename__ = "translation_records"
id = Column(String, primary_key=True, default=generate_uuid)
run_id = Column(String, ForeignKey("translation_runs.id"), nullable=False)
batch_id = Column(String, ForeignKey("translation_batches.id"), nullable=True)
# Source data
source_text = Column(Text, nullable=True)
context_data = Column(JSON, nullable=True)
key_values = Column(JSON, nullable=False)
key_hash = Column(String, nullable=False) # hash(canonical_json(key_values))
# Translation result
llm_translation = Column(Text, nullable=True)
user_edit = Column(Text, nullable=True)
final_value = Column(Text, nullable=True)
status = Column(String, default="pending") # pending|translated|approved|edited|rejected|failed|skipped
error_message = Column(Text, nullable=True)
run = relationship("TranslationRun", back_populates="records")
batch = relationship("TranslationBatch", back_populates="records")
__table_args__ = (
Index("idx_record_key_hash", "key_hash"),
Index("idx_record_run_key", "run_id", "key_hash"),
)
```
### TranslationPreviewSession
```python
class TranslationPreviewSession(Base):
__tablename__ = "translation_preview_sessions"
id = Column(String, primary_key=True, default=generate_uuid)
job_id = Column(String, ForeignKey("translation_jobs.id"), nullable=False)
user_id = Column(String, ForeignKey("users.id"), nullable=False)
config_hash = Column(String, nullable=False)
dict_snapshot_hash = Column(String, nullable=True)
sample_size = Column(Integer, default=10)
status = Column(String, default="pending") # pending|accepted|rejected
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
accepted_at = Column(DateTime(timezone=True), nullable=True)
expires_at = Column(DateTime(timezone=True), nullable=True)
job = relationship("TranslationJob", back_populates="preview_sessions")
rows = relationship("TranslationPreviewRecord", back_populates="session", cascade="all, delete-orphan")
```
### TranslationPreviewRecord
```python
class TranslationPreviewRecord(Base):
__tablename__ = "translation_preview_records"
id = Column(String, primary_key=True, default=generate_uuid)
session_id = Column(String, ForeignKey("translation_preview_sessions.id"), nullable=False)
source_text = Column(Text, nullable=True)
context_data = Column(JSON, nullable=True)
key_values = Column(JSON, nullable=False)
key_hash = Column(String, nullable=False)
llm_translation = Column(Text, nullable=True)
user_edit = Column(Text, nullable=True)
final_value = Column(Text, nullable=True)
status = Column(String, default="pending") # pending|approved|edited|rejected
session = relationship("TranslationPreviewSession", back_populates="rows")
```
### TranslationEvent
```python
class TranslationEvent(Base):
__tablename__ = "translation_events"
id = Column(String, primary_key=True, default=generate_uuid)
run_id = Column(String, ForeignKey("translation_runs.id"), nullable=True) # NULL for pre-run events
job_id = Column(String, ForeignKey("translation_jobs.id"), nullable=False)
event_type = Column(String, nullable=False)
# schedule_triggered|schedule_skipped|schedule_failed|
# run_started|batch_started|batch_completed|batch_failed|
# run_succeeded|run_partial|run_failed|run_cancelled|run_skipped|run_noop|
# insert_submitted|insert_succeeded|insert_failed
timestamp = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc), nullable=False)
payload = Column(JSON, default=dict)
run = relationship("TranslationRun", back_populates="events")
__table_args__ = (
Index("idx_event_job_ts", "job_id", "timestamp"),
Index("idx_event_type", "event_type"),
)
```
### MetricSnapshot
```python
class MetricSnapshot(Base):
__tablename__ = "translation_metric_snapshots"
id = Column(String, primary_key=True, default=generate_uuid)
job_id = Column(String, ForeignKey("translation_jobs.id"), nullable=False)
snapshot_date = Column(Date, nullable=False)
cumulative_tokens = Column(Integer, default=0)
cumulative_cost = Column(Float, default=0.0)
covers_events_before = Column(DateTime(timezone=True), nullable=False) # Cutoff for event coverage
total_runs = Column(Integer, default=0)
success_runs = Column(Integer, default=0)
failed_runs = Column(Integer, default=0)
partial_runs = Column(Integer, default=0)
avg_batch_latency_ms = Column(Integer, nullable=True)
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
__table_args__ = (
UniqueConstraint("job_id", "snapshot_date", name="uq_metric_snapshot_date"),
)
```
## 2. Pydantic Schemas (API DTOs)
Key changes from the original design (an illustrative schema sketch follows the list):
- `TranslateJobCreate`/`Update`: added `source_key_cols`, `target_key_cols` (explicit mapping), `source_language`
- `TermCorrectionSubmit`: added `source_term`, `incorrect_target_term`, `corrected_target_term`
- `ScheduleConfig`: added `timezone`
- `TranslationRunResponse`: split into `translation_status` + `insert_status`, added Superset execution fields
- Added: `TranslationBatchResponse`, `PreviewSessionResponse`, `MetricsSnapshotResponse`
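As a rough illustration, the updated `TranslateJobCreate` could take the following shape. The field set mirrors the data model and quickstart payloads; the defaults and the key-alignment validator are assumptions, not the final schema.
```python
# Hedged sketch only — field set mirrors the data model; validator is an assumption.
from pydantic import BaseModel, Field, model_validator

class TranslateJobCreate(BaseModel):
    name: str
    datasource_id: str
    source_table: str
    translation_col: str
    context_cols: list[str] = Field(default_factory=list)
    source_key_cols: list[str] = Field(min_length=1)  # explicit source-side keys
    target_key_cols: list[str] = Field(min_length=1)  # explicit target-side mapping
    target_table: str
    target_col: str
    target_language: str  # BCP-47 tag, e.g. "ru"
    source_language: str | None = None
    batch_size: int = 50
    dictionary_ids: list[str] = Field(default_factory=list)

    @model_validator(mode="after")
    def _keys_align(self) -> "TranslateJobCreate":
        # Composite keys map positionally: source_key_cols[i] -> target_key_cols[i].
        if len(self.source_key_cols) != len(self.target_key_cols):
            raise ValueError("source_key_cols and target_key_cols must match in length")
        return self
```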
## 3. Entity Relationship Summary
```
TranslationJob (1) ──< (N) TranslationJobDictionary >── (1) TerminologyDictionary
TranslationJob (1) ──< (N) TranslationRun
TranslationJob (1) ──< (N) TranslationPreviewSession
TranslationJob (1) ──── (0..1) TranslationSchedule
TranslationRun (1) ──< (N) TranslationBatch
TranslationRun (1) ──< (N) TranslationRecord
TranslationRun (1) ──< (N) TranslationEvent (nullable run_id for pre-run events)
TranslationBatch (1) ──< (N) TranslationRecord
TranslationPreviewSession (1) ──< (N) TranslationPreviewRecord
TerminologyDictionary (1) ──< (N) DictionaryEntry
```
All UUID PKs, timezone-aware UTC timestamps, JSON columns for dynamic data, hash columns for efficient comparison.
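The shared conventions above (`generate_uuid` defaults, `key_hash = hash(canonical_json(key_values))`) could be implemented along these lines; a minimal sketch, not the repository's actual helpers.
```python
# Hedged sketch of the helper conventions assumed by the models above.
import hashlib
import json
import uuid

def generate_uuid() -> str:
    # String UUIDs keep primary keys portable across PostgreSQL and test databases.
    return str(uuid.uuid4())

def canonical_json(value: dict) -> str:
    # Sorted keys and compact separators give a stable serialization for hashing.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

def key_hash(key_values: dict) -> str:
    # Implements the key_hash convention: hash(canonical_json(key_values)).
    return hashlib.sha256(canonical_json(key_values).encode("utf-8")).hexdigest()
```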

View File

@@ -0,0 +1,143 @@
# Implementation Plan: LLM Table Translation Service
**Branch**: `028-llm-datasource-supeset` | **Date**: 2026-05-08 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/028-llm-datasource-supeset/spec.md`
## Summary
Implement an LLM-powered table translation service as a new backend plugin (`TranslationPlugin`) with companion API routes, ORM models, Pydantic schemas, and Svelte frontend components. The service reads rows from a Superset datasource (translation column + context columns), sends batches to a configured LLM provider with optional per-batch filtered terminology dictionary context, generates safe PostgreSQL INSERT/UPSERT SQL, and submits it to Superset via `/api/v1/sqllab/execute/` with status polling and reference recording. Supporting capabilities: terminology dictionary CRUD with language validation, translation preview as persistent quality gate, scheduled execution via APScheduler (new-key-only with baseline_expired fallback), structured event logging with MetricSnapshot persistence, RBAC-gated access control matrix, and 90-day retention with metric continuity.
**Total planned modules**: 25 contracts across backend (Python/FastAPI) and frontend (Svelte 5/SvelteKit).
**Complexity distribution**: 2× C5 (orchestrator, event-log), 8× C4 (preview, execute, dictionary, scheduler, SQL-gen, routes), 10× C3 (models, schemas, components), 5× C2 (helpers).
## Technical Context
**Language/Version**: Python 3.13+ (backend), JavaScript/TypeScript (frontend Svelte 5)
**Primary Dependencies**: FastAPI 0.115+, SQLAlchemy 2.0+, APScheduler 3.x, Pydantic v2 (backend); SvelteKit 2.x, Svelte 5.43+, Vite 7.x, Tailwind CSS 3.x (frontend)
**Storage**: PostgreSQL 16 (shared with existing ss-tools schema)
**Testing**: pytest 8.x + pytest-asyncio (backend); vitest 4.x + @testing-library/svelte 5.x (frontend)
**Target Platform**: Linux server (Docker Compose: backend + frontend + PostgreSQL)
**Project Type**: web application — FastAPI REST + WebSocket backend, SvelteKit SPA frontend
**Performance Goals**: Preview of 10 rows within 30s (LLM-dependent); INSERT generation <5s for 1000 rows; schedule trigger precision ±60s; event recording <10s after occurrence
**Constraints**: RBAC enforcement via access-control matrix; dictionary per-batch filtering (case-insensitive, word-boundary-aware); 90-day detailed data retention with MetricSnapshot persistence; plugin lifecycle compatible with `PluginBase`; dialect-aware SQL generation (PostgreSQL/Greenplum, ClickHouse supported for MVP); snapshot isolation for in-progress runs; structured JSON LLM output required
**Scale/Scope**: Tens of thousands of rows per run; dictionaries of any size; 50 concurrent users; 5–10 active translation jobs per deployment
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| Principle | Status | Evidence |
|-----------|--------|----------|
| **I. Plugin Architecture** | PASS | Feature implemented as a new plugin inheriting from `PluginBase` (`backend/src/plugins/translate/`), consistent with existing `llm_analysis`, `git`, `storage` plugin pattern. |
| **II. API-First Design** | PASS | All operations exposed via REST endpoints under `/api/translate/` (jobs, dictionaries, runs, schedules, history) and WebSocket for run progress. Follows existing FastAPI route conventions. |
| **III. Test-First** | PASS | Acceptance criteria per user story defined in spec; test strategy in research.md covers unit (pytest), integration (API + DB), and component (vitest + Svelte Testing Library) layers. |
| **IV. RBAC & Security** | PASS | Granular permissions (`translate.job.*`, `translate.dictionary.*`, `translate.schedule.manage`, `translate.history.view`) via existing RBAC model. API keys encrypted via `EncryptionManager`. PII masking for LLM-facing context per existing patterns. |
| **V. Observability & Retention** | PASS | Structured Translation Events (FR-046), per-job metrics (FR-047), notification integration (FR-048), 90-day retention with pruning (FR-049). C4+ flows instrumented with `belief_scope`/`reason`/`reflect`/`explore` markers. |
| **Semantic Protocol Compliance** | PASS | All planned modules assigned `@COMPLEXITY` 2–5 with appropriate tag density. `@RATIONALE`/`@REJECTED` reserved for C5 contracts only. Canonical `@RELATION PREDICATE -> TARGET_ID` syntax. Comment-anchor syntax: `# [DEF:...]` for Python, `<!-- [DEF:...] -->` for Svelte markup. |
| **Fractal Limit (INV_7)** | PASS | Planned modules kept under 400 lines each. No planned contract exceeds 150 lines or CC>10. Decomposition strategy documented in contracts/modules.md. |
**Gate Result**: ✅ ALL PASS — no blocking constitutional or semantic conflicts.
## Project Structure
### Documentation (this feature)
```text
specs/028-llm-datasource-supeset/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/ # Phase 1 output
│ └── modules.md # Semantic contract design
├── spec.md # Feature specification
├── ux_reference.md # Interaction reference
└── checklists/
└── requirements.md # Quality validation
```
### Source Code (repository root)
```text
backend/
├── src/
│ ├── api/routes/
│ │ └── translate.py # New: translation REST endpoints
│ ├── core/
│ │ ├── scheduler.py # Existing: APScheduler (extend for translation schedules)
│ │ └── superset_client/ # Existing: Superset API client (reuse)
│ ├── models/
│ │ └── translate.py # New: SQLAlchemy ORM models
│ ├── plugins/
│ │ └── translate/ # New: TranslationPlugin
│ │ ├── __init__.py
│ │ ├── plugin.py # Plugin entry (inherits PluginBase)
│ │ ├── orchestrator.py # Run lifecycle orchestration (C5)
│ │ ├── preview.py # Preview engine (C4)
│ │ ├── executor.py # Batch execution + INSERT gen (C4)
│ │ ├── dictionary.py # Dictionary CRUD + filtering (C4)
│ │ ├── sql_generator.py # INSERT/UPSERT SQL generation (C3)
│ │ ├── scheduler.py # Schedule management (C4)
│ │ ├── events.py # Structured event logging (C5)
│ │ ├── metrics.py # Per-job metrics aggregation (C3)
│ │ └── __tests__/ # Plugin-level tests
│ ├── schemas/
│ │ └── translate.py # New: Pydantic request/response schemas
│ └── services/
│ ├── llm_provider.py # Existing: LLM provider management (reuse)
│ └── llm_prompt_templates.py # Existing: prompt rendering (reuse)
└── tests/ # Integration tests
frontend/
├── src/
│ ├── routes/
│ │ └── translate/ # New: SvelteKit route
│ │ ├── +page.svelte # Translation job list
│ │ ├── [id]/
│ │ │ └── +page.svelte # Job config + preview + run
│ │ ├── dictionaries/
│ │ │ ├── +page.svelte # Dictionary list
│ │ │ └── [id]/
│ │ │ └── +page.svelte # Dictionary editor
│ │ └── history/
│ │ └── +page.svelte # Run history
│ └── lib/
│ ├── components/
│ │ └── translate/ # New: reusable translation components
│ │ ├── TranslationJobConfig.svelte
│ │ ├── TranslationPreview.svelte
│ │ ├── TranslationRunProgress.svelte
│ │ ├── TranslationRunResult.svelte
│ │ ├── DictionaryEditor.svelte
│ │ ├── TermCorrectionPopup.svelte
│ │ ├── ScheduleConfig.svelte
│ │ └── BulkCorrectionSidebar.svelte
│ ├── stores/
│ │ └── translate.js # New: Svelte 5 rune stores
│ └── api/
│ └── translate.js # New: API client module
```
## Semantic Contract Guidance
- All planned contracts follow GRACE-Poly v2.4 protocol with `[DEF:id:Type]...[/DEF:id:Type]` anchors.
- Python backend uses `# [DEF:...]` comment-anchor syntax (illustrated after this list).
- Svelte components use `<!-- [DEF:...] -->` in markup and `// [DEF:...]` in script blocks.
- Complexity assignments and required tag coverage detailed in `contracts/modules.md`.
- C4+ Python modules instrumented with `belief_scope()` + `reason()`/`reflect()`/`explore()` markers.
- C5 contracts carry `@RATIONALE` and `@REJECTED` decision memory.
- `@RELATION` targets reference existing contracts where applicable (e.g., `[LLMProviderService:Module]`, `[SchedulerService:Class]`, `[SupersetClient:Module]`, `[NotificationService:Module]`).
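For orientation, a minimal illustration of the anchor syntax in Python; the contract ID, complexity value, and `DEPENDS_ON` predicate are made-up examples, and the authoritative tag sets live in `contracts/modules.md`.
```python
# Illustrative only — ID, complexity, and predicate are hypothetical examples.
# [DEF:DictionaryManager.filter_for_batch:Function]
# @COMPLEXITY C4
# @RELATION DEPENDS_ON -> [TerminologyDictionary:Class]
def filter_for_batch(rows: list[str]) -> list[dict]:
    ...
# [/DEF:DictionaryManager.filter_for_batch:Function]
```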
## Complexity Tracking
No constitutional violations detected. All complexity assignments are justified within the semantic protocol's complexity scale:
| Contract | Complexity | Justification |
|----------|-----------|---------------|
| `TranslationOrchestrator` | C5 | Stateful lifecycle with PRE/POST, multi-step coordination, decision memory for retry/concurrency policies |
| `TranslationScheduler` | C4 | Stateful schedule CRUD with APScheduler integration, conflict detection; no decision memory needed |
| `TranslationEventLog` | C5 | Immutable event log with retention enforcement, audit invariants, decision memory for pruning strategy |
| `TranslationPreview` | C4 | Stateful preview with LLM calls, approve/edit/reject lifecycle |
| `TranslationExecutor` | C4 | Batch execution with retry, INSERT generation, progress tracking |
| `DictionaryManager` | C4 | CRUD with import/export, per-batch filtering, conflict resolution |
| Svelte components | C3 | UI state machines with UX_STATE/FEEDBACK/RECOVERY/REACTIVITY bindings |

View File

@@ -0,0 +1,288 @@
# Quickstart: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08
## Prerequisites
- Running ss-tools instance (Docker Compose or local)
- Superset connection configured in ss-tools settings
- At least one LLM provider configured (Settings → LLM)
- Target insertable PostgreSQL physical table exists in Superset with compatible schema
- User has appropriate RBAC permissions (admin by default)
## 1. Start the Application
```bash
# Docker (recommended)
cd /home/busya/dev/ss-tools
docker compose up --build
# Or local development
# Terminal 1 — Backend
cd backend
source .venv/bin/activate
python -m uvicorn src.app:app --reload --port 8001
# Terminal 2 — Frontend
cd frontend
npm run dev -- --port 5173
```
- Frontend: http://localhost:5173
- Backend API: http://localhost:8001
- API Docs: http://localhost:8001/docs
## 2. Create a Terminology Dictionary
### Via UI
1. Navigate to http://localhost:5173/translate/dictionaries
2. Click **[+ New Dictionary]**
3. Enter name: `Product Terms`, language: `ru`
4. Add entries inline or click **[Import CSV]**
5. Save
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/dictionaries \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "Product Terms",
"target_language": "ru",
"entries": [
{"source_term": "invoice", "target_term": "накладная"},
{"source_term": "widget", "target_term": "виджет"},
{"source_term": "backorder", "target_term": "предзаказ"}
]
}'
```
**Expected**: 201 Created with dictionary ID and entry count = 3.
## 3. Create a Translation Job
### Via UI
1. Navigate to http://localhost:5173/translate
2. Click **[+ New Translation Job]**
3. Select Superset datasource → columns auto-populate
4. Set:
- Translation column: `product_name`
- Context columns: `category_name`, `product_description`
- Key columns: `product_id`
- Target table: `products_i18n`
- Target column: `translated_name`
- Target language: `Russian`
- Attach dictionary: `Product Terms`
5. Click **[Save & Preview]**
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "Products RU Translation",
"datasource_id": "<datasource-uuid>",
"source_table": "products",
"translation_col": "product_name",
"context_cols": ["category_name", "product_description"],
"source_key_cols": ["product_id"],
"target_key_cols": ["product_id"],
"target_table": "products_i18n",
"target_col": "translated_name",
"target_language": "ru",
"batch_size": 50,
"dictionary_ids": ["<dictionary-uuid>"]
}'
```
**Expected**: 201 Created with job ID. Validation passes (columns exist, target table accessible).
**Error case**: 422 if translation column is empty; 400 if target table not found.
## 4. Preview Translations
### Via UI
1. Open the saved job → click **[Preview]**
2. System shows ~10 rows with source, context, and LLM translation
3. Approve good translations, edit or reject bad ones
4. Click **[Approve All]** or handle individually
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/preview \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"sample_size": 10}'
```
**Expected**: 200 with array of PreviewRow objects (source_text, context, llm_translation, status=pending).
**Error case**: 503 if LLM provider unreachable; error message includes provider name and reason.
## 5. Execute Full Translation Run
### Via UI
1. After preview approval, click **[Start Full Run]**
2. Confirm cost estimate dialog
3. Watch live progress bar (WebSocket-driven)
4. On completion: view run summary with translation status, insert status, Superset query reference, and generated SQL (audit).
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"upsert_strategy": "insert"}'
```
**Expected**: 202 Accepted with run ID. WebSocket messages stream progress. A final GET returns the run with `translation_status=completed`, `translated_rows=N`, `insert_sql=<SQL>`.
**Partial failure**: `translation_status=partial`, `failed_rows>0`. **[Retry Failed]** available.
## 6. Execute INSERT through Superset SQL Lab API
### Via UI
1. After translation completes, the system automatically submits SQL to Superset
2. Progress indicator shows: «📤 Submitting to Superset...»
3. On success: «✅ Insert succeeded · 1,241 rows affected · Query #a7f3b2c»
4. Click **[View SQL]** to audit the generated statement
### Via API
```bash
# Trigger full run (backend handles Superset submission automatically)
curl -X POST http://localhost:8001/api/translate/jobs/<job-id>/runs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"upsert_strategy": "insert"}'
# Check run status (includes insert_status and superset_query_id)
curl http://localhost:8001/api/translate/runs/<run-id> \
-H "Authorization: Bearer <token>"
```
**Expected**: Run response includes `insert_status: "succeeded"`, `superset_query_id`, `rows_affected`.
**Insert failure**: `insert_status: "failed"`, `insert_error_message` populated. **[Retry Insert]** re-submits without re-translating.
### Verify in Target Table
```sql
-- Run directly in Superset SQL Lab to verify
SELECT * FROM products_i18n WHERE translated_name IS NOT NULL;
```
## 7. Feedback Loop — Correct a Translation
### Via UI
1. Open run results → find a mistranslated word
2. Highlight the word → **[Correct this term]** popup
3. Enter correction → select dictionary → submit
4. Re-run preview to verify correction is used
### Via API
```bash
curl -X POST http://localhost:8001/api/translate/corrections \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"record_id": "<record-uuid>",
"source_term": "Monitor Stand",
"incorrect_target_term": "Мониторная стойка",
"corrected_target_term": "Подставка для монитора",
"dictionary_id": "<dictionary-uuid>"
}'
```
**Expected**: 201. Term pair added to dictionary. Conflict dialog if term already exists.
## 8. Configure Schedule
### Via UI
1. Open job → **Schedule** tab
2. Set type: Cron → `0 6 * * 1` (every Monday 06:00)
3. Toggle auto-INSERT: ON
4. Verify next 3 execution times
5. Enable schedule
### Via API
```bash
curl -X PUT http://localhost:8001/api/translate/jobs/<job-id>/schedule \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"schedule_type": "cron",
"cron_expression": "0 6 * * 1",
"timezone": "Europe/Moscow",
"concurrency": "skip"
}'
```
**Expected**: 200 with schedule config including `next_run_at`.
**Verify**: Check APScheduler jobs (backend log) or wait for next trigger and check run history.
## 9. View History and Metrics
### Via UI
1. Navigate to http://localhost:5173/translate/history
2. Filter by datasource, target table, or date range
3. Click a run for details: config snapshot, prompt, translations, INSERT SQL
### Via API
```bash
# List runs
curl http://localhost:8001/api/translate/runs?job_id=<job-id> \
-H "Authorization: Bearer <token>"
# Get metrics
curl http://localhost:8001/api/translate/jobs/<job-id>/metrics \
-H "Authorization: Bearer <token>"
```
**Expected**: Run list with status and row counts. Metrics with cumulative tokens and cost.
## 10. Verification Checklist
### Backend Tests
```bash
cd backend
source .venv/bin/activate
# Unit tests for translation plugin
pytest src/plugins/translate/__tests__/ -v
# Integration tests for translate API
pytest tests/test_translate_api.py -v
# All backend tests
pytest -v
```
### Frontend Tests
```bash
cd frontend
npm run test -- --run
```
### Linting
```bash
# Python
cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py
# Svelte
cd frontend && npm run build # build includes type checking
```
### Manual Smoke Test
1. Create dictionary with 3 terms → verify in list
2. Import CSV with 50 terms → verify no duplicates (check conflict dialog)
3. Create job → verify column list populates from datasource
4. Preview with empty dictionary → verify LLM still translates
5. Preview with attached dictionary → verify glossary terms used (check `invoice` → `накладная`)
6. Full run with 50 rows → verify INSERT SQL has 50 VALUES tuples
7. Scheduled run (set to every 5 min for test) → verify run appears in history
8. Feedback loop: correct 1 term → re-preview → verify correction reflected
9. Delete dictionary attached to active job → verify blocked
10. Check metrics dashboard → verify run counts and token totals

View File

@@ -0,0 +1,188 @@
# Research: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Date**: 2026-05-08
## R1: Plugin Placement — New Plugin vs. Extending LLMAnalysisPlugin
### Decision
Create a new standalone plugin `TranslationPlugin` at `backend/src/plugins/translate/` rather than extending the existing `LLMAnalysisPlugin`.
### Rationale
- `LLMAnalysisPlugin` is focused on dashboard validation and documentation generation — a different domain from batch table translation.
- Translation requires new ORM models (`TranslationJob`, `TranslationRun`, `TranslationRecord`, `TerminologyDictionary`, `TranslationSchedule`, `TranslationEvent`), new API routes (`/api/translate/*`), new Svelte components, and new scheduler integration — scope that warrants a dedicated plugin.
- Existing plugin system (`PluginBase`) already supports multiple independent plugins and lazy discovery via `plugin_loader.py`.
- Separation avoids bloating `llm_analysis/plugin.py` (already 481 lines) and maintains the fractal limit (INV_7: <400 lines per module).
### Alternatives Considered
- **Extend LLMAnalysisPlugin**: Rejected because it would conflate two distinct feature domains, increase module size beyond fractal limit, and complicate RBAC permission boundaries.
- **Create as a standalone service in `backend/src/services/translate/`**: Rejected because the plugin lifecycle (register, unregister, configuration persistence, API exposure) is already standardized via `PluginBase`. A standalone service would duplicate plugin machinery.
### Impact
- New directory: `backend/src/plugins/translate/` with `plugin.py`, `orchestrator.py`, `preview.py`, `executor.py`, `dictionary.py`, `sql_generator.py`, `scheduler.py`, `events.py`, `metrics.py`, `__tests__/`.
- New route module: `backend/src/api/routes/translate.py`.
- New model module: `backend/src/models/translate.py`.
- New schema module: `backend/src/schemas/translate.py`.
- Registered in `backend/src/api/routes/__init__.py` `__all__` list.
---
## R2: LLM Prompt Construction Strategy
### Decision
Construct prompts using a layered template approach: base system prompt → dictionary glossary (per-batch filtered) → context columns → translation column values. Leverage existing `llm_prompt_templates.py` for template rendering and `LLMProviderService` for provider selection.
### Rationale
- Existing `llm_prompt_templates.py` already supports `render_prompt()` with Jinja2-like substitution and multimodal detection.
- Per-batch dictionary filtering (FR-044): scan batch rows for substring matches against dictionary `source_term` values; only include matched entries in the prompt. This keeps token usage proportional to batch content, not dictionary size.
- Context columns are appended as structured fields (e.g., `Category: {category_name}\nDescription: {product_description}`) before the translation column value.
- The system prompt explicitly instructs the LLM: "Use the provided glossary for exact matches. For partial matches, prefer glossary translations. For terms not in the glossary, translate naturally."
### Alternatives Considered
- **Full dictionary injection**: Rejected — would exceed the LLM context window for dictionaries >5000 terms.
- **Semantic embedding search**: Rejected — adds unnecessary complexity (vector DB dependency) when substring matching is sufficient for glossary use cases.
- **Separate LLM call for glossary matching**: Rejected — doubles API cost and latency without proportional quality gain.
### Impact
- `DictionaryManager` must implement `filter_for_batch(rows: list[str]) -> list[dict]` returning matched entries (see the sketch after this list).
- Prompt template includes `{{ glossary }}` and `{{ context }}` placeholder blocks.
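A minimal sketch of this filtering, assuming entries shaped like `DictionaryEntry` (with `source_term_normalized` lowercase/NFC); the `\b` regex boundary is one plausible reading of "word-boundary-aware", not a committed implementation.
```python
# Hedged sketch: case-insensitive, word-boundary-aware per-batch glossary filtering.
import re
import unicodedata

def filter_for_batch(rows: list[str], entries: list[dict]) -> list[dict]:
    # Normalize the batch once (NFC + lowercase) to match source_term_normalized.
    haystack = unicodedata.normalize("NFC", "\n".join(rows)).lower()
    matched = []
    for entry in entries:
        term = re.escape(entry["source_term_normalized"])
        # \b approximates word boundaries; multi-word terms match as whole phrases.
        if re.search(rf"\b{term}\b", haystack):
            matched.append(entry)
    return matched
```
Token usage then scales with how many glossary terms actually occur in the batch, which is the property FR-044 asks for.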
---
## R3: SQL Generation — Dialect-Aware INSERT/UPSERT for Superset API
### Decision
Detect the target database dialect from the Superset datasource's connection configuration at job save time. Generate dialect-appropriate safe SQL: `INSERT INTO ... VALUES (...)` for ClickHouse; `INSERT INTO ... VALUES (...)` or `INSERT ... ON CONFLICT ...` for PostgreSQL/Greenplum. Submit generated SQL to Superset via `/api/v1/sqllab/execute/`.
### Rationale
- Different databases use different UPSERT syntax: PostgreSQL has `ON CONFLICT`, ClickHouse has no standard UPSERT (use INSERT with deduplication or ALTER TABLE UPDATE). Greenplum is PostgreSQL-compatible.
- Superset knows the database backend via the connection's `backend`/`engine` field — the system queries this at configuration time and caches the dialect on the TranslationJob.
- For ClickHouse, the `insert` strategy generates plain INSERT; `skip_existing` is not natively supported (the system warns the user); `overwrite` uses ALTER TABLE UPDATE or INSERT with ReplacingMergeTree semantics (documented limitation).
- For PostgreSQL/Greenplum, full UPSERT support: `ON CONFLICT DO NOTHING` (skip_existing) and `ON CONFLICT DO UPDATE` (overwrite).
- Identifier quoting: PostgreSQL/Greenplum uses `"identifier"`; ClickHouse uses `` `identifier` `` or `"identifier"` depending on settings.
- Values are safely encoded per dialect: strings escaped, NULLs rendered as `NULL`.
### Alternatives Considered
- **PostgreSQL-only**: Rejected — user's Superset instances may use ClickHouse as the primary analytical database. Dialect detection from the connection is the correct source of truth.
- **Manual SQL Lab copy/paste**: Rejected — Superset API execution is the canonical path.
- **UPDATE statements**: Rejected — source data is append-only (new-key-only strategy). UPSERT covers the overwrite case.
### Impact
- `TranslationJob.database_dialect` field caches the detected dialect at save time.
- `SQLGenerator` dispatches to dialect-specific formatters (see the sketch after this list).
- Dialect-specific SQL syntax tests required for PostgreSQL and ClickHouse (SC-003).
- Unsupported dialects are rejected at configuration time with a clear error message.
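A condensed sketch of the dialect dispatch (identifier quoting, literal escaping, UPSERT clause selection); a real generator must also encode dates and numbers and emit batched VALUES lists.
```python
# Hedged sketch — strategy and dialect names follow this document, not final code.
def quote_ident(name: str, dialect: str) -> str:
    if dialect == "clickhouse":
        return f"`{name.replace('`', '``')}`"
    return '"' + name.replace('"', '""') + '"'  # postgresql / greenplum

def sql_literal(value) -> str:
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"  # minimal string escaping

def insert_sql(dialect: str, table: str, cols: list[str], row: list,
               key_cols: list[str], strategy: str = "insert") -> str:
    col_list = ", ".join(quote_ident(c, dialect) for c in cols)
    values = ", ".join(sql_literal(v) for v in row)
    stmt = f"INSERT INTO {quote_ident(table, dialect)} ({col_list}) VALUES ({values})"
    if dialect in ("postgresql", "greenplum") and strategy == "skip_existing":
        keys = ", ".join(quote_ident(c, dialect) for c in key_cols)
        stmt += f" ON CONFLICT ({keys}) DO NOTHING"
    elif dialect in ("postgresql", "greenplum") and strategy == "overwrite":
        keys = ", ".join(quote_ident(c, dialect) for c in key_cols)
        updates = ", ".join(
            f"{quote_ident(c, dialect)} = EXCLUDED.{quote_ident(c, dialect)}"
            for c in cols if c not in key_cols
        )
        stmt += f" ON CONFLICT ({keys}) DO UPDATE SET {updates}"
    return stmt
```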
---
## R4: Schedule Execution — APScheduler Integration
### Decision
Extend the existing `SchedulerService` (`backend/src/core/scheduler.py`) with a new job type `translate_scheduled_run`. Each translation job's schedule configuration is stored in the `TranslationSchedule` model and loaded into APScheduler on service start and on schedule create/update.
### Rationale
- Existing `SchedulerService` already manages `BackgroundScheduler`, cron triggers, start/stop lifecycle, and task manager integration.
- Translation schedules are distinct from backup schedules — stored in `translate` models, loaded via a registration callback pattern.
- Schedule trigger: APScheduler fires → `run_scheduled_translation(job_id)` → creates `TranslationRun` → orchestrator processes new-key-only rows → generates INSERT statements.
- Concurrency policy (skip/queue) enforced in the trigger handler before orchestrator invocation.
### Alternatives Considered
- **Separate scheduler instance**: Rejected — creates resource contention (two APScheduler instances) and complicates Docker deployment.
- **Celery/Redis-based scheduling**: Rejected — adds infrastructure dependency; APScheduler is already proven in this codebase.
- **Cron-based external scheduling**: Rejected — requires OS-level cron configuration, loses programmatic control over pause/resume and concurrency policies.
### Impact
- `backend/src/plugins/translate/scheduler.py` registers translation job schedules with `SchedulerService`.
- New trigger function `_execute_scheduled_translation(job_id: str)` imported by `SchedulerService` (sketched after this list).
- Existing `SchedulerService.load_schedules()` extended to discover and register translation schedules alongside backup schedules.
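A sketch of the registration path; only the cron case is shown, and interval/once triggers plus the queue policy are omitted. `scheduler` is assumed to be the `BackgroundScheduler` owned by `SchedulerService`.
```python
# Hedged sketch: registering a translation schedule with APScheduler 3.x.
from apscheduler.triggers.cron import CronTrigger

def register_translation_schedule(scheduler, schedule) -> None:
    trigger = CronTrigger.from_crontab(schedule.cron_expression,
                                       timezone=schedule.timezone)
    scheduler.add_job(
        _execute_scheduled_translation,
        trigger=trigger,
        args=[schedule.job_id],
        id=f"translate_scheduled_run:{schedule.job_id}",
        replace_existing=True,  # re-registering on create/update stays idempotent
    )

def _execute_scheduled_translation(job_id: str) -> None:
    # Concurrency policy runs here, before the orchestrator: with "skip", an
    # in-progress run emits a schedule_skipped event instead of starting a new run.
    ...
```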
---
## R5: Observability — Structured Event Log + MetricSnapshot
### Decision
Implement a dedicated `TranslationEvent` ORM model with type-specific payload (JSON) for structured event logging (FR-046). Events are written synchronously within the orchestrator flow. Per-job cumulative metrics (FR-047) are computed from live `TranslationEvent` rows (for recent data <90 days) combined with `MetricSnapshot` rows (for historical data >90 days). At pruning time, a `MetricSnapshot` is persisted capturing cumulative tokens, cost, and run counts before events are deleted.
### Rationale
- Structured events provide queryability for audit, trend analysis, and the admin dashboard without coupling to log parsing infrastructure.
- JSON payload allows type-specific data.
- MetricSnapshot persistence before pruning ensures cumulative metrics survive the 90-day retention window (SC-014).
- The metrics dashboard reads: `latest MetricSnapshot + events WHERE timestamp > snapshot.covers_events_before`.
- Synchronous event writes within the run transaction ensure no event loss during crashes.
### Alternatives Considered
- **Application log (stdout) only**: Rejected — not queryable for dashboards or audit.
- **Separate metrics table with counters (dual-write)**: Rejected — dual-write consistency risk; event-sourced + snapshot is simpler.
- **Events-only (no snapshots)**: Rejected — cumulative metrics would be lost after 90-day pruning (FR-049).
### Impact
- `TranslationEvent` model with nullable `run_id` for pre-run events.
- `MetricSnapshot` model with `covers_events_before` timestamp for correct cutoff.
- `MetricsService` queries aggregation from events + snapshots (see the sketch after this list).
- APScheduler daily job: persist snapshot → prune expired events/records.
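The read path could look roughly like this; summing `TranslationRun.token_count` stands in for the event-payload aggregation, but both follow the same snapshot-plus-live-tail shape.
```python
# Hedged sketch: cumulative metric = latest snapshot + live tail after its cutoff.
from sqlalchemy import func, select

def cumulative_tokens(session, job_id: str) -> int:
    snapshot = session.execute(
        select(MetricSnapshot)
        .where(MetricSnapshot.job_id == job_id)
        .order_by(MetricSnapshot.snapshot_date.desc())
        .limit(1)
    ).scalar_one_or_none()
    base = snapshot.cumulative_tokens if snapshot else 0
    tail = select(func.coalesce(func.sum(TranslationRun.token_count), 0)).where(
        TranslationRun.job_id == job_id
    )
    if snapshot is not None:
        # Count only what the snapshot has not already folded in.
        tail = tail.where(TranslationRun.created_at >= snapshot.covers_events_before)
    return base + session.execute(tail).scalar_one()
```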
---
## R6: Frontend Architecture — Svelte 5 Runes Pattern
### Decision
Use Svelte 5 runes (`$state`, `$derived`, `$effect`) for all reactive state management in translation components. Store layer uses a dedicated `translate.js` Svelte store module with `$state` runes for job list, current job config, preview state, and run progress. API calls use the existing `requestApi`/`fetchApi` wrapper pattern.
### Rationale
- Svelte 5 runes are the canonical reactivity model for this codebase (Svelte 5.43+ in package.json).
- Dedicated store per feature domain follows existing patterns (`frontend/src/lib/stores/` houses auth, settings, task stores).
- WebSocket for run progress reuses the existing `TaskManager` WebSocket infrastructure — translation runs emit progress events on the same channel.
- Components follow existing layout patterns: Tailwind CSS, `@UX_STATE`/`@UX_FEEDBACK`/`@UX_RECOVERY`/`@UX_REACTIVITY` contract tags.
### Alternatives Considered
- **Svelte 4 stores (writable/derived)**: Rejected — codebase has already migrated to Svelte 5 runes; mixing patterns creates inconsistency.
- **Separate WebSocket channel**: Rejected — existing Task Drawer WebSocket infrastructure handles progress events generically; translation runs fit the same pattern.
### Impact
- New SvelteKit route: `frontend/src/routes/translate/` with sub-routes for job config, dictionaries, history.
- New component library: `frontend/src/lib/components/translate/` with 8 components.
- New store: `frontend/src/lib/stores/translate.js` with `$state` runes.
- New API client: `frontend/src/lib/api/translate.js`.
---
## R7: RBAC Permission Model Integration
### Decision
Define 13 permission strings per the Access Control Matrix in spec.md and enforce them via the existing `PermissionChecker` dependency in FastAPI route handlers. No new database tables needed — the existing `permissions` and `role_permissions` tables store string-based permissions.
### Rationale
- Existing RBAC model stores permissions as strings in `role_permissions.permission` column, checked via dependency injection in route handlers.
- Granular permissions per resource type align with the existing pattern.
- Ownership constraints (owner OR admin) are enforced in route handlers alongside permission checks.
- Missing from original design: `translate.job.view`, `translate.dictionary.view`, `translate.schedule.view`, `translate.metrics.view` added for read-only access scenarios.
### Alternatives Considered
- **Resource-level ownership only (no granular permissions)**: Rejected — spec explicitly requires granular permissions (FR-043).
- **Separate permission table per resource**: Rejected — over-engineered; string-based permissions are sufficient.
### Impact
- 13 permission strings registered in RBAC seed.
- Route handlers annotated with `Depends(require_permission(...))` + ownership checks (sketched after this list).
- Admin UI displays new permission strings for role assignment.
- Default analyst role: `translate.job.view`, `translate.job.execute`, `translate.dictionary.view`, `translate.history.view`.
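A sketch of how a gated route could read; `require_permission` and the lookup/persistence helpers are assumed names following the plan's description, not the actual API.
```python
# Hedged sketch: permission gate plus owner-or-admin check in a route handler.
from fastapi import APIRouter, Depends, HTTPException

router = APIRouter(prefix="/api/translate")

@router.delete("/jobs/{job_id}")
async def delete_job(
    job_id: str,
    user=Depends(require_permission("translate.job.delete")),  # assumed helper
):
    job = await get_job_or_404(job_id)  # hypothetical lookup helper
    if job.owner_id != user.id and not user.is_admin:
        raise HTTPException(status_code=403, detail="Owner or admin required")
    await delete_job_record(job)  # hypothetical persistence helper
    return {"status": "deleted"}
```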
---
## R8: Testing Strategy
### Decision
Multi-layer testing: (1) pytest unit tests for orchestrator, executor, dictionary manager, SQL generator, scheduler, event log; (2) pytest integration tests for API routes with test database; (3) vitest component tests for Svelte components using @testing-library/svelte; (4) manual verification via `quickstart.md` for end-to-end flow.
### Rationale
- Unit tests with mocked LLM responses and Superset client ensure fast feedback for business logic.
- Integration tests verify API contract, database schema, RBAC enforcement, and schedule trigger behavior.
- Component tests validate Svelte 5 rune reactivity, UX state transitions, and error recovery paths.
- Manual quickstart provides a human-verifiable happy path that catches integration issues between backend and frontend.
### Alternatives Considered
- **E2E tests with Playwright**: Deferred to future iteration — adds maintenance overhead; quickstart manual verification is sufficient for initial delivery.
### Impact
- Test files: `backend/src/plugins/translate/__tests__/`, `backend/tests/test_translate_api.py`, `frontend/src/lib/components/translate/__tests__/`.
- Fixtures: mock LLM provider responses, mock Superset client, test dictionary data, test translation job configuration.
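As one concrete unit-layer example, tests against the `filter_for_batch` sketch from R2 could assert the word-boundary contract directly; the import path follows the R1 layout and is an assumption.
```python
# Hedged sketch: unit-test shape for per-batch glossary filtering.
import pytest

from src.plugins.translate.dictionary import filter_for_batch  # assumed path

@pytest.fixture
def glossary():
    return [
        {"source_term_normalized": "invoice", "target_term": "накладная"},
        {"source_term_normalized": "backorder", "target_term": "предзаказ"},
    ]

def test_filter_matches_word_boundaries(glossary):
    rows = ["Invoice #42 for widgets", "shipment pending"]
    matched = filter_for_batch(rows, glossary)
    assert [e["source_term_normalized"] for e in matched] == ["invoice"]

def test_filter_ignores_mid_word_substrings(glossary):
    # "backorders" has no word boundary between "backorder" and the trailing "s",
    # so \b-based matching must not treat it as a glossary hit.
    assert filter_for_batch(["backorders spiked"], glossary) == []
```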

View File

@@ -0,0 +1,332 @@
# Feature Specification: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Created**: 2026-05-08
**Status**: Draft
**Input**: User description: "I want to add an LLM service for translating data in tables. The mechanism should use a Superset datasource to fetch the data (rows to translate + context), and insert values into materialized tables (can be executed in SQL Lab) for the finished rows. It must be possible to select the column to translate, the context columns, and the keys (there may be several) by which the data will be inserted into the target table."
## Clarifications
### Session 2026-05-08
- Q: Should translation jobs support scheduled/periodic execution? → A: Yes, translation jobs can be placed on a schedule (cron-like or interval-based), with each scheduled trigger creating a new Translation Run and optionally auto-executing the INSERT via Superset SQL Lab API.
- Q: Should the system support a terminology dictionary passed as LLM context? → A: Yes, a user-maintained terminology dictionary (source_term → target_translation pairs) is passed as additional context to the LLM during translation to ensure consistent, domain-accurate translations. Dictionaries can be created, edited, populated manually, and attached to translation jobs.
- Q: Can users feed corrections from translation results back into the dictionary? → A: Yes, in the run results view users can select a specific incorrectly translated word/phrase, provide the corrected translation, and submit it to a chosen terminology dictionary for future runs.
- Q: What access control model should govern translation jobs, dictionaries, and run execution? → A: Fine-grained configurable permissions through the existing role-based access control (RBAC) model.
- Q: How should the system handle large dictionaries (10K+ terms) that would exceed LLM context window limits if injected in full? → A: Per-batch filtering: before each LLM call, the system scans the rows in the current batch and includes only those dictionary entries whose source_term appears as a substring (case-insensitive, word-boundary-aware) in at least one row of the batch. The dictionary itself has no hard size limit; the prompt grows proportionally to batch content, not total dictionary size.
- Q: How should scheduled runs detect which source rows need translation (change detection strategy)? → A: New-key-only: the system translates only rows whose key-column values are absent from the most recent run with `insert_status = succeeded` of the same job. Source data is append-only (INSERT-only, no UPDATEs), so existing rows are never retranslated. If the last successful run is older than 90 days and its key data has been pruned, the system falls back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation. Manual runs remain available for full retranslation.
- Q: What level of observability (logging, metrics, alerting) is required for production operation? → A: Full observability: structured event log for every run, batch, and schedule trigger; latency metrics per batch; success/failure counters per job; token usage and cost trends; and an admin dashboard aggregating these signals. Failure notifications via existing notification infrastructure.
- Q: How long should detailed translation run data (source snapshots, translations, INSERT statements) be retained? → A: 90 days of full detail, then aggregation: detailed run snapshots are retained for 90 days after run completion. Beyond 90 days, only the run metadata record and aggregated metrics (row count, status, token usage, cost) are preserved; source row snapshots and generated INSERT statements are pruned. Cumulative metrics are persisted in a metric snapshot table before event pruning.
- Q: How is INSERT SQL executed against the target table? → A: All INSERT/UPSERT execution goes through Superset SQL Lab API `/api/v1/sqllab/execute/`. The system submits generated SQL, polls execution status, and records the Superset query reference and outcome. Manual copy/paste into SQL Lab UI is not a supported workflow; generated SQL may be exposed for audit/debugging only.
- Q: What is the preview model — quality gate or row-level approval? → A: Preview is a quality gate for prompt, settings, and dictionary. After preview is accepted, the full run processes all eligible source rows. Approve/edit/reject actions in preview apply only to the preview sample and serve as quality feedback; they do not gate individual unseen rows. Rejected preview rows are excluded from the full run only if they were part of the sample.
### Session 2026-05-08 (post-review)
- Q: Should the system support "keep both" as a dictionary conflict resolution option? → A: No. Dictionary entries are unique per (dictionary, source_term). Conflict options are: overwrite, keep existing, or cancel. Variant support is deferred.
- Q: What database dialect does SQL generation target? → A: The dialect is determined dynamically from the Superset datasource's database connection type. Supported dialects for MVP: PostgreSQL (including Greenplum) and ClickHouse. The system queries Superset for the database backend and generates dialect-appropriate SQL (identifier quoting, UPSERT syntax, value encoding).
- Q: How does the system handle the case where the last successful run's key data has been pruned (90-day retention) and a scheduled new-key-only run triggers? → A: The system falls back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation run, recording the usual terminal event upon completion.
- Q: How are cumulative metrics preserved beyond the 90-day event/record retention window? → A: A metric snapshot is persisted at pruning time, capturing cumulative token count, cost, and run counts. The metrics dashboard reads from both live events and snapshots.
- Q: What happens when an in-progress run exists and the job configuration is edited? → A: In-progress runs are NOT invalidated. They continue using their config snapshot taken at run start. Configuration changes apply to future runs only (snapshot isolation).
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Configure a translation job from a Superset datasource (Priority: P1)
An analytics engineer or localization specialist selects a Superset datasource, picks the column whose values need translation, optionally selects context columns that help the LLM understand the meaning, specifies one or more key columns that uniquely identify rows for target-table insertion (with explicit source-to-target key column mapping), and designates the target insertable physical table and column where translated values will be written.
**Why this priority**: Without a correctly configured translation job, no data can flow from source to target. Configuration is the critical prerequisite that gates all downstream value.
**Independent Test**: Can be fully tested by opening the translation job configuration interface, selecting a Superset datasource, specifying translation, context, and key columns with source→target mapping, defining the target table and target column, saving the configuration, and verifying that the system detects the database dialect from the Superset connection, validates column existence and key mapping, and warns if the dialect is unsupported.
**Acceptance Scenarios**:
1. **Given** the user opens the translation configuration interface, **When** they select a Superset datasource, **Then** the system displays available columns with their types and allows the user to designate one translation column, zero or more context columns, and at least one key column with explicit mapping to target key columns.
2. **Given** the user selects columns from the datasource, **When** they specify a target table and target column name, **Then** the system validates that the mapped key columns exist in both the source datasource schema and the target table schema.
3. **Given** the user configures multiple key columns (composite key), **When** the configuration is saved, **Then** the system stores the composite key definition with source→target column mapping and uses it for matching rows during INSERT generation.
4. **Given** the user attempts to save a configuration with no translation column selected, **When** save is triggered, **Then** the system blocks the action and highlights the missing required field.
5. **Given** the user selects a translation column and context columns, **When** the datasource has computed or virtual columns, **Then** the system distinguishes physical columns from virtual columns and warns if a virtual column is selected as a key column (virtual columns as translation/context columns are allowed if Superset can query them).
---
### User Story 2 - Preview translated output as quality gate (Priority: P2)
Before committing translated data into a target table, the user previews a sample of source rows alongside their LLM-generated translations, reviews translation quality against the provided context and attached dictionaries, adjusts the LLM prompt or target language if needed, and confirms the preview as a quality gate. Preview is a quality check for prompt/settings/dictionary — not a row-level approval for all dataset rows. After preview is accepted, the full run processes all eligible source rows.
**Why this priority**: Translation quality assurance is essential—blindly inserting machine-translated content without preview creates data quality risk that undermines the entire feature.
**Independent Test**: Can be fully tested by running a preview on a translation job with a small sample (e.g., 5–10 rows), verifying that the system shows source values, context values, and LLM translations side-by-side, allowing language/prompt adjustment, and confirming that the preview acceptance gates the full run.
**Acceptance Scenarios**:
1. **Given** a saved translation job configuration, **When** the user requests a preview, **Then** the system fetches a configurable number of source rows, sends them to the LLM with the configured context columns and per-batch filtered dictionary, and displays source text, context, and translation in a side-by-side view.
2. **Given** the preview results are displayed, **When** the user finds an unsatisfactory translation, **Then** they can mark it for retranslation, edit it manually, or reject it as quality feedback. These actions apply only to the preview sample.
3. **Given** the user adjusts the translation prompt or target language, **When** they re-run the preview, **Then** the system re-fetches the same sample rows and applies the updated prompt/language settings.
4. **Given** the user is satisfied with preview quality, **When** they confirm preview acceptance, **Then** the system records the preview session as accepted and enables full execution. The full run will process all eligible source rows, not only the preview sample.
5. **Given** the source table contains a large number of rows, **When** the user requests a full batch execution, **Then** the system warns about the estimated row count, token usage, and cost before proceeding, and allows the user to set a row limit.
---
### User Story 3 - Execute translation and insert results via Superset SQL Lab API (Priority: P3)
The user initiates the full translation batch, the system processes rows through the LLM in configurable batches, generates safe INSERT/UPSERT SQL for the target table keyed by the configured key columns, submits the SQL to Superset via `/api/v1/sqllab/execute/`, polls execution status, and records the Superset query reference with full traceability.
**Why this priority**: Execution is the final value-delivery step; once configuration and quality preview are sound, the user needs reliable, auditable insertion of translated data.
**Independent Test**: Can be fully tested by executing a full batch on a configured job with preview accepted, verifying that the system generates correct INSERT SQL, submits it to Superset SQL Lab API, and records the execution outcome with Superset query reference.
**Acceptance Scenarios**:
1. **Given** a translation job with preview accepted, **When** the user triggers execution, **Then** the system processes source rows in configurable batches, calls the LLM for each batch, generates safe INSERT/UPSERT SQL, and submits it to Superset SQL Lab API `/api/v1/sqllab/execute/`.
2. **Given** the SQL is submitted to Superset, **When** the system polls execution status, **Then** the run result shows the Superset execution status, query reference, rows affected (if available), and any errors.
3. **Given** a row already has a translation in the target table (matched by key columns via UNIQUE/PRIMARY KEY constraint), **When** the user triggers execution, **Then** the system applies the configured UPSERT strategy: skip existing (ON CONFLICT DO NOTHING), overwrite (ON CONFLICT DO UPDATE), or plain INSERT (relies on user ensuring key uniqueness).
4. **Given** the LLM fails to translate a batch (timeout, rate limit, API error), **When** the batch fails, **Then** the system records the failure in the TranslationBatch record with error details, and allows the user to retry only the failed batch without reprocessing successful batches.
5. **Given** execution completes, **When** the user reviews the run result, **Then** the system shows the number of rows translated, rows skipped, batches processed, Superset execution reference, target table name, and the generated SQL (for audit/debugging).
---
### User Story 4 - Review translation history and audit trail (Priority: P4)
A data steward or auditor reviews past translation runs, inspects which rows were translated with which prompts, traces INSERT executions back to their source rows via Superset query references, and verifies that translation decisions (approvals, edits, rejections) are preserved for compliance.
**Why this priority**: Auditability is important for enterprise use but does not block the core translation workflow. It can be delivered after the primary flow is functional.
**Independent Test**: Can be fully tested by opening the translation history view, selecting a past run, verifying that source rows, translations, prompts, key values, Superset execution references, and INSERT SQL are displayed, and confirming that filtered views by datasource, target table, and date range work.
**Acceptance Scenarios**:
1. **Given** multiple translation runs exist, **When** the user opens translation history, **Then** the system lists runs with datasource name, target table, row count, execution date, status (translation + insert), and the user who triggered them.
2. **Given** a specific translation run is selected, **When** the user inspects its details, **Then** the system shows the configuration snapshot, the prompt template, the sample of source rows with their translations, the generated SQL, and the Superset execution reference with status.
3. **Given** a translation run contains rows that were manually edited during preview, **When** the user inspects those rows, **Then** the system clearly marks the original LLM translation and the user-edited final value separately.
4. **Given** the user wants to reuse a previous configuration, **When** they duplicate a past translation job, **Then** the system creates a new job pre-filled with the previous datasource, columns, keys, and target table configuration.
5. **Given** a run's detailed data has been pruned (older than 90 days), **When** the user views it, **Then** the system shows run metadata and aggregated metrics; source row snapshots and SQL are marked as unavailable.
---
### User Story 5 - Build and manage a terminology dictionary for consistent translations (Priority: P2)
A localization specialist or domain expert creates a terminology dictionary containing source-term → target-translation pairs, populates it manually or via bulk import, and attaches it to a translation job so the LLM respects these fixed translations rather than guessing domain-specific terms. The dictionary content is injected into the LLM prompt as authoritative context alongside the regular context columns. Dictionary terms are matched against batch rows using case-insensitive, word-boundary-aware substring comparison.
**Why this priority**: Without a terminology dictionary, domain-specific terms will be translated inconsistently or incorrectly by the LLM, undermining trust in the entire translation pipeline. The dictionary must be available before preview and execution to deliver acceptable quality.
**Independent Test**: Can be fully tested by creating a dictionary with 5–10 term pairs, attaching it to a translation job, running preview, and verifying that the LLM output consistently uses the dictionary translations for matched terms rather than generating alternative translations.
**Acceptance Scenarios**:
1. **Given** the user navigates to the dictionary management section, **When** they create a new dictionary, **Then** the system provides an empty table with «Source Term» and «Target Translation» columns and allows adding rows one by one.
2. **Given** a dictionary has been created, **When** the user opens it, **Then** they can add new term pairs inline, edit existing pairs, delete individual entries, or clear the entire dictionary with confirmation.
3. **Given** the user has an external list of terms (CSV, TSV, or pasted text), **When** they import it into the dictionary, **Then** the system parses the file, shows a preview of detected term pairs, flags duplicates or conflicts, and allows the user to confirm or adjust before saving. Duplicate source_term entries offer: overwrite, keep existing, or skip the new entry.
4. **Given** a translation job configuration is open, **When** the user selects a dictionary from the list of available dictionaries (filtered to those matching the job's target language), **Then** the system attaches it to the job and the dictionary content will be injected into every LLM translation request for that job. Dictionaries with a mismatched target language are not offered.
5. **Given** a dictionary is attached to a job, **When** the LLM processes a batch, **Then** the system includes the per-batch filtered dictionary content in the prompt as an authoritative glossary, instructing the LLM to use the provided translations for exact matches and to consider them for partial or contextual matches.
6. **Given** multiple dictionaries exist, **When** the user attaches them to a job, **Then** the system merges them into the prompt context in priority order (lower priority number = higher precedence). When the same source_term appears in multiple dictionaries, the highest-priority entry is used; lower-priority duplicates are omitted and surfaced as non-blocking validation notes.
---
### User Story 6 - Correct translations and feed back into the dictionary (Priority: P3)
After a translation run completes, the user reviews the results and notices that a specific word or phrase was translated incorrectly. They select the problematic source term (from the source column value) and the incorrect target translation, provide the correct target translation, and submit it to a chosen terminology dictionary so that future runs use the corrected term. If the same source term already exists in the dictionary, the system asks whether to overwrite or keep the existing entry.
**Why this priority**: The feedback loop turns one-time corrections into permanent improvements. Without it, the same translation mistakes would recur across runs, forcing the user to manually edit the same terms repeatedly. It is valuable but depends on the dictionary (Story 5) already existing.
**Independent Test**: Can be fully tested by completing a translation run, identifying an incorrect translation, selecting the source term and incorrect target term, providing a corrected target term, submitting it to a dictionary, re-running the same job, and verifying that the new translation output uses the corrected term.
**Acceptance Scenarios**:
1. **Given** a completed translation run with results displayed, **When** the user selects a source term and its incorrect target translation within a translated value, **Then** the system shows a pop-up: «Correct this term» with the source term and incorrect translation pre-filled, and an empty input for the corrected target translation.
2. **Given** the user provides a corrected target translation in the pop-up, **When** they choose a target dictionary (matching the job's target language) from a dropdown and submit, **Then** the system adds the term pair to the selected dictionary and records the origin (which run, which row, which user, timestamp) for audit.
3. **Given** the corrected source term already exists in the selected dictionary, **When** the user submits, **Then** the system shows a conflict dialog: «Term already exists with translation 'X'. Overwrite with 'Y'?» with options to overwrite, keep existing, or cancel.
4. **Given** the user selects multiple incorrect translations across different rows in the result view, **When** they use bulk correction mode, **Then** the system collects all selected terms, allows mass editing of the corrected values, and submits them to the dictionary in one atomic operation (all succeed or all fail with conflicts listed).
5. **Given** a dictionary was updated via the feedback loop, **When** the user re-runs the same translation job, **Then** the system includes the newly added terms in the LLM prompt context and the translation output reflects the corrections.
---
### User Story 7 - Schedule translation jobs for periodic execution (Priority: P3)
A localization manager configures a translation job to run automatically on a schedule — for example, every Monday at 06:00 Europe/Moscow to translate new product names that appeared during the week. Each scheduled execution creates a new Translation Run with the job's configuration snapshot, generates INSERT SQL, submits it to Superset SQL Lab API, and records the outcome. Manual runs require a preview quality gate; scheduled runs may bypass preview only after the job has passed at least one successful manual run with the same effective configuration.
**Why this priority**: Scheduling eliminates the manual overhead of re-running translation jobs when source data changes. It is valuable for operational efficiency but depends on the core execution flow (Stories 1–3) already being stable.
**Independent Test**: Can be fully tested by configuring a schedule for a translation job (e.g., «every 5 minutes» for testing), waiting for the scheduled trigger, and verifying that a new Translation Run was created with the correct configuration, source rows translated (new-key-only), SQL submitted to Superset API, and execution outcome recorded.
**Acceptance Scenarios**:
1. **Given** a saved translation job, **When** the user opens the schedule configuration, **Then** the system offers schedule types: one-time future run, interval-based (every N minutes/hours/days), and cron-based (e.g., «0 6 * * 1»). All schedules include a timezone selector.
2. **Given** the user configures a schedule, **When** they enable it, **Then** the system validates the cron expression or interval, shows the next 3 planned execution times with timezone, and verifies that the job has at least one prior successful manual run before allowing scheduled execution.
3. **Given** a scheduled job reaches its trigger time, **When** the scheduler fires, **Then** the system creates a new Translation Run from the job's configuration snapshot, fetches the current source data, and executes the full translation pipeline. Preview is bypassed; the INSERT SQL is submitted to Superset SQL Lab API.
4. **Given** a scheduled run completes successfully, **When** the user reviews the run, **Then** the generated INSERT SQL is available for audit, and the Superset execution reference is recorded.
5. **Given** a scheduled run fails (LLM unavailable, datasource inaccessible, Superset API error), **When** the failure occurs, **Then** the system records the failed run with error details, leaves the schedule enabled for the next trigger, and notifies the user via the existing notification infrastructure.
6. **Given** a translation job has an active schedule, **When** the user edits the job configuration, **Then** the system warns that the schedule will use the updated configuration from the next trigger onward. In-progress runs are NOT invalidated — they continue using their config snapshot.
7. **Given** a scheduled job should be paused, **When** the user disables the schedule, **Then** the system stops triggering new runs but preserves the schedule configuration for later re-enabling.
8. **Given** a scheduled run triggers but no new source rows exist (all keys already translated), **When** the system detects this, **Then** a `run_noop` event is recorded with reason `no_new_rows` and no INSERT SQL is generated.
---
### Edge Cases
- What happens when the source datasource contains NULL values in the translation column?
→ System MUST skip NULL translation values and log them, continuing with the next row.
- What happens when a context column value is NULL or empty?
→ System MUST send the available context to the LLM, marking NULL context fields as empty with a clear placeholder.
- How does the system handle a key column value that does not exist in the target table?
→ System MUST generate INSERT statements (not UPDATE), treating all rows as new insertions. The key columns serve as identifiers but the target table may not have the row yet. If a UNIQUE/PRIMARY KEY constraint exists and a duplicate is inserted, the UPSERT strategy controls behavior.
- What happens when the target table does not exist or is inaccessible in Superset?
→ System MUST warn the user at configuration time and block execution with a clear explanation.
- How does the system handle very large source tables (100k+ rows)?
→ System MUST enforce configurable batch sizes, show progress, estimate token count and cost before execution, and allow cancellation mid-run.
- What happens when the LLM provider returns a response in an unexpected format or language?
→ System MUST request structured JSON output from the LLM keyed by stable row identifiers. The system MUST validate that each requested row has exactly one translation. Missing, duplicate, malformed, or extra outputs are marked as failed.
- How does the system handle concurrent translation runs on the same target table?
→ System MUST warn if another run targets the same table and key range, and provide guidance to avoid data conflicts.
- What happens when the user changes the translation column or key columns after a run has started?
→ In-progress runs are NOT invalidated. They continue using their config snapshot taken at run start (snapshot isolation). Configuration changes apply to future runs only.
- What happens when all rows in a batch fail to translate (LLM unavailable, quota exhausted)?
→ System MUST preserve the batch state in the TranslationBatch record and allow retry with the same or different LLM provider settings.
- How does the system handle composite keys where one key component is NULL?
→ System MUST reject rows with NULL key values during INSERT generation and report them as unprocessable.
- What happens when a terminology dictionary contains duplicate source terms?
→ System MUST detect duplicates at entry time and require explicit resolution (overwrite or keep existing) before saving.
- How does the system handle dictionary updates while a translation run is in progress?
→ System MUST snapshot the dictionary content at the start of each run so the run uses a consistent dictionary state throughout. Mid-run dictionary edits do not affect the in-progress run.
- What happens when an attached dictionary is deleted while a job references it?
→ System MUST warn the user and prevent deletion of dictionaries that are attached to active or scheduled jobs. Dictionaries attached only to historical runs can be deleted.
- How does the system handle a scheduled run overlapping with a still-running previous scheduled run?
→ System MUST detect overlap (same job, previous run still in progress) and either skip the new trigger (with a log event) or queue it for execution after the previous run completes, depending on the job's configured concurrency policy. Queue holds at most one pending run; additional triggers are skipped.
- What happens when a scheduled job's datasource becomes unavailable between triggers?
→ System MUST record the failure for that trigger, leave the schedule enabled, and attempt the next trigger as planned. After N consecutive failures (configurable, default 3), the system optionally disables the schedule and notifies the user.
- How does the system handle feedback-loop corrections that reference a different base language than the dictionary's target language?
→ System MUST validate that the target language of the dictionary matches the translation job's target language before allowing submission, and reject cross-language corrections with a clear message.
- What happens when a scheduled new-key-only run triggers but the last successful run is older than 90 days (key data pruned)?
→ System MUST fall back to full translation, treating all keys as new. A `run_started` event with reason `baseline_expired` is emitted; the run proceeds as a normal full translation run and records the usual terminal event.
- What happens when the Superset SQL Lab API execution returns an error?
→ System MUST record the error in `TranslationRun.insert_error_message` and mark `insert_status = failed`. The translation data remains available for retry or manual inspection.
- How does the system handle SQL identifier injection through user-provided table/column names?
→ System MUST validate table and column identifiers against Superset datasource metadata and quote them using the detected database dialect rules. Raw user-provided identifiers are never interpolated directly into SQL.
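A minimal sketch of this rule, assuming a pre-fetched metadata set (`KNOWN_COLUMNS` and `quote_identifier` are illustrative names, not the final implementation):

```python
# Illustrative sketch only: validate identifiers against datasource metadata,
# then quote per dialect. KNOWN_COLUMNS stands in for live Superset metadata.
KNOWN_COLUMNS = {"product_name", "product_name_ru", "sku"}

def quote_identifier(name: str, dialect: str) -> str:
    if name not in KNOWN_COLUMNS:
        # Unknown identifiers are rejected, never interpolated into SQL.
        raise ValueError(f"unknown identifier: {name!r}")
    if dialect in ("postgresql", "greenplum"):
        return '"' + name.replace('"', '""') + '"'
    if dialect == "clickhouse":
        return "`" + name.replace("`", "``") + "`"
    raise ValueError(f"unsupported dialect: {dialect}")
```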
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: The system MUST allow users to create a translation job by selecting a Superset datasource as the source of data.
- **FR-002**: The system MUST display available columns from the selected datasource and allow the user to designate exactly one column as the translation source column.
- **FR-003**: The system MUST allow the user to select zero or more context columns whose values are sent to the LLM alongside the translation text to improve translation quality.
- **FR-004**: The system MUST require the user to select at least one key column with explicit source→target column mapping (supports composite keys) that uniquely identifies each row for INSERT into the target table.
- **FR-005**: The system MUST allow the user to specify a target insertable physical table name and target column name where translated values will be inserted. Views and materialized views are not supported as targets.
- **FR-006**: The system MUST validate that the mapped key columns exist in both the source datasource schema and the target table schema, and are type-compatible.
- **FR-007**: The system MUST support configurable batch sizes for LLM processing to control throughput, token usage, and cost.
- **FR-008**: The system MUST provide a preview mode that fetches a limited sample of source rows, sends them to the LLM with filtered dictionary context, and displays source values, context, and translations side-by-side as a quality gate before full execution.
- **FR-009**: The system MUST allow the user to adjust the LLM translation prompt, target language, and provider settings within the translation job configuration.
- **FR-010**: The system MUST allow the user to mark preview rows as approved, manually edited, or rejected as quality feedback for the preview sample.
- **FR-011**: The system MUST require preview acceptance before allowing full execution. Rejected preview sample rows are excluded from the full run; approved/edited preview sample rows are included. Unseen rows (not in preview sample) are processed normally.
- **FR-012**: The system MUST generate safe INSERT/UPSERT SQL for the configured target table and target column, using the dialect detected from the Superset datasource's database connection (supported: PostgreSQL/Greenplum, ClickHouse). Identifier quoting, UPSERT syntax, and value encoding MUST follow dialect-specific rules. Raw user-provided identifiers MUST NOT be interpolated directly.
- **FR-013**: The system MUST submit generated SQL to Superset via `/api/v1/sqllab/execute/`, poll execution status, and record the Superset query reference, execution status, and error details. Generated SQL MAY be exposed for audit/debugging but is not the primary execution mechanism.
- **FR-014**: The system MUST estimate and display token count and approximate cost before executing a full translation batch.
- **FR-015**: The system MUST handle LLM failures (timeout, rate limit, API error) gracefully by recording the failed batch in TranslationBatch and allowing retry of only the failed rows.
- **FR-016**: The system MUST skip source rows where the translation column value is NULL and log them.
- **FR-017**: The system MUST reject rows where any key column value is NULL during INSERT generation.
- **FR-018**: The system MUST support an UPSERT strategy: `skip_existing` (ON CONFLICT DO NOTHING), `overwrite` (ON CONFLICT DO UPDATE), or `insert` (plain INSERT — user guarantees key uniqueness). The system MUST document that `insert` strategy does not handle duplicates. A dialect-aware generation sketch appears after this requirements list.
- **FR-019**: The system MUST record each translation run with its configuration snapshot (including config_hash), dictionary snapshot, source rows, translations, prompt used, key values, generated SQL, and Superset execution outcome.
- **FR-020**: The system MUST provide a translation history view listing past runs with datasource, target table, row count, translation status, insert status, date, and triggering user.
- **FR-021**: The system MUST allow the user to duplicate an existing translation job configuration as a starting point for a new job.
- **FR-022**: The system MUST warn the user if a concurrent run targets the same target table and overlapping key range.
- **FR-023**: The system MUST use snapshot isolation: in-progress runs continue using their config snapshot taken at run start. Configuration changes apply to future runs only and do not invalidate in-progress runs.
- **FR-024**: The system MUST allow users to create, edit, and delete terminology dictionaries, each containing source-term → target-translation pairs.
- **FR-025**: The system MUST allow users to populate a dictionary by manual inline entry, bulk text paste, or file import (CSV, TSV).
- **FR-026**: The system MUST detect duplicate source terms within a dictionary at entry time and require the user to resolve conflicts (overwrite or keep existing) before saving.
- **FR-027**: The system MUST allow users to attach one or more terminology dictionaries to a translation job, with configurable priority ordering (lower priority number = higher precedence). Only dictionaries matching the job's target language are offered for attachment.
- **FR-028**: The system MUST inject the per-batch filtered content of all attached dictionaries into the LLM translation prompt as an authoritative glossary, instructing the LLM to use provided translations for exact matches and to consider them for partial matches.
- **FR-029**: The system MUST snapshot the dictionary content at the start of each translation run so the run uses a consistent dictionary state throughout.
- **FR-030**: The system MUST prevent deletion of dictionaries that are attached to active or scheduled translation jobs.
- **FR-031**: The system MUST allow users to identify a mistranslated term by selecting the source term and its incorrect target translation within a run result, and submit a corrected target translation to a chosen terminology dictionary.
- **FR-032**: The system MUST detect when a submitted correction conflicts with an existing dictionary entry and prompt the user to overwrite or keep the existing entry.
- **FR-033**: The system MUST record the origin of each dictionary entry added via the feedback loop, including source run identifier, source row, submitting user, and timestamp.
- **FR-034**: The system MUST support bulk correction mode where users select multiple incorrectly translated terms and submit them to a dictionary in one atomic operation (all succeed or all fail with conflicts listed).
- **FR-035**: The system MUST allow users to configure a schedule for a translation job, supporting one-time future execution, interval-based recurrence, and cron-based recurrence with timezone.
- **FR-036**: The system MUST display the next N planned execution times (with timezone) when a schedule is configured, so the user can verify the schedule before enabling it.
- **FR-037**: The system MUST, on each scheduled trigger, create a new Translation Run from the job's configuration snapshot and the current source data state.
- **FR-038**: The system MUST submit generated INSERT SQL to Superset SQL Lab API for every run (both manual and scheduled). Scheduled runs execute automatically; manual runs execute on user trigger.
- **FR-039**: The system MUST detect overlapping scheduled runs for the same job and handle them according to a configurable concurrency policy (skip new trigger or queue at most one run).
- **FR-040**: The system MUST allow users to pause (disable) and resume (re-enable) a schedule without losing the schedule configuration.
- **FR-041**: The system MUST optionally notify users of scheduled run failures via the existing notification infrastructure.
- **FR-042**: The system MUST warn the user when editing a job configuration that has an active schedule, confirming that the updated configuration will apply to future triggers without affecting in-progress runs.
- **FR-043**: The system MUST enforce granular access control on translation resources through the existing RBAC model (see Access Control Matrix below).
- **FR-044**: The system MUST filter dictionary entries per batch before sending to the LLM: only entries whose source_term appears as a case-insensitive, word-boundary-aware substring in at least one translation-column value within the current batch are included. Dictionaries have no hard size limit.
- **FR-045**: The system MUST, for scheduled runs, translate only source rows whose key-column values are absent from the most recent run with `insert_status = succeeded` (new-key-only strategy). If that run's key data has been pruned (>90 days), the system falls back to full translation with a `baseline_expired` event.
- **FR-046**: The system MUST emit structured events for every significant lifecycle transition: run started, batch started/completed/failed, run succeeded/partial/failed/cancelled/skipped, schedule triggered/skipped/failed, insert submitted/succeeded/failed. Events MUST be queryable for audit and trend analysis.
- **FR-047**: The system MUST track per-job cumulative metrics: total runs, success/failure ratio, cumulative token usage, cumulative estimated cost, average batch latency. Metrics MUST be exposed in an admin-accessible dashboard. Cumulative metrics MUST be persisted in a metric snapshot table before event pruning to survive the 90-day retention window.
- **FR-048**: The system MUST send a notification via the existing notification infrastructure when a scheduled run fails, including the job name, failure reason, and a link to the failed run details.
- **FR-049**: The system MUST retain detailed translation run data for 90 days. Beyond 90 days, the system MUST persist a metric snapshot (cumulative token count, cost, run counts) and prune detailed data (source row snapshots, TranslationRecord rows, TranslationEvent rows, generated SQL). Run metadata (row count, status, Superset reference) is preserved.
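As a non-normative illustration of FR-012 and FR-018, a dialect-aware statement builder could take the following shape. Function and parameter names are assumptions; identifiers are taken to be pre-validated and quoted, and values are bound as parameters:

```python
def build_insert(table: str, key_cols: list[str], target_col: str,
                 strategy: str, dialect: str) -> str:
    # Sketch only. Identifiers are assumed pre-validated and quoted upstream;
    # values are bound as named parameters, never string-interpolated.
    cols = key_cols + [target_col]
    placeholders = ", ".join(f":{c}" for c in cols)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    if dialect in ("postgresql", "greenplum"):
        if strategy == "skip_existing":
            sql += f" ON CONFLICT ({', '.join(key_cols)}) DO NOTHING"
        elif strategy == "overwrite":
            sql += (f" ON CONFLICT ({', '.join(key_cols)}) DO UPDATE"
                    f" SET {target_col} = EXCLUDED.{target_col}")
    # ClickHouse supports plain INSERT only; skip_existing/overwrite are
    # surfaced to the user as warnings/documented limitations (FR-012).
    return sql
```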
### Access Control Matrix
| Action | Required Permission | Ownership Constraint |
|--------|-------------------|---------------------|
| List jobs | `translate.job.view` | Scoped to owned jobs unless admin |
| View job | `translate.job.view` | Owner OR admin |
| Create job | `translate.job.create` | — |
| Edit job | `translate.job.edit` | Owner OR admin |
| Delete job | `translate.job.delete` | Owner OR admin |
| Execute job (manual run) | `translate.job.execute` | Owner OR admin; also requires Superset datasource read access |
| List dictionaries | `translate.dictionary.view` | Scoped to owned unless admin |
| Create dictionary | `translate.dictionary.create` | — |
| Edit dictionary | `translate.dictionary.edit` | Owner OR admin |
| Delete dictionary | `translate.dictionary.delete` | Owner OR admin |
| Use dictionary in job | Implicit: dictionary must be visible to user | — |
| View schedule | `translate.schedule.view` | Owner OR admin |
| Manage schedule | `translate.schedule.manage` | Owner OR admin |
| Auto-INSERT on schedule | `translate.schedule.manage` | Owner OR admin; also requires Superset target write access |
| View history | `translate.history.view` | Scoped to owned runs unless admin |
| View metrics | `translate.metrics.view` | Admin only by default |
### Key Entities *(include if feature involves data)*
- **Translation Job**: A persistent configuration binding a Superset datasource, source→target column mappings (translation, context, key columns), target insertable physical table/column, LLM settings, prompt template, attached dictionaries with priority ordering, and optional schedule.
- **Translation Run**: A single execution (manual or scheduled) with translation_status (pending|running|completed|partial|failed|cancelled|skipped) and insert_status (not_started|submitted|running|succeeded|failed|skipped). Contains config snapshot, dictionary snapshot, config_hash, Superset query reference, statistics. The config_hash computation is sketched after this list.
- **Translation Batch**: A group of TranslationRecord rows processed in one LLM API call. Tracks batch_index, status, row counts, token_count, estimated_cost, latency_ms, error details.
- **Translation Record**: An individual row containing source text, context, key values, key_hash, LLM translation, optional user edit, final INSERT value, and status.
- **Preview Session**: A persistent record of a preview quality gate: job_id, user_id, sample_size, config_hash, dictionary_snapshot_hash, status (pending|accepted|rejected), timestamps.
- **Terminology Dictionary**: A named, language-specific (source_language optional, target_language required) collection of source_term → target_translation pairs with audit origin metadata.
- **Dictionary Entry**: A single term pair within a dictionary, unique per (dictionary_id, source_term). Stores origin metadata (run_id, row_key, user_id, timestamp) for feedback-loop entries.
- **Translation Schedule**: Scheduling configuration: type (cron|interval|once), expression, timezone, enabled state, concurrency policy, next_run_at.
- **Translation Event**: Immutable lifecycle event: run_id (nullable for pre-run events), job_id, event_type, timestamp, type-specific JSON payload.
- **Metric Snapshot**: Persistent cumulative metrics per job, saved at pruning time: job_id, snapshot_date, cumulative_tokens, cumulative_cost, total_runs, success_runs, failed_runs.
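The config_hash referenced in FR-019 and the entities above can be computed over a canonical serialization of the snapshot; a minimal sketch (helper name is illustrative):

```python
import hashlib
import json

def config_hash(snapshot: dict) -> str:
    # Canonical serialization so semantically equal configs hash identically,
    # regardless of key order or whitespace.
    canonical = json.dumps(snapshot, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```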
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Users can configure a complete translation job (datasource → columns → keys → target) in under 3 minutes without external documentation.
- **SC-002**: Preview mode returns translation results for a sample of 10 rows within 30 seconds for standard LLM providers.
- **SC-003**: 100% of generated SQL for supported dialects (PostgreSQL/Greenplum, ClickHouse) is syntactically valid when tested against validated schemas for each dialect.
- **SC-004**: Users can recover from a failed batch (LLM timeout, rate limit) and retry only the failed rows in under 2 minutes.
- **SC-005**: Translation run audit records contain all required traceability information (configuration snapshot, prompt, source rows, translations, INSERT SQL, Superset execution reference) for 100% of completed runs within the 90-day retention window.
- **SC-006**: At least 80% of pilot users successfully complete the end-to-end flow (configure → preview → execute → verify) on their first attempt during moderated usability review.
- **SC-007**: NULL translation values are correctly skipped and logged without blocking the remaining rows in 100% of test cases.
- **SC-008**: Domain-specific terms covered by an attached dictionary are translated consistently (exactly matching the dictionary entry) in at least 95% of cases where the source term appears verbatim in the translation column.
- **SC-009**: Users can populate a 50-term dictionary via file import in under 1 minute, with duplicate detection completing in under 5 seconds.
- **SC-010**: Feedback-loop corrections submitted to a dictionary are reflected in the next run of the same job in 100% of cases where the corrected source term reappears.
- **SC-011**: Scheduled translation runs trigger within ±60 seconds of the planned execution time for at least 98% of triggers under normal operating conditions.
- **SC-012**: A scheduled run that overlaps with a still-running previous run is correctly skipped or queued (per the configured policy) in 100% of overlap scenarios.
- **SC-013**: Structured events for 100% of run lifecycle transitions are recorded and queryable within 10 seconds of occurrence.
- **SC-014**: Per-job cumulative metrics remain accurate (±5%) after pruning events older than 90 days, as verified by comparing pre-prune metric snapshots with post-prune dashboard values.
- **SC-015**: Detailed run data is pruned within 24 hours after exceeding the 90-day retention window, with metric snapshots and run metadata preserved intact in 100% of cases.
## Assumptions
- Users already have access to Superset datasources and permission to read data from them.
- The Superset instance supports `/api/v1/sqllab/execute/` and the user's Superset credentials have permission to execute SQL against the target database.
- The LLM provider is already configured in ss-tools (provider selection, API key, model selection are handled by the existing LLM infrastructure).
- The target table is a physical insertable table in a database backing the Superset datasource. The database dialect (PostgreSQL/Greenplum or ClickHouse for MVP) is detected from the Superset connection configuration. Views and materialized views are not supported as targets. Unsupported dialects are rejected at configuration time with a clear message.
- Translation quality is ultimately the user's responsibility; the system provides tools for preview, editing, and approval but does not guarantee translation accuracy.
- The primary use case is batch translation of static or slowly-changing reference data (not real-time streaming data).
- Multiple key columns (composite keys) are supported with explicit source→target column mapping.
- Preview is a mandatory quality gate before manual execution. Scheduled runs may bypass preview only after at least one successful manual run with the same effective configuration.
- Source data is append-only: new rows are INSERTed over time but existing rows are never UPDATEd in place. Scheduled runs use a new-key-only strategy — only previously unseen key values trigger translation.
- The feature is intended for internal operational use where data volume is measured in thousands to tens of thousands of rows per run.
- Terminology dictionaries are language-specific; a dictionary's target_language must match the job's target_language for attachment.
- The scheduling infrastructure builds on existing scheduler foundations already present in the ss-tools backend.
- Dictionary content is treated as authoritative by the LLM for exact matches. The LLM may deviate for terms not present in the dictionary or for partial matches.
- Dictionaries have no hard size limit; per-batch case-insensitive, word-boundary-aware filtering ensures only relevant terms are injected into each LLM prompt.
- The feedback-loop correction flow requires the user to identify both the source term and the incorrect target translation.
- Concurrency policies for scheduled runs default to «skip»; queuing holds at most one pending run per job.
- Access control for translation resources uses the existing RBAC infrastructure with the permission matrix defined above.
- Snapshot isolation: in-progress runs use their config snapshot; configuration edits affect only future runs.
- Cumulative metrics survive the 90-day retention window via metric snapshots persisted at pruning time.
View File
@@ -0,0 +1,345 @@
# Tasks: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Input**: Design documents from `/specs/028-llm-datasource-supeset/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/modules.md
**Tests**: Test tasks are included for all C4/C5 backend contracts, new API endpoints, and Svelte components with `@UX_STATE` contracts. Test work traces to contract `@PRE`/`@POST` guarantees and spec acceptance scenarios.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US5)
- Include exact file paths in descriptions
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Create plugin directory structure and register the new route module in the lazy-import registry.
- [ ] T001 Create translation plugin directory structure: `backend/src/plugins/translate/__init__.py`, `backend/src/plugins/translate/plugin.py` (empty skeleton), plus `backend/src/plugins/translate/__tests__/__init__.py`
- [ ] T002 Register `translate` route module in `backend/src/api/routes/__init__.py` — add `"translate"` to `__all__` list inside `[DEF:Route_Group_Contracts:Block]`
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: ORM models, Pydantic schemas, plugin boilerplate, route skeleton, and database migration. ALL user stories depend on these artifacts.
**⚠️ CRITICAL**: No user story work can begin until this phase is complete.
### ORM Models
- [ ] T003 [P] Create all SQLAlchemy ORM models in `backend/src/models/translate.py`: `TranslationJob`, `TranslationRun`, `TranslationBatch`, `TranslationRecord`, `TranslationEvent`, `TranslationPreviewSession`, `TranslationPreviewRecord`, `TerminologyDictionary`, `DictionaryEntry`, `TranslationSchedule`, `TranslationJobDictionary`, `MetricSnapshot`. Follow patterns from `backend/src/models/llm.py` (UUID PKs, `generate_uuid`, `Base` inheritance, JSON columns, `UniqueConstraint`, indexes, timezone-aware DateTime with callable defaults). Include `source_term_normalized` column on `DictionaryEntry` with unique constraint for case-insensitive matching. A model sketch follows this subsection.
- [ ] T004 [P] Create Pydantic v2 request/response schemas in `backend/src/schemas/translate.py`: `TranslateJobCreate`, `TranslateJobUpdate`, `TranslateJobResponse`, `DictionaryCreate`, `DictionaryImport`, `DictionaryResponse`, `TermCorrectionSubmit`, `ScheduleConfig`, `TranslationRunResponse`, `TranslationPreviewResponse` (with `PreviewRow`), `MetricsResponse`. Follow existing `backend/src/schemas/` patterns (use `BaseModel`, `Field` with defaults/validation)
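A sketch of one T003 model in SQLAlchemy 2.0 style; the real code inherits the shared `Base` and helpers from `backend/src/models/`, so the names here are illustrative:

```python
import uuid
from datetime import datetime, timezone
from sqlalchemy import DateTime, ForeignKey, String, UniqueConstraint
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class DictionaryEntry(Base):
    __tablename__ = "dictionary_entries"
    # Unique per (dictionary_id, source_term_normalized) for case-insensitive matching.
    __table_args__ = (UniqueConstraint("dictionary_id", "source_term_normalized"),)

    id: Mapped[str] = mapped_column(String(36), primary_key=True,
                                    default=lambda: str(uuid.uuid4()))
    dictionary_id: Mapped[str] = mapped_column(ForeignKey("terminology_dictionaries.id"))
    source_term: Mapped[str] = mapped_column(String(512))
    source_term_normalized: Mapped[str] = mapped_column(String(512))  # lowercased
    target_translation: Mapped[str] = mapped_column(String(512))
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
```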
### Plugin Skeleton
- [ ] T005 Create `TranslatePlugin` class in `backend/src/plugins/translate/plugin.py` inheriting from `PluginBase`. Implement `id`, `name`, `description` properties. Wire `@RELATION INHERITS -> [PluginBase:Class]` in contract header. (RATIONALE: separate plugin avoids bloating `LLMAnalysisPlugin` beyond fractal limit; REJECTED: extending LLMAnalysisPlugin would conflate domains)
### Route Skeleton
- [ ] T006 Create `backend/src/api/routes/translate.py` with FastAPI `APIRouter` (prefix=`/api/translate`, tags=`["translate"]`). Define all endpoint stubs with `pass` bodies for now: CRUD jobs, CRUD dictionaries, preview trigger, run trigger, retry, schedule CRUD, run history, metrics, correction submission, dictionary import. Attach `Depends(require_permission(...))` annotations. Register router in `backend/src/app.py` alongside existing routers.
### Database Migration
- [ ] T007 Generate Alembic migration for all `translate_*` tables: `translation_jobs`, `translation_runs`, `translation_batches`, `translation_records`, `translation_events`, `translation_preview_sessions`, `translation_preview_records`, `terminology_dictionaries`, `dictionary_entries`, `translation_schedules`, `translation_job_dictionaries`, `metric_snapshots`. Run `cd backend && alembic revision --autogenerate -m "add translation tables"` and `alembic upgrade head`.
### RBAC Registration
- [ ] T008 Register 13 permission strings in the RBAC seed/permission store: `translate.job.view`, `translate.job.create`, `translate.job.edit`, `translate.job.delete`, `translate.job.execute`, `translate.dictionary.view`, `translate.dictionary.create`, `translate.dictionary.edit`, `translate.dictionary.delete`, `translate.schedule.view`, `translate.schedule.manage`, `translate.history.view`, `translate.metrics.view`. Ensure admin role gets all; analyst role gets `translate.job.view`, `translate.job.execute`, `translate.dictionary.view`, `translate.history.view`. Update role seeding script if needed.
**Checkpoint**: Foundation ready — models, schemas, plugin, routes, migration, and RBAC all in place. User story implementation can now begin.
---
## Phase 3: User Story 1 — Configure Translation Job (Priority: P1) 🎯 MVP
**Goal**: User can create, edit, delete, and list translation jobs with datasource selection, column mapping, key columns, target table configuration, LLM settings, and dictionary attachment.
**Independent Test**: Open Configuration form → select Superset datasource → pick translation/context/key columns → specify target table → save → verify job appears in list with correct settings.
### Backend — Job CRUD
- [ ] T009 [P] [US1] Implement job CRUD service in `backend/src/plugins/translate/plugin.py` as methods on the `TranslatePlugin` class: `create_job()`, `update_job()`, `delete_job()`, `get_job()`, `list_jobs()`, `duplicate_job()`. Validate column existence via `SupersetClient` on create/update (FR-001, FR-002, FR-006). Enforce composite key support (FR-004). Detect virtual columns and warn (US1 acceptance scenario 5).
- [ ] T010 [US1] Implement `/api/translate/jobs` endpoints in `backend/src/api/routes/translate.py`: `POST /` (create), `GET /` (list), `GET /{job_id}` (get), `PUT /{job_id}` (update), `DELETE /{job_id}` (delete), `POST /{job_id}/duplicate` (duplicate — FR-021). Inject `Depends(require_permission("translate.job.*"))` per operation. A route-shape sketch follows this subsection.
- [ ] T011 [US1] Implement `/api/translate/datasources/{datasource_id}/columns` endpoint that queries Superset for column metadata (name, type, is_physical flag) and the database dialect (backend/engine) from the connection configuration. Returns column list AND `database_dialect` field for the frontend. Cache dialect on `TranslationJob.database_dialect` at save time. Reject unsupported dialects at configuration time (FR-002, dialect detection).
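A sketch of the route shape from T010; `require_permission` below is a stand-in for the project's existing RBAC dependency factory, not its actual definition:

```python
from fastapi import APIRouter, Depends, HTTPException, Request

def require_permission(permission: str):
    # Illustrative stand-in for the existing RBAC dependency factory.
    async def checker(request: Request) -> None:
        granted = getattr(request.state, "permissions", set())
        if permission not in granted:
            raise HTTPException(status_code=403, detail=f"missing {permission}")
    return checker

router = APIRouter(prefix="/api/translate", tags=["translate"])

@router.post("/jobs", dependencies=[Depends(require_permission("translate.job.create"))])
async def create_job(payload: dict) -> dict:
    # The real body delegates to TranslatePlugin.create_job() (T009).
    ...
```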
### Frontend — Job Config UI
- [ ] T012 [P] [US1] Create `TranslateApiClient` module in `frontend/src/lib/api/translate.js`: `fetchJobs()`, `createJob()`, `updateJob()`, `deleteJob()`, `duplicateJob()`, `fetchDatasourceColumns()`. Use existing `requestApi`/`fetchApi` wrapper pattern.
- [ ] T013 [US1] Create `TranslationJobList` SvelteKit page in `frontend/src/routes/translate/+page.svelte`: list all jobs with name, datasource, status/schedule indicators, create button, duplicate action. `@UX_STATE`: idle, loading, empty, populated, error.
- [ ] T014 [US1] Create `TranslationJobConfig` SvelteKit page in `frontend/src/routes/translate/[id]/+page.svelte`: datasource dropdown → column selectors (translation column, context columns, key columns with [+ Add key] for composite), target table/column inputs, LLM provider selector, target language, batch size, prompt template editor, dictionary attachment (multi-select with priority ordering). `@UX_STATE`: idle, loading, configured, saving, validation_error, datasource_unavailable. `@UX_REACTIVITY`: column list `$derived` from datasource selection.
### Verification — US1
- [ ] T015 [US1] Write pytest integration tests for job CRUD API in `backend/tests/test_translate_jobs.py`: test create with valid config, create with missing translation column (expect 422), create with virtual key column (expect warning), update job, delete job, duplicate job. Mock `SupersetClient` for column metadata.
- [ ] T016 [US1] Verify US1 acceptance scenarios against `specs/028-llm-datasource-supeset/spec.md` User Story 1 (5 scenarios). Run `cd backend && pytest tests/test_translate_jobs.py -v`.
**Checkpoint**: Job CRUD fully functional — user can create, edit, list, and duplicate translation jobs with validated column mappings.
---
## Phase 4: User Story 5 — Terminology Dictionary Management (Priority: P2)
**Goal**: User can create, edit, delete dictionaries; add terms inline; import CSV/TSV with duplicate detection; attach dictionaries to jobs with priority ordering.
**Independent Test**: Create dictionary with 5 terms → import CSV with 50 terms → verify duplicates flagged → attach dictionary to job → verify dictionary appears in job config.
### Backend — Dictionary CRUD + Import
- [ ] T017 [P] [US5] Implement `DictionaryManager` class in `backend/src/plugins/translate/dictionary.py`: `create_dictionary()`, `update_dictionary()`, `delete_dictionary()`, `get_dictionary()`, `list_dictionaries()`, `add_entry()`, `edit_entry()`, `delete_entry()`, `clear_entries()`. Enforce unique `source_term` per dictionary with conflict resolution (FR-026). Prevent deletion if attached to active/scheduled jobs (FR-030). `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` markers at mutation boundaries. (RATIONALE: C4 warranted because dictionary CRUD is stateful and must enforce referential integrity on deletion; REJECTED: pure C3 CRUD without state guards would allow orphaned job-dictionary links)
- [ ] T018 [US5] Implement CSV/TSV import in `DictionaryManager`: parse uploaded content, detect delimiter, create `DictionaryEntry` rows, preview with duplicate detection, return parse errors with line numbers for malformed rows (FR-025). Add `DictionaryImport` schema validation.
- [ ] T019 [US5] Implement `/api/translate/dictionaries` endpoints in `backend/src/api/routes/translate.py`: `POST /` (create), `GET /` (list), `GET /{dict_id}` (get with entries), `PUT /{dict_id}` (update), `DELETE /{dict_id}` (delete — blocked if attached), `POST /{dict_id}/entries` (add entry), `PUT /{dict_id}/entries/{entry_id}` (edit), `DELETE /{dict_id}/entries/{entry_id}` (delete), `POST /{dict_id}/import` (CSV/TSV import with preview).
- [ ] T020 [US5] Implement per-batch dictionary filtering logic in `DictionaryManager.filter_for_batch(source_texts: list[str]) -> list[dict]`: scan batch texts for substrings matching dictionary `source_term` values; return matched entries in priority order across all attached dictionaries (FR-044). This is consumed by US2 (preview) and US3 (executor).
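A minimal sketch of the T020 filter under FR-044 assumptions (the entry shape and normalization column are illustrative):

```python
import re

def filter_for_batch(entries: list[dict], source_texts: list[str]) -> list[dict]:
    # Entries are assumed pre-sorted by dictionary priority (FR-027); the first
    # occurrence of a normalized term wins, lower-priority duplicates are dropped.
    corpus = "\n".join(source_texts)
    matched, seen = [], set()
    for entry in entries:
        norm = entry["source_term_normalized"]
        if norm in seen:
            continue
        # Lookarounds give word-boundary behavior even for terms that start or
        # end with non-word characters; matching is case-insensitive (FR-044).
        pattern = r"(?<!\w)" + re.escape(entry["source_term"]) + r"(?!\w)"
        if re.search(pattern, corpus, flags=re.IGNORECASE):
            matched.append(entry)
            seen.add(norm)
    return matched
```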
### Frontend — Dictionary UI
- [ ] T021 [P] [US5] Add dictionary API methods to `frontend/src/lib/api/translate.js`: `fetchDictionaries()`, `createDictionary()`, `updateDictionary()`, `deleteDictionary()`, `fetchDictionaryEntries()`, `addEntry()`, `editEntry()`, `deleteEntry()`, `importDictionary()`.
- [ ] T022 [US5] Create `DictionaryList` SvelteKit page in `frontend/src/routes/translate/dictionaries/+page.svelte`: list dictionaries with name, language, term count, attached job count, create/delete actions. `@UX_STATE`: idle, loading, empty, populated, delete_blocked.
- [ ] T023 [US5] Create `DictionaryEditor` SvelteKit page in `frontend/src/routes/translate/dictionaries/[id]/+page.svelte`: inline term editor (source_term → target_translation), add/delete rows, CSV/TSV import with conflict preview, export. `@UX_STATE`: idle, loading, editing, importing, import_preview, import_conflict, saving. `@UX_FEEDBACK`: import preview with duplicate flags; toast on save.
### Verification — US5
- [ ] T024 [US5] Write pytest tests for DictionaryManager in `backend/src/plugins/translate/__tests__/test_dictionary.py`: test create/update/delete, add entry with duplicate detection (expect conflict), import CSV with valid/invalid rows, delete dictionary blocked by active job, per-batch filtering returns matched terms.
- [ ] T025 [US5] Verify US5 acceptance scenarios against spec User Story 5 (6 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_dictionary.py -v`.
**Checkpoint**: Dictionary management fully functional — CRUD, import, filtering, and job attachment all work.
---
## Phase 5: User Story 2 — Preview Translated Output (Priority: P2)
**Goal**: User triggers preview on a saved job → system fetches sample rows → sends to LLM with context + dictionary → displays source/context/translation side-by-side → user approves/edits/rejects → preview state saved for execution gate.
**Independent Test**: Create job + dictionary → click Preview → verify 10 rows shown with LLM translations → approve 8, edit 1, reject 1 → verify state preserved.
### Backend — Preview Engine
- [ ] T026 [US2] Implement `TranslationPreview` class in `backend/src/plugins/translate/preview.py`: `preview_rows(job_id, sample_size)`. Fetch source rows from Superset via `SupersetClient`; construct LLM prompt using `LLMProviderService` + `llm_prompt_templates.render_prompt()` + `DictionaryManager.filter_for_batch()`; call LLM; return `PreviewRow` list. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` at LLM call boundaries. (RATIONALE: C4 because preview is stateful (approve/edit/reject lifecycle) and calls external LLM API with side effects; REJECTED: making preview purely read-only without approval state would degrade UX by losing user decisions between preview and execution)
- [ ] T027 [US2] Implement token count and cost estimation in preview response: compute estimated tokens from sample → extrapolate to full dataset row count → apply provider pricing → return `estimated_total_rows`, `estimated_tokens`, `estimated_cost` in `TranslationPreviewResponse` (FR-014). An extrapolation sketch follows this subsection.
- [ ] T028 [US2] Implement preview quality gate: create persistent `TranslationPreviewSession` and `TranslationPreviewRecord` rows with `config_hash` and `dict_snapshot_hash`. Preview acceptance gates full execution; rejected preview sample rows are excluded from full run. Preview is a quality gate — unseen rows are processed normally in full run.
- [ ] T029 [US2] Implement `/api/translate/jobs/{job_id}/preview` endpoint: `POST` triggers preview, returns preview rows with `status=pending`. Add `PUT /api/translate/jobs/{job_id}/preview/rows/{row_key}` for approve/edit/reject actions. Add `POST /api/translate/jobs/{job_id}/preview/approve-all` for bulk approve.
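A sketch of the T027 extrapolation; the pricing input is assumed to come from the existing LLM provider configuration:

```python
def estimate_run_cost(sample_tokens: int, sample_rows: int,
                      total_rows: int, price_per_1k_tokens: float) -> dict:
    # Linear extrapolation from the preview sample (FR-014). Real prompts vary
    # per batch, so this is an estimate shown to the user, not a guarantee.
    if sample_rows == 0:
        return {"estimated_tokens": 0, "estimated_cost": 0.0}
    tokens_per_row = sample_tokens / sample_rows
    estimated_tokens = round(tokens_per_row * total_rows)
    return {
        "estimated_tokens": estimated_tokens,
        "estimated_cost": round(estimated_tokens / 1000 * price_per_1k_tokens, 4),
    }
```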
### Frontend — Preview UI
- [ ] T030 [P] [US2] Add preview API methods to `frontend/src/lib/api/translate.js`: `fetchPreview()`, `approveRow()`, `editRow()`, `rejectRow()`, `approveAll()`.
- [ ] T031 [US2] Create `TranslationPreview` component in `frontend/src/lib/components/translate/TranslationPreview.svelte`: side-by-side table (source, context, LLM translation), approve/edit/reject buttons per row, bulk approve, cost estimate card before full run, row limit input. `@UX_STATE`: idle, loading, preview_loaded, preview_error, retrying. `@UX_FEEDBACK`: spinner during LLM call; visual distinction for LLM-generated vs user-edited values; cost estimate reactivity. `@UX_RECOVERY`: retry preview button; individual row re-translate.
- [ ] T032 [US2] Integrate `TranslationPreview` into `TranslationJobConfig` page (`frontend/src/routes/translate/[id]/+page.svelte`) as a tab or collapsible section that appears after job is saved.
### Verification — US2
- [ ] T033 [US2] Write pytest tests for preview in `backend/src/plugins/translate/__tests__/test_preview.py`: test preview with valid job, preview with dictionary (verify glossary terms in prompt), preview row approve/edit/reject state transitions, cost estimation accuracy. Mock LLM provider responses.
- [ ] T034 [US2] Write vitest component test for `TranslationPreview` in `frontend/src/lib/components/translate/__tests__/TranslationPreview.test.js`: test rendering of preview rows, approve/reject/edit interactions, bulk approve behavior. Mock API client.
- [ ] T035 [US2] Verify US2 acceptance scenarios against spec User Story 2 (5 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_preview.py -v && cd ../frontend && npm run test -- --run`.
**Checkpoint**: Preview flows complete — LLM translation with context + dictionary, approve/edit/reject lifecycle, cost estimation.
---
## Phase 6: User Story 3 — Execute Translation & Insert Results (Priority: P3)
**Goal**: User triggers full batch execution → system processes rows in batches → generates INSERT SQL → user copies to SQL Lab or auto-executes → failed batches retryable.
**Independent Test**: Create job → preview + approve → execute → verify INSERT SQL generated with correct key columns → execute in SQL Lab → verify rows in target table.
### Backend — Executor + SQL Generator + Orchestrator
- [ ] T036 [US3] Implement `SQLGenerator` class in `backend/src/plugins/translate/sql_generator.py`: `generate_insert(records: list[TranslationRecord], job: TranslationJob) -> str`. Detect dialect from `job.database_dialect` (cached from Superset connection at save time). Produce safe dialect-appropriate SQL: for PostgreSQL/Greenplum — `INSERT INTO "target_table" ("key_cols"..., "target_col") VALUES (...)` with quoted identifiers; support `upsert_strategy`: `insert` (plain INSERT), `skip_existing` (ON CONFLICT DO NOTHING), `overwrite` (ON CONFLICT DO UPDATE). For ClickHouse — `INSERT INTO target_table (key_cols..., target_col) VALUES (...)`; `skip_existing` warns user (not natively supported); `overwrite` documented limitation. `@COMPLEXITY 3`. (RATIONALE: dialect-aware because Superset connections may use ClickHouse or PostgreSQL; REJECTED: PostgreSQL-only would break ClickHouse users; raw identifier interpolation rejected)
- [ ] T037 [US3] Implement `TranslationExecutor` class in `backend/src/plugins/translate/executor.py`: `execute_run(run: TranslationRun, job: TranslationJob)`. Fetch all source rows from Superset; split into batches; for each batch: call `DictionaryManager.filter_for_batch()`, construct prompt via `LLMProviderService`, call LLM, create `TranslationRecord` rows with status `translated`/`failed`/`skipped`; handle batch-level retry on LLM failure (FR-015); skip NULL translation values (FR-016); reject NULL key values (FR-017); update run statistics. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect` at batch boundaries and error paths.
- [ ] T038 [US3] Implement `TranslationOrchestrator` class in `backend/src/plugins/translate/orchestrator.py`: `start_run(job_id, trigger_type)`. Validate preconditions (job config valid, datasource accessible, LLM provider reachable); create `TranslationRun` with status `running` and config/dict snapshots (FR-019, FR-029); dispatch to `TranslationExecutor`; on completion call `SQLGenerator`; record `TranslationEvent` rows via `TranslationEventLog` (FR-046); enforce state transitions: pending → running → (completed | partial | failed) — no skipping. `@COMPLEXITY 5` — full `@PRE`/`@POST`/`@DATA_CONTRACT`/`@INVARIANT` enforcement with `@RATIONALE`/`@REJECTED`. (RATIONALE: central coordinator is C5 because preview, execution, event logging, and retry share run state and must coordinate within a single transaction boundary; REJECTED: distributed actor model would introduce eventual-consistency challenges for status tracking at current scale) A state-transition sketch follows this subsection.
- [ ] T039 [US3] Implement `TranslationEventLog` class in `backend/src/plugins/translate/events.py`: `log_event(run_id, job_id, event_type, payload)`. Create immutable `TranslationEvent` row. `query_events(job_id, filters)` for audit/dashboard. `prune_expired()` for 90-day retention enforcement (FR-049) — scheduled via APScheduler cleanup job. `@COMPLEXITY 5` — `@INVARIANT`: every run must have exactly one `run_started` and one terminal event. (RATIONALE: C5 warranted because event log is single source of truth for observability, metrics, and audit; REJECTED: stdout-only logging lacks structured payload integrity and cannot enforce terminal-event invariant)
- [ ] T040 [US3] Implement execution endpoints in `backend/src/api/routes/translate.py`: `POST /api/translate/jobs/{job_id}/runs` (trigger manual run — creates run, dispatches orchestrator which translates AND submits to Superset API), `GET /api/translate/runs/{run_id}` (status + statistics + insert_status + superset_query_id), `GET /api/translate/runs/{run_id}/records` (paginated TranslationRecord list), `POST /api/translate/runs/{run_id}/retry` (retry failed batches only — FR-015), `POST /api/translate/runs/{run_id}/retry-insert` (retry Superset insert only without re-translating). Inject `Depends(require_permission("translate.job.execute"))`.
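A sketch of the transition guard implied by T038; statuses mirror the Translation Run entity in the spec:

```python
# Legal translation_status transitions: pending → running → terminal, no skipping.
VALID_TRANSITIONS = {
    "pending": {"running", "skipped"},
    "running": {"completed", "partial", "failed", "cancelled"},
}

def transition(current: str, new: str) -> str:
    if new not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```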
### Frontend — Execution UI
- [ ] T041 [P] [US3] Add execution API methods to `frontend/src/lib/api/translate.js`: `triggerRun()`, `fetchRunStatus()`, `fetchRunRecords()`, `retryFailedBatches()`.
- [ ] T042 [US3] Create `TranslationRunProgress` component in `frontend/src/lib/components/translate/TranslationRunProgress.svelte`: live progress bar (WebSocket-driven from `TaskWebSocket`), batch counter (N/M), success/failure/skip counts, cancel button. `@UX_STATE`: idle, running, pausing, cancelled, completed, partial, failed. `@UX_FEEDBACK`: progress percentage `$derived` from translated/total; real-time counts. `@UX_RECOVERY`: retry failed batches button; cancel run; download skipped rows.
- [ ] T043 [US3] Create `TranslationRunResult` component in `frontend/src/lib/components/translate/TranslationRunResult.svelte`: completion summary (rows translated/failed/skipped, token count, cost, insert_status), Superset execution reference with status badge, generated SQL block for audit/debugging (collapsed by default), retry-insert button. `@UX_STATE`: completed, partial, failed, insert_failed. `@UX_FEEDBACK`: Superset execution status badge; SQL block for audit.
- [ ] T044 [US3] Integrate `TranslationRunProgress` and `TranslationRunResult` into `TranslationJobConfig` page as the "Run" tab/section.
### Verification — US3
- [ ] T045 [US3] Write pytest tests for `SQLGenerator` in `backend/src/plugins/translate/__tests__/test_sql_generator.py`: test INSERT with single key, composite key — for PostgreSQL dialect AND ClickHouse dialect. Test PostgreSQL UPSERT (ON CONFLICT DO NOTHING, ON CONFLICT DO UPDATE). Test ClickHouse plain INSERT and skip_existing warning. Test NULL key rejection, NULL translation value skipping, identifier quoting per dialect, injection safety. Validate SQL syntax correctness against each dialect.
- [ ] T046 [US3] Write pytest tests for executor + orchestrator in `backend/src/plugins/translate/__tests__/test_orchestrator.py`: test full run lifecycle (pending→running→completed), partial failure (one batch fails, rest succeed), batch retry, event log invariants, NULL handling. Mock LLM provider and SupersetClient.
- [ ] T047 [US3] Verify US3 acceptance scenarios against spec User Story 3 (5 scenarios). Run `cd backend && pytest src/plugins/translate/__tests__/test_orchestrator.py src/plugins/translate/__tests__/test_sql_generator.py -v`.
**Checkpoint**: Execution pipeline complete — batch processing, INSERT generation, retry, event logging. User can translate data and insert into target table.
---
## Phase 7: User Story 6 — Feedback Loop (Correct → Dictionary) (Priority: P3)
**Goal**: In run results, user selects incorrect translation → submits correction to dictionary → dictionary updated with origin tracking → next run uses corrected term.
**Independent Test**: Complete a run → find incorrect translation → open correction popup → submit to dictionary → re-run preview → verify corrected term used.
### Backend — Correction Submission
- [ ] T048 [US6] Implement correction submission endpoint in `backend/src/api/routes/translate.py`: `POST /api/translate/corrections` accepting `TermCorrectionSubmit` body. Validate target language match between dictionary and job (FR language validation edge case); detect existing entry conflict → return conflict response (FR-032); create `DictionaryEntry` with origin tracking (`origin_run_id`, `origin_row_key`, `origin_user_id`) per FR-033. Inject `Depends(require_permission("translate.dictionary.edit"))`.
- [ ] T049 [US6] Implement bulk correction endpoint: `POST /api/translate/corrections/bulk` accepting array of `TermCorrectionSubmit` objects (FR-034). Process atomically: if any conflict is detected, return all conflicts for user resolution and apply nothing (no partial writes).
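An all-or-nothing sketch for T049; the `session` calls are hypothetical placeholders for the real SQLAlchemy transaction inside `DictionaryManager`:

```python
def apply_bulk_corrections(session, corrections: list[dict],
                           existing_terms: dict[str, str]) -> list[dict]:
    # All-or-nothing semantics per FR-034: collect every conflict first and
    # apply nothing if any exist. `session.add_entry`/`commit` are placeholders.
    conflicts = [
        {"source_term": c["source_term"], "existing": existing_terms[c["source_term"]]}
        for c in corrections
        if c["source_term"] in existing_terms
    ]
    if conflicts:
        return conflicts  # caller surfaces these; no partial writes happen
    for c in corrections:
        session.add_entry(c)
    session.commit()
    return []
```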
### Frontend — Correction UI
- [ ] T050 [P] [US6] Add correction API methods to `frontend/src/lib/api/translate.js`: `submitCorrection()`, `submitBulkCorrections()`.
- [ ] T051 [US6] Create `TermCorrectionPopup` component in `frontend/src/lib/components/translate/TermCorrectionPopup.svelte`: text selection on source term and incorrect target translation → popup with source term (pre-filled from source column), incorrect target translation (pre-filled from selection), corrected target translation input, dictionary selector dropdown (filtered by target language), submit button, conflict dialog (overwrite/keep existing/cancel). `@UX_STATE`: closed, selecting, editing, submitting, conflict_detected, submitted. `@UX_FEEDBACK`: "Added to Dictionary" badge on corrected row.
- [ ] T052 [US6] Create `BulkCorrectionSidebar` component in `frontend/src/lib/components/translate/BulkCorrectionSidebar.svelte`: sidebar collecting selected terms across rows, per-term correction inputs, submit all to dictionary. `@UX_STATE`: closed, collecting, reviewing, submitting, submitted. `@UX_REACTIVITY`: selected terms list `$state`.
- [ ] T053 [US6] Integrate feedback-loop components into `TranslationRunResult` (T043) — add selection highlight behavior and correction triggers.
### Verification — US6
- [ ] T054 [US6] Write pytest tests for correction endpoints in `backend/tests/test_translate_corrections.py`: test single correction, bulk correction, conflict detection (existing term), cross-language rejection, origin tracking fields populated. Verify corrected term appears in next preview's dictionary filter.
- [ ] T055 [US6] Verify US6 acceptance scenarios against spec User Story 6 (5 scenarios). Run `cd backend && pytest backend/tests/test_translate_corrections.py -v`.
**Checkpoint**: Feedback loop complete — corrections flow from results → dictionary → next run.
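A minimal sketch of the atomic bulk-correction rule in T049: every conflict is collected before anything is applied, so a single conflict blocks the whole batch. The `Correction` shape and the dictionary representation are assumptions, not the real schemas.

```python
# Sketch of T049's all-or-nothing bulk apply (illustrative shapes).
from dataclasses import dataclass

@dataclass
class Correction:
    source_term: str
    corrected: str

def apply_bulk(corrections: list[Correction], entries: dict[str, str]) -> dict:
    conflicts = [c.source_term for c in corrections if c.source_term in entries]
    if conflicts:
        # Return every conflict for user resolution; apply nothing.
        return {"applied": 0, "conflicts": conflicts}
    for c in corrections:
        entries[c.source_term] = c.corrected
    return {"applied": len(corrections), "conflicts": []}
```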
---
## Phase 8: User Story 7 — Schedule Translation Jobs (Priority: P3)
**Goal**: User configures schedule → system triggers runs → new-key-only translation → optional auto-INSERT → failure notification → pause/resume.
**Independent Test**: Configure schedule (every 5 min for test) → wait for trigger → verify new TranslationRun created → verify only new keys translated → disable schedule → verify no more triggers.
### Backend — Schedule Management + Trigger Dispatch
- [ ] T056 [US7] Implement `TranslationScheduler` class in `backend/src/plugins/translate/scheduler.py`: `create_schedule()`, `update_schedule()`, `delete_schedule()`, `enable_schedule()`, `disable_schedule()`, `get_next_executions(schedule, n=3)` (FR-036). Register schedule with existing `SchedulerService` via `add_job()` with cron/interval/date trigger. `@COMPLEXITY 4` — instrument with `belief_scope`/`reason`/`reflect`. (RATIONALE: C4 because schedule management is stateful with APScheduler integration, concurrency policy enforcement, and trigger dispatch side effects)
- [ ] T057 [US7] Implement schedule trigger handler: `_execute_scheduled_translation(job_id)`. Enforce concurrency policy: check if previous run for same job is still `running``skip` (log + event) or `queue` (start after previous completes) per FR-039. If proceeding: create new `TranslationRun` with `trigger_type=scheduled`; fetch source rows; apply new-key-only filter (FR-045) — compare current key values against previous successful run's key values; dispatch to `TranslationOrchestrator`. On failure, send notification via `NotificationService` (FR-041, FR-048). Schedule remains enabled for next trigger (US7 acceptance scenario 6).
- [ ] T058 [US7] Implement Superset SQL Lab API submission for all runs: create `SupersetSqlLabExecutor` class in `backend/src/plugins/translate/superset_executor.py`. Submit generated SQL to `/api/v1/sqllab/execute/`, poll execution status, update `TranslationRun.insert_status`, `superset_query_id`, `rows_affected`, error fields. For scheduled runs, this happens automatically; for manual runs, this happens on user trigger. Record `insert_submitted`/`insert_succeeded`/`insert_failed` events.
- [ ] T059 [US7] Implement schedule endpoints in `backend/src/api/routes/translate.py`: `PUT /api/translate/jobs/{job_id}/schedule` (create/update), `DELETE /api/translate/jobs/{job_id}/schedule` (remove), `POST /api/translate/jobs/{job_id}/schedule/enable` (FR-040), `POST /api/translate/jobs/{job_id}/schedule/disable` (FR-040). Inject `Depends(require_permission("translate.schedule.manage"))`. Add schedule warning when editing job with active schedule (FR-042).
- [ ] T060 [US7] Extend `SchedulerService.load_schedules()` in `backend/src/core/scheduler.py` to discover and register active `TranslationSchedule` rows alongside existing backup schedules (R4).
### Frontend — Schedule UI
- [ ] T061 [P] [US7] Add schedule API methods to `frontend/src/lib/api/translate.js`: `updateSchedule()`, `deleteSchedule()`, `enableSchedule()`, `disableSchedule()`.
- [ ] T062 [US7] Create `ScheduleConfig` component in `frontend/src/lib/components/translate/ScheduleConfig.svelte`: type selector (cron/interval/once), cron expression input with validation, interval input, timezone selector, run-at datetime picker, next-3-executions preview (with timezone), concurrency policy selector (skip/queue), enable/disable toggle with status indicator. Warns if no prior successful manual run exists. `@UX_STATE`: idle, editing, validating, enabled, disabled, no_prior_run_warning. `@UX_REACTIVITY`: next execution times `$derived` from schedule config with timezone display.
- [ ] T063 [US7] Integrate `ScheduleConfig` into `TranslationJobConfig` page as the "Schedule" tab.
### Verification — US7
- [ ] T064 [US7] Write pytest tests for scheduler in `backend/src/plugins/translate/__tests__/test_scheduler.py`: test schedule CRUD, cron expression validation, next-N-executions calculation, trigger dispatch with skip/queue concurrency, new-key-only filter (verify only unseen keys processed), auto-INSERT execution, failure notification, pause/resume, load on SchedulerService start.
- [ ] T065 [US7] Verify US7 acceptance scenarios against spec User Story 7 (8 scenarios). Run `cd backend && pytest backend/src/plugins/translate/__tests__/test_scheduler.py -v`.
**Checkpoint**: Scheduling complete — jobs can run automatically on schedule with new-key-only incremental translation and failure recovery.
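The skip/queue decision in T057 (FR-039) reduces to a small dispatch check. A sketch under assumed in-memory state; the real handler consults run status in the database:

```python
# Sketch of the FR-039 concurrency policy at trigger time (illustrative state).
from enum import Enum

class Policy(str, Enum):
    SKIP = "skip"
    QUEUE = "queue"

def on_schedule_trigger(job_id: int, policy: Policy,
                        running: set[int], queue: list[int]) -> str:
    if job_id in running:
        if policy is Policy.SKIP:
            return "skipped"      # log + event; schedule stays enabled
        queue.append(job_id)      # start after the previous run completes
        return "queued"
    running.add(job_id)           # create TranslationRun(trigger_type="scheduled")
    return "started"
```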
---
## Phase 9: User Story 4 — Translation History & Audit Trail (Priority: P4)
**Goal**: User views past runs with filterable list; inspects run details (config snapshot, prompt, translations, INSERT SQL); sees edit marks; duplicates job. Admin views metrics dashboard.
**Independent Test**: Run several translations → open history → filter by datasource → click run → verify config snapshot, prompt, translations with edit marks, INSERT SQL all shown.
### Backend — History + Metrics Endpoints
- [ ] T066 [US4] Implement history endpoints in `backend/src/api/routes/translate.py`: `GET /api/translate/runs` (list with filters: `job_id`, `datasource_id`, `target_table`, `status`, `date_from`, `date_to`, pagination per FR-020), `GET /api/translate/runs/{run_id}` (detail with `config_snapshot`, `prompt_used`, `records` with `llm_translation` and `user_edit` fields visible — FR showing original vs user-edited).
- [ ] T067 [US4] Implement `TranslationMetrics` class in `backend/src/plugins/translate/metrics.py`: `get_job_metrics(job_id) -> MetricsResponse`. Aggregate from `TranslationEvent` table: total runs, success/failure counts, cumulative tokens, cumulative cost, average batch latency (FR-047). `@COMPLEXITY 3`.
- [ ] T068 [US4] Implement metrics endpoint: `GET /api/translate/jobs/{job_id}/metrics`. Inject `Depends(require_permission("translate.history.view"))`.
### Frontend — History + Metrics UI
- [ ] T069 [P] [US4] Add history API methods to `frontend/src/lib/api/translate.js`: `fetchRunHistory()`, `fetchRunDetail()`, `fetchJobMetrics()`.
- [ ] T070 [US4] Create `TranslationHistory` SvelteKit page in `frontend/src/routes/translate/history/+page.svelte`: filterable table (datasource, target table, row count, status, date, user), click-to-expand detail with config snapshot, prompt, translation rows with edit marks, INSERT SQL. `@UX_STATE`: idle, loading, empty, populated, detail_open. `@UX_REACTIVITY`: filtered list `$derived` from filters.
- [ ] T071 [US4] Create admin metrics dashboard section (integrated into existing admin pages or standalone) displaying per-job metrics: run counts, success/failure ratio, cumulative tokens, cumulative cost, average latency. Use `MetricsResponse` schema.
### Verification — US4
- [ ] T072 [US4] Write pytest tests for history + metrics in `backend/tests/test_translate_history.py`: test run list with filters, run detail with snapshots, metrics aggregation accuracy, `TranslationEvent` queryability.
- [ ] T073 [US4] Verify US4 acceptance scenarios against spec User Story 4 (4 scenarios). Run `cd backend && pytest backend/tests/test_translate_history.py -v`.
**Checkpoint**: History and audit complete — all runs traceable, metrics dashboard populated.
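A sketch of the aggregation behind `get_job_metrics()` (T067), assuming a flat list of event dicts; the real implementation reads the `TranslationEvent` table, and the event type and field names here are assumptions:

```python
# Sketch of T067's metrics aggregation over assumed event shapes.
def aggregate_metrics(events: list[dict]) -> dict:
    runs = {e["run_id"] for e in events}
    failed = {e["run_id"] for e in events if e["type"] == "run_failed"}
    batches = [e for e in events if e["type"] == "batch_completed"]
    latencies = [b["latency_ms"] for b in batches]
    return {
        "total_runs": len(runs),
        "failed_runs": len(failed),
        "successful_runs": len(runs) - len(failed),
        "total_tokens": sum(b.get("tokens", 0) for b in batches),
        "total_cost": round(sum(b.get("cost", 0.0) for b in batches), 4),
        "avg_batch_latency_ms": sum(latencies) / len(latencies) if latencies else None,
    }
```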
---
## Phase 10: Polish & Cross-Cutting Concerns
**Purpose**: Retention enforcement, notification wiring, semantic audit, quickstart validation, and rejected-path regression protection.
- [ ] T074 [P] Implement 90-day retention pruning in `TranslationEventLog.prune_expired()`: run as APScheduler daily cleanup job. BEFORE pruning events/records: persist cumulative metrics as `MetricSnapshot` row (tokens, cost, run counts). Then prune `TranslationRecord`, `TranslationPreviewRecord`, `TranslationEvent`, and `insert_sql`/`config_snapshot` fields older than 90 days. Preserve `TranslationRun` metadata, `MetricSnapshot` rows, and `superset_query_id`. Verify metrics remain accurate post-prune (SC-014). (RATIONALE: metric snapshots prevent cumulative data loss from event pruning; REJECTED: indefinite retention would violate storage constraints)
- [ ] T075 [P] Wire scheduled-run failure notification: ensure `TranslationScheduler` trigger handler calls `NotificationService.send()` when a scheduled run fails (FR-041, FR-048). Test with mock notification provider.
- [ ] T076 [P] Instrument remaining C4/C5 Python flows with `belief_scope`/`reason`/`reflect`/`explore` markers where missing: `TranslationOrchestrator.start_run()` (entry/exit), `TranslationExecutor.execute_run()` (batch boundaries + error paths), `DictionaryManager` mutation boundaries, `TranslationScheduler` trigger dispatch. Verify via `axiom_semantic_validation` belief-runtime audit.
- [ ] T077 Run full semantic audit via axiom MCP tools:
- `axiom_semantic_validation audit_contracts --file_path backend/src/plugins/translate/` — verify all `[DEF]` anchors are closed, `@RELATION` targets resolve, no orphan contracts, C4+ contracts have required tag density
- `axiom_semantic_validation audit_belief_protocol --file_path backend/src/plugins/translate/` — verify `@RATIONALE`/`@REJECTED` present on all C5 contracts
- `axiom_semantic_validation audit_belief_runtime --file_path backend/src/plugins/translate/` — verify `belief_scope`/`reason`/`reflect`/`explore` markers exist in all C4+ module bodies
- `axiom_semantic_validation impact_analysis --contract_id TranslationOrchestrator:Class` — verify no rejected path is accidentally re-enabled
- [ ] T078 Run quickstart validation: follow `specs/028-llm-datasource-supeset/quickstart.md` end-to-end — create dictionary → create job → preview → execute → verify INSERT SQL → submit correction → schedule → view history → verify metrics. Run `cd backend && pytest -v`, `cd frontend && npm run test -- --run`, `cd backend && ruff check src/plugins/translate/ src/api/routes/translate.py src/models/translate.py src/schemas/translate.py`.
- [ ] T079 Rejected-path regression guard. Add test cases:
  - `backend/src/plugins/translate/__tests__/test_orchestrator.py`: snapshot isolation — changing job config mid-run does NOT invalidate the running TranslationRun.
  - `backend/src/plugins/translate/__tests__/test_sql_generator.py`: UPDATE statements are never generated (only INSERT/UPSERT per PostgreSQL dialect).
  - `backend/src/plugins/translate/__tests__/test_dictionary.py`: duplicate source_term entries cannot coexist (UniqueConstraint enforced) and conflict resolution only offers overwrite/keep-existing.
  - `backend/src/plugins/translate/__tests__/test_retention.py`: metric snapshots are persisted before event pruning and cumulative metrics remain accurate post-prune.
- [ ] T080 [P] Implement cancel run endpoint: `POST /api/translate/runs/{run_id}/cancel` in `backend/src/api/routes/translate.py`. Set `translation_status=cancelled`, mark in-progress batches as failed, do NOT submit INSERT SQL. Emit `run_cancelled` event. Inject `Depends(require_permission("translate.job.execute"))`.
- [ ] T081 [P] Implement download skipped rows endpoint: `GET /api/translate/runs/{run_id}/skipped.csv` returning CSV of rows skipped due to NULL keys or translation failures. Use `key_hash` for efficient lookup.
- [ ] T082 [P] Compute `key_hash` for TranslationRecord and TranslationPreviewRecord: `hash(canonical_json(key_values))` at creation time. Add `config_hash` for TranslationRun and TranslationPreviewSession: hash of effective config (columns, keys, target, prompt, dictionaries). Use for idempotency checks, new-key-only filtering, and stale preview detection.
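A sketch of the canonical hashing T082 describes. The sorted-key JSON serialization and `sha256` are assumptions; any stable digest over a canonical representation works:

```python
# Sketch of T082: stable hashes for idempotency, new-key filtering, stale previews.
import hashlib
import json

def canonical_hash(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

key_hash = canonical_hash({"product_id": "PRD-001"})   # per-record identity
config_hash = canonical_hash({                          # effective job config
    "columns": ["product_name"], "keys": ["product_id"],
    "target": "products_i18n.translated_name",
    "prompt_version": 3, "dictionaries": [1, 2],
})
# A matching key_hash means the row was already translated (new-key-only filter);
# a changed config_hash marks an existing preview session as stale.
```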
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — can start immediately
- **Foundational (Phase 2)**: Depends on Setup — BLOCKS all user stories
- **US1 (Phase 3)**: Depends on Foundational — no dependencies on other stories. **Recommended start after Foundational.**
- **US5 (Phase 4)**: Depends on Foundational — can run in **parallel with US1**. Dictionary filtering (T020) will be consumed later by US2/US3 but is self-contained.
- **US2 (Phase 5)**: Depends on US1 (needs saved job) + US5 (needs dictionary filtering). Can start integration once US1 backend is stable.
- **US3 (Phase 6)**: Depends on US1 (needs job config) + US2 (preview decisions feed executor). Sequential after US2.
- **US6 (Phase 7)**: Depends on US3 (needs run results) + US5 (needs dictionary). Can run in **parallel with US7** after US3.
- **US7 (Phase 8)**: Depends on US1 (needs job) + US3 (needs execution pipeline). Can run in **parallel with US6** after US3.
- **US4 (Phase 9)**: Depends on US3 (needs run records). Can run in **parallel with US6/US7** after US3.
- **Polish (Phase 10)**: Depends on all desired user stories being complete.
### Parallel Opportunities
| Phase | Parallel Tasks | Notes |
|-------|---------------|-------|
| 1 | — | Sequential (only 2 tasks) |
| 2 | T003 ∥ T004 | Models + Schemas in parallel |
| 3 (US1) | T009 ∥ T012 | Backend CRUD ∥ API client |
| 4 (US5) | T017 ∥ T021 | DictionaryManager ∥ API client |
| 5 (US2) | T030 ∥ T031 | API client ∥ Preview component |
| 6 (US3) | T036 ∥ T041 | SQLGenerator ∥ API client |
| 7 (US6) | T050 ∥ T051 ∥ T052 | API client ∥ Popup ∥ Sidebar |
| 8 (US7) | T061 ∥ T062 | API client ∥ ScheduleConfig |
| 9 (US4) | T069 ∥ T070 | API client ∥ History page |
| 10 | T074 ∥ T075 ∥ T076 | Retention, notifications, belief instrumentation |
### Cross-Story Parallelism
After Foundational (Phase 2):
- **US1 and US5** can proceed in parallel by different developers
- After US3 completes: **US6, US7, and US4** can proceed in parallel
---
## Implementation Strategy
### MVP First (US1 Only)
1. Phase 1 + Phase 2 → Foundation
2. Phase 3 (US1) → Job configuration CRUD
3. **STOP and VALIDATE**: User can create, list, edit, delete translation jobs
4. Deploy/demo — partial value (configuration ready, no translation yet)
### Minimum Viable Feature (US1 + US5 + US2 + US3)
1. Foundation → US1 + US5 (parallel) → US2 → US3
2. **STOP and VALIDATE**: End-to-end translation flow works: configure → preview → execute → INSERT
3. This is the core feature — all remaining stories add automation (US7), quality improvement (US6), and visibility (US4)
### Full Feature (All Stories)
1. MVP → US6 + US7 + US4 (parallel after US3) → Polish
2. Scheduled automation, feedback loop, and audit trail all functional
---
## Notes
- All file paths reference the actual repository structure (`backend/src/`, `frontend/src/`).
- `@COMPLEXITY 4/5` backend contracts require `belief_scope`/`reason`/`reflect` markers — verified in T076.
- `@RATIONALE`/`@REJECTED` tags appear only in C5 contracts (`TranslationOrchestrator`, `TranslationEventLog`) per INV_7.
- Rejected paths are explicitly protected by regression tests in T079.
- `[NEED_CONTEXT]` markers: none — all contract targets resolve to existing or planned modules within this feature.
- The existing `LLMProviderService`, `SupersetClient`, `SchedulerService`, `NotificationService`, and `TaskWebSocket` contracts are reused without modification.
- Quickstart.md (T078) serves as the human-verifiable acceptance test for the full feature.

View File

@@ -0,0 +1,485 @@
# UX Reference: LLM Table Translation Service
**Feature Branch**: `028-llm-datasource-supeset`
**Created**: 2026-05-08
**Status**: Draft
## 1. User Persona & Context
* **Primary user**: Analytics engineer or localization specialist who needs to translate column values from a Superset datasource into another language and write the results into a materialized target table.
* **Secondary user**: Data steward who audits past translation runs for quality and compliance.
* **What is the user trying to achieve?**: Convert reference data (product names, category labels, UI strings) stored in Superset-connected tables into a target language using an LLM, with full control over which columns provide translation context, how rows are matched to the target table, and what gets inserted.
* **Mindset**: The user knows their data and knows which language they need. They want the LLM to handle the mechanical translation work, but they need to review quality before committing. They do not trust translations enough to let them flow unattended into production tables.
* **Context of use**:
* Preparing localized reference data for a BI dashboard rollout in a new language region.
* Translating product catalog descriptions stored in a Superset-visible database.
* Generating INSERT statements for a materialized translation table that feeds downstream reports.
* Reviewing and approving batches of translations before they become visible to end users.
## 2. UX Principles
* **Preview gates manual runs; schedules bypass it after first success**: Manual runs require preview acceptance. Scheduled runs may bypass preview only after at least one successful manual run with the same effective configuration.
* **Context is king**: The system must make it easy to add relevant context columns that help the LLM produce better translations, and must show that context in the preview.
* **Traceability**: Every inserted translation must be traceable back to its source row, the prompt used, the key values, and the approval decision.
* **Graceful degradation**: When the LLM fails, the system preserves progress and enables targeted retry; it never forces the user to restart from scratch.
* **Cost awareness**: Before any full batch execution, the user must see an estimated token count and cost so they can make informed decisions about batch size and scope.
## 3. Core Interaction Flows
### Flow A: Job Configuration (Happy Path)
```
1. User navigates to "Translation Jobs" section in the frontend.
2. User clicks "New Translation Job".
3. System presents a configuration form:
┌─────────────────────────────────────────────────────┐
│ New Translation Job │
│ │
│ Source Datasource: [▼ Select Superset datasource] │
│ │
│ ── Column Mapping ── │
│ Translation column: [▼ product_name ] │
│ Context columns: [✓ category_name ] │
│ [✓ product_description ] │
│ [  supplier_name ] │
│ │
│ ── Key Columns ── │
│ Key column 1: [▼ product_id ] │
│ [+ Add key column] │
│ │
│ ── Target ── │
│ Target table: [ products_i18n________ ] │
│ Target column: [ translated_name______ ] │
│ │
│ ── LLM Settings ── │
│ Target language: [▼ Russian ] │
│ Prompt template: [ Edit default template ] │
│ Batch size: [ 50 ] │
│ │
│ [Save Configuration] [Save & Preview] │
└─────────────────────────────────────────────────────┘
4. User selects datasource → columns auto-populate from schema.
5. User picks translation column, context columns, key columns.
6. User specifies target table and target column.
7. System validates column existence and key compatibility on save.
8. Configuration saved → job appears in job list with status "Configured".
```
### Flow B: Preview & Approve
```
1. User opens a saved translation job and clicks "Preview".
2. System shows progress:
┌─────────────────────────────────────────────────────┐
│ Translation Preview — products_i18n │
│ Sample: 10 rows │ Estimated full: 1,243 rows │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ # │ Source (product_name) │ Context │ │
│ │ │ │ (category_name) │ │
│ ├───┼───────────────────────┼───────────────────┤ │
│ │ 1 │ "Wireless Mouse" │ "Accessories" │ │
│ │ │ → "Беспроводная мышь" │ │ │
│ │ │ [Approve] [Edit] [Reject] │ │
│ ├───┼───────────────────────┼───────────────────┤ │
│ │ 2 │ "Gaming Keyboard" │ "Accessories" │ │
│ │ │ → "Игровая клавиатура"│ │ │
│ │ │ [Approve] [Edit] [Reject] │ │
│ └───┴───────────────────────┴───────────────────┘ │
│ │
│ [Approve All] [Re-run Preview] [Start Full Run] │
└─────────────────────────────────────────────────────┘
3. User reviews each translation, clicks [Approve] for good ones,
[Edit] to correct, or [Reject] to exclude.
4. Edited values are highlighted differently from LLM-generated values.
5. When satisfied, user clicks [Start Full Run].
6. System shows cost/duration estimate:
┌─────────────────────────────────────────────────────┐
│ ⚠ Confirm Full Translation Run │
│ │
│ Rows to translate: 1,243 │
│ Estimated tokens: ~45,000 │
│ Estimated cost: ~$0.09 (GPT-4o-mini) │
│ Batches: 25 × 50 rows │
│ │
│ [Cancel] [Confirm & Run] │
└─────────────────────────────────────────────────────┘
```
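The numbers in the confirmation dialog are simple arithmetic over row count and batch size. A sketch, with the tokens-per-row average and the per-token price as placeholder assumptions:

```python
# Sketch of the pre-run estimate; avg_tokens_per_row and pricing are placeholders.
import math

def estimate_run(total_rows: int, batch_size: int,
                 avg_tokens_per_row: float = 36.2,
                 usd_per_1k_tokens: float = 0.002) -> dict:
    tokens = int(total_rows * avg_tokens_per_row)
    return {
        "rows": total_rows,
        "batches": math.ceil(total_rows / batch_size),
        "estimated_tokens": tokens,
        "estimated_cost_usd": round(tokens / 1000 * usd_per_1k_tokens, 2),
    }

print(estimate_run(1243, 50))
# {'rows': 1243, 'batches': 25, 'estimated_tokens': 44996, 'estimated_cost_usd': 0.09}
```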
```
### Flow C: Execution & Superset API Insert
```
1. User confirms full run.
2. System shows live progress with two phases:
┌─────────────────────────────────────────────────────┐
│ 🔄 Translating... Batch 12/25 (600/1,243 rows) │
│ ████████████░░░░░░░░░░░░ 48% │
│ │
│ Successful: 598 Failed: 2 Remaining: 643 │
│ │
│ [Cancel Run] │
└─────────────────────────────────────────────────────┘
3. Translation phase completes, insert phase begins:
┌─────────────────────────────────────────────────────┐
│ 📤 Submitting to Superset... │
│ Status: executing · Query #a7f3b2c │
│ │
│ [View in Superset] │
└─────────────────────────────────────────────────────┘
4. On completion, system shows result:
┌─────────────────────────────────────────────────────┐
│ ✅ Run Complete │
│ │
│ Translation: 1,241 rows (2 failed, batch 14) │
│ Insert: ✅ Succeeded · 1,241 rows affected │
│ Superset: Query #a7f3b2c · 2.3s │
│ │
│ ── Generated SQL (audit) ── │
│ INSERT INTO products_i18n (product_id, │
│ translated_name) VALUES │
│ ('PRD-001', 'Беспроводная мышь'), ... │
│ │
│ [Retry Failed] [View SQL] [Retry Insert] │
└─────────────────────────────────────────────────────┘
5. User clicks [Retry Failed] → system re-processes only
the 2 failed rows and re-submits to Superset.
6. If insert fails: [Retry Insert] re-submits the same SQL
to Superset API without re-translating.
```
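The insert phase follows a submit-then-poll pattern against Superset. A sketch assuming an authenticated `httpx.Client`; the execute path comes from the plan, while the polling path and response field names are assumptions rather than the verified Superset API schema:

```python
# Sketch of submit-and-poll for the insert phase (assumed response fields).
import time

import httpx

def submit_and_poll(client: httpx.Client, sql: str, database_id: int,
                    timeout_s: float = 120.0) -> dict:
    resp = client.post("/api/v1/sqllab/execute/",
                       json={"database_id": database_id, "sql": sql})
    resp.raise_for_status()
    query_id = resp.json()["query_id"]              # assumed field name
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = client.get(f"/api/v1/query/{query_id}").json()  # assumed path
        if status.get("state") in ("success", "failed"):
            return status        # caller updates insert_status / rows_affected
        time.sleep(2.0)
    raise TimeoutError(f"Superset query {query_id} did not finish in {timeout_s}s")
```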
### Flow D: Translation History
```
1. User navigates to "Translation History".
2. System shows a filterable list:
┌──────────────────────────────────────────────────────────┐
│ Translation History │
│ │
│ Filter: [Datasource ▼] [Target Table ▼] [Date range] │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Date │ Datasource │ Target Table │ Rows │ │
│ ├────────────┼───────────────┼───────────────┼───────┤ │
│ │ 2026-05-08 │ products_main │ products_i18n │ 1,241 │ │
│ │ 14:32 │ │ │ ✅ │ │
│ ├────────────┼───────────────┼───────────────┼───────┤ │
│ │ 2026-05-07 │ categories │ cat_names_ru │ 89 │ │
│ │ 09:15 │ │ │ ⚠️ │ │
│ └────────────┴───────────────┴───────────────┴───────┘ │
│ │
│ [Duplicate selected job] │
└──────────────────────────────────────────────────────────┘
3. Clicking a run opens detail view with:
- Configuration snapshot at time of run
- Prompt template used
- Source rows → translations (with edit marks)
- Generated INSERT statements
- Execution outcome and SQL Lab session reference (if applicable)
```
### Flow E: Terminology Dictionary Management
```
1. User navigates to "Dictionaries" section.
2. System shows list of existing dictionaries:
┌─────────────────────────────────────────────────────┐
│ Terminology Dictionaries │
│ │
│ [+ New Dictionary] │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Name │ Language │ Terms │ Attached to │ │
│ ├───────────────┼──────────┼───────┼─────────────┤ │
│ │ Product Terms │ ru │ 142 │ 2 jobs │ │
│ │ Legal Glossary │ ru │ 38 │ 1 job │ │
│ │ Finance Dict │ ru │ 67 │ — │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
3. User clicks "Product Terms" → opens editor:
┌─────────────────────────────────────────────────────┐
│ Dictionary: Product Terms Language: Russian │
│ │
│ [+ Add Term] [Import CSV] [Export] │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ # │ Source Term │ Target Translation │ │
│ ├───┼─────────────────┼───────────────────────┤ │
│ │ 1 │ invoice │ накладная │ │
│ │ 2 │ widget │ виджет │ │
│ │ 3 │ backorder │ предзаказ │ │
│ │ 4 │ SKU │ артикул │ │
│ │ ... │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ Used by: «Products RU Translation», «Catalog RU» │
└─────────────────────────────────────────────────────┘
4. User clicks [+ Add Term] → new empty row appears inline.
Types "shipment" → "отгрузка", presses Enter → saved.
5. User clicks [Import CSV] → file picker opens.
System previews detected pairs, flags line 7 as duplicate:
"⚠️ 'invoice' already exists → 'накладная'. Import would
add 'счёт'. [Keep existing] [Overwrite] [Skip]"
6. User resolves conflicts, confirms import → 45 new terms added.
```
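Step 5's import preview reduces to classifying each CSV row as new, conflicting, or unparsable. A minimal sketch with illustrative names:

```python
# Sketch of the CSV import preview: new / conflict / unparsable per row.
import csv
import io

def preview_import(raw: str, existing: dict[str, str]) -> dict:
    new, conflicts, errors = [], [], []
    for lineno, row in enumerate(csv.reader(io.StringIO(raw)), start=1):
        if len(row) != 2:
            errors.append((lineno, f"{len(row)} column(s) detected (expected 2)"))
            continue
        source, target = (cell.strip() for cell in row)
        if source in existing:
            conflicts.append((lineno, source, existing[source], target))
        else:
            new.append((source, target))
    return {"new": new, "conflicts": conflicts, "errors": errors}

sample = "shipment,отгрузка\ninvoice,счёт\nbroken-row\n"
print(preview_import(sample, {"invoice": "накладная"}))
# invoice is flagged as a conflict; broken-row is listed with its line number.
```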
### Flow F: Feedback Loop — Correct → Dictionary
```
1. User is viewing completed run results (Flow D detail view).
2. In the translations table, user notices:
┌────────────────────────────────────────────────────┐
│ # │ Source │ Translation │
├───┼────────────────────┼───────────────────────────┤
│ 3 │ "Monitor Stand" │ "Мониторная стойка" │
│ │ │ [✎ Edit] [📖 Add to Dict] │
└───┴────────────────────┴───────────────────────────┘
3. User selects the source term "Monitor Stand" and the incorrect
translation "Мониторная" in the result → popup appears:
┌──────────────────────────────────────┐
│ Correct term │
│ │
│ Source term: [Monitor Stand______] │ ← from source column
│ Incorrect: [Мониторная_________] │ ← selected in translation
│ Correct: [Подставка для______] │ ← user types correction
│ │
│ Save to dictionary: [Product Terms ▼]│
│ │
│ [Cancel] [Submit to Dictionary] │
└──────────────────────────────────────┘
4. User fills the correction and selects the "Product Terms" dictionary.
5. If "Monitor Stand" already exists in that dictionary:
   ┌──────────────────────────────────────┐
   │ ⚠ Term already exists │
   │ │
   │ "Monitor Stand" already has │
   │ translation "Стойка для монитора" │
   │ in "Product Terms". │
   │ │
   │ [Overwrite] [Keep Existing] [Cancel] │
   └──────────────────────────────────────┘
6. User chooses [Overwrite] → term updated. A small badge appears:
"📖 Added to Product Terms" next to the corrected row.
7. For bulk corrections:
User Ctrl+clicks multiple problematic words in different rows
→ sidebar opens: "3 terms selected for correction"
→ each line: [source term] → [correction input]
→ [Submit all to: Product Terms ▼]
```
### Flow G: Schedule Configuration
```
1. User opens a saved translation job and clicks "Schedule" tab.
2. System shows schedule configuration:
┌─────────────────────────────────────────────────────┐
│ Schedule — Products RU Translation │
│ │
│ Schedule type: (●) Cron ( ) Interval ( ) Once │
│ │
│ Cron expression: [0 6 * * 1_______________] │
│ ↳ Every Monday at 06:00 │
│ │
│ ── Upcoming executions ── │
│ Mon, 11 May 2026 06:00 │
│ Mon, 18 May 2026 06:00 │
│ Mon, 25 May 2026 06:00 │
│ │
│ Insert submission: via Superset API (always) │
│ Concurrency: (●) Skip if previous running │
│ ( ) Queue after previous │
│ │
│ Status: ● Active [Disable] │
│ │
│ Last run: 2026-05-04 06:00 — ✅ 1,231 rows │
│ Next run: 2026-05-11 06:00 │
└─────────────────────────────────────────────────────┘
3. User toggles to "Interval" type:
┌─────────────────────────────────────────────────────┐
│ Schedule type: ( ) Cron (●) Interval ( ) Once │
│ │
│ Every: [24] [▼ hours ] │
│ Starting at: [2026-05-09 02:00_______________] │
│ │
│ ── Upcoming executions ── │
│ Sat, 09 May 2026 02:00 │
│ Sun, 10 May 2026 02:00 │
│ Mon, 11 May 2026 02:00 │
└─────────────────────────────────────────────────────┘
4. User clicks [Disable] → schedule pauses:
"Schedule paused. No new runs will be triggered.
[Re-enable] to resume."
Next planned execution line is grayed out.
5. When user edits job config with active schedule:
┌──────────────────────────────────────────────────────┐
│ ⚠ This job has an active schedule (Mon 06:00). │
│ Configuration changes will apply to the next │
│ scheduled run. Continue? │
│ │
│ [Cancel] [Save & Update Schedule] │
└──────────────────────────────────────────────────────┘
```
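The "Upcoming executions" preview can be derived from the cron expression with the `croniter` library; one plausible shape for `get_next_executions(schedule, n=3)`:

```python
# Sketch of the next-N-executions preview using croniter (one plausible approach).
from datetime import datetime
from zoneinfo import ZoneInfo

from croniter import croniter

def next_executions(expr: str, tz: str, n: int = 3) -> list[datetime]:
    base = datetime.now(ZoneInfo(tz))
    it = croniter(expr, base)
    return [it.get_next(datetime) for _ in range(n)]

for ts in next_executions("0 6 * * 1", "Europe/Moscow"):
    print(ts.strftime("%a, %d %b %Y %H:%M"))  # e.g. Mon, 11 May 2026 06:00
```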
## 4. The "Error" Experience
**Philosophy**: Don't just report the error; preserve progress and guide the user to targeted recovery.
### Scenario A: Datasource Unavailable
* **User Action**: Opens a saved translation job whose datasource was deleted or renamed.
* **System Response**:
* Banner at top: "⚠️ Datasource 'products_old' is no longer available. Select a replacement datasource to continue."
* Column mappings are cleared; key and target settings are preserved.
* [Select Replacement Datasource] button is prominent.
* **Recovery**: User selects a new datasource, remaps columns, and the job is usable again.
### Scenario B: LLM Batch Failure
* **System Response**:
* Progress bar pauses at the failed batch.
* Warning: "⚠️ Batch 14 failed: rate limit exceeded. 2 rows affected."
* Remaining batches continue processing.
* At completion: "1,241 rows translated, 2 failed. [Retry Failed Rows]"
* **Recovery**: User clicks [Retry Failed Rows]. System re-processes only the 2 affected rows. If rate limits persist, system suggests increasing delay between batches or reducing batch size.
### Scenario C: Key Column NULL in Source Data
* **System Response**:
* During run, rows with NULL key values are collected.
* At completion: "⚠️ 15 rows were skipped because key column 'region_code' contained NULL values."
* [Download skipped rows] link to download a CSV of unprocessable rows.
* **Recovery**: User fixes NULL key values in the source and re-runs the job (existing successful rows are skipped by default).
### Scenario D: Target Table Not Found
* **User Action**: Specifies a target table that does not exist in Superset.
* **System Response**:
* At save time: "❌ Target table 'products_i18n' was not found in Superset. Please verify the table name or create it first."
* Save is blocked.
* **Recovery**: User either corrects the table name or creates the table in Superset, then saves.
### Scenario E: Configuration Modified During Run
* **User Action**: Edits job configuration while a run is in progress.
* **System Response**:
* Banner: "ℹ️ A translation run is currently in progress. It will continue using its existing configuration snapshot. Your changes will apply to future runs only."
* Save is allowed; no run cancellation.
* **Recovery**: User saves changes. In-progress run completes unaffected. Next manual or scheduled run uses updated configuration.
### Scenario F: Dictionary Deletion Blocked by Active Job
* **User Action**: Attempts to delete a dictionary that is attached to an active or scheduled translation job.
* **System Response**:
* Dialog: "❌ Cannot delete 'Product Terms'. It is attached to 2 active jobs: 'Products RU Translation' (scheduled), 'Catalog RU'. Detach the dictionary from these jobs first."
* [Show attached jobs] link opens a filtered view.
* **Recovery**: User opens each job, removes the dictionary attachment, then returns to delete. Or detaches via a bulk action from the dialog.
### Scenario G: Dictionary Import with Malformed File
* **User Action**: Imports a CSV that has wrong column count, encoding issues, or mixed delimiters.
* **System Response**:
* Preview shows: "⚠️ 12 rows could not be parsed. Row 5: only 1 column detected (expected 2). Row 8: encoding error."
* Parsable rows are shown; unparsable rows are listed with line numbers and reasons.
* **Recovery**: User can download the unparsable rows as a separate file for manual correction, or adjust the import settings (delimiter, encoding) and retry.
### Scenario H: Scheduled Run — No New Source Rows
* **System Response**:
* Before processing, system detects that all key-column values in the source already exist in the previous successful run (new-key-only strategy).
* Banner: "ℹ️ No new rows detected since last run (2026-05-04). All 1,231 keys already translated. Run skipped."
* A "no-new-rows" run record is created for audit, but no INSERT statements are generated.
* **Recovery**: User can trigger a manual run from the job page if full retranslation is needed (e.g., prompt or dictionary changed).
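A sketch of the pre-run check behind this scenario: diff the current key hashes against the last successful run and record a skipped run when nothing is new (illustrative names):

```python
# Sketch of Scenario H's new-key detection before a scheduled run starts.
def scheduled_precheck(current_hashes: set[str], last_success: set[str]) -> dict:
    new_keys = current_hashes - last_success
    if not new_keys:
        # An audit run record is still created; no LLM calls, no INSERT statements.
        return {"action": "skip", "reason": "no_new_rows", "new_keys": 0}
    return {"action": "run", "new_keys": len(new_keys)}
```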
### Scenario I: Scheduled Run Failure Notification
* **User Action**: Scheduled run fails (LLM quota exhausted at 06:00 on Monday).
* **System Response**:
* Run record marked as ❌ failed with error details.
* Notification sent (if configured): "Scheduled translation 'Products RU' failed: LLM quota exceeded. Next attempt: Mon, 18 May 06:00. [View Details]"
* Schedule remains active for the next trigger.
* **Recovery**: User recharges LLM quota, opens the failed run, and clicks [Retry Failed Run] to reprocess immediately without waiting for the next schedule.
## 5. Tone & Voice
* **Style**: Concise, technical but approachable. Use short sentences with clear action verbs.
* **Terminology**:
* "Datasource" (not "data source" or "dataset") — aligned with Superset terminology.
* "Translation column" — the column whose values are translated.
* "Context columns" — columns that provide semantic context to the LLM.
* "Key columns" — columns that uniquely identify rows for target-table INSERT matching.
* "Materialized table" — the target table that physically stores translated data.
* "INSERT statements" — the generated SQL output, not "queries" or "scripts".
* "Batch" — a group of rows processed in one LLM API call.
* "Run" — a single execution of a translation job.
* "Terminology dictionary" / "Dictionary" — a named, language-specific collection of source_term → target_translation pairs.
* "Dictionary entry" — a single term pair inside a dictionary.
* "Feedback loop" — the workflow of correcting a translation in run results and submitting the correction to a dictionary.
* "Schedule" — a recurring trigger configuration (cron, interval, or one-time) attached to a translation job.
## 6. Frontend Integration Notes
* **Routes**:
* `/translate` — Translation Jobs list and management.
* `/translate/dictionaries` — Terminology dictionary management.
* `/translate/history` — Translation run history (may be a sub-view of jobs).
* **Components**:
* `TranslationJobList` — list of saved translation jobs with status and schedule indicators.
* `TranslationJobConfig` — the configuration form (Flow A), with dictionary attachment and schedule tab.
* `TranslationPreview` — the side-by-side preview view (Flow B).
* `TranslationRunProgress` — live progress during execution (Flow C).
* `TranslationRunResult` — completion summary with generated SQL and inline feedback-loop controls (Flow C + Flow F).
* `TranslationHistory` — filterable history of past runs (Flow D).
* `DictionaryList` — list of dictionaries with term counts and attachment info (Flow E).
* `DictionaryEditor` — inline term editor with import/export (Flow E).
* `TermCorrectionPopup` — the feedback-loop popup for submitting corrected terms (Flow F).
* `ScheduleConfig` — schedule type selector, cron/interval inputs, upcoming preview, auto-INSERT and concurrency settings (Flow G).
* `BulkCorrectionSidebar` — sidebar for selecting and correcting multiple terms at once (Flow F).
* **State Management**: Job configuration, preview state, run progress, dictionary data, and schedule state are managed through Svelte stores with API-backed persistence.
* **API Surface**: Standard REST endpoints under `/api/translate/` for CRUD on jobs, dictionaries, preview triggering, run execution, schedule management, feedback-loop submission, and history retrieval.
* **WebSocket**: Real-time progress updates during batch translation via existing WebSocket infrastructure (consistent with Task Drawer patterns). Schedule trigger events are logged server-side without WebSocket push (user sees results on next page load or via notification).
* **Contract Mapping**:
* `@UX_STATE`: Enumerate states per component — idle, loading, configured, previewing, running, completed, failed, retrying, scheduled, schedule_paused, dictionary_attached, dictionary_empty, import_preview, import_conflict, correction_selected, correction_submitted.
* `@UX_FEEDBACK`: Loading skeletons during datasource fetch; spinners during LLM calls; toast notifications for save/delete; progress bar during run; syntax-highlighted SQL output; badge for dictionary-attached count; "Added to Dictionary" confirmation badge on corrected rows; schedule status indicator (active/paused); upcoming execution timeline.
* `@UX_RECOVERY`: Retry button for failed batches; datasource replacement flow; NULL-key download; configuration duplication from history; dictionary conflict resolution dialog; import error row download; schedule force-run override; failed-schedule-run retry.
* `@UX_REACTIVITY`: Column list derived from selected datasource (`$derived`); batch progress computed from completed/total counts; cost estimate reactivity on batch size change; next N execution times derived from schedule config; term count reactively updated on add/delete; dictionary attachment list reactivity.