Implementation Plan: LLM Dataset Orchestration
Branch: 027-dataset-llm-orchestration | Date: 2026-03-16 | Spec: /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md
Input: Feature specification from /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md
Note: This template is filled in by the /speckit.plan command. See /home/busya/dev/ss-tools/.specify/templates/plan-template.md for the execution workflow.
Summary
Deliver a dataset-centered orchestration flow that lets users start from a Superset link or dataset selection, recover analytical context, enrich semantics from trusted sources before AI generation, resolve ambiguity through guided clarification, generate a Superset-side compiled SQL preview, and launch an audited SQL Lab execution only when readiness gates pass.
The implementation will extend the existing FastAPI + SvelteKit architecture rather than creating a parallel subsystem. Backend work will add a persisted review-session domain, orchestration services for semantic recovery and clarification, Superset adapters for context extraction and SQL Lab execution, and explicit APIs for mapping approvals and field-level semantic overrides. Frontend work will add a dedicated dataset review workspace with progressive recovery, semantic-source review, one-question-at-a-time clarification, mapping approval controls, compiled SQL preview, and resumable session state.
Implementation Status
Accepted delivery to date covers the US1 automatic review slice introduced in commit feat(us1): add dataset review orchestration automatic review slice. The implemented scope includes the review-session startup flow, Superset link/context intake, trusted-source semantic enrichment, export endpoints, and the initial dataset review workspace/panels needed to render findings and readable review output.
Feature delivery also required repository-wide stabilization and compatibility collateral outside the dedicated dataset-review modules. Those follow-up fixes keep the accepted US1 slice working against the current repository baseline, including task/log API compatibility, dashboard/profile filtering behavior, Git route/repository-path hardening, report-list event handling, LLM provider encryption-key validation, and clean-release compatibility repairs exercised by shared acceptance gates. US2 guided clarification and US3 controlled execution remain planned work and are not accepted by this status note.
Technical Context
Language/Version: Python 3.9+ backend, Node.js 18+ frontend, Svelte 5 / SvelteKit frontend runtime
Primary Dependencies: FastAPI, SQLAlchemy, Pydantic, existing TaskManager, existing SupersetClient, existing LLM provider stack, SvelteKit, Tailwind CSS, frontend requestApi/fetchApi wrappers
Storage: Existing application databases for persistent session/domain entities; existing tasks database for async execution metadata; filesystem for optional uploaded semantic sources/artifacts
Testing: pytest for backend unit/integration/API tests; Vitest for frontend component/store/API-wrapper tests
Target Platform: Linux-hosted FastAPI + Svelte web application integrated with Superset
Project Type: Web application with backend API and frontend SPA
Performance Goals:
- Initial summary generation: < 30s (progressive recovery visible within 5s)
- Preview compilation: < 10s
- Session load / Resume: < 2s
- SC-002 target: first readable summary under 5 minutes for complex datasets.
Constraints: Launch must remain blocked without successful Superset-side compiled preview; long-running recovery/enrichment/preview work must be asynchronous and observable; frontend must use existing API wrappers instead of native fetch; manual semantic overrides must never be silently overwritten; auditability and provenance are prioritized over raw throughput
Scale/Scope: One end-to-end feature spanning dataset intake, session persistence, semantic enrichment, clarification, mapping approval, preview, and launch; multiple new backend services/APIs plus a new multi-state frontend workspace
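The constraint that manual semantic overrides must never be silently overwritten can be read as a merge rule: enrichment proposals are applied field by field, and any field carrying a manual value is skipped and reported instead of replaced. The sketch below is illustrative only; `FieldSemantics`, `apply_enrichment`, and the `source` labels are hypothetical names, not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldSemantics:
    """Hypothetical per-field semantic record; not an existing model."""
    description: str
    source: str  # assumed labels: "manual", "trusted_source", "llm"


def apply_enrichment(
    current: Dict[str, FieldSemantics],
    proposed: Dict[str, FieldSemantics],
) -> Tuple[Dict[str, FieldSemantics], List[str]]:
    """Merge proposed semantics without overwriting manual overrides.

    Returns the merged mapping plus the names of fields where a proposal
    was skipped, so the conflict can be surfaced for review.
    """
    merged = dict(current)
    skipped: List[str] = []
    for name, candidate in proposed.items():
        existing = merged.get(name)
        if existing is not None and existing.source == "manual":
            # Manual intent wins; record the skip instead of replacing it.
            skipped.append(name)
            continue
        merged[name] = candidate
    return merged, skipped
```

Returning the skipped field names keeps the rule observable: the caller can turn them into review findings rather than dropping the proposal without a trace.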
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Pre-Research Gate Assessment
- Semantic protocol compliance: PASS WITH REQUIRED PHASE 1 EXPANSION
  - New backend orchestration and persistence modules must follow `/home/busya/dev/ss-tools/.ai/standards/semantics.md`.
  - Existing draft contracts are incomplete for the feature scope; Phase 1 must add explicit contracts for semantic-source resolution, clarification lifecycle, Superset context extraction, session persistence, and missing UI states.
  - Complexity 4/5 Python modules must explicitly define `logger.reason()`/`logger.reflect()` paths; Complexity 5 boundaries must use `belief_scope`.
- Complexity-driven contract coverage: PASS WITH GAPS TO CLOSE
  - The core orchestration boundary is Complexity 5 because it gates launch, audit, state transitions, and cross-service consistency.
  - Semantic source resolution, clarification workflow, mapping approval state, and session persistence each require explicit contracts instead of being hidden inside one orchestrator.
  - UI contracts must map to the UX state machine, especially `Empty`, `Importing`, `Review Ready`, `Semantic Source Review Needed`, `Clarification Active`, `Mapping Review Needed`, `Compiled Preview Ready`, `Run Ready`, `Run In Progress`, `Completed`, and `Recovery Required`.
- UX-state compatibility: PASS
  - The architecture can support the UX reference because:
    - recovery can be progressive and asynchronous,
    - clarification can be session-backed and resumed,
    - preview generation can be represented as a stateful asynchronous action,
    - launch remains a gated terminal action.
  - If Phase 0 research later shows Superset cannot provide reliable compilation preview or SQL Lab execution hooks compatible with the required interaction model, planning must stop and the UX contract must be renegotiated.
- Async boundaries: PASS
  - Long-running work already fits the repository constitution through `TaskManager`.
  - Session start, deep context recovery, semantic enrichment from external sources, preview generation, and launch-hand-off side effects should be dispatched as tasks or internally asynchronous service steps with observable state changes.
- Frontend API-wrapper rules: PASS
  - Existing frontend uses `/home/busya/dev/ss-tools/frontend/src/lib/api.js` wrappers.
  - New frontend work must use `requestApi`, `fetchApi`, `postApi`, or wrapper modules only; native `fetch` remains forbidden.
- RBAC/security constraints: PASS WITH DESIGN REQUIREMENT
  - New endpoints must use existing auth and permission dependencies.
  - New orchestration actions need explicit permission modeling for reading sessions, editing semantic mappings, answering clarification prompts, generating previews, and launching runs.
  - Session data must remain self-scoped/auditable and must not permit cross-user mutation without explicit policy.
  - Action: Add `DATASET_REVIEW_*` permissions to `backend/src/scripts/seed_permissions.py`.
- Security & Threat Model: PASS
  - Session isolation: Every session record is strictly bound to `user_id`. Query filters must include an owner check.
  - Audit trail: `DatasetRunContext` is immutable after launch.
  - Credential handling: Reuse the existing `SupersetClient` encrypted configuration.
  - Action: API endpoints must use `Depends(get_current_user)` and explicit permission checks.
- Belief-state/logging constraints: PASS WITH REQUIRED APPLICATION
  - Complexity 4/5 Python orchestration modules will require `belief_scope` plus meaningful `logger.reason()` and `logger.reflect()` traces around state transitions, preview validation, warning approvals, and launch gating.
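The session-isolation rule in the Security & Threat Model item (every record bound to `user_id`, every query filtered by owner) could be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard-library `sqlite3` module in place of the project's SQLAlchemy stack so that it stays self-contained, and the table name, column names, and `get_owned_session` helper are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical table and column names, used only for illustration.
SCHEMA = """
CREATE TABLE dataset_review_session (
    session_id TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    status     TEXT NOT NULL
)
"""


def get_owned_session(
    conn: sqlite3.Connection, session_id: str, user_id: str
) -> Optional[Tuple[str, str, str]]:
    """Return the session row only if it belongs to user_id.

    The owner check is part of the WHERE clause, so a session owned by
    another user is indistinguishable from a missing one.
    """
    cursor = conn.execute(
        "SELECT session_id, user_id, status FROM dataset_review_session "
        "WHERE session_id = ? AND user_id = ?",
        (session_id, user_id),
    )
    return cursor.fetchone()


# In-memory database with one session owned by "alice".
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO dataset_review_session VALUES (?, ?, ?)",
    ("s-1", "alice", "review_ready"),
)
```

Putting the owner filter inside the query, rather than checking ownership after loading the row, means a forgotten check fails closed: the row is simply never returned.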
Post-Design Gate Assessment
- Semantic protocol compliance: PASS
  - All modules in `contracts/modules.md` follow the complexity-driven metadata requirements.
  - Relation syntax matches the canonical `@RELATION: [PREDICATE] ->[TARGET_ID]` format.
  - Python modules (Complexity 4/5) explicitly specify `logger.reason()` and `belief_scope` requirements in their contracts.
- API Schema Completeness: PASS
  - `contracts/api.yaml` provides a fully typed OpenAPI 3.0.3 specification.
  - Every session lifecycle, semantic review, and execution gate is covered by a typed endpoint.
- UX-Technical Alignment: PASS
  - Design supports the WYSIWWR principle via `SupersetCompilationAdapter`.
  - Fallback strategies for missing preview or SQL Lab hooks are defined in `research.md`.
Final Gate Result
PASS - The implementation plan and design artifacts are constitution-compliant and ready for task breakdown.
Project Structure
Documentation (this feature)
/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│ ├── api.yaml
│ └── modules.md
└── tasks.md
Source Code (repository root)
/home/busya/dev/ss-tools/backend/
├── src/
│ ├── api/
│ │ └── routes/
│ ├── core/
│ ├── models/
│ ├── schemas/
│ └── services/
/home/busya/dev/ss-tools/frontend/
├── src/
│ ├── lib/
│ │ ├── api/
│ │ ├── components/
│ │ ├── i18n/
│ │ └── stores/
│ └── routes/
/home/busya/dev/ss-tools/backend/src/api/routes/__tests__/
/home/busya/dev/ss-tools/backend/src/services/__tests__/
/home/busya/dev/ss-tools/frontend/src/lib/**/__tests__/
/home/busya/dev/ss-tools/frontend/src/routes/**/__tests__/
Structure Decision: Use the repository’s existing web-application split. Backend implementation belongs under /home/busya/dev/ss-tools/backend/src/{models,schemas,services,api/routes}. Frontend implementation belongs under /home/busya/dev/ss-tools/frontend/src/{routes,lib/components,lib/api,lib/stores}. Tests will stay adjacent to their current backend/frontend conventions.
Semantic Contract Guidance
Use this section to drive Phase 1 artifacts, especially `contracts/modules.md`.
Planned Critical/High-Value Modules
- `DatasetReviewOrchestrator` (`@COMPLEXITY: 5`)
- `SemanticSourceResolver` (`@COMPLEXITY: 4`)
- `ClarificationEngine` (`@COMPLEXITY: 4`)
- `SupersetContextExtractor` (`@COMPLEXITY: 4`)
- `SupersetCompilationAdapter` (`@COMPLEXITY: 4`)
- `DatasetReviewSessionRepository` or equivalent persistence boundary (`@COMPLEXITY: 5`)
- `DatasetReviewWorkspace` (`@COMPLEXITY: 5`)
- `SourceIntakePanel` (`@COMPLEXITY: 3`)
- `ValidationFindingsPanel` (`@COMPLEXITY: 3`)
- `SemanticLayerReview` (`@COMPLEXITY: 3`)
- `ClarificationDialog` (`@COMPLEXITY: 3`)
- `ExecutionMappingReview` (`@COMPLEXITY: 3`)
- `CompiledSQLPreview` (`@COMPLEXITY: 3`)
- `LaunchConfirmationPanel` (`@COMPLEXITY: 3`)
Required Semantic Rules
- Use `@COMPLEXITY` or `@C:` as the primary rule source.
- Match contract density to complexity:
  - Complexity 1: anchors only, `@PURPOSE` optional
  - Complexity 2: `@PURPOSE`
  - Complexity 3: `@PURPOSE`, `@RELATION`; UI also `@UX_STATE`
  - Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python also meaningful `logger.reason()`/`logger.reflect()` path
  - Complexity 5: level 4 + `@DATA_CONTRACT`, `@INVARIANT`; Python also `belief_scope`; UI also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- Write relations only in canonical form: `@RELATION: [PREDICATE] ->[TARGET_ID]`
- If any relation target, DTO, or contract dependency is unknown, emit `[NEED_CONTEXT: target]` instead of inventing placeholders.
- Preserve medium-appropriate anchor/comment syntax for Python, Svelte markup, and Svelte script contexts.
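The density rules above lend themselves to a mechanical check. The sketch below is one possible reading, not an existing tool: it assumes contract headers are available as plain text with comment markers already stripped, that the square brackets in the canonical relation form are literal, and that only the language-independent tags are checked (the Python-only and UI-only extras are left out). `missing_tags`, `malformed_relations`, and the `DEPENDS_ON` predicate in the usage note are hypothetical names.

```python
import re
from typing import Dict, List, Set

# Required tags per complexity level, transcribed from the rules above
# (Python-only and UI-only extras are intentionally omitted).
REQUIRED_TAGS: Dict[int, Set[str]] = {
    1: set(),
    2: {"@PURPOSE"},
    3: {"@PURPOSE", "@RELATION"},
    4: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT"},
    5: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT",
        "@DATA_CONTRACT", "@INVARIANT"},
}

COMPLEXITY_RE = re.compile(r"@(?:COMPLEXITY|C):\s*([1-5])")
TAG_RE = re.compile(r"@[A-Z_]+")
# Assumption: brackets are literal in "@RELATION: [PREDICATE] ->[TARGET_ID]".
RELATION_RE = re.compile(r"^@RELATION: \[[A-Z_]+\] ->\[[A-Za-z0-9_.:]+\]$")


def missing_tags(contract_text: str) -> List[str]:
    """Return required tags absent from a contract header, sorted by name."""
    match = COMPLEXITY_RE.search(contract_text)
    if match is None:
        raise ValueError("contract declares no @COMPLEXITY or @C: level")
    level = int(match.group(1))
    present = set(TAG_RE.findall(contract_text))
    return sorted(REQUIRED_TAGS[level] - present)


def malformed_relations(contract_text: str) -> List[str]:
    """Return @RELATION lines that do not match the assumed canonical form."""
    lines = [line.strip() for line in contract_text.splitlines()]
    return [
        line for line in lines
        if line.startswith("@RELATION") and not RELATION_RE.match(line)
    ]
```

For a Complexity 4 header that declares only `@PURPOSE`, `@RELATION: [DEPENDS_ON] ->[SupersetClient]`, and `@PRE`, `missing_tags` would report `@POST` and `@SIDE_EFFECT`.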
Phase 0: Research Agenda
Open Questions Requiring Resolution
- How to reliably extract saved native filters from supported Superset links and versions.
- How to discover dataset runtime template variables and Jinja placeholders using available Superset APIs and dataset payloads.
- How to perform a safe Superset-side compiled SQL preview compatible with the current deployment/version.
- How to create or bind a SQL Lab execution session as the canonical audited launch target.
- How to model semantic source ranking, fuzzy match review, conflict detection, and provenance without collapsing into an orchestration god-object.
- How to persist resumable clarification and review sessions using the current database stack.
- How to design typed API contracts that support field-level semantic operations, mapping approval flow, and session lifecycle operations.
- How to degrade gracefully when Superset import/preview or LLM enrichment only partially succeeds.
Required Research Outputs
Research must produce explicit decisions for:
- Superset link parsing and recovery strategy
- Superset compilation/SQL Lab integration approach
- Semantic source resolution architecture
- Clarification session persistence model
- Session persistence/audit model
- API schema granularity and endpoint set
- Test strategy for Superset-dependent and LLM-dependent flows
- Delivery milestones for incremental rollout
Phase 1: Design Focus
Phase 1 must generate:
- typed domain entities and DTOs in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/data-model.md`
- expanded semantic contracts in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/modules.md`
- typed OpenAPI schemas and missing endpoints in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/api.yaml`
- execution and validation guide in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/quickstart.md`
Phase 1 must specifically close the current gaps around:
- field-level semantic operations,
- clarification engine responsibilities,
- mapping approval endpoints,
- session lifecycle APIs,
- exportable outputs,
- error-path validation scenarios,
- alignment between UX states and UI contracts.
Delivery Milestones
| Milestone | FR Coverage | Scope | User Value |
|---|---|---|---|
| M1: Sessioned Auto Review | FR-001 to FR-011, FR-035, FR-037 | Source intake, dataset review session, initial profile, findings, provenance, semantic-source application, export of review outputs | Users get immediate documentation, validation, and trusted-source enrichment without manual reconstruction |
| M2: Guided Clarification | FR-012 to FR-020, FR-036, FR-038, FR-039, FR-040 | Clarification engine, resumable questions, question templates/eval, field-level semantic overrides, conflict review, progress persistence | Users can resolve ambiguity safely and preserve manual intent |
| M3: Controlled Execution | FR-021 to FR-034 | Filter extraction, template-variable mapping, warning approvals, compiled preview, SQL Lab launch, manual export path, audited run context | Users can move from recovered context to reproducible execution with clear readiness gates |
RBAC Model
| Permission | Description | Target Role(s) |
|---|---|---|
| `dataset:session:read` | View own review sessions | Analytics Engineer, BI Engineer, Data Steward |
| `dataset:session:manage` | Edit mappings, answer questions, override semantics | Analytics Engineer, BI Engineer |
| `dataset:session:approve` | Approve warning-level mappings | Senior Analytics Engineer, Data Steward |
| `dataset:execution:preview` | Trigger Superset SQL compilation preview | Analytics Engineer, BI Engineer |
| `dataset:execution:launch` | Create SQL Lab session in target environment | Analytics Engineer, BI Engineer |
| `dataset:execution:launch_prod` | Launch in Production-staged environment | Senior Analytics Engineer |
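
The RBAC table can be transcribed directly into a lookup for tests or seed data. This is a minimal sketch, not the project's permission system: `PERMISSION_ROLES` and `is_allowed` are hypothetical names, and the mapping is a literal copy of the table, so it does not assume that a senior role inherits the permissions of a junior one.

```python
from typing import Dict, FrozenSet

# Literal transcription of the RBAC table; no role inheritance is assumed.
PERMISSION_ROLES: Dict[str, FrozenSet[str]] = {
    "dataset:session:read": frozenset(
        {"Analytics Engineer", "BI Engineer", "Data Steward"}
    ),
    "dataset:session:manage": frozenset({"Analytics Engineer", "BI Engineer"}),
    "dataset:session:approve": frozenset(
        {"Senior Analytics Engineer", "Data Steward"}
    ),
    "dataset:execution:preview": frozenset({"Analytics Engineer", "BI Engineer"}),
    "dataset:execution:launch": frozenset({"Analytics Engineer", "BI Engineer"}),
    "dataset:execution:launch_prod": frozenset({"Senior Analytics Engineer"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Return True if the role is listed for the permission.

    Unknown permissions are denied rather than raising, so a typo in a
    permission string fails closed.
    """
    return role in PERMISSION_ROLES.get(permission, frozenset())
```

Read literally, the table does not grant Senior Analytics Engineer `dataset:session:read`; whether that is intended or an inheritance rule is implied is a question for the design phase.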
Integration Points
Service Reuse (Critical)
- Superset Interaction: Use existing `backend/src/core/superset_client.py` (do not duplicate HTTP clients).
- LLM Interaction: Use existing `backend/src/services/llm_provider.py` via `LLMProviderService`.
- Notifications: Integrate with `NotificationService` for launch outcomes and preview readiness.
- i18n: Use existing `frontend/src/lib/i18n/` for all user-facing strings in the review workspace.
Rollout & Monitoring
Feature Flags
- `ff_dataset_auto_review`: Enables basic documentation and intake.
- `ff_dataset_clarification`: Enables guided dialogue mode.
- `ff_dataset_execution`: Enables preview and launch capabilities.
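As a rough illustration of how the three flags could gate the workspace, the sketch below maps each flag to the capabilities its description names. The capability labels and the `enabled_capabilities` helper are hypothetical, and how flags are actually stored and read is not specified in this plan.

```python
from typing import Dict, Iterable, List

# Capability labels are illustrative, derived from the flag descriptions.
FLAG_CAPABILITIES: Dict[str, List[str]] = {
    "ff_dataset_auto_review": ["documentation", "intake"],
    "ff_dataset_clarification": ["guided_clarification"],
    "ff_dataset_execution": ["preview", "launch"],
}


def enabled_capabilities(enabled_flags: Iterable[str]) -> List[str]:
    """Return capabilities unlocked by the enabled flags, in rollout order.

    Unknown flag names are ignored.
    """
    enabled = set(enabled_flags)
    capabilities: List[str] = []
    for flag, names in FLAG_CAPABILITIES.items():
        if flag in enabled:
            capabilities.extend(names)
    return capabilities
```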
Metrics & Alerting
- Metrics: Session completion rate, time-to-first-summary, preview failure rate (Superset compilation errors vs connection errors), clarification engagement.
- Alerting: High rate of `503` Superset API failures; persistent LLM provider timeouts (> 30s); unauthorized cross-session access attempts.
Implementation Sequencing
Backend First
- Add persistent review-session domain model and schemas.
- Add orchestration services and Superset adapters.
- Add typed API endpoints and explicit RBAC.
- Add task/event integration and audit persistence.
- Add backend tests for session lifecycle, preview gating, launch gating, and degradation paths.
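The preview-gating and launch-gating tests in the last step need a guard to exercise. A minimal sketch of such a guard follows, under the plan's stated rule that launch stays blocked without a successful Superset-side compiled preview and that warning-level mappings need approval. `ReviewSessionState`, `launch_blockers`, and the status labels are hypothetical, not existing types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewSessionState:
    """Hypothetical snapshot of the fields a launch guard would inspect."""
    # Assumed labels: "missing", "ok", "failed", "stale".
    preview_status: str = "missing"
    unapproved_warning_mappings: List[str] = field(default_factory=list)


def launch_blockers(state: ReviewSessionState) -> List[str]:
    """Return human-readable reasons why launch must stay blocked.

    An empty list means every readiness gate in this sketch has passed.
    """
    blockers: List[str] = []
    if state.preview_status != "ok":
        blockers.append(f"compiled preview is {state.preview_status}")
    for name in state.unapproved_warning_mappings:
        blockers.append(f"mapping '{name}' needs warning approval")
    return blockers
```

Returning the full list of reasons, instead of a single boolean, lets the API and the launch confirmation UI show the user exactly which gates remain open.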
Frontend Next
- Add dataset review route/workspace shell and session loading.
- Add source-intake, summary, findings, and semantic review panels.
- Add clarification dialog and mapping approval UI.
- Add compiled preview and launch confirmation UI.
- Add frontend tests for state transitions, wrappers, and critical UX invariants.
Integration/Hardening
- Validate Superset version compatibility against real/staged environment.
- Verify progressive session recovery and resume flows.
- Verify audit replay/run-context capture.
- Measure success-criteria instrumentation feasibility.
Testing Strategy
Backend
- Unit tests for semantic ranking, provenance/conflict rules, clarification prioritization, preview gating, and launch guards.
- Integration tests for session persistence, Superset adapter behavior, SQL preview orchestration, and SQL Lab launch orchestration with mocked upstream responses.
- API contract tests for typed response schemas, RBAC enforcement, mapping approval operations, field-level semantic edits, export operations, and session lifecycle.
Frontend
- Unit/component tests for state-driven UI contracts, provenance rendering, one-question clarification, mapping approval flow, stale preview handling, and launch gating visuals.
- Integration-style route tests for resume flows, progressive loading, and error recovery states.
External Dependency Strategy
- Mock Superset APIs for CI determinism.
- Use stable fixtures/snapshots for LLM-produced structured outputs.
- Treat provider/transport failure as explicit degraded states rather than semantic failure.
- Include replayable fixtures for imported filters, template variables, conflict cases, and compilation errors.
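Treating provider or transport failure as a degraded state, rather than as a semantic failure, might look like the sketch below. It is an illustration under assumptions: `EnrichmentResult`, `enrich_with_llm`, and the status labels are hypothetical, the LLM call is passed in as a plain callable so that tests can substitute a stub, and trusted-source values are assumed to take precedence over LLM suggestions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EnrichmentResult:
    """Hypothetical result envelope for one enrichment attempt."""
    status: str  # assumed labels: "complete", "degraded"
    semantics: Dict[str, str]
    next_action: Optional[str] = None


def enrich_with_llm(
    trusted: Dict[str, str],
    llm_call: Callable[[Dict[str, str]], Dict[str, str]],
) -> EnrichmentResult:
    """Add LLM suggestions on top of trusted-source semantics.

    Transport-level failures keep the trusted results and mark the
    outcome as degraded instead of failing the whole review.
    """
    try:
        suggestions = llm_call(trusted)
    except (TimeoutError, ConnectionError) as exc:
        return EnrichmentResult(
            status="degraded",
            semantics=dict(trusted),
            next_action=f"retry LLM enrichment ({type(exc).__name__})",
        )
    # Assumption: trusted-source values win over LLM suggestions.
    merged = {**suggestions, **trusted}
    return EnrichmentResult(status="complete", semantics=merged)
```

In a test, the stub passed as `llm_call` can raise `TimeoutError` to replay an outage deterministically, which fits the fixture-based approach listed above.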
Risks & Mitigations
| Risk | Why It Matters | Mitigation |
|---|---|---|
| Superset version lacks a stable compiled-preview endpoint | FR-029 and WYSIWWR depend on native Superset-side compilation | Resolve in Phase 0; if unsupported, stop and renegotiate UX/feature scope before implementation |
| Superset link/native filter formats differ across installations | Could make import brittle or partial | Design recovery as best-effort with explicit provenance and recovery-required state |
| SQL Lab launch handoff is inconsistent across environments | FR-032 requires canonical audited launch target | Research version-compatible creation strategy and define fallback as blocked, not silent substitution |
| Semantic resolution logic becomes an orchestration god-object | Hurts maintainability and contract traceability | Separate SemanticSourceResolver, ClarificationEngine, and Superset extraction responsibilities |
| Fuzzy matching creates too many false positives | Undermines trust and increases approval burden | Keep explicit confidence hierarchy, review-required fuzzy matches, and field-level selective application |
| LLM/provider outages interrupt review quality | Could block non-critical enrichment | Degrade to partial review state with preserved trusted-source results and explicit next action |
| Session lifecycle becomes hard to resume safely | FR-019 and FR-036 require resumability | Persist answers, approvals, and current recommended action as first-class session state |
Post-Design Re-Check Criteria
After Phase 1 artifacts are produced, re-check:
- semantic protocol coverage against all planned modules/components,
- UX-state coverage against `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/ux_reference.md`,
- explicit API support for field-level semantic actions, mapping approval, exports, and session lifecycle,
- belief-state/logging expectations for Complexity 4/5 Python modules,
- typed schemas sufficient for backend/frontend parallel implementation,
- quickstart coverage of happy path plus critical negative/recovery paths.
Complexity Tracking
Fill ONLY if Constitution Check has violations that must be justified
No justified constitution violations at planning time.