# Implementation Plan: LLM Dataset Orchestration
**Branch**: `027-dataset-llm-orchestration` | **Date**: 2026-03-16 | **Spec**: `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md`
**Input**: Feature specification from `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `/home/busya/dev/ss-tools/.specify/templates/plan-template.md` for the execution workflow.
## Summary
Deliver a dataset-centered orchestration flow that lets users start from a Superset link or dataset selection, recover analytical context, enrich semantics from trusted sources before AI generation, resolve ambiguity through guided clarification, generate a Superset-side compiled SQL preview, and launch an audited SQL Lab execution only when readiness gates pass.
The implementation will extend the existing FastAPI + SvelteKit architecture rather than creating a parallel subsystem. Backend work will add a persisted review-session domain, orchestration services for semantic recovery and clarification, Superset adapters for context extraction and SQL Lab execution, and explicit APIs for mapping approvals and field-level semantic overrides. Frontend work will add a dedicated dataset review workspace with progressive recovery, semantic-source review, one-question-at-a-time clarification, mapping approval controls, compiled SQL preview, and resumable session state.
## Implementation Status
Accepted delivery to date covers the **US1 automatic review slice** introduced in commit [`feat(us1): add dataset review orchestration automatic review slice`](.git). The implemented scope includes the review-session startup flow, Superset link/context intake, trusted-source semantic enrichment, export endpoints, and the initial dataset review workspace/panels needed to render findings and readable review output.
Feature delivery also required repository-wide stabilization and compatibility collateral outside the dedicated dataset-review modules. Those follow-up fixes keep the accepted US1 slice working against the current repository baseline, including task/log API compatibility, dashboard/profile filtering behavior, Git route/repository-path hardening, report-list event handling, LLM provider encryption-key validation, and clean-release compatibility repairs exercised by shared acceptance gates. US2 guided clarification and US3 controlled execution remain planned work and are not accepted by this status note.
## Technical Context
**Language/Version**: Python 3.9+ backend, Node.js 18+ frontend, Svelte 5 / SvelteKit frontend runtime
**Primary Dependencies**: FastAPI, SQLAlchemy, Pydantic, existing `TaskManager`, existing `SupersetClient`, existing LLM provider stack, SvelteKit, Tailwind CSS, frontend `requestApi`/`fetchApi` wrappers
**Storage**: Existing application databases for persistent session/domain entities; existing tasks database for async execution metadata; filesystem for optional uploaded semantic sources/artifacts
**Testing**: pytest for backend unit/integration/API tests; Vitest for frontend component/store/API-wrapper tests
**Target Platform**: Linux-hosted FastAPI + Svelte web application integrated with Superset
**Project Type**: Web application with backend API and frontend SPA
**Performance Goals**:
- Initial summary generation: < 30s (progressive recovery visible within 5s)
- Preview compilation: < 10s
- Session load / Resume: < 2s
- SC-002 target: first readable summary under 5 minutes for complex datasets.
**Constraints**: Launch must remain blocked without successful Superset-side compiled preview; long-running recovery/enrichment/preview work must be asynchronous and observable; frontend must use existing API wrappers instead of native fetch; manual semantic overrides must never be silently overwritten; auditability and provenance are prioritized over raw throughput
**Scale/Scope**: One end-to-end feature spanning dataset intake, session persistence, semantic enrichment, clarification, mapping approval, preview, and launch; multiple new backend services/APIs plus a new multi-state frontend workspace
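
The constraint that manual semantic overrides must never be silently overwritten can be illustrated with a provenance-aware merge. This is a minimal sketch, not the repository's implementation: `FieldSemantics`, `apply_enrichment`, and the provenance labels `"manual"`, `"trusted_source"`, and `"llm"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldSemantics:
    """Semantic description of one dataset field plus where it came from."""
    description: str
    provenance: str  # hypothetical labels: "manual", "trusted_source", "llm"


def apply_enrichment(
    current: Dict[str, FieldSemantics],
    proposed: Dict[str, FieldSemantics],
) -> Tuple[Dict[str, FieldSemantics], List[str]]:
    """Merge proposed semantics without overwriting manual overrides.

    Returns the merged mapping and the names of fields where a proposal
    disagreed with a manual value, so the disagreement can be surfaced
    for review instead of being dropped silently.
    """
    merged = dict(current)
    conflicts: List[str] = []
    for name, candidate in proposed.items():
        existing = merged.get(name)
        if existing is not None and existing.provenance == "manual":
            if candidate.description != existing.description:
                conflicts.append(name)  # keep the manual value, report the conflict
            continue
        merged[name] = candidate
    return merged, conflicts
```

Returning the conflict list alongside the merged mapping keeps the decision visible to the caller, which fits the plan's preference for auditability over throughput.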
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
### Pre-Research Gate Assessment
1. **Semantic protocol compliance — PASS WITH REQUIRED PHASE 1 EXPANSION**
- New backend orchestration and persistence modules must follow `/home/busya/dev/ss-tools/.ai/standards/semantics.md`.
- Existing draft contracts are incomplete for the feature scope; Phase 1 must add explicit contracts for semantic-source resolution, clarification lifecycle, Superset context extraction, session persistence, and missing UI states.
- Complexity 4/5 Python modules must explicitly define `logger.reason()` / `logger.reflect()` paths; Complexity 5 boundaries must use `belief_scope`.
2. **Complexity-driven contract coverage — PASS WITH GAPS TO CLOSE**
- The core orchestration boundary is Complexity 5 because it gates launch, audit, state transitions, and cross-service consistency.
- Semantic source resolution, clarification workflow, mapping approval state, and session persistence each require explicit contracts instead of being hidden inside one orchestrator.
- UI contracts must map to the UX state machine, especially `Empty`, `Importing`, `Review Ready`, `Semantic Source Review Needed`, `Clarification Active`, `Mapping Review Needed`, `Compiled Preview Ready`, `Run Ready`, `Run In Progress`, `Completed`, and `Recovery Required`.
3. **UX-state compatibility — PASS**
- The architecture can support the UX reference because:
  - recovery can be progressive and asynchronous,
  - clarification can be session-backed and resumed,
  - preview generation can be represented as a stateful asynchronous action,
  - launch remains a gated terminal action.
- If Phase 0 research later shows Superset cannot provide reliable compilation preview or SQL Lab execution hooks compatible with the required interaction model, planning must stop and the UX contract must be renegotiated.
4. **Async boundaries — PASS**
- Long-running work already fits the repository constitution through `TaskManager`.
- Session start, deep context recovery, semantic enrichment from external sources, preview generation, and launch handoff side effects should be dispatched as tasks or internally asynchronous service steps with observable state changes.
5. **Frontend API-wrapper rules — PASS**
- Existing frontend uses `/home/busya/dev/ss-tools/frontend/src/lib/api.js` wrappers.
- New frontend work must use `requestApi`, `fetchApi`, `postApi`, or wrapper modules only; native `fetch` remains forbidden.
6. **RBAC/security constraints — PASS WITH DESIGN REQUIREMENT**
- New endpoints must use existing auth and permission dependencies.
- New orchestration actions need explicit permission modeling for reading sessions, editing semantic mappings, answering clarification prompts, generating previews, and launching runs.
- Session data must remain self-scoped/auditable and must not permit cross-user mutation without explicit policy.
- **Action**: Add `DATASET_REVIEW_*` permissions to `backend/src/scripts/seed_permissions.py`.
7. **Security & Threat Model — PASS**
- Session isolation: Every session record is strictly bound to `user_id`. Query filters must include owner check.
- Audit trail: `DatasetRunContext` is immutable after launch.
- Credential handling: Reuse existing `SupersetClient` encrypted configuration.
- **Action**: API endpoints must use `Depends(get_current_user)` and explicit permission checks.
8. **Belief-state/logging constraints — PASS WITH REQUIRED APPLICATION**
- Complexity 4/5 Python orchestration modules will require `belief_scope` plus meaningful `logger.reason()` and `logger.reflect()` traces around state transitions, preview validation, warning approvals, and launch gating.
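
The session-isolation and audit-trail requirements in the threat model above can be sketched in plain Python. This is an illustration under stated assumptions: `InMemorySessionStore` stands in for the real persistence layer, and the fields on `ReviewSession` and `DatasetRunContext` are hypothetical (only the `DatasetRunContext` name and the `user_id` binding come from this plan).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReviewSession:
    session_id: str
    user_id: str      # owner; every lookup must filter on this
    dataset_id: int


@dataclass(frozen=True)
class DatasetRunContext:
    """Audit snapshot captured at launch; frozen so it cannot be mutated."""
    session_id: str
    compiled_sql: str
    launched_by: str


class InMemorySessionStore:
    """Stand-in for the real repository (assumption, not the project's API)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ReviewSession] = {}

    def add(self, session: ReviewSession) -> None:
        self._sessions[session.session_id] = session

    def get_for_user(self, session_id: str, user_id: str) -> Optional[ReviewSession]:
        """Return the session only when the caller owns it."""
        session = self._sessions.get(session_id)
        if session is None or session.user_id != user_id:
            return None  # same answer for "missing" and "not yours"
        return session
```

Returning the same result for a missing session and a session owned by someone else avoids leaking which session IDs exist. In the real backend the owner check would presumably live in the database query filter rather than in Python.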
### Post-Design Gate Assessment
1. **Semantic protocol compliance — PASS**
- All modules in `contracts/modules.md` follow the complexity-driven metadata requirements.
- Relation syntax matches the canonical `@RELATION: [PREDICATE] ->[TARGET_ID]` format.
- Python modules (Complexity 4/5) explicitly specify `logger.reason()` and `belief_scope` requirements in their contracts.
2. **API Schema Completeness — PASS**
- `contracts/api.yaml` provides a fully typed OpenAPI 3.0.3 specification.
- Every session lifecycle, semantic review, and execution gate is covered by a typed endpoint.
3. **UX-Technical Alignment — PASS**
- Design supports the WYSIWWR principle via `SupersetCompilationAdapter`.
- Fallback strategies for missing preview or SQL Lab hooks are defined in `research.md`.
### Final Gate Result
**PASS** - The implementation plan and design artifacts are constitution-compliant and ready for task breakdown.
## Project Structure
### Documentation (this feature)
```text
/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│ ├── api.yaml
│ └── modules.md
└── tasks.md
```
### Source Code (repository root)
```text
/home/busya/dev/ss-tools/backend/
├── src/
│ ├── api/
│ │ └── routes/
│ ├── core/
│ ├── models/
│ ├── schemas/
│ └── services/
/home/busya/dev/ss-tools/frontend/
├── src/
│ ├── lib/
│ │ ├── api/
│ │ ├── components/
│ │ ├── i18n/
│ │ └── stores/
│ └── routes/
/home/busya/dev/ss-tools/backend/src/api/routes/__tests__/
/home/busya/dev/ss-tools/backend/src/services/__tests__/
/home/busya/dev/ss-tools/frontend/src/lib/**/__tests__/
/home/busya/dev/ss-tools/frontend/src/routes/**/__tests__/
```
**Structure Decision**: Use the repository's existing web-application split. Backend implementation belongs under `/home/busya/dev/ss-tools/backend/src/{models,schemas,services,api/routes}`. Frontend implementation belongs under `/home/busya/dev/ss-tools/frontend/src/{routes,lib/components,lib/api,lib/stores}`. Tests will stay adjacent to their current backend/frontend conventions.
## Semantic Contract Guidance
> Use this section to drive Phase 1 artifacts, especially `contracts/modules.md`.
### Planned Critical/High-Value Modules
- `DatasetReviewOrchestrator` `@COMPLEXITY: 5`
- `SemanticSourceResolver` `@COMPLEXITY: 4`
- `ClarificationEngine` `@COMPLEXITY: 4`
- `SupersetContextExtractor` `@COMPLEXITY: 4`
- `SupersetCompilationAdapter` `@COMPLEXITY: 4`
- `DatasetReviewSessionRepository` or equivalent persistence boundary `@COMPLEXITY: 5`
- `DatasetReviewWorkspace` `@COMPLEXITY: 5`
- `SourceIntakePanel` `@COMPLEXITY: 3`
- `ValidationFindingsPanel` `@COMPLEXITY: 3`
- `SemanticLayerReview` `@COMPLEXITY: 3`
- `ClarificationDialog` `@COMPLEXITY: 3`
- `ExecutionMappingReview` `@COMPLEXITY: 3`
- `CompiledSQLPreview` `@COMPLEXITY: 3`
- `LaunchConfirmationPanel` `@COMPLEXITY: 3`
### Required Semantic Rules
- Use `@COMPLEXITY` or `@C:` as the primary rule source.
- Match contract density to complexity:
  - Complexity 1: anchors only, `@PURPOSE` optional
  - Complexity 2: `@PURPOSE`
  - Complexity 3: `@PURPOSE`, `@RELATION`; UI also `@UX_STATE`
  - Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python also meaningful `logger.reason()` / `logger.reflect()` path
  - Complexity 5: level 4 + `@DATA_CONTRACT`, `@INVARIANT`; Python also `belief_scope`; UI also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- Write relations only in canonical form: `@RELATION: [PREDICATE] ->[TARGET_ID]`
- If any relation target, DTO, or contract dependency is unknown, emit `[NEED_CONTEXT: target]` instead of inventing placeholders.
- Preserve medium-appropriate anchor/comment syntax for Python, Svelte markup, and Svelte script contexts.
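
The contract-density rule above lends itself to a mechanical check. The sketch below transcribes the tag lists from the rules into a lookup table; `missing_tags` is a hypothetical helper, and it applies the UI extras only at the levels that list them, since the rules do not say whether level 4 and 5 UI components also inherit `@UX_STATE`. The Python-only logging requirements (`logger.reason()`, `logger.reflect()`, `belief_scope`) are not tags and are out of scope here.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Required tags per complexity level, transcribed from the rules above.
# Level 1 needs anchors only, so its required tag set is empty.
BASE_TAGS: Dict[int, FrozenSet[str]] = {
    1: frozenset(),
    2: frozenset({"@PURPOSE"}),
    3: frozenset({"@PURPOSE", "@RELATION"}),
    4: frozenset({"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT"}),
    5: frozenset({"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT",
                  "@DATA_CONTRACT", "@INVARIANT"}),
}

# Extra tags for UI components, only at the levels where the rules list them.
UI_EXTRA_TAGS: Dict[int, FrozenSet[str]] = {
    3: frozenset({"@UX_STATE"}),
    5: frozenset({"@UX_FEEDBACK", "@UX_RECOVERY", "@UX_REACTIVITY"}),
}


def missing_tags(complexity: int, present: Iterable[str], is_ui: bool = False) -> Set[str]:
    """Return the required tags that a contract at this complexity lacks."""
    required = set(BASE_TAGS[complexity])
    if is_ui:
        required |= UI_EXTRA_TAGS.get(complexity, frozenset())
    return required - set(present)
```

For example, `missing_tags(3, {"@PURPOSE"}, is_ui=True)` reports that `@RELATION` and `@UX_STATE` are still needed.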
## Phase 0: Research Agenda
### Open Questions Requiring Resolution
1. How to reliably extract saved native filters from supported Superset links and versions.
2. How to discover dataset runtime template variables and Jinja placeholders using available Superset APIs and dataset payloads.
3. How to perform a safe Superset-side compiled SQL preview compatible with the current deployment/version.
4. How to create or bind a SQL Lab execution session as the canonical audited launch target.
5. How to model semantic source ranking, fuzzy match review, conflict detection, and provenance without collapsing into an orchestration god-object.
6. How to persist resumable clarification and review sessions using the current database stack.
7. How to design typed API contracts that support field-level semantic operations, mapping approval flow, and session lifecycle operations.
8. How to degrade gracefully when Superset import/preview or LLM enrichment only partially succeeds.
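
As a starting point for question 2, Jinja placeholders can be located in a dataset's SQL text with a regular expression. This is a rough sketch, not a research outcome: it assumes the SQL is already available as a string, it does not parse Jinja properly, and `find_template_names` is a hypothetical helper. It records only the leading identifier of each `{{ ... }}` expression, so a macro call and a plain variable are both reported by name.

```python
import re
from typing import List

# Matches the first identifier inside a Jinja expression such as
# "{{ start_date }}" or "{{ filter_values('region') }}".
_JINJA_EXPR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def find_template_names(sql: str) -> List[str]:
    """Return distinct leading identifiers of Jinja expressions, in first-seen order."""
    seen: List[str] = []
    for match in _JINJA_EXPR.finditer(sql):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

A regex pass like this misses `{% ... %}` blocks and nested expressions, so any real discovery step would need to be validated against the Superset payloads examined in Phase 0.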
### Required Research Outputs
Research must produce explicit decisions for:
- Superset link parsing and recovery strategy
- Superset compilation/SQL Lab integration approach
- Semantic source resolution architecture
- Clarification session persistence model
- Session persistence/audit model
- API schema granularity and endpoint set
- Test strategy for Superset-dependent and LLM-dependent flows
- Delivery milestones for incremental rollout
## Phase 1: Design Focus
Phase 1 must generate:
- typed domain entities and DTOs in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/data-model.md`
- expanded semantic contracts in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/modules.md`
- typed OpenAPI schemas and missing endpoints in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/api.yaml`
- execution and validation guide in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/quickstart.md`
Phase 1 must specifically close the current gaps around:
- field-level semantic operations,
- clarification engine responsibilities,
- mapping approval endpoints,
- session lifecycle APIs,
- exportable outputs,
- error-path validation scenarios,
- alignment between UX states and UI contracts.
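
The last gap, alignment between UX states and UI contracts, can be checked as a coverage problem. The sketch below lists the states named in the Constitution Check and reports any state that no component contract declares. `ReviewUxState` and `uncovered_states` are hypothetical names, and the state list may be incomplete because the plan introduces it with "especially".

```python
from enum import Enum
from typing import Dict, Iterable, Set


class ReviewUxState(Enum):
    EMPTY = "Empty"
    IMPORTING = "Importing"
    REVIEW_READY = "Review Ready"
    SEMANTIC_SOURCE_REVIEW_NEEDED = "Semantic Source Review Needed"
    CLARIFICATION_ACTIVE = "Clarification Active"
    MAPPING_REVIEW_NEEDED = "Mapping Review Needed"
    COMPILED_PREVIEW_READY = "Compiled Preview Ready"
    RUN_READY = "Run Ready"
    RUN_IN_PROGRESS = "Run In Progress"
    COMPLETED = "Completed"
    RECOVERY_REQUIRED = "Recovery Required"


def uncovered_states(declared: Dict[str, Iterable[str]]) -> Set[str]:
    """Return UX state names that no component contract declares.

    `declared` maps a component name to the @UX_STATE values in its contract.
    """
    covered = {state for states in declared.values() for state in states}
    return {s.value for s in ReviewUxState} - covered
```

Run against the `@UX_STATE` values collected from `contracts/modules.md`, an empty result would mean every listed state has at least one owning component.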
## Delivery Milestones
| Milestone | FR Coverage | Scope | User Value |
|-----------|-------------|-------|------------|
| M1: Sessioned Auto Review | FR-001 to FR-011, FR-035, FR-037 | Source intake, dataset review session, initial profile, findings, provenance, semantic-source application, export of review outputs | Users get immediate documentation, validation, and trusted-source enrichment without manual reconstruction |
| M2: Guided Clarification | FR-012 to FR-020, FR-036, FR-038, FR-039, FR-040 | Clarification engine, resumable questions, question templates/eval, field-level semantic overrides, conflict review, progress persistence | Users can resolve ambiguity safely and preserve manual intent |
| M3: Controlled Execution | FR-021 to FR-034 | Filter extraction, template-variable mapping, warning approvals, compiled preview, SQL Lab launch, manual export path, audited run context | Users can move from recovered context to reproducible execution with clear readiness gates |
## RBAC Model
| Permission | Description | Target Role(s) |
|------------|-------------|----------------|
| `dataset:session:read` | View own review sessions | Analytics Engineer, BI Engineer, Data Steward |
| `dataset:session:manage` | Edit mappings, answer questions, override semantics | Analytics Engineer, BI Engineer |
| `dataset:session:approve` | Approve warning-level mappings | Senior Analytics Engineer, Data Steward |
| `dataset:execution:preview` | Trigger Superset SQL compilation preview | Analytics Engineer, BI Engineer |
| `dataset:execution:launch` | Create SQL Lab session in target environment | Analytics Engineer, BI Engineer |
| `dataset:execution:launch_prod` | Launch in Production-staged environment | Senior Analytics Engineer |
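
The table above can be read as a role-to-permission lookup. The sketch below transcribes it literally; `ROLE_PERMISSIONS` and `has_permission` are hypothetical names, and no inheritance is assumed (for example, Senior Analytics Engineer is given only the permissions the table lists for it). The real check would go through the existing auth and permission dependencies.

```python
from typing import Dict, FrozenSet, Iterable

_ENGINEER = frozenset({
    "dataset:session:read",
    "dataset:session:manage",
    "dataset:execution:preview",
    "dataset:execution:launch",
})

# Literal transcription of the RBAC table; no role inheritance is assumed.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Analytics Engineer": _ENGINEER,
    "BI Engineer": _ENGINEER,
    "Data Steward": frozenset({"dataset:session:read", "dataset:session:approve"}),
    "Senior Analytics Engineer": frozenset({
        "dataset:session:approve",
        "dataset:execution:launch_prod",
    }),
}


def has_permission(roles: Iterable[str], permission: str) -> bool:
    """True when any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

Unknown roles grant nothing, so the check fails closed.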
## Integration Points
### Service Reuse (Critical)
- **Superset Interaction**: Use existing `backend/src/core/superset_client.py` (do not duplicate HTTP clients).
- **LLM Interaction**: Use existing `backend/src/services/llm_provider.py` via `LLMProviderService`.
- **Notifications**: Integrate with `NotificationService` for launch outcomes and preview readiness.
- **i18n**: Use existing `frontend/src/lib/i18n/` for all user-facing strings in the review workspace.
## Rollout & Monitoring
### Feature Flags
- `ff_dataset_auto_review`: Enables basic documentation and intake.
- `ff_dataset_clarification`: Enables guided dialogue mode.
- `ff_dataset_execution`: Enables preview and launch capabilities.
### Metrics & Alerting
- **Metrics**: Session completion rate, time-to-first-summary, preview failure rate (Superset compilation errors vs connection errors), clarification engagement.
- **Alerting**: High rate of `503` Superset API failures; persistent LLM provider timeouts (> 30s); unauthorized cross-session access attempts.
## Implementation Sequencing
### Backend First
1. Add persistent review-session domain model and schemas.
2. Add orchestration services and Superset adapters.
3. Add typed API endpoints and explicit RBAC.
4. Add task/event integration and audit persistence.
5. Add backend tests for session lifecycle, preview gating, launch gating, and degradation paths.
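
The preview-gating and launch-gating tests in step 5 could target a guard shaped like the one below. It is a sketch under assumptions: `LaunchReadiness`, `launch_blockers`, and the blocker codes are hypothetical, and comparing input hashes is just one possible way to detect a stale preview.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchReadiness:
    preview_sql: Optional[str]         # None until Superset returns a compiled preview
    preview_input_hash: Optional[str]  # hash of the inputs the preview was compiled from
    current_input_hash: str            # hash of the inputs as they are now
    unapproved_warning_mappings: int   # warning-level mappings still awaiting approval


def launch_blockers(state: LaunchReadiness) -> List[str]:
    """Return the reasons launch is blocked; an empty list means launch may proceed."""
    blockers: List[str] = []
    if state.preview_sql is None:
        blockers.append("no_compiled_preview")
    elif state.preview_input_hash != state.current_input_hash:
        blockers.append("stale_preview")  # inputs changed after the preview was compiled
    if state.unapproved_warning_mappings > 0:
        blockers.append("unapproved_warning_mappings")
    return blockers
```

Returning every blocker instead of stopping at the first one lets the UI show the full list of outstanding actions.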
### Frontend Next
1. Add dataset review route/workspace shell and session loading.
2. Add source-intake, summary, findings, and semantic review panels.
3. Add clarification dialog and mapping approval UI.
4. Add compiled preview and launch confirmation UI.
5. Add frontend tests for state transitions, wrappers, and critical UX invariants.
### Integration/Hardening
1. Validate Superset version compatibility against real/staged environment.
2. Verify progressive session recovery and resume flows.
3. Verify audit replay/run-context capture.
4. Measure success-criteria instrumentation feasibility.
## Testing Strategy
### Backend
- **Unit tests** for semantic ranking, provenance/conflict rules, clarification prioritization, preview gating, and launch guards.
- **Integration tests** for session persistence, Superset adapter behavior, SQL preview orchestration, and SQL Lab launch orchestration with mocked upstream responses.
- **API contract tests** for typed response schemas, RBAC enforcement, mapping approval operations, field-level semantic edits, export operations, and session lifecycle.
### Frontend
- **Unit/component tests** for state-driven UI contracts, provenance rendering, one-question clarification, mapping approval flow, stale preview handling, and launch gating visuals.
- **Integration-style route tests** for resume flows, progressive loading, and error recovery states.
### External Dependency Strategy
- Mock Superset APIs for CI determinism.
- Use stable fixtures/snapshots for LLM-produced structured outputs.
- Treat provider/transport failure as explicit degraded states rather than semantic failure.
- Include replayable fixtures for imported filters, template variables, conflict cases, and compilation errors.
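
The third point, treating provider or transport failure as a degraded state, can be sketched as a thin wrapper around the provider call. `EnrichmentOutcome` and `enrich_with_fallback` are hypothetical names, and the choice of `TimeoutError` and `ConnectionError` as the transport failures is an assumption; the real provider stack may raise its own exception types.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EnrichmentOutcome:
    status: str                            # "ok" or "degraded"
    summary: Optional[str]                 # provider output when available
    degraded_reason: Optional[str] = None  # exception class name when degraded


def enrich_with_fallback(call_provider: Callable[[], str]) -> EnrichmentOutcome:
    """Call the provider; map transport failures to a degraded outcome.

    Other exceptions propagate, so a genuine semantic or programming
    error is not mistaken for an outage.
    """
    try:
        return EnrichmentOutcome(status="ok", summary=call_provider())
    except (TimeoutError, ConnectionError) as exc:
        return EnrichmentOutcome(
            status="degraded",
            summary=None,
            degraded_reason=type(exc).__name__,
        )
```

In tests, `call_provider` can be a fixture that returns a stored snapshot or raises a chosen exception, which keeps both the happy path and the degraded path deterministic.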
## Risks & Mitigations
| Risk | Why It Matters | Mitigation |
|------|----------------|------------|
| Superset version lacks a stable compiled-preview endpoint | FR-029 and WYSIWWR depend on native Superset-side compilation | Resolve in Phase 0; if unsupported, stop and renegotiate UX/feature scope before implementation |
| Superset link/native filter formats differ across installations | Could make import brittle or partial | Design recovery as best-effort with explicit provenance and recovery-required state |
| SQL Lab launch handoff is inconsistent across environments | FR-032 requires canonical audited launch target | Research version-compatible creation strategy and define fallback as blocked, not silent substitution |
| Semantic resolution logic becomes an orchestration god-object | Hurts maintainability and contract traceability | Separate `SemanticSourceResolver`, `ClarificationEngine`, and Superset extraction responsibilities |
| Fuzzy matching creates too many false positives | Undermines trust and increases approval burden | Keep explicit confidence hierarchy, review-required fuzzy matches, and field-level selective application |
| LLM/provider outages interrupt review quality | Could block non-critical enrichment | Degrade to partial review state with preserved trusted-source results and explicit next action |
| Session lifecycle becomes hard to resume safely | FR-019 and FR-036 require resumability | Persist answers, approvals, and current recommended action as first-class session state |
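
The fuzzy-matching mitigation, an explicit confidence hierarchy with review-required fuzzy matches, might look like the sketch below. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. `classify_match`, the tier labels, and the 0.6 threshold are hypothetical, and treating a case-insensitive equal name as exact is an assumption.

```python
from difflib import SequenceMatcher


def classify_match(field_name: str, candidate: str, review_threshold: float = 0.6) -> str:
    """Place a candidate semantic-source entry into a confidence tier.

    "exact" matches could be applied directly; "fuzzy_review_required"
    matches are only ever suggested and must be approved per field.
    """
    left, right = field_name.lower(), candidate.lower()
    if left == right:
        return "exact"
    ratio = SequenceMatcher(None, left, right).ratio()
    if ratio >= review_threshold:
        return "fuzzy_review_required"
    return "no_match"
```

Keeping the tier as an explicit return value means a fuzzy match can never be applied through the same path as an exact one.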
## Post-Design Re-Check Criteria
After Phase 1 artifacts are produced, re-check:
- semantic protocol coverage against all planned modules/components,
- UX-state coverage against `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/ux_reference.md`,
- explicit API support for field-level semantic actions, mapping approval, exports, and session lifecycle,
- belief-state/logging expectations for Complexity 4/5 Python modules,
- typed schemas sufficient for backend/frontend parallel implementation,
- quickstart coverage of happy path plus critical negative/recovery paths.
## Complexity Tracking
> **Fill ONLY if Constitution Check has violations that must be justified**
No justified constitution violations at planning time.