# Implementation Plan: LLM Dataset Orchestration

**Branch**: `027-dataset-llm-orchestration` | **Date**: 2026-03-16 | **Spec**: `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md`
**Input**: Feature specification from `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md`

**Note**: This template is filled in by the `/speckit.plan` command. See `/home/busya/dev/ss-tools/.specify/templates/plan-template.md` for the execution workflow.

## Summary

Deliver a dataset-centered orchestration flow that lets users start from a Superset link or dataset selection, recover analytical context, enrich semantics from trusted sources before AI generation, resolve ambiguity through guided clarification, generate a Superset-side compiled SQL preview, and launch an audited SQL Lab execution only when readiness gates pass.

The implementation will extend the existing FastAPI + SvelteKit architecture rather than creating a parallel subsystem. Backend work will add a persisted review-session domain, orchestration services for semantic recovery and clarification, Superset adapters for context extraction and SQL Lab execution, and explicit APIs for mapping approvals and field-level semantic overrides. Frontend work will add a dedicated dataset review workspace with progressive recovery, semantic-source review, one-question-at-a-time clarification, mapping approval controls, compiled SQL preview, and resumable session state.

## Implementation Status

Accepted delivery to date covers the **US1 automatic review slice** introduced in commit [`feat(us1): add dataset review orchestration automatic review slice`](.git). The implemented scope includes the review-session startup flow, Superset link/context intake, trusted-source semantic enrichment, export endpoints, and the initial dataset review workspace/panels needed to render findings and readable review output.

Feature delivery also required repository-wide stabilization and compatibility collateral outside the dedicated dataset-review modules. Those follow-up fixes keep the accepted US1 slice working against the current repository baseline, including task/log API compatibility, dashboard/profile filtering behavior, Git route/repository-path hardening, report-list event handling, LLM provider encryption-key validation, and clean-release compatibility repairs exercised by shared acceptance gates. US2 guided clarification and US3 controlled execution remain planned work and are not accepted by this status note.

## Technical Context

**Language/Version**: Python 3.9+ backend, Node.js 18+ frontend, Svelte 5 / SvelteKit frontend runtime
**Primary Dependencies**: FastAPI, SQLAlchemy, Pydantic, existing `TaskManager`, existing `SupersetClient`, existing LLM provider stack, SvelteKit, Tailwind CSS, frontend `requestApi`/`fetchApi` wrappers
**Storage**: Existing application databases for persistent session/domain entities; existing tasks database for async execution metadata; filesystem for optional uploaded semantic sources/artifacts
**Testing**: pytest for backend unit/integration/API tests; Vitest for frontend component/store/API-wrapper tests
**Target Platform**: Linux-hosted FastAPI + Svelte web application integrated with Superset
**Project Type**: Web application with backend API and frontend SPA
**Performance Goals**:

- Initial summary generation: < 30s (first progressive recovery output visible within 5s)
- Preview compilation: < 10s
- Session load / resume: < 2s
- SC-002 target: first readable summary under 5 minutes for complex datasets.

**Constraints**: Launch must remain blocked without successful Superset-side compiled preview; long-running recovery/enrichment/preview work must be asynchronous and observable; frontend must use existing API wrappers instead of native fetch; manual semantic overrides must never be silently overwritten; auditability and provenance are prioritized over raw throughput
**Scale/Scope**: One end-to-end feature spanning dataset intake, session persistence, semantic enrichment, clarification, mapping approval, preview, and launch; multiple new backend services/APIs plus a new multi-state frontend workspace

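
The launch constraint above can be illustrated with a small readiness check. This is a minimal sketch, not the planned implementation: `PreviewResult`, `ReviewSession`, `launch_blockers`, and all of their fields are hypothetical names chosen for illustration, and the real session model is defined in `data-model.md`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreviewResult:
    """Outcome of a Superset-side compilation attempt (hypothetical shape)."""
    compiled_sql: Optional[str] = None
    error: Optional[str] = None
    stale: bool = False  # hypothetical flag: inputs changed after compilation


@dataclass
class ReviewSession:
    """Minimal stand-in for the persisted review session (hypothetical)."""
    preview: Optional[PreviewResult] = None
    unapproved_warnings: List[str] = field(default_factory=list)


def launch_blockers(session: ReviewSession) -> List[str]:
    """Return the reasons launch must stay blocked; an empty list means ready."""
    blockers: List[str] = []
    preview = session.preview
    if preview is None:
        blockers.append("no compiled preview")
    elif preview.error is not None or not preview.compiled_sql:
        blockers.append("preview compilation failed")
    elif preview.stale:
        blockers.append("preview is stale")
    if session.unapproved_warnings:
        blockers.append("warning-level mappings await approval")
    return blockers
```

Returning the list of blockers rather than a bare boolean lets the workspace show why launch is unavailable, which fits the plan's preference for auditability over raw throughput.
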
## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

### Pre-Research Gate Assessment

1. **Semantic protocol compliance — PASS WITH REQUIRED PHASE 1 EXPANSION**
   - New backend orchestration and persistence modules must follow `/home/busya/dev/ss-tools/.ai/standards/semantics.md`.
   - Existing draft contracts are incomplete for the feature scope; Phase 1 must add explicit contracts for semantic-source resolution, clarification lifecycle, Superset context extraction, session persistence, and missing UI states.
   - Complexity 4/5 Python modules must explicitly define `logger.reason()` / `logger.reflect()` paths; Complexity 5 boundaries must use `belief_scope`.

2. **Complexity-driven contract coverage — PASS WITH GAPS TO CLOSE**
   - The core orchestration boundary is Complexity 5 because it gates launch, audit, state transitions, and cross-service consistency.
   - Semantic source resolution, clarification workflow, mapping approval state, and session persistence each require explicit contracts instead of being hidden inside one orchestrator.
   - UI contracts must map to the UX state machine, especially `Empty`, `Importing`, `Review Ready`, `Semantic Source Review Needed`, `Clarification Active`, `Mapping Review Needed`, `Compiled Preview Ready`, `Run Ready`, `Run In Progress`, `Completed`, and `Recovery Required`.

3. **UX-state compatibility — PASS**
   - The architecture can support the UX reference because:
     - recovery can be progressive and asynchronous,
     - clarification can be session-backed and resumed,
     - preview generation can be represented as a stateful asynchronous action,
     - launch remains a gated terminal action.
   - If Phase 0 research later shows Superset cannot provide reliable compilation preview or SQL Lab execution hooks compatible with the required interaction model, planning must stop and the UX contract must be renegotiated.

4. **Async boundaries — PASS**
   - Long-running work already fits the repository constitution through `TaskManager`.
   - Session start, deep context recovery, semantic enrichment from external sources, preview generation, and launch-handoff side effects should be dispatched as tasks or internally asynchronous service steps with observable state changes.

5. **Frontend API-wrapper rules — PASS**
   - Existing frontend uses `/home/busya/dev/ss-tools/frontend/src/lib/api.js` wrappers.
   - New frontend work must use `requestApi`, `fetchApi`, `postApi`, or wrapper modules only; native `fetch` remains forbidden.

6. **RBAC/security constraints — PASS WITH DESIGN REQUIREMENT**
   - New endpoints must use existing auth and permission dependencies.
   - New orchestration actions need explicit permission modeling for reading sessions, editing semantic mappings, answering clarification prompts, generating previews, and launching runs.
   - Session data must remain self-scoped/auditable and must not permit cross-user mutation without explicit policy.
   - **Action**: Add `DATASET_REVIEW_*` permissions to `backend/src/scripts/seed_permissions.py`.

7. **Security & Threat Model — PASS**
   - Session isolation: Every session record is strictly bound to `user_id`. Query filters must include an owner check.
   - Audit trail: `DatasetRunContext` is immutable after launch.
   - Credential handling: Reuse existing `SupersetClient` encrypted configuration.
   - **Action**: API endpoints must use `Depends(get_current_user)` and explicit permission checks.

8. **Belief-state/logging constraints — PASS WITH REQUIRED APPLICATION**
   - Complexity 4/5 Python orchestration modules will require `belief_scope` plus meaningful `logger.reason()` and `logger.reflect()` traces around state transitions, preview validation, warning approvals, and launch gating.

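
The session-isolation and audit-trail points in the Security & Threat Model item can be sketched without any framework. The following is an illustration under stated assumptions: `SessionRecord`, `SessionStore`, and the fields of `DatasetRunContext` are hypothetical, and an in-memory dict stands in for the SQLAlchemy-backed persistence boundary.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict, Optional


@dataclass
class SessionRecord:
    """Hypothetical minimal session row; every record carries its owner."""
    session_id: str
    user_id: str


@dataclass(frozen=True)
class DatasetRunContext:
    """Frozen after creation, so a launched run cannot be edited in place.

    The two fields are placeholders for illustration only.
    """
    session_id: str
    compiled_sql: str


class SessionStore:
    """In-memory stand-in for the persistence boundary (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, SessionRecord] = {}

    def add(self, record: SessionRecord) -> None:
        self._rows[record.session_id] = record

    def get_for_user(self, session_id: str, user_id: str) -> Optional[SessionRecord]:
        # The owner check is part of the lookup itself, so callers cannot
        # forget it; a foreign session looks the same as a missing one.
        record = self._rows.get(session_id)
        if record is None or record.user_id != user_id:
            return None
        return record
```

Exposing only an owner-scoped lookup means there is no unscoped read path to misuse, and attempting to assign to a field of a `DatasetRunContext` instance raises `FrozenInstanceError`.
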
### Post-Design Gate Assessment

1. **Semantic protocol compliance — PASS**
   - All modules in `contracts/modules.md` follow the complexity-driven metadata requirements.
   - Relation syntax matches the canonical `@RELATION: [PREDICATE] ->[TARGET_ID]` format.
   - Python modules (Complexity 4/5) explicitly specify `logger.reason()` and `belief_scope` requirements in their contracts.

2. **API Schema Completeness — PASS**
   - `contracts/api.yaml` provides a fully typed OpenAPI 3.0.3 specification.
   - Every session lifecycle, semantic review, and execution gate is covered by a typed endpoint.

3. **UX-Technical Alignment — PASS**
   - Design supports the WYSIWWR principle via `SupersetCompilationAdapter`.
   - Fallback strategies for missing preview or SQL Lab hooks are defined in `research.md`.

### Final Gate Result

**PASS** - The implementation plan and design artifacts are constitution-compliant and ready for task breakdown.

## Project Structure

### Documentation (this feature)

```text
/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── api.yaml
│   └── modules.md
└── tasks.md
```

### Source Code (repository root)

```text
/home/busya/dev/ss-tools/backend/
├── src/
│   ├── api/
│   │   └── routes/
│   ├── core/
│   ├── models/
│   ├── schemas/
│   └── services/

/home/busya/dev/ss-tools/frontend/
├── src/
│   ├── lib/
│   │   ├── api/
│   │   ├── components/
│   │   ├── i18n/
│   │   └── stores/
│   └── routes/

/home/busya/dev/ss-tools/backend/src/api/routes/__tests__/
/home/busya/dev/ss-tools/backend/src/services/__tests__/
/home/busya/dev/ss-tools/frontend/src/lib/**/__tests__/
/home/busya/dev/ss-tools/frontend/src/routes/**/__tests__/
```

**Structure Decision**: Use the repository’s existing web-application split. Backend implementation belongs under `/home/busya/dev/ss-tools/backend/src/{models,schemas,services,api/routes}`. Frontend implementation belongs under `/home/busya/dev/ss-tools/frontend/src/{routes,lib/components,lib/api,lib/stores}`. Tests will stay adjacent to their current backend/frontend conventions.

## Semantic Contract Guidance

> Use this section to drive Phase 1 artifacts, especially `contracts/modules.md`.

### Planned Critical/High-Value Modules

- `DatasetReviewOrchestrator` — `@COMPLEXITY: 5`
- `SemanticSourceResolver` — `@COMPLEXITY: 4`
- `ClarificationEngine` — `@COMPLEXITY: 4`
- `SupersetContextExtractor` — `@COMPLEXITY: 4`
- `SupersetCompilationAdapter` — `@COMPLEXITY: 4`
- `DatasetReviewSessionRepository` or equivalent persistence boundary — `@COMPLEXITY: 5`
- `DatasetReviewWorkspace` — `@COMPLEXITY: 5`
- `SourceIntakePanel` — `@COMPLEXITY: 3`
- `ValidationFindingsPanel` — `@COMPLEXITY: 3`
- `SemanticLayerReview` — `@COMPLEXITY: 3`
- `ClarificationDialog` — `@COMPLEXITY: 3`
- `ExecutionMappingReview` — `@COMPLEXITY: 3`
- `CompiledSQLPreview` — `@COMPLEXITY: 3`
- `LaunchConfirmationPanel` — `@COMPLEXITY: 3`

### Required Semantic Rules

- Use `@COMPLEXITY` or `@C:` as the primary rule source.
- Match contract density to complexity:
  - Complexity 1: anchors only, `@PURPOSE` optional
  - Complexity 2: `@PURPOSE`
  - Complexity 3: `@PURPOSE`, `@RELATION`; UI also `@UX_STATE`
  - Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python also meaningful `logger.reason()` / `logger.reflect()` path
  - Complexity 5: level 4 + `@DATA_CONTRACT`, `@INVARIANT`; Python also `belief_scope`; UI also `@UX_FEEDBACK`, `@UX_RECOVERY`, `@UX_REACTIVITY`
- Write relations only in canonical form: `@RELATION: [PREDICATE] ->[TARGET_ID]`
- If any relation target, DTO, or contract dependency is unknown, emit `[NEED_CONTEXT: target]` instead of inventing placeholders.
- Preserve medium-appropriate anchor/comment syntax for Python, Svelte markup, and Svelte script contexts.

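
The contract-density rule above lends itself to a lookup table that a review script could use to flag missing tags. This is a rough sketch only: `required_tags` and `missing_tags` are hypothetical helpers, and the code assumes that UI-specific tags accumulate from lower levels (so a Complexity 5 UI component still needs `@UX_STATE`), which the rules do not state explicitly. The Python-only logging requirements (`logger.reason()`, `logger.reflect()`, `belief_scope`) are not tags and are left out.

```python
from typing import Dict, Set

# Tags required at each complexity level regardless of medium,
# transcribed from the rule list above.
BASE_TAGS: Dict[int, Set[str]] = {
    1: set(),  # anchors only; @PURPOSE is optional
    2: {"@PURPOSE"},
    3: {"@PURPOSE", "@RELATION"},
    4: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT"},
    5: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT",
        "@DATA_CONTRACT", "@INVARIANT"},
}

# Extra tags for UI components, keyed by the level that introduces them.
UI_TAGS: Dict[int, Set[str]] = {
    3: {"@UX_STATE"},
    5: {"@UX_FEEDBACK", "@UX_RECOVERY", "@UX_REACTIVITY"},
}


def required_tags(complexity: int, is_ui: bool = False) -> Set[str]:
    """Return the tags a contract at this complexity level must declare."""
    if complexity not in BASE_TAGS:
        raise ValueError(f"unknown complexity level: {complexity}")
    tags = set(BASE_TAGS[complexity])
    if is_ui:
        # Assumption: UI extras introduced at a lower level still apply.
        for level, extra in UI_TAGS.items():
            if level <= complexity:
                tags |= extra
    return tags


def missing_tags(declared: Set[str], complexity: int, is_ui: bool = False) -> Set[str]:
    """Return required tags that the contract header does not declare."""
    return required_tags(complexity, is_ui) - declared
```

For example, a Complexity 4 module that declares only `@PURPOSE` and `@RELATION` would be reported as missing `@PRE`, `@POST`, and `@SIDE_EFFECT`.
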
## Phase 0: Research Agenda

### Open Questions Requiring Resolution

1. How to reliably extract saved native filters from supported Superset links and versions.
2. How to discover dataset runtime template variables and Jinja placeholders using available Superset APIs and dataset payloads.
3. How to perform a safe Superset-side compiled SQL preview compatible with the current deployment/version.
4. How to create or bind a SQL Lab execution session as the canonical audited launch target.
5. How to model semantic source ranking, fuzzy match review, conflict detection, and provenance without collapsing into an orchestration god-object.
6. How to persist resumable clarification and review sessions using the current database stack.
7. How to design typed API contracts that support field-level semantic operations, mapping approval flow, and session lifecycle operations.
8. How to degrade gracefully when Superset import/preview or LLM enrichment only partially succeeds.

### Required Research Outputs

Research must produce explicit decisions for:

- Superset link parsing and recovery strategy
- Superset compilation/SQL Lab integration approach
- Semantic source resolution architecture
- Clarification session persistence model
- Session persistence/audit model
- API schema granularity and endpoint set
- Test strategy for Superset-dependent and LLM-dependent flows
- Delivery milestones for incremental rollout

## Phase 1: Design Focus

Phase 1 must generate:

- typed domain entities and DTOs in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/data-model.md`
- expanded semantic contracts in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/modules.md`
- typed OpenAPI schemas and missing endpoints in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/api.yaml`
- execution and validation guide in `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/quickstart.md`

Phase 1 must specifically close the current gaps around:

- field-level semantic operations,
- clarification engine responsibilities,
- mapping approval endpoints,
- session lifecycle APIs,
- exportable outputs,
- error-path validation scenarios,
- alignment between UX states and UI contracts.

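
One way to check the last gap, alignment between UX states and UI contracts, is a coverage test over the states named in the Constitution Check. The sketch below is illustrative: the `UXState` member names and the `uncovered_states` helper are hypothetical, and which component declares which state is an assumption made only for the usage example.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class UXState(Enum):
    """UX states listed in the Constitution Check; member names are illustrative."""
    EMPTY = "Empty"
    IMPORTING = "Importing"
    REVIEW_READY = "Review Ready"
    SEMANTIC_SOURCE_REVIEW_NEEDED = "Semantic Source Review Needed"
    CLARIFICATION_ACTIVE = "Clarification Active"
    MAPPING_REVIEW_NEEDED = "Mapping Review Needed"
    COMPILED_PREVIEW_READY = "Compiled Preview Ready"
    RUN_READY = "Run Ready"
    RUN_IN_PROGRESS = "Run In Progress"
    COMPLETED = "Completed"
    RECOVERY_REQUIRED = "Recovery Required"


def uncovered_states(declared: Dict[str, Iterable[str]]) -> Set[str]:
    """Return UX state names that no component declares via @UX_STATE.

    `declared` maps a component name to the state names found in its contract.
    """
    covered = {name for names in declared.values() for name in names}
    return {state.value for state in UXState} - covered


# Hypothetical usage: only two states are covered so far.
gaps = uncovered_states({"SourceIntakePanel": ["Empty", "Importing"]})
```

A check of this kind could run as a unit test against `contracts/modules.md`, failing whenever a UX state has no owning UI contract.
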
## Delivery Milestones

| Milestone | FR Coverage | Scope | User Value |
|-----------|-------------|-------|------------|
| M1: Sessioned Auto Review | FR-001 to FR-011, FR-035, FR-037 | Source intake, dataset review session, initial profile, findings, provenance, semantic-source application, export of review outputs | Users get immediate documentation, validation, and trusted-source enrichment without manual reconstruction |
| M2: Guided Clarification | FR-012 to FR-020, FR-036, FR-038, FR-039, FR-040 | Clarification engine, resumable questions, question templates/eval, field-level semantic overrides, conflict review, progress persistence | Users can resolve ambiguity safely and preserve manual intent |
| M3: Controlled Execution | FR-021 to FR-034 | Filter extraction, template-variable mapping, warning approvals, compiled preview, SQL Lab launch, manual export path, audited run context | Users can move from recovered context to reproducible execution with clear readiness gates |

## RBAC Model

| Permission | Description | Target Role(s) |
|------------|-------------|----------------|
| `dataset:session:read` | View own review sessions | Analytics Engineer, BI Engineer, Data Steward |
| `dataset:session:manage` | Edit mappings, answer questions, override semantics | Analytics Engineer, BI Engineer |
| `dataset:session:approve` | Approve warning-level mappings | Senior Analytics Engineer, Data Steward |
| `dataset:execution:preview` | Trigger Superset SQL compilation preview | Analytics Engineer, BI Engineer |
| `dataset:execution:launch` | Create SQL Lab session in target environment | Analytics Engineer, BI Engineer |
| `dataset:execution:launch_prod` | Launch in Production-staged environment | Senior Analytics Engineer |

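
The table above can be read as a role-to-permission mapping. The sketch below transcribes it literally, so the Senior Analytics Engineer role holds only the two permissions the table lists for it; whether that role should also inherit the Analytics Engineer permissions is not stated here and is left open. `has_permission` and `required_launch_permission` are hypothetical helpers, not existing repository functions.

```python
from typing import Dict, FrozenSet, Iterable

_ENGINEER = frozenset({
    "dataset:session:read",
    "dataset:session:manage",
    "dataset:execution:preview",
    "dataset:execution:launch",
})

# Literal transcription of the RBAC table; no role inheritance is assumed.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Analytics Engineer": _ENGINEER,
    "BI Engineer": _ENGINEER,
    "Data Steward": frozenset({"dataset:session:read", "dataset:session:approve"}),
    "Senior Analytics Engineer": frozenset({
        "dataset:session:approve",
        "dataset:execution:launch_prod",
    }),
}


def has_permission(roles: Iterable[str], permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)


def required_launch_permission(is_production: bool) -> str:
    """Pick the launch permission for the target environment (hypothetical helper)."""
    return "dataset:execution:launch_prod" if is_production else "dataset:execution:launch"
```

A launch endpoint could then require `has_permission(user_roles, required_launch_permission(is_production))` in addition to the session owner check.
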
## Integration Points

### Service Reuse (Critical)
- **Superset Interaction**: Use existing `backend/src/core/superset_client.py` (do not duplicate HTTP clients).
- **LLM Interaction**: Use existing `backend/src/services/llm_provider.py` via `LLMProviderService`.
- **Notifications**: Integrate with `NotificationService` for launch outcomes and preview readiness.
- **i18n**: Use existing `frontend/src/lib/i18n/` for all user-facing strings in the review workspace.

## Rollout & Monitoring

### Feature Flags
- `ff_dataset_auto_review`: Enables basic documentation and intake.
- `ff_dataset_clarification`: Enables guided dialogue mode.
- `ff_dataset_execution`: Enables preview and launch capabilities.

### Metrics & Alerting
- **Metrics**: Session completion rate, time-to-first-summary, preview failure rate (Superset compilation errors vs connection errors), clarification engagement.
- **Alerting**: High rate of `503` Superset API failures; persistent LLM provider timeouts (> 30s); unauthorized cross-session access attempts.

## Implementation Sequencing

### Backend First
1. Add persistent review-session domain model and schemas.
2. Add orchestration services and Superset adapters.
3. Add typed API endpoints and explicit RBAC.
4. Add task/event integration and audit persistence.
5. Add backend tests for session lifecycle, preview gating, launch gating, and degradation paths.

### Frontend Next
1. Add dataset review route/workspace shell and session loading.
2. Add source-intake, summary, findings, and semantic review panels.
3. Add clarification dialog and mapping approval UI.
4. Add compiled preview and launch confirmation UI.
5. Add frontend tests for state transitions, wrappers, and critical UX invariants.

### Integration/Hardening
1. Validate Superset version compatibility against real/staged environment.
2. Verify progressive session recovery and resume flows.
3. Verify audit replay/run-context capture.
4. Measure success-criteria instrumentation feasibility.

## Testing Strategy

### Backend
- **Unit tests** for semantic ranking, provenance/conflict rules, clarification prioritization, preview gating, and launch guards.
- **Integration tests** for session persistence, Superset adapter behavior, SQL preview orchestration, and SQL Lab launch orchestration with mocked upstream responses.
- **API contract tests** for typed response schemas, RBAC enforcement, mapping approval operations, field-level semantic edits, export operations, and session lifecycle.

### Frontend
- **Unit/component tests** for state-driven UI contracts, provenance rendering, one-question clarification, mapping approval flow, stale preview handling, and launch gating visuals.
- **Integration-style route tests** for resume flows, progressive loading, and error recovery states.

### External Dependency Strategy
- Mock Superset APIs for CI determinism.
- Use stable fixtures/snapshots for LLM-produced structured outputs.
- Treat provider/transport failure as explicit degraded states rather than semantic failure.
- Include replayable fixtures for imported filters, template variables, conflict cases, and compilation errors.

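
The third point, treating provider or transport failure as a degraded state rather than a semantic failure, might look like the following in a unit-testable form. This is a sketch under assumptions: `EnrichmentStatus`, `EnrichmentResult`, and `enrich` are hypothetical names, the LLM call is injected as a plain callable so tests can substitute a fixture, and only the built-in `TimeoutError` and `ConnectionError` are treated as transport failures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class EnrichmentStatus(Enum):
    OK = "ok"
    DEGRADED = "degraded"  # provider/transport failed; trusted results are kept


@dataclass
class EnrichmentResult:
    status: EnrichmentStatus
    descriptions: Dict[str, str]
    next_action: Optional[str] = None


def enrich(trusted: Dict[str, str],
           llm_call: Callable[[], Dict[str, str]]) -> EnrichmentResult:
    """Add LLM suggestions beneath trusted-source values, degrading on outage."""
    try:
        suggested = llm_call()
    except (TimeoutError, ConnectionError) as exc:
        # Transport failure is not a semantic failure: keep what is trusted
        # and tell the caller what to do next.
        return EnrichmentResult(
            EnrichmentStatus.DEGRADED,
            dict(trusted),
            next_action=f"retry enrichment ({type(exc).__name__})",
        )
    # Trusted-source values take precedence over LLM suggestions.
    return EnrichmentResult(EnrichmentStatus.OK, {**suggested, **trusted})
```

In CI, `llm_call` can be a lambda returning a stored fixture or a function that raises `TimeoutError`, which keeps both the happy path and the degraded path deterministic.
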
## Risks & Mitigations

| Risk | Why It Matters | Mitigation |
|------|----------------|------------|
| Superset version lacks a stable compiled-preview endpoint | FR-029 and WYSIWWR depend on native Superset-side compilation | Resolve in Phase 0; if unsupported, stop and renegotiate UX/feature scope before implementation |
| Superset link/native filter formats differ across installations | Could make import brittle or partial | Design recovery as best-effort with explicit provenance and recovery-required state |
| SQL Lab launch handoff is inconsistent across environments | FR-032 requires canonical audited launch target | Research version-compatible creation strategy and define fallback as blocked, not silent substitution |
| Semantic resolution logic becomes an orchestration god-object | Hurts maintainability and contract traceability | Separate `SemanticSourceResolver`, `ClarificationEngine`, and Superset extraction responsibilities |
| Fuzzy matching creates too many false positives | Undermines trust and increases approval burden | Keep explicit confidence hierarchy, review-required fuzzy matches, and field-level selective application |
| LLM/provider outages interrupt review quality | Could block non-critical enrichment | Degrade to partial review state with preserved trusted-source results and explicit next action |
| Session lifecycle becomes hard to resume safely | FR-019 and FR-036 require resumability | Persist answers, approvals, and current recommended action as first-class session state |

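
The fuzzy-matching mitigation, together with the constraint that manual overrides are never silently overwritten, can be sketched as a per-field merge rule. The ordering in `Provenance` is an assumption made for illustration, since the actual confidence hierarchy is a Phase 0 research output, and `FieldSemantic` and `apply_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Provenance(IntEnum):
    """Hypothetical confidence hierarchy; a higher value wins."""
    LLM_GENERATED = 1
    FUZZY_MATCH = 2
    TRUSTED_SOURCE = 3
    MANUAL_OVERRIDE = 4


@dataclass(frozen=True)
class FieldSemantic:
    description: str
    provenance: Provenance
    needs_review: bool = False


def apply_candidate(current: Optional[FieldSemantic],
                    candidate: FieldSemantic) -> FieldSemantic:
    """Return the value to keep for one field after considering a candidate."""
    if candidate.provenance is Provenance.FUZZY_MATCH:
        # Fuzzy matches are always flagged for explicit review.
        candidate = FieldSemantic(candidate.description, candidate.provenance,
                                  needs_review=True)
    # A manual override has the highest rank, so only another manual edit
    # can replace it; automated sources never overwrite it.
    if current is None or candidate.provenance >= current.provenance:
        return candidate
    return current
```

Because the rule works on one field at a time, it is compatible with field-level selective application: a user can accept a trusted-source value for one column while keeping a manual override on another.
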
## Post-Design Re-Check Criteria

After Phase 1 artifacts are produced, re-check:

- semantic protocol coverage against all planned modules/components,
- UX-state coverage against `/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/ux_reference.md`,
- explicit API support for field-level semantic actions, mapping approval, exports, and session lifecycle,
- belief-state/logging expectations for Complexity 4/5 Python modules,
- typed schemas sufficient for backend/frontend parallel implementation,
- quickstart coverage of happy path plus critical negative/recovery paths.

## Complexity Tracking

> **Fill ONLY if Constitution Check has violations that must be justified**

No justified constitution violations at planning time.