Implementation Plan: LLM Dataset Orchestration

Branch: 027-dataset-llm-orchestration | Date: 2026-03-16 | Spec: /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md
Input: Feature specification from /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/spec.md

Note: This template is filled in by the /speckit.plan command. See /home/busya/dev/ss-tools/.specify/templates/plan-template.md for the execution workflow.

Summary

Deliver a dataset-centered orchestration flow that lets users start from a Superset link or dataset selection, recover analytical context, enrich semantics from trusted sources before AI generation, resolve ambiguity through guided clarification, generate a Superset-side compiled SQL preview, and launch an audited SQL Lab execution only when readiness gates pass.

The implementation will extend the existing FastAPI + SvelteKit architecture rather than creating a parallel subsystem. Backend work will add a persisted review-session domain, orchestration services for semantic recovery and clarification, Superset adapters for context extraction and SQL Lab execution, and explicit APIs for mapping approvals and field-level semantic overrides. Frontend work will add a dedicated dataset review workspace with progressive recovery, semantic-source review, one-question-at-a-time clarification, mapping approval controls, compiled SQL preview, and resumable session state.

Implementation Status

Accepted delivery to date covers the US1 automatic review slice introduced in commit feat(us1): add dataset review orchestration automatic review slice. The implemented scope includes the review-session startup flow, Superset link/context intake, trusted-source semantic enrichment, export endpoints, and the initial dataset review workspace/panels needed to render findings and readable review output.

Feature delivery also required repository-wide stabilization and compatibility collateral outside the dedicated dataset-review modules. Those follow-up fixes keep the accepted US1 slice working against the current repository baseline, including task/log API compatibility, dashboard/profile filtering behavior, Git route/repository-path hardening, report-list event handling, LLM provider encryption-key validation, and clean-release compatibility repairs exercised by shared acceptance gates. US2 guided clarification and US3 controlled execution remain planned work and are not accepted by this status note.

Technical Context

Language/Version: Python 3.9+ backend, Node.js 18+ frontend, Svelte 5 / SvelteKit frontend runtime
Primary Dependencies: FastAPI, SQLAlchemy, Pydantic, existing TaskManager, existing SupersetClient, existing LLM provider stack, SvelteKit, Tailwind CSS, frontend requestApi/fetchApi wrappers
Storage: Existing application databases for persistent session/domain entities; existing tasks database for async execution metadata; filesystem for optional uploaded semantic sources/artifacts
Testing: pytest for backend unit/integration/API tests; Vitest for frontend component/store/API-wrapper tests
Target Platform: Linux-hosted FastAPI + Svelte web application integrated with Superset
Project Type: Web application with backend API and frontend SPA
Performance Goals:

  • Initial summary generation: < 30s (progressive recovery visible in < 5s)
  • Preview compilation: < 10s
  • Session load / Resume: < 2s
  • SC-002 target: first readable summary under 5 minutes for complex datasets.

Constraints: Launch must remain blocked without successful Superset-side compiled preview; long-running recovery/enrichment/preview work must be asynchronous and observable; frontend must use existing API wrappers instead of native fetch; manual semantic overrides must never be silently overwritten; auditability and provenance are prioritized over raw throughput
Scale/Scope: One end-to-end feature spanning dataset intake, session persistence, semantic enrichment, clarification, mapping approval, preview, and launch; multiple new backend services/APIs plus a new multi-state frontend workspace
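
The constraint that manual semantic overrides must never be silently overwritten can be illustrated with a minimal sketch. The `FieldSemantics` record, its `provenance` values, and `apply_enrichment` are hypothetical names chosen for illustration; they are not taken from data-model.md.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FieldSemantics:
    """Hypothetical per-field semantic record (illustrative only)."""
    name: str
    description: Optional[str]
    provenance: str  # assumed values: "manual", "trusted_source", "llm"


def apply_enrichment(current: FieldSemantics, proposed: str, source: str) -> FieldSemantics:
    """Apply an enrichment proposal unless the field carries a manual override."""
    if current.provenance == "manual":
        # Manual intent wins; a caller could surface the proposal as a conflict instead.
        return current
    return replace(current, description=proposed, provenance=source)


manual_field = FieldSemantics("region", "Sales region (curated)", "manual")
auto_field = FieldSemantics("amount", None, "llm")

kept = apply_enrichment(manual_field, "Geographic area", "trusted_source")
updated = apply_enrichment(auto_field, "Order amount", "trusted_source")
print(kept.description, "|", updated.provenance)
```

Returning the unchanged record keeps the override intact while leaving the decision about conflict review to the caller.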

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Pre-Research Gate Assessment

  1. Semantic protocol compliance — PASS WITH REQUIRED PHASE 1 EXPANSION

    • New backend orchestration and persistence modules must follow /home/busya/dev/ss-tools/.ai/standards/semantics.md.
    • Existing draft contracts are incomplete for the feature scope; Phase 1 must add explicit contracts for semantic-source resolution, clarification lifecycle, Superset context extraction, session persistence, and missing UI states.
    • Complexity 4/5 Python modules must explicitly define logger.reason() / logger.reflect() paths; Complexity 5 boundaries must use belief_scope.
  2. Complexity-driven contract coverage — PASS WITH GAPS TO CLOSE

    • The core orchestration boundary is Complexity 5 because it gates launch, audit, state transitions, and cross-service consistency.
    • Semantic source resolution, clarification workflow, mapping approval state, and session persistence each require explicit contracts instead of being hidden inside one orchestrator.
    • UI contracts must map to the UX state machine, especially Empty, Importing, Review Ready, Semantic Source Review Needed, Clarification Active, Mapping Review Needed, Compiled Preview Ready, Run Ready, Run In Progress, Completed, and Recovery Required.
  3. UX-state compatibility — PASS

    • The architecture can support the UX reference because:
      • recovery can be progressive and asynchronous,
      • clarification can be session-backed and resumed,
      • preview generation can be represented as a stateful asynchronous action,
      • launch remains a gated terminal action.
    • If Phase 0 research later shows Superset cannot provide reliable compilation preview or SQL Lab execution hooks compatible with the required interaction model, planning must stop and the UX contract must be renegotiated.
  4. Async boundaries — PASS

    • Long-running work already fits the repository constitution through TaskManager.
    • Session start, deep context recovery, semantic enrichment from external sources, preview generation, and launch-hand-off side effects should be dispatched as tasks or internally asynchronous service steps with observable state changes.
  5. Frontend API-wrapper rules — PASS

    • Existing frontend uses /home/busya/dev/ss-tools/frontend/src/lib/api.js wrappers.
    • New frontend work must use requestApi, fetchApi, postApi, or wrapper modules only; native fetch remains forbidden.
  6. RBAC/security constraints — PASS WITH DESIGN REQUIREMENT

    • New endpoints must use existing auth and permission dependencies.
    • New orchestration actions need explicit permission modeling for reading sessions, editing semantic mappings, answering clarification prompts, generating previews, and launching runs.
    • Session data must remain self-scoped/auditable and must not permit cross-user mutation without explicit policy.
    • Action: Add DATASET_REVIEW_* permissions to backend/src/scripts/seed_permissions.py.
  7. Security & Threat Model — PASS

    • Session isolation: Every session record is strictly bound to user_id. Query filters must include owner check.
    • Audit trail: DatasetRunContext is immutable after launch.
    • Credential handling: Reuse existing SupersetClient encrypted configuration.
    • Action: API endpoints must use Depends(get_current_user) and explicit permission checks.
  8. Belief-state/logging constraints — PASS WITH REQUIRED APPLICATION

    • Complexity 4/5 Python orchestration modules will require belief_scope plus meaningful logger.reason() and logger.reflect() traces around state transitions, preview validation, warning approvals, and launch gating.
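
The session-isolation and audit-trail points in item 7 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `InMemorySessionStore`, `ReviewSession`, and the field names on `DatasetRunContext` are hypothetical, and a real implementation would apply the same owner check in its database query filters.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Dict, Optional


@dataclass(frozen=True)
class DatasetRunContext:
    """Frozen snapshot captured at launch; field names are illustrative."""
    session_id: str
    user_id: str
    compiled_sql: str


@dataclass
class ReviewSession:
    session_id: str
    user_id: str


class InMemorySessionStore:
    """Stand-in for the persistence boundary; every read applies an owner check."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ReviewSession] = {}

    def add(self, session: ReviewSession) -> None:
        self._sessions[session.session_id] = session

    def get_owned(self, session_id: str, user_id: str) -> Optional[ReviewSession]:
        session = self._sessions.get(session_id)
        if session is None or session.user_id != user_id:
            # Same result for "missing" and "not yours", so existence is not leaked.
            return None
        return session


store = InMemorySessionStore()
store.add(ReviewSession("s1", "alice"))
print(store.get_owned("s1", "alice") is not None)  # owner can read
print(store.get_owned("s1", "bob") is None)        # other users cannot

ctx = DatasetRunContext("s1", "alice", "SELECT 1")
try:
    ctx.compiled_sql = "SELECT 2"  # type: ignore[misc]
except FrozenInstanceError:
    print("run context is immutable")
```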

Post-Design Gate Assessment

  1. Semantic protocol compliance — PASS

    • All modules in contracts/modules.md follow the complexity-driven metadata requirements.
    • Relation syntax matches the canonical @RELATION: [PREDICATE] ->[TARGET_ID] format.
    • Python modules (Complexity 4/5) explicitly specify logger.reason() and belief_scope requirements in their contracts.
  2. API Schema Completeness — PASS

    • contracts/api.yaml provides a fully typed OpenAPI 3.0.3 specification.
    • Every session lifecycle, semantic review, and execution gate is covered by a typed endpoint.
  3. UX-Technical Alignment — PASS

    • Design supports the WYSIWWR principle via SupersetCompilationAdapter.
    • Fallback strategies for missing preview or SQL Lab hooks are defined in research.md.

Final Gate Result

PASS - The implementation plan and design artifacts are constitution-compliant and ready for task breakdown.

Project Structure

Documentation (this feature)

/home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── api.yaml
│   └── modules.md
└── tasks.md

Source Code (repository root)

/home/busya/dev/ss-tools/backend/
├── src/
│   ├── api/
│   │   └── routes/
│   ├── core/
│   ├── models/
│   ├── schemas/
│   └── services/

/home/busya/dev/ss-tools/frontend/
├── src/
│   ├── lib/
│   │   ├── api/
│   │   ├── components/
│   │   ├── i18n/
│   │   └── stores/
│   └── routes/

/home/busya/dev/ss-tools/backend/src/api/routes/__tests__/
/home/busya/dev/ss-tools/backend/src/services/__tests__/
/home/busya/dev/ss-tools/frontend/src/lib/**/__tests__/
/home/busya/dev/ss-tools/frontend/src/routes/**/__tests__/

Structure Decision: Use the repository's existing web-application split. Backend implementation belongs under /home/busya/dev/ss-tools/backend/src/{models,schemas,services,api/routes}. Frontend implementation belongs under /home/busya/dev/ss-tools/frontend/src/{routes,lib/components,lib/api,lib/stores}. Tests will stay adjacent to their current backend/frontend conventions.

Semantic Contract Guidance

Use this section to drive Phase 1 artifacts, especially contracts/modules.md.

Planned Critical/High-Value Modules

  • DatasetReviewOrchestrator - @COMPLEXITY: 5
  • SemanticSourceResolver - @COMPLEXITY: 4
  • ClarificationEngine - @COMPLEXITY: 4
  • SupersetContextExtractor - @COMPLEXITY: 4
  • SupersetCompilationAdapter - @COMPLEXITY: 4
  • DatasetReviewSessionRepository or equivalent persistence boundary - @COMPLEXITY: 5
  • DatasetReviewWorkspace - @COMPLEXITY: 5
  • SourceIntakePanel - @COMPLEXITY: 3
  • ValidationFindingsPanel - @COMPLEXITY: 3
  • SemanticLayerReview - @COMPLEXITY: 3
  • ClarificationDialog - @COMPLEXITY: 3
  • ExecutionMappingReview - @COMPLEXITY: 3
  • CompiledSQLPreview - @COMPLEXITY: 3
  • LaunchConfirmationPanel - @COMPLEXITY: 3

Required Semantic Rules

  • Use @COMPLEXITY or @C: as the primary rule source.
  • Match contract density to complexity:
    • Complexity 1: anchors only, @PURPOSE optional
    • Complexity 2: @PURPOSE
    • Complexity 3: @PURPOSE, @RELATION; UI also @UX_STATE
    • Complexity 4: @PURPOSE, @RELATION, @PRE, @POST, @SIDE_EFFECT; Python also meaningful logger.reason() / logger.reflect() path
    • Complexity 5: level 4 + @DATA_CONTRACT, @INVARIANT; Python also belief_scope; UI also @UX_FEEDBACK, @UX_RECOVERY, @UX_REACTIVITY
  • Write relations only in canonical form: @RELATION: [PREDICATE] ->[TARGET_ID]
  • If any relation target, DTO, or contract dependency is unknown, emit [NEED_CONTEXT: target] instead of inventing placeholders.
  • Preserve medium-appropriate anchor/comment syntax for Python, Svelte markup, and Svelte script contexts.
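
As a rough illustration of the density rules above, a contract linter could map each complexity level to its required tags and check relation lines against the canonical form. The function names and the character classes accepted for predicates and target IDs are assumptions; the UI-only and Python-only extras are omitted for brevity.

```python
import re
from typing import Dict, List, Set

# Required tags per complexity level, transcribed from the rules above.
REQUIRED_TAGS: Dict[int, Set[str]] = {
    1: set(),
    2: {"@PURPOSE"},
    3: {"@PURPOSE", "@RELATION"},
    4: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT"},
}
# Complexity 5 is level 4 plus two more tags.
REQUIRED_TAGS[5] = REQUIRED_TAGS[4] | {"@DATA_CONTRACT", "@INVARIANT"}

# Canonical form: @RELATION: [PREDICATE] ->[TARGET_ID]
# The allowed characters inside the brackets are an assumption.
RELATION_RE = re.compile(r"^@RELATION: \[[A-Z_]+\] ->\[[A-Za-z0-9_.:]+\]$")


def missing_tags(complexity: int, present: Set[str]) -> List[str]:
    """Return the required tags that a contract at this complexity lacks."""
    return sorted(REQUIRED_TAGS[complexity] - present)


def is_canonical_relation(line: str) -> bool:
    """Check a single relation line against the canonical syntax."""
    return RELATION_RE.match(line.strip()) is not None


print(missing_tags(4, {"@PURPOSE", "@RELATION"}))
print(is_canonical_relation("@RELATION: [DEPENDS_ON] ->[SupersetClient]"))
print(is_canonical_relation("@RELATION: DEPENDS_ON -> SupersetClient"))
```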

Phase 0: Research Agenda

Open Questions Requiring Resolution

  1. How to reliably extract saved native filters from supported Superset links and versions.
  2. How to discover dataset runtime template variables and Jinja placeholders using available Superset APIs and dataset payloads.
  3. How to perform a safe Superset-side compiled SQL preview compatible with the current deployment/version.
  4. How to create or bind a SQL Lab execution session as the canonical audited launch target.
  5. How to model semantic source ranking, fuzzy match review, conflict detection, and provenance without collapsing into an orchestration god-object.
  6. How to persist resumable clarification and review sessions using the current database stack.
  7. How to design typed API contracts that support field-level semantic operations, mapping approval flow, and session lifecycle operations.
  8. How to degrade gracefully when Superset import/preview or LLM enrichment only partially succeeds.
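
For question 2, one low-cost starting point (before any Superset API work) is a text scan of the dataset SQL for Jinja expressions. The sketch below is a naive heuristic under stated assumptions: it only captures the leading identifier of each `{{ ... }}` expression, does not parse Jinja, and says nothing about what Superset itself exposes. `find_template_identifiers` is a hypothetical helper name.

```python
import re
from typing import List

# Leading identifier inside a {{ ... }} expression; macro calls are captured by name too.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def find_template_identifiers(sql: str) -> List[str]:
    """Return distinct leading identifiers of Jinja expressions, in first-seen order."""
    seen: List[str] = []
    for match in PLACEHOLDER_RE.finditer(sql):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


sample_sql = (
    "SELECT * FROM sales "
    "WHERE region = '{{ region }}' AND ds >= '{{ start_date }}' "
    "AND backup_region = '{{ region }}'"
)
print(find_template_identifiers(sample_sql))
```

Identifiers found this way would still need confirmation against the dataset payload, so they fit best as candidates for mapping review rather than as resolved variables.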

Required Research Outputs

Research must produce explicit decisions for:

  • Superset link parsing and recovery strategy
  • Superset compilation/SQL Lab integration approach
  • Semantic source resolution architecture
  • Clarification session persistence model
  • Session persistence/audit model
  • API schema granularity and endpoint set
  • Test strategy for Superset-dependent and LLM-dependent flows
  • Delivery milestones for incremental rollout

Phase 1: Design Focus

Phase 1 must generate:

  • typed domain entities and DTOs in /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/data-model.md
  • expanded semantic contracts in /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/modules.md
  • typed OpenAPI schemas and missing endpoints in /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/contracts/api.yaml
  • execution and validation guide in /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/quickstart.md

Phase 1 must specifically close the current gaps around:

  • field-level semantic operations,
  • clarification engine responsibilities,
  • mapping approval endpoints,
  • session lifecycle APIs,
  • exportable outputs,
  • error-path validation scenarios,
  • alignment between UX states and UI contracts.

Delivery Milestones

| Milestone | FR Coverage | Scope | User Value |
|---|---|---|---|
| M1: Sessioned Auto Review | FR-001 to FR-011, FR-035, FR-037 | Source intake, dataset review session, initial profile, findings, provenance, semantic-source application, export of review outputs | Users get immediate documentation, validation, and trusted-source enrichment without manual reconstruction |
| M2: Guided Clarification | FR-012 to FR-020, FR-036, FR-038, FR-039, FR-040 | Clarification engine, resumable questions, question templates/eval, field-level semantic overrides, conflict review, progress persistence | Users can resolve ambiguity safely and preserve manual intent |
| M3: Controlled Execution | FR-021 to FR-034 | Filter extraction, template-variable mapping, warning approvals, compiled preview, SQL Lab launch, manual export path, audited run context | Users can move from recovered context to reproducible execution with clear readiness gates |

RBAC Model

| Permission | Description | Target Role(s) |
|---|---|---|
| dataset:session:read | View own review sessions | Analytics Engineer, BI Engineer, Data Steward |
| dataset:session:manage | Edit mappings, answer questions, override semantics | Analytics Engineer, BI Engineer |
| dataset:session:approve | Approve warning-level mappings | Senior Analytics Engineer, Data Steward |
| dataset:execution:preview | Trigger Superset SQL compilation preview | Analytics Engineer, BI Engineer |
| dataset:execution:launch | Create SQL Lab session in target environment | Analytics Engineer, BI Engineer |
| dataset:execution:launch_prod | Launch in Production-staged environment | Senior Analytics Engineer |
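
The permission table above can be encoded as a plain role-to-permission mapping for use in tests or seed data. This sketch transcribes the table literally; it does not assume that Senior Analytics Engineer inherits the Analytics Engineer permissions, since the plan does not say so. `ROLE_PERMISSIONS` and `has_permission` are hypothetical names, not the existing auth dependencies.

```python
from typing import Dict, Iterable, Set

_ENGINEER: Set[str] = {
    "dataset:session:read",
    "dataset:session:manage",
    "dataset:execution:preview",
    "dataset:execution:launch",
}

# Literal transcription of the RBAC table; no role inheritance is assumed.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "Analytics Engineer": set(_ENGINEER),
    "BI Engineer": set(_ENGINEER),
    "Data Steward": {"dataset:session:read", "dataset:session:approve"},
    "Senior Analytics Engineer": {
        "dataset:session:approve",
        "dataset:execution:launch_prod",
    },
}


def has_permission(roles: Iterable[str], permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(has_permission(["Data Steward"], "dataset:session:approve"))
print(has_permission(["BI Engineer"], "dataset:execution:launch_prod"))
```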

Integration Points

Service Reuse (Critical)

  • Superset Interaction: Use existing backend/src/core/superset_client.py (do not duplicate HTTP clients).
  • LLM Interaction: Use existing backend/src/services/llm_provider.py via LLMProviderService.
  • Notifications: Integrate with NotificationService for launch outcomes and preview readiness.
  • i18n: Use existing frontend/src/lib/i18n/ for all user-facing strings in the review workspace.

Rollout & Monitoring

Feature Flags

  • ff_dataset_auto_review: Enables basic documentation and intake.
  • ff_dataset_clarification: Enables guided dialogue mode.
  • ff_dataset_execution: Enables preview and launch capabilities.
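
One way to reason about the three flags is as a mapping from flag to the capabilities it unlocks. The capability labels below are illustrative shorthand for the descriptions above, and `enabled_capabilities` is a hypothetical helper, not an existing flag API in the repository.

```python
from typing import Dict, Mapping, Set

# Capability labels are illustrative; flag names come from the list above.
FLAG_CAPABILITIES: Dict[str, Set[str]] = {
    "ff_dataset_auto_review": {"intake", "documentation"},
    "ff_dataset_clarification": {"clarification"},
    "ff_dataset_execution": {"preview", "launch"},
}


def enabled_capabilities(flags: Mapping[str, bool]) -> Set[str]:
    """Union of capabilities for every flag that is switched on."""
    enabled: Set[str] = set()
    for flag, capabilities in FLAG_CAPABILITIES.items():
        if flags.get(flag, False):  # unknown or missing flags default to off
            enabled |= capabilities
    return enabled


m1_only = enabled_capabilities({"ff_dataset_auto_review": True})
print(sorted(m1_only))
```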

Metrics & Alerting

  • Metrics: Session completion rate, time-to-first-summary, preview failure rate (Superset compilation errors vs connection errors), clarification engagement.
  • Alerting: High rate of 503 Superset API failures; persistent LLM provider timeouts (> 30s); unauthorized cross-session access attempts.

Implementation Sequencing

Backend First

  1. Add persistent review-session domain model and schemas.
  2. Add orchestration services and Superset adapters.
  3. Add typed API endpoints and explicit RBAC.
  4. Add task/event integration and audit persistence.
  5. Add backend tests for session lifecycle, preview gating, launch gating, and degradation paths.

Frontend Next

  1. Add dataset review route/workspace shell and session loading.
  2. Add source-intake, summary, findings, and semantic review panels.
  3. Add clarification dialog and mapping approval UI.
  4. Add compiled preview and launch confirmation UI.
  5. Add frontend tests for state transitions, wrappers, and critical UX invariants.

Integration/Hardening

  1. Validate Superset version compatibility against real/staged environment.
  2. Verify progressive session recovery and resume flows.
  3. Verify audit replay/run-context capture.
  4. Measure success-criteria instrumentation feasibility.

Testing Strategy

Backend

  • Unit tests for semantic ranking, provenance/conflict rules, clarification prioritization, preview gating, and launch guards.
  • Integration tests for session persistence, Superset adapter behavior, SQL preview orchestration, and SQL Lab launch orchestration with mocked upstream responses.
  • API contract tests for typed response schemas, RBAC enforcement, mapping approval operations, field-level semantic edits, export operations, and session lifecycle.
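
A unit test for launch guards might look like the sketch below. The guard itself is an assumption about how gating could be expressed: `LaunchInputs`, the fingerprint comparison used to detect a stale preview, and the blocker labels are all hypothetical. The test functions use plain asserts, so pytest would collect them as written.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchInputs:
    """Hypothetical inputs to the launch gate."""
    preview_fingerprint: Optional[str]  # inputs the last successful preview was compiled from
    current_fingerprint: str            # inputs as they stand now
    unapproved_warnings: int


def launch_blockers(inputs: LaunchInputs) -> List[str]:
    """Return the reasons launch must stay blocked; an empty list means ready."""
    blockers: List[str] = []
    if inputs.preview_fingerprint is None:
        blockers.append("preview_missing")
    elif inputs.preview_fingerprint != inputs.current_fingerprint:
        blockers.append("preview_stale")
    if inputs.unapproved_warnings > 0:
        blockers.append("warnings_unapproved")
    return blockers


def test_launch_blocked_without_preview() -> None:
    assert launch_blockers(LaunchInputs(None, "abc", 0)) == ["preview_missing"]


def test_launch_blocked_when_preview_is_stale() -> None:
    assert launch_blockers(LaunchInputs("old", "new", 0)) == ["preview_stale"]


def test_launch_allowed_when_all_gates_pass() -> None:
    assert launch_blockers(LaunchInputs("abc", "abc", 0)) == []
```

Returning a list of blockers rather than a boolean lets the API and UI show every unmet gate at once.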

Frontend

  • Unit/component tests for state-driven UI contracts, provenance rendering, one-question clarification, mapping approval flow, stale preview handling, and launch gating visuals.
  • Integration-style route tests for resume flows, progressive loading, and error recovery states.

External Dependency Strategy

  • Mock Superset APIs for CI determinism.
  • Use stable fixtures/snapshots for LLM-produced structured outputs.
  • Treat provider/transport failure as explicit degraded states rather than semantic failure.
  • Include replayable fixtures for imported filters, template variables, conflict cases, and compilation errors.
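
Treating provider failure as a degraded state rather than a semantic failure could be sketched as follows. `ProviderUnavailable`, `EnrichmentStatus`, and `enrich` are hypothetical names, and the rule that trusted-source values take precedence over LLM suggestions is an assumption made for this example.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class EnrichmentStatus(Enum):
    COMPLETE = "complete"
    DEGRADED_PROVIDER_UNAVAILABLE = "degraded_provider_unavailable"


class ProviderUnavailable(Exception):
    """Hypothetical transport/provider error raised by the LLM call."""


def enrich(
    trusted: Dict[str, str],
    llm_call: Callable[[], Dict[str, str]],
) -> Tuple[EnrichmentStatus, Dict[str, str]]:
    """Merge LLM suggestions under trusted-source results, degrading on provider failure."""
    try:
        suggestions = llm_call()
    except (ProviderUnavailable, TimeoutError):
        # Keep what the trusted sources already gave us and report the degraded state.
        return EnrichmentStatus.DEGRADED_PROVIDER_UNAVAILABLE, dict(trusted)
    merged = dict(suggestions)
    merged.update(trusted)  # assumed precedence: trusted sources win
    return EnrichmentStatus.COMPLETE, merged


def failing_call() -> Dict[str, str]:
    raise ProviderUnavailable("provider down")


status, result = enrich({"region": "Sales region"}, failing_call)
print(status.value, result)
```

In a test suite, `failing_call` plays the role of a replayable fixture for the outage case.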

Risks & Mitigations

| Risk | Why It Matters | Mitigation |
|---|---|---|
| Superset version lacks a stable compiled-preview endpoint | FR-029 and WYSIWWR depend on native Superset-side compilation | Resolve in Phase 0; if unsupported, stop and renegotiate UX/feature scope before implementation |
| Superset link/native filter formats differ across installations | Could make import brittle or partial | Design recovery as best-effort with explicit provenance and recovery-required state |
| SQL Lab launch handoff is inconsistent across environments | FR-032 requires canonical audited launch target | Research version-compatible creation strategy and define fallback as blocked, not silent substitution |
| Semantic resolution logic becomes an orchestration god-object | Hurts maintainability and contract traceability | Separate SemanticSourceResolver, ClarificationEngine, and Superset extraction responsibilities |
| Fuzzy matching creates too many false positives | Undermines trust and increases approval burden | Keep explicit confidence hierarchy, review-required fuzzy matches, and field-level selective application |
| LLM/provider outages interrupt review quality | Could block non-critical enrichment | Degrade to partial review state with preserved trusted-source results and explicit next action |
| Session lifecycle becomes hard to resume safely | FR-019 and FR-036 require resumability | Persist answers, approvals, and current recommended action as first-class session state |

Post-Design Re-Check Criteria

After Phase 1 artifacts are produced, re-check:

  • semantic protocol coverage against all planned modules/components,
  • UX-state coverage against /home/busya/dev/ss-tools/specs/027-dataset-llm-orchestration/ux_reference.md,
  • explicit API support for field-level semantic actions, mapping approval, exports, and session lifecycle,
  • belief-state/logging expectations for Complexity 4/5 Python modules,
  • typed schemas sufficient for backend/frontend parallel implementation,
  • quickstart coverage of happy path plus critical negative/recovery paths.

Complexity Tracking

Fill ONLY if Constitution Check has violations that must be justified

No justified constitution violations at planning time.