Move dataset review clarification into the assistant workspace and rework the review page into a chat-centric layout with execution rails. Add session-scoped assistant actions for mappings, semantic fields, and SQL preview generation. Introduce optimistic locking for dataset review mutations, propagate session versions through API responses, and mask imported filter values before assistant exposure. Refresh tests, i18n, and spec artifacts to match the new workflow. BREAKING CHANGE: dataset review mutation endpoints now require the X-Session-Version header, and clarification is no longer handled through ClarificationDialog-based flows
Semantic Module Contracts: LLM Dataset Orchestration
Feature: LLM Dataset Orchestration
Branch: 027-dataset-llm-orchestration
This document defines the semantic contracts for the core components of the Dataset LLM Orchestration feature, following the GRACE-Poly Standard.
1. Backend Modules
[DEF:DatasetReviewOrchestrator:Module]
@COMPLEXITY: 5
@PURPOSE: Coordinate the full dataset review session lifecycle across intake, recovery, semantic review, clarification, mapping review, preview generation, and launch.
@LAYER: Domain
@RELATION: [DEPENDS_ON] ->[DatasetReviewSessionRepository]
@RELATION: [DEPENDS_ON] ->[SemanticSourceResolver]
@RELATION: [DEPENDS_ON] ->[ClarificationEngine]
@RELATION: [DEPENDS_ON] ->[SupersetContextExtractor]
@RELATION: [DEPENDS_ON] ->[SupersetCompilationAdapter]
@RELATION: [DEPENDS_ON] ->[TaskManager]
@RELATION: [EXPOSES_STATE_TO] ->[AssistantApi]
@PRE: session mutations must execute inside a persisted session boundary scoped to one authenticated user.
@POST: state transitions are persisted atomically and emit observable progress for long-running steps.
@SIDE_EFFECT: creates task records, updates session aggregates, triggers upstream Superset calls, persists audit artifacts.
@DATA_CONTRACT: Input[SessionCommand] -> Output[DatasetReviewSession | CompiledPreview | DatasetRunContext]
@INVARIANT: Launch is blocked unless the current session has no open blocking findings, all launch-sensitive mappings are approved, and a non-stale Superset-generated compiled preview matches the current input fingerprint.
@TEST_CONTRACT: start_or_resume_session -> returns persisted session shell with recommended next action
@TEST_SCENARIO: launch_gate_blocks_stale_preview -> launch rejected when preview fingerprint no longer matches current mapping inputs
@TEST_EDGE: missing_dataset_ref -> blocking failure
@TEST_EDGE: stale_preview -> blocking failure
@TEST_EDGE: sql_lab_launch_failure -> terminal failed launch state with audit record
@TEST_INVARIANT: launch_gate -> VERIFIED_BY: [launch_gate_blocks_stale_preview]
ƒ start_session
@PURPOSE: Initialize a new session from a Superset link or dataset selection and trigger context recovery.
@PRE: source input is non-empty and environment is accessible.
@POST: session exists in persisted storage with intake/recovery state and task linkage when async work is required.
@SIDE_EFFECT: persists session and may enqueue recovery task.
ƒ apply_semantic_source
@PURPOSE: Apply a selected semantic source and update field-level candidate/decision state.
@PRE: source exists and session is not terminal.
@POST: semantic field entries and findings reflect selected-source outcomes without overwriting locked manual values.
@SIDE_EFFECT: updates semantic decisions and conflict findings.
ƒ record_clarification_answer
@PURPOSE: Persist one clarification answer and re-evaluate profile, findings, and readiness.
@PRE: target question belongs to the session’s active clarification session.
@POST: answer is saved before current-question pointer advances.
@SIDE_EFFECT: updates clarification and finding state.
ƒ prepare_launch_preview
@PURPOSE: Assemble effective execution inputs and trigger Superset-side preview compilation.
@PRE: all required variables have candidate values or explicitly accepted defaults.
@POST: returns preview artifact in pending, ready, failed, or stale state.
@SIDE_EFFECT: persists preview attempt and upstream compilation diagnostics.
ƒ launch_dataset
@PURPOSE: Start the approved dataset execution through SQL Lab and persist run context for audit/replay.
@PRE: session is run-ready and compiled preview is current.
@POST: returns persisted run context with SQL Lab session reference and launch outcome.
@SIDE_EFFECT: creates SQL Lab execution session and audit snapshot.
[/DEF:DatasetReviewOrchestrator:Module]
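The launch-gate invariant of DatasetReviewOrchestrator can be illustrated with a minimal sketch. This is not the orchestrator's implementation: `SessionState`, `Preview`, `input_fingerprint`, and `launch_blockers` are hypothetical names, and the fingerprint scheme (SHA-256 over canonical JSON) is an assumption made only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Preview:
    fingerprint: str  # fingerprint of the inputs the preview was compiled from
    status: str       # "pending" | "ready" | "failed" | "stale"


@dataclass
class SessionState:
    open_blocking_findings: int
    unapproved_launch_sensitive_mappings: int
    effective_inputs: dict
    preview: Optional[Preview] = None


def input_fingerprint(effective_inputs: dict) -> str:
    # Assumed scheme: SHA-256 over canonical JSON with sorted keys.
    canonical = json.dumps(effective_inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def launch_blockers(session: SessionState) -> List[str]:
    """Return the reasons launch is blocked; an empty list means launch may proceed."""
    blockers: List[str] = []
    if session.open_blocking_findings > 0:
        blockers.append("open_blocking_findings")
    if session.unapproved_launch_sensitive_mappings > 0:
        blockers.append("unapproved_mappings")
    preview = session.preview
    if preview is None or preview.status != "ready":
        blockers.append("preview_not_ready")
    elif preview.fingerprint != input_fingerprint(session.effective_inputs):
        # Inputs changed after the preview was compiled.
        blockers.append("stale_preview")
    return blockers
```

Returning the list of blockers rather than a boolean keeps the rejection reason available for findings and audit records, which is how the launch_gate_blocks_stale_preview scenario could be asserted.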
[DEF:DatasetReviewSessionRepository:Module]
@COMPLEXITY: 5
@PURPOSE: Persist and retrieve dataset review session aggregates, including readiness, findings, semantic decisions, clarification state, previews, and run contexts.
@LAYER: Domain
@RELATION: [DEPENDS_ON] ->[DatasetReviewSession]
@RELATION: [DEPENDS_ON] ->[DatasetProfile]
@RELATION: [DEPENDS_ON] ->[ValidationFinding]
@RELATION: [DEPENDS_ON] ->[CompiledPreview]
@PRE: repository operations execute within authenticated request or task scope.
@POST: session aggregate reads are structurally consistent and writes preserve ownership and version semantics.
@SIDE_EFFECT: reads/writes application persistence layer.
@DATA_CONTRACT: Input[SessionMutation] -> Output[PersistedSessionAggregate]
@INVARIANT: answers, mapping approvals, preview artifacts, and launch snapshots are never attributed to the wrong user or session.
@TEST_CONTRACT: save_then_resume -> persisted session can be reopened without losing semantic/manual/clarification state
@TEST_SCENARIO: resume_session_preserves_manual_overrides -> locked semantic fields remain active after reload
@TEST_EDGE: foreign_user_access -> rejected
@TEST_EDGE: missing_session -> not found
@TEST_EDGE: partial_preview_snapshot -> preserved but not marked launchable
@TEST_INVARIANT: ownership_scope -> VERIFIED_BY: [foreign_user_access]
ƒ create_session
@PURPOSE: Persist initial session shell.
ƒ load_session_detail
@PURPOSE: Return the full session aggregate for API/frontend use.
ƒ save_profile_and_findings
@PURPOSE: Persist profile and validation state together.
ƒ save_preview
@PURPOSE: Persist compiled preview attempt and mark older fingerprints stale.
ƒ save_run_context
@PURPOSE: Persist immutable launch audit snapshot.
[/DEF:DatasetReviewSessionRepository:Module]
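The ownership and version semantics of DatasetReviewSessionRepository can be sketched with an in-memory stand-in. Everything here is hypothetical (class names, exception names, and the `save_mutation` method); the real repository writes to the application persistence layer. The sketch assumes the expected version is supplied by the caller, for example from the X-Session-Version header.

```python
from dataclasses import dataclass, field
from typing import Dict


class SessionNotFound(Exception):
    pass


class ForeignUserAccess(Exception):
    pass


class VersionConflict(Exception):
    pass


@dataclass
class StoredSession:
    session_id: str
    owner_id: str
    version: int = 1
    payload: dict = field(default_factory=dict)


class InMemorySessionRepository:
    """Illustrative stand-in; not the production repository."""

    def __init__(self) -> None:
        self._rows: Dict[str, StoredSession] = {}

    def create_session(self, session_id: str, owner_id: str) -> StoredSession:
        row = StoredSession(session_id, owner_id)
        self._rows[session_id] = row
        return row

    def _load_owned(self, session_id: str, user_id: str) -> StoredSession:
        row = self._rows.get(session_id)
        if row is None:
            raise SessionNotFound(session_id)
        if row.owner_id != user_id:
            # Ownership scope: never expose or mutate another user's session.
            raise ForeignUserAccess(session_id)
        return row

    def save_mutation(self, session_id: str, user_id: str,
                      expected_version: int, changes: dict) -> StoredSession:
        row = self._load_owned(session_id, user_id)
        if row.version != expected_version:
            # Optimistic lock: the caller acted on an outdated snapshot.
            raise VersionConflict(f"expected {expected_version}, found {row.version}")
        row.payload.update(changes)
        row.version += 1
        return row
```

A second write that reuses an old version fails with `VersionConflict` instead of silently overwriting the first, and the caller is expected to reload the session and retry.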
[DEF:SemanticSourceResolver:Module]
@COMPLEXITY: 4
@PURPOSE: Resolve, rank, and apply semantic metadata candidates from files, connected dictionaries, reference datasets, and AI generation fallback.
@LAYER: Domain
@RELATION: [DEPENDS_ON] ->[LLMProviderService]
@RELATION: [DEPENDS_ON] ->[SemanticSource]
@RELATION: [DEPENDS_ON] ->[SemanticFieldEntry]
@RELATION: [DEPENDS_ON] ->[SemanticCandidate]
@PRE: selected source and target field set must be known.
@POST: candidate ranking follows the configured confidence hierarchy and unresolved fuzzy matches remain reviewable.
@SIDE_EFFECT: may create conflict findings and semantic candidate records.
@DATA_CONTRACT: Input[SemanticSourceSelection | SemanticFieldSet | ManualFieldDecision] -> Output[SemanticCandidateSet | RankedSemanticResolution | ValidationFindingSet]
@INVARIANT: Manual overrides are never silently replaced by imported, inferred, or AI-generated values.
@TEST_CONTRACT: rank_candidates -> exact dictionary beats reference import beats fuzzy beats AI draft
@TEST_SCENARIO: manual_lock_survives_reimport -> locked field remains active after another source is applied
@TEST_EDGE: malformed_source_payload -> failed source application with explanatory finding
@TEST_EDGE: conflicting_sources -> conflict state preserved for review
@TEST_EDGE: no_trusted_matches -> AI draft fallback only
@TEST_INVARIANT: confidence_hierarchy -> VERIFIED_BY: [rank_candidates]
ƒ resolve_from_file
@PURPOSE: Normalize uploaded semantic file records into field-level candidates.
ƒ resolve_from_dictionary
@PURPOSE: Resolve candidates from connected tabular dictionary sources.
ƒ resolve_from_reference_dataset
@PURPOSE: Reuse semantic metadata from trusted Superset datasets.
ƒ rank_candidates
@PURPOSE: Apply confidence ordering and determine best candidate per field.
ƒ detect_conflicts
@PURPOSE: Mark competing candidate sets that require explicit user review.
ƒ apply_field_decision
@PURPOSE: Accept, reject, or manually override a field-level semantic value.
[/DEF:SemanticSourceResolver:Module]
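The confidence hierarchy and manual-lock invariant of SemanticSourceResolver can be sketched as follows. The ordering (exact dictionary, then reference import, then fuzzy match, then AI draft) comes from the rank_candidates test contract; the types `Confidence`, `Candidate`, and `FieldEntry` and the helper `apply_best_candidate` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Confidence(IntEnum):
    # Higher value wins during ranking.
    AI_DRAFT = 1
    FUZZY_MATCH = 2
    REFERENCE_IMPORT = 3
    EXACT_DICTIONARY = 4


@dataclass(frozen=True)
class Candidate:
    field_name: str
    value: str
    confidence: Confidence


@dataclass
class FieldEntry:
    field_name: str
    value: Optional[str] = None
    locked_manual: bool = False  # set when the user overrides the value manually


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from most to least trusted; ties keep input order."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def apply_best_candidate(entry: FieldEntry, candidates: List[Candidate]) -> FieldEntry:
    """Apply the top-ranked candidate unless the field is locked by a manual override."""
    if entry.locked_manual:
        return entry
    ranked = rank_candidates([c for c in candidates if c.field_name == entry.field_name])
    if ranked:
        entry.value = ranked[0].value
    return entry
```

Checking the lock before ranking is what lets a manually overridden field survive a later re-import, as in the manual_lock_survives_reimport scenario.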
[DEF:ClarificationEngine:Module]
@COMPLEXITY: 4
@PURPOSE: Manage mixed-initiative clarification sessions, including prioritized agent prompts, answer persistence, assistant routing, and readiness impact updates.
@LAYER: Domain
@RELATION: [DEPENDS_ON] ->[ClarificationSession]
@RELATION: [DEPENDS_ON] ->[ClarificationQuestion]
@RELATION: [DEPENDS_ON] ->[ClarificationAnswer]
@RELATION: [DEPENDS_ON] ->[ValidationFinding]
@RELATION: [DISPATCHES] ->[AssistantChatPanel]
@PRE: target session contains unresolved or contradictory review state.
@POST: every recorded answer updates the clarification session and associated session state deterministically, and the next agent prompt is routable through assistant chat.
@SIDE_EFFECT: creates clarification questions, persists answers, updates findings/profile state, emits assistant-routable clarification prompts.
@DATA_CONTRACT: Input[ClarificationSessionState | ClarificationAnswerCommand] -> Output[ClarificationQuestionPayload | ClarificationProgressSnapshot | SessionReadinessDelta]
@INVARIANT: Clarification answers are persisted before the current question pointer or readiness state is advanced.
@TEST_CONTRACT: next_question_selection -> returns only one highest-priority unresolved question at a time
@TEST_SCENARIO: save_and_resume_clarification -> reopening session restores current question and prior answers
@TEST_EDGE: skipped_question -> unresolved topic remains visible
@TEST_EDGE: expert_review_marked -> topic deferred without false resolution
@TEST_EDGE: duplicate_answer_submission -> idempotent or rejected deterministically
@TEST_INVARIANT: single_active_question -> VERIFIED_BY: [next_question_selection]
ƒ start_or_resume
@PURPOSE: Open clarification mode on the highest-priority unresolved question.
ƒ build_question_payload
@PURPOSE: Return question, why-it-matters text, current guess, and suggested options for assistant-chat delivery.
ƒ record_answer
@PURPOSE: Persist one answer and compute state impact.
ƒ summarize_progress
@PURPOSE: Produce the clarification change summary shown on exit or pause.
[/DEF:ClarificationEngine:Module]
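The single-active-question rule and the persist-before-advance invariant of ClarificationEngine can be sketched as below. The data shapes, the `persist` callback, and the convention that a larger `priority` means more urgent are assumptions for illustration only. Duplicate submissions are rejected here, which is one of the two behaviours the duplicate_answer_submission edge allows.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Question:
    question_id: str
    priority: int          # assumption: larger means more urgent
    status: str = "open"   # "open" | "answered"


@dataclass
class ClarificationState:
    questions: List[Question]
    answers: Dict[str, str] = field(default_factory=dict)
    current_question_id: Optional[str] = None


def next_question(state: ClarificationState) -> Optional[Question]:
    """Return the single highest-priority open question, or None when all are resolved."""
    open_questions = [q for q in state.questions if q.status == "open"]
    return max(open_questions, key=lambda q: q.priority, default=None)


def start_or_resume(state: ClarificationState) -> Optional[Question]:
    """Resume at the stored pointer, or select the next question if none is active."""
    if state.current_question_id is None:
        nxt = next_question(state)
        state.current_question_id = nxt.question_id if nxt else None
    return next((q for q in state.questions
                 if q.question_id == state.current_question_id), None)


def record_answer(state: ClarificationState, question_id: str, answer: str,
                  persist: Callable[[str, str], None]) -> Optional[Question]:
    if question_id != state.current_question_id:
        # Covers duplicate submissions: an answered question is no longer current.
        raise ValueError("question is not the active clarification question")
    persist(question_id, answer)  # if this raises, no in-memory state has changed
    state.answers[question_id] = answer
    for q in state.questions:
        if q.question_id == question_id:
            q.status = "answered"
    nxt = next_question(state)
    state.current_question_id = nxt.question_id if nxt else None
    return nxt
```

Because `persist` runs first, a storage failure leaves the question pointer and the answer map untouched, so a resumed session still shows the same active question.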
[DEF:SupersetContextExtractor:Module]
@COMPLEXITY: 4
@PURPOSE: Recover dataset, dashboard, filter, and runtime-template context from Superset links and related API payloads.
@LAYER: Infra
@RELATION: [DEPENDS_ON] ->[ImportedFilter]
@RELATION: [DEPENDS_ON] ->[TemplateVariable]
@RELATION: [DEPENDS_ON] ->[SupersetClient]
@DATA_CONTRACT: Input[SupersetLink | DatasetReference | EnvironmentContext] -> Output[RecoveredSupersetContext | ImportedFilterSet | TemplateVariableSet | RecoverySummary]
@PRE: Superset link or dataset reference must be parseable enough to resolve an environment-scoped target resource.
@POST: returns the best available recovered context with explicit provenance and partial-recovery markers when necessary.
@SIDE_EFFECT: performs upstream Superset API reads.
@INVARIANT: Partial recovery is surfaced explicitly and never misrepresented as fully confirmed context.
@TEST_CONTRACT: recover_context_from_link -> output distinguishes URL-derived, native-filter-derived, and unresolved context
@TEST_SCENARIO: partial_filter_recovery_marks_recovery_required -> session remains usable but not falsely complete
@TEST_EDGE: unsupported_link_shape -> intake failure with actionable finding
@TEST_EDGE: dataset_without_filters -> successful dataset recovery with empty imported filter set
@TEST_EDGE: missing_dashboard_binding -> partial recovery only
@TEST_INVARIANT: provenance_visibility -> VERIFIED_BY: [recover_context_from_link]
ƒ parse_superset_link
@PURPOSE: Extract candidate identifiers and query state from supported Superset URLs.
ƒ recover_imported_filters
@PURPOSE: Build imported filter entries from URL state and Superset-side saved context.
ƒ discover_template_variables
@PURPOSE: Detect runtime variables and Jinja references from dataset query-bearing fields.
ƒ build_recovery_summary
@PURPOSE: Summarize recovered, partial, and unresolved context for session state and UX.
[/DEF:SupersetContextExtractor:Module]
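The partial-recovery invariant of SupersetContextExtractor can be illustrated with a sketch of a recovery summary. The provenance labels mirror the recover_context_from_link contract (URL-derived, native-filter-derived, unresolved); the `ImportedFilter` shape, the `dashboard_binding_missing` flag, and the summary keys are hypothetical and do not reflect actual Superset payloads.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PROVENANCES = ("url", "native_filter", "unresolved")


@dataclass(frozen=True)
class ImportedFilter:
    name: str
    value: Optional[str]
    provenance: str  # one of PROVENANCES


def build_recovery_summary(filters: List[ImportedFilter],
                           dashboard_binding_missing: bool = False) -> Dict[str, object]:
    """Group filter names by provenance and flag partial recovery explicitly."""
    names_by_provenance: Dict[str, List[str]] = {p: [] for p in PROVENANCES}
    for f in filters:
        if f.provenance not in names_by_provenance:
            raise ValueError(f"unknown provenance: {f.provenance}")
        names_by_provenance[f.provenance].append(f.name)
    # Any unresolved filter or a missing dashboard binding downgrades the status.
    partial = bool(names_by_provenance["unresolved"]) or dashboard_binding_missing
    return {
        "status": "partial" if partial else "recovered",
        "url_derived": names_by_provenance["url"],
        "native_filter_derived": names_by_provenance["native_filter"],
        "unresolved": names_by_provenance["unresolved"],
    }
```

An empty filter list yields a `recovered` status with empty groups, matching the dataset_without_filters edge, while a single unresolved filter is enough to mark the whole summary as `partial`.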
[DEF:SupersetCompilationAdapter:Module]
@COMPLEXITY: 4
@PURPOSE: Interact with Superset preview compilation and SQL Lab execution endpoints using the current approved execution context.
@LAYER: Infra
@RELATION: [DEPENDS_ON] ->[CompiledPreview]
@RELATION: [DEPENDS_ON] ->[DatasetRunContext]
@RELATION: [DEPENDS_ON] ->[SupersetClient]
@DATA_CONTRACT: Input[ApprovedExecutionContext | PreviewFingerprint | LaunchRequest] -> Output[CompiledPreview | PreviewFailureArtifact | DatasetRunContext | LaunchFailureAudit]
@PRE: effective template params and dataset execution reference are available.
@POST: preview and launch calls return Superset-originated artifacts or explicit errors.
@SIDE_EFFECT: performs upstream Superset preview and SQL Lab calls.
@INVARIANT: The adapter never fabricates compiled SQL locally; preview truth is delegated to Superset only.
@TEST_CONTRACT: compile_then_launch -> launch uses the same effective input fingerprint verified in preview
@TEST_SCENARIO: preview_failure_blocks_launch -> no SQL Lab session is created after failed preview
@TEST_EDGE: compilation_endpoint_error -> failed preview artifact with readable diagnostics
@TEST_EDGE: sql_lab_creation_error -> failed launch audit state
@TEST_EDGE: fingerprint_mismatch -> launch rejected
@TEST_INVARIANT: superset_truth_source -> VERIFIED_BY: [preview_failure_blocks_launch]
ƒ compile_preview
@PURPOSE: Request Superset-side compiled SQL preview for the current effective inputs.
ƒ mark_preview_stale
@PURPOSE: Invalidate previous preview after mapping or value changes.
ƒ create_sql_lab_session
@PURPOSE: Create the canonical audited execution session after all launch gates pass.
[/DEF:SupersetCompilationAdapter:Module]
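The rule that SupersetCompilationAdapter never fabricates compiled SQL locally can be sketched with an injected client. `SupersetClientLike` and its `compile_sql` method are hypothetical placeholders, not the real SupersetClient interface, and the `CompiledPreview` fields shown are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class SupersetClientLike(Protocol):
    # Hypothetical interface; the real SupersetClient may differ.
    def compile_sql(self, dataset_ref: str, template_params: dict) -> str: ...


@dataclass
class CompiledPreview:
    fingerprint: str
    status: str  # "ready" | "failed" | "stale"
    compiled_sql: Optional[str] = None
    diagnostics: Optional[str] = None


def compile_preview(client: SupersetClientLike, dataset_ref: str,
                    template_params: dict, fingerprint: str) -> CompiledPreview:
    """Delegate compilation upstream; on error return a failed artifact, never local SQL."""
    try:
        sql = client.compile_sql(dataset_ref, template_params)
    except Exception as exc:
        return CompiledPreview(fingerprint, "failed", diagnostics=str(exc))
    return CompiledPreview(fingerprint, "ready", compiled_sql=sql)


def mark_preview_stale(preview: CompiledPreview, current_fingerprint: str) -> CompiledPreview:
    """Invalidate a ready preview whose inputs no longer match the current fingerprint."""
    if preview.status == "ready" and preview.fingerprint != current_fingerprint:
        preview.status = "stale"
    return preview


def can_create_sql_lab_session(preview: CompiledPreview, current_fingerprint: str) -> bool:
    """Launch is allowed only for a ready preview compiled from the current inputs."""
    return preview.status == "ready" and preview.fingerprint == current_fingerprint
```

A failed preview carries diagnostics but no SQL, so `can_create_sql_lab_session` returns `False` for it, which corresponds to the preview_failure_blocks_launch scenario.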
2. Frontend Components
ƒ handleSourceSubmit
ƒ handleResumeSession
ƒ handleLaunch
ƒ submitSessionScopedMessage
ƒ renderConfirmationCard
ƒ highlightWorkspaceTarget
ƒ handleSessionScopedMessage
ƒ dispatchDatasetReviewIntent
ƒ submitSupersetLink
ƒ submitDatasetSelection
ƒ groupFindingsBySeverity
ƒ jumpToFindingTarget
ƒ applyManualOverride
ƒ applyCandidateSelection
Retired Contract: ClarificationDialog
ClarificationDialog is retired as part of the feature 027 rebaseline. Its responsibilities move into AssistantChatPanel, which now owns clarification presentation, free-form dataset questions, and confirmation-card interactions inside the global assistant drawer.
ƒ approveMapping
ƒ overrideMappingValue
ƒ requestPreview
ƒ showPreviewErrorTarget
ƒ buildLaunchSummary
ƒ confirmLaunch
3. Contract Coverage Notes
The feature requires:
- dedicated semantic resolution contracts instead of hiding source-ranking logic inside orchestration,
- a first-class clarification engine because guided ambiguity resolution is a persisted workflow, not a simple endpoint,
- a Superset extraction boundary distinct from preview/launch behavior,
- UI contracts that cover the UX state machine rather than only the happy path.
These contracts are intended to align directly with: