Move dataset review clarification into the assistant workspace and rework the review page into a chat-centric layout with execution rails. Add session-scoped assistant actions for mappings, semantic fields, and SQL preview generation. Introduce optimistic locking for dataset review mutations, propagate session versions through API responses, and mask imported filter values before assistant exposure. Refresh tests, i18n, and spec artifacts to match the new workflow. BREAKING CHANGE: dataset review mutation endpoints now require the X-Session-Version header, and clarification is no longer handled through ClarificationDialog-based flows
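The optimistic-locking contract introduced by the breaking change could be sketched roughly as follows. This is an illustrative assumption, not the actual ss-tools implementation: the in-memory store, field names, and exception type are placeholders standing in for the real session backend.

```python
class SessionVersionConflict(Exception):
    """Raised when the client's X-Session-Version lags the stored session."""

# Hypothetical in-memory store standing in for the real session backend.
_sessions = {"sess-1": {"version": 3, "state": {"mappings": []}}}

def apply_mutation(session_id, client_version, mutate):
    """Apply a review mutation only if the client saw the latest version."""
    session = _sessions[session_id]
    if client_version != session["version"]:
        raise SessionVersionConflict(
            f"expected {session['version']}, got {client_version}"
        )
    mutate(session["state"])
    session["version"] += 1  # the new version is echoed back in the response
    return session["version"]
```

A client would read the session version from each API response and send it back in the X-Session-Version header on the next mutation; a stale version yields a conflict instead of a silent overwrite.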
Feature Specification: LLM Dataset Orchestration
Feature Branch: 027-dataset-llm-orchestration
Created: 2026-03-16
Status: Draft
Input: User description: "I want to work out a mechanism for LLM documentation and validation of datasets, both in automatic mode and in a dialogue mode with an agent that clarifies attributes and other implicit details. We also need a mechanism for launching datasets on the Superset side, with support for Jinja templates. Ideally, the user should feed in a Superset link with saved native filters, and ss-tools should extract all the filters and assemble them for the dataset."
Clarifications
Session 2026-03-16
- Q: Which execution target should be canonical for approved dataset launch? → A: Superset SQL Lab session is the canonical audited launch target.
- Q: What user action should be required to clear mapping warnings before launch? → A: Any mapping warning requires explicit user approval, but manual edit is optional.
- Q: What should happen if Superset-side SQL compilation is unavailable before launch? → A: Launch stays blocked until Superset-side compiled preview succeeds.
User Scenarios & Testing (mandatory)
User Story 1 - Recover, enrich, and explain dataset context automatically (Priority: P1)
A data engineer or analytics engineer submits a dataset or a Superset link and immediately receives a readable explanation of what the dataset is, which filters were recovered, which semantic labels were reused from trusted sources, and what still needs review.
Why this priority: The first user need is fast understanding with minimal reinvention. Without an immediate and trustworthy first-pass interpretation, neither clarification nor execution provides value.
Independent Test: Can be fully tested by submitting a dataset with partial metadata or a Superset link with saved filters and verifying that the system produces a business-readable summary, distinguishes source confidence, searches trusted semantic sources before generating new labels, and shows the next recommended action without requiring manual dialogue.
Acceptance Scenarios:
- Given a dataset with partial technical metadata, When the user starts automatic review, Then the system generates a business-readable documentation draft, groups known and unresolved attributes, and presents a current readiness state.
- Given a valid Superset link with reusable saved native filters, When the user imports it, Then the system recovers the available filter context and presents imported values separately from inferred or user-provided values.
- Given connected dictionaries, spreadsheet sources, or trusted reference datasets are available, When automatic review runs, Then the system attempts semantic enrichment from those sources before creating AI-generated labels from scratch.
- Given multiple semantic candidates exist for a field, When the first summary is shown, Then the system clearly indicates the provenance and confidence level of the chosen or suggested semantic value.
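The provenance-and-confidence ordering in the scenarios above can be sketched as a simple ranking; the enum members and candidate fields are illustrative assumptions, not the shipped data model.

```python
from enum import IntEnum

class Provenance(IntEnum):
    # Lower value = higher confidence, matching the spec's visible hierarchy.
    EXACT_DICTIONARY = 0
    TRUSTED_REFERENCE_DATASET = 1
    FUZZY_MATCH = 2
    AI_DRAFT = 3

def pick_candidate(candidates):
    """Return the highest-confidence semantic candidate for a field."""
    return min(candidates, key=lambda c: c["provenance"])

candidates = [
    {"value": "Customer region", "provenance": Provenance.AI_DRAFT},
    {"value": "Sales region", "provenance": Provenance.FUZZY_MATCH},
]
```

Exposing the `Provenance` tier alongside the chosen value is what lets the summary "clearly indicate the provenance and confidence level" rather than presenting a bare label.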
User Story 2 - Resolve ambiguities through mixed-initiative assistant clarification (Priority: P2)
A data steward, analytics engineer, or domain expert works with an agent in a chat-centric central workspace to resolve ambiguous business meanings, conflicting metadata, conflicting semantic sources, and missing run-time values one issue at a time. They can also ask free-form context questions, and can fall back to manual review when LLM assistance is unavailable or not desired.
Why this priority: Real datasets often contain implicit semantics that cannot be derived safely from source metadata alone. Guided clarification converts uncertainty into auditable decisions.
Independent Test: Can be fully tested by opening assistant clarification for a dataset with ambiguous attributes or conflicting semantic sources and verifying that the system asks focused questions in the assistant panel, explains why each question matters, accepts free-form follow-up questions, stores answers, and updates readiness and validation outcomes in real time.
Acceptance Scenarios:
- Given a dataset has blocking ambiguities, When the user starts clarification from the central assistant chat, Then the system asks one focused question at a time in that chat context and explains the significance of the question in business terms.
- Given the system already has a current guess for an unresolved attribute, When the question is shown, Then the system presents that guess along with selectable answers, a custom-answer option, and a skip option.
- Given semantic source reuse is likely, When the system detects a strong match with a trusted dictionary or reference dataset, Then the agent can proactively suggest that source as the preferred basis for semantic enrichment inside the assistant chat stream.
- Given fuzzy semantic matches were found from a selected dictionary or dataset, When the system presents them, Then the user can approve them in bulk, review them individually, keep only exact matches, or exclude a dataset tab from the current review set without leaving the chat-driven workflow.
- Given the user confirms or edits an answer, When the response is saved, Then the system updates the dataset profile, validation findings, readiness state, and any affected dataset tab state without losing prior context.
- Given the user exits clarification before all issues are resolved, When the session is saved, Then the system preserves answered questions, unresolved questions, excluded datasets, and the current recommended next action.
- Given a user asks a free-form question about the active dataset review session, When the question references profile, filters, mappings, validation, SQL preview state, or dataset-selection scope, Then the assistant answers using the current session context without forcing the user back into a rigid scripted flow.
- Given the session contains multiple datasets or candidate datasets, When the user navigates between them, Then the workspace keeps the chat as the primary interaction surface while exposing per-dataset tabs and allowing individual datasets to be excluded from the current review scope.
- Given LLM assistance is unavailable, disabled, or intentionally skipped, When the user continues the session, Then the system provides a manual review mode for documentation, mapping, and clarification work without blocking core progress.
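The exact-versus-fuzzy distinction in the scenarios above could be classified along these lines, using the standard library's `difflib.SequenceMatcher`; the normalization steps and threshold are assumptions for illustration.

```python
import difflib

def classify_match(source_name, target_name, fuzzy_threshold=0.8):
    """Classify a semantic match as exact, fuzzy (needs review), or none."""
    a = source_name.strip().lower().replace("_", " ")
    b = target_name.strip().lower().replace("_", " ")
    if a == b:
        return "exact"  # exact matches may be applied without review
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    if ratio >= fuzzy_threshold:
        return "fuzzy"  # fuzzy matches require user review before applying
    return "none"
```

Routing only "fuzzy" results into the bulk-review flow keeps exact matches low-friction while still gating anything inferred by similarity.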
User Story 3 - Prepare and launch a controlled dataset run (Priority: P3)
A BI engineer reviews the assembled run context across one or more dataset candidates, verifies filters and placeholders, understands any remaining warnings, reviews the compiled SQL preview, and launches the approved dataset context with confidence that the execution can be reproduced later.
Why this priority: Execution is the final high-value outcome, but it must feel controlled and auditable rather than opaque.
Independent Test: Can be fully tested by preparing a dataset run from imported or manually confirmed filter context and verifying that the system blocks missing required values, blocks missing preview approval conditions, allows review and editing, and records the exact run context used.
Acceptance Scenarios:
- Given an assembled dataset context contains required filters and placeholders, When the user opens run preparation, Then the system shows the effective filters, unresolved assumptions, semantic provenance signals, current run readiness, and active dataset selection in one place.
- Given required values are still missing, When the user attempts to launch, Then the system blocks launch and highlights the specific values that must be completed.
- Given warning-level mapping transformations are present, When the user reviews run preparation, Then the system requires explicit approval for each warning before launch while still allowing optional manual edits.
- Given Superset-side SQL compilation preview is unavailable or fails, When the user attempts to launch, Then the system blocks launch until a successful compiled preview is available.
- Given the workspace includes multiple dataset candidates, When the user excludes one or more datasets from review, Then the system updates launch readiness, findings, and previews to reflect only the active dataset scope while retaining the excluded datasets for later reconsideration.
- Given the dataset context is run-ready, When the user confirms launch, Then the system creates or starts a Superset SQL Lab session as the canonical execution target and records the dataset identity, effective filters, parameter values, outstanding warnings, dataset-scope decisions, and execution outcome for later audit or replay.
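The launch gates in the scenarios above amount to a blocker check over the run context; the dictionary shape below is a hypothetical sketch, not the real run-context schema.

```python
def launch_blockers(run_context):
    """Return the reasons a launch must stay blocked (empty means run-ready)."""
    blockers = []
    missing = [f for f, v in run_context["required_values"].items() if v is None]
    if missing:
        blockers.append(f"missing required values: {', '.join(missing)}")
    unapproved = [w["id"] for w in run_context["mapping_warnings"]
                  if not (w["approved"] or w["manually_edited"])]
    if unapproved:
        blockers.append(f"unapproved mapping warnings: {', '.join(unapproved)}")
    if run_context["compiled_preview_status"] != "success":
        blockers.append("Superset-side compiled SQL preview has not succeeded")
    return blockers
```

Surfacing the full blocker list, rather than failing on the first gate, is what lets the system "highlight the specific values that must be completed" in one pass.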
Edge Cases
- What happens when a dataset has enough structural metadata to document technically but not enough business context to explain its meaning?
- How does the system handle a Superset link that identifies the dataset but contains no reusable native filters?
- What happens when imported filters conflict with previously saved defaults or with the dataset’s documented business meaning?
- How does the system handle parameterized placeholders that exist in the run context but do not yet have values?
- What happens when a user skips clarification questions and proceeds with warnings?
- How does the system present cases where one attribute is confirmed by a user, inferred from metadata, and contradicted by imported filter context?
- What happens when a user leaves during clarification or run preparation and returns later?
- What happens when a semantic label exists in a spreadsheet dictionary, a reference dataset, and an AI proposal with different values?
- How does the system handle fuzzy semantic matches where source and target names are similar in meaning but not identical in form?
- What happens when a user manually edits a semantic value and a higher-confidence imported source becomes available later?
Requirements (mandatory)
Functional Requirements
- FR-001: The system MUST allow users to start dataset review and execution preparation from the frontend workspace by selecting a dataset source or providing a Superset link.
- FR-002: The system MUST generate an initial dataset profile that distinguishes confirmed metadata, inferred metadata, imported metadata, unresolved metadata, and AI-draft metadata where applicable.
- FR-003: The system MUST produce human-readable dataset documentation that explains the dataset purpose, business meaning, major attributes, filters, and known limitations in language suitable for operational stakeholders.
- FR-004: The system MUST assign and display a current readiness state for the dataset review so users can immediately understand whether the dataset is review-ready, semantic-source-review-needed, clarification-needed, partially ready, or run-ready.
- FR-005: The system MUST validate dataset completeness and consistency across attributes, business semantics, semantic enrichment sources, filters, assumptions, and execution readiness.
- FR-006: The system MUST classify validation findings into blocking issues, warnings, and informational findings.
- FR-007: The system MUST allow users to inspect the provenance of important dataset values, including whether each value was confirmed from a connected dictionary, imported from a trusted dataset, inferred from fuzzy matching, generated as an AI draft, manually edited by a user, or still unresolved.
- FR-008: The system MUST search connected semantic sources during automatic review, including supported external dictionaries and trusted reference datasets, before creating AI-generated semantic values from scratch.
- FR-009: The system MUST support semantic enrichment for at least verbose_name, description, and display formatting metadata for dataset fields and metrics when such metadata is available from a trusted source.
- FR-010: The system MUST apply a visible confidence hierarchy to semantic enrichment candidates in this order: exact dictionary/file match, trusted reference dataset match, fuzzy semantic match, AI-generated draft.
- FR-011: The system MUST allow users to choose and apply a semantic source from the frontend workspace using supported source types, including uploaded files, connected tabular dictionaries, and existing trusted Superset datasets.
- FR-012: The system MUST allow users to start a guided clarification flow for unresolved or contradictory dataset details.
- FR-013: The guided clarification flow MUST present one focused question at a time inside the central assistant chat rather than an isolated modal or an unstructured list of unresolved items.
- FR-014: The chat workspace MUST remain the primary interaction surface during recovery, clarification, and mapping review, while completed phases collapse into concise summaries that preserve context without leaving the conversation flow.
- FR-015: The system MUST provide a manual fallback mode for documentation, mapping, and review work when LLM assistance is unavailable, disabled, or intentionally skipped by the user.
- FR-016: Each clarification question MUST explain why the answer matters and, when available, show the system’s current best guess.
- FR-017: The system MUST allow the user to answer with a suggested option, provide a custom answer, skip the question, or mark the item for later expert review.
- FR-018: The system MUST allow the agent to proactively recommend a semantic source when schema overlap or semantic similarity with a trusted source is strong enough to justify reuse.
- FR-019: The system MUST distinguish exact semantic matches from fuzzy semantic matches and MUST require user review before fuzzy matches are applied.
- FR-020: The system MUST preserve answers provided during clarification and immediately update the dataset profile, validation findings, readiness state, and visible phase summaries when those answers affect review outcomes.
- FR-021: The system MUST allow users to pause and resume a clarification session without losing prior answers, unresolved items, or progress state.
- FR-022: The system MUST summarize what changed when a clarification session ends, including resolved ambiguities, remaining ambiguities, and impact on run readiness.
- FR-023: The system MUST extract reusable saved native filters from a provided Superset link whenever such filters are present and accessible.
- FR-024: The system MUST detect and expose runtime template variables referenced by the dataset execution logic so they can be mapped from imported or user-provided filter values.
- FR-025: The system MUST present extracted filters with their current value, source, confidence state, and whether user confirmation is required.
- FR-026: The system MUST preserve partially recovered values when a Superset import is incomplete and MUST explain which parts were recovered successfully and which still require manual or guided completion.
- FR-027: The system MUST support dataset execution contexts that include parameterized placeholders so users can complete required run-time values before launch.
- FR-028: The system MUST provide a dedicated pre-run review that presents the effective dataset identity, selected filters, required placeholders, unresolved assumptions, and current warnings in one place before launch.
- FR-029: The system MUST support review sessions containing multiple datasets or dataset candidates and MUST present them in a way that makes the active dataset scope explicit.
- FR-030: The system MUST allow users to switch between dataset-specific review surfaces without losing the central assistant chat context.
- FR-031: The system MUST allow users to exclude individual datasets from the current review scope and MUST immediately recalculate findings, readiness, and preview context based on the remaining active datasets.
- FR-032: The system MUST require explicit user approval for each warning-level mapping transformation before launch, while allowing the user to manually edit the mapped value instead of approving it.
- FR-033: The system MUST require a successful Superset-side compiled SQL preview before launch and MUST keep launch blocked if the preview is unavailable or compilation fails.
- FR-034: The system MUST prevent dataset launch when required values, required execution attributes, required warning approvals, or a required compiled preview are missing and MUST explain what must be completed.
- FR-035: The system MUST allow users to review and adjust the assembled filter set before starting a dataset run.
- FR-036: The system MUST support column-name mapping inputs from every mapping source and configuration path already supported by the existing dataset-mapping capability, including database-backed mapping and spreadsheet-file mapping, without narrowing the supported review scope.
- FR-037: The system MUST use a Superset SQL Lab session as the canonical audited execution target for approved dataset launch.
- FR-038: The system MUST record the dataset run context, including dataset identity, selected filters, parameter values, unresolved assumptions, the associated SQL Lab session reference, mapping approvals, semantic-source decisions, dataset-scope inclusion or exclusion decisions, and execution outcome, so that users can audit or repeat the run later.
- FR-039: The system MUST support a workflow where automatic review, semantic enrichment, guided clarification, dataset execution, and manual fallback review can be used independently or in sequence on the same dataset.
- FR-040: The system MUST provide exportable outputs for dataset documentation and validation results so users can share them outside the immediate workflow.
- FR-041: The system MUST preserve a usable frontend session state when a user stops mid-flow so they can resume review, clarification, semantic enrichment review, manual fallback review, or run preparation without reconstructing prior work.
- FR-042: The system MUST make the recommended next action explicit at each major state of the workflow.
- FR-043: The system MUST provide side-by-side comparison when multiple semantic sources disagree for the same field and MUST NOT silently overwrite a user-entered value with imported or AI-generated metadata.
- FR-044: The system MUST preserve manual semantic overrides unless the user explicitly replaces them.
- FR-045: The system MUST allow users to apply semantic enrichment selectively at field level rather than only as an all-or-nothing operation.
- FR-046: The system MUST provide an inline feedback mechanism (thumbs up/down) for AI-generated content to support continuous improvement of semantic matching and summarization.
- FR-047: The system MUST support multi-user collaboration on review sessions, allowing owners to invite collaborators with specific roles (viewer, reviewer, approver).
- FR-048: The system MUST provide batch approval actions for mapping warnings and fuzzy semantic matches to reduce manual effort for experienced users.
- FR-049: The system MUST capture and persist a structured event log of all session-related actions (e.g., source intake, answer submission, approval, exclusion, launch) to support audit, replay, and collaboration visibility.
- FR-050: The system MUST allow users to ask free-form questions about the currently loaded dataset review context, including profile, filters, mappings, findings, compiled SQL preview status, and active dataset scope, through the assistant chat.
- FR-051: The assistant MUST be able to accept natural-language commands that mutate the current dataset review session state, such as approving mappings, excluding a dataset from review, or generating a SQL preview, while preserving existing launch and approval gates.
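The template-variable detection described in FR-024 could be sketched as below. This is a simplified regex pass for illustration; a production implementation would more likely parse the template with Jinja's own AST (for example via jinja2.meta.find_undeclared_variables) to handle filters, blocks, and macros correctly.

```python
import re

# Matches the leading identifier inside {{ ... }} expressions only;
# block tags and macro calls are out of scope for this sketch.
_PLACEHOLDER = re.compile(r"\{\{\s*([a-zA-Z_]\w*)")

def find_template_variables(sql):
    """Collect runtime template variables referenced by the dataset SQL."""
    return sorted(set(_PLACEHOLDER.findall(sql)))

sql = """
SELECT * FROM sales
WHERE region = '{{ region }}'
  AND order_date >= '{{ from_date | default('2024-01-01') }}'
"""
```

Each variable found this way would then be offered for mapping from imported filter values or user-provided run-time inputs before launch.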
Key Entities (include if feature involves data)
- Dataset Profile: The consolidated representation of a dataset, including business purpose, attributes, filters, assumptions, readiness state, validation state, provenance of each important fact, and semantic enrichment status.
- Validation Finding: A blocking issue, warning, or informational observation raised during dataset review, including severity, explanation, affected area, and resolution state.
- Clarification Session: A resumable interaction record that stores unresolved questions, user answers, system guesses, expert-review flags, and remaining ambiguities for a dataset.
- Semantic Source: A reusable origin of semantic metadata, such as an uploaded file, connected tabular dictionary, or trusted reference dataset, used to enrich field- and metric-level business meaning.
- Semantic Mapping Decision: A recorded choice about which semantic source or proposed value was accepted, rejected, edited, or left unresolved for a field or metric.
- Imported Filter Set: The collection of reusable filters extracted from a Superset link, including source context, mapped dataset fields, current values, confidence state, and confirmation status.
- Dataset Review Scope: The active set of one or more datasets included in the current review session, including tab ordering, inclusion or exclusion status, and the currently focused dataset.
- Dataset Run Context: The execution-ready snapshot of dataset inputs, selected filters, parameterized placeholders, unresolved assumptions, warnings, mapping approvals, semantic-source decisions, dataset-scope decisions, the associated SQL Lab session reference, and launch outcome used for auditing or replay.
- Readiness State: The current workflow status that tells the user whether the dataset is still being recovered, ready for review, needs semantic-source review, needs clarification, is partially ready, or is ready to run.
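The Dataset Run Context entity above is essentially an immutable audit snapshot; one possible shape is sketched below. The field names and tuple encoding are illustrative assumptions, not the real entity definition.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DatasetRunContext:
    """Execution-ready snapshot recorded at launch for audit and replay."""
    dataset_id: str
    selected_filters: tuple       # (field, value) pairs; tuples keep it immutable
    parameter_values: tuple
    unresolved_assumptions: tuple
    warning_approvals: tuple
    excluded_datasets: tuple
    sql_lab_session_ref: str
    outcome: str = "pending"

ctx = DatasetRunContext(
    dataset_id="ds-42",
    selected_filters=(("region", "EU"),),
    parameter_values=(("from_date", "2024-01-01"),),
    unresolved_assumptions=(),
    warning_approvals=("w1",),
    excluded_datasets=(),
    sql_lab_session_ref="sqllab-7",
)
```

Freezing the dataclass makes the snapshot safe to persist as-is: replaying a run later means rehydrating this record, not reconstructing decisions from scattered state.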
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: At least 90% of datasets submitted with standard source metadata produce an initial documentation draft without requiring manual reconstruction from scratch.
- SC-002: Users can reach a first readable validation and documentation summary for a newly submitted dataset in under 5 minutes for the primary workflow.
- SC-003: At least 70% of eligible semantic fields are populated from trusted external dictionaries or trusted reference datasets before AI-generated drafting is needed.
- SC-004: At least 85% of clarification questions shown in guided mode are judged by pilot users as relevant and helpful to resolving ambiguity (measured via the built-in feedback mechanism).
- SC-005: At least 80% of Superset links containing reusable saved native filters result in an imported filter set that users can review without rebuilding the context manually.
- SC-006: At least 85% of pilot users correctly identify which values are confirmed versus imported versus inferred versus AI-generated during moderated usability review.
- SC-007: At least 80% of pilot users working with multi-dataset sessions can switch dataset focus or exclude a dataset from review without facilitator assistance during moderated usability review.
- SC-008: At least 90% of dataset runs started from an imported, clarified, or manually reviewed context include a complete recorded run context that can be reopened later.
- SC-009: Pilot users successfully complete the end-to-end flow of import, review, semantic enrichment, clarification, and launch on their first attempt in at least 75% of observed sessions.
- SC-010: Support requests caused by missing or unclear dataset attributes decrease by at least 40% within the target pilot group after adoption.
Assumptions
- Users already have permission to access the datasets and Superset artifacts they submit to ss-tools.
- Saved native filters embedded in a Superset link are considered the preferred reusable source of analytical context when available.
- Users need both self-service automation and a guided conversational path because dataset semantics are often incomplete, implicit, conflicting, or distributed across multiple semantic sources.
- The central assistant chat is the primary workspace surface for progressing through recovery, clarification, and mapping review, while completed phases collapse into summaries instead of remaining fully expanded.
- Multi-dataset review is a primary usage pattern, so users need to switch dataset focus and exclude individual datasets from review without resetting session context.
- A manual mode must remain available as a fallback so documentation, mapping, and run preparation can continue when LLM capabilities are unavailable.
- The feature is intended for internal operational use where clarity, traceability, semantic consistency, and repeatable execution are more important than raw execution speed.
- Exportable documentation and validation outputs are required for collaboration, review, and audit use cases.
- Users may choose to proceed with warnings, but not with missing required execution inputs, missing required mapping approvals, or missing required compiled preview.
- Superset SQL Lab session creation is the canonical audited launch path for approved execution.
- Warning-level mapping transformations require explicit user approval before launch, while manual correction remains optional.
- Existing dataset-mapping capabilities already support multiple source types and configuration paths, and the orchestration workflow must preserve that breadth.
- Launch requires a successful Superset-side compiled preview and cannot fall back to an unverified local approximation.
- Trusted semantic sources already exist or can be introduced incrementally through frontend-managed files, connected dictionaries, or reference datasets without requiring organizations to discard existing semantic workflows.