feat(ui): add chat-driven dataset review flow
Move dataset review clarification into the assistant workspace and rework the review page into a chat-centric layout with execution rails. Add session-scoped assistant actions for mappings, semantic fields, and SQL preview generation. Introduce optimistic locking for dataset review mutations, propagate session versions through API responses, and mask imported filter values before assistant exposure. Refresh tests, i18n, and spec artifacts to match the new workflow.

BREAKING CHANGE: dataset review mutation endpoints now require the X-Session-Version header, and clarification is no longer handled through ClarificationDialog-based flows.
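The BREAKING CHANGE note implies an optimistic-concurrency contract: each mutation carries the session version the client last saw, and a stale version is rejected. A minimal server-side sketch of that guard, assuming the store shape and status codes (only the `X-Session-Version` header name comes from the commit message):

```typescript
// Hypothetical guard for the X-Session-Version optimistic-locking contract.
// Session shape, status codes, and error strings are illustrative assumptions.
interface ReviewSession {
  id: string;
  version: number; // bumped on every successful mutation
}

type GuardResult =
  | { status: 200; nextVersion: number }
  | { status: 409; error: string }  // stale client state: refetch and retry
  | { status: 428; error: string }; // required header missing entirely

function guardMutation(
  session: ReviewSession,
  headers: Record<string, string>,
): GuardResult {
  const raw = headers["x-session-version"];
  if (raw === undefined) {
    return { status: 428, error: "X-Session-Version header is required" };
  }
  if (Number(raw) !== session.version) {
    return { status: 409, error: "session was modified by another actor" };
  }
  // Accept the mutation and hand the bumped version back to the caller,
  // matching "propagate session versions through API responses" above.
  return { status: 200, nextVersion: session.version + 1 };
}
```

A client that receives the 409 would refetch the session, re-render, and let the user retry against the fresh version.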
@@ -32,28 +32,31 @@ A data engineer or analytics engineer submits a dataset or a Superset link and i

---

-### User Story 2 - Resolve ambiguities through guided clarification (Priority: P2)
+### User Story 2 - Resolve ambiguities through mixed-initiative assistant clarification (Priority: P2)

-A data steward, analytics engineer, or domain expert works with an agent to resolve ambiguous business meanings, conflicting metadata, conflicting semantic sources, and missing run-time values one issue at a time.
+A data steward, analytics engineer, or domain expert works with an agent through a chat-centric central workspace to resolve ambiguous business meanings, conflicting metadata, conflicting semantic sources, and missing run-time values one issue at a time while also asking free-form context questions, with the option to fall back to manual review when LLM assistance is unavailable or not desired.

**Why this priority**: Real datasets often contain implicit semantics that cannot be derived safely from source metadata alone. Guided clarification converts uncertainty into auditable decisions.

-**Independent Test**: Can be fully tested by opening clarification mode for a dataset with ambiguous attributes or conflicting semantic sources and verifying that the system asks focused questions, explains why each question matters, stores answers, and updates readiness and validation outcomes in real time.
+**Independent Test**: Can be fully tested by opening assistant clarification for a dataset with ambiguous attributes or conflicting semantic sources and verifying that the system asks focused questions in the assistant panel, explains why each question matters, accepts free-form follow-up questions, stores answers, and updates readiness and validation outcomes in real time.

**Acceptance Scenarios**:

-1. **Given** a dataset has blocking ambiguities, **When** the user starts guided clarification, **Then** the system asks one focused question at a time and explains the significance of the question in business terms.
+1. **Given** a dataset has blocking ambiguities, **When** the user starts clarification from the central assistant chat, **Then** the system asks one focused question at a time in that chat context and explains the significance of the question in business terms.
2. **Given** the system already has a current guess for an unresolved attribute, **When** the question is shown, **Then** the system presents that guess along with selectable answers, a custom-answer option, and a skip option.
-3. **Given** semantic source reuse is likely, **When** the system detects a strong match with a trusted dictionary or reference dataset, **Then** the agent can proactively suggest that source as the preferred basis for semantic enrichment.
-4. **Given** fuzzy semantic matches were found from a selected dictionary or dataset, **When** the system presents them, **Then** the user can approve them in bulk, review them individually, or keep only exact matches.
-5. **Given** the user confirms or edits an answer, **When** the response is saved, **Then** the system updates the dataset profile, validation findings, and readiness state without losing prior context.
-6. **Given** the user exits clarification before all issues are resolved, **When** the session is saved, **Then** the system preserves answered questions, unresolved questions, and the current recommended next action.
+3. **Given** semantic source reuse is likely, **When** the system detects a strong match with a trusted dictionary or reference dataset, **Then** the agent can proactively suggest that source as the preferred basis for semantic enrichment inside the assistant chat stream.
+4. **Given** fuzzy semantic matches were found from a selected dictionary or dataset, **When** the system presents them, **Then** the user can approve them in bulk, review them individually, keep only exact matches, or exclude a dataset tab from the current review set without leaving the chat-driven workflow.
+5. **Given** the user confirms or edits an answer, **When** the response is saved, **Then** the system updates the dataset profile, validation findings, readiness state, and any affected dataset tab state without losing prior context.
+6. **Given** the user exits clarification before all issues are resolved, **When** the session is saved, **Then** the system preserves answered questions, unresolved questions, excluded datasets, and the current recommended next action.
+7. **Given** a user asks a free-form question about the active dataset review session, **When** the question references profile, filters, mappings, validation, SQL preview state, or dataset-selection scope, **Then** the assistant answers using the current session context without forcing the user back into a rigid scripted flow.
+8. **Given** the session contains multiple datasets or candidate datasets, **When** the user navigates between them, **Then** the workspace keeps the chat as the primary interaction surface while exposing per-dataset tabs and allowing individual datasets to be excluded from the current review scope.
+9. **Given** LLM assistance is unavailable, disabled, or intentionally skipped, **When** the user continues the session, **Then** the system provides a manual review mode for documentation, mapping, and clarification work without blocking core progress.

---

### User Story 3 - Prepare and launch a controlled dataset run (Priority: P3)

-A BI engineer reviews the assembled run context, verifies filters and placeholders, understands any remaining warnings, reviews the compiled SQL preview, and launches the dataset with confidence that the execution can be reproduced later.
+A BI engineer reviews the assembled run context across one or more dataset candidates, verifies filters and placeholders, understands any remaining warnings, reviews the compiled SQL preview, and launches the approved dataset context with confidence that the execution can be reproduced later.

**Why this priority**: Execution is the final high-value outcome, but it must feel controlled and auditable rather than opaque.
@@ -61,11 +64,12 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde

**Acceptance Scenarios**:

-1. **Given** an assembled dataset context contains required filters and placeholders, **When** the user opens run preparation, **Then** the system shows the effective filters, unresolved assumptions, semantic provenance signals, and current run readiness in one place.
+1. **Given** an assembled dataset context contains required filters and placeholders, **When** the user opens run preparation, **Then** the system shows the effective filters, unresolved assumptions, semantic provenance signals, current run readiness, and active dataset selection in one place.
2. **Given** required values are still missing, **When** the user attempts to launch, **Then** the system blocks launch and highlights the specific values that must be completed.
3. **Given** warning-level mapping transformations are present, **When** the user reviews run preparation, **Then** the system requires explicit approval for each warning before launch while still allowing optional manual edits.
4. **Given** Superset-side SQL compilation preview is unavailable or fails, **When** the user attempts to launch, **Then** the system blocks launch until a successful compiled preview is available.
-5. **Given** the dataset is run-ready, **When** the user confirms launch, **Then** the system creates or starts a Superset SQL Lab session as the canonical execution target and records the dataset identity, effective filters, parameter values, outstanding warnings, and execution outcome for later audit or replay.
+5. **Given** the workspace includes multiple dataset candidates, **When** the user excludes one or more datasets from review, **Then** the system updates launch readiness, findings, and previews to reflect only the active dataset scope while retaining the excluded datasets for later reconsideration.
+6. **Given** the dataset context is run-ready, **When** the user confirms launch, **Then** the system creates or starts a Superset SQL Lab session as the canonical execution target and records the dataset identity, effective filters, parameter values, outstanding warnings, dataset-scope decisions, and execution outcome for later audit or replay.

---
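Acceptance scenarios 2 through 4 describe a launch gate: launch stays blocked while required values are missing, any warning-level mapping transformation is unapproved, or no successful compiled SQL preview exists. One way to sketch that gate (field names are illustrative assumptions, not the spec's data model):

```typescript
// Illustrative launch-readiness check; the blocker list satisfies the
// requirement that the system explain what must still be completed.
interface RunContext {
  missingRequiredValues: string[];
  mappingWarnings: { field: string; approved: boolean }[];
  compiledPreview: "success" | "failed" | "unavailable";
}

interface LaunchDecision {
  canLaunch: boolean;
  blockers: string[]; // each entry names one unmet precondition
}

function evaluateLaunch(ctx: RunContext): LaunchDecision {
  const blockers: string[] = [];
  for (const v of ctx.missingRequiredValues) {
    blockers.push(`required value missing: ${v}`);
  }
  for (const w of ctx.mappingWarnings) {
    if (!w.approved) blockers.push(`warning not approved: ${w.field}`);
  }
  if (ctx.compiledPreview !== "success") {
    // No fallback to an unverified local approximation of the SQL.
    blockers.push("Superset-side compiled SQL preview is not available");
  }
  return { canLaunch: blockers.length === 0, blockers };
}
```

Because the decision is pure over the run context, the UI can recompute it live as answers, approvals, and preview results arrive.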
@@ -98,38 +102,45 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde

- **FR-010**: The system MUST apply a visible confidence hierarchy to semantic enrichment candidates in this order: exact dictionary/file match, trusted reference dataset match, fuzzy semantic match, AI-generated draft.
- **FR-011**: The system MUST allow users to choose and apply a semantic source from the frontend workspace using supported source types, including uploaded files, connected tabular dictionaries, and existing trusted Superset datasets.
- **FR-012**: The system MUST allow users to start a guided clarification flow for unresolved or contradictory dataset details.
-- **FR-013**: The guided clarification flow MUST present one focused question at a time rather than an unstructured list of unresolved items.
-- **FR-014**: Each clarification question MUST explain why the answer matters and, when available, show the system’s current best guess.
-- **FR-015**: The system MUST allow the user to answer with a suggested option, provide a custom answer, skip the question, or mark the item for later expert review.
-- **FR-016**: The system MUST allow the agent to proactively recommend a semantic source when schema overlap or semantic similarity with a trusted source is strong enough to justify reuse.
-- **FR-017**: The system MUST distinguish exact semantic matches from fuzzy semantic matches and MUST require user review before fuzzy matches are applied.
-- **FR-018**: The system MUST preserve answers provided during clarification and immediately update the dataset profile, validation findings, and readiness state when those answers affect review outcomes.
-- **FR-019**: The system MUST allow users to pause and resume a clarification session without losing prior answers, unresolved items, or progress state.
-- **FR-020**: The system MUST summarize what changed when a clarification session ends, including resolved ambiguities, remaining ambiguities, and impact on run readiness.
-- **FR-021**: (Consolidated with FR-001)
-- **FR-022**: The system MUST extract reusable saved native filters from a provided Superset link whenever such filters are present and accessible.
-- **FR-023**: The system MUST detect and expose runtime template variables referenced by the dataset execution logic so they can be mapped from imported or user-provided filter values.
-- **FR-024**: The system MUST present extracted filters with their current value, source, confidence state, and whether user confirmation is required.
-- **FR-025**: The system MUST preserve partially recovered value when a Superset import is incomplete and MUST explain which parts were recovered successfully and which still require manual or guided completion.
-- **FR-026**: The system MUST support dataset execution contexts that include parameterized placeholders so users can complete required run-time values before launch.
-- **FR-027**: The system MUST provide a dedicated pre-run review that presents the effective dataset identity, selected filters, required placeholders, unresolved assumptions, and current warnings in one place before launch.
-- **FR-028**: The system MUST require explicit user approval for each warning-level mapping transformation before launch, while allowing the user to manually edit the mapped value instead of approving it.
-- **FR-029**: The system MUST require a successful Superset-side compiled SQL preview before launch and MUST keep launch blocked if the preview is unavailable or compilation fails.
-- **FR-030**: The system MUST prevent dataset launch when required values, required execution attributes, required warning approvals, or a required compiled preview are missing and MUST explain what must be completed.
-- **FR-031**: The system MUST allow users to review and adjust the assembled filter set before starting a dataset run.
-- **FR-032**: The system MUST use a Superset SQL Lab session as the canonical audited execution target for approved dataset launch.
-- **FR-033**: The system MUST record the dataset run context, including dataset identity, selected filters, parameter values, unresolved assumptions, the associated SQL Lab session reference, mapping approvals, semantic-source decisions, and execution outcome, so that users can audit or repeat the run later.
-- **FR-034**: The system MUST support a workflow where automatic review, semantic enrichment, guided clarification, and dataset execution can be used independently or in sequence on the same dataset.
-- **FR-035**: The system MUST provide exportable outputs for dataset documentation and validation results so users can share them outside the immediate workflow.
-- **FR-036**: The system MUST preserve a usable frontend session state when a user stops mid-flow so they can resume review, clarification, semantic enrichment review, or run preparation without reconstructing prior work.
-- **FR-037**: The system MUST make the recommended next action explicit at each major state of the workflow.
-- **FR-038**: The system MUST provide side-by-side comparison when multiple semantic sources disagree for the same field and MUST not silently overwrite a user-entered value with imported or AI-generated metadata.
-- **FR-039**: The system MUST preserve manual semantic overrides unless the user explicitly replaces them.
-- **FR-040**: The system MUST allow users to apply semantic enrichment selectively at field level rather than only as an all-or-nothing operation.
-- **FR-041**: The system MUST provide an inline feedback mechanism (thumbs up/down) for AI-generated content to support continuous improvement of semantic matching and summarization.
-- **FR-042**: The system MUST support multi-user collaboration on review sessions, allowing owners to invite collaborators with specific roles (viewer, reviewer, approver).
-- **FR-043**: The system MUST provide batch approval actions for mapping warnings and fuzzy semantic matches to reduce manual effort for experienced users.
-- **FR-044**: The system MUST capture and persist a structured event log of all session-related actions (e.g., source intake, answer submission, approval, launch) to support audit, replay, and collaboration visibility.
+- **FR-013**: The guided clarification flow MUST present one focused question at a time inside the central assistant chat rather than an isolated modal or an unstructured list of unresolved items.
+- **FR-014**: The chat workspace MUST remain the primary interaction surface during recovery, clarification, and mapping review, while completed phases collapse into concise summaries that preserve context without leaving the conversation flow.
+- **FR-015**: The system MUST provide a manual fallback mode for documentation, mapping, and review work when LLM assistance is unavailable, disabled, or intentionally skipped by the user.
+- **FR-016**: Each clarification question MUST explain why the answer matters and, when available, show the system’s current best guess.
+- **FR-017**: The system MUST allow the user to answer with a suggested option, provide a custom answer, skip the question, or mark the item for later expert review.
+- **FR-018**: The system MUST allow the agent to proactively recommend a semantic source when schema overlap or semantic similarity with a trusted source is strong enough to justify reuse.
+- **FR-019**: The system MUST distinguish exact semantic matches from fuzzy semantic matches and MUST require user review before fuzzy matches are applied.
+- **FR-020**: The system MUST preserve answers provided during clarification and immediately update the dataset profile, validation findings, readiness state, and visible phase summaries when those answers affect review outcomes.
+- **FR-021**: The system MUST allow users to pause and resume a clarification session without losing prior answers, unresolved items, or progress state.
+- **FR-022**: The system MUST summarize what changed when a clarification session ends, including resolved ambiguities, remaining ambiguities, and impact on run readiness.
+- **FR-023**: The system MUST extract reusable saved native filters from a provided Superset link whenever such filters are present and accessible.
+- **FR-024**: The system MUST detect and expose runtime template variables referenced by the dataset execution logic so they can be mapped from imported or user-provided filter values.
+- **FR-025**: The system MUST present extracted filters with their current value, source, confidence state, and whether user confirmation is required.
+- **FR-026**: The system MUST preserve partially recovered values when a Superset import is incomplete and MUST explain which parts were recovered successfully and which still require manual or guided completion.
+- **FR-027**: The system MUST support dataset execution contexts that include parameterized placeholders so users can complete required run-time values before launch.
+- **FR-028**: The system MUST provide a dedicated pre-run review that presents the effective dataset identity, selected filters, required placeholders, unresolved assumptions, and current warnings in one place before launch.
+- **FR-029**: The system MUST support review sessions containing multiple datasets or dataset candidates and MUST present them in a way that makes the active dataset scope explicit.
+- **FR-030**: The system MUST allow users to switch between dataset-specific review surfaces without losing the central assistant chat context.
+- **FR-031**: The system MUST allow users to exclude individual datasets from the current review scope and MUST immediately recalculate findings, readiness, and preview context based on the remaining active datasets.
+- **FR-032**: The system MUST require explicit user approval for each warning-level mapping transformation before launch, while allowing the user to manually edit the mapped value instead of approving it.
+- **FR-033**: The system MUST require a successful Superset-side compiled SQL preview before launch and MUST keep launch blocked if the preview is unavailable or compilation fails.
+- **FR-034**: The system MUST prevent dataset launch when required values, required execution attributes, required warning approvals, or a required compiled preview are missing and MUST explain what must be completed.
+- **FR-035**: The system MUST allow users to review and adjust the assembled filter set before starting a dataset run.
+- **FR-036**: The system MUST support column-name mapping inputs from every mapping source and configuration path already supported by the existing dataset-mapping capability, including database-backed mapping and spreadsheet-file mapping, without narrowing the supported review scope.
+- **FR-037**: The system MUST use a Superset SQL Lab session as the canonical audited execution target for approved dataset launch.
+- **FR-038**: The system MUST record the dataset run context, including dataset identity, selected filters, parameter values, unresolved assumptions, the associated SQL Lab session reference, mapping approvals, semantic-source decisions, dataset-scope inclusion or exclusion decisions, and execution outcome, so that users can audit or repeat the run later.
+- **FR-039**: The system MUST support a workflow where automatic review, semantic enrichment, guided clarification, dataset execution, and manual fallback review can be used independently or in sequence on the same dataset.
+- **FR-040**: The system MUST provide exportable outputs for dataset documentation and validation results so users can share them outside the immediate workflow.
+- **FR-041**: The system MUST preserve a usable frontend session state when a user stops mid-flow so they can resume review, clarification, semantic enrichment review, manual fallback review, or run preparation without reconstructing prior work.
+- **FR-042**: The system MUST make the recommended next action explicit at each major state of the workflow.
+- **FR-043**: The system MUST provide side-by-side comparison when multiple semantic sources disagree for the same field and MUST NOT silently overwrite a user-entered value with imported or AI-generated metadata.
+- **FR-044**: The system MUST preserve manual semantic overrides unless the user explicitly replaces them.
+- **FR-045**: The system MUST allow users to apply semantic enrichment selectively at field level rather than only as an all-or-nothing operation.
+- **FR-046**: The system MUST provide an inline feedback mechanism (thumbs up/down) for AI-generated content to support continuous improvement of semantic matching and summarization.
+- **FR-047**: The system MUST support multi-user collaboration on review sessions, allowing owners to invite collaborators with specific roles (viewer, reviewer, approver).
+- **FR-048**: The system MUST provide batch approval actions for mapping warnings and fuzzy semantic matches to reduce manual effort for experienced users.
+- **FR-049**: The system MUST capture and persist a structured event log of all session-related actions (e.g., source intake, answer submission, approval, exclusion, launch) to support audit, replay, and collaboration visibility.
+- **FR-050**: The system MUST allow users to ask free-form questions about the currently loaded dataset review context, including profile, filters, mappings, findings, compiled SQL preview status, and active dataset scope, through the assistant chat.
+- **FR-051**: The assistant MUST be able to accept natural-language commands that mutate the current dataset review session state, such as approving mappings, excluding a dataset from review, or generating a SQL preview, while preserving existing launch and approval gates.
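The new FR-049 fixes only that session actions become structured, replayable records. One minimal shape that satisfies it, assuming an in-memory store and illustrative event names:

```typescript
// Append-only session event log sketch for FR-049-style audit and replay.
// The store and field names are assumptions; the requirement only demands
// structured, persisted, replayable records of session actions.
type SessionEvent = {
  seq: number;     // monotonic, gap-free ordering for deterministic replay
  at: string;      // ISO-8601 timestamp
  actor: string;
  action: "source_intake" | "answer_submitted" | "approval" | "exclusion" | "launch";
  payload: Record<string, unknown>;
};

class SessionEventLog {
  private events: SessionEvent[] = [];

  append(
    actor: string,
    action: SessionEvent["action"],
    payload: Record<string, unknown>,
  ): SessionEvent {
    const event: SessionEvent = {
      seq: this.events.length + 1,
      at: new Date().toISOString(),
      actor,
      action,
      payload,
    };
    this.events.push(event); // never mutated or deleted after this point
    return event;
  }

  // Replay consumers receive events in insertion order.
  replay(): readonly SessionEvent[] {
    return this.events;
  }
}
```

In practice the log would be persisted server-side so collaborators (FR-047) see the same ordered history.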

### Key Entities *(include if feature involves data)*

@@ -139,7 +150,8 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde

- **Semantic Source**: A reusable origin of semantic metadata, such as an uploaded file, connected tabular dictionary, or trusted reference dataset, used to enrich field- and metric-level business meaning.
- **Semantic Mapping Decision**: A recorded choice about which semantic source or proposed value was accepted, rejected, edited, or left unresolved for a field or metric.
- **Imported Filter Set**: The collection of reusable filters extracted from a Superset link, including source context, mapped dataset fields, current values, confidence state, and confirmation status.
-- **Dataset Run Context**: The execution-ready snapshot of dataset inputs, selected filters, parameterized placeholders, unresolved assumptions, warnings, mapping approvals, semantic-source decisions, the associated SQL Lab session reference, and launch outcome used for auditing or replay.
+- **Dataset Review Scope**: The active set of one or more datasets included in the current review session, including tab ordering, inclusion or exclusion status, and the currently focused dataset.
+- **Dataset Run Context**: The execution-ready snapshot of dataset inputs, selected filters, parameterized placeholders, unresolved assumptions, warnings, mapping approvals, semantic-source decisions, dataset-scope decisions, the associated SQL Lab session reference, and launch outcome used for auditing or replay.
- **Readiness State**: The current workflow status that tells the user whether the dataset is still being recovered, ready for review, needs semantic-source review, needs clarification, is partially ready, or is ready to run.

## Success Criteria *(mandatory)*

@@ -152,19 +164,24 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde

- **SC-004**: At least 85% of clarification questions shown in guided mode are judged by pilot users as relevant and helpful to resolving ambiguity (measured via the built-in feedback mechanism).
- **SC-005**: At least 80% of Superset links containing reusable saved native filters result in an imported filter set that users can review without rebuilding the context manually.
- **SC-006**: At least 85% of pilot users correctly identify which values are confirmed versus imported versus inferred versus AI-generated during moderated usability review.
-- **SC-007**: At least 90% of dataset runs started from an imported or clarified context include a complete recorded run context that can be reopened later.
-- **SC-008**: Pilot users successfully complete the end-to-end flow of import, review, semantic enrichment, clarification, and launch on their first attempt in at least 75% of observed sessions.
-- **SC-009**: Support requests caused by missing or unclear dataset attributes decrease by at least 40% within the target pilot group after adoption.
+- **SC-007**: At least 80% of pilot users working with multi-dataset sessions can switch dataset focus or exclude a dataset from review without facilitator assistance during moderated usability review.
+- **SC-008**: At least 90% of dataset runs started from an imported, clarified, or manually reviewed context include a complete recorded run context that can be reopened later.
+- **SC-009**: Pilot users successfully complete the end-to-end flow of import, review, semantic enrichment, clarification, and launch on their first attempt in at least 75% of observed sessions.
+- **SC-010**: Support requests caused by missing or unclear dataset attributes decrease by at least 40% within the target pilot group after adoption.

## Assumptions

- Users already have permission to access the datasets and Superset artifacts they submit to ss-tools.
- Saved native filters embedded in a Superset link are considered the preferred reusable source of analytical context when available.
- Users need both self-service automation and a guided conversational path because dataset semantics are often incomplete, implicit, conflicting, or distributed across multiple semantic sources.
+- The central assistant chat is the primary workspace surface for progressing through recovery, clarification, and mapping review, while completed phases collapse into summaries instead of remaining fully expanded.
+- Multi-dataset review is a primary usage pattern, so users need to switch dataset focus and exclude individual datasets from review without resetting session context.
+- A manual mode must remain available as a fallback so documentation, mapping, and run preparation can continue when LLM capabilities are unavailable.
- The feature is intended for internal operational use where clarity, traceability, semantic consistency, and repeatable execution are more important than raw execution speed.
- Exportable documentation and validation outputs are required for collaboration, review, and audit use cases.
- Users may choose to proceed with warnings, but not with missing required execution inputs, missing required mapping approvals, or missing required compiled preview.
- Superset SQL Lab session creation is the canonical audited launch path for approved execution.
- Warning-level mapping transformations require explicit user approval before launch, while manual correction remains optional.
+- Existing dataset-mapping capabilities already support multiple source types and configuration paths, and the orchestration workflow must preserve that breadth.
- Launch requires a successful Superset-side compiled preview and cannot fall back to an unverified local approximation.
- Trusted semantic sources already exist or can be introduced incrementally through frontend-managed files, connected dictionaries, or reference datasets without requiring organizations to discard existing semantic workflows.
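Finally, the commit message notes that imported filter values are masked before assistant exposure. A sketch of one possible masking pass over the Imported Filter Set entity (the filter shape and the "hide value, keep structure" policy are assumptions; only the intent to mask comes from the commit):

```typescript
// Hypothetical masking of Superset-imported filter values before the
// filters are included in the assistant's context.
interface ImportedFilter {
  field: string;
  value: string;
  source: "superset" | "user";
  confirmed: boolean;
}

function maskForAssistant(filters: ImportedFilter[]): ImportedFilter[] {
  return filters.map((f) =>
    f.source === "superset"
      ? // Replace the concrete value with a fixed-alphabet placeholder of
        // bounded length so the assistant sees that a value exists, and
        // roughly how long it is, without seeing the value itself.
        { ...f, value: "•".repeat(Math.min(f.value.length, 8)) }
      : f,
  );
}
```

User-entered values pass through unchanged here; whether they should also be masked is a policy decision the commit message does not settle.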