feat(ui): add chat-driven dataset review flow

Move dataset review clarification into the assistant workspace and
rework the review page into a chat-centric layout with execution rails.

Add session-scoped assistant actions for mappings, semantic fields,
and SQL preview generation. Introduce optimistic locking for dataset
review mutations, propagate session versions through API responses,
and mask imported filter values before assistant exposure.

Refresh tests, i18n, and spec artifacts to match the new workflow.

BREAKING CHANGE: dataset review mutation endpoints now require the
X-Session-Version header, and clarification is no longer handled
through ClarificationDialog-based flows.
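For API consumers, the new header contract amounts to a read-version / mutate / retry-on-conflict loop: send the last observed version as `X-Session-Version`, and re-read on a conflict response. A minimal sketch of that client pattern (the in-memory store and names are illustrative stand-ins for the real endpoints, not code from this commit):

```python
class SessionConflictError(Exception):
    """Signals HTTP-409-style rejection: the supplied X-Session-Version is stale."""

class InMemorySessionStore:
    """Illustrative stand-in for the dataset review API."""
    def __init__(self):
        self.sessions = {"s1": {"version": 0, "state": "draft"}}

    def get(self, session_id):
        return dict(self.sessions[session_id])

    def apply(self, session_id, observed_version, mutate):
        # Real clients would send observed_version as the X-Session-Version header.
        current = self.sessions[session_id]
        if observed_version != current["version"]:
            raise SessionConflictError("stale X-Session-Version")
        mutate(current)
        current["version"] += 1  # every successful mutation bumps the version
        return dict(current)

def mutate_with_retry(store, session_id, mutate, max_attempts=3):
    """Optimistic-locking client loop: refresh the version and retry on conflict."""
    for _ in range(max_attempts):
        observed = store.get(session_id)["version"]
        try:
            return store.apply(session_id, observed, mutate)
        except SessionConflictError:
            continue  # another writer won; re-read and retry
    raise SessionConflictError(f"gave up after {max_attempts} attempts")
```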
2026-03-26 13:33:12 +03:00
parent d7911fb2f1
commit 7c85552132
74 changed files with 6122 additions and 2970 deletions

View File

@@ -38,6 +38,7 @@
## Notes
- Validation completed against [spec.md](../spec.md) and [ux_reference.md](../ux_reference.md).
- Automatic documentation, guided clarification, and Superset-derived dataset execution are all represented as independently testable user journeys.
- Error recovery is aligned between the UX reference and the functional requirements, especially for partial filter import, missing run-time values, and conflicting metadata.
- The specification is ready for the next phase.
- Automatic documentation, guided clarification, Superset-derived dataset execution, and manual fallback review are represented as independently testable user journeys.
- Chat-centric workflow, collapsible completed phases, multi-dataset tabbed review scope, and per-dataset exclusion behavior are reflected in both the specification and UX reference.
- Error recovery is aligned between the UX reference and the functional requirements, especially for partial filter import, missing run-time values, conflicting metadata, and unavailable LLM assistance.
- The specification is ready for the next phase.

View File

@@ -12,8 +12,8 @@ tags:
- name: Mapping Review
- name: Preview and Launch
- name: Exports
- name: Assistant Integration
paths:
security:
- bearerAuth: []
@@ -85,6 +85,7 @@ paths:
summary: Update resumable session lifecycle state
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: true
content:
@@ -129,6 +130,7 @@ paths:
summary: Apply a semantic source to the current session
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: true
content:
@@ -156,6 +158,7 @@ paths:
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/FieldId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: true
content:
@@ -183,6 +186,7 @@ paths:
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/FieldId'
- $ref: '#/components/parameters/SessionVersionHeader'
responses:
'200':
description: Field locked
@@ -202,6 +206,7 @@ paths:
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/FieldId'
- $ref: '#/components/parameters/SessionVersionHeader'
responses:
'200':
description: Field unlocked
@@ -236,6 +241,7 @@ paths:
summary: Submit an answer to the current clarification question
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: true
content:
@@ -262,6 +268,7 @@ paths:
summary: Resume or start clarification mode for the next unresolved question
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
responses:
'200':
description: Clarification resumed
@@ -295,6 +302,7 @@ paths:
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/MappingId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: true
content:
@@ -320,6 +328,7 @@ paths:
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/MappingId'
- $ref: '#/components/parameters/SessionVersionHeader'
requestBody:
required: false
content:
@@ -344,6 +353,7 @@ paths:
summary: Trigger Superset-side SQL compilation preview
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
responses:
'202':
description: Preview generation started
@@ -373,6 +383,7 @@ paths:
x-required-permissions: [dataset:execution:launch]
parameters:
- $ref: '#/components/parameters/SessionId'
- $ref: '#/components/parameters/SessionVersionHeader'
responses:
'201':
description: Dataset launched
@@ -480,6 +491,14 @@ components:
minimum: 1
maximum: 100
default: 20
SessionVersionHeader:
name: X-Session-Version
in: header
required: true
description: Optimistic-lock version of the current dataset review session. Requests with a stale version are rejected with conflict semantics.
schema:
type: integer
minimum: 0
securitySchemes:
bearerAuth:
@@ -586,6 +605,7 @@ components:
type: object
required:
- session_id
- version
- dataset_ref
- environment_id
- readiness_state
@@ -597,6 +617,9 @@ components:
properties:
session_id:
type: string
version:
type: integer
minimum: 0
dataset_ref:
type: string
dataset_id:
@@ -991,6 +1014,9 @@ components:
type: string
nullable: true
raw_value: {}
raw_value_masked:
type: boolean
description: Indicates whether the raw filter value has been masked or redacted before exposure to assistant or LLM-facing context.
normalized_value:
nullable: true
source:
@@ -1209,10 +1235,10 @@ components:
ClarificationState:
type: object
required: [clarification_session]
properties:
clarification_session:
$ref: '#/components/schemas/ClarificationSessionSummary'
nullable: true
current_question:
$ref: '#/components/schemas/ClarificationQuestion'
nullable: true
@@ -1398,6 +1424,7 @@ components:
type: object
required:
- session
- session_version
- profile
- findings
- semantic_sources
@@ -1408,6 +1435,10 @@ components:
properties:
session:
$ref: '#/components/schemas/SessionSummary'
session_version:
type: integer
minimum: 0
description: Convenience mirror of the current session version for assistant and workspace synchronization.
profile:
$ref: '#/components/schemas/DatasetProfile'
findings:
@@ -1444,6 +1475,24 @@ components:
$ref: '#/components/schemas/DatasetRunContextSummary'
nullable: true
AssistantMessageRequest:
type: object
required: [message]
description: Request payload accepted by the global Assistant API when the assistant is scoped to dataset review context.
properties:
conversation_id:
type: string
nullable: true
message:
type: string
minLength: 1
maxLength: 4000
dataset_review_session_id:
type: string
nullable: true
description: Optional active dataset review session binding used to ground assistant answers and route approved commands into the current orchestration session.
additionalProperties: false
ErrorResponse:
type: object
required: [error_code, message]
@@ -1476,4 +1525,4 @@ components:
errors:
type: array
items:
$ref: '#/components/schemas/ValidationErrorItem'
$ref: '#/components/schemas/ValidationErrorItem'
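The `AssistantMessageRequest` constraints above (required `message` of 1–4000 characters, nullable string identifiers, `additionalProperties: false`) can be checked with a small validator. A sketch of that check, assuming the error wording is free to vary:

```python
ALLOWED_KEYS = {"conversation_id", "message", "dataset_review_session_id"}

def validate_assistant_message_request(payload: dict) -> list[str]:
    """Return schema-violation messages for an AssistantMessageRequest body."""
    errors = []
    extra = set(payload) - ALLOWED_KEYS
    if extra:  # additionalProperties: false
        errors.append(f"unexpected properties: {sorted(extra)}")
    message = payload.get("message")
    if not isinstance(message, str):
        errors.append("message is required and must be a string")
    elif not 1 <= len(message) <= 4000:
        errors.append("message length must be between 1 and 4000")
    for key in ("conversation_id", "dataset_review_session_id"):
        value = payload.get(key)
        if value is not None and not isinstance(value, str):  # nullable string
            errors.append(f"{key} must be a string or null")
    return errors
```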

View File

@@ -19,6 +19,7 @@ This document defines the semantic contracts for the core components of the Data
# @RELATION: [DEPENDS_ON] ->[SupersetContextExtractor]
# @RELATION: [DEPENDS_ON] ->[SupersetCompilationAdapter]
# @RELATION: [DEPENDS_ON] ->[TaskManager]
# @RELATION: [EXPOSES_STATE_TO] ->[AssistantApi]
# @PRE: session mutations must execute inside a persisted session boundary scoped to one authenticated user.
# @POST: state transitions are persisted atomically and emit observable progress for long-running steps.
# @SIDE_EFFECT: creates task records, updates session aggregates, triggers upstream Superset calls, persists audit artifacts.
@@ -148,15 +149,16 @@ This document defines the semantic contracts for the core components of the Data
# [DEF:ClarificationEngine:Module]
# @COMPLEXITY: 4
# @PURPOSE: Manage one-question-at-a-time clarification sessions, including prioritization, answer persistence, and readiness impact updates.
# @PURPOSE: Manage mixed-initiative clarification sessions, including prioritized agent prompts, answer persistence, assistant routing, and readiness impact updates.
# @LAYER: Domain
# @RELATION: [DEPENDS_ON] ->[ClarificationSession]
# @RELATION: [DEPENDS_ON] ->[ClarificationQuestion]
# @RELATION: [DEPENDS_ON] ->[ClarificationAnswer]
# @RELATION: [DEPENDS_ON] ->[ValidationFinding]
# @RELATION: [DISPATCHES] ->[AssistantChatPanel]
# @PRE: target session contains unresolved or contradictory review state.
# @POST: every recorded answer updates the clarification session and associated session state deterministically.
# @SIDE_EFFECT: creates clarification questions, persists answers, updates findings/profile state.
# @POST: every recorded answer updates the clarification session and associated session state deterministically, and the next agent prompt is routable through assistant chat.
# @SIDE_EFFECT: creates clarification questions, persists answers, updates findings/profile state, emits assistant-routable clarification prompts.
# @DATA_CONTRACT: Input[ClarificationSessionState | ClarificationAnswerCommand] -> Output[ClarificationQuestionPayload | ClarificationProgressSnapshot | SessionReadinessDelta]
# @INVARIANT: Clarification answers are persisted before the current question pointer or readiness state is advanced.
# @TEST_CONTRACT: next_question_selection -> returns only one highest-priority unresolved question at a time
@@ -170,7 +172,7 @@ This document defines the semantic contracts for the core components of the Data
# @PURPOSE: Open clarification mode on the highest-priority unresolved question.
#### ƒ **build_question_payload**
# @PURPOSE: Return question, why-it-matters text, current guess, and suggested options.
# @PURPOSE: Return question, why-it-matters text, current guess, and suggested options for assistant-chat delivery.
#### ƒ **record_answer**
# @PURPOSE: Persist one answer and compute state impact.
@@ -253,19 +255,20 @@ This document defines the semantic contracts for the core components of the Data
<!-- [DEF:DatasetReviewWorkspace:Component] -->
<!-- @COMPLEXITY: 5 -->
<!-- @PURPOSE: Main dataset review workspace coordinating session state, progressive recovery, semantic review, clarification, preview, and launch UX. -->
<!-- @PURPOSE: Main dataset review workspace coordinating session state, progressive recovery, semantic review, assistant-chat clarification, preview, and launch UX. -->
<!-- @LAYER: UI -->
<!-- @RELATION: [BINDS_TO] ->[api_module] -->
<!-- @RELATION: [BINDS_TO] ->[assistantChat] -->
<!-- @RELATION: [BINDS_TO] ->[AssistantApi] -->
<!-- @RELATION: [BINDS_TO] ->[AssistantChatPanel] -->
<!-- @RELATION: [BINDS_TO] ->[taskDrawer] -->
<!-- @UX_STATE: Empty -> Show source intake with Superset link and dataset-selection entry actions. -->
<!-- @UX_STATE: Importing -> Show progressive recovery milestones as context is assembled. -->
<!-- @UX_STATE: Review -> Show summary, findings, semantic layer, filters, mapping, and next action. -->
<!-- @UX_STATE: Clarification -> Focus the user on one current clarification question while preserving wider session context. -->
<!-- @UX_STATE: Clarification -> Focus the user on one current assistant-led clarification thread while preserving wider session context. -->
<!-- @UX_STATE: Ready -> Show launch summary and unambiguous run-ready signal without hiding warnings. -->
<!-- @UX_FEEDBACK: Main CTA changes by readiness state and reflects current highest-value next action. -->
<!-- @UX_RECOVERY: Users can save, resume, or reopen an unfinished session without losing context. -->
<!-- @UX_REACTIVITY: Uses Svelte runes for session, readiness, preview, and task state derivation. -->
<!-- @UX_REACTIVITY: Uses Svelte runes for session, readiness, preview, task state derivation, and assistant/workspace focus synchronization. -->
<!-- @INVARIANT: Navigation away from dirty session state must require explicit confirmation. -->
<!-- @TEST_CONTRACT: workspace_state_machine -> one and only one major readiness-driven CTA is primary at a time -->
<!-- @TEST_SCENARIO: resume_preserves_state -> reopening unfinished session restores current panel state and next action -->
@@ -291,6 +294,55 @@ This document defines the semantic contracts for the core components of the Data
---
<!-- [DEF:AssistantChatPanel:Component] -->
<!-- @COMPLEXITY: 4 -->
<!-- @PURPOSE: Provide the mixed-initiative assistant drawer for clarification, free-form dataset questions, contextual actions, and confirmation cards tied to the active dataset review session. -->
<!-- @LAYER: UI -->
<!-- @RELATION: [DEPENDS_ON] ->[AssistantApi] -->
<!-- @RELATION: [BINDS_TO] ->[DatasetReviewWorkspace] -->
<!-- @RELATION: [BINDS_TO] ->[ClarificationEngine] -->
<!-- @UX_STATE: Idle -> Drawer is closed or shows starter prompts for the active session. -->
<!-- @UX_STATE: ClarificationQueue -> Assistant presents the next prioritized clarification prompt with suggested answers. -->
<!-- @UX_STATE: Freeform -> User asks context questions about filters, findings, mappings, or SQL preview state. -->
<!-- @UX_STATE: NeedsConfirmation -> Chat renders confirmation cards for dangerous or audit-relevant actions. -->
<!-- @UX_STATE: Error -> Drawer preserves session context and shows retry guidance without hiding the workspace. -->
<!-- @UX_FEEDBACK: Context actions and assistant prompts highlight the referenced workspace filter, field, mapping, or finding. -->
<!-- @UX_RECOVERY: Users can skip, defer, resume, or abandon a clarification thread without losing session state. -->
#### ƒ **submitSessionScopedMessage**
<!-- @PURPOSE: Send a free-form or guided assistant message bound to the active dataset review session. -->
#### ƒ **renderConfirmationCard**
<!-- @PURPOSE: Present assistant-driven confirmation UI for state-changing actions such as mapping approval, preview generation, or launch. -->
#### ƒ **highlightWorkspaceTarget**
<!-- @PURPOSE: Synchronize assistant focus with the referenced workspace element. -->
<!-- [/DEF:AssistantChatPanel:Component] -->
---
<!-- [DEF:AssistantApi:Module] -->
<!-- @COMPLEXITY: 4 -->
<!-- @PURPOSE: Accept session-scoped assistant messages and route grounded dataset-review intents to orchestration contracts without bypassing approval gates. -->
<!-- @LAYER: UI -->
<!-- @RELATION: [DEPENDS_ON] ->[DatasetReviewOrchestrator] -->
<!-- @RELATION: [DEPENDS_ON] ->[ClarificationEngine] -->
<!-- @RELATION: [DEPENDS_ON] ->[DatasetReviewSessionRepository] -->
<!-- @PRE: Assistant requests are authenticated and may include an active dataset review session identifier. -->
<!-- @POST: Responses stay grounded in the current session context and return deterministic confirmation or action states for frontend rendering. -->
<!-- @SIDE_EFFECT: Reads session state, may dispatch approved orchestration commands, and records assistant interaction outcomes through existing audit pathways. -->
#### ƒ **handleSessionScopedMessage**
<!-- @PURPOSE: Load active dataset review context and answer or route a user assistant message against that session. -->
#### ƒ **dispatchDatasetReviewIntent**
<!-- @PURPOSE: Route approved dataset-review commands such as mapping approval or preview generation to orchestration services. -->
<!-- [/DEF:AssistantApi:Module] -->
---
<!-- [DEF:SourceIntakePanel:Component] -->
<!-- @COMPLEXITY: 3 -->
<!-- @PURPOSE: Collect initial dataset source input through Superset link paste or dataset selection entry paths. -->
@@ -356,28 +408,9 @@ This document defines the semantic contracts for the core components of the Data
---
<!-- [DEF:ClarificationDialog:Component] -->
<!-- @COMPLEXITY: 3 -->
<!-- @PURPOSE: One-question-at-a-time clarification surface for unresolved or contradictory dataset meanings. -->
<!-- @LAYER: UI -->
<!-- @RELATION: [BINDS_TO] ->[api_module] -->
<!-- @RELATION: [BINDS_TO] ->[assistantChat] -->
<!-- @UX_STATE: Question -> Show current question, why-it-matters text, current guess, and selectable answers. -->
<!-- @UX_STATE: Saving -> Disable controls while persisting answer. -->
<!-- @UX_STATE: Completed -> Show clarification summary and impact on readiness. -->
<!-- @UX_FEEDBACK: Each answer updates profile and findings without forcing a full page reload. -->
<!-- @UX_RECOVERY: Users can skip, defer to expert review, pause, and resume later. -->
### Retired Contract: `ClarificationDialog`
#### ƒ **submitAnswer**
<!-- @PURPOSE: Save selected or custom clarification answer. -->
#### ƒ **skipQuestion**
<!-- @PURPOSE: Defer the current question while keeping it unresolved. -->
#### ƒ **pauseClarification**
<!-- @PURPOSE: Exit clarification mode without losing prior answers. -->
<!-- [/DEF:ClarificationDialog:Component] -->
`ClarificationDialog` is retired for feature 027 rebaseline. Its responsibilities move into `AssistantChatPanel`, which now owns clarification presentation, free-form dataset questions, and confirmation-card interactions inside the global assistant drawer.
---
@@ -460,4 +493,4 @@ These contracts are intended to align directly with:
- [`specs/027-dataset-llm-orchestration/spec.md`](../spec.md)
- [`specs/027-dataset-llm-orchestration/ux_reference.md`](../ux_reference.md)
- [`specs/027-dataset-llm-orchestration/research.md`](../research.md)
- [`specs/027-dataset-llm-orchestration/data-model.md`](../data-model.md)
- [`specs/027-dataset-llm-orchestration/data-model.md`](../data-model.md)
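The `AssistantApi` contract above routes approved commands without bypassing approval gates: state-changing intents surface a confirmation card before dispatch. A sketch of that routing rule (intent names, the `confirmed` flag, and the orchestrator protocol are illustrative, not the shipped interfaces):

```python
APPROVAL_GATED_INTENTS = {"approve_mapping", "generate_preview", "launch_dataset"}

def dispatch_dataset_review_intent(intent: str, confirmed: bool, orchestrator):
    """Route an assistant-surfaced intent; gated intents require confirmation first."""
    if intent in APPROVAL_GATED_INTENTS and not confirmed:
        # The chat panel renders a confirmation card instead of executing.
        return {"status": "needs_confirmation", "intent": intent}
    handler = getattr(orchestrator, intent, None)  # sketch only; a real router
    if handler is None:                            # would use an explicit registry
        return {"status": "unknown_intent", "intent": intent}
    return {"status": "dispatched", "result": handler()}
```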

View File

@@ -43,6 +43,7 @@ Represents the top-level resumable workflow container for one dataset review/exe
| `dashboard_id` | integer \| null | no | Superset dashboard id if imported from dashboard link |
| `readiness_state` | enum | yes | Current workflow readiness state |
| `recommended_action` | enum | yes | Explicit next recommended action |
| `version` | integer | yes | Optimistic-lock version incremented on every persisted session mutation |
| `status` | enum | yes | Session lifecycle status |
| `current_phase` | enum | yes | Active workflow phase |
| `active_task_id` | string \| null | no | Linked long-running task if one is active |
@@ -58,9 +59,11 @@ Represents the top-level resumable workflow container for one dataset review/exe
- `source_input` must be non-empty.
- `environment_id` must resolve to a configured environment.
- `readiness_state` and `recommended_action` must always be present.
- `version` starts at `0` on session creation and increments monotonically after every successful session mutation.
- `user_id` ownership must be enforced for all mutations, unless collaborator roles allow otherwise.
- `dataset_id` becomes required before preview or launch phases.
- `last_preview_id` must refer to a preview generated from the same session.
- Mutating requests must include the caller's last observed session version; mismatches are rejected as optimistic-lock conflicts rather than silently merged.
### Enums
@@ -342,6 +345,7 @@ Represents one recovered or user-supplied filter value.
| `filter_name` | string | yes | Source filter name |
| `display_name` | string \| null | no | User-facing label |
| `raw_value` | json | yes | Original recovered value |
| `raw_value_masked` | boolean | yes | Whether the stored or exposed raw value has been masked/redacted for assistant or LLM-facing use |
| `normalized_value` | json \| null | no | Optional transformed value |
| `source` | enum | yes | Origin of the filter |
| `confidence_state` | enum | yes | Confidence/provenance class |
@@ -370,6 +374,11 @@ Represents one recovered or user-supplied filter value.
- `missing`
- `conflicted`
### Validation rules
- `raw_value` may be stored for audit and replay, but any context passed into assistant or LLM-facing orchestration must use a masked/redacted representation when the value may contain PII or other sensitive identifiers.
- `raw_value_masked=true` is required whenever the exported assistant context omits or redacts sensitive substrings from the original filter payload.
- Masking policy must preserve enough structure for mapping and clarification, for example key shape, value type, cardinality hints, and non-sensitive tokens.
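A masking pass that satisfies these rules keeps key shape, value types, and cardinality hints while redacting values under sensitive-looking keys. A minimal sketch, assuming a key-name heuristic (the real policy would be configurable, and the regex below is illustrative):

```python
import re

SENSITIVE_KEY = re.compile(r"email|phone|ssn|user_id|customer", re.I)

def mask_for_assistant(raw_value):
    """Build the assistant-facing view of a recovered filter value.
    Preserves structure (keys, types, list lengths) but replaces values
    under sensitive-looking keys with a type hint."""
    if isinstance(raw_value, dict):
        masked = {}
        for key, value in raw_value.items():
            if SENSITIVE_KEY.search(key):
                masked[key] = f"<masked:{type(value).__name__}>"
            else:
                masked[key] = mask_for_assistant(value)
        return masked
    if isinstance(raw_value, list):
        return [mask_for_assistant(item) for item in raw_value]  # cardinality kept
    return raw_value
```

When a mask is applied, the stored `raw_value` stays intact for audit and replay; only the exported assistant context uses the masked view, with `raw_value_masked=true`.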
---
### Entity: `TemplateVariable`
@@ -722,6 +731,8 @@ The future API and persistence layers should group models roughly as follows:
- `SessionDetail`
- `SessionListItem`
`SessionSummary` and `SessionDetail` should both surface the current `version` so frontend workspace state, collaborator actions, and assistant-driven mutations can use the same optimistic-lock boundary.
### Review DTOs
- `DatasetProfileDto`
- `ValidationFindingDto`
@@ -761,4 +772,4 @@ The Phase 0 research questions are considered resolved for design purposes:
This model is ready to drive:
- [`contracts/modules.md`](./contracts/modules.md)
- [`contracts/api.yaml`](./contracts/api.yaml)
- [`quickstart.md`](./quickstart.md)
- [`quickstart.md`](./quickstart.md)

View File

@@ -32,28 +32,31 @@ A data engineer or analytics engineer submits a dataset or a Superset link and i
---
### User Story 2 - Resolve ambiguities through guided clarification (Priority: P2)
### User Story 2 - Resolve ambiguities through mixed-initiative assistant clarification (Priority: P2)
A data steward, analytics engineer, or domain expert works with an agent to resolve ambiguous business meanings, conflicting metadata, conflicting semantic sources, and missing run-time values one issue at a time.
A data steward, analytics engineer, or domain expert works with an agent through a chat-centric central workspace to resolve ambiguous business meanings, conflicting metadata, conflicting semantic sources, and missing run-time values one issue at a time, can ask free-form context questions along the way, and can still fall back to manual review when LLM assistance is unavailable or not desired.
**Why this priority**: Real datasets often contain implicit semantics that cannot be derived safely from source metadata alone. Guided clarification converts uncertainty into auditable decisions.
**Independent Test**: Can be fully tested by opening clarification mode for a dataset with ambiguous attributes or conflicting semantic sources and verifying that the system asks focused questions, explains why each question matters, stores answers, and updates readiness and validation outcomes in real time.
**Independent Test**: Can be fully tested by opening assistant clarification for a dataset with ambiguous attributes or conflicting semantic sources and verifying that the system asks focused questions in the assistant panel, explains why each question matters, accepts free-form follow-up questions, stores answers, and updates readiness and validation outcomes in real time.
**Acceptance Scenarios**:
1. **Given** a dataset has blocking ambiguities, **When** the user starts guided clarification, **Then** the system asks one focused question at a time and explains the significance of the question in business terms.
1. **Given** a dataset has blocking ambiguities, **When** the user starts clarification from the central assistant chat, **Then** the system asks one focused question at a time in that chat context and explains the significance of the question in business terms.
2. **Given** the system already has a current guess for an unresolved attribute, **When** the question is shown, **Then** the system presents that guess along with selectable answers, a custom-answer option, and a skip option.
3. **Given** semantic source reuse is likely, **When** the system detects a strong match with a trusted dictionary or reference dataset, **Then** the agent can proactively suggest that source as the preferred basis for semantic enrichment.
4. **Given** fuzzy semantic matches were found from a selected dictionary or dataset, **When** the system presents them, **Then** the user can approve them in bulk, review them individually, or keep only exact matches.
5. **Given** the user confirms or edits an answer, **When** the response is saved, **Then** the system updates the dataset profile, validation findings, and readiness state without losing prior context.
6. **Given** the user exits clarification before all issues are resolved, **When** the session is saved, **Then** the system preserves answered questions, unresolved questions, and the current recommended next action.
3. **Given** semantic source reuse is likely, **When** the system detects a strong match with a trusted dictionary or reference dataset, **Then** the agent can proactively suggest that source as the preferred basis for semantic enrichment inside the assistant chat stream.
4. **Given** fuzzy semantic matches were found from a selected dictionary or dataset, **When** the system presents them, **Then** the user can approve them in bulk, review them individually, keep only exact matches, or exclude a dataset tab from the current review set without leaving the chat-driven workflow.
5. **Given** the user confirms or edits an answer, **When** the response is saved, **Then** the system updates the dataset profile, validation findings, readiness state, and any affected dataset tab state without losing prior context.
6. **Given** the user exits clarification before all issues are resolved, **When** the session is saved, **Then** the system preserves answered questions, unresolved questions, excluded datasets, and the current recommended next action.
7. **Given** a user asks a free-form question about the active dataset review session, **When** the question references profile, filters, mappings, validation, SQL preview state, or dataset-selection scope, **Then** the assistant answers using the current session context without forcing the user back into a rigid scripted flow.
8. **Given** the session contains multiple datasets or candidate datasets, **When** the user navigates between them, **Then** the workspace keeps the chat as the primary interaction surface while exposing per-dataset tabs and allowing individual datasets to be excluded from the current review scope.
9. **Given** LLM assistance is unavailable, disabled, or intentionally skipped, **When** the user continues the session, **Then** the system provides a manual review mode for documentation, mapping, and clarification work without blocking core progress.
---
### User Story 3 - Prepare and launch a controlled dataset run (Priority: P3)
A BI engineer reviews the assembled run context, verifies filters and placeholders, understands any remaining warnings, reviews the compiled SQL preview, and launches the dataset with confidence that the execution can be reproduced later.
A BI engineer reviews the assembled run context across one or more dataset candidates, verifies filters and placeholders, understands any remaining warnings, reviews the compiled SQL preview, and launches the approved dataset context with confidence that the execution can be reproduced later.
**Why this priority**: Execution is the final high-value outcome, but it must feel controlled and auditable rather than opaque.
@@ -61,11 +64,12 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde
**Acceptance Scenarios**:
1. **Given** an assembled dataset context contains required filters and placeholders, **When** the user opens run preparation, **Then** the system shows the effective filters, unresolved assumptions, semantic provenance signals, and current run readiness in one place.
1. **Given** an assembled dataset context contains required filters and placeholders, **When** the user opens run preparation, **Then** the system shows the effective filters, unresolved assumptions, semantic provenance signals, current run readiness, and active dataset selection in one place.
2. **Given** required values are still missing, **When** the user attempts to launch, **Then** the system blocks launch and highlights the specific values that must be completed.
3. **Given** warning-level mapping transformations are present, **When** the user reviews run preparation, **Then** the system requires explicit approval for each warning before launch while still allowing optional manual edits.
4. **Given** Superset-side SQL compilation preview is unavailable or fails, **When** the user attempts to launch, **Then** the system blocks launch until a successful compiled preview is available.
5. **Given** the dataset is run-ready, **When** the user confirms launch, **Then** the system creates or starts a Superset SQL Lab session as the canonical execution target and records the dataset identity, effective filters, parameter values, outstanding warnings, and execution outcome for later audit or replay.
5. **Given** the workspace includes multiple dataset candidates, **When** the user excludes one or more datasets from review, **Then** the system updates launch readiness, findings, and previews to reflect only the active dataset scope while retaining the excluded datasets for later reconsideration.
6. **Given** the dataset context is run-ready, **When** the user confirms launch, **Then** the system creates or starts a Superset SQL Lab session as the canonical execution target and records the dataset identity, effective filters, parameter values, outstanding warnings, dataset-scope decisions, and execution outcome for later audit or replay.
---
@@ -98,38 +102,45 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde
- **FR-010**: The system MUST apply a visible confidence hierarchy to semantic enrichment candidates in this order: exact dictionary/file match, trusted reference dataset match, fuzzy semantic match, AI-generated draft.
- **FR-011**: The system MUST allow users to choose and apply a semantic source from the frontend workspace using supported source types, including uploaded files, connected tabular dictionaries, and existing trusted Superset datasets.
- **FR-012**: The system MUST allow users to start a guided clarification flow for unresolved or contradictory dataset details.
- **FR-013**: The guided clarification flow MUST present one focused question at a time rather than an unstructured list of unresolved items.
- **FR-014**: Each clarification question MUST explain why the answer matters and, when available, show the system's current best guess.
- **FR-015**: The system MUST allow the user to answer with a suggested option, provide a custom answer, skip the question, or mark the item for later expert review.
- **FR-016**: The system MUST allow the agent to proactively recommend a semantic source when schema overlap or semantic similarity with a trusted source is strong enough to justify reuse.
- **FR-017**: The system MUST distinguish exact semantic matches from fuzzy semantic matches and MUST require user review before fuzzy matches are applied.
- **FR-018**: The system MUST preserve answers provided during clarification and immediately update the dataset profile, validation findings, and readiness state when those answers affect review outcomes.
- **FR-019**: The system MUST allow users to pause and resume a clarification session without losing prior answers, unresolved items, or progress state.
- **FR-020**: The system MUST summarize what changed when a clarification session ends, including resolved ambiguities, remaining ambiguities, and impact on run readiness.
- **FR-021**: (Consolidated with FR-001)
- **FR-022**: The system MUST extract reusable saved native filters from a provided Superset link whenever such filters are present and accessible.
- **FR-023**: The system MUST detect and expose runtime template variables referenced by the dataset execution logic so they can be mapped from imported or user-provided filter values.
- **FR-024**: The system MUST present extracted filters with their current value, source, confidence state, and whether user confirmation is required.
- **FR-025**: The system MUST preserve partially recovered values when a Superset import is incomplete and MUST explain which parts were recovered successfully and which still require manual or guided completion.
- **FR-026**: The system MUST support dataset execution contexts that include parameterized placeholders so users can complete required run-time values before launch.
- **FR-027**: The system MUST provide a dedicated pre-run review that presents the effective dataset identity, selected filters, required placeholders, unresolved assumptions, and current warnings in one place before launch.
- **FR-028**: The system MUST require explicit user approval for each warning-level mapping transformation before launch, while allowing the user to manually edit the mapped value instead of approving it.
- **FR-029**: The system MUST require a successful Superset-side compiled SQL preview before launch and MUST keep launch blocked if the preview is unavailable or compilation fails.
- **FR-030**: The system MUST prevent dataset launch when required values, required execution attributes, required warning approvals, or a required compiled preview are missing and MUST explain what must be completed.
- **FR-031**: The system MUST allow users to review and adjust the assembled filter set before starting a dataset run.
- **FR-032**: The system MUST use a Superset SQL Lab session as the canonical audited execution target for approved dataset launch.
- **FR-033**: The system MUST record the dataset run context, including dataset identity, selected filters, parameter values, unresolved assumptions, the associated SQL Lab session reference, mapping approvals, semantic-source decisions, and execution outcome, so that users can audit or repeat the run later.
- **FR-034**: The system MUST support a workflow where automatic review, semantic enrichment, guided clarification, and dataset execution can be used independently or in sequence on the same dataset.
- **FR-035**: The system MUST provide exportable outputs for dataset documentation and validation results so users can share them outside the immediate workflow.
- **FR-036**: The system MUST preserve a usable frontend session state when a user stops mid-flow so they can resume review, clarification, semantic enrichment review, or run preparation without reconstructing prior work.
- **FR-037**: The system MUST make the recommended next action explicit at each major state of the workflow.
- **FR-038**: The system MUST provide side-by-side comparison when multiple semantic sources disagree for the same field and MUST not silently overwrite a user-entered value with imported or AI-generated metadata.
- **FR-039**: The system MUST preserve manual semantic overrides unless the user explicitly replaces them.
- **FR-040**: The system MUST allow users to apply semantic enrichment selectively at field level rather than only as an all-or-nothing operation.
- **FR-041**: The system MUST provide an inline feedback mechanism (thumbs up/down) for AI-generated content to support continuous improvement of semantic matching and summarization.
- **FR-042**: The system MUST support multi-user collaboration on review sessions, allowing owners to invite collaborators with specific roles (viewer, reviewer, approver).
- **FR-043**: The system MUST provide batch approval actions for mapping warnings and fuzzy semantic matches to reduce manual effort for experienced users.
- **FR-044**: The system MUST capture and persist a structured event log of all session-related actions (e.g., source intake, answer submission, approval, launch) to support audit, replay, and collaboration visibility.
- **FR-013**: The guided clarification flow MUST present one focused question at a time inside the central assistant chat rather than in an isolated modal or as an unstructured list of unresolved items.
- **FR-014**: The chat workspace MUST remain the primary interaction surface during recovery, clarification, and mapping review, while completed phases collapse into concise summaries that preserve context without leaving the conversation flow.
- **FR-015**: The system MUST provide a manual fallback mode for documentation, mapping, and review work when LLM assistance is unavailable, disabled, or intentionally skipped by the user.
- **FR-016**: Each clarification question MUST explain why the answer matters and, when available, show the system's current best guess.
- **FR-017**: The system MUST allow the user to answer with a suggested option, provide a custom answer, skip the question, or mark the item for later expert review.
- **FR-018**: The system MUST allow the agent to proactively recommend a semantic source when schema overlap or semantic similarity with a trusted source is strong enough to justify reuse.
- **FR-019**: The system MUST distinguish exact semantic matches from fuzzy semantic matches and MUST require user review before fuzzy matches are applied.
- **FR-020**: The system MUST preserve answers provided during clarification and immediately update the dataset profile, validation findings, readiness state, and visible phase summaries when those answers affect review outcomes.
- **FR-021**: The system MUST allow users to pause and resume a clarification session without losing prior answers, unresolved items, or progress state.
- **FR-022**: The system MUST summarize what changed when a clarification session ends, including resolved ambiguities, remaining ambiguities, and impact on run readiness.
- **FR-023**: The system MUST extract reusable saved native filters from a provided Superset link whenever such filters are present and accessible.
- **FR-024**: The system MUST detect and expose runtime template variables referenced by the dataset execution logic so they can be mapped from imported or user-provided filter values.
- **FR-025**: The system MUST present extracted filters with their current value, source, confidence state, and whether user confirmation is required.
- **FR-026**: The system MUST preserve partially recovered values when a Superset import is incomplete and MUST explain which parts were recovered successfully and which still require manual or guided completion.
- **FR-027**: The system MUST support dataset execution contexts that include parameterized placeholders so users can complete required run-time values before launch.
- **FR-028**: The system MUST provide a dedicated pre-run review that presents the effective dataset identity, selected filters, required placeholders, unresolved assumptions, and current warnings in one place before launch.
- **FR-029**: The system MUST support review sessions containing multiple datasets or dataset candidates and MUST present them in a way that makes the active dataset scope explicit.
- **FR-030**: The system MUST allow users to switch between dataset-specific review surfaces without losing the central assistant chat context.
- **FR-031**: The system MUST allow users to exclude individual datasets from the current review scope and MUST immediately recalculate findings, readiness, and preview context based on the remaining active datasets.
- **FR-032**: The system MUST require explicit user approval for each warning-level mapping transformation before launch, while allowing the user to manually edit the mapped value instead of approving it.
- **FR-033**: The system MUST require a successful Superset-side compiled SQL preview before launch and MUST keep launch blocked if the preview is unavailable or compilation fails.
- **FR-034**: The system MUST prevent dataset launch when required values, required execution attributes, required warning approvals, or a required compiled preview are missing and MUST explain what must be completed.
- **FR-035**: The system MUST allow users to review and adjust the assembled filter set before starting a dataset run.
- **FR-036**: The system MUST support column-name mapping inputs from every mapping source and configuration path already supported by the existing dataset-mapping capability, including database-backed mapping and spreadsheet-file mapping, without narrowing the supported review scope.
- **FR-037**: The system MUST use a Superset SQL Lab session as the canonical audited execution target for approved dataset launch.
- **FR-038**: The system MUST record the dataset run context, including dataset identity, selected filters, parameter values, unresolved assumptions, the associated SQL Lab session reference, mapping approvals, semantic-source decisions, dataset-scope inclusion or exclusion decisions, and execution outcome, so that users can audit or repeat the run later.
- **FR-039**: The system MUST support a workflow where automatic review, semantic enrichment, guided clarification, dataset execution, and manual fallback review can be used independently or in sequence on the same dataset.
- **FR-040**: The system MUST provide exportable outputs for dataset documentation and validation results so users can share them outside the immediate workflow.
- **FR-041**: The system MUST preserve a usable frontend session state when a user stops mid-flow so they can resume review, clarification, semantic enrichment review, manual fallback review, or run preparation without reconstructing prior work.
- **FR-042**: The system MUST make the recommended next action explicit at each major state of the workflow.
- **FR-043**: The system MUST provide side-by-side comparison when multiple semantic sources disagree for the same field and MUST not silently overwrite a user-entered value with imported or AI-generated metadata.
- **FR-044**: The system MUST preserve manual semantic overrides unless the user explicitly replaces them.
- **FR-045**: The system MUST allow users to apply semantic enrichment selectively at field level rather than only as an all-or-nothing operation.
- **FR-046**: The system MUST provide an inline feedback mechanism (thumbs up/down) for AI-generated content to support continuous improvement of semantic matching and summarization.
- **FR-047**: The system MUST support multi-user collaboration on review sessions, allowing owners to invite collaborators with specific roles (viewer, reviewer, approver).
- **FR-048**: The system MUST provide batch approval actions for mapping warnings and fuzzy semantic matches to reduce manual effort for experienced users.
- **FR-049**: The system MUST capture and persist a structured event log of all session-related actions (e.g., source intake, answer submission, approval, exclusion, launch) to support audit, replay, and collaboration visibility.
- **FR-050**: The system MUST allow users to ask free-form questions about the currently loaded dataset review context, including profile, filters, mappings, findings, compiled SQL preview status, and active dataset scope, through the assistant chat.
- **FR-051**: The assistant MUST be able to accept natural-language commands that mutate the current dataset review session state, such as approving mappings, excluding a dataset from review, or generating a SQL preview, while preserving existing launch and approval gates.
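The launch gating described by FR-033 and FR-034 can be sketched as a readiness check that enumerates every reason launch must stay blocked. This is an illustrative sketch only; the class and field names are assumptions, not the actual service API.

```python
from dataclasses import dataclass, field


@dataclass
class RunReadiness:
    """Illustrative readiness snapshot for a dataset review session."""
    missing_required_values: list = field(default_factory=list)
    unapproved_warnings: list = field(default_factory=list)
    preview_status: str = "missing"  # "missing" | "stale" | "failed" | "ok"

    def launch_blockers(self) -> list:
        """Return human-readable reasons launch must stay blocked (FR-034)."""
        blockers = []
        if self.missing_required_values:
            blockers.append(
                "Provide values for: " + ", ".join(self.missing_required_values)
            )
        if self.unapproved_warnings:
            blockers.append(
                f"Approve or edit {len(self.unapproved_warnings)} warning-level mapping(s)"
            )
        if self.preview_status != "ok":
            # FR-033: only a successful Superset-side compiled preview unblocks launch.
            blockers.append(
                f"Compiled SQL preview is {self.preview_status}; "
                "a successful Superset-side preview is required"
            )
        return blockers
```

Launch is allowed only when `launch_blockers()` returns an empty list; otherwise the UI surfaces each blocker as the explanation FR-034 requires.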
### Key Entities *(include if feature involves data)*
@@ -139,7 +150,8 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde
- **Semantic Source**: A reusable origin of semantic metadata, such as an uploaded file, connected tabular dictionary, or trusted reference dataset, used to enrich field- and metric-level business meaning.
- **Semantic Mapping Decision**: A recorded choice about which semantic source or proposed value was accepted, rejected, edited, or left unresolved for a field or metric.
- **Imported Filter Set**: The collection of reusable filters extracted from a Superset link, including source context, mapped dataset fields, current values, confidence state, and confirmation status.
- **Dataset Run Context**: The execution-ready snapshot of dataset inputs, selected filters, parameterized placeholders, unresolved assumptions, warnings, mapping approvals, semantic-source decisions, the associated SQL Lab session reference, and launch outcome used for auditing or replay.
- **Dataset Review Scope**: The active set of one or more datasets included in the current review session, including tab ordering, inclusion or exclusion status, and the currently focused dataset.
- **Dataset Run Context**: The execution-ready snapshot of dataset inputs, selected filters, parameterized placeholders, unresolved assumptions, warnings, mapping approvals, semantic-source decisions, dataset-scope decisions, the associated SQL Lab session reference, and launch outcome used for auditing or replay.
- **Readiness State**: The current workflow status that tells the user whether the dataset is still being recovered, ready for review, needs semantic-source review, needs clarification, is partially ready, or is ready to run.
## Success Criteria *(mandatory)*
@@ -152,19 +164,24 @@ A BI engineer reviews the assembled run context, verifies filters and placeholde
- **SC-004**: At least 85% of clarification questions shown in guided mode are judged by pilot users as relevant and helpful to resolving ambiguity (measured via the built-in feedback mechanism).
- **SC-005**: At least 80% of Superset links containing reusable saved native filters result in an imported filter set that users can review without rebuilding the context manually.
- **SC-006**: At least 85% of pilot users correctly identify which values are confirmed versus imported versus inferred versus AI-generated during moderated usability review.
- **SC-007**: At least 90% of dataset runs started from an imported or clarified context include a complete recorded run context that can be reopened later.
- **SC-008**: Pilot users successfully complete the end-to-end flow of import, review, semantic enrichment, clarification, and launch on their first attempt in at least 75% of observed sessions.
- **SC-009**: Support requests caused by missing or unclear dataset attributes decrease by at least 40% within the target pilot group after adoption.
- **SC-007**: At least 80% of pilot users working with multi-dataset sessions can switch dataset focus or exclude a dataset from review without facilitator assistance during moderated usability review.
- **SC-008**: At least 90% of dataset runs started from an imported, clarified, or manually reviewed context include a complete recorded run context that can be reopened later.
- **SC-009**: Pilot users successfully complete the end-to-end flow of import, review, semantic enrichment, clarification, and launch on their first attempt in at least 75% of observed sessions.
- **SC-010**: Support requests caused by missing or unclear dataset attributes decrease by at least 40% within the target pilot group after adoption.
## Assumptions
- Users already have permission to access the datasets and Superset artifacts they submit to ss-tools.
- Saved native filters embedded in a Superset link are considered the preferred reusable source of analytical context when available.
- Users need both self-service automation and a guided conversational path because dataset semantics are often incomplete, implicit, conflicting, or distributed across multiple semantic sources.
- The central assistant chat is the primary workspace surface for progressing through recovery, clarification, and mapping review, while completed phases collapse into summaries instead of remaining fully expanded.
- Multi-dataset review is a primary usage pattern, so users need to switch dataset focus and exclude individual datasets from review without resetting session context.
- A manual mode must remain available as a fallback so documentation, mapping, and run preparation can continue when LLM capabilities are unavailable.
- The feature is intended for internal operational use where clarity, traceability, semantic consistency, and repeatable execution are more important than raw execution speed.
- Exportable documentation and validation outputs are required for collaboration, review, and audit use cases.
- Users may choose to proceed with warnings, but not with missing required execution inputs, missing required mapping approvals, or missing required compiled preview.
- Superset SQL Lab session creation is the canonical audited launch path for approved execution.
- Warning-level mapping transformations require explicit user approval before launch, while manual correction remains optional.
- Existing dataset-mapping capabilities already support multiple source types and configuration paths, and the orchestration workflow must preserve that breadth.
- Launch requires a successful Superset-side compiled preview and cannot fall back to an unverified local approximation.
- Trusted semantic sources already exist or can be introduced incrementally through frontend-managed files, connected dictionaries, or reference datasets without requiring organizations to discard existing semantic workflows.

View File

@@ -5,105 +5,65 @@
---
## Phase 1: Setup
## Rebaseline Note
- [x] T001 Initialize backend service directory structure for `dataset_review` in `backend/src/services/dataset_review/`
- [x] T002 Initialize frontend component directory for `dataset-review` in `frontend/src/lib/components/dataset-review/`
- [x] T003 Register `ff_dataset_auto_review`, `ff_dataset_clarification`, and `ff_dataset_execution` feature flags in configuration
- [x] T004 [P] Seed new `DATASET_REVIEW_*` permissions in `backend/src/scripts/seed_permissions.py`
This task list is rebaselined to the approved mixed-initiative assistant scope from `027-task.md`.
Previously completed implementation checkboxes remain historical only. They are no longer treated as authoritative for feature acceptance until the codebase is aligned with the refreshed spec artifacts below.
---
## Phase 2: Foundational Layer
## Phase A: Spec Refresh Gate
- [x] T005 [P] Implement Core SQLAlchemy models for session, profile, and findings in `backend/src/models/dataset_review.py`
- [x] T006 [P] Implement Semantic, Mapping, and Clarification models in `backend/src/models/dataset_review.py`
- [x] T007 [P] Implement Preview and Launch Audit models in `backend/src/models/dataset_review.py`
- [x] T008 [P] Implement `DatasetReviewSessionRepository` (CRITICAL: C5, PRE: auth scope, POST: consistent aggregates, INVARIANTS: ownership scope) in `backend/src/services/dataset_review/repositories/session_repository.py`
- [x] T009 [P] Create Pydantic schemas for Session Summary and Detail in `backend/src/schemas/dataset_review.py`
- [x] T010 [P] Create Svelte store for session management in `frontend/src/lib/stores/datasetReviewSession.js`
- [x] T001 Refresh `ux_reference.md` to replace modal clarification with `AssistantChatPanel` mixed-initiative behavior
- [x] T002 Refresh `spec.md` to update FR-013 and add FR-045/FR-046
- [x] T003 Refresh `contracts/api.yaml` to add `dataset_review_session_id` to `AssistantMessageRequest` and expose session version in session DTOs
- [x] T004 Refresh `data-model.md` to add optimistic-lock `version` and PII masking requirements for `ImportedFilter.raw_value`
- [x] T005 Refresh `contracts/modules.md` to route clarification through `AssistantChatPanel`, expose orchestrator state to `AssistantApi`, and retire `ClarificationDialog`
---
## Phase 3: User Story 1 — Automatic Review (P1)
## Phase B: Backend Rebaseline Work
**Goal**: Submission of link/dataset produces immediate readable summary and semantic enrichment from trusted sources.
**Independent Test**: Submit a Superset link; verify session created, summary generated, and findings populated without manual intervention.
- [X] T011 [P] [US1] Implement `StartSessionRequest` and lifecycle endpoints in `backend/src/api/routes/dataset_review.py`
- [X] T012 [US1] Implement `DatasetReviewOrchestrator.start_session` (CRITICAL: C5, PRE: non-empty input, POST: enqueued recovery, BELIEF: uses `belief_scope`) in `backend/src/services/dataset_review/orchestrator.py`
- [X] T013 [P] [US1] Implement `SupersetContextExtractor.parse_superset_link` (CRITICAL: C4, PRE: parseable link, POST: resolved target, REL: uses `SupersetClient`) in `backend/src/core/utils/superset_context_extractor.py`
- [X] T014 [US1] Implement `SemanticSourceResolver.resolve_from_dictionary` (CRITICAL: C4, PRE: source exists, POST: confidence-ranked candidates) in `backend/src/services/dataset_review/semantic_resolver.py`
- [X] T015 [US1] Implement Documentation and Validation export endpoints (JSON/Markdown) in `backend/src/api/routes/dataset_review.py`
- [X] T016 [P] [US1] Implement `SourceIntakePanel` (C3, UX_STATE: Idle/Validating/Rejected) in `frontend/src/lib/components/dataset-review/SourceIntakePanel.svelte`
- [X] T017 [P] [US1] Implement `ValidationFindingsPanel` (C3, UX_STATE: Blocking/Warning/Info) in `frontend/src/lib/components/dataset-review/ValidationFindingsPanel.svelte`
- [X] T018 [US1] Create main `DatasetReviewWorkspace` (CRITICAL: C5, UX_STATE: Empty/Importing/Review) in `frontend/src/routes/datasets/review/[id]/+page.svelte`
- [x] T019 [US1] Verify implementation matches ux_reference.md (Happy Path & Errors)
- [x] T020 [US1] Acceptance: Perform semantic audit & algorithm emulation by Tester
- [ ] T006 Implement optimistic locking in `backend/src/services/dataset_review/repositories/session_repository.py` using `DatasetReviewSession.version` and conflict semantics
- [ ] T007 Update dataset review session schemas and route handlers to require and return session version consistently
- [ ] T008 Add `dataset_review_session_id` to `backend/src/api/routes/assistant.py::AssistantMessageRequest`
- [ ] T009 Load dataset-review context into assistant planning flow when `dataset_review_session_id` is present
- [ ] T010 Add assistant intent routing for dataset-review commands (`APPROVE_MAPPINGS`, `SET_FIELD_SEMANTICS`, `GENERATE_SQL_PREVIEW`) through `DatasetReviewOrchestrator`
- [ ] T011 Add PII masking/redaction for imported filter values before assistant or LLM-facing context assembly
- [ ] T012 Preserve backend mutation observability with semantic logging and conflict-safe mutation boundaries
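The optimistic locking in T006/T007 follows a compare-and-bump pattern: each mutation must present the version the client last saw, and every successful mutation increments it. A minimal in-memory sketch of the conflict semantics (the real repository is SQLAlchemy-backed; names here are illustrative):

```python
class SessionVersionConflict(Exception):
    """Raised when the client's session version is behind the stored one."""


class InMemorySessionRepository:
    """Sketch of the compare-and-bump pattern behind the X-Session-Version header."""

    def __init__(self):
        self._sessions = {}  # session_id -> {"version": int, "state": dict}

    def create(self, session_id, state):
        self._sessions[session_id] = {"version": 1, "state": dict(state)}

    def mutate(self, session_id, expected_version, changes):
        record = self._sessions[session_id]
        if record["version"] != expected_version:
            # Client acted on stale data; it must re-fetch and retry.
            raise SessionVersionConflict(
                f"expected v{expected_version}, stored v{record['version']}"
            )
        record["state"].update(changes)
        record["version"] += 1  # every successful mutation bumps the version
        return record["version"]
```

A route handler would read `X-Session-Version` from the request, pass it as `expected_version`, and translate `SessionVersionConflict` into an HTTP 409 whose body carries the current version.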
---
## Phase 4: User Story 2 — Guided Clarification (P2)
## Phase C: Frontend Rebaseline Work
**Goal**: Resolve ambiguities and conflicting metadata through one-question-at-a-time dialogue.
**Independent Test**: Open a session with unresolved findings; answer questions one by one and verify readiness state updates in real-time.
- [X] T021 [P] [US2] Implement `ClarificationEngine.build_question_payload` (CRITICAL: C4, PRE: unresolved state, POST: prioritized question) in `backend/src/services/dataset_review/clarification_engine.py`
- [X] T022 [US2] Implement `ClarificationEngine.record_answer` (CRITICAL: C4, PRE: question active, POST: answer persisted before state advance) in `backend/src/services/dataset_review/clarification_engine.py`
- [X] T023 [P] [US2] Implement field-level semantic override and lock endpoints in `backend/src/api/routes/dataset_review.py`
- [X] T024 [US2] Implement `SemanticLayerReview` component (C3, UX_STATE: Conflicted/Manual) in `frontend/src/lib/components/dataset-review/SemanticLayerReview.svelte`
- [X] T025 [P] [US2] Implement `ClarificationDialog` (C3, UX_STATE: Question/Saving/Completed, REL: binds to `assistantChat`) in `frontend/src/lib/components/dataset-review/ClarificationDialog.svelte`
- [X] T026 [US2] Implement LLM feedback (👍/👎) storage and UI handlers in `backend/src/api/routes/dataset_review.py`
- [x] T027 [US2] Verify implementation matches ux_reference.md (Happy Path & Errors)
- [x] T028 [US2] Acceptance: Perform semantic audit & algorithm emulation by Tester
- [ ] T013 Integrate `AssistantChatPanel` into `frontend/src/routes/datasets/review/[id]/+page.svelte` as the clarification surface
- [ ] T014 Remove or retire `ClarificationDialog` usage from the dataset review workflow
- [ ] T015 Add inline **[✨ Ask AI]** / **[✨ Improve]** triggers for findings, partial filters, mappings, and business summary surfaces
- [ ] T016 Add workspace highlight synchronization when assistant prompts reference `field_id`, `filter_id`, mapping, or finding targets
- [ ] T017 Render assistant confirmation cards for state-changing actions tied to dataset review sessions
- [ ] T018 Update SQL preview refresh behavior so rapid mapping changes mark the preview `stale`, with regeneration either debounced or explicitly user-triggered
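The staleness rule in T018 can be expressed as a small state transition: a successful preview is tied to the mapping version it was compiled against, and any later mapping edit demotes it to `stale`. The actual component is Svelte; this Python sketch only illustrates the intended transitions, and the names are assumptions.

```python
class PreviewState:
    """Sketch of compiled-preview staleness tracking (Ready/Stale semantics)."""

    def __init__(self):
        self.status = "missing"  # "missing" | "ok" | "stale" | "error"
        self.preview_for_version = None

    def on_preview_success(self, mapping_version):
        self.status = "ok"
        self.preview_for_version = mapping_version

    def on_mapping_change(self, mapping_version):
        # Any mapping edit after a successful preview marks it stale;
        # regeneration is debounced or explicitly triggered by the user.
        if self.status == "ok" and mapping_version != self.preview_for_version:
            self.status = "stale"
```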
---
## Phase 5: User Story 3 — Controlled Execution (P3)
## Phase D: Validation After Rebaseline
**Goal**: Review mappings, generate Superset-side preview, and launch audited SQL Lab execution.
**Independent Test**: Map filters to variables; trigger preview; verify launch blocked until preview succeeds; verify SQL Lab session creation.
- [X] T029 [P] [US3] Implement `SupersetContextExtractor.recover_imported_filters` and variable discovery in `backend/src/core/utils/superset_context_extractor.py`
- [X] T030 [US3] Implement `SupersetCompilationAdapter.compile_preview` (CRITICAL: C4, PRE: effective inputs available, POST: Superset-compiled SQL only) in `backend/src/core/utils/superset_compilation_adapter.py`
- [X] T031 [US3] Implement `DatasetReviewOrchestrator.launch_dataset` (CRITICAL: C5, PRE: run-ready + preview match, POST: audited run context) in `backend/src/services/dataset_review/orchestrator.py`
- [X] T032 [P] [US3] Implement mapping approval and preview trigger endpoints in `backend/src/api/routes/dataset_review.py`
- [X] T033 [P] [US3] Implement `ExecutionMappingReview` component (C3, UX_STATE: WarningApproval/Approved) in `frontend/src/lib/components/dataset-review/ExecutionMappingReview.svelte`
- [X] T034 [P] [US3] Implement `CompiledSQLPreview` component (C3, UX_STATE: Ready/Stale/Error) in `frontend/src/lib/components/dataset-review/CompiledSQLPreview.svelte`
- [X] T035 [US3] Implement `LaunchConfirmationPanel` (C3, UX_STATE: Blocked/Ready/Submitted) in `frontend/src/lib/components/dataset-review/LaunchConfirmationPanel.svelte`
- [x] T036 [US3] Verify implementation matches ux_reference.md (Happy Path & Errors)
- [x] T037 [US3] Acceptance: Perform semantic audit & algorithm emulation by Tester
---
## Final Phase: Polish & Security
- [X] T038 Implement `SessionEvent` logger and persistence logic in `backend/src/services/dataset_review/event_logger.py`
- [X] T039 Implement automatic version propagation logic for updated `SemanticSource` entities
- [X] T040 Add batch approval API and UI actions for mapping/semantics
- [X] T041 Add integration tests for Superset version compatibility matrix in `backend/tests/services/dataset_review/test_superset_matrix.py`
- [X] T042 Final audit of RBAC enforcement across all session-mutation endpoints
- [X] T043 Verify i18n coverage for all user-facing strings in `frontend/src/lib/i18n/`
- [ ] T019 Verify feature behavior against refreshed `ux_reference.md`, especially mixed-initiative clarification and context actions
- [ ] T020 Verify WYSIWWR, Superset-only SQL compilation, session-version conflict handling, and PII masking boundaries
- [ ] T021 Run semantic audit for updated 027 contracts before implementation handoff closure
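The PII masking boundary verified in T020 (and implemented in T011) amounts to a redaction pass over imported filter values before any assistant- or LLM-facing context is assembled. A minimal sketch, assuming email addresses as the example pattern; the real masking rules and function names are not defined here:

```python
import re

# Example pattern only; the production rule set would cover more PII classes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def mask_filter_value(raw_value: str) -> str:
    """Replace obviously sensitive substrings with placeholders
    before the value reaches assistant/LLM context assembly."""
    return EMAIL.sub("[masked-email]", raw_value)
```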
---
## Dependencies & Strategy
### Story Completion Order
1. **Foundation** (Blocking: T005-T010)
2. **User Story 1** (Blocking for US2 and US3)
3. **User Story 2** (Can be implemented in parallel with US3 parts, but requires US1 findings)
4. **User Story 3** (Final terminal action)
### Delivery Order
1. **Spec Refresh Gate**
2. **Backend optimistic locking and assistant routing**
3. **Frontend assistant-panel integration and contextual UX**
4. **Validation and semantic audit**
### Parallel Execution Opportunities
- T011, T013, T016 (API, Parser, UI Setup) can run simultaneously once T001-T010 are done.
- T021 and T025 (Clarification Backend/Frontend) can run in parallel.
- T030 and T034 (Preview Backend/Frontend) can run in parallel.
### Implementation Strategy
- **MVP First**: Implement US1 with hardcoded trusted sources to prove the session/summary lifecycle.
- **Incremental Delivery**: Release US1 for documentation value, then US2 for metadata cleanup, finally US3 for execution.
- **WYSIWWR Guard**: T030 must never be compromised; if the Superset API fails, implementation must prioritize the "Manual Launch" fallback defined in research.
### Scope Invariants
- Keep feature work bounded to `027-dataset-llm-orchestration` implementation surfaces.
- Preserve **WYSIWWR** and Superset-only SQL compilation.
- Do not allow assistant commands to bypass explicit approval or launch gates.
- Keep i18n, Tailwind-first UI, `requestApi` / `fetchApi`, and `TaskManager` conventions intact.

View File

@@ -23,7 +23,7 @@
* **Expose certainty, do not fake certainty**: The system must always distinguish confirmed facts, inferred facts, imported facts, unresolved facts, and AI drafts.
* **Guide, then get out of the way**: The product should proactively suggest next actions but should not force the user into a rigid wizard if they already know what they want to do.
* **Progress over perfection**: A user should be able to get partial value immediately, save progress, and return later.
* **One ambiguity at a time**: In dialogue mode, the user should never feel interrogated by a wall of questions.
* **One ambiguity at a time**: In assistant-guided dialogue mode, the user should never feel interrogated by a wall of questions.
* **Execution must feel safe**: Before launch, the user should clearly understand what will run, with which filters, with which unresolved assumptions.
* **Superset import should feel like recovery, not parsing**: The user expectation is not “we decoded a link”, but “we recovered the analysis context I had in Superset.”
* **What You See Is What Will Run (WYSIWWR)**: Before any launch, the system must show the final compiled SQL query exactly as it will be sent for execution, with all template substitutions already resolved.
@@ -60,13 +60,25 @@ The semantic confidence hierarchy is explicit:
This mode should feel like the system is recovering and inheriting existing semantic knowledge before inventing anything new.
### Mode B: Guided Clarification
### Mode B: Mixed-Initiative Assistant Clarification
User enters a focused interaction with the agent to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps.
User enters a focused interaction with the agent through the central **AssistantChatPanel** to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps.
This mode is mixed-initiative:
* the system may push the next highest-priority clarification from a **Clarification Queue**,
* the user may ask free-form questions about the current dataset context at any time,
* the agent may propose state-changing actions, but execution still follows existing approval and launch gates,
* completed phases collapse into compact summaries so the chat remains the primary workspace.
This mode is for confidence-building and resolving uncertainty.
### Mode C: Run Preparation
### Mode C: Manual Fallback Review
When LLM assistance is unavailable, disabled, or intentionally skipped, the user continues through explicit manual surfaces for documentation, semantic review, mapping, and run preparation.
This mode preserves the same auditability and launch gates as the chat-centric flow, but replaces agent prompts with direct editable controls, explicit review queues, and manual confirmation actions.
### Mode D: Run Preparation
User reviews the assembled run context, edits values where needed, confirms assumptions, inspects the compiled SQL preview, and launches the dataset only when the context is good enough.
@@ -76,7 +88,7 @@ This mode is for controlled execution.
### High-Level Story
The user opens ss-tools because they have a dataset they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what the dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain. The user scans a short human-readable summary, adjusts the business meaning manually if needed, approves a few semantic and filter mappings, resolves only the remaining ambiguities through a short guided dialogue, and reaches a “Run Ready” state after reviewing the final SQL compiled by Superset itself. Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.
The user opens ss-tools because they have one or more datasets they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what each dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain. The user works primarily through a central chat-centric workspace, scans a short human-readable summary, adjusts the business meaning manually if needed, reviews dataset-specific tabs when multiple datasets are present, excludes low-value datasets from the current review when appropriate, approves a few semantic and filter mappings, resolves only the remaining ambiguities through a short guided dialogue, and reaches a “Run Ready” state after reviewing the final SQL compiled by Superset itself. If LLM assistance is unavailable, the user can continue through manual review surfaces without losing the same auditability and launch protections. Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.
### Detailed Step-by-Step Journey
#### Step 4: First Readable Summary
The user sees a compact summary card in the center chat workspace:
* what this dataset appears to represent,
* what period/scope/segments are implied,
* what filters were recovered,
This summary is the anchor of trust. It must be short, business-readable, and immediately understandable.
The summary is editable. If the user sees that the generated business meaning is incorrect or incomplete, they can use **[Edit]** to manually correct the summary without starting a long clarification dialogue.
When multiple datasets are present, the workspace shows dataset tabs above or beside the central chat area so the user can switch focus quickly. Each tab clearly shows whether the dataset is active, excluded from review, or still needs attention.
**Desired feeling**: “I can explain this dataset to someone else already, and I can quickly fix the explanation if it is wrong.”
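The dataset tab behavior above can be sketched as a small state model. This is a hypothetical illustration, not a confirmed API: the names `DatasetTab`, `DatasetTabState`, and `overallReadinessBlocked` are assumptions.

```typescript
// Hypothetical sketch: tab states for the multi-dataset review scope.
type DatasetTabState = "active" | "excluded" | "needs-attention";

interface DatasetTab {
  datasetId: string;
  label: string;
  state: DatasetTabState;
  unresolvedCount: number; // findings/mappings still open for this dataset
}

// Excluded datasets must not affect readiness; any non-excluded tab with
// unresolved items keeps the overall workspace in a blocked state.
function overallReadinessBlocked(tabs: DatasetTab[]): boolean {
  return tabs
    .filter((t) => t.state !== "excluded")
    .some((t) => t.unresolvedCount > 0);
}
```

The key design point captured here is that exclusion removes a dataset from the readiness calculation entirely, rather than merely hiding its tab.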
#### Step 5: Validation Triage
If ambiguities remain, the product presents an explicit choice:

* **Fix now with agent**
* **Continue with current assumptions**
* **Save and return later**
Choosing **Fix now with agent** opens the global **AssistantChatPanel** instead of a dedicated modal flow. The same panel also remains available from inline context actions such as **[✨ Ask AI]** next to unresolved filters, validation warnings, mapping rows, and the editable business summary.
This is a critical UX moment. The user must feel in control rather than forced into a mandatory workflow.
**Desired feeling**: “I decide how much rigor I need right now.”
#### Step 7: Assistant Chat Clarification
If the user chooses clarification, the workspace keeps its main layout and opens a focused dialogue stream in **AssistantChatPanel**.
The agent asks one question at a time, each with:
* why this matters,
Each answer updates the dataset profile in real time.
As the user completes a phase such as recovery review, clarification, or mapping review, that phase collapses into a compact summary block in the chat timeline so progress remains visible without forcing the user to scroll through expanded historical panels.
When the agent question references a specific filter, field, mapping, or finding, the related card in the workspace is visually highlighted so the user can keep spatial context while answering in chat.
If LLM assistance is unavailable, the same unresolved items remain available in manual review panels with equivalent actions, but the system does not pretend that chat guidance is active.
**Desired feeling**: “This is helping me resolve uncertainty, not making me fill a form.”
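The question-to-workspace highlighting described above can be sketched as a message shape. This is a hedged illustration; `AssistantQuestion`, `AssistantQuestionRef`, and the `kind:elementId` target format are assumptions, not a confirmed schema.

```typescript
// Hypothetical sketch: an assistant question that optionally references a
// workspace element so the related card can be highlighted during the answer.
interface AssistantQuestionRef {
  kind: "filter" | "field" | "mapping" | "finding";
  elementId: string;
}

interface AssistantQuestion {
  questionId: string;
  text: string;
  whyItMatters: string; // shown with each question per the pattern above
  ref?: AssistantQuestionRef; // when present, highlight the matching card
}

// Resolve which workspace elements should glow for a given question.
function highlightTargets(q: AssistantQuestion): string[] {
  return q.ref ? [`${q.ref.kind}:${q.ref.elementId}`] : [];
}
```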
#### Step 8: Run Readiness Review
The user can reopen the run later and understand the exact state used.
* save/resume controls,
* recent actions timeline.
### Center Column: Chat-Centric Review Surface
* the always-visible **AssistantChatPanel** as the primary workspace,
* generated business summary,
* manual override with **[Edit]** for the generated summary and business interpretation,
* collapsible phase summaries for completed recovery, clarification, and mapping stages,
* documentation draft preview,
* validation findings grouped by severity,
* confidence markers,
* unresolved assumptions.
### Center Column: Columns & Metrics
* semantic layer table for columns and metrics,
* visible values for `verbose_name`, `description`, and formatting metadata where available,
* provenance badges for every semantically enriched field, such as `[ 📄 dict.xlsx ]`, `[ 📊 Dataset: Master Sales ]`, or `[ ✨ AI Guessed ]`,
* side-by-side conflict view when multiple semantic sources disagree,
* **Apply semantic source...** action that opens source selection for file, database dictionary, or existing Superset datasets,
* manual per-field override so the user can keep, replace, or rewrite semantic metadata.
### Center Column: Dataset Scope Navigation
* dataset tabs when multiple datasets or candidate datasets are present,
* clear active/excluded state for each dataset tab,
* fast switching between dataset-specific semantic review, filters, and mapping widgets without leaving the central chat,
* explicit exclude-from-review action for datasets that should not affect current readiness.
### Right Column: Execution, Manual Fallback, and Artifacts
* imported filters,
* parameter placeholders,
* **Jinja Template Mapping** block with visible mapping between source filters and detected dataset variables,
* run-time values,
* **Compiled SQL Preview** block or action to open the compiled query returned by Superset API,
* readiness checklist,
* primary CTA,
* manual review surfaces that remain available when chat assistance is unavailable.
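The **Jinja Template Mapping** block in the right column can be sketched as a row model. This is an illustrative sketch only; `JinjaMappingRow` and its field names are assumptions rather than a documented type.

```typescript
// Hypothetical sketch: one Jinja Template Mapping row tying an imported
// source filter to a detected dataset variable, with an approval status.
interface JinjaMappingRow {
  sourceFilter: string;    // e.g. a filter recovered from the source chart
  datasetVariable: string; // e.g. "{{ region }}" detected in dataset SQL
  status: "approved" | "needs-review" | "unmapped";
}

// The readiness checklist should stay blocked while any row is unapproved.
function mappingBlocksLaunch(rows: JinjaMappingRow[]): boolean {
  return rows.some((r) => r.status !== "approved");
}
```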
This structure matters because the user mentally works across four questions:
1. What is this?
Raw detail is valuable, but it should never compete visually with the answer to these questions.
## 6.1 Conversation Pattern
The agent interaction is not a chat for general brainstorming. It is a structured operational assistant embedded in **AssistantChatPanel** that supports both guided clarification and user-initiated context questions.
Each prompt should contain:
* **Question**
This keeps the agent focused, useful, and fast.
The assistant may also answer free-form prompts such as:
* “Why is this filter marked partial?”
* “Which mapping is still blocking launch?”
* “Show me why the SQL preview is stale.”
Free-form answers must stay grounded in current session context and should link back to the relevant workspace element.
## 6.2 Agent-Led Semantic Source Suggestion
The agent may proactively suggest a semantic source when the schema strongly resembles an existing reference.
These controls are critical for real-world data workflows.
## 6.5.1 Context Actions
Inline micro-actions should appear next to high-friction items inside the workspace:
* unresolved or partial imported filters,
* blocking and warning validation findings,
* editable business summary,
* mappings that still require approval or normalization review.
Recommended actions:
* **[Ask in chat]** — opens or focuses **AssistantChatPanel** with hidden structured context and a user-visible question seed.
* **[Improve in chat]** — asks the assistant to refine a draft summary or semantic description while preserving manual intent and provenance rules.
* **[Edit manually]** — opens the equivalent manual review control when LLM assistance is unavailable or intentionally skipped.
These actions should feel like contextual escalation, not a page transition, and they must degrade gracefully into manual controls when the chat assistant is not active.
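The graceful-degradation rule above can be sketched as a filter over action descriptors. This is a minimal illustration under stated assumptions: `ContextAction`, `availableActions`, and the kind identifiers are hypothetical names, not the feature's actual API.

```typescript
// Hypothetical sketch: context actions that escalate to chat when the
// assistant is active and degrade to manual controls otherwise.
type ContextActionKind = "ask-in-chat" | "improve-in-chat" | "edit-manually";

interface ContextAction {
  kind: ContextActionKind;
  requiresAssistant: boolean;
}

function availableActions(assistantActive: boolean): ContextActionKind[] {
  const all: ContextAction[] = [
    { kind: "ask-in-chat", requiresAssistant: true },
    { kind: "improve-in-chat", requiresAssistant: true },
    { kind: "edit-manually", requiresAssistant: false },
  ];
  // When the assistant is unavailable, only the manual control survives.
  return all
    .filter((a) => assistantActive || !a.requiresAssistant)
    .map((a) => a.kind);
}
```

The design intent is that the manual control is always present in the list, so losing the assistant never removes the user's ability to act.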
## 6.5.2 Confirmation Cards
Dangerous or audit-relevant assistant actions should render as chat-native confirmation cards backed by `AssistantConfirmationRecord`.
Examples:
* approve all mapping warnings,
* trigger SQL preview generation,
* launch the dataset in SQL Lab.
The confirmation card must summarize:
* intended action,
* affected session scope,
* remaining blocking gates or warnings,
* explicit confirm/cancel controls.
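The card contents above can be sketched against `AssistantConfirmationRecord`. The record name comes from this spec, but the fields and the `canExecute` gate below are assumptions for illustration.

```typescript
// Hypothetical sketch of the data a chat-native confirmation card carries.
interface AssistantConfirmationRecord {
  action:
    | "approve-all-mapping-warnings"
    | "generate-sql-preview"
    | "launch-sql-lab";
  sessionId: string;          // affected session scope
  remainingBlockers: string[]; // blocking gates still open
  confirmed: boolean | null;   // null until the user confirms or cancels
}

// A dangerous action runs only after explicit confirmation and with no
// remaining blocking gates.
function canExecute(rec: AssistantConfirmationRecord): boolean {
  return rec.confirmed === true && rec.remainingBlockers.length === 0;
}
```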
## 6.6 Dialogue Exit Conditions
The user can leave dialogue mode when:
There are meaningful unresolved items. Product suggests dialogue mode.
### State 6: Clarification Active
One-question-at-a-time guided flow routed through `AssistantChatPanel` while the workspace stays visible.
### State 7: Mapping Review Needed
Recovered filters and detected Jinja variables exist, but the mapping still requires approval, correction, or completion.
* Detected Jinja variables should appear as a second wave of recovered context so the user understands execution awareness is expanding.
* Detected semantic source candidates should appear as a third wave, with confidence labels and provenance badges.
* Every clarified answer should immediately remove or downgrade a validation finding where relevant.
* When the assistant focuses on a specific filter, field, or finding, the corresponding workspace element should glow or highlight until the user acts or changes focus.
* Provenance badges should update live:
* Confirmed
* Imported
Recommended trust markers:
* mapping approval status,
* “compiled by Superset” status on the SQL preview,
* “last changed by” and “changed in clarification” notes,
* “used in run” markers for final execution inputs,
* confirmation cards in the assistant stream for state-changing actions.
Conflict rule:
* The system must never silently overwrite user-entered semantic values with data from a dictionary, another dataset, or AI generation.
The UX is working if users can, with minimal hesitation:
* inspect the final compiled SQL before launch,
* resolve only the ambiguities that matter,
* reach a clear run/no-run decision,
* reopen the same context later without confusion.