feat(ui): add chat-driven dataset review flow

Move dataset review clarification into the assistant workspace and
rework the review page into a chat-centric layout with execution rails.

Add session-scoped assistant actions for mappings, semantic fields,
and SQL preview generation. Introduce optimistic locking for dataset
review mutations, propagate session versions through API responses,
and mask imported filter values before assistant exposure.

Refresh tests, i18n, and spec artifacts to match the new workflow.

BREAKING CHANGE: dataset review mutation endpoints now require the
X-Session-Version header, and clarification is no longer handled
through ClarificationDialog-based flows.
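A minimal client-side sketch of the new optimistic-locking contract, assuming a JSON API and a `409` response on version conflict (the type and function names are illustrative; only the `X-Session-Version` header comes from this change):

```typescript
// Illustrative types for the session-versioned mutation contract.
interface SessionState {
  sessionId: string;
  version: number; // incremented by the server on every accepted mutation
}

// Attach the now-required header to dataset review mutation requests.
function mutationHeaders(session: SessionState): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "X-Session-Version": String(session.version),
  };
}

// A 409 means another actor mutated the session since our last read;
// the client should refetch the session and replay or discard its edit.
function isVersionConflict(status: number): boolean {
  return status === 409;
}
```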
2026-03-26 13:33:12 +03:00
parent d7911fb2f1
commit 7c85552132
74 changed files with 6122 additions and 2970 deletions


@@ -23,7 +23,7 @@
* **Expose certainty, do not fake certainty**: The system must always distinguish confirmed facts, inferred facts, imported facts, unresolved facts, and AI drafts.
* **Guide, then get out of the way**: The product should proactively suggest next actions but should not force the user into a rigid wizard if they already know what they want to do.
* **Progress over perfection**: A user should be able to get partial value immediately, save progress, and return later.
* **One ambiguity at a time**: In dialogue mode, the user should never feel interrogated by a wall of questions.
* **One ambiguity at a time**: In assistant-guided dialogue mode, the user should never feel interrogated by a wall of questions.
* **Execution must feel safe**: Before launch, the user should clearly understand what will run, with which filters, with which unresolved assumptions.
* **Superset import should feel like recovery, not parsing**: The user expectation is not “we decoded a link”, but “we recovered the analysis context I had in Superset.”
* **What You See Is What Will Run (WYSIWWR)**: Before any launch, the system must show the final compiled SQL query exactly as it will be sent for execution, with all template substitutions already resolved.
@@ -60,13 +60,25 @@ The semantic confidence hierarchy is explicit:
This mode should feel like the system is recovering and inheriting existing semantic knowledge before inventing anything new.
### Mode B: Guided Clarification
### Mode B: Mixed-Initiative Assistant Clarification
User enters a focused interaction with the agent to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps.
User enters a focused interaction with the agent through the central **AssistantChatPanel** to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps.
This mode is mixed-initiative:
* the system may push the next highest-priority clarification from a **Clarification Queue**,
* the user may ask free-form questions about the current dataset context at any time,
* the agent may propose state-changing actions, but execution still follows existing approval and launch gates,
* completed phases collapse into compact summaries so the chat remains the primary workspace.
This mode is for confidence-building and resolving uncertainty.
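The one-ambiguity-at-a-time rule can be sketched as a Clarification Queue that surfaces exactly one open item at a time (a sketch with illustrative names; the spec only names the queue concept):

```typescript
// One item in the Clarification Queue.
interface ClarificationItem {
  id: string;
  priority: number; // higher = more blocking for run readiness
  resolved: boolean;
}

// The assistant pushes only the single highest-priority unresolved item,
// so the user never faces a wall of questions.
function nextClarification(
  queue: ClarificationItem[],
): ClarificationItem | undefined {
  return queue
    .filter((item) => !item.resolved)
    .sort((a, b) => b.priority - a.priority)[0];
}
```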
### Mode C: Run Preparation
### Mode C: Manual Fallback Review
When LLM assistance is unavailable, disabled, or intentionally skipped, the user continues through explicit manual surfaces for documentation, semantic review, mapping, and run preparation.
This mode preserves the same auditability and launch gates as the chat-centric flow, but replaces agent prompts with direct editable controls, explicit review queues, and manual confirmation actions.
### Mode D: Run Preparation
User reviews the assembled run context, edits values where needed, confirms assumptions, inspects the compiled SQL preview, and launches the dataset only when the context is good enough.
@@ -76,7 +88,7 @@ This mode is for controlled execution.
### High-Level Story
The user opens ss-tools because they have a dataset they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what the dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain. The user scans a short human-readable summary, adjusts the business meaning manually if needed, approves a few semantic and filter mappings, resolves only the remaining ambiguities through a short guided dialogue, and reaches a “Run Ready” state after reviewing the final SQL compiled by Superset itself. Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.
The user opens ss-tools because they have one or more datasets they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what each dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain. The user works primarily through a central chat-centric workspace: they scan a short human-readable summary, adjust the business meaning manually if needed, review dataset-specific tabs when multiple datasets are present, exclude low-value datasets from the current review when appropriate, approve a few semantic and filter mappings, and resolve only the remaining ambiguities through a short guided dialogue. They reach a “Run Ready” state after reviewing the final SQL compiled by Superset itself. If LLM assistance is unavailable, the user can continue through manual review surfaces without losing the same auditability and launch protections. Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.
### Detailed Step-by-Step Journey
@@ -133,7 +145,7 @@ Instead of showing a spinner for too long, the interface should reveal results p
#### Step 4: First Readable Summary
The user sees a compact summary card:
The user sees a compact summary card in the center chat workspace:
* what this dataset appears to represent,
* what period/scope/segments are implied,
* what filters were recovered,
@@ -143,6 +155,8 @@ This summary is the anchor of trust. It must be short, business-readable, and im
The summary is editable. If the user sees that the generated business meaning is incorrect or incomplete, they can use **[Edit]** to manually correct the summary without starting a long clarification dialogue.
When multiple datasets are present, the workspace shows dataset tabs above or beside the central chat area so the user can switch focus quickly. Each tab clearly shows whether the dataset is active, excluded from review, or still needs attention.
**Desired feeling**: “I can explain this dataset to someone else already, and I can quickly fix the explanation if it is wrong.”
#### Step 5: Validation Triage
@@ -163,13 +177,15 @@ If ambiguities remain, the product presents an explicit choice:
* **Continue with current assumptions**
* **Save and return later**
Choosing **Fix now with agent** opens the global **AssistantChatPanel** instead of a dedicated modal flow. The same panel also remains available from inline context actions such as **[✨ Ask AI]** next to unresolved filters, validation warnings, mapping rows, and the editable business summary.
This is a critical UX moment. The user must feel in control rather than forced into a mandatory workflow.
**Desired feeling**: “I decide how much rigor I need right now.”
#### Step 7: Guided Clarification
#### Step 7: Assistant Chat Clarification
If the user chooses clarification, the workspace switches into a focused dialogue mode.
If the user chooses clarification, the workspace keeps its main layout and opens a focused dialogue stream in **AssistantChatPanel**.
The agent asks one question at a time, each with:
* why this matters,
@@ -180,6 +196,12 @@ The agent asks one question at a time, each with:
Each answer updates the dataset profile in real time.
As the user completes a phase such as recovery review, clarification, or mapping review, that phase collapses into a compact summary block in the chat timeline so progress remains visible without forcing the user to scroll through expanded historical panels.
When the agent question references a specific filter, field, mapping, or finding, the related card in the workspace is visually highlighted so the user can keep spatial context while answering in chat.
If LLM assistance is unavailable, the same unresolved items remain available in manual review panels with equivalent actions, but the system does not pretend that chat guidance is active.
**Desired feeling**: “This is helping me resolve uncertainty, not making me fill a form.”
#### Step 8: Run Readiness Review
@@ -272,30 +294,31 @@ The user can reopen the run later and understand the exact state used.
* save/resume controls,
* recent actions timeline.
### Center Column: Meaning & Validation
### Center Column: Chat-Centric Review Surface
* the always-visible **AssistantChatPanel** as the primary workspace,
* generated business summary,
* manual override with **[Edit]** for the generated summary and business interpretation,
* collapsible phase summaries for completed recovery, clarification, and mapping stages,
* documentation draft preview,
* validation findings grouped by severity,
* confidence markers,
* unresolved assumptions.
### Center Column: Columns & Metrics
* semantic layer table for columns and metrics,
* visible values for `verbose_name`, `description`, and formatting metadata where available,
* provenance badges for every semantically enriched field, such as `[ 📄 dict.xlsx ]`, `[ 📊 Dataset: Master Sales ]`, or `[ ✨ AI Guessed ]`,
* side-by-side conflict view when multiple semantic sources disagree,
* **Apply semantic source...** action that opens source selection for file, database dictionary, or existing Superset datasets,
* manual per-field override so the user can keep, replace, or rewrite semantic metadata.
### Center Column: Dataset Scope Navigation
* dataset tabs when multiple datasets or candidate datasets are present,
* clear active/excluded state for each dataset tab,
* fast switching between dataset-specific semantic review, filters, and mapping widgets without leaving the central chat,
* explicit exclude-from-review action for datasets that should not affect current readiness.
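The exclude-from-review rule above implies a simple readiness contract, sketched here with illustrative names: excluded datasets must not hold the session back, while any in-scope dataset with blocking findings must.

```typescript
// One dataset tab in the scope navigation strip.
interface DatasetTab {
  id: string;
  excluded: boolean;        // explicitly excluded from the current review
  blockingFindings: number; // open blocking validation findings
}

// Run readiness considers only datasets still in scope for review.
function isRunReady(tabs: DatasetTab[]): boolean {
  return tabs
    .filter((tab) => !tab.excluded)
    .every((tab) => tab.blockingFindings === 0);
}
```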
### Right Column: Filters & Execution
### Right Column: Execution, Manual Fallback, and Artifacts
* imported filters,
* parameter placeholders,
* **Jinja Template Mapping** block with visible mapping between source filters and detected dataset variables,
* run-time values,
* **Compiled SQL Preview** block or action to open the compiled query returned by Superset API,
* readiness checklist,
* primary CTA.
* primary CTA,
* manual review surfaces that remain available when chat assistance is unavailable.
This structure matters because the user mentally works across four questions:
1. What is this?
@@ -337,7 +360,7 @@ Raw detail is valuable, but it should never compete visually with the answer to
## 6.1 Conversation Pattern
The agent interaction is not a chat for general brainstorming. It is a structured clarification assistant.
The agent interaction is not a chat for general brainstorming. It is a structured operational assistant embedded in **AssistantChatPanel** that supports both guided clarification and user-initiated context questions.
Each prompt should contain:
* **Question**
@@ -369,6 +392,13 @@ Choose one:
This keeps the agent focused, useful, and fast.
The assistant may also answer free-form prompts such as:
* “Why is this filter marked partial?”
* “Which mapping is still blocking launch?”
* “Show me why the SQL preview is stale.”
Free-form answers must stay grounded in the current session context and should link back to the relevant workspace element.
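One way to enforce this grounding rule, sketched with assumed types (the spec does not prescribe this shape): the assistant may only link workspace elements that actually exist in the current session, and unknown references are dropped rather than invented.

```typescript
// Assumed shape of the session's linkable workspace elements.
interface SessionContext {
  elementIds: Set<string>; // ids of filters, fields, mappings, findings
}

interface GroundedAnswer {
  text: string;
  links: string[]; // workspace element ids the answer refers to
}

// Keep only references the current session can actually resolve.
function groundAnswer(
  text: string,
  refs: string[],
  ctx: SessionContext,
): GroundedAnswer {
  return { text, links: refs.filter((id) => ctx.elementIds.has(id)) };
}
```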
## 6.2 Agent-Led Semantic Source Suggestion
The agent may proactively suggest a semantic source when the schema strongly resembles an existing reference.
@@ -436,6 +466,36 @@ The user must be able to:
These controls are critical for real-world data workflows.
## 6.5.1 Context Actions
Inline micro-actions should appear next to high-friction items inside the workspace:
* unresolved or partial imported filters,
* blocking and warning validation findings,
* editable business summary,
* mappings that still require approval or normalization review.
Recommended actions:
* **[Ask in chat]** — opens or focuses **AssistantChatPanel** with hidden structured context and a user-visible question seed.
* **[Improve in chat]** — asks the assistant to refine a draft summary or semantic description while preserving manual intent and provenance rules.
* **[Edit manually]** — opens the equivalent manual review control when LLM assistance is unavailable or intentionally skipped.
These actions should feel like contextual escalation, not a page transition, and they must degrade gracefully into manual controls when the chat assistant is not active.
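A sketch of the **[Ask in chat]** payload, assuming a split between user-visible text and hidden structured context (the payload shape is an assumption; only the action names come from the spec):

```typescript
// The user sees only the question seed; the structured element context
// travels hidden alongside it so the assistant can ground its answer.
interface ChatSeed {
  visibleQuestion: string;
  hiddenContext: {
    elementType: "filter" | "finding" | "summary" | "mapping";
    elementId: string;
  };
}

// Build the seed for an inline [Ask in chat] action.
function askInChat(
  elementType: ChatSeed["hiddenContext"]["elementType"],
  elementId: string,
  label: string,
): ChatSeed {
  return {
    visibleQuestion: `Help me resolve "${label}"`,
    hiddenContext: { elementType, elementId },
  };
}
```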
## 6.5.2 Confirmation Cards
Dangerous or audit-relevant assistant actions should render as chat-native confirmation cards backed by `AssistantConfirmationRecord`.
Examples:
* approve all mapping warnings,
* trigger SQL preview generation,
* launch the dataset in SQL Lab.
The confirmation card must summarize:
* intended action,
* affected session scope,
* remaining blocking gates or warnings,
* explicit confirm/cancel controls.
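The gating rule can be sketched as follows. `AssistantConfirmationRecord` is named in the spec, but these fields are assumptions for illustration: confirmation is only honored once no blocking gates remain.

```typescript
// Assumed fields of a chat-native confirmation card.
interface AssistantConfirmationRecord {
  action: string;          // e.g. "launch the dataset in SQL Lab"
  sessionScope: string;
  blockingGates: string[]; // gates that must clear before confirm is allowed
  warnings: string[];      // surfaced, but not necessarily blocking
  confirmed: boolean;
}

// Confirm is a no-op while blocking gates remain; warnings alone do not block.
function confirmAction(
  card: AssistantConfirmationRecord,
): AssistantConfirmationRecord {
  if (card.blockingGates.length > 0) return card; // stays unconfirmed
  return { ...card, confirmed: true };
}
```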
## 6.6 Dialogue Exit Conditions
The user can leave dialogue mode when:
@@ -467,7 +527,7 @@ The system found reusable semantic sources, but the user still needs to choose,
There are meaningful unresolved items. Product suggests dialogue mode.
### State 6: Clarification Active
One-question-at-a-time guided flow.
One-question-at-a-time guided flow routed through `AssistantChatPanel` while the workspace stays visible.
### State 7: Mapping Review Needed
Recovered filters and detected Jinja variables exist, but the mapping still requires approval, correction, or completion.
@@ -552,6 +612,7 @@ If the interface does not make these decisions visible, the user will feel lost
* Detected Jinja variables should appear as a second wave of recovered context so the user understands execution awareness is expanding.
* Detected semantic source candidates should appear as a third wave, with confidence labels and provenance badges.
* Every clarified answer should immediately remove or downgrade a validation finding where relevant.
* When the assistant focuses on a specific filter, field, or finding, the corresponding workspace element should glow or highlight until the user acts or changes focus.
* Provenance badges should update live:
* Confirmed
* Imported
@@ -671,7 +732,8 @@ Recommended trust markers:
* mapping approval status,
* “compiled by Superset” status on the SQL preview,
* “last changed by” and “changed in clarification” notes,
* “used in run” markers for final execution inputs.
* “used in run” markers for final execution inputs,
* confirmation cards in the assistant stream for state-changing actions.
Conflict rule:
* The system must never silently overwrite user-entered semantic values with data from a dictionary, another dataset, or AI generation.
@@ -717,4 +779,4 @@ The UX is working if users can, with minimal hesitation:
* inspect the final compiled SQL before launch,
* resolve only the ambiguities that matter,
* reach a clear run/no-run decision,
* reopen the same context later without confusion.