ss-tools/specs/027-dataset-llm-orchestration/ux_reference.md
busya 7c85552132 feat(ui): add chat-driven dataset review flow
Move dataset review clarification into the assistant workspace and
rework the review page into a chat-centric layout with execution rails.

Add session-scoped assistant actions for mappings, semantic fields,
and SQL preview generation. Introduce optimistic locking for dataset
review mutations, propagate session versions through API responses,
and mask imported filter values before assistant exposure.

Refresh tests, i18n, and spec artifacts to match the new workflow.

BREAKING CHANGE: dataset review mutation endpoints now require the
X-Session-Version header, and clarification is no longer handled
through ClarificationDialog-based flows
2026-03-26 13:33:12 +03:00


# UX Reference: LLM Dataset Orchestration
**Feature Branch**: `027-dataset-llm-orchestration`
**Created**: 2026-03-16
**Status**: Draft
## 1. User Persona & Context
* **Primary user**: Analytics engineer or BI engineer who needs to quickly understand, validate, parameterize, enrich, and run a dataset that may have incomplete business context.
* **Secondary user**: Data steward or domain expert who helps confirm meanings, resolve ambiguities, and approve the documented interpretation of a dataset.
* **What is the user trying to achieve?**: Convert a raw dataset or a Superset-derived analytical context into something understandable, trustworthy, semantically enriched, and runnable without manually reverse-engineering filters, semantics, and hidden assumptions.
* **Mindset**: The user is usually unsure about part of the dataset. They want speed, but they do not want “magic” that hides uncertainty. They need confidence, traceability, the ability to intervene, and reuse of existing semantic assets rather than endless redefinition.
* **Context of use**:
* Reviewing a dataset before reuse in analysis.
* Preparing a dataset for migration or operational execution.
* Importing an existing analytical context from Superset instead of rebuilding it manually.
* Reusing semantic metadata from Excel or database dictionaries.
* Inheriting semantic layer settings from neighboring or master datasets in Superset.
* Collaborating with another person who knows the business meaning better than the technical owner.
## 2. UX Principles
* **Expose certainty, do not fake certainty**: The system must always distinguish confirmed facts, inferred facts, imported facts, unresolved facts, and AI drafts.
* **Guide, then get out of the way**: The product should proactively suggest next actions but should not force the user into a rigid wizard if they already know what they want to do.
* **Progress over perfection**: A user should be able to get partial value immediately, save progress, and return later.
* **One ambiguity at a time**: In assistant-guided dialogue mode, the user should never feel interrogated by a wall of questions.
* **Execution must feel safe**: Before launch, the user should clearly understand what will run, with which filters, with which unresolved assumptions.
* **Superset import should feel like recovery, not parsing**: The user expectation is not “we decoded a link”, but “we recovered the analysis context I had in Superset.”
* **What You See Is What Will Run (WYSIWWR)**: Before any launch, the system must show the final compiled SQL query exactly as it will be sent for execution, with all template substitutions already resolved.
* **Single Source of Truth for Execution**: The LLM never writes or edits SQL directly. The LLM only helps interpret business meaning and map available filter values into execution parameters. Jinja compilation and final SQL generation are always delegated to the native Superset execution API so the preview and the real run stay aligned.
* **Reuse before invention**: The system should prefer trusted semantic sources before generating new names or descriptions from scratch.
* **Confidence hierarchy must stay visible**: Semantic enrichment should follow a clear source priority: exact match from dictionary, inherited match from reference dataset, fuzzy semantic match, and only then AI-generated draft.
* **Manual intent wins**: The system must never silently overwrite a user's manual semantic edits with imported or generated metadata.
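The compile-then-preview contract behind WYSIWWR can be sketched as a thin payload builder. The field and header names below are illustrative assumptions (the `X-Session-Version` header appears in this feature's change notes); the point is that ss-tools only assembles parameters, while Superset compiles the SQL:

```python
def build_compile_request(dataset_id: int, template_params: dict,
                          session_version: int) -> dict:
    """Assemble the dry-run payload for Superset-side Jinja compilation.

    The LLM may have helped choose `template_params`, but the final SQL
    is always compiled by Superset; ss-tools only displays the result.
    Field and header names here are illustrative assumptions.
    """
    return {
        "dataset_id": dataset_id,
        "template_params": template_params,        # e.g. {"region": "EU"}
        "headers": {"X-Session-Version": str(session_version)},
    }
```

Because the same payload drives both the preview compilation and the real run, the preview cannot drift from execution.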
## 3. Core Product Modes
### Mode A: Automatic Review
User submits a dataset or imported analytical context and immediately receives:
* documentation draft,
* validation findings,
* filter/context extraction result,
* semantic enrichment candidates for columns and metrics,
* recommended next action.
This mode is for speed and low-friction first-pass understanding.
Automatic review is not limited to generating names and descriptions from scratch. During first-pass analysis, the system actively searches connected semantic sources:
* external dictionaries from database tables or uploaded spreadsheet files,
* other reference datasets in Superset,
* neighboring datasets that reuse the same physical tables or overlapping schema,
* LLM-driven fuzzy semantic matching when exact reuse is not possible.
The semantic confidence hierarchy is explicit:
1. **Confirmed** — exact match from connected dictionary or file.
2. **Imported** — reused match from a trusted reference dataset.
3. **Inferred** — fuzzy or semantic match proposed through LLM-assisted comparison.
4. **AI Draft** — generated by the LLM from scratch when no stronger source exists.
This mode should feel like the system is recovering and inheriting existing semantic knowledge before inventing anything new.
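One way to make this hierarchy enforceable is an ordered enum, so the UI can always surface the strongest available source. This is a sketch, not the product's actual data model:

```python
from enum import IntEnum

class SemanticConfidence(IntEnum):
    """Higher value = stronger source, mirroring the hierarchy above."""
    AI_DRAFT = 1   # generated from scratch by the LLM
    INFERRED = 2   # fuzzy / LLM-assisted semantic match
    IMPORTED = 3   # reused from a trusted reference dataset
    CONFIRMED = 4  # exact match from a connected dictionary or file

def pick_strongest(candidates: list[tuple[SemanticConfidence, str]]):
    """Return the candidate backed by the strongest semantic source."""
    return max(candidates, key=lambda c: c[0])
```

For example, an imported `verbose_name` outranks an AI draft for the same column, so the draft is never shown as the primary suggestion.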
### Mode B: Mixed-Initiative Assistant Clarification
User enters a focused interaction with the agent through the central **AssistantChatPanel** to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps.
This mode is mixed-initiative:
* the system may push the next highest-priority clarification from a **Clarification Queue**,
* the user may ask free-form questions about the current dataset context at any time,
* the agent may propose state-changing actions, but execution still follows existing approval and launch gates,
* completed phases collapse into compact summaries so the chat remains the primary workspace.
This mode is for confidence-building and resolving uncertainty.
### Mode C: Manual Fallback Review
When LLM assistance is unavailable, disabled, or intentionally skipped, the user continues through explicit manual surfaces for documentation, semantic review, mapping, and run preparation.
This mode preserves the same auditability and launch gates as the chat-centric flow, but replaces agent prompts with direct editable controls, explicit review queues, and manual confirmation actions.
### Mode D: Run Preparation
User reviews the assembled run context, edits values where needed, confirms assumptions, inspects the compiled SQL preview, and launches the dataset only when the context is good enough.
This mode is for controlled execution.
## 4. Primary Happy Path
### High-Level Story
The user opens ss-tools because they have one or more datasets they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what each dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain.

The user works primarily through a central chat-centric workspace. They scan a short human-readable summary, adjust the business meaning manually if needed, review dataset-specific tabs when multiple datasets are present, exclude low-value datasets from the current review when appropriate, approve a few semantic and filter mappings, resolve only the remaining ambiguities through a short guided dialogue, and reach a “Run Ready” state after reviewing the final SQL compiled by Superset itself.

If LLM assistance is unavailable, the user can continue through manual review surfaces without losing the same auditability and launch protections. Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.
### Detailed Step-by-Step Journey
#### Step 1: Entry
The user lands on an empty “Dataset Review Workspace”.
The screen offers two clear entry paths:
* **Paste Superset Link**
* **Select Dataset Source**
The user should instantly understand that both paths lead to the same outcome: a documented, semantically enriched, and runnable dataset context.
**Desired feeling**: “I know where to start.”
#### Step 2: Source Intake
The user pastes a Superset link.
The system immediately validates the input shape and responds optimistically:
* link recognized,
* source identified,
* import started.
The system should avoid blocking the user with technical checks unless the import is impossible.
**Desired feeling**: “The system understood what I gave it.”
#### Step 3: Context Recovery
The system assembles the first-pass interpretation:
* dataset identity,
* imported native filters,
* obvious dimensions/measures,
* initial business summary,
* unresolved items,
* discovered Jinja variables used by the dataset,
* candidate semantic sources for columns and metrics.
Context recovery is not limited to decoding the Superset link. The system also inspects the dataset through the Superset-side API to detect all available runtime template variables referenced inside the dataset query logic, for example variables used in expressions like `{{ filter_values('region') }}`.
In parallel, ss-tools gathers semantic metadata in the background from neighboring or reference datasets, especially those using the same physical tables, overlapping schema, or known business lineage. This gives the system an immediate base for suggesting `verbose_name`, `description`, and `d3format` values before asking the user to define them manually.
Instead of showing a spinner for too long, the interface should reveal results progressively as they become available:
* dataset recognized,
* saved native filters recovered from the link,
* dataset template variables detected from the dataset body,
* nearby or master datasets identified as semantic candidates,
* dictionary or spreadsheet matches found,
* preliminary mapping candidates suggested between filter inputs and template variables,
* preliminary semantic matches suggested for columns and metrics.
**Desired feeling**: “I’m already getting value before everything is finished.”
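As a rough illustration of variable discovery, a regex can pick out the `filter_values('…')` form shown above. A real implementation would rely on the Superset-side API and proper Jinja parsing rather than pattern matching; this sketch only handles the quoted-literal form:

```python
import re

# Matches the filter_values('name') form shown above. A production
# implementation would inspect the dataset through the Superset API and
# parse the Jinja template properly instead of using a regex.
FILTER_VALUES_RE = re.compile(r"\{\{\s*filter_values\(\s*'([^']+)'\s*\)\s*\}\}")

def detect_template_variables(dataset_sql: str) -> list[str]:
    """Return the distinct runtime variable names referenced in the query."""
    return sorted(set(FILTER_VALUES_RE.findall(dataset_sql)))
```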
#### Step 4: First Readable Summary
The user sees a compact summary card in the center chat workspace:
* what this dataset appears to represent,
* what period/scope/segments are implied,
* what filters were recovered,
* whether execution is currently possible.
This summary is the anchor of trust. It must be short, business-readable, and immediately useful.
The summary is editable. If the user sees that the generated business meaning is incorrect or incomplete, they can use **[Edit]** to manually correct the summary without starting a long clarification dialogue.
When multiple datasets are present, the workspace shows dataset tabs above or beside the central chat area so the user can switch focus quickly. Each tab clearly shows whether the dataset is active, excluded from review, or still needs attention.
**Desired feeling**: “I can explain this dataset to someone else already, and I can quickly fix the explanation if it is wrong.”
#### Step 5: Validation Triage
The system groups findings into:
* **Blocking**
* **Needs Attention**
* **Informational**
The user does not need to read everything. They need to know what is stopping them from running, what is risky, and what can be reviewed later.
**Desired feeling**: “I know what matters right now.”
#### Step 6: Clarification Decision
If ambiguities remain, the product presents an explicit choice:
* **Fix now with agent**
* **Continue with current assumptions**
* **Save and return later**
Choosing **Fix now with agent** opens the global **AssistantChatPanel** instead of a dedicated modal flow. The same panel also remains available from inline context actions such as **[✨ Ask AI]** next to unresolved filters, validation warnings, mapping rows, and the editable business summary.
This is a critical UX moment. The user must feel in control rather than forced into a mandatory workflow.
**Desired feeling**: “I decide how much rigor I need right now.”
#### Step 7: Assistant Chat Clarification
If the user chooses clarification, the workspace keeps its main layout and opens a focused dialogue stream in **AssistantChatPanel**.
The agent asks one question at a time, each with:
* why this matters,
* what the current guess is,
* quick-select answers when possible,
* an option to skip,
* an option to say “I don’t know”.
Each answer updates the dataset profile in real time.
As the user completes a phase such as recovery review, clarification, or mapping review, that phase collapses into a compact summary block in the chat timeline so progress remains visible without forcing the user to scroll through expanded historical panels.
When the agent question references a specific filter, field, mapping, or finding, the related card in the workspace is visually highlighted so the user can keep spatial context while answering in chat.
If LLM assistance is unavailable, the same unresolved items remain available in manual review panels with equivalent actions, but the system does not pretend that chat guidance is active.
**Desired feeling**: “This is helping me resolve uncertainty, not making me fill a form.”
#### Step 8: Run Readiness Review
When blocking issues are resolved, the system returns to a run-preparation state with:
* selected filters,
* placeholder values,
* unresolved warnings,
* final business summary,
* provenance labels for each key value,
* visible mapping between imported filters and detected Jinja template variables,
* semantic provenance for important columns and metrics,
* a preview of the final compiled SQL returned by Superset.
This step contains the critical **Smart Mapping** stage. The system uses the LLM to propose a mapping between the filter values recovered from the Superset link and the Jinja variables discovered in the dataset. The LLM does not generate SQL. It only assembles or suggests the parameter payload used for execution, such as the effective template parameter object.
The user can review each mapping explicitly:
* source filter,
* target Jinja variable,
* transformed value if normalization was required,
* confidence state,
* warning state,
* manual override.
Semantic review also remains visible here. Users can inspect where key `verbose_name`, `description`, and `d3format` values came from and whether they were confirmed from a dictionary, imported from a reference dataset, inferred from fuzzy matching, or generated as AI drafts.
Before launch, ss-tools performs a **Dry Run via Superset API**. The backend sends the assembled execution parameters to Superset for safe server-side compilation of the query without triggering the real dataset run. The result is shown as the **Compiled Query Preview**.
The **Compiled Query Preview** is a read-only SQL block that shows the final SQL with Jinja substitutions already resolved by Superset. Substituted values should be visibly highlighted so users can quickly inspect what changed.
If smart mapping introduced warnings, for example a value normalization such as `Europe → EU`, the launch button stays blocked until the user explicitly approves the mapping or edits it manually. The user must never run a query whose effective substitutions are still ambiguous.
Before launch, the user should be able to inspect the full context in one place.
**Desired feeling**: “I know exactly what will run, and I trust that this preview matches the real execution.”
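The mapping-approval gate described above might be modeled like this minimal sketch, where any value normalization counts as a warning until explicitly approved (field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class FilterMapping:
    source_filter: str     # filter recovered from the Superset link
    target_variable: str   # Jinja variable detected in the dataset
    imported_value: str
    mapped_value: str
    approved: bool = False

    @property
    def needs_review(self) -> bool:
        # Any normalization (e.g. "Europe" -> "EU") stays a warning
        # until the user explicitly approves or edits it.
        return self.mapped_value != self.imported_value and not self.approved

def launch_blocked(mappings: list[FilterMapping]) -> bool:
    """Launch stays gated while any normalized mapping is unapproved."""
    return any(m.needs_review for m in mappings)
```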
#### Step 9: Launch
The user presses **Launch Dataset**.
The final confirmation is not a generic “Are you sure?” modal. It is a run summary:
* dataset,
* effective filters,
* variable inputs,
* warnings still open,
* compiled SQL preview status,
* semantic source summary for important fields,
* what will be recorded for audit.
“Launch” has a concrete execution meaning. Depending on the selected path, ss-tools either:
* sends the prepared execution payload for execution in Superset SQL Lab, or
* redirects the user into a ready-to-run Superset analytical view with the assembled execution context already applied.
In both cases, the user expectation is the same: the execution uses the exact compiled query and runtime parameters they already reviewed.
**Desired feeling**: “This run is controlled, reproducible, and uses the exact query I approved.”
#### Step 10: Post-Run Feedback
After launch, the system confirms:
* run started or completed,
* context saved,
* documentation linked,
* validation snapshot preserved,
* compiled query version associated with the run,
* execution handoff target available,
* semantic mapping decisions preserved for reuse.
The post-run state should provide useful artifacts, such as:
* a link to the created Superset execution session,
* a preview of the first rows of returned data directly in ss-tools when available,
* or an updated saved dataset context that can be reopened and reused later.
The user can reopen the run later and understand the exact state used.
**Desired feeling**: “I can trust this later, not just right now.”
## 5. End-to-End Interaction Model
### 5.1 Main Workspace Structure
**Screen/Component**: Dataset Review Workspace
**Layout**: Adaptive three-column workspace.
### Left Column: Source & Session
* dataset source card,
* Superset import status,
* session state,
* save/resume controls,
* recent actions timeline.
### Center Column: Chat-Centric Review Surface
* the always-visible **AssistantChatPanel** as the primary workspace,
* generated business summary,
* manual override with **[Edit]** for the generated summary and business interpretation,
* collapsible phase summaries for completed recovery, clarification, and mapping stages,
* documentation draft preview,
* validation findings grouped by severity,
* confidence markers,
* unresolved assumptions.
### Center Column: Dataset Scope Navigation
* dataset tabs when multiple datasets or candidate datasets are present,
* clear active/excluded state for each dataset tab,
* fast switching between dataset-specific semantic review, filters, and mapping widgets without leaving the central chat,
* explicit exclude-from-review action for datasets that should not affect current readiness.
### Right Column: Execution, Manual Fallback, and Artifacts
* imported filters,
* parameter placeholders,
* **Jinja Template Mapping** block with visible mapping between source filters and detected dataset variables,
* run-time values,
* **Compiled SQL Preview** block or action to open the compiled query returned by Superset API,
* readiness checklist,
* primary CTA,
* manual review surfaces that remain available when chat assistance is unavailable.
This structure matters because the user mentally works across four questions:
1. What is this?
2. Can I trust its meaning?
3. Can I trust what will run?
4. Can I run it?
### 5.2 Primary CTAs by State
The main CTA should change based on readiness:
* **Empty** → `Import from Superset`
* **Intake complete** → `Review Documentation`
* **Semantic source available** → `Apply Semantic Source`
* **Ambiguities present** → `Start Clarification`
* **Mapping warnings present** → `Approve Mapping`
* **Compilation preview missing** → `Generate SQL Preview`
* **Blocking values missing** → `Complete Required Values`
* **Run-ready** → `Launch Dataset`
The product should never make the user guess what the next best action is.
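The state-to-CTA rule above is simple enough to express as a lookup table; the state keys here are invented for illustration:

```python
# Readiness state -> primary CTA label, as listed above.
# State keys are illustrative assumptions; the UI presumably derives
# them from the session readiness model.
PRIMARY_CTA = {
    "empty": "Import from Superset",
    "intake_complete": "Review Documentation",
    "semantic_source_available": "Apply Semantic Source",
    "ambiguities_present": "Start Clarification",
    "mapping_warnings_present": "Approve Mapping",
    "preview_missing": "Generate SQL Preview",
    "blocking_values_missing": "Complete Required Values",
    "run_ready": "Launch Dataset",
}

def primary_cta(state: str) -> str:
    """There is exactly one next best action per state; never make the user guess."""
    return PRIMARY_CTA[state]
```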
### 5.3 Information Hierarchy
At any moment, the most visible information should be:
1. current readiness state,
2. blocking problems,
3. imported/recovered context,
4. mapping status between recovered filters and runtime variables,
5. semantic source confidence for key fields,
6. business explanation,
7. compiled SQL preview status,
8. detailed metadata.
Raw detail is valuable, but it should never compete visually with the answer to “Can I proceed?”
## 6. Dialogue UX: Agent Interaction Design
### 6.1 Conversation Pattern
The agent interaction is not a chat for general brainstorming. It is a structured operational assistant embedded in **AssistantChatPanel** that supports both guided clarification and user-initiated context questions.
Each prompt should contain:
* **Question**
* **Why this matters**
* **Current system guess**
* **Suggested answers**
* **Optional free-form input**
* **Skip for now**
Example interaction:
```text
Question 2 of 5
What does the "region_scope" filter represent in this dataset?
Why this matters:
This value changes how the final aggregation is interpreted.
Current guess:
It appears to mean the reporting region, not the customer region.
Choose one:
[1] Reporting region
[2] Customer region
[3] Both depending on use case
[4] I'm not sure
[5] Enter custom answer
```
This keeps the agent focused, useful, and fast.
The assistant may also answer free-form prompts such as:
* “Why is this filter marked partial?”
* “Which mapping is still blocking launch?”
* “Show me why the SQL preview is stale.”
Free-form answers must stay grounded in current session context and should link back to the relevant workspace element.
### 6.2 Agent-Led Semantic Source Suggestion
The agent may proactively suggest a semantic source when the schema strongly resembles an existing reference.
Example interaction:
```text
Question: Semantic Layer Source
I noticed that 80% of the columns in this dataset (user_id, region, revenue) match the existing "Core_Users_Master" dataset.
Why this matters:
Reusing existing metadata keeps verbose names, descriptions, and d3formats consistent across dashboards.
How would you like to populate the semantic layer?
[1] Copy from "Core_Users_Master" dataset (Recommended)
[2] Upload an Excel (.xlsx) or DB dictionary
[3] Let AI generate them from scratch
[4] Skip and leave as database names
```
This should feel like a smart reuse recommendation, not a forced detour.
### 6.3 Fuzzy Matching Confirmation Pattern
When the user chooses an external dictionary and exact matches are incomplete, the agent should summarize the result clearly before applying it.
Example:
```text
I matched 15 columns exactly from the selected dictionary.
I also found 3 likely semantic matches that need confirmation.
Please review:
- reg_code → region
- rev_total → revenue
- usr_identifier → user_id
How would you like to proceed?
[1] Accept all suggested semantic matches
[2] Review one by one
[3] Ignore fuzzy matches and keep exact ones only
```
The user must understand which matches are exact, which are semantic guesses, and which remain unresolved.
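For illustration only, the stdlib `difflib` module can stand in for the LLM-assisted comparison to show the candidate-then-confirm split: fuzzy matches are proposed for review, never auto-applied.

```python
import difflib

def fuzzy_column_matches(columns, dictionary_terms, cutoff=0.5):
    """Propose likely (not exact) matches for later user confirmation.

    difflib stands in here for the LLM-assisted comparison the document
    describes; only unconfirmed *candidates* are returned, never applied.
    """
    candidates = {}
    for col in columns:
        best = difflib.get_close_matches(col, dictionary_terms,
                                         n=1, cutoff=cutoff)
        if best and best[0] != col:  # exact matches are handled separately
            candidates[col] = best[0]
    return candidates
```

The returned dictionary feeds the “Please review” list in the example above; accepting a candidate is always an explicit user decision.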
### 6.4 Agent Tone
The agent should sound:
* precise,
* calm,
* operational,
* non-judgmental.
It should never imply the user made a mistake when data is ambiguous. Ambiguity is treated as a normal property of datasets.
### 6.5 Dialogue Controls
The user must be able to:
* skip a question,
* save and exit,
* review previous answers,
* revise a prior answer,
* mark an item as “needs expert review”.
These controls are critical for real-world data workflows.
#### 6.5.1 Context Actions
Inline micro-actions should appear next to high-friction items inside the workspace:
* unresolved or partial imported filters,
* blocking and warning validation findings,
* editable business summary,
* mappings that still require approval or normalization review.
Recommended actions:
* **[Ask in chat]** — opens or focuses **AssistantChatPanel** with hidden structured context and a user-visible question seed.
* **[Improve in chat]** — asks the assistant to refine a draft summary or semantic description while preserving manual intent and provenance rules.
* **[Edit manually]** — opens the equivalent manual review control when LLM assistance is unavailable or intentionally skipped.
These actions should feel like contextual escalation, not a page transition, and they must degrade gracefully into manual controls when the chat assistant is not active.
#### 6.5.2 Confirmation Cards
Dangerous or audit-relevant assistant actions should render as chat-native confirmation cards backed by `AssistantConfirmationRecord`.
Examples:
* approve all mapping warnings,
* trigger SQL preview generation,
* launch the dataset in SQL Lab.
The confirmation card must summarize:
* intended action,
* affected session scope,
* remaining blocking gates or warnings,
* explicit confirm/cancel controls.
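A minimal sketch of what `AssistantConfirmationRecord` might hold: the class name comes from this document, while the fields are assumptions mirroring the card contents listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AssistantConfirmationRecord:
    """Sketch of the record backing a chat-native confirmation card.

    Fields are illustrative assumptions; only the class name is taken
    from this document.
    """
    intended_action: str                 # e.g. "launch_dataset"
    session_id: str
    blocking_gates: list = field(default_factory=list)
    decision: Optional[str] = None       # None = pending, else "confirm"/"cancel"

    def can_execute(self) -> bool:
        # Confirmation alone is not enough: blocking gates must be clear.
        return self.decision == "confirm" and not self.blocking_gates
```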
### 6.6 Dialogue Exit Conditions
The user can leave dialogue mode when:
* all blocking ambiguities are resolved,
* user chooses to continue with warnings,
* session is saved for later,
* no further useful clarification can be generated.
The agent must explicitly summarize what changed before exit:
* resolved items,
* still unresolved items,
* effect on run readiness.
## 7. State Model
### State 1: Empty
No dataset loaded. Clear entry choices.
### State 2: Importing
Progressive loading with visible milestones.
### State 3: Review Ready
Documentation and validation visible. User can understand the dataset immediately.
### State 4: Semantic Source Review Needed
The system found reusable semantic sources, but the user still needs to choose, approve, or reject them.
### State 5: Clarification Needed
There are meaningful unresolved items. Product suggests dialogue mode.
### State 6: Clarification Active
One-question-at-a-time guided flow routed through `AssistantChatPanel` while the workspace stays visible.
### State 7: Mapping Review Needed
Recovered filters and detected Jinja variables exist, but the mapping still requires approval, correction, or completion.
### State 8: Compiled Preview Ready
Superset has compiled the current parameter set, and the user can inspect the exact SQL that would run.
### State 9: Partially Ready
No blockers, but warnings remain.
### State 10: Run Ready
Everything required for launch is complete.
### State 11: Run In Progress
Execution feedback and status tracking.
### State 12: Completed
Run outcome and saved context available.
### State 13: Recovery Required
Import, mapping, semantic enrichment, or compilation was partial; manual or guided recovery needed.
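The launch gate implied by these states can be sketched as a small guard. Whether Partially Ready is launchable after explicit acknowledgment is an assumption, consistent with the “Continue with current assumptions” choice in Step 6:

```python
LAUNCHABLE_WITH_ACK = {"partially_ready"}   # warnings remain, no blockers

def can_launch(state: str, warnings_acknowledged: bool = False) -> bool:
    """Run Ready launches directly; Partially Ready requires the user
    to explicitly continue with warnings (an assumption); every other
    state keeps the launch button gated."""
    if state == "run_ready":
        return True
    return state in LAUNCHABLE_WITH_ACK and warnings_acknowledged
```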
## 8. Key User Decisions
The UX must support these decisions explicitly:
* Is this imported context trustworthy enough?
* Which semantic source should define `verbose_name`, `description`, and `d3format`?
* Do I want to reuse a master dataset or apply a spreadsheet/database dictionary?
* Should I accept fuzzy semantic matches or only exact ones?
* Do I need clarification now or can I continue?
* Are the filters correct as imported?
* Which source filter should map to which Jinja variable?
* Is the transformed value acceptable if normalization was applied?
* Which values are confirmed versus guessed?
* Does the compiled SQL match my intent?
* Is the dataset safe enough to run?
* Do I want to save current progress and come back later?
If the interface does not make these decisions visible, the user will feel lost even if the feature is technically correct.
## 9. UI Layout & Flow
**Screen**: Dataset Review Workspace
* **Top Bar**:
* Source badge
* Dataset name
* Readiness status pill
* Save session
* Export summary
* **Hero Summary Block**:
* “What this dataset is”
* “What is ready”
* “What still needs attention”
* Primary CTA
* **[Edit]** action for manual correction
* **Tabs or Sections**:
* Overview
* Documentation
* Semantic Layer
* Validation
* Filters
* Mapping
* SQL Preview
* Clarification History
* Run History
* **Right Rail**:
* readiness checklist,
* semantic source status,
* missing required values,
* mapping warnings,
* SQL preview status,
* launch button.
## 10. Micro-Interactions
* Imported filters should animate into the panel one by one as they are recovered.
* Detected Jinja variables should appear as a second wave of recovered context so the user understands execution awareness is expanding.
* Detected semantic source candidates should appear as a third wave, with confidence labels and provenance badges.
* Every clarified answer should immediately remove or downgrade a validation finding where relevant.
* When the assistant focuses on a specific filter, field, or finding, the corresponding workspace element should glow or highlight until the user acts or changes focus.
* Provenance badges should update live:
* Confirmed
* Imported
* Inferred
* AI Draft
* Mapped
* Needs Review
* The primary CTA should change smoothly, not abruptly, as the state progresses.
* When launch becomes available, the interface should celebrate readiness subtly but should not hide remaining warnings.
* Value transformations proposed by mapping should be visually diffed so the user can spot changes like `Europe → EU` instantly.
* The compiled SQL preview should visibly refresh when mapping or parameter values change.
* Manual semantic overrides should visually lock the affected field so later imports do not silently replace it.
## 11. Error Experience
**Philosophy**: Never show a dead end. Every error state must preserve recovered value, explain what failed, and show the nearest path forward.
### Scenario A: Superset Link Recognized, Filter Extraction Partial
* **User Action**: Pastes a valid Superset link with partially recoverable filter state.
* **System Response**:
* “We recovered the dataset and 3 filters, but 2 saved filters need manual review.”
* Missing or low-confidence filters are listed explicitly.
* The system still opens the workspace with partial value.
* **Recovery**:
* review recovered filters,
* add missing ones manually,
* ask the agent to help reconstruct intent.
### Scenario B: No Clear Business Meaning Can Be Inferred
* **User Action**: Submits a technically valid dataset with poor metadata.
* **System Response**:
* “We could identify the structure of this dataset, but not its business meaning.”
* Documentation remains skeletal but usable.
* Clarification becomes the obvious next step.
* **Recovery**:
* launch dialogue mode,
* invite domain expert input,
* save draft and resume later.
### Scenario C: Required Run-Time Values Missing
* **User Action**: Tries to launch with incomplete placeholders.
* **System Response**:
* launch blocked,
* missing values highlighted in-place,
* concise summary of what is required.
* **Recovery**:
* fill values inline,
* return to review,
* or save incomplete context.
### Scenario D: Conflicting Meanings Across Sources
* **User Action**: Reviews a dataset where imported filter context and documented semantics conflict.
* **System Response**:
* both candidate meanings are shown side-by-side,
* neither is silently chosen if confidence is low,
* the conflict is framed as a decision, not a failure.
* **Recovery**:
* user confirms one meaning,
* leaves item unresolved,
* or marks for expert review.
### Scenario E: User Leaves Mid-Flow
* **User Action**: Closes the session before clarification or run prep is complete.
* **System Response**:
* autosave or explicit save confirmation,
* summary of current progress,
* preserved unresolved items.
* **Recovery**:
* resume from last state without repeating prior answers.
### Scenario F: Superset API Compilation Failed
* **User Action**: The mapped runtime values are sent for Jinja compilation, but Superset returns a compilation error.
* **System Response**:
* the **Compiled SQL Preview** switches into an error state instead of pretending preview is available,
* the problematic variable or mapping row is highlighted,
* the original compilation error returned by Superset is shown in readable form,
* launch remains blocked until the issue is resolved.
* **Recovery**:
* user manually edits the mapped value,
* user changes the filter-to-template mapping,
* or user asks the agent to help normalize the value format and then regenerates the preview.
### Scenario G: Semantic Sources Conflict
* **User Action**: A column has one value from a spreadsheet dictionary, a different value from a reference dataset, and a third AI-generated proposal.
* **System Response**:
* the interface shows a side-by-side comparison instead of silently choosing one,
* the higher-priority source is highlighted as recommended,
* the conflict is marked as a warning if user input would be changed.
* **Recovery**:
* user selects one source,
* user keeps the current manual value,
* or user applies the recommended higher-confidence source field by field.
## 12. UX for Trust & Transparency
Trust is central to this feature.
The interface must visibly answer:
* Where did this value come from?
* Did the system infer this or did the user confirm it?
* Which runtime variable will receive this value?
* Was the final SQL preview compiled by Superset or just estimated locally?
* Did this semantic label come from a dictionary, another dataset, a fuzzy match, or AI generation?
* What is still unknown?
* What will happen if I proceed anyway?
Recommended trust markers:
* provenance badge on important fields,
* confidence labels for imported or inferred data,
* mapping approval status,
* “compiled by Superset” status on the SQL preview,
* “last changed by” and “changed in clarification” notes,
* “used in run” markers for final execution inputs,
* confirmation cards in the assistant stream for state-changing actions.
Conflict rule:
* The system must never silently overwrite user-entered semantic values with data from a dictionary, another dataset, or AI generation.
* If multiple sources disagree, the interface shows them side by side and either:
* asks the user to choose, or
* recommends the highest-priority source while clearly marking the recommendation as a warning until approved.
* Manual user input remains the most sensitive value and must be preserved unless the user explicitly replaces it.
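The conflict rule above implies a deterministic recommendation policy. The sketch below is illustrative only: the priority order and function names are assumptions, and the real resolution logic may differ.

```python
# Hypothetical priority order for semantic sources, highest first.
# Manual user input always outranks imported or generated values.
SOURCE_PRIORITY = ["manual", "dictionary", "reference_dataset", "fuzzy_match", "ai_generated"]

def recommend_source(candidates: dict[str, str]) -> tuple[str, bool]:
    """Return (recommended source, needs_user_approval).

    Nothing is auto-applied: any disagreement between sources, or any
    recommendation other than untouched manual input, is surfaced as a
    warning that requires explicit approval.
    """
    recommended = next(s for s in SOURCE_PRIORITY if s in candidates)
    conflict = len(set(candidates.values())) > 1
    needs_approval = conflict or recommended != "manual"
    return recommended, needs_approval

print(recommend_source({"dictionary": "Net revenue", "ai_generated": "Revenue"}))
# → ('dictionary', True)
```

When `needs_approval` is true, the UI shows the candidates side by side with the recommended source marked, rather than silently applying it.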
## 13. UX for Collaboration
This workflow often spans more than one person.
The UX should support:
* sharing documentation draft,
* handing a clarification session to a domain expert,
* preserving unresolved questions explicitly,
* recording who confirmed which meaning,
* sharing the reviewed compiled SQL preview as part of execution approval,
* sharing which semantic sources were applied and why.
The user should be able to leave behind a state that another person can understand in under a minute.
## 14. Tone & Voice
* **Style**: Concise, trustworthy, operational, and transparent.
* **System behavior language**:
* Prefer: “Recovered”, “Confirmed”, “Imported”, “Inferred”, “AI Draft”, “Needs review”, “Ready to run”, “Compiled by Superset”
* Avoid: “Magic”, “Solved”, “Guaranteed”, “Auto-fixed”
* **Terminology**:
* Use “dataset”, “clarification”, “validation finding”, “run context”, “imported filter”, “Jinja variable”, “template mapping”, “compiled SQL preview”, “semantic source”, “provenance”, “assumption”, “confidence”.
* Avoid overly technical wording in primary UX surfaces when a business-readable phrase exists.
## 15. UX Success Signals
The UX is working if users can, with minimal hesitation:
* understand what dataset they are dealing with,
* see what was recovered from Superset,
* see which Jinja variables were discovered for runtime execution,
* understand which semantic source supplied each important field,
* reuse existing semantic assets before accepting AI guesses,
* tell which values are trustworthy,
* review and approve filter-to-template mapping without confusion,
* inspect the final compiled SQL before launch,
* resolve only the ambiguities that matter,
* reach a clear run/no-run decision,
* reopen the same context later without confusion.