# UX Reference: LLM Dataset Orchestration

**Feature Branch**: `027-dataset-llm-orchestration`
**Created**: 2026-03-16
**Status**: Draft

## 1. User Persona & Context

* **Primary user**: Analytics engineer or BI engineer who needs to quickly understand, validate, parameterize, enrich, and run a dataset that may have incomplete business context.
* **Secondary user**: Data steward or domain expert who helps confirm meanings, resolve ambiguities, and approve the documented interpretation of a dataset.
* **What is the user trying to achieve?**: Convert a raw dataset or a Superset-derived analytical context into something understandable, trustworthy, semantically enriched, and runnable without manually reverse-engineering filters, semantics, and hidden assumptions.
* **Mindset**: The user is usually unsure about part of the dataset. They want speed, but they do not want “magic” that hides uncertainty. They need confidence, traceability, the ability to intervene, and reuse of existing semantic assets rather than endless redefinition.
* **Context of use**:
  * Reviewing a dataset before reuse in analysis.
  * Preparing a dataset for migration or operational execution.
  * Importing an existing analytical context from Superset instead of rebuilding it manually.
  * Reusing semantic metadata from Excel or database dictionaries.
  * Inheriting semantic layer settings from neighboring or master datasets in Superset.
  * Collaborating with another person who knows the business meaning better than the technical owner.

## 2. UX Principles

* **Expose certainty, do not fake certainty**: The system must always distinguish confirmed facts, inferred facts, imported facts, unresolved facts, and AI drafts.
* **Guide, then get out of the way**: The product should proactively suggest next actions but should not force the user into a rigid wizard if they already know what they want to do.
* **Progress over perfection**: A user should be able to get partial value immediately, save progress, and return later.
* **One ambiguity at a time**: In dialogue mode, the user should never feel interrogated by a wall of questions.
* **Execution must feel safe**: Before launch, the user should clearly understand what will run, with which filters, with which unresolved assumptions.
* **Superset import should feel like recovery, not parsing**: The user expectation is not “we decoded a link”, but “we recovered the analysis context I had in Superset.”
* **What You See Is What Will Run (WYSIWWR)**: Before any launch, the system must show the final compiled SQL query exactly as it will be sent for execution, with all template substitutions already resolved.
* **Single Source of Truth for Execution**: The LLM never writes or edits SQL directly. The LLM only helps interpret business meaning and map available filter values into execution parameters. Jinja compilation and final SQL generation are always delegated to the native Superset execution API so the preview and the real run stay aligned.
* **Reuse before invention**: The system should prefer trusted semantic sources before generating new names or descriptions from scratch.
* **Confidence hierarchy must stay visible**: Semantic enrichment should follow a clear source priority: exact match from dictionary, inherited match from reference dataset, fuzzy semantic match, and only then AI-generated draft.
* **Manual intent wins**: The system must never silently overwrite a user’s manual semantic edits with imported or generated metadata.

## 3. Core Product Modes

### Mode A: Automatic Review

User submits a dataset or imported analytical context and immediately receives:

* documentation draft,
* validation findings,
* filter/context extraction result,
* semantic enrichment candidates for columns and metrics,
* recommended next action.

This mode is for speed and low-friction first-pass understanding.
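The source-priority and manual-override principles above can be made concrete as an ordered ranking that the first pass applies per field. A minimal Python sketch; the names and types are illustrative, not an actual ss-tools API:

```python
from enum import IntEnum

class SemanticConfidence(IntEnum):
    """Ordered source priority: lower value means a stronger source."""
    MANUAL = 0     # user-entered; never overwritten automatically
    CONFIRMED = 1  # exact match from a connected dictionary or file
    IMPORTED = 2   # reused from a trusted reference dataset
    INFERRED = 3   # fuzzy / LLM-assisted semantic match
    AI_DRAFT = 4   # generated from scratch by the LLM

def pick_candidate(candidates):
    """Return the highest-priority (confidence, value) pair.

    `candidates` holds every proposal gathered for a single column or
    metric; an empty list means the field stays unresolved.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])

# Example: a dictionary hit outranks an AI draft for the same column.
best = pick_candidate([
    (SemanticConfidence.AI_DRAFT, "Revenue (estimated)"),
    (SemanticConfidence.CONFIRMED, "Total Revenue, EUR"),
])
# best is the CONFIRMED candidate
```

The `MANUAL` rank at the top of the enum is what enforces “Manual intent wins”: any automated proposal compares as weaker than a user edit, so reapplying an import can never displace one.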
Automatic review is not limited to generating names and descriptions from scratch. During first-pass analysis, the system actively searches connected semantic sources:

* external dictionaries from database tables or uploaded spreadsheet files,
* other reference datasets in Superset,
* neighboring datasets that reuse the same physical tables or overlapping schema,
* LLM-driven fuzzy semantic matching when exact reuse is not possible.

The semantic confidence hierarchy is explicit:

1. **Confirmed** — exact match from connected dictionary or file.
2. **Imported** — reused match from a trusted reference dataset.
3. **Inferred** — fuzzy or semantic match proposed through LLM-assisted comparison.
4. **AI Draft** — generated by the LLM from scratch when no stronger source exists.

This mode should feel like the system is recovering and inheriting existing semantic knowledge before inventing anything new.

### Mode B: Guided Clarification

User enters a focused interaction with the agent to resolve unresolved attributes, missing filter meanings, inconsistent business semantics, conflicting semantic sources, or run-time gaps. This mode is for confidence-building and resolving uncertainty.

### Mode C: Run Preparation

User reviews the assembled run context, edits values where needed, confirms assumptions, inspects the compiled SQL preview, and launches the dataset only when the context is good enough. This mode is for controlled execution.

## 4. Primary Happy Path

### High-Level Story

The user opens ss-tools because they have a dataset they need to understand and run, but they do not fully trust the metadata. They paste a Superset link or select a dataset source in the web interface. In seconds, the workspace fills with a structured interpretation: what the dataset appears to be, which filters were recovered, which Jinja-driven variables exist in the dataset, which semantic labels were inherited from trusted sources, what is already known, and what is still uncertain.
The user scans a short human-readable summary, adjusts the business meaning manually if needed, approves a few semantic and filter mappings, resolves only the remaining ambiguities through a short guided dialogue, and reaches a “Run Ready” state after reviewing the final SQL compiled by Superset itself.

Launch feels deliberate and safe because the interface shows exactly what will be used, how imported filters map to runtime variables, and where each semantic label came from.

### Detailed Step-by-Step Journey

#### Step 1: Entry

The user lands on an empty “Dataset Review Workspace”. The screen offers two clear entry paths:

* **Paste Superset Link**
* **Select Dataset Source**

The user should instantly understand that both paths lead to the same outcome: a documented, semantically enriched, and runnable dataset context.

**Desired feeling**: “I know where to start.”

#### Step 2: Source Intake

The user pastes a Superset link. The system immediately validates the input shape and responds optimistically:

* link recognized,
* source identified,
* import started.

The system should avoid blocking the user with technical checks unless the import is impossible.

**Desired feeling**: “The system understood what I gave it.”

#### Step 3: Context Recovery

The system assembles the first-pass interpretation:

* dataset identity,
* imported native filters,
* obvious dimensions/measures,
* initial business summary,
* unresolved items,
* discovered Jinja variables used by the dataset,
* candidate semantic sources for columns and metrics.

Context recovery is not limited to decoding the Superset link. The system also inspects the dataset through the Superset-side API to detect all available runtime template variables referenced inside the dataset query logic, for example variables used in expressions like `{{ filter_values('region') }}`.
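Locally, a first-pass scan for such variables could be sketched with regular expressions. This is only a discovery heuristic that feeds the review UI; the authoritative variable list and all real resolution stay on the Superset side, and the patterns below cover just the two shapes mentioned in this document:

```python
import re

# Matches Superset-style Jinja macros such as {{ filter_values('region') }}
FILTER_MACRO = re.compile(r"\{\{\s*filter_values\(\s*'([^']+)'\s*\)\s*\}\}")
# Matches bare template variables such as {{ start_date }}
BARE_VARIABLE = re.compile(r"\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}")

def discover_template_variables(dataset_sql: str) -> dict:
    """First-pass scan of a dataset body for runtime template inputs."""
    return {
        "filter_values": sorted(set(FILTER_MACRO.findall(dataset_sql))),
        "variables": sorted(set(BARE_VARIABLE.findall(dataset_sql))),
    }

sql = (
    "SELECT * FROM sales "
    "WHERE region IN ({{ filter_values('region') }}) "
    "AND dt >= '{{ start_date }}'"
)
found = discover_template_variables(sql)
# found["filter_values"] → ['region']; found["variables"] → ['start_date']
```

Splitting macro-driven inputs from bare variables matters for the later mapping step: the former already name the source filter, while the latter still need a candidate value proposed and approved.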
In parallel, ss-tools gathers semantic metadata in the background from neighboring or reference datasets, especially those using the same physical tables, overlapping schema, or known business lineage. This gives the system an immediate base for suggesting `verbose_name`, `description`, and `d3format` values before asking the user to define them manually.

Instead of showing a spinner for too long, the interface should reveal results progressively as they become available:

* dataset recognized,
* saved native filters recovered from the link,
* dataset template variables detected from the dataset body,
* nearby or master datasets identified as semantic candidates,
* dictionary or spreadsheet matches found,
* preliminary mapping candidates suggested between filter inputs and template variables,
* preliminary semantic matches suggested for columns and metrics.

**Desired feeling**: “I’m already getting value before everything is finished.”

#### Step 4: First Readable Summary

The user sees a compact summary card:

* what this dataset appears to represent,
* what period/scope/segments are implied,
* what filters were recovered,
* whether execution is currently possible.

This summary is the anchor of trust. It must be short, business-readable, and immediately useful.

The summary is editable. If the user sees that the generated business meaning is incorrect or incomplete, they can use **[Edit]** to manually correct the summary without starting a long clarification dialogue.

**Desired feeling**: “I can explain this dataset to someone else already, and I can quickly fix the explanation if it is wrong.”

#### Step 5: Validation Triage

The system groups findings into:

* **Blocking**
* **Needs Attention**
* **Informational**

The user does not need to read everything. They need to know what is stopping them from running, what is risky, and what can be reviewed later.
**Desired feeling**: “I know what matters right now.”

#### Step 6: Clarification Decision

If ambiguities remain, the product presents an explicit choice:

* **Fix now with agent**
* **Continue with current assumptions**
* **Save and return later**

This is a critical UX moment. The user must feel in control rather than forced into a mandatory workflow.

**Desired feeling**: “I decide how much rigor I need right now.”

#### Step 7: Guided Clarification

If the user chooses clarification, the workspace switches into a focused dialogue mode. The agent asks one question at a time, each with:

* why this matters,
* what the current guess is,
* quick-select answers when possible,
* an option to skip,
* an option to say “I don’t know”.

Each answer updates the dataset profile in real time.

**Desired feeling**: “This is helping me resolve uncertainty, not making me fill a form.”

#### Step 8: Run Readiness Review

When blocking issues are resolved, the system returns to a run-preparation state with:

* selected filters,
* placeholder values,
* unresolved warnings,
* final business summary,
* provenance labels for each key value,
* visible mapping between imported filters and detected Jinja template variables,
* semantic provenance for important columns and metrics,
* a preview of the final compiled SQL returned by Superset.

This step contains the critical **Smart Mapping** stage. The system uses the LLM to propose a mapping between the filter values recovered from the Superset link and the Jinja variables discovered in the dataset. The LLM does not generate SQL. It only assembles or suggests the parameter payload used for execution, such as the effective template parameter object.

The user can review each mapping explicitly:

* source filter,
* target Jinja variable,
* transformed value if normalization was required,
* confidence state,
* warning state,
* manual override.

Semantic review also remains visible here.
Users can inspect where key `verbose_name`, `description`, and `d3format` values came from and whether they were confirmed from a dictionary, imported from a reference dataset, inferred from fuzzy matching, or generated as AI drafts.

Before launch, ss-tools performs a **Dry Run via Superset API**. The backend sends the assembled execution parameters to Superset for safe server-side compilation of the query without triggering the real dataset run. The result is shown as the **Compiled Query Preview**.

The **Compiled Query Preview** is a read-only SQL block that shows the final SQL with Jinja substitutions already resolved by Superset. Substituted values should be visibly highlighted so users can quickly inspect what changed.

If smart mapping introduced warnings, for example a value normalization such as `Europe → EU`, the launch button stays blocked until the user explicitly approves the mapping or edits it manually. The user must never run a query whose effective substitutions are still ambiguous.

Before launch, the user should be able to inspect the full context in one place.

**Desired feeling**: “I know exactly what will run, and I trust that this preview matches the real execution.”

#### Step 9: Launch

The user presses **Launch Dataset**. The final confirmation is not a generic “Are you sure?” modal. It is a run summary:

* dataset,
* effective filters,
* variable inputs,
* warnings still open,
* compiled SQL preview status,
* semantic source summary for important fields,
* what will be recorded for audit.

“Launch” has a concrete execution meaning. Depending on the selected path, ss-tools either:

* sends the prepared execution payload for execution in Superset SQL Lab, or
* redirects the user into a ready-to-run Superset analytical view with the assembled execution context already applied.

In both cases, the user expectation is the same: the execution uses the exact compiled query and runtime parameters they already reviewed.
**Desired feeling**: “This run is controlled, reproducible, and uses the exact query I approved.”

#### Step 10: Post-Run Feedback

After launch, the system confirms:

* run started or completed,
* context saved,
* documentation linked,
* validation snapshot preserved,
* compiled query version associated with the run,
* execution handoff target available,
* semantic mapping decisions preserved for reuse.

The post-run state should provide useful artifacts, such as:

* a link to the created Superset execution session,
* a preview of the first rows of returned data directly in ss-tools when available,
* or an updated saved dataset context that can be reopened and reused later.

The user can reopen the run later and understand the exact state used.

**Desired feeling**: “I can trust this later, not just right now.”

## 5. End-to-End Interaction Model

## 5.1 Main Workspace Structure

**Screen/Component**: Dataset Review Workspace

**Layout**: Adaptive three-column workspace.

### Left Column: Source & Session

* dataset source card,
* Superset import status,
* session state,
* save/resume controls,
* recent actions timeline.

### Center Column: Meaning & Validation

* generated business summary,
* manual override with **[Edit]** for the generated summary and business interpretation,
* documentation draft preview,
* validation findings grouped by severity,
* confidence markers,
* unresolved assumptions.
### Center Column: Columns & Metrics

* semantic layer table for columns and metrics,
* visible values for `verbose_name`, `description`, and formatting metadata where available,
* provenance badges for every semantically enriched field, such as `[ 📄 dict.xlsx ]`, `[ 📊 Dataset: Master Sales ]`, or `[ ✨ AI Guessed ]`,
* side-by-side conflict view when multiple semantic sources disagree,
* **Apply semantic source...** action that opens source selection for file, database dictionary, or existing Superset datasets,
* manual per-field override so the user can keep, replace, or rewrite semantic metadata.

### Right Column: Filters & Execution

* imported filters,
* parameter placeholders,
* **Jinja Template Mapping** block with visible mapping between source filters and detected dataset variables,
* run-time values,
* **Compiled SQL Preview** block or action to open the compiled query returned by Superset API,
* readiness checklist,
* primary CTA.

This structure matters because the user mentally works across four questions:

1. What is this?
2. Can I trust its meaning?
3. Can I trust what will run?
4. Can I run it?

## 5.2 Primary CTAs by State

The main CTA should change based on readiness:

* **Empty** → `Import from Superset`
* **Intake complete** → `Review Documentation`
* **Semantic source available** → `Apply Semantic Source`
* **Ambiguities present** → `Start Clarification`
* **Mapping warnings present** → `Approve Mapping`
* **Compilation preview missing** → `Generate SQL Preview`
* **Blocking values missing** → `Complete Required Values`
* **Run-ready** → `Launch Dataset`

The product should never make the user guess what the next best action is.

## 5.3 Information Hierarchy

At any moment, the most visible information should be:

1. current readiness state,
2. blocking problems,
3. imported/recovered context,
4. mapping status between recovered filters and runtime variables,
5. semantic source confidence for key fields,
6. business explanation,
7. compiled SQL preview status,
8. detailed metadata.

Raw detail is valuable, but it should never compete visually with the answer to “Can I proceed?”

## 6. Dialogue UX: Agent Interaction Design

## 6.1 Conversation Pattern

The agent interaction is not a chat for general brainstorming. It is a structured clarification assistant.

Each prompt should contain:

* **Question**
* **Why this matters**
* **Current system guess**
* **Suggested answers**
* **Optional free-form input**
* **Skip for now**

Example interaction:

```text
Question 2 of 5

What does the "region_scope" filter represent in this dataset?

Why this matters:
This value changes how the final aggregation is interpreted.

Current guess:
It appears to mean the reporting region, not the customer region.

Choose one:
[1] Reporting region
[2] Customer region
[3] Both depending on use case
[4] I’m not sure
[5] Enter custom answer
```

This keeps the agent focused, useful, and fast.

## 6.2 Agent-Led Semantic Source Suggestion

The agent may proactively suggest a semantic source when the schema strongly resembles an existing reference.

Example interaction:

```text
Question: Semantic Layer Source

I noticed that 80% of the columns in this dataset (user_id, region, revenue) match the existing "Core_Users_Master" dataset.

Why this matters:
Reusing existing metadata keeps verbose names, descriptions, and d3formats consistent across dashboards.

How would you like to populate the semantic layer?
[1] Copy from "Core_Users_Master" dataset (Recommended)
[2] Upload an Excel (.xlsx) or DB dictionary
[3] Let AI generate them from scratch
[4] Skip and leave as database names
```

This should feel like a smart reuse recommendation, not a forced detour.

## 6.3 Fuzzy Matching Confirmation Pattern

When the user chooses an external dictionary and exact matches are incomplete, the agent should summarize the result clearly before applying it.

Example:

```text
I matched 15 columns exactly from the selected dictionary.
I also found 3 likely semantic matches that need confirmation.

Please review:
- reg_code → region
- rev_total → revenue
- usr_identifier → user_id

How would you like to proceed?
[1] Accept all suggested semantic matches
[2] Review one by one
[3] Ignore fuzzy matches and keep exact ones only
```

The user must understand which matches are exact, which are semantic guesses, and which remain unresolved.

## 6.4 Agent Tone

The agent should sound:

* precise,
* calm,
* operational,
* non-judgmental.

It should never imply the user made a mistake when data is ambiguous. Ambiguity is treated as a normal property of datasets.

## 6.5 Dialogue Controls

The user must be able to:

* skip a question,
* save and exit,
* review previous answers,
* revise a prior answer,
* mark an item as “needs expert review”.

These controls are critical for real-world data workflows.

## 6.6 Dialogue Exit Conditions

The user can leave dialogue mode when:

* all blocking ambiguities are resolved,
* user chooses to continue with warnings,
* session is saved for later,
* no further useful clarification can be generated.

The agent must explicitly summarize what changed before exit:

* resolved items,
* still unresolved items,
* effect on run readiness.

## 7. State Model

### State 1: Empty

No dataset loaded. Clear entry choices.

### State 2: Importing

Progressive loading with visible milestones.

### State 3: Review Ready

Documentation and validation visible. User can understand the dataset immediately.

### State 4: Semantic Source Review Needed

The system found reusable semantic sources, but the user still needs to choose, approve, or reject them.

### State 5: Clarification Needed

There are meaningful unresolved items. Product suggests dialogue mode.

### State 6: Clarification Active

One-question-at-a-time guided flow.

### State 7: Mapping Review Needed

Recovered filters and detected Jinja variables exist, but the mapping still requires approval, correction, or completion.
### State 8: Compiled Preview Ready

Superset has compiled the current parameter set, and the user can inspect the exact SQL that would run.

### State 9: Partially Ready

No blockers, but warnings remain.

### State 10: Run Ready

Everything required for launch is complete.

### State 11: Run In Progress

Execution feedback and status tracking.

### State 12: Completed

Run outcome and saved context available.

### State 13: Recovery Required

Import, mapping, semantic enrichment, or compilation was partial; manual or guided recovery needed.

## 8. Key User Decisions

The UX must support these decisions explicitly:

* Is this imported context trustworthy enough?
* Which semantic source should define `verbose_name`, `description`, and `d3format`?
* Do I want to reuse a master dataset or apply a spreadsheet/database dictionary?
* Should I accept fuzzy semantic matches or only exact ones?
* Do I need clarification now or can I continue?
* Are the filters correct as imported?
* Which source filter should map to which Jinja variable?
* Is the transformed value acceptable if normalization was applied?
* Which values are confirmed versus guessed?
* Does the compiled SQL match my intent?
* Is the dataset safe enough to run?
* Do I want to save current progress and come back later?

If the interface does not make these decisions visible, the user will feel lost even if the feature is technically correct.

## 9. UI Layout & Flow

**Screen**: Dataset Review Workspace

* **Top Bar**:
  * Source badge
  * Dataset name
  * Readiness status pill
  * Save session
  * Export summary
* **Hero Summary Block**:
  * “What this dataset is”
  * “What is ready”
  * “What still needs attention”
  * Primary CTA
  * **[Edit]** action for manual correction
* **Tabs or Sections**:
  * Overview
  * Documentation
  * Semantic Layer
  * Validation
  * Filters
  * Mapping
  * SQL Preview
  * Clarification History
  * Run History
* **Right Rail**:
  * readiness checklist,
  * semantic source status,
  * missing required values,
  * mapping warnings,
  * SQL preview status,
  * launch button.

## 10. Micro-Interactions

* Imported filters should animate into the panel one by one as they are recovered.
* Detected Jinja variables should appear as a second wave of recovered context so the user understands execution awareness is expanding.
* Detected semantic source candidates should appear as a third wave, with confidence labels and provenance badges.
* Every clarified answer should immediately remove or downgrade a validation finding where relevant.
* Provenance badges should update live:
  * Confirmed
  * Imported
  * Inferred
  * AI Draft
  * Mapped
  * Needs Review
* The primary CTA should change smoothly, not abruptly, as the state progresses.
* When launch becomes available, the interface should celebrate readiness subtly but should not hide remaining warnings.
* Value transformations proposed by mapping should be visually diffed so the user can spot changes like `Europe → EU` instantly.
* The compiled SQL preview should visibly refresh when mapping or parameter values change.
* Manual semantic overrides should visually lock the affected field so later imports do not silently replace it.

## 11. Error Experience

**Philosophy**: Never show a dead end. Every error state must preserve recovered value, explain what failed, and show the nearest path forward.
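For the compilation path in particular, “explain what failed” means relaying Superset’s own error rather than a locally estimated preview. A minimal Python sketch of that contract; the endpoint path and response shape here are hypothetical placeholders, not the real Superset API, which must be checked against its own reference:

```python
import json

def build_dry_run_request(dataset_id: int, template_params: dict) -> dict:
    """Assemble the dry-run payload for server-side Jinja compilation.

    The LLM contributes only `template_params`; it never contributes SQL.
    The path below is a hypothetical placeholder, not a real endpoint.
    """
    return {
        "method": "POST",
        "path": f"/api/v1/dataset/{dataset_id}/compile",  # hypothetical
        "body": {"template_params": json.dumps(template_params)},
    }

def interpret_compile_response(response: dict) -> dict:
    """Map a compile response to UI state without inventing a preview.

    On error, the Superset message is surfaced verbatim and launch
    stays disabled until the mapping or value is fixed.
    """
    if "error" in response:
        return {
            "state": "compile_error",
            "message": response["error"],  # shown to the user in readable form
            "launch_enabled": False,
        }
    return {"state": "preview_ready", "sql": response["sql"], "launch_enabled": True}

request = build_dry_run_request(42, {"region": "EU"})
ui = interpret_compile_response({"error": "Undefined template variable: 'region'"})
# ui blocks launch and carries the original Superset error
```

Keeping the error branch as a first-class UI state, instead of an exception that empties the preview panel, is what preserves the recovered context when compilation fails.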
### Scenario A: Superset Link Recognized, Filter Extraction Partial

* **User Action**: Pastes a valid Superset link with partially recoverable filter state.
* **System Response**:
  * “We recovered the dataset and 3 filters, but 2 saved filters need manual review.”
  * Missing or low-confidence filters are listed explicitly.
  * The system still opens the workspace with partial value.
* **Recovery**:
  * review recovered filters,
  * add missing ones manually,
  * ask the agent to help reconstruct intent.

### Scenario B: No Clear Business Meaning Can Be Inferred

* **User Action**: Submits a technically valid dataset with poor metadata.
* **System Response**:
  * “We could identify the structure of this dataset, but not its business meaning.”
  * Documentation remains skeletal but usable.
  * Clarification becomes the obvious next step.
* **Recovery**:
  * launch dialogue mode,
  * invite domain expert input,
  * save draft and resume later.

### Scenario C: Required Run-Time Values Missing

* **User Action**: Tries to launch with incomplete placeholders.
* **System Response**:
  * launch blocked,
  * missing values highlighted in-place,
  * concise summary of what is required.
* **Recovery**:
  * fill values inline,
  * return to review,
  * or save incomplete context.

### Scenario D: Conflicting Meanings Across Sources

* **User Action**: Reviews a dataset where imported filter context and documented semantics conflict.
* **System Response**:
  * both candidate meanings are shown side-by-side,
  * neither is silently chosen if confidence is low,
  * the conflict is framed as a decision, not a failure.
* **Recovery**:
  * user confirms one meaning,
  * leaves item unresolved,
  * or marks for expert review.

### Scenario E: User Leaves Mid-Flow

* **User Action**: Closes the session before clarification or run prep is complete.
* **System Response**:
  * autosave or explicit save confirmation,
  * summary of current progress,
  * preserved unresolved items.
* **Recovery**:
  * resume from last state without repeating prior answers.

### Scenario F: Superset API Compilation Failed

* **User Action**: The mapped runtime values are sent for Jinja compilation, but Superset returns a compilation error.
* **System Response**:
  * the **Compiled SQL Preview** switches into an error state instead of pretending preview is available,
  * the problematic variable or mapping row is highlighted,
  * the original compilation error returned by Superset is shown in readable form,
  * launch remains blocked until the issue is resolved.
* **Recovery**:
  * user manually edits the mapped value,
  * user changes the filter-to-template mapping,
  * or user asks the agent to help normalize the value format and then regenerates the preview.

### Scenario G: Semantic Sources Conflict

* **User Action**: A column has one value from a spreadsheet dictionary, a different value from a reference dataset, and a third AI-generated proposal.
* **System Response**:
  * the interface shows a side-by-side comparison instead of silently choosing one,
  * the higher-priority source is highlighted as recommended,
  * the conflict is marked as a warning if user input would be changed.
* **Recovery**:
  * user selects one source,
  * user keeps the current manual value,
  * or user applies the recommended higher-confidence source field by field.

## 12. UX for Trust & Transparency

Trust is central to this feature. The interface must visibly answer:

* Where did this value come from?
* Did the system infer this or did the user confirm it?
* Which runtime variable will receive this value?
* Was the final SQL preview compiled by Superset or just estimated locally?
* Did this semantic label come from a dictionary, another dataset, a fuzzy match, or AI generation?
* What is still unknown?
* What will happen if I proceed anyway?
Recommended trust markers:

* provenance badge on important fields,
* confidence labels for imported or inferred data,
* mapping approval status,
* “compiled by Superset” status on the SQL preview,
* “last changed by” and “changed in clarification” notes,
* “used in run” markers for final execution inputs.

Conflict rule:

* The system must never silently overwrite user-entered semantic values with data from a dictionary, another dataset, or AI generation.
* If multiple sources disagree, the interface shows them side by side and either:
  * asks the user to choose, or
  * recommends the highest-priority source while clearly marking the recommendation as a warning until approved.
* Manual user input remains the most sensitive value and must be preserved unless the user explicitly replaces it.

## 13. UX for Collaboration

This workflow often spans more than one person. The UX should support:

* sharing documentation draft,
* handing a clarification session to a domain expert,
* preserving unresolved questions explicitly,
* recording who confirmed which meaning,
* sharing the reviewed compiled SQL preview as part of execution approval,
* sharing which semantic sources were applied and why.

The user should be able to leave behind a state that another person can understand in under a minute.

## 14. Tone & Voice

* **Style**: Concise, trustworthy, operational, and transparent.
* **System behavior language**:
  * Prefer: “Recovered”, “Confirmed”, “Imported”, “Inferred”, “AI Draft”, “Needs review”, “Ready to run”, “Compiled by Superset”
  * Avoid: “Magic”, “Solved”, “Guaranteed”, “Auto-fixed”
* **Terminology**:
  * Use “dataset”, “clarification”, “validation finding”, “run context”, “imported filter”, “Jinja variable”, “template mapping”, “compiled SQL preview”, “semantic source”, “provenance”, “assumption”, “confidence”.
  * Avoid overly technical wording in primary UX surfaces when a business-readable phrase exists.

## 15. UX Success Signals

The UX is working if users can, with minimal hesitation:

* understand what dataset they are dealing with,
* see what was recovered from Superset,
* see which Jinja variables were discovered for runtime execution,
* understand which semantic source supplied each important field,
* reuse existing semantic assets before accepting AI guesses,
* tell which values are trustworthy,
* review and approve filter-to-template mapping without confusion,
* inspect the final compiled SQL before launch,
* resolve only the ambiguities that matter,
* reach a clear run/no-run decision,
* reopen the same context later without confusion.