ss-tools/specs/027-dataset-llm-orchestration/quickstart.md
2026-03-16 23:11:19 +03:00


Quickstart: LLM Dataset Orchestration

Feature: LLM Dataset Orchestration
Branch: 027-dataset-llm-orchestration

This guide validates the end-to-end workflow for dataset review, semantic enrichment, clarification, preview generation, and controlled SQL Lab launch.


1. Prerequisites

  1. Access to a configured Superset environment with:
    • at least one dataset,
    • at least one dashboard URL containing reusable analytical context,
    • permissions sufficient for dataset inspection and SQL Lab session creation.
  2. An active LLM provider configured in ss-tools.
  3. Optional semantic sources for enrichment testing:
    • uploaded spreadsheet dictionary,
    • connected tabular dictionary,
    • trusted reference Superset dataset.
  4. A test user account with permission to create and resume sessions.
  5. A second test user account for ownership/visibility guard validation.

2. Primary End-to-End Happy Path

Step 1: Start Review Session

  • Navigate to the dataset-review workflow entry from the datasets area.
  • Start a session using one of:
    • a Superset dashboard link with saved filters,
    • a direct dataset selection.
  • Verify:
    • a new session is created,
    • the session gets a visible readiness state,
    • the first recommended action is explicit.

Step 2: Observe Progressive Recovery

  • Keep the session open while recovery runs.
  • Verify progressive updates appear for:
    • dataset recognition,
    • imported filter recovery,
    • template/Jinja variable discovery,
    • preliminary semantic-source candidates,
    • first-pass business summary.
  • Verify partial work is shown before the whole pipeline finishes.
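The progressive-update behaviour above can be sketched as a session that surfaces each recovery stage the moment it completes, rather than waiting for the whole pipeline. All names here (`RecoverySession`, the stage identifiers) are illustrative assumptions, not the actual ss-tools API:

```python
from dataclasses import dataclass, field

# Illustrative stage names mirroring the bullets above; assumptions, not real identifiers.
STAGES = [
    "dataset_recognition",
    "imported_filter_recovery",
    "template_variable_discovery",
    "semantic_source_candidates",
    "first_pass_summary",
]

@dataclass
class RecoverySession:
    """Surfaces partial results as each pipeline stage finishes."""
    completed: list = field(default_factory=list)

    def complete_stage(self, stage: str, result: str) -> dict:
        self.completed.append(stage)
        # Partial work is visible before the remaining stages finish.
        return {
            "stage": stage,
            "result": result,
            "pending": [s for s in STAGES if s not in self.completed],
        }

session = RecoverySession()
update = session.complete_stage("dataset_recognition", "sales_fact")
# `update["pending"]` still lists the four later stages, but the first result is already shown.
```

The point to validate is exactly this shape: each update carries a usable partial result plus an explicit list of what is still pending.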

Step 3: Review Automatic Analysis

  • Inspect the generated business summary and validation findings.
  • Verify:
    • the summary is readable by an operational stakeholder,
    • findings are grouped by severity,
    • provenance/confidence markers distinguish confirmed/imported/inferred/AI-draft values,
    • the next recommended action changes appropriately.

Step 4: Apply Semantic Source

  • Use Apply Semantic Source and choose one of:
    • spreadsheet dictionary,
    • connected dictionary,
    • trusted reference dataset.
  • Verify:
    • exact matches are applied as stronger candidates,
    • fuzzy matches remain reviewable rather than silently applied,
    • semantic conflicts are shown side by side,
    • field-level manual overrides remain possible.
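The exact-versus-fuzzy distinction above can be sketched as follows; the threshold, the use of `difflib`, and all field names are assumptions for illustration, not the real matching implementation:

```python
from difflib import SequenceMatcher

def classify_matches(dataset_fields, dictionary, fuzzy_threshold=0.8):
    """Exact dictionary hits become strong candidates; fuzzy hits stay reviewable."""
    applied, reviewable = {}, {}
    for name in dataset_fields:
        if name in dictionary:
            applied[name] = dictionary[name]  # exact match: applied as a stronger candidate
            continue
        best, score = None, 0.0
        for dict_name, meaning in dictionary.items():
            ratio = SequenceMatcher(None, name, dict_name).ratio()
            if ratio > score:
                best, score = meaning, ratio
        if best is not None and score >= fuzzy_threshold:
            reviewable[name] = best  # fuzzy match: surfaced for review, never silently applied
    return applied, reviewable

applied, reviewable = classify_matches(
    ["order_id", "cust_name"],
    {"order_id": "Order identifier", "customer_name": "Customer full name"},
)
# "order_id" is applied directly; "cust_name" ~ "customer_name" waits for human review.
```

The verification step amounts to checking that nothing ever moves from `reviewable` to `applied` without an explicit user action.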

Step 5: Confirm Field-Level Semantics

  • Manually override one field's verbose_name or description.
  • Apply another semantic source afterward.
  • Verify:
    • the manual field remains locked,
    • imported/generated values do not silently overwrite it,
    • provenance changes to manual override.
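The lock behaviour in this step can be sketched as a merge rule that treats `manual_override` provenance as final. The provenance labels and field shapes are assumptions chosen to mirror the step, not the actual data model:

```python
def apply_source(fields, incoming):
    """Merge incoming semantic values, never overwriting manual overrides."""
    for name, value in incoming.items():
        current = fields.get(name)
        if current and current["provenance"] == "manual_override":
            continue  # locked: a manual decision survives every later import
        fields[name] = {"value": value, "provenance": "imported"}
    return fields

fields = {"revenue": {"value": "Net revenue (EUR)", "provenance": "manual_override"}}
apply_source(fields, {"revenue": "Gross revenue", "region": "Sales region"})
# "revenue" keeps its manual value; only "region" picks up the imported one.
```

Applying a second semantic source after a manual edit should behave exactly like this merge: new fields land, locked fields do not move.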

Step 6: Guided Clarification

  • Enter clarification mode from a session with unresolved findings.
  • Answer one question using a suggested option.
  • Answer another with a custom value.
  • Skip one question.
  • Mark one for expert review.
  • Verify:
    • only one active question is shown at a time,
    • each question includes “why this matters” and current guess,
    • answers update readiness/findings/profile state,
    • skipped and expert-review items remain visible as unresolved.
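The one-question-at-a-time flow can be sketched as a queue where answered questions leave the queue and skipped or expert-review items are parked but stay visible. Class and question names are illustrative assumptions:

```python
from collections import deque

class ClarificationQueue:
    """One active question at a time; skipped/expert items remain unresolved."""
    def __init__(self, questions):
        self.pending = deque(questions)
        self.unresolved = []   # skipped or expert-review items stay visible here
        self.answers = {}

    def current(self):
        # Only the head of the queue is ever shown to the user.
        return self.pending[0] if self.pending else None

    def answer(self, value):
        question = self.pending.popleft()
        self.answers[question] = value

    def skip(self, reason="skipped"):
        self.unresolved.append((self.pending.popleft(), reason))

queue = ClarificationQueue(["grain?", "currency?", "owner?"])
queue.answer("daily")
queue.skip()                        # skipped question stays listed as unresolved
queue.skip(reason="expert_review")  # marked for expert review, also unresolved
```

After this sequence the queue is empty, one answer is recorded, and two items remain visibly unresolved, which is exactly the state Step 6 asks you to verify.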

Step 7: Pause and Resume

  • Save or pause the session mid-clarification.
  • Leave the page and reopen the session.
  • Verify:
    • the session resumes with prior answers intact,
    • the current question or next unresolved question is restored,
    • manual semantic decisions and pending mappings are preserved.
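Resume behaviour is easiest to validate if you know what a pause snapshot must contain. A minimal sketch, assuming a JSON-serializable session state (the field names are illustrative):

```python
import json

def pause(session):
    """Snapshot everything resume needs: answers, position, manual decisions."""
    return json.dumps(session)

def resume(snapshot):
    return json.loads(snapshot)

session = {
    "answers": {"grain?": "daily"},
    "current_question": "currency?",
    "manual_overrides": {"revenue": "Net revenue (EUR)"},
}
restored = resume(pause(session))
# The round trip must be lossless: prior answers, position, and overrides all survive.
```

If any of these three pieces is missing after reopening the session, Step 7 fails.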

Step 8: Review Mapping and Generate Preview

  • Open the mapping review section.
  • Approve one warning-level mapping transformation.
  • Manually override another transformed mapping value.
  • Trigger Generate SQL Preview.
  • Verify:
    • all required variables are visible,
    • warning approvals are explicit,
    • the preview is read-only,
    • preview status shows it was compiled by Superset,
    • substituted values are visible in the final SQL.

Step 9: Launch Dataset

  • Move the session to Run Ready.
  • Click Launch Dataset.
  • Verify:
    • launch confirmation shows dataset identity, effective filters, parameter values, warnings, and preview status,
    • a SQL Lab session reference is returned,
    • an audited run context is stored,
    • the session moves to run-in-progress or completed state appropriately.

Step 10: Export Outputs

  • Export documentation.
  • Export validation findings.
  • Verify:
    • both artifacts are generated,
    • artifact metadata or file reference is associated with the session,
    • exported output reflects the current reviewed state.

Step 11: Collaboration and Review

  • As User A, add User B as a reviewer.
  • Access the same session as User B.
  • Verify:
    • User B can view the session state,
    • User B can answer clarification questions but cannot approve launch-critical mappings,
    • audit log (if implemented) records which user performed which action.

3. Negative and Recovery Scenarios

Scenario A: Malformed or Unsupported Link

  • Start a session with a malformed or unsupported link.
  • Verify:
    • intake fails with actionable error messaging,
    • no fake recovered context is shown,
    • the user can correct input in place.

Scenario B: Partial Filter Recovery

  • Use a link where only some filters can be recovered.
  • Verify:
    • recovered filters are shown,
    • unrecovered pieces are explicitly marked,
    • session enters recovery_required or equivalent partial state,
    • workflow remains usable.

Scenario C: Dataset Without Clear Business Meaning

  • Use a dataset with weak metadata and no strong trusted semantic matches.
  • Verify:
    • the summary remains minimal but usable,
    • the system does not pretend certainty,
    • clarification becomes the recommended next step.

Scenario D: Conflicting Semantic Sources

  • Apply two semantic sources that disagree for the same field.
  • Verify:
    • both candidates are shown side by side,
    • recommended source is visible if confidence differs,
    • no silent overwrite occurs,
    • conflict remains until explicitly resolved.

Scenario E: Missing Required Runtime Value

  • Leave a required template variable unmapped.
  • Attempt preview or launch.
  • Verify:
    • preview or launch is blocked according to gate rules,
    • missing values are highlighted specifically,
    • recommended next action becomes completion/remediation rather than launch.

Scenario F: Preview Compilation Failure

  • Provide a mapping value known to break Superset-side compilation.
  • Trigger preview.
  • Verify:
    • preview moves to failed state,
    • readable Superset error details are shown,
    • launch remains blocked,
    • the user can navigate back to the problematic mapping/value.

Scenario G: Preview Staleness After Input Change

  • Successfully generate preview.
  • Change an approved mapping or required value.
  • Verify:
    • preview state becomes stale,
    • launch is blocked until preview is regenerated,
    • stale state is visible and not hidden.
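One way to implement (and therefore test) this staleness rule is to fingerprint every input the preview depends on; any mutation changes the fingerprint and invalidates the preview. This is a sketch under that assumption, not the actual mechanism:

```python
import hashlib
import json

def input_fingerprint(mappings, required_values):
    """Stable hash of everything the compiled preview depends on."""
    payload = json.dumps([mappings, required_values], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class Preview:
    def __init__(self, sql, fingerprint):
        self.sql, self.fingerprint = sql, fingerprint

    def is_stale(self, mappings, required_values):
        # Any change to approved mappings or required values after compilation
        # makes the stored fingerprint mismatch, so launch must be re-gated.
        return self.fingerprint != input_fingerprint(mappings, required_values)

mappings = {"region": "EMEA"}
values = {"start_date": "2026-01-01"}
preview = Preview("SELECT ...", input_fingerprint(mappings, values))
assert not preview.is_stale(mappings, values)  # fresh right after generation
mappings["region"] = "APAC"                    # input change after a successful preview
# preview.is_stale(mappings, values) is now True, so launch stays blocked.
```

The scenario passes when the UI reflects exactly this transition: fresh after generation, stale immediately after the mutation, never silently reused.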

Scenario H: SQL Lab Launch Failure

  • Simulate or trigger SQL Lab session creation failure.
  • Verify:
    • launch result is marked failed,
    • the audit record still preserves attempted run context,
    • the session remains recoverable,
    • no success redirect is shown.

Scenario I: Cross-User Access Guard

  • Try to open or mutate the first user's session from a second user account (without collaborator access).
  • Verify:
    • access is denied,
    • no session state leaks to the second user,
    • ownership/permission is enforced on view and mutation paths.

4. UX Invariants to Validate

  • The primary CTA always reflects the current highest-value next step.
  • The launch button stays blocked if:
    • blocking findings remain,
    • required values are missing,
    • warning-level mappings needing approval are unresolved,
    • preview is missing, failed, or stale.
  • Manual semantic overrides are never silently overwritten.
  • Every important semantic value exposes visible provenance.
  • Clarification shows one focused question at a time.
  • Partial recovery preserves usable value and explains what is missing.
  • Preview explicitly indicates it was compiled by Superset.
  • Session resume restores prior state without forcing re-entry.
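The launch-gate invariant above is mechanical enough to express as a single predicate: launch is allowed only when the list of blockers is empty. The session keys and message strings here are illustrative assumptions:

```python
def launch_blockers(session):
    """Return the reasons the launch button must stay blocked (empty list = Run Ready)."""
    blockers = []
    if session["blocking_findings"]:
        blockers.append("blocking findings remain")
    if session["missing_required_values"]:
        blockers.append("required values are missing")
    if session["unapproved_warning_mappings"]:
        blockers.append("warning-level mappings need approval")
    if session["preview_state"] != "fresh":  # covers missing, failed, and stale
        blockers.append(f"preview is {session['preview_state']}")
    return blockers

session = {
    "blocking_findings": [],
    "missing_required_values": ["start_date"],
    "unapproved_warning_mappings": [],
    "preview_state": "stale",
}
blockers = launch_blockers(session)
# Two blockers remain, so the launch CTA must stay disabled and name both reasons.
```

Validating the invariant means checking that every non-empty blocker list both disables launch and is surfaced to the user, never collapsed into a generic error.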

5. Suggested Verification by Milestone

Milestone 1: Sessioned Auto Review

Validate:

  • source intake,
  • progressive recovery,
  • automatic documentation summary,
  • typed findings display,
  • semantic source application,
  • export endpoints.

Milestone 2: Guided Clarification

Validate:

  • clarification question flow,
  • answer persistence,
  • resume behavior,
  • conflict review,
  • field-level manual override/lock behavior.

Milestone 3: Controlled Execution

Validate:

  • mapping review,
  • explicit warning approvals,
  • Superset-side preview,
  • preview staleness handling,
  • SQL Lab launch,
  • audited run context persistence.

6. Success-Criteria Measurement Hints

These are not implementation metrics by themselves; they are validation hints for pilot runs.

For SC-001

Track how many submitted datasets produce an initial documentation draft without manual reconstruction.

For SC-002

Measure time from session start to first readable summary visible to the user.

For SC-003

Measure the percentage of semantic fields populated from trusted sources before AI-draft fallback.

For SC-005

Measure the percentage of eligible Superset links that produce a non-empty imported filter set usable for review.

For SC-007

Check that launched sessions always persist:

  • dataset identity,
  • effective filters,
  • template params,
  • approved mappings,
  • preview reference,
  • SQL Lab session reference,
  • outcome.
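The SC-007 check reduces to asserting that a launched session's audit record carries all seven fields. A minimal sketch of such a record, with every field and value name invented for illustration:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RunContext:
    """Audited launch record; one attribute per item in the SC-007 list above."""
    dataset_id: str
    effective_filters: tuple
    template_params: tuple
    approved_mappings: tuple
    preview_ref: str
    sqllab_session_ref: str
    outcome: str

ctx = RunContext(
    dataset_id="sales_fact",
    effective_filters=(("region", "EMEA"),),
    template_params=(("start_date", "2026-01-01"),),
    approved_mappings=(("region", "EMEA"),),
    preview_ref="preview-42",
    sqllab_session_ref="sqllab-session-7",
    outcome="completed",
)
record = asdict(ctx)  # the persisted shape: all seven keys must be present
```

Using a frozen dataclass mirrors the requirement that the audit record is immutable once the run is launched; the test is simply that no field is ever absent or empty on a launched session.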

For SC-008

Run moderated first-attempt sessions and record whether users complete import → review → clarification (if needed) → preview → launch without facilitator intervention.


7. Completion Checklist

A Phase 1 design is operationally validated when all are true:

  • Happy-path session can be started and completed.
  • Partial recovery behaves as explicit partial recovery, not silent failure.
  • Clarification is resumable.
  • Semantic conflict review is explicit.
  • Field-level override lock works.
  • Preview is Superset-generated and becomes stale after input mutation.
  • Launch targets SQL Lab only.
  • Export outputs are available.
  • Ownership and guard rails are enforced.