# Quickstart: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`

This guide validates the end-to-end workflow for dataset review, semantic enrichment, clarification, preview generation, and controlled SQL Lab launch.

---

## 1. Prerequisites

1. Access to a configured Superset environment with:
   - at least one dataset,
   - at least one dashboard URL containing reusable analytical context,
   - permissions sufficient for dataset inspection and SQL Lab session creation.
2. An active LLM provider configured in ss-tools.
3. Optional semantic sources for enrichment testing:
   - uploaded spreadsheet dictionary,
   - connected tabular dictionary,
   - trusted reference Superset dataset.
4. A test user account with permission to create and resume sessions.
5. A second test user account for ownership/visibility guard validation.

---

## 2. Primary End-to-End Happy Path

### Step 1: Start Review Session
- Navigate to the dataset-review workflow entry from the datasets area.
- Start a session using one of:
  - a Superset dashboard link with saved filters,
  - a direct dataset selection.
- **Verify**:
  - a new session is created,
  - the session gets a visible readiness state,
  - the first recommended action is explicit.

### Step 2: Observe Progressive Recovery
- Keep the session open while recovery runs.
- **Verify** progressive updates appear for:
  - dataset recognition,
  - imported filter recovery,
  - template/Jinja variable discovery,
  - preliminary semantic-source candidates,
  - first-pass business summary.
- **Verify** partial work is shown before the whole pipeline finishes.

### Step 3: Review Automatic Analysis
- Inspect the generated business summary and validation findings.
- **Verify**:
  - the summary is readable by an operational stakeholder,
  - findings are grouped by severity,
  - provenance/confidence markers distinguish confirmed/imported/inferred/AI-draft values,
  - the next recommended action changes appropriately.

### Step 4: Apply Semantic Source
- Use **Apply semantic source** and choose:
  - spreadsheet dictionary,
  - connected dictionary,
  - trusted reference dataset.
- **Verify**:
  - exact matches are applied as stronger candidates,
  - fuzzy matches remain reviewable rather than silently applied,
  - semantic conflicts are shown side by side,
  - field-level manual overrides remain possible.
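
The exact-versus-fuzzy distinction verified above can be pictured with a small sketch. It is illustrative only: `Candidate`, `match_field`, the name normalization, and the `0.8` threshold are assumptions made for this example, not the matching logic used by ss-tools. String similarity comes from Python's standard `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

FUZZY_THRESHOLD = 0.8  # assumed cutoff, not taken from the spec


@dataclass
class Candidate:
    field: str
    source_key: str
    match_type: str      # "exact" or "fuzzy"
    score: float
    needs_review: bool   # fuzzy candidates must never be applied silently


def normalize(name: str) -> str:
    # Hypothetical normalization: trim, lowercase, spaces to underscores.
    return name.strip().lower().replace(" ", "_")


def match_field(field: str, dictionary_keys: List[str]) -> Optional[Candidate]:
    target = normalize(field)
    best_key, best_score = None, 0.0
    for key in dictionary_keys:
        candidate = normalize(key)
        if candidate == target:
            # Exact match: the stronger candidate.
            return Candidate(field, key, "exact", 1.0, needs_review=False)
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= FUZZY_THRESHOLD:
        # Fuzzy match: kept as a reviewable candidate only.
        return Candidate(field, best_key, "fuzzy", best_score, needs_review=True)
    return None
```

When testing, a field such as `order_dt` matched against a dictionary key `order_date` should surface as a reviewable candidate, while `Order Date` should be treated as an exact match after normalization.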

### Step 5: Confirm Field-Level Semantics
- Manually override one field’s `verbose_name` or description.
- Apply another semantic source afterward.
- **Verify**:
  - the manual field remains locked,
  - imported/generated values do not silently overwrite it,
  - provenance changes to manual override.
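
The lock behavior checked in this step can be sketched as follows. `FieldSemantics`, the two helper functions, and the provenance labels are hypothetical names chosen for illustration; the actual data model may differ.

```python
from dataclasses import dataclass


@dataclass
class FieldSemantics:
    verbose_name: str
    provenance: str = "ai_draft"  # hypothetical provenance label
    locked: bool = False


def apply_manual_override(field: FieldSemantics, value: str) -> None:
    # A manual edit wins, records its provenance, and locks the field.
    field.verbose_name = value
    field.provenance = "manual_override"
    field.locked = True


def apply_source_value(field: FieldSemantics, value: str, provenance: str) -> bool:
    """Apply an imported/generated value unless the field is locked.

    Returns True if the value was applied, False if it was rejected
    because a manual override is in place.
    """
    if field.locked:
        return False
    field.verbose_name = value
    field.provenance = provenance
    return True
```

In a test run, a rejected value would be expected to appear as a pending candidate rather than disappear, so the reviewer can still see what the source proposed.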

### Step 6: Guided Clarification
- Enter clarification mode from a session with unresolved findings.
- Answer one question using a suggested option.
- Answer another with a custom value.
- Skip one question.
- Mark one for expert review.
- **Verify**:
  - only one active question is shown at a time,
  - each question includes “why this matters” and current guess,
  - answers update readiness/findings/profile state,
  - skipped and expert-review items remain visible as unresolved.

### Step 7: Pause and Resume
- Save or pause the session mid-clarification.
- Leave the page and reopen the session.
- **Verify**:
  - the session resumes with prior answers intact,
  - the current question or next unresolved question is restored,
  - manual semantic decisions and pending mappings are preserved.

### Step 8: Review Mapping and Generate Preview
- Open the mapping review section.
- Approve one warning-level mapping transformation.
- Manually override another transformed mapping value.
- Trigger **Generate SQL Preview**.
- **Verify**:
  - all required variables are visible,
  - warning approvals are explicit,
  - the preview is read-only,
  - preview status shows it was compiled by Superset,
  - substituted values are visible in the final SQL.

### Step 9: Launch Dataset
- Move the session to `Run Ready`.
- Click **Launch Dataset**.
- **Verify**:
  - launch confirmation shows dataset identity, effective filters, parameter values, warnings, and preview status,
  - a SQL Lab session reference is returned,
  - an audited run context is stored,
  - the session moves to run-in-progress or completed state appropriately.

### Step 10: Export Outputs
- Export documentation.
- Export validation findings.
- **Verify**:
  - both artifacts are generated,
  - artifact metadata or file reference is associated with the session,
  - exported output reflects the current reviewed state.

### Step 11: Collaboration and Review
- As User A, add User B as a `reviewer`.
- Access the same session as User B.
- **Verify**:
  - User B can view the session state.
  - User B can answer clarification questions but cannot approve launch-critical mappings.
  - Audit log (if implemented) records which user performed which action.
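
The reviewer restrictions above amount to a role-to-action table. The sketch below is one way to express the expected outcome when writing test assertions; only the `reviewer` role comes from this guide, while the `owner` role, the action names, and `is_allowed` are assumptions for illustration.

```python
from typing import Optional

# Hypothetical permission table; only "reviewer" is named in this guide.
ROLE_PERMISSIONS = {
    "owner": {"view", "answer_clarification", "approve_mapping", "launch"},
    "reviewer": {"view", "answer_clarification"},
}


def is_allowed(role: Optional[str], action: str) -> bool:
    # Users with no role on the session (role is None) get no access at all.
    return action in ROLE_PERMISSIONS.get(role, set())
```

A user without any collaborator role maps to `None` here, which matches the denial expected in the cross-user access scenario.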

---

## 3. Negative and Recovery Scenarios

### Scenario A: Invalid Superset Link
- Start a session with a malformed or unsupported link.
- **Verify**:
  - intake fails with actionable error messaging,
  - no fake recovered context is shown,
  - the user can correct input in place.

### Scenario B: Partial Filter Recovery
- Use a link where only some filters can be recovered.
- **Verify**:
  - recovered filters are shown,
  - unrecovered pieces are explicitly marked,
  - session enters `recovery_required` or equivalent partial state,
  - workflow remains usable.

### Scenario C: Dataset Without Clear Business Meaning
- Use a dataset with weak metadata and no strong trusted semantic matches.
- **Verify**:
  - the summary remains minimal but usable,
  - the system does not pretend certainty,
  - clarification becomes the recommended next step.

### Scenario D: Conflicting Semantic Sources
- Apply two semantic sources that disagree for the same field.
- **Verify**:
  - both candidates are shown side by side,
  - recommended source is visible if confidence differs,
  - no silent overwrite occurs,
  - conflict remains until explicitly resolved.

### Scenario E: Missing Required Runtime Value
- Leave a required template variable unmapped.
- Attempt preview or launch.
- **Verify**:
  - preview or launch is blocked according to gate rules,
  - missing values are highlighted specifically,
  - recommended next action becomes completion/remediation rather than launch.

### Scenario F: Preview Compilation Failure
- Provide a mapping value known to break Superset-side compilation.
- Trigger preview.
- **Verify**:
  - preview moves to `failed` state,
  - readable Superset error details are shown,
  - launch remains blocked,
  - the user can navigate back to the problematic mapping/value.

### Scenario G: Preview Staleness After Input Change
- Successfully generate preview.
- Change an approved mapping or required value.
- **Verify**:
  - preview state becomes `stale`,
  - launch is blocked until preview is regenerated,
  - stale state is visible and not hidden.
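
One common way to detect staleness is to fingerprint the inputs a preview was compiled from and compare that fingerprint with the current inputs. The sketch below assumes this approach; `fingerprint`, `Preview`, and the `ready` status name are hypothetical, and only `stale` is a state named in this guide.

```python
import hashlib
import json


def fingerprint(inputs: dict) -> str:
    # Canonical JSON (sorted keys) so key order does not affect the hash.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Preview:
    def __init__(self, inputs: dict, compiled_sql: str):
        # Remember what the preview was compiled from.
        self.input_fingerprint = fingerprint(inputs)
        self.compiled_sql = compiled_sql

    def status(self, current_inputs: dict) -> str:
        if fingerprint(current_inputs) == self.input_fingerprint:
            return "ready"
        return "stale"
```

Under this model, changing any approved mapping or required value after preview generation flips the status to `stale`, and regenerating the preview is the only way back.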

### Scenario H: SQL Lab Launch Failure
- Simulate or trigger SQL Lab session creation failure.
- **Verify**:
  - launch result is marked failed,
  - the audit record still preserves attempted run context,
  - the session remains recoverable,
  - no success redirect is shown.

### Scenario I: Cross-User Access Guard
- Try to open or mutate the first user’s session from a second user account (without collaborator access).
- **Verify**:
  - access is denied,
  - no session state leaks to the second user,
  - ownership/permission is enforced on view and mutation paths.

---

## 4. UX Invariants to Validate

- [ ] The primary CTA always reflects the current highest-value next step.
- [ ] The launch button stays blocked if:
  - [ ] blocking findings remain,
  - [ ] required values are missing,
  - [ ] warning-level mappings needing approval are unresolved,
  - [ ] preview is missing, failed, or stale.
- [ ] Manual semantic overrides are never silently overwritten.
- [ ] Every important semantic value exposes visible provenance.
- [ ] Clarification shows one focused question at a time.
- [ ] Partial recovery preserves usable value and explains what is missing.
- [ ] Preview explicitly indicates it was compiled by Superset.
- [ ] Session resume restores prior state without forcing re-entry.
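
The four launch-blocking conditions in the checklist above can be read as a single gate function. The sketch below is a minimal model for reasoning about expected test outcomes; `SessionState`, `launch_blockers`, and the `ready` preview status are assumed names, not the real session schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionState:
    blocking_findings: int = 0
    missing_required_values: List[str] = field(default_factory=list)
    unapproved_warning_mappings: List[str] = field(default_factory=list)
    preview_status: str = "missing"  # assumed values: missing, failed, stale, ready


def launch_blockers(state: SessionState) -> List[str]:
    """Return human-readable reasons why launch is blocked (empty if allowed)."""
    reasons = []
    if state.blocking_findings > 0:
        reasons.append("blocking findings remain")
    if state.missing_required_values:
        reasons.append(
            "required values missing: " + ", ".join(state.missing_required_values)
        )
    if state.unapproved_warning_mappings:
        reasons.append(
            "warning-level mappings need approval: "
            + ", ".join(state.unapproved_warning_mappings)
        )
    if state.preview_status != "ready":
        reasons.append("preview is " + state.preview_status)
    return reasons
```

Returning the list of reasons, rather than a bare boolean, mirrors the expectation that the UI explains why launch is blocked and what the next step is.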

---

## 5. Suggested Verification by Milestone

### Milestone 1: Sessioned Auto Review
Validate:
- source intake,
- progressive recovery,
- automatic documentation summary,
- typed findings display,
- semantic source application,
- export endpoints.

### Milestone 2: Guided Clarification
Validate:
- clarification question flow,
- answer persistence,
- resume behavior,
- conflict review,
- field-level manual override/lock behavior.

### Milestone 3: Controlled Execution
Validate:
- mapping review,
- explicit warning approvals,
- Superset-side preview,
- preview staleness handling,
- SQL Lab launch,
- audited run context persistence.

---

## 6. Success-Criteria Measurement Hints

These are not implementation metrics by themselves; they are validation hints for pilot runs.

### For [SC-001](./spec.md)
Track how many submitted datasets produce an initial documentation draft without manual reconstruction.

### For [SC-002](./spec.md)
Measure time from session start to first readable summary visible to the user.

### For [SC-003](./spec.md)
Measure the percentage of semantic fields populated from trusted sources before AI-draft fallback.
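
For a pilot run, this percentage can be computed from the provenance label of each populated semantic field. The sketch below assumes that `confirmed` and `imported` count as trusted; which labels qualify, and the helper name `trusted_coverage`, are assumptions to be aligned with the spec.

```python
from typing import Iterable

# Assumption: these provenance labels count as "trusted sources".
TRUSTED_PROVENANCE = {"confirmed", "imported"}


def trusted_coverage(provenances: Iterable[str]) -> float:
    """Percentage of fields whose value came from a trusted source."""
    values = list(provenances)
    if not values:
        return 0.0
    trusted = sum(1 for p in values if p in TRUSTED_PROVENANCE)
    return 100.0 * trusted / len(values)
```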

### For [SC-005](./spec.md)
Measure the percentage of eligible Superset links that produce a non-empty imported filter set usable for review.

### For [SC-007](./spec.md)
Check that launched sessions always persist:
- dataset identity,
- effective filters,
- template params,
- approved mappings,
- preview reference,
- SQL Lab session reference,
- outcome.
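
A completeness check over the persisted run context makes this criterion easy to automate. The sketch below models the seven items listed above as a dataclass; the class, the field names and types, and `missing_fields` are hypothetical, and the stored record may be shaped differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class RunContext:
    # Hypothetical field names mirroring the list above; None means "not persisted".
    dataset_identity: Optional[str] = None
    effective_filters: Optional[List[Dict[str, Any]]] = None
    template_params: Optional[Dict[str, Any]] = None
    approved_mappings: Optional[Dict[str, Any]] = None
    preview_reference: Optional[str] = None
    sql_lab_session_reference: Optional[str] = None
    outcome: Optional[str] = None


def missing_fields(ctx: RunContext) -> List[str]:
    # Empty collections are valid (e.g. no filters); only None counts as missing.
    return [f.name for f in fields(ctx) if getattr(ctx, f.name) is None]
```

A launched session passes the check when `missing_fields` returns an empty list for its stored context.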

### For [SC-008](./spec.md)
Run moderated first-attempt sessions and record whether users complete import → review → clarification (if needed) → preview → launch without facilitator intervention.

---

## 7. Completion Checklist

A Phase 1 design is operationally validated when all are true:

- [ ] Happy-path session can be started and completed.
- [ ] Partial recovery behaves as explicit partial recovery, not silent failure.
- [ ] Clarification is resumable.
- [ ] Semantic conflict review is explicit.
- [ ] Field-level override lock works.
- [ ] Preview is Superset-generated and becomes stale after input mutation.
- [ ] Launch targets SQL Lab only.
- [ ] Export outputs are available.
- [ ] Ownership and guard rails are enforced.