ss-tools/specs/027-dataset-llm-orchestration/quickstart.md
2026-03-16 23:11:19 +03:00

# Quickstart: LLM Dataset Orchestration
**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`
This guide validates the end-to-end workflow for dataset review, semantic enrichment, clarification, preview generation, and controlled SQL Lab launch.
---
## 1. Prerequisites
1. Access to a configured Superset environment with:
- at least one dataset,
- at least one dashboard URL containing reusable analytical context,
- permissions sufficient for dataset inspection and SQL Lab session creation.
2. An active LLM provider configured in ss-tools.
3. Optional semantic sources for enrichment testing:
- uploaded spreadsheet dictionary,
- connected tabular dictionary,
- trusted reference Superset dataset.
4. A test user account with permission to create and resume sessions.
5. A second test user account for ownership/visibility guard validation.
---
## 2. Primary End-to-End Happy Path
### Step 1: Start Review Session
- Navigate to the dataset-review workflow entry from the datasets area.
- Start a session using one of:
- a Superset dashboard link with saved filters,
- a direct dataset selection.
- **Verify**:
- a new session is created,
- the session gets a visible readiness state,
- the first recommended action is explicit.
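The readiness states referenced throughout this guide can be thought of as a small state machine. The sketch below is illustrative only: the state names and allowed transitions are assumptions for validation reasoning, not the feature's actual API.

```python
from enum import Enum

class SessionState(Enum):
    CREATED = "created"
    RECOVERING = "recovering"
    REVIEW = "review"
    CLARIFICATION = "clarification"
    RUN_READY = "run_ready"
    RUN_IN_PROGRESS = "run_in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

# Hypothetical forward transitions a session checker could enforce.
ALLOWED = {
    SessionState.CREATED: {SessionState.RECOVERING, SessionState.FAILED},
    SessionState.RECOVERING: {SessionState.REVIEW, SessionState.FAILED},
    SessionState.REVIEW: {SessionState.CLARIFICATION, SessionState.RUN_READY},
    SessionState.CLARIFICATION: {SessionState.REVIEW, SessionState.RUN_READY},
    SessionState.RUN_READY: {SessionState.RUN_IN_PROGRESS},
    SessionState.RUN_IN_PROGRESS: {SessionState.COMPLETED, SessionState.FAILED},
}

def can_transition(src: SessionState, dst: SessionState) -> bool:
    """Return True when dst is a permitted next state for src."""
    return dst in ALLOWED.get(src, set())
```

A validator can use `can_transition` to assert that a freshly created session cannot jump straight to launch.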
### Step 2: Observe Progressive Recovery
- Keep the session open while recovery runs.
- **Verify** progressive updates appear for:
- dataset recognition,
- imported filter recovery,
- template/Jinja variable discovery,
- preliminary semantic-source candidates,
- first-pass business summary.
- **Verify** partial work is shown before the whole pipeline finishes.
### Step 3: Review Automatic Analysis
- Inspect the generated business summary and validation findings.
- **Verify**:
- the summary is readable by an operational stakeholder,
- findings are grouped by severity,
- provenance/confidence markers distinguish confirmed/imported/inferred/AI-draft values,
- the next recommended action changes appropriately.
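The severity grouping above can be sketched as a stable bucketing pass. The severity labels and field names below are assumptions, not the real finding model.

```python
from collections import defaultdict

SEVERITY_ORDER = ["blocking", "warning", "info"]  # assumed levels

def group_findings(findings: list[dict]) -> dict[str, list[dict]]:
    """Bucket findings by severity, keeping a fixed display order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        grouped[f["severity"]].append(f)
    # Emit buckets in fixed severity order so the UI is predictable.
    return {s: grouped[s] for s in SEVERITY_ORDER if s in grouped}

findings = [
    {"severity": "info", "message": "verbose_name missing"},
    {"severity": "blocking", "message": "required variable unmapped"},
]
grouped = group_findings(findings)
```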
### Step 4: Apply Semantic Source
- Use **Apply semantic source** and choose:
- spreadsheet dictionary,
- connected dictionary,
- trusted reference dataset.
- **Verify**:
- exact matches are applied as stronger candidates,
- fuzzy matches remain reviewable rather than silently applied,
- semantic conflicts are shown side by side,
- field-level manual overrides remain possible.
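The exact-vs-fuzzy distinction in this step can be sketched with a simple similarity check. The threshold and return labels are illustrative assumptions; the point is that only exact hits qualify as apply-candidates while fuzzy hits stay review-only.

```python
from difflib import SequenceMatcher

def classify_match(dataset_field: str, dictionary_term: str,
                   fuzzy_threshold: float = 0.8) -> str:
    """Return 'exact', 'fuzzy' (reviewable, never auto-applied), or 'none'."""
    a, b = dataset_field.lower(), dictionary_term.lower()
    if a == b:
        return "exact"
    if SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold:
        return "fuzzy"
    return "none"
```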
### Step 5: Confirm Field-Level Semantics
- Manually override one field's `verbose_name` or description.
- Apply another semantic source afterward.
- **Verify**:
- the manual field remains locked,
- imported/generated values do not silently overwrite it,
- provenance changes to manual override.
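The no-silent-overwrite rule can be sketched as a provenance precedence check where a manual value always wins. Provenance labels and the field shape are assumptions for illustration.

```python
PRECEDENCE = {"manual": 3, "confirmed": 2, "imported": 1, "ai_draft": 0}

def apply_candidate(field: dict, candidate_value: str,
                    candidate_provenance: str) -> dict:
    """Apply a semantic candidate unless the field is manually locked."""
    if field.get("provenance") == "manual":
        return field  # locked: imported/generated values never overwrite it
    if PRECEDENCE[candidate_provenance] >= PRECEDENCE.get(field.get("provenance"), -1):
        return {"value": candidate_value, "provenance": candidate_provenance}
    return field
```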
### Step 6: Guided Clarification
- Enter clarification mode from a session with unresolved findings.
- Answer one question using a suggested option.
- Answer another with a custom value.
- Skip one question.
- Mark one for expert review.
- **Verify**:
- only one active question is shown at a time,
- each question includes “why this matters” and current guess,
- answers update readiness/findings/profile state,
- skipped and expert-review items remain visible as unresolved.
### Step 7: Pause and Resume
- Save or pause the session mid-clarification.
- Leave the page and reopen the session.
- **Verify**:
- the session resumes with prior answers intact,
- the current question or next unresolved question is restored,
- manual semantic decisions and pending mappings are preserved.
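Pause/resume reduces to a snapshot surviving a save/reload round trip. The snapshot shape below is illustrative, not the real persistence schema.

```python
import json

def save_session(session: dict) -> str:
    """Serialize the session snapshot for storage."""
    return json.dumps(session, sort_keys=True)

def resume_session(payload: str) -> dict:
    """Restore the snapshot exactly as it was saved."""
    return json.loads(payload)

snapshot = {
    "answers": {"q1": "revenue in EUR"},
    "current_question": "q2",
    "manual_overrides": {"amount": "Gross amount"},
}
restored = resume_session(save_session(snapshot))
```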
### Step 8: Review Mapping and Generate Preview
- Open the mapping review section.
- Approve one warning-level mapping transformation.
- Manually override another transformed mapping value.
- Trigger **Generate SQL Preview**.
- **Verify**:
- all required variables are visible,
- warning approvals are explicit,
- the preview is read-only,
- preview status shows it was compiled by Superset,
- substituted values are visible in the final SQL.
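The preview gate can be sketched as a blocker list: every required template variable must carry a value and every warning-level mapping must be explicitly approved. Field names are assumptions.

```python
def preview_blockers(required_vars: set[str], mappings: dict[str, dict]) -> list[str]:
    """Return human-readable reasons preview generation is blocked."""
    blockers = []
    for var in sorted(required_vars):
        m = mappings.get(var)
        if m is None or m.get("value") in (None, ""):
            blockers.append(f"missing value for required variable '{var}'")
        elif m.get("warning") and not m.get("approved"):
            blockers.append(f"unapproved warning on mapping '{var}'")
    return blockers
```

An empty blocker list is the precondition for requesting the Superset-compiled preview.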
### Step 9: Launch Dataset
- Move the session to `Run Ready`.
- Click **Launch Dataset**.
- **Verify**:
- launch confirmation shows dataset identity, effective filters, parameter values, warnings, and preview status,
- a SQL Lab session reference is returned,
- an audited run context is stored,
- the session moves to run-in-progress or completed state appropriately.
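The audited run context stored at launch can be sketched as a frozen record mirroring the confirmation dialog. Key names are illustrative assumptions, not the actual schema.

```python
from datetime import datetime, timezone

def build_run_context(session: dict, preview_ref: str,
                      sqllab_ref: str) -> dict:
    """Freeze what was launched, with which inputs, and where it ran."""
    return {
        "dataset": session["dataset"],
        "effective_filters": session["filters"],
        "template_params": session["params"],
        "approved_mappings": session["approved_mappings"],
        "preview_reference": preview_ref,
        "sqllab_session_reference": sqllab_ref,
        "launched_at": datetime.now(timezone.utc).isoformat(),
    }
```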
### Step 10: Export Outputs
- Export documentation.
- Export validation findings.
- **Verify**:
- both artifacts are generated,
- artifact metadata or file reference is associated with the session,
- exported output reflects the current reviewed state.
### Step 11: Collaboration and Review

- As User A, add User B as a `reviewer`.
- Access the same session as User B.
- **Verify**:
- User B can view the session state,
- User B can answer clarification questions but cannot approve launch-critical mappings,
- the audit log (if implemented) records which user performed which action.
---
## 3. Negative and Recovery Scenarios
### Scenario A: Invalid Superset Link
- Start a session with a malformed or unsupported link.
- **Verify**:
- intake fails with actionable error messaging,
- no fabricated recovered context is shown,
- the user can correct input in place.
### Scenario B: Partial Filter Recovery
- Use a link where only some filters can be recovered.
- **Verify**:
- recovered filters are shown,
- unrecovered pieces are explicitly marked,
- session enters `recovery_required` or equivalent partial state,
- workflow remains usable.
### Scenario C: Dataset Without Clear Business Meaning
- Use a dataset with weak metadata and no strong trusted semantic matches.
- **Verify**:
- the summary remains minimal but usable,
- the system does not pretend certainty,
- clarification becomes the recommended next step.
### Scenario D: Conflicting Semantic Sources
- Apply two semantic sources that disagree for the same field.
- **Verify**:
- both candidates are shown side by side,
- a recommended source is indicated when confidence differs,
- no silent overwrite occurs,
- conflict remains until explicitly resolved.
### Scenario E: Missing Required Runtime Value
- Leave a required template variable unmapped.
- Attempt preview or launch.
- **Verify**:
- preview or launch is blocked according to gate rules,
- missing values are highlighted specifically,
- recommended next action becomes completion/remediation rather than launch.
### Scenario F: Preview Compilation Failure
- Provide a mapping value known to break Superset-side compilation.
- Trigger preview.
- **Verify**:
- preview moves to `failed` state,
- readable Superset error details are shown,
- launch remains blocked,
- the user can navigate back to the problematic mapping/value.
### Scenario G: Preview Staleness After Input Change
- Successfully generate preview.
- Change an approved mapping or required value.
- **Verify**:
- preview state becomes `stale`,
- launch is blocked until preview is regenerated,
- stale state is visible and not hidden.
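Staleness detection can be sketched by fingerprinting the inputs a preview was compiled from; any later change to mappings or required values invalidates the stored preview. The state labels and snapshot shape are assumptions.

```python
import hashlib
import json

def fingerprint(inputs: dict) -> str:
    """Stable hash of the inputs a preview was compiled from."""
    return hashlib.sha256(
        json.dumps(inputs, sort_keys=True).encode()
    ).hexdigest()

def preview_state(preview: dict, current_inputs: dict) -> str:
    """Report 'stale' whenever current inputs no longer match the preview."""
    if preview.get("fingerprint") != fingerprint(current_inputs):
        return "stale"
    return preview.get("status", "missing")
```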
### Scenario H: SQL Lab Launch Failure
- Simulate or trigger SQL Lab session creation failure.
- **Verify**:
- launch result is marked failed,
- the audit record still preserves attempted run context,
- the session remains recoverable,
- no success redirect is shown.
### Scenario I: Cross-User Access Guard
- Try to open or mutate the first user's session from a second user account (without collaborator access).
- **Verify**:
- access is denied,
- no session state leaks to the second user,
- ownership/permission is enforced on view and mutation paths.
---
## 4. UX Invariants to Validate
- [ ] The primary CTA always reflects the current highest-value next step.
- [ ] The launch button stays blocked if:
- [ ] blocking findings remain,
- [ ] required values are missing,
- [ ] warning-level mappings needing approval are unresolved,
- [ ] preview is missing, failed, or stale.
- [ ] Manual semantic overrides are never silently overwritten.
- [ ] Every important semantic value exposes visible provenance.
- [ ] Clarification shows one focused question at a time.
- [ ] Partial recovery preserves usable value and explains what is missing.
- [ ] Preview explicitly indicates it was compiled by Superset.
- [ ] Session resume restores prior state without forcing re-entry.
---
## 5. Suggested Verification by Milestone
### Milestone 1: Sessioned Auto Review
Validate:
- source intake,
- progressive recovery,
- automatic documentation summary,
- typed findings display,
- semantic source application,
- export endpoints.
### Milestone 2: Guided Clarification
Validate:
- clarification question flow,
- answer persistence,
- resume behavior,
- conflict review,
- field-level manual override/lock behavior.
### Milestone 3: Controlled Execution
Validate:
- mapping review,
- explicit warning approvals,
- Superset-side preview,
- preview staleness handling,
- SQL Lab launch,
- audited run context persistence.
---
## 6. Success-Criteria Measurement Hints
These are not implementation metrics by themselves; they are validation hints for pilot runs.
### For [SC-001](./spec.md)
Track how many submitted datasets produce an initial documentation draft without manual reconstruction.
### For [SC-002](./spec.md)
Measure time from session start to first readable summary visible to the user.
### For [SC-003](./spec.md)
Measure the percentage of semantic fields populated from trusted sources before AI-draft fallback.
### For [SC-005](./spec.md)
Measure the percentage of eligible Superset links that produce a non-empty imported filter set usable for review.
### For [SC-007](./spec.md)
Check that launched sessions always persist:
- dataset identity,
- effective filters,
- template params,
- approved mappings,
- preview reference,
- SQL Lab session reference,
- outcome.
### For [SC-008](./spec.md)
Run moderated first-attempt sessions and record whether users complete import → review → clarification (if needed) → preview → launch without facilitator intervention.
---
## 7. Completion Checklist
A Phase 1 design is operationally validated when all are true:
- [ ] Happy-path session can be started and completed.
- [ ] Partial recovery behaves as explicit partial recovery, not silent failure.
- [ ] Clarification is resumable.
- [ ] Semantic conflict review is explicit.
- [ ] Field-level override lock works.
- [ ] Preview is Superset-generated and becomes stale after input mutation.
- [ ] Launch targets SQL Lab only.
- [ ] Export outputs are available.
- [ ] Ownership and guard rails are enforced.