# Research: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)

**Branch**: `027-dataset-llm-orchestration`

**Date**: 2026-03-16
## Scope

This document resolves the Phase 0 technical unknowns identified in [`specs/027-dataset-llm-orchestration/plan.md`](./plan.md) and establishes implementation decisions for:

- Superset link parsing and context recovery
- runtime variable and Jinja discovery
- compiled SQL preview strategy
- SQL Lab launch handoff
- semantic source resolution architecture
- clarification session architecture
- typed API scope
- persistence, testing, and rollout strategy

---

## 1. Superset Link Intake and Native Filter Recovery

### Decision

Use a dedicated backend adapter, `SupersetContextExtractor`, to parse supported Superset URLs and recover context in two layers:

1. **URL-level recovery** from dashboard, dataset, and query parameters in the provided link
2. **Superset API enrichment** using the resolved dashboard/dataset identifiers to recover saved filter context and related dataset metadata

The recovered filter set will be stored as an `ImportedFilterSet` with provenance per value:

- `superset_native`
- `superset_url`
- `manual`
- `inferred`

If only partial filter recovery is possible, the session enters `recovery_required` rather than failing closed.
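
The provenance model above can be sketched as follows. This is an illustrative shape only, not the final data model; names mirror the values listed above, and the `recovery_required` rule shown here (any inferred value keeps the session out of the ready state) is one plausible interpretation of the partial-recovery behavior:

```python
from dataclasses import dataclass, field
from enum import Enum


class FilterProvenance(str, Enum):
    """Where a recovered filter value came from."""
    SUPERSET_NATIVE = "superset_native"
    SUPERSET_URL = "superset_url"
    MANUAL = "manual"
    INFERRED = "inferred"


@dataclass
class ImportedFilter:
    column: str
    values: list
    provenance: FilterProvenance


@dataclass
class ImportedFilterSet:
    filters: list = field(default_factory=list)

    @property
    def recovery_required(self) -> bool:
        # Inferred values are the extractor's best guesses, so any of
        # them keeps the session in recovery mode until confirmed.
        return any(f.provenance is FilterProvenance.INFERRED for f in self.filters)
```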

### Rationale

The UX requires recovery to feel progressive rather than brittle. A pure URL parser would be too fragile and version-specific. A pure API lookup would miss context encoded in the link itself. A two-layer strategy preserves value even when one source is incomplete.

This also matches the spec’s requirement that partial recovery remain usable and explicitly explained.

### Alternatives considered

- **Parse only URL query parameters**
  Rejected because it is too dependent on frontend link shape and misses server-side saved context.
- **Resolve everything only through the dashboard API after identifying the dashboard**
  Rejected because some context lives directly in the URL and should be captured before API reconciliation.
- **Block session start if any filter cannot be recovered**
  Rejected because it violates the UX principle of progressive value and FR-025.

---

## 2. Runtime Template Variable and Jinja Discovery

### Decision

Detect runtime execution variables using a hybrid discovery pipeline:

1. Load dataset detail from Superset via the existing [`backend/src/core/superset_client.py`](backend/src/core/superset_client.py)
2. Extract `sql`, `is_sqllab_view`, metric expressions, and other query-bearing fields where available
3. Apply backend parsing rules for:
   - `{{ ... }}`
   - `filter_values('...')`
   - common template parameter references
   - Jinja variables in SQL text and metric expressions
4. Normalize discovered variables into typed `TemplateVariable` entries with:
   - variable name
   - source expression
   - inferred kind (`native_filter`, `parameter`, `derived`, `unknown`)
   - required/optional status
   - mapping status

The parser will not execute Jinja locally. It only discovers variables and structure.
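
The deterministic discovery rules can be sketched with simple patterns. This is a deliberately simplified illustration of steps 3–4 (a real parser would need to cover more Superset template constructs and metric-expression fields); the classification heuristic shown is an assumption, not the final rule set:

```python
import re
from dataclasses import dataclass

# Jinja expression blocks like {{ filter_values('region') }} or {{ start_date }}
JINJA_EXPR = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)
# filter_values('...') calls, capturing the referenced column name
FILTER_VALUES = re.compile(r"filter_values\(\s*'([^']+)'\s*\)")


@dataclass
class TemplateVariable:
    name: str
    source_expression: str
    kind: str  # native_filter | parameter | derived | unknown


def discover_variables(sql: str) -> list:
    """Discover template variables without executing any Jinja."""
    found = []
    for match in JINJA_EXPR.finditer(sql):
        expr = match.group(1).strip()
        col = FILTER_VALUES.search(expr)
        if col:
            found.append(TemplateVariable(col.group(1), expr, "native_filter"))
        elif expr.isidentifier():
            found.append(TemplateVariable(expr, expr, "parameter"))
        else:
            found.append(TemplateVariable(expr, expr, "unknown"))
    return found
```

Because discovery is pure text analysis, it stays safe locally while Superset remains the only place where templates are actually rendered.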

### Rationale

The UX and specification both require execution transparency, but also require Superset to remain the source of truth for final SQL compilation. Discovery is safe locally; execution is not. Separating variable discovery from compilation preserves WYSIWWR and avoids local behavior drift.

### Alternatives considered

- **Use only string regex over SQL text**
  Rejected because, while simple, it is insufficient for richer query-bearing payloads such as metric expressions.
- **Execute local Jinja rendering to discover required params**
  Rejected because it conflicts with the UX principle that only Superset defines final execution.
- **Rely on the LLM to infer variables from SQL text**
  Rejected because variable discovery must be deterministic and auditable.

---

## 3. Compiled SQL Preview Strategy

### Decision

The compiled preview will be generated only through a dedicated Superset-side compilation call path managed by `SupersetCompilationAdapter`. The adapter will:

1. receive the candidate run context,
2. submit it to a Superset-compatible preview/compilation flow,
3. store the exact returned compiled SQL plus metadata,
4. mark preview state as `ready`, `pending`, `failed`, or `stale`.

Launch remains blocked unless preview state is `ready` and bound to the current mapping/version snapshot.
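
The launch gate implied by this rule is small enough to state directly. A minimal sketch, assuming snapshot binding is tracked by an identifier (the names are illustrative, not the adapter's real interface):

```python
from enum import Enum


class PreviewState(str, Enum):
    PENDING = "pending"
    READY = "ready"
    FAILED = "failed"
    STALE = "stale"


def can_launch(state: PreviewState, preview_snapshot_id: str, current_snapshot_id: str) -> bool:
    """Launch gate: the preview must be ready AND bound to the current
    mapping/version snapshot; any later mapping edit changes the snapshot
    id and thereby invalidates the preview."""
    return state is PreviewState.READY and preview_snapshot_id == current_snapshot_id
```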

#### Superset Version Compatibility Matrix

The system will explicitly check `SupersetClient.get_version()` during intake to determine available features:

- **v3.x+**: Supports all features (context recovery, native filter import, SQL compilation preview, SQL Lab launch).
- **v2.x**: Supports best-effort context recovery; SQL compilation preview may require fallback to Manual Launch.
- **Legacy/Unknown**: Automatic recovery disabled; session enters `recovery_required` mode immediately.

#### Fallback Strategy (Preview Availability < 100%)

If Superset-side compilation is unavailable (e.g., version mismatch or unsupported dataset type):

- The system will allow **Manual Launch Approval** by a user with the `dataset:session:approve` permission.
- The UI will explicitly display: "Native Preview Unavailable - Manual Validation Required."
- The audit record will mark the run as "Launched without Compiled Preview."

### Rationale

This is a direct requirement from FR-029 and the UX reference’s WYSIWWR principle. The compiled preview must come from the same source of truth as the real execution path. Even a “close enough” local approximation would undermine trust and violate the spec.

### Alternatives considered

- **Local best-effort Jinja compilation with a warning banner**
  Rejected because the spec explicitly forbids fallback to unverified local approximation.
- **No preview, only launch-time validation**
  Rejected because it breaks both the UX and explicit launch gate requirements.
- **Preview generated by the LLM**
  Rejected because the LLM must not write or edit SQL directly.

---

## 4. SQL Lab Launch as Canonical Audited Target

### Decision

Treat Superset SQL Lab as the primary canonical launch target for this feature. `DatasetReviewOrchestrator.launch_dataset` will:

1. verify preview readiness,
2. verify required-value completeness,
3. verify warning approvals,
4. create or bind a SQL Lab execution session in Superset,
5. persist a `DatasetRunContext` snapshot with session reference, approved mappings, semantic decisions, and preview fingerprint.
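
The five steps above can be sketched as an ordered gate sequence. This is a hypothetical shape for illustration only — the session fields, the exception name, and the dict-based session are assumptions, and the actual Superset call is omitted:

```python
class LaunchBlocked(Exception):
    """Raised when any launch gate fails; the session stays editable."""


def launch_dataset(session: dict) -> dict:
    # Gates 1-3 are checked in order; the first failure blocks launch.
    if session.get("preview_state") != "ready":
        raise LaunchBlocked("compiled preview is not ready")
    if session.get("missing_required_values"):
        raise LaunchBlocked("required run-time values are incomplete")
    if session.get("unapproved_warnings"):
        raise LaunchBlocked("warnings require explicit approval")
    # Gates 4-5: bind a SQL Lab session and persist the audited snapshot
    # (the Superset call itself is omitted in this sketch).
    return {
        "sql_lab_session": session.get("sql_lab_ref", "pending"),
        "run_context": "persisted",
    }
```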

#### Fallback path: Export Prepared Context

If SQL Lab session creation is unavailable:

- Provide an **Export Prepared Run Context (JSON/SQL)** action.
- This allows users to manually take the compiled SQL (if available) or the effective parameter payload to Superset.
- The system still records the `DatasetRunContext` as "Exported for Manual Launch" to preserve audit integrity.

### Rationale

The specification explicitly names SQL Lab as canonical. The system must preserve auditability and replay, not merely trigger execution. Persisting the SQL Lab reference and run context as a single audited boundary supports both operational trust and later reopening.

### Alternatives considered

- **Run the query outside SQL Lab through another Superset endpoint**
  Rejected because it violates the canonical target clarified in the spec.
- **Redirect the user to a generic analytical view with params**
  Rejected because it is weaker for audit and replay.
- **Allow fallback execution when SQL Lab creation fails**
  Rejected because the feature’s safety model depends on the audited target.

---

## 5. Semantic Enrichment Architecture

### Decision

Split semantic enrichment into a dedicated `SemanticSourceResolver` domain module instead of embedding all logic inside `DatasetReviewOrchestrator`.

Planned responsibilities:

- `resolve_from_file`
- `resolve_from_dictionary`
- `resolve_from_reference_dataset`
- `rank_candidates`
- `detect_conflicts`
- `apply_field_decision`

Candidate ranking will follow the required confidence hierarchy:

1. exact dictionary/file match
2. trusted reference dataset match
3. fuzzy semantic match
4. AI-generated draft

Manual overrides are represented as a lockable terminal provenance state and cannot be overwritten implicitly.
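
The key property of the hierarchy is that tier always dominates score: an AI draft can never outrank an exact dictionary match, no matter how confident the model is. A minimal `rank_candidates` sketch under that assumption (tier names and the candidate dict shape are illustrative):

```python
# Confidence tiers mirror the required hierarchy; a lower rank wins.
TIER_RANK = {
    "exact_dictionary": 0,
    "trusted_reference": 1,
    "fuzzy_match": 2,
    "ai_draft": 3,
}


def rank_candidates(candidates: list) -> list:
    """Order candidates by tier first, then by descending score within a tier."""
    return sorted(
        candidates,
        key=lambda c: (TIER_RANK[c["tier"]], -c.get("score", 0.0)),
    )
```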

### Rationale

The feature’s semantic logic is too substantial to remain an untyped orchestrator subroutine. The reviewer feedback correctly identified this as a likely god-object risk. A dedicated resolver improves testability, contract clarity, and future extensibility.

### Alternatives considered

- **Keep enrichment inside the orchestrator only**
  Rejected because it centralizes too many responsibilities in a single Complexity 5 module.
- **Push matching into the frontend only**
  Rejected because provenance, confidence, and audit decisions must be persisted server-side.
- **Use only AI generation when dictionaries are absent**
  Rejected because “reuse before invention” is a core UX principle.

---

## 6. Clarification Session Architecture

### Decision

Create a dedicated `ClarificationEngine` service with persistent `ClarificationSession` state. The engine will manage:

- unresolved-item prioritization,
- one-question-at-a-time question generation,
- best-guess suggestion packaging,
- answer recording,
- progress summarization,
- resume-from-last-open-item behavior.

Answers are persisted before session state advancement.
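
Because answers are persisted before the session advances, resume falls out of the selection rule for free: re-running the same prioritization over whatever is still unresolved yields the next open item. A sketch of that selection (the item/priority shape is an assumption for illustration):

```python
def next_question(session: dict):
    """Return the single highest-priority unresolved item, or None when done.

    Resume-from-last-open-item is just this function re-applied: answered
    items are already persisted, so only unresolved ones remain eligible.
    """
    open_items = [i for i in session["items"] if i["status"] == "unresolved"]
    if not open_items:
        return None
    # A lower priority value means more urgent (e.g. blocking conflicts first).
    return min(open_items, key=lambda i: i["priority"])
```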

#### LLM Quality & Feedback Mechanism

To maintain high trust in AI-generated summaries, match recommendations, and questions:

- **User Feedback**: The UI will include inline feedback (👍/👎) for all AI-drafted content.
- **Audit Logging**: Feedback is persisted in `SemanticFieldDecision` and `ClarificationAnswer` records for prompt engineering improvement.
- **Confidence Triage**: Low-confidence LLM outputs (score < 0.7) will be automatically marked for "expert review" rather than presented as confirmed suggestions.

#### Question Generation Strategy

Question templates and logic will be developed in Milestone 2:

- **Evaluation Criteria**: Questions will be judged on relevancy, business-impact explanation, and actionability of suggestions.
- **Templates**: Standardized prompts for field ambiguity, filter conflict, and missing run-time values.
- **Human-in-the-loop**: High-risk semantic conflicts will prioritize "expert review" options.

### Rationale

Guided clarification is not a simple endpoint action; it is a stateful interaction model with resumability and audit expectations. Treating it as a first-class module aligns with FR-012 to FR-020 and supports deterministic behavior rather than ad hoc agent chat state.

### Alternatives considered

- **Use assistant conversation history as the only clarification state**
  Rejected because feature state and conversational transcript are not the same thing.
- **Generate all questions at once**
  Rejected because the UX explicitly requires one question at a time.
- **Do not persist skipped questions separately**
  Rejected because resume and expert-review flows need explicit unresolved-state semantics.

---

## 7. API Contract Granularity

### Decision

Expand the API beyond the current session/preview/launch skeleton into a typed session contract set with explicit lifecycle and field-level operations.

Required endpoint groups:

1. **Session lifecycle**
   - `POST /sessions`
   - `GET /sessions`
   - `GET /sessions/{id}`
   - `PATCH /sessions/{id}`
   - `DELETE /sessions/{id}`

2. **Semantic source and field-level semantic actions**
   - `POST /sessions/{id}/semantic-source`
   - `PATCH /sessions/{id}/fields/{field_id}/semantic`
   - `POST /sessions/{id}/fields/{field_id}/lock`
   - `POST /sessions/{id}/fields/{field_id}/unlock`

3. **Clarification**
   - `GET /sessions/{id}/clarification`
   - `POST /sessions/{id}/clarification/answers`
   - `POST /sessions/{id}/clarification/resume`

4. **Mapping review**
   - `GET /sessions/{id}/mappings`
   - `PATCH /sessions/{id}/mappings/{mapping_id}`
   - `POST /sessions/{id}/mappings/{mapping_id}/approve`

5. **Preview and launch**
   - `POST /sessions/{id}/preview`
   - `POST /sessions/{id}/launch`

6. **Exports**
   - `GET /sessions/{id}/exports/documentation`
   - `GET /sessions/{id}/exports/validation`

Core schemas must be fully typed, especially:

- `ValidationFinding`
- `ImportedFilter`
- `TemplateVariable`
- `MappingDecision`
- `SemanticFieldEntry`
- `ClarificationQuestion`
- `DatasetRunContextSummary`
- `CompiledPreview`
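
To make "fully typed" concrete, here is a sketch of two of the listed schemas using plain dataclasses. The field names are illustrative assumptions — the authoritative definitions belong in `api.yaml` and the backend schema models, not here:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PreviewState(str, Enum):
    PENDING = "pending"
    READY = "ready"
    FAILED = "failed"
    STALE = "stale"


@dataclass(frozen=True)
class CompiledPreview:
    state: PreviewState
    compiled_sql: Optional[str]  # exact SQL returned by Superset, never a local render
    snapshot_id: str             # the mapping/version snapshot the preview is bound to


@dataclass(frozen=True)
class TemplateVariable:
    name: str
    source_expression: str
    kind: str                    # native_filter | parameter | derived | unknown
    required: bool
    mapped: bool
```

Frozen dataclasses make contract objects immutable, which keeps preview/mapping snapshots trustworthy once recorded.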

### Rationale

The existing API draft is structurally correct but too loose for parallel backend/frontend work. Typed schemas are a blocker for safe implementation, especially given mapping approvals, field-level overrides, and resumable state.

### Alternatives considered

- **Minimal object blobs in `SessionDetail`**
  Rejected because they prevent stable frontend typing and contract validation.
- **Expose only session-level mutations**
  Rejected because FR-040 and the mapping approval UX require field-level actions.
- **Defer exports until later**
  Rejected because FR-035 requires exportable outputs as part of the feature.

---

## 8. Persistence Model

### Decision

Persist the dataset review workflow as first-class application entities rather than storing everything in ad hoc task payloads.

Persistence boundaries will include:

- `DatasetReviewSession`
- `DatasetProfileSnapshot`
- `ValidationFindingRecord`
- `SemanticFieldDecision`
- `ImportedFilterRecord`
- `TemplateVariableRecord`
- `MappingApprovalRecord`
- `ClarificationSessionRecord`
- `ClarificationAnswerRecord`
- `CompiledPreviewRecord`
- `DatasetRunContextRecord`

Long-running task execution remains in the existing task system, but tasks reference session identifiers rather than replacing session state.
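
The reference direction matters: the session is the canonical aggregate, and a task only points at it. A minimal sketch of that relationship (the `Task` shape is a hypothetical stand-in for the existing task system, not its real interface):

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetReviewSession:
    """Canonical, editable session aggregate, persisted as its own entity."""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "draft"


@dataclass
class Task:
    """Execution record only: it references the session, it does not own it."""
    kind: str
    session_id: str  # a reference, never embedded session state
    status: str = "queued"


session = DatasetReviewSession()
recovery_task = Task(kind="context_recovery", session_id=session.id)
```

With this direction, a task can fail or be retried without touching the session aggregate, and a session can be reopened long after its tasks are gone.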

### Rationale

The current task system is good for observability and async work, but not sufficient as the canonical store for resumable feature state. The feature requires reopening, editing, exporting, and replaying a session independently of the task lifecycle.

### Alternatives considered

- **Store all workflow state inside task params/results**
  Rejected because tasks are execution records, not editable session aggregates.
- **Use filesystem-only session snapshots**
  Rejected because this weakens structured querying and RBAC alignment.
- **Persist only the final run context**
  Rejected because clarification and review resume need mid-flow state.

---

## 9. Async and Observability Strategy

### Decision

Use `TaskManager` for expensive or multi-step operations:

- initial deep context recovery,
- semantic enrichment imports over large sources,
- preview generation when upstream latency is non-trivial,
- launch handoff if it includes multi-step Superset orchestration.

Short synchronous operations may still exist for simple field edits and approvals, but session state transitions must remain observable and reflected in the session model.

### Rationale

This aligns with the repository constitution and the existing operational model. It also supports progressive UX milestones and live readiness changes.

### Alternatives considered

- **Make all session operations synchronous**
  Rejected because upstream Superset and enrichment work may exceed acceptable UI latency.
- **Use frontend polling only, with no task integration**
  Rejected because the repository already has stronger task observability patterns.

---

## 10. Frontend Integration Pattern

### Decision

Implement the feature on top of the existing [`frontend/src/lib/api.js`](frontend/src/lib/api.js) wrapper conventions and Svelte 5 rune-based patterns. Add dedicated wrapper functions under the frontend API layer for dataset orchestration rather than embedding endpoint strings inside components.

The main route will be a dedicated review workspace separate from the current [`frontend/src/routes/datasets/+page.svelte`](frontend/src/routes/datasets/+page.svelte), which remains the dataset hub/list page.

### Rationale

The current dataset page already serves as a listing and bulk-operation view. The new orchestration flow is a multi-state workspace and should not overload the existing hub component. Using wrapper functions preserves constitution compliance and testability.

### Alternatives considered

- **Extend the existing dataset hub page for the whole flow**
  Rejected because it would overload a currently unrelated list page with a large state machine.
- **Call endpoints directly with native `fetch` in new components**
  Rejected by the constitution and existing frontend architecture.
- **Build the whole flow inside assistant chat only**
  Rejected because the UX reference defines a structured workspace, not chat-only interaction.

---

## 11. Testing Strategy

### Decision

Use a three-layer testing strategy:

#### Backend

- **Unit tests**
  - semantic ranking/conflict rules,
  - clarification prioritization,
  - mapping approval guards,
  - preview stale/ready rules,
  - launch blocking conditions.

- **Integration tests**
  - session persistence,
  - Superset context extraction with fixtures,
  - compiled preview orchestration,
  - SQL Lab launch handoff,
  - export generation.

- **API contract tests**
  - typed schema responses,
  - RBAC guards,
  - session lifecycle endpoints,
  - field-level semantic operations,
  - mapping approval endpoints.

#### Frontend

- **Component tests**
  - source intake states,
  - validation triage,
  - semantic conflict view,
  - clarification dialog,
  - mapping review,
  - compiled preview stale/error behavior,
  - launch summary gating.

- **Route integration tests**
  - progressive loading,
  - save/resume flows,
  - recovery-required state,
  - warning approval requirements.

#### External dependency handling

- mock Superset endpoints with version-shaped fixtures,
- mock LLM structured outputs deterministically,
- verify degraded paths explicitly instead of pretending success.

### Rationale

The feature blends deterministic parsing, stateful orchestration, and external-service dependencies. It needs more than happy-path smoke tests.

### Alternatives considered

- **Rely on the manual quickstart only**
  Rejected because the feature has too many state transitions and edge cases.
- **Mock everything at the unit level only**
  Rejected because contract drift with Superset and frontend schemas would go undetected.
- **Run live LLM/Superset tests in CI**
  Rejected because determinism and cost would be poor.

---

## 12. Incremental Delivery Strategy

### Decision

Implement in three milestones:

### Milestone 1 — Sessioned Auto Review

Includes:

- source intake,
- session persistence,
- profile/finding generation,
- semantic-source application,
- exportable documentation/validation outputs.

### Milestone 2 — Guided Clarification

Includes:

- clarification engine,
- resumable dialogue state,
- conflict review,
- field-level semantic overrides/locks.

### Milestone 3 — Controlled Execution

Includes:

- imported filter recovery,
- template-variable mapping,
- mapping approval workflow,
- compiled SQL preview,
- SQL Lab launch,
- audited run context.

### Rationale

This delivers user value earlier and reduces risk by separating understanding from execution. It also directly addresses the reviewer feedback that the feature should not be treated as a monolithic drop.

### Alternatives considered

- **Single all-at-once delivery**
  Rejected due to risk concentration and hard-to-demonstrate progress.
- **Execution-first milestone**
  Rejected because trust, semantics, and readiness are prerequisites for safe launch.

---

## 13. Risk Resolution Outcomes

### Decision

Treat the following as non-negotiable gates:

- no Superset-side preview support → stop execution-scope planning
- no reliable SQL Lab session handoff → block launch design
- no typed API schemas → block parallel frontend/backend implementation
- no explicit semantic/clarification modules → block Phase 1 contract completion

### Rationale

These are architecture-shaping constraints, not minor implementation details.

### Alternatives considered

- **Continue with placeholders and fill gaps during coding**
  Rejected because the feature is too stateful and contract-heavy for speculative implementation.

---

## 14. Review Feedback Incorporated

### Decision

The next design phase must explicitly address the previously identified review gaps by:

- adding `SemanticSourceResolver`,
- adding `ClarificationEngine`,
- expanding `SupersetCompilationAdapter` into a broader extraction + compilation boundary, or adding `SupersetContextExtractor`,
- fully typing `api.yaml`,
- adding mapping and field-level endpoints,
- adding session lifecycle endpoints,
- adding error-path coverage to `quickstart.md`,
- preserving delivery milestones in the plan.

### Rationale

These gaps are valid and materially affect implementation readiness. Folding them into Phase 1 keeps the planning workflow honest and actionable.

### Alternatives considered

- **Ignore review findings until code review**
  Rejected because several issues are architectural, not cosmetic.

---

## Final Phase 0 Result

All identified `NEEDS CLARIFICATION` items from the plan are resolved at the planning level.

The implementation should proceed into Phase 1 with:

- an expanded typed data model,
- expanded contracts,
- an expanded API surface,
- negative-path quickstart scenarios,
- a post-design constitution re-check.