# Research: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`
**Date**: 2026-03-16

## Scope

This document resolves the Phase 0 technical unknowns identified in [`specs/027-dataset-llm-orchestration/plan.md`](./plan.md) and establishes implementation decisions for:

- Superset link parsing and context recovery
- runtime variable and Jinja discovery
- compiled SQL preview strategy
- SQL Lab launch handoff
- semantic source resolution architecture
- clarification session architecture
- typed API scope
- persistence, testing, and rollout strategy

---
## 1. Superset Link Intake and Native Filter Recovery

### Decision

Use a dedicated backend adapter, `SupersetContextExtractor`, to parse supported Superset URLs and recover context in two layers:

1. **URL-level recovery** from dashboard, dataset, and query parameters in the provided link
2. **Superset API enrichment** using the resolved dashboard/dataset identifiers to recover saved filter context and related dataset metadata

The recovered filter set will be stored as an `ImportedFilterSet` with provenance per value:

- `superset_native`
- `superset_url`
- `manual`
- `inferred`

If only partial filter recovery is possible, the session enters `recovery_required` rather than failing closed.
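
The URL-level layer can be sketched as follows. This is an illustrative sketch only; `ImportedFilter` and `recover_from_url` are hypothetical names, and real Superset links vary by version, so the production extractor must handle more shapes than plain query parameters.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass
class ImportedFilter:
    name: str
    value: str
    provenance: str  # superset_native | superset_url | manual | inferred


def recover_from_url(url: str) -> list[ImportedFilter]:
    """Layer 1: capture whatever context the link itself encodes."""
    query = parse_qs(urlparse(url).query)
    filters: list[ImportedFilter] = []
    for key, values in query.items():
        for value in values:
            # Values recovered here carry `superset_url` provenance; the API
            # enrichment layer would later add `superset_native` entries.
            filters.append(ImportedFilter(name=key, value=value, provenance="superset_url"))
    return filters


filters = recover_from_url(
    "https://superset.example/superset/dashboard/42/?region=emea&year=2025"
)
```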

### Rationale

The UX requires recovery to feel progressive rather than brittle. A pure URL parser would be too fragile and version-specific. A pure API lookup would miss context encoded in the link itself. A two-layer strategy preserves value even when one source is incomplete.

This also matches the spec’s requirement that partial recovery remain usable and explicitly explained.

### Alternatives considered

- **Parse only URL query parameters**
  Rejected because it is too dependent on frontend link shape and misses server-side saved context.
- **Resolve everything only through dashboard API after identifying the dashboard**
  Rejected because some context lives directly in the URL and should be captured before API reconciliation.
- **Block session start if any filter cannot be recovered**
  Rejected because it violates the UX principle of progressive value and FR-025.

---
## 2. Runtime Template Variable and Jinja Discovery

### Decision

Detect runtime execution variables using a hybrid discovery pipeline:

1. Load dataset detail from Superset via existing [`backend/src/core/superset_client.py`](backend/src/core/superset_client.py)
2. Extract `sql`, `is_sqllab_view`, metric expressions, and other query-bearing fields where available
3. Apply backend parsing rules for:
   - `{{ ... }}`
   - `filter_values('...')`
   - common template parameter references
   - Jinja variables in SQL text and metric expressions
4. Normalize discovered variables into typed `TemplateVariable` entries with:
   - variable name
   - source expression
   - inferred kind (`native_filter`, `parameter`, `derived`, `unknown`)
   - required/optional status
   - mapping status

The parser will not execute Jinja locally. It only discovers variables and structure.
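
A minimal sketch of the discovery rules in step 3, assuming regex-level parsing is acceptable for the first two patterns. The regexes and `discover_variables` name are illustrative; the real parser covers more shapes, and note that nothing here renders Jinja.

```python
import re
from dataclasses import dataclass


@dataclass
class TemplateVariable:
    name: str
    source_expression: str
    kind: str  # native_filter | parameter | derived | unknown


# Plain {{ var }} references; function calls such as filter_values(...) are
# deliberately excluded here and handled by their own rule.
JINJA_VAR = re.compile(r"\{\{\s*([a-zA-Z_][\w.]*)\s*\}\}")
FILTER_VALUES = re.compile(r"filter_values\(\s*'([^']+)'\s*\)")


def discover_variables(sql: str) -> list[TemplateVariable]:
    """Discover variables deterministically; never execute the template."""
    found: dict[str, TemplateVariable] = {}
    for match in FILTER_VALUES.finditer(sql):
        found[match.group(1)] = TemplateVariable(match.group(1), match.group(0), "native_filter")
    for match in JINJA_VAR.finditer(sql):
        found.setdefault(match.group(1), TemplateVariable(match.group(1), match.group(0), "unknown"))
    return list(found.values())


variables = discover_variables(
    "SELECT * FROM sales WHERE region IN ({{ filter_values('region') }}) AND year = {{ year }}"
)
```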

### Rationale

The UX and specification both require execution transparency, but also require Superset to remain the source of truth for final SQL compilation. Discovery is safe locally; execution is not. Separating variable discovery from compilation preserves WYSIWWR and avoids local behavior drift.

### Alternatives considered

- **Use only string regex over SQL text**
  Rejected because it is simple but insufficient for richer query-bearing payloads.
- **Execute local Jinja rendering to discover required params**
  Rejected because it conflicts with the UX principle that only Superset defines final execution.
- **Rely on LLM to infer variables from SQL text**
  Rejected because variable discovery must be deterministic and auditable.

---
## 3. Compiled SQL Preview Strategy

### Decision

Compiled preview will be generated only through a dedicated Superset-side compilation call path managed by `SupersetCompilationAdapter`. The adapter will:

1. receive the candidate run context,
2. submit it to a Superset-compatible preview/compilation flow,
3. store the exact returned compiled SQL plus metadata,
4. mark preview state as `ready`, `pending`, `failed`, or `stale`.

Launch remains blocked unless preview state is `ready` and bound to the current mapping/version snapshot.
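
The launch gate above can be sketched as a fingerprint check. The state names come from the text; the fingerprint mechanics (hashing the mapping version plus effective values) are an assumption about how "bound to the current snapshot" might be enforced.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CompiledPreview:
    state: str                # ready | pending | failed | stale
    snapshot_fingerprint: str


def snapshot_fingerprint(mapping_version: str, values: dict) -> str:
    """Deterministic digest of the mapping/version snapshot the preview was built from."""
    payload = mapping_version + "|" + repr(sorted(values.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


def launch_allowed(preview: CompiledPreview, mapping_version: str, values: dict) -> bool:
    # A preview unblocks launch only if it is ready AND still matches the
    # current mapping snapshot; any edit after compilation makes it stale.
    return (
        preview.state == "ready"
        and preview.snapshot_fingerprint == snapshot_fingerprint(mapping_version, values)
    )


fp = snapshot_fingerprint("v7", {"region": "emea"})
preview = CompiledPreview(state="ready", snapshot_fingerprint=fp)
```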

#### Superset Version Compatibility Matrix

The system will explicitly check `SupersetClient.get_version()` during intake to determine available features:

- **v3.x+**: Supports all features (context recovery, native filter import, SQL compilation preview, SQL Lab launch).
- **v2.x**: Supports best-effort context recovery; SQL compilation preview may require fallback to Manual Launch.
- **Legacy/Unknown**: Automatic recovery disabled; session enters `recovery_required` mode immediately.

#### Fallback Strategy (Preview Availability < 100%)

If Superset-side compilation is unavailable (e.g., version mismatch or unsupported dataset type):

- The system will allow **Manual Launch Approval** by a user with `dataset:session:approve` permission.
- The UI will explicitly display: "Native Preview Unavailable - Manual Validation Required."
- The audit record will mark the run as "Launched without Compiled Preview."

### Rationale

This is a direct requirement from FR-029 and the UX reference’s WYSIWWR principle. The compiled preview must be the same source of truth as the real execution path. Even a “close enough” local approximation would undermine trust and violate the spec.

### Alternatives considered

- **Local best-effort Jinja compilation with warning banner**
  Rejected because the spec explicitly forbids fallback to unverified local approximation.
- **No preview, only launch-time validation**
  Rejected because it breaks both UX and explicit launch gate requirements.
- **Preview generated by the LLM**
  Rejected because the LLM must not write or edit SQL directly.

---
## 4. SQL Lab Launch as Canonical Audited Target

### Decision

Treat Superset SQL Lab as the primary canonical launch target for this feature. `DatasetReviewOrchestrator.launch_dataset` will:

1. verify preview readiness,
2. verify required-value completeness,
3. verify warning approvals,
4. create or bind a SQL Lab execution session in Superset,
5. persist a `DatasetRunContext` snapshot with session reference, approved mappings, semantic decisions, and preview fingerprint.
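
Steps 1–3 above are pure precondition checks and can be sketched as ordered guard clauses. The session shape and problem codes here are illustrative placeholders, not the real `DatasetReviewOrchestrator` API.

```python
def check_launch_preconditions(session: dict) -> list[str]:
    """Return blocking problems in check order; an empty list means steps 4–5 may run."""
    problems = []
    if session.get("preview_state") != "ready":                       # step 1
        problems.append("preview_not_ready")
    missing = [k for k, v in session.get("required_values", {}).items() if v is None]
    if missing:                                                       # step 2
        problems.append("missing_required_values")
    if any(not w["approved"] for w in session.get("warnings", [])):   # step 3
        problems.append("unapproved_warnings")
    return problems


ready_session = {
    "preview_state": "ready",
    "required_values": {"region": "emea"},
    "warnings": [{"approved": True}],
}
blocked_session = {
    "preview_state": "stale",
    "required_values": {"region": None},
    "warnings": [{"approved": False}],
}
```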

#### Fallback path: Export Prepared Context

If SQL Lab session creation is unavailable:

- Provide an **Export Prepared Run Context (JSON/SQL)** action.
- This allows users to manually take the compiled SQL (if available) or the effective parameter payload to Superset.
- The system still records the `DatasetRunContext` as "Exported for Manual Launch" to preserve audit integrity.

### Rationale

The specification explicitly names SQL Lab as canonical. The system must preserve auditability and replay, not merely trigger execution. Persisting the SQL Lab reference and run context as a single audited boundary supports both operational trust and later reopening.

### Alternatives considered

- **Run query outside SQL Lab through another Superset endpoint**
  Rejected because it violates the canonical target clarified in the spec.
- **Redirect user to a generic analytical view with params**
  Rejected because it is weaker for audit and replay.
- **Allow fallback execution when SQL Lab creation fails**
  Rejected because the feature’s safety model depends on the audited target.

---
## 5. Semantic Enrichment Architecture

### Decision

Split semantic enrichment into a dedicated `SemanticSourceResolver` domain module instead of embedding all logic inside `DatasetReviewOrchestrator`.

Planned responsibilities:

- `resolve_from_file`
- `resolve_from_dictionary`
- `resolve_from_reference_dataset`
- `rank_candidates`
- `detect_conflicts`
- `apply_field_decision`

Candidate ranking will follow the required confidence hierarchy:

1. exact dictionary/file match
2. trusted reference dataset match
3. fuzzy semantic match
4. AI-generated draft

Manual overrides are represented as a lockable terminal provenance state and cannot be overwritten implicitly.
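
The confidence hierarchy can be sketched as a fixed precedence over candidate kinds; the kind names and dict-based candidate shape are assumptions for illustration, not the resolver's real contract.

```python
# Ordered highest-confidence first, mirroring the four tiers above.
CONFIDENCE_ORDER = [
    "exact_dictionary_match",    # 1. exact dictionary/file match
    "reference_dataset_match",   # 2. trusted reference dataset match
    "fuzzy_semantic_match",      # 3. fuzzy semantic match
    "ai_generated_draft",        # 4. AI-generated draft
]


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Sort by tier first, then by per-candidate score within a tier."""
    tier = {name: i for i, name in enumerate(CONFIDENCE_ORDER)}
    return sorted(candidates, key=lambda c: (tier[c["kind"]], -c.get("score", 0.0)))


ranked = rank_candidates([
    {"kind": "ai_generated_draft", "score": 0.99},
    {"kind": "fuzzy_semantic_match", "score": 0.40},
    {"kind": "exact_dictionary_match", "score": 0.60},
])
```

Note that a high-scoring AI draft never outranks a weaker dictionary match, which encodes "reuse before invention" structurally rather than by tuning scores.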

### Rationale

The feature’s semantic logic is too substantial to remain an untyped orchestrator subroutine. The reviewer feedback correctly identified this as a likely god-object risk. A dedicated resolver improves testability, contract clarity, and future extensibility.

### Alternatives considered

- **Keep enrichment inside orchestrator only**
  Rejected because it centralizes too many responsibilities in a single Complexity 5 module.
- **Push matching into frontend only**
  Rejected because provenance, confidence, and audit decisions must be persisted server-side.
- **Use only AI generation when dictionaries are absent**
  Rejected because “reuse before invention” is a core UX principle.

---
## 6. Clarification Session Architecture

### Decision

Create a dedicated `ClarificationEngine` service with persistent `ClarificationSession` state. The engine will manage:

- unresolved-item prioritization,
- one-question-at-a-time question generation,
- best-guess suggestion packaging,
- answer recording,
- progress summarization,
- resume-from-last-open-item behavior.

Answers are persisted before session state advancement.

#### LLM Quality & Feedback Mechanism

To maintain high trust in AI-generated summaries, match recommendations, and questions:

- **User Feedback**: UI will include inline feedback (👍/👎) for all AI-drafted content.
- **Audit Logging**: Feedback is persisted in `SemanticFieldDecision` and `ClarificationAnswer` records for prompt engineering improvement.
- **Confidence Triage**: Low-confidence LLM outputs (score < 0.7) will be automatically marked for "expert review" rather than presented as confirmed suggestions.
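
The confidence triage rule is small enough to state directly. The 0.7 threshold comes from the text above; the `triage` helper and status names are illustrative.

```python
EXPERT_REVIEW_THRESHOLD = 0.7  # below this, output is never shown as confirmed


def triage(suggestion: dict) -> dict:
    """Mark low-confidence LLM output for expert review instead of suggesting it."""
    status = (
        "suggested"
        if suggestion["confidence"] >= EXPERT_REVIEW_THRESHOLD
        else "expert_review"
    )
    return {**suggestion, "status": status}
```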

#### Question Generation Strategy

Question templates and logic will be developed in Milestone 2:

- **Evaluation Criteria**: Questions must be judged on: relevancy, business impact explanation, and actionability of suggestions.
- **Templates**: Standardized prompts for field ambiguity, filter conflict, and missing run-time values.
- **Human-in-the-loop**: High-risk semantic conflicts will prioritize "expert review" options.

### Rationale

Guided clarification is not a simple endpoint action; it is a stateful interaction model with resumability and audit expectations. Treating it as a first-class module aligns with FR-012 to FR-020 and supports deterministic behavior rather than ad hoc agent chat state.

### Alternatives considered

- **Use assistant conversation history as the only clarification state**
  Rejected because feature state and conversational transcript are not the same thing.
- **Generate all questions at once**
  Rejected because the UX explicitly requires one question at a time.
- **Do not persist skipped questions separately**
  Rejected because resume and expert-review flows need explicit unresolved-state semantics.

---
## 7. API Contract Granularity

### Decision

Expand the API beyond the current session/preview/launch skeleton into a typed session contract set with explicit lifecycle and field-level operations.

Required endpoint groups:

1. **Session lifecycle**
   - `POST /sessions`
   - `GET /sessions`
   - `GET /sessions/{id}`
   - `PATCH /sessions/{id}`
   - `DELETE /sessions/{id}`

2. **Semantic source and field-level semantic actions**
   - `POST /sessions/{id}/semantic-source`
   - `PATCH /sessions/{id}/fields/{field_id}/semantic`
   - `POST /sessions/{id}/fields/{field_id}/lock`
   - `POST /sessions/{id}/fields/{field_id}/unlock`

3. **Clarification**
   - `GET /sessions/{id}/clarification`
   - `POST /sessions/{id}/clarification/answers`
   - `POST /sessions/{id}/clarification/resume`

4. **Mapping review**
   - `GET /sessions/{id}/mappings`
   - `PATCH /sessions/{id}/mappings/{mapping_id}`
   - `POST /sessions/{id}/mappings/{mapping_id}/approve`

5. **Preview and launch**
   - `POST /sessions/{id}/preview`
   - `POST /sessions/{id}/launch`

6. **Exports**
   - `GET /sessions/{id}/exports/documentation`
   - `GET /sessions/{id}/exports/validation`

Core schemas must be fully typed, especially:

- `ValidationFinding`
- `ImportedFilter`
- `TemplateVariable`
- `MappingDecision`
- `SemanticFieldEntry`
- `ClarificationQuestion`
- `DatasetRunContextSummary`
- `CompiledPreview`
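
As an illustration of the typing bar these schemas must meet, two of them might look like this in stdlib typing terms. The field names are assumptions; the real shapes belong to `api.yaml` and whatever schema library the backend standardizes on.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class TemplateVariable:
    name: str
    kind: Literal["native_filter", "parameter", "derived", "unknown"]
    required: bool
    mapping_status: Literal["unmapped", "proposed", "approved"]


@dataclass
class CompiledPreview:
    state: Literal["ready", "pending", "failed", "stale"]
    compiled_sql: Optional[str] = None  # None until Superset returns it
    variables: list[TemplateVariable] = field(default_factory=list)


# A pending preview carries no SQL yet; the frontend can type against that.
preview = CompiledPreview(state="pending")
```

Enumerated `Literal` states, rather than free-form strings, are what makes parallel frontend typing and contract validation possible.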

### Rationale

The existing API draft is structurally correct but too loose for backend/frontend parallel work. Typed schemas are a blocker for safe implementation, especially given mapping approvals, field-level overrides, and resumable state.

### Alternatives considered

- **Minimal object blobs in `SessionDetail`**
  Rejected because they prevent stable frontend typing and contract validation.
- **Expose only session-level mutations**
  Rejected because FR-040 and mapping approval UX require field-level actions.
- **Defer exports until later**
  Rejected because FR-035 requires exportable outputs as part of the feature.

---
## 8. Persistence Model

### Decision

Persist dataset review workflow as first-class application entities rather than storing everything in ad hoc task payloads.

Persistence boundaries will include:

- `DatasetReviewSession`
- `DatasetProfileSnapshot`
- `ValidationFindingRecord`
- `SemanticFieldDecision`
- `ImportedFilterRecord`
- `TemplateVariableRecord`
- `MappingApprovalRecord`
- `ClarificationSessionRecord`
- `ClarificationAnswerRecord`
- `CompiledPreviewRecord`
- `DatasetRunContextRecord`

Long-running task execution remains in the existing task system, but tasks reference session identifiers rather than replacing session state.
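
The task-references-session boundary can be sketched as two records. Field names here are illustrative stand-ins for the real persistence models; the point is the direction of the reference.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetReviewSession:
    id: str
    status: str = "draft"  # canonical, editable feature state lives here


@dataclass
class TaskRecord:
    id: str
    kind: str
    session_id: str  # reference only; the task never owns session state
    result: dict = field(default_factory=dict)


session = DatasetReviewSession(id="sess-1")
task = TaskRecord(id="task-1", kind="context_recovery", session_id=session.id)
```

Because the task holds only a foreign reference, the session can be reopened, edited, or exported regardless of what its tasks recorded.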

### Rationale

The current task system is good for observability and async work, but not sufficient as the canonical store for resumable feature state. The feature requires reopening, editing, exporting, and replaying a session independently of task lifecycle.

### Alternatives considered

- **Store all workflow state inside task params/results**
  Rejected because tasks are execution records, not editable session aggregates.
- **Use filesystem-only session snapshots**
  Rejected because this weakens structured querying and RBAC alignment.
- **Persist only final run context**
  Rejected because clarification and review resume need mid-flow state.

---
## 9. Async and Observability Strategy

### Decision

Use `TaskManager` for expensive or multi-step operations:

- initial deep context recovery,
- semantic enrichment imports over large sources,
- preview generation when upstream latency is non-trivial,
- launch handoff if it includes multi-step Superset orchestration.

Short synchronous operations may still exist for simple field edits and approvals, but session state transitions must remain observable and reflected in the session model.
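
A generic asyncio sketch of the pattern, not the repository's actual `TaskManager` API: the expensive work runs as a background task while the session model itself remains the observable source of state transitions.

```python
import asyncio


async def run_preview_generation(session: dict) -> None:
    session["preview_state"] = "pending"  # transition is observable immediately
    await asyncio.sleep(0)                # stands in for the slow Superset round trip
    session["preview_state"] = "ready"


async def main() -> dict:
    session = {"preview_state": "stale"}
    # In the real system this would be a TaskManager submission; readers of the
    # session model see "pending" while the work is in flight.
    await asyncio.create_task(run_preview_generation(session))
    return session


session = asyncio.run(main())
```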

### Rationale

This aligns with the repository constitution and existing operational model. It also supports progressive UX milestones and live readiness changes.

### Alternatives considered

- **Make all session operations synchronous**
  Rejected because upstream Superset and enrichment work may exceed acceptable UI latency.
- **Use frontend polling only with no task integration**
  Rejected because the repository already has stronger task observability patterns.

---
## 10. Frontend Integration Pattern

### Decision

Implement the feature on top of the existing [`frontend/src/lib/api.js`](frontend/src/lib/api.js) wrapper conventions and Svelte 5 rune-based patterns. Add dedicated wrapper functions under the frontend API layer for dataset orchestration rather than embedding endpoint strings inside components.

The main route will be a dedicated review workspace separate from the current [`frontend/src/routes/datasets/+page.svelte`](frontend/src/routes/datasets/+page.svelte), which remains the dataset hub/list page.

### Rationale

The current dataset page already serves as a listing and bulk-operation view. The new orchestration flow is a multi-state workspace and should not overload the existing hub component. Using wrapper functions preserves constitution compliance and testability.

### Alternatives considered

- **Extend the existing dataset hub page for the whole flow**
  Rejected because it would overload a currently unrelated list page with a large state machine.
- **Call endpoints directly with native `fetch` in new components**
  Rejected by constitution and existing frontend architecture.
- **Build the whole flow inside assistant chat only**
  Rejected because the UX reference defines a structured workspace, not chat-only interaction.

---
## 11. Testing Strategy

### Decision

Use a three-layer testing strategy:

#### Backend

- **Unit tests**
  - semantic ranking/conflict rules,
  - clarification prioritization,
  - mapping approval guards,
  - preview stale/ready rules,
  - launch blocking conditions.

- **Integration tests**
  - session persistence,
  - Superset context extraction with fixtures,
  - compiled preview orchestration,
  - SQL Lab launch handoff,
  - export generation.

- **API contract tests**
  - typed schema responses,
  - RBAC guards,
  - session lifecycle endpoints,
  - field-level semantic operations,
  - mapping approval endpoints.

#### Frontend

- **Component tests**
  - source intake states,
  - validation triage,
  - semantic conflict view,
  - clarification dialog,
  - mapping review,
  - compiled preview stale/error behavior,
  - launch summary gating.

- **Route integration tests**
  - progressive loading,
  - save/resume flows,
  - recovery-required state,
  - warning approval requirements.

#### External dependency handling

- mock Superset endpoints with version-shaped fixtures,
- mock LLM structured outputs deterministically,
- verify degraded paths explicitly instead of pretending success.
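
The "mock Superset, verify degraded paths" rule can be illustrated with stdlib `unittest.mock`. The `compile_preview` helper and client interface are placeholders standing in for `SupersetCompilationAdapter`; the point is that the failure path is asserted explicitly, not papered over.

```python
from unittest.mock import Mock


def compile_preview(client) -> dict:
    """Tiny stand-in for the compilation adapter: fail closed, never fake SQL."""
    try:
        sql = client.compile(payload={"datasource": 1})
    except RuntimeError:
        return {"state": "failed", "compiled_sql": None}
    return {"state": "ready", "compiled_sql": sql}


# Happy path: a version-shaped fixture returns compiled SQL verbatim.
ok_client = Mock()
ok_client.compile.return_value = "SELECT 1"
ready = compile_preview(ok_client)

# Degraded path: upstream failure is surfaced as an explicit failed state.
down_client = Mock()
down_client.compile.side_effect = RuntimeError("superset unavailable")
failed = compile_preview(down_client)
```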

### Rationale

The feature blends deterministic parsing, stateful orchestration, and external-service dependencies. It needs more than happy-path smoke tests.

### Alternatives considered

- **Rely on manual quickstart only**
  Rejected because the feature has too many state transitions and edge cases.
- **Mock everything at unit level only**
  Rejected because contract drift with Superset and frontend schemas would go undetected.
- **Run live LLM/Superset tests in CI**
  Rejected because determinism and cost would be poor.

---
## 12. Incremental Delivery Strategy

### Decision

Implement in three milestones:

### Milestone 1 — Sessioned Auto Review

Includes:

- source intake,
- session persistence,
- profile/finding generation,
- semantic-source application,
- exportable documentation/validation outputs.

### Milestone 2 — Guided Clarification

Includes:

- clarification engine,
- resumable dialogue state,
- conflict review,
- field-level semantic overrides/locks.

### Milestone 3 — Controlled Execution

Includes:

- imported filter recovery,
- template-variable mapping,
- mapping approval workflow,
- compiled SQL preview,
- SQL Lab launch,
- audited run context.

### Rationale

This delivers user value earlier and reduces risk by separating understanding from execution. It also directly addresses the reviewer feedback that the feature should not be treated as a monolithic drop.

### Alternatives considered

- **Single all-at-once delivery**
  Rejected due to risk concentration and hard-to-demonstrate progress.
- **Execution-first milestone**
  Rejected because trust, semantics, and readiness are prerequisites for safe launch.

---
## 13. Risk Resolution Outcomes

### Decision

Treat the following as non-negotiable gates:

- no Superset-side preview support → stop execution-scope planning
- no reliable SQL Lab session handoff → block launch design
- no typed API schemas → block parallel frontend/backend implementation
- no explicit semantic/clarification modules → block Phase 1 contract completion

### Rationale

These are architecture-shaping constraints, not minor implementation details.

### Alternatives considered

- **Continue with placeholders and fill gaps during coding**
  Rejected because the feature is too stateful and contract-heavy for speculative implementation.

---
## 14. Review Feedback Incorporated

### Decision

The next design phase must explicitly address the previously identified review gaps by:

- adding `SemanticSourceResolver`,
- adding `ClarificationEngine`,
- expanding `SupersetCompilationAdapter` into a broader extraction + compilation boundary or adding `SupersetContextExtractor`,
- fully typing `api.yaml`,
- adding mapping and field-level endpoints,
- adding session lifecycle endpoints,
- adding error-path coverage to `quickstart.md`,
- preserving delivery milestones in the plan.

### Rationale

These gaps are valid and materially affect implementation readiness. Folding them into Phase 1 keeps the planning workflow honest and actionable.

### Alternatives considered

- **Ignore review findings until code review**
  Rejected because several issues are architectural, not cosmetic.

---
## Final Phase 0 Result

All identified `NEEDS CLARIFICATION` items from the plan are resolved at the planning level.

The implementation should proceed into Phase 1 with:

- expanded typed data model,
- expanded contracts,
- expanded API surface,
- negative-path quickstart scenarios,
- post-design constitution re-check.