# Research: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`
**Date**: 2026-03-16

## Scope

This document resolves the Phase 0 technical unknowns identified in [`specs/027-dataset-llm-orchestration/plan.md`](./plan.md) and establishes implementation decisions for:

- Superset link parsing and context recovery
- runtime variable and Jinja discovery
- compiled SQL preview strategy
- SQL Lab launch handoff
- semantic source resolution architecture
- clarification session architecture
- typed API scope
- persistence, testing, and rollout strategy

---
## 1. Superset Link Intake and Native Filter Recovery

### Decision

Use a dedicated backend adapter, `SupersetContextExtractor`, to parse supported Superset URLs and recover context in two layers:

1. **URL-level recovery** from dashboard, dataset, and query parameters in the provided link
2. **Superset API enrichment** using the resolved dashboard/dataset identifiers to recover saved filter context and related dataset metadata

The recovered filter set will be stored as an `ImportedFilterSet` with provenance per value:

- `superset_native`
- `superset_url`
- `manual`
- `inferred`

If only partial filter recovery is possible, the session enters `recovery_required` rather than failing closed.
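
The URL-level layer can be sketched as follows. This is an illustrative sketch only; `ImportedFilter` and `recover_from_url` are hypothetical names, and real Superset links vary by version, so the production extractor must handle more shapes than plain query parameters.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass
class ImportedFilter:
    name: str
    value: str
    provenance: str  # superset_native | superset_url | manual | inferred


def recover_from_url(url: str) -> list[ImportedFilter]:
    """Layer 1: capture whatever context the link itself encodes."""
    query = parse_qs(urlparse(url).query)
    filters: list[ImportedFilter] = []
    for key, values in query.items():
        for value in values:
            # Values recovered here carry `superset_url` provenance; the API
            # enrichment layer would later add `superset_native` entries.
            filters.append(ImportedFilter(name=key, value=value, provenance="superset_url"))
    return filters


filters = recover_from_url(
    "https://superset.example/superset/dashboard/42/?region=emea&year=2025"
)
```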

### Rationale

The UX requires recovery to feel progressive rather than brittle. A pure URL parser would be too fragile and version-specific. A pure API lookup would miss context encoded in the link itself. A two-layer strategy preserves value even when one source is incomplete.

This also matches the spec’s requirement that partial recovery remain usable and explicitly explained.

### Alternatives considered

- **Parse only URL query parameters**
  Rejected because it is too dependent on frontend link shape and misses server-side saved context.
- **Resolve everything only through dashboard API after identifying the dashboard**
  Rejected because some context lives directly in the URL and should be captured before API reconciliation.
- **Block session start if any filter cannot be recovered**
  Rejected because it violates the UX principle of progressive value and FR-025.

---
## 2. Runtime Template Variable and Jinja Discovery

### Decision

Detect runtime execution variables using a hybrid discovery pipeline:

1. Load dataset detail from Superset via existing [`backend/src/core/superset_client.py`](backend/src/core/superset_client.py)
2. Extract `sql`, `is_sqllab_view`, metric expressions, and other query-bearing fields where available
3. Apply backend parsing rules for:
   - `{{ ... }}`
   - `filter_values('...')`
   - common template parameter references
   - Jinja variables in SQL text and metric expressions
4. Normalize discovered variables into typed `TemplateVariable` entries with:
   - variable name
   - source expression
   - inferred kind (`native_filter`, `parameter`, `derived`, `unknown`)
   - required/optional status
   - mapping status

The parser will not execute Jinja locally. It only discovers variables and structure.
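
A minimal sketch of the discovery rules in step 3, assuming regex-level parsing is acceptable for the first two patterns. The regexes and `discover_variables` name are illustrative; the real parser covers more shapes, and note that nothing here renders Jinja.

```python
import re
from dataclasses import dataclass


@dataclass
class TemplateVariable:
    name: str
    source_expression: str
    kind: str  # native_filter | parameter | derived | unknown


# Plain {{ var }} references; function calls such as filter_values(...) are
# deliberately excluded here and handled by their own rule.
JINJA_VAR = re.compile(r"\{\{\s*([a-zA-Z_][\w.]*)\s*\}\}")
FILTER_VALUES = re.compile(r"filter_values\(\s*'([^']+)'\s*\)")


def discover_variables(sql: str) -> list[TemplateVariable]:
    """Discover variables deterministically; never execute the template."""
    found: dict[str, TemplateVariable] = {}
    for match in FILTER_VALUES.finditer(sql):
        found[match.group(1)] = TemplateVariable(match.group(1), match.group(0), "native_filter")
    for match in JINJA_VAR.finditer(sql):
        found.setdefault(match.group(1), TemplateVariable(match.group(1), match.group(0), "unknown"))
    return list(found.values())


variables = discover_variables(
    "SELECT * FROM sales WHERE region IN ({{ filter_values('region') }}) AND year = {{ year }}"
)
```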

### Rationale

The UX and specification both require execution transparency, but also require Superset to remain the source of truth for final SQL compilation. Discovery is safe locally; execution is not. Separating variable discovery from compilation preserves WYSIWWR and avoids local behavior drift.

### Alternatives considered

- **Use only string regex over SQL text**
  Rejected because it is simple but insufficient for richer query-bearing payloads.
- **Execute local Jinja rendering to discover required params**
  Rejected because it conflicts with the UX principle that only Superset defines final execution.
- **Rely on LLM to infer variables from SQL text**
  Rejected because variable discovery must be deterministic and auditable.

---
## 3. Compiled SQL Preview Strategy

### Decision

Compiled preview will be generated only through a dedicated Superset-side compilation call path managed by `SupersetCompilationAdapter`. The adapter will:

1. receive the candidate run context,
2. submit it to a Superset-compatible preview/compilation flow,
3. store the exact returned compiled SQL plus metadata,
4. mark preview state as `ready`, `pending`, `failed`, or `stale`.

Launch remains blocked unless preview state is `ready` and bound to the current mapping/version snapshot.
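
The launch gate above can be sketched as a fingerprint check. The state names come from the text; the fingerprint mechanics (hashing the mapping version plus effective values) are an assumption about how "bound to the current snapshot" might be enforced.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CompiledPreview:
    state: str                # ready | pending | failed | stale
    snapshot_fingerprint: str


def snapshot_fingerprint(mapping_version: str, values: dict) -> str:
    """Deterministic digest of the mapping/version snapshot the preview was built from."""
    payload = mapping_version + "|" + repr(sorted(values.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


def launch_allowed(preview: CompiledPreview, mapping_version: str, values: dict) -> bool:
    # A preview unblocks launch only if it is ready AND still matches the
    # current mapping snapshot; any edit after compilation makes it stale.
    return (
        preview.state == "ready"
        and preview.snapshot_fingerprint == snapshot_fingerprint(mapping_version, values)
    )


fp = snapshot_fingerprint("v7", {"region": "emea"})
preview = CompiledPreview(state="ready", snapshot_fingerprint=fp)
```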

#### Superset Version Compatibility Matrix

The system will explicitly check `SupersetClient.get_version()` during intake to determine available features:

- **v3.x+**: Supports all features (context recovery, native filter import, SQL compilation preview, SQL Lab launch).
- **v2.x**: Supports best-effort context recovery; SQL compilation preview may require fallback to Manual Launch.
- **Legacy/Unknown**: Automatic recovery disabled; session enters `recovery_required` mode immediately.

#### Fallback Strategy (Preview Availability < 100%)

If Superset-side compilation is unavailable (e.g., version mismatch or unsupported dataset type):

- The system will allow **Manual Launch Approval** by a user with `dataset:session:approve` permission.
- The UI will explicitly display: "Native Preview Unavailable - Manual Validation Required."
- The audit record will mark the run as "Launched without Compiled Preview."

### Rationale

This is a direct requirement from FR-029 and the UX reference’s WYSIWWR principle. The compiled preview must be the same source of truth as the real execution path. Even a “close enough” local approximation would undermine trust and violate the spec.

### Alternatives considered

- **Local best-effort Jinja compilation with warning banner**
  Rejected because the spec explicitly forbids fallback to unverified local approximation.
- **No preview, only launch-time validation**
  Rejected because it breaks both UX and explicit launch gate requirements.
- **Preview generated by the LLM**
  Rejected because the LLM must not write or edit SQL directly.

---
## 4. SQL Lab Launch as Canonical Audited Target

### Decision

Treat Superset SQL Lab as the primary canonical launch target for this feature. `DatasetReviewOrchestrator.launch_dataset` will:

1. verify preview readiness,
2. verify required-value completeness,
3. verify warning approvals,
4. create or bind a SQL Lab execution session in Superset,
5. persist a `DatasetRunContext` snapshot with session reference, approved mappings, semantic decisions, and preview fingerprint.
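
Steps 1–3 above are pure precondition checks and can be sketched as ordered guard clauses. The session shape and problem codes here are illustrative placeholders, not the real `DatasetReviewOrchestrator` API.

```python
def check_launch_preconditions(session: dict) -> list[str]:
    """Return blocking problems in check order; an empty list means steps 4–5 may run."""
    problems = []
    if session.get("preview_state") != "ready":                       # step 1
        problems.append("preview_not_ready")
    missing = [k for k, v in session.get("required_values", {}).items() if v is None]
    if missing:                                                       # step 2
        problems.append("missing_required_values")
    if any(not w["approved"] for w in session.get("warnings", [])):   # step 3
        problems.append("unapproved_warnings")
    return problems


ready_session = {
    "preview_state": "ready",
    "required_values": {"region": "emea"},
    "warnings": [{"approved": True}],
}
blocked_session = {
    "preview_state": "stale",
    "required_values": {"region": None},
    "warnings": [{"approved": False}],
}
```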

#### Fallback path: Export Prepared Context

If SQL Lab session creation is unavailable:

- Provide an **Export Prepared Run Context (JSON/SQL)** action.
- This allows users to manually take the compiled SQL (if available) or the effective parameter payload to Superset.
- The system still records the `DatasetRunContext` as "Exported for Manual Launch" to preserve audit integrity.

### Rationale

The specification explicitly names SQL Lab as canonical. The system must preserve auditability and replay, not merely trigger execution. Persisting the SQL Lab reference and run context as a single audited boundary supports both operational trust and later reopening.

### Alternatives considered

- **Run query outside SQL Lab through another Superset endpoint**
  Rejected because it violates the canonical target clarified in the spec.
- **Redirect user to a generic analytical view with params**
  Rejected because it is weaker for audit and replay.
- **Allow fallback execution when SQL Lab creation fails**
  Rejected because the feature’s safety model depends on the audited target.

---
## 5. Semantic Enrichment Architecture

### Decision

Split semantic enrichment into a dedicated `SemanticSourceResolver` domain module instead of embedding all logic inside `DatasetReviewOrchestrator`.

Planned responsibilities:

- `resolve_from_file`
- `resolve_from_dictionary`
- `resolve_from_reference_dataset`
- `rank_candidates`
- `detect_conflicts`
- `apply_field_decision`

Candidate ranking will follow the required confidence hierarchy:

1. exact dictionary/file match
2. trusted reference dataset match
3. fuzzy semantic match
4. AI-generated draft

Manual overrides are represented as a lockable terminal provenance state and cannot be overwritten implicitly.
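
The confidence hierarchy can be sketched as a fixed precedence over candidate kinds; the kind names and dict-based candidate shape are assumptions for illustration, not the resolver's real contract.

```python
# Ordered highest-confidence first, mirroring the four tiers above.
CONFIDENCE_ORDER = [
    "exact_dictionary_match",    # 1. exact dictionary/file match
    "reference_dataset_match",   # 2. trusted reference dataset match
    "fuzzy_semantic_match",      # 3. fuzzy semantic match
    "ai_generated_draft",        # 4. AI-generated draft
]


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Sort by tier first, then by per-candidate score within a tier."""
    tier = {name: i for i, name in enumerate(CONFIDENCE_ORDER)}
    return sorted(candidates, key=lambda c: (tier[c["kind"]], -c.get("score", 0.0)))


ranked = rank_candidates([
    {"kind": "ai_generated_draft", "score": 0.99},
    {"kind": "fuzzy_semantic_match", "score": 0.40},
    {"kind": "exact_dictionary_match", "score": 0.60},
])
```

Note that a high-scoring AI draft never outranks a weaker dictionary match, which encodes "reuse before invention" structurally rather than by tuning scores.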

### Rationale

The feature’s semantic logic is too substantial to remain an untyped orchestrator subroutine. The reviewer feedback correctly identified this as a likely god-object risk. A dedicated resolver improves testability, contract clarity, and future extensibility.

### Alternatives considered

- **Keep enrichment inside orchestrator only**
  Rejected because it centralizes too many responsibilities in a single Complexity 5 module.
- **Push matching into frontend only**
  Rejected because provenance, confidence, and audit decisions must be persisted server-side.
- **Use only AI generation when dictionaries are absent**
  Rejected because “reuse before invention” is a core UX principle.

---
## 6. Clarification Session Architecture

### Decision

Create a dedicated `ClarificationEngine` service with persistent `ClarificationSession` state. The engine will manage:

- unresolved-item prioritization,
- one-question-at-a-time question generation,
- best-guess suggestion packaging,
- answer recording,
- progress summarization,
- resume-from-last-open-item behavior.

Answers are persisted before session state advancement.

#### LLM Quality & Feedback Mechanism

To maintain high trust in AI-generated summaries, match recommendations, and questions:

- **User Feedback**: UI will include inline feedback (👍/👎) for all AI-drafted content.
- **Audit Logging**: Feedback is persisted in `SemanticFieldDecision` and `ClarificationAnswer` records for prompt engineering improvement.
- **Confidence Triage**: Low-confidence LLM outputs (score < 0.7) will be automatically marked for "expert review" rather than presented as confirmed suggestions.
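
The confidence triage rule is small enough to state directly. The 0.7 threshold comes from the text above; the `triage` helper and status names are illustrative.

```python
EXPERT_REVIEW_THRESHOLD = 0.7  # below this, output is never shown as confirmed


def triage(suggestion: dict) -> dict:
    """Mark low-confidence LLM output for expert review instead of suggesting it."""
    status = (
        "suggested"
        if suggestion["confidence"] >= EXPERT_REVIEW_THRESHOLD
        else "expert_review"
    )
    return {**suggestion, "status": status}
```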

#### Question Generation Strategy

Question templates and logic will be developed in Milestone 2:

- **Evaluation Criteria**: Questions must be judged on: relevancy, business impact explanation, and actionability of suggestions.
- **Templates**: Standardized prompts for field ambiguity, filter conflict, and missing run-time values.
- **Human-in-the-loop**: High-risk semantic conflicts will prioritize "expert review" options.

### Rationale

Guided clarification is not a simple endpoint action; it is a stateful interaction model with resumability and audit expectations. Treating it as a first-class module aligns with FR-012 to FR-020 and supports deterministic behavior rather than ad hoc agent chat state.

### Alternatives considered

- **Use assistant conversation history as the only clarification state**
  Rejected because feature state and conversational transcript are not the same thing.
- **Generate all questions at once**
  Rejected because the UX explicitly requires one question at a time.
- **Do not persist skipped questions separately**
  Rejected because resume and expert-review flows need explicit unresolved-state semantics.

---
## 7. API Contract Granularity

### Decision

Expand the API beyond the current session/preview/launch skeleton into a typed session contract set with explicit lifecycle and field-level operations.

Required endpoint groups:

1. **Session lifecycle**
   - `POST /sessions`
   - `GET /sessions`
   - `GET /sessions/{id}`
   - `PATCH /sessions/{id}`
   - `DELETE /sessions/{id}`

2. **Semantic source and field-level semantic actions**
   - `POST /sessions/{id}/semantic-source`
   - `PATCH /sessions/{id}/fields/{field_id}/semantic`
   - `POST /sessions/{id}/fields/{field_id}/lock`
   - `POST /sessions/{id}/fields/{field_id}/unlock`

3. **Clarification**
   - `GET /sessions/{id}/clarification`
   - `POST /sessions/{id}/clarification/answers`
   - `POST /sessions/{id}/clarification/resume`

4. **Mapping review**
   - `GET /sessions/{id}/mappings`
   - `PATCH /sessions/{id}/mappings/{mapping_id}`
   - `POST /sessions/{id}/mappings/{mapping_id}/approve`

5. **Preview and launch**
   - `POST /sessions/{id}/preview`
   - `POST /sessions/{id}/launch`

6. **Exports**
   - `GET /sessions/{id}/exports/documentation`
   - `GET /sessions/{id}/exports/validation`

Core schemas must be fully typed, especially:

- `ValidationFinding`
- `ImportedFilter`
- `TemplateVariable`
- `MappingDecision`
- `SemanticFieldEntry`
- `ClarificationQuestion`
- `DatasetRunContextSummary`
- `CompiledPreview`
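
As an illustration of the typing bar these schemas must meet, two of them might look like this in stdlib typing terms. The field names are assumptions; the real shapes belong to `api.yaml` and whatever schema library the backend standardizes on.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class TemplateVariable:
    name: str
    kind: Literal["native_filter", "parameter", "derived", "unknown"]
    required: bool
    mapping_status: Literal["unmapped", "proposed", "approved"]


@dataclass
class CompiledPreview:
    state: Literal["ready", "pending", "failed", "stale"]
    compiled_sql: Optional[str] = None  # None until Superset returns it
    variables: list[TemplateVariable] = field(default_factory=list)


# A pending preview carries no SQL yet; the frontend can type against that.
preview = CompiledPreview(state="pending")
```

Enumerated `Literal` states, rather than free-form strings, are what makes parallel frontend typing and contract validation possible.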

### Rationale

The existing API draft is structurally correct but too loose for backend/frontend parallel work. Typed schemas are a blocker for safe implementation, especially given mapping approvals, field-level overrides, and resumable state.

### Alternatives considered

- **Minimal object blobs in `SessionDetail`**
  Rejected because they prevent stable frontend typing and contract validation.
- **Expose only session-level mutations**
  Rejected because FR-040 and mapping approval UX require field-level actions.
- **Defer exports until later**
  Rejected because FR-035 requires exportable outputs as part of the feature.

---
## 8. Persistence Model

### Decision

Persist dataset review workflow as first-class application entities rather than storing everything in ad hoc task payloads.

Persistence boundaries will include:

- `DatasetReviewSession`
- `DatasetProfileSnapshot`
- `ValidationFindingRecord`
- `SemanticFieldDecision`
- `ImportedFilterRecord`
- `TemplateVariableRecord`
- `MappingApprovalRecord`
- `ClarificationSessionRecord`
- `ClarificationAnswerRecord`
- `CompiledPreviewRecord`
- `DatasetRunContextRecord`

Long-running task execution remains in the existing task system, but tasks reference session identifiers rather than replacing session state.
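
The task-references-session boundary can be sketched as two records. Field names here are illustrative stand-ins for the real persistence models; the point is the direction of the reference.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetReviewSession:
    id: str
    status: str = "draft"  # canonical, editable feature state lives here


@dataclass
class TaskRecord:
    id: str
    kind: str
    session_id: str  # reference only; the task never owns session state
    result: dict = field(default_factory=dict)


session = DatasetReviewSession(id="sess-1")
task = TaskRecord(id="task-1", kind="context_recovery", session_id=session.id)
```

Because the task holds only a foreign reference, the session can be reopened, edited, or exported regardless of what its tasks recorded.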

### Rationale

The current task system is good for observability and async work, but not sufficient as the canonical store for resumable feature state. The feature requires reopening, editing, exporting, and replaying a session independently of task lifecycle.

### Alternatives considered

- **Store all workflow state inside task params/results**
  Rejected because tasks are execution records, not editable session aggregates.
- **Use filesystem-only session snapshots**
  Rejected because this weakens structured querying and RBAC alignment.
- **Persist only final run context**
  Rejected because clarification and review resume need mid-flow state.

---
## 9. Async and Observability Strategy

### Decision

Use `TaskManager` for expensive or multi-step operations:

- initial deep context recovery,
- semantic enrichment imports over large sources,
- preview generation when upstream latency is non-trivial,
- launch handoff if it includes multi-step Superset orchestration.

Short synchronous operations may still exist for simple field edits and approvals, but session state transitions must remain observable and reflected in the session model.
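
A generic asyncio sketch of the pattern, not the repository's actual `TaskManager` API: the expensive work runs as a background task while the session model itself remains the observable source of state transitions.

```python
import asyncio


async def run_preview_generation(session: dict) -> None:
    session["preview_state"] = "pending"  # transition is observable immediately
    await asyncio.sleep(0)                # stands in for the slow Superset round trip
    session["preview_state"] = "ready"


async def main() -> dict:
    session = {"preview_state": "stale"}
    # In the real system this would be a TaskManager submission; readers of the
    # session model see "pending" while the work is in flight.
    await asyncio.create_task(run_preview_generation(session))
    return session


session = asyncio.run(main())
```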

### Rationale

This aligns with the repository constitution and existing operational model. It also supports progressive UX milestones and live readiness changes.

### Alternatives considered

- **Make all session operations synchronous**
  Rejected because upstream Superset and enrichment work may exceed acceptable UI latency.
- **Use frontend polling only with no task integration**
  Rejected because the repository already has stronger task observability patterns.

---
## 10. Frontend Integration Pattern

### Decision

Implement the feature on top of the existing [`frontend/src/lib/api.js`](frontend/src/lib/api.js) wrapper conventions and Svelte 5 rune-based patterns. Add dedicated wrapper functions under the frontend API layer for dataset orchestration rather than embedding endpoint strings inside components.

The main route will be a dedicated review workspace separate from the current [`frontend/src/routes/datasets/+page.svelte`](frontend/src/routes/datasets/+page.svelte), which remains the dataset hub/list page.

### Rationale

The current dataset page already serves as a listing and bulk-operation view. The new orchestration flow is a multi-state workspace and should not overload the existing hub component. Using wrapper functions preserves constitution compliance and testability.

### Alternatives considered

- **Extend the existing dataset hub page for the whole flow**
  Rejected because it would overload a currently unrelated list page with a large state machine.
- **Call endpoints directly with native `fetch` in new components**
  Rejected by constitution and existing frontend architecture.
- **Build the whole flow inside assistant chat only**
  Rejected because the UX reference defines a structured workspace, not chat-only interaction.

---
## 11. Testing Strategy

### Decision

Use a three-layer testing strategy:

#### Backend

- **Unit tests**
  - semantic ranking/conflict rules,
  - clarification prioritization,
  - mapping approval guards,
  - preview stale/ready rules,
  - launch blocking conditions.

- **Integration tests**
  - session persistence,
  - Superset context extraction with fixtures,
  - compiled preview orchestration,
  - SQL Lab launch handoff,
  - export generation.

- **API contract tests**
  - typed schema responses,
  - RBAC guards,
  - session lifecycle endpoints,
  - field-level semantic operations,
  - mapping approval endpoints.

#### Frontend

- **Component tests**
  - source intake states,
  - validation triage,
  - semantic conflict view,
  - clarification dialog,
  - mapping review,
  - compiled preview stale/error behavior,
  - launch summary gating.

- **Route integration tests**
  - progressive loading,
  - save/resume flows,
  - recovery-required state,
  - warning approval requirements.

#### External dependency handling

- mock Superset endpoints with version-shaped fixtures,
- mock LLM structured outputs deterministically,
- verify degraded paths explicitly instead of pretending success.
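
The "mock Superset, verify degraded paths" rule can be illustrated with stdlib `unittest.mock`. The `compile_preview` helper and client interface are placeholders standing in for `SupersetCompilationAdapter`; the point is that the failure path is asserted explicitly, not papered over.

```python
from unittest.mock import Mock


def compile_preview(client) -> dict:
    """Tiny stand-in for the compilation adapter: fail closed, never fake SQL."""
    try:
        sql = client.compile(payload={"datasource": 1})
    except RuntimeError:
        return {"state": "failed", "compiled_sql": None}
    return {"state": "ready", "compiled_sql": sql}


# Happy path: a version-shaped fixture returns compiled SQL verbatim.
ok_client = Mock()
ok_client.compile.return_value = "SELECT 1"
ready = compile_preview(ok_client)

# Degraded path: upstream failure is surfaced as an explicit failed state.
down_client = Mock()
down_client.compile.side_effect = RuntimeError("superset unavailable")
failed = compile_preview(down_client)
```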

### Rationale

The feature blends deterministic parsing, stateful orchestration, and external-service dependencies. It needs more than happy-path smoke tests.

### Alternatives considered

- **Rely on manual quickstart only**
  Rejected because the feature has too many state transitions and edge cases.
- **Mock everything at unit level only**
  Rejected because contract drift with Superset and frontend schemas would go undetected.
- **Run live LLM/Superset tests in CI**
  Rejected because determinism and cost would be poor.

---
## 12. Incremental Delivery Strategy

### Decision

Implement in three milestones:

### Milestone 1 — Sessioned Auto Review

Includes:

- source intake,
- session persistence,
- profile/finding generation,
- semantic-source application,
- exportable documentation/validation outputs.

### Milestone 2 — Guided Clarification

Includes:

- clarification engine,
- resumable dialogue state,
- conflict review,
- field-level semantic overrides/locks.

### Milestone 3 — Controlled Execution

Includes:

- imported filter recovery,
- template-variable mapping,
- mapping approval workflow,
- compiled SQL preview,
- SQL Lab launch,
- audited run context.

### Rationale

This delivers user value earlier and reduces risk by separating understanding from execution. It also directly addresses the reviewer feedback that the feature should not be treated as a monolithic drop.

### Alternatives considered

- **Single all-at-once delivery**
  Rejected due to risk concentration and hard-to-demonstrate progress.
- **Execution-first milestone**
  Rejected because trust, semantics, and readiness are prerequisites for safe launch.

---
## 13. Risk Resolution Outcomes

### Decision

Treat the following as non-negotiable gates:

- no Superset-side preview support → stop execution-scope planning
- no reliable SQL Lab session handoff → block launch design
- no typed API schemas → block parallel frontend/backend implementation
- no explicit semantic/clarification modules → block Phase 1 contract completion

### Rationale

These are architecture-shaping constraints, not minor implementation details.

### Alternatives considered

- **Continue with placeholders and fill gaps during coding**
  Rejected because the feature is too stateful and contract-heavy for speculative implementation.

---
## 14. Review Feedback Incorporated

### Decision

The next design phase must explicitly address the previously identified review gaps by:

- adding `SemanticSourceResolver`,
- adding `ClarificationEngine`,
- expanding `SupersetCompilationAdapter` into a broader extraction + compilation boundary or adding `SupersetContextExtractor`,
- fully typing `api.yaml`,
- adding mapping and field-level endpoints,
- adding session lifecycle endpoints,
- adding error-path coverage to `quickstart.md`,
- preserving delivery milestones in the plan.

### Rationale

These gaps are valid and materially affect implementation readiness. Folding them into Phase 1 keeps the planning workflow honest and actionable.

### Alternatives considered

- **Ignore review findings until code review**
  Rejected because several issues are architectural, not cosmetic.

---
## Final Phase 0 Result

All identified `NEEDS CLARIFICATION` items from the plan are resolved at the planning level.

The implementation should proceed into Phase 1 with:

- expanded typed data model,
- expanded contracts,
- expanded API surface,
- negative-path quickstart scenarios,
- post-design constitution re-check.