Move dataset review clarification into the assistant workspace and rework the review page into a chat-centric layout with execution rails. Add session-scoped assistant actions for mappings, semantic fields, and SQL preview generation. Introduce optimistic locking for dataset review mutations, propagate session versions through API responses, and mask imported filter values before assistant exposure. Refresh tests, i18n, and spec artifacts to match the new workflow. BREAKING CHANGE: dataset review mutation endpoints now require the X-Session-Version header, and clarification is no longer handled through ClarificationDialog-based flows
# Data Model: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`
**Date**: 2026-03-16

## Overview

This document defines the domain entities, relationships, lifecycle states, and validation rules for the dataset review, semantic enrichment, clarification, preview, and launch workflow described in [`spec.md`](./spec.md) and grounded by the decisions in [`research.md`](./research.md).

The model is intentionally split into:
- **session aggregate** entities for resumable workflow state,
- **semantic/provenance** entities for enrichment and conflict handling,
- **execution** entities for mapping, preview, and launch audit,
- **export** projections for sharing outputs.

---

## 1. Core Aggregate: DatasetReviewSession

### Entity: `SessionCollaborator`

| Field | Type | Required | Description |
|---|---|---:|---|
| `user_id` | string | yes | Collaborating user ID |
| `role` | enum | yes | `viewer`, `reviewer`, `approver` |
| `added_at` | datetime | yes | When they were added |

### Entity: `DatasetReviewSession`

Represents the top-level resumable workflow container for one dataset review/execution effort.

| Field | Type | Required | Description |
|---|---|---:|---|
| `session_id` | string (UUID) | yes | Stable unique identifier for the review session |
| `user_id` | string | yes | Authenticated User ID of the session owner |
| `collaborators` | list[SessionCollaborator] | no | Shared access and roles |
| `environment_id` | string | yes | Superset environment context |
| `source_kind` | enum | yes | Origin kind: `superset_link`, `dataset_selection` |
| `source_input` | string | yes | Original link or selected dataset reference |
| `dataset_ref` | string | yes | Canonical dataset reference used by the feature |
| `dataset_id` | integer \| null | no | Superset dataset id when resolved |
| `dashboard_id` | integer \| null | no | Superset dashboard id if imported from dashboard link |
| `readiness_state` | enum | yes | Current workflow readiness state |
| `recommended_action` | enum | yes | Explicit next recommended action |
| `version` | integer | yes | Optimistic-lock version incremented on every persisted session mutation |
| `status` | enum | yes | Session lifecycle status |
| `current_phase` | enum | yes | Active workflow phase |
| `active_task_id` | string \| null | no | Linked long-running task if one is active |
| `last_preview_id` | string \| null | no | Most recent preview snapshot |
| `last_run_context_id` | string \| null | no | Most recent launch audit record |
| `created_at` | datetime | yes | Session creation timestamp |
| `updated_at` | datetime | yes | Last mutation timestamp |
| `last_activity_at` | datetime | yes | Last user/system activity timestamp |
| `closed_at` | datetime \| null | no | Terminal close/archive timestamp |

### Validation rules
- `session_id` must be globally unique.
- `source_input` must be non-empty.
- `environment_id` must resolve to a configured environment.
- `readiness_state` and `recommended_action` must always be present.
- `version` starts at `0` on session creation and increments monotonically after every successful session mutation.
- `user_id` ownership must be enforced for all mutations, unless collaborator roles allow otherwise.
- `dataset_id` becomes required before preview or launch phases.
- `last_preview_id` must refer to a preview generated from the same session.
- Mutating requests must include the caller's last observed session version; mismatches are rejected as optimistic-lock conflicts rather than silently merged.

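
The optimistic-lock rule for `version` can be illustrated with a minimal in-memory sketch. It is not the feature's implementation: `Session`, `VersionConflict`, and `apply_mutation` are hypothetical names, and the dataclass stands in for a persisted row. How the observed version reaches the server (the commit note mentions an `X-Session-Version` header) is outside the sketch.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when the caller's observed version is out of date (hypothetical name)."""


@dataclass
class Session:
    session_id: str
    version: int = 0  # starts at 0 on session creation
    status: str = "active"


def apply_mutation(session: Session, observed_version: int, **changes) -> Session:
    """Apply changes only if the caller saw the latest version."""
    if observed_version != session.version:
        # Reject the request instead of silently merging it.
        raise VersionConflict(
            f"expected version {session.version}, got {observed_version}"
        )
    for name, value in changes.items():
        setattr(session, name, value)
    session.version += 1  # incremented after every successful mutation
    return session
```

A caller that read version `0` and sends it back succeeds and then sees version `1`; a second caller still holding `0` gets a `VersionConflict` and has to reload the session before retrying.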
### Enums

#### `SessionStatus`
- `active`
- `paused`
- `completed`
- `archived`
- `cancelled`

#### `SessionPhase`
- `intake`
- `recovery`
- `review`
- `semantic_review`
- `clarification`
- `mapping_review`
- `preview`
- `launch`
- `post_run`

#### `ReadinessState`
- `empty`
- `importing`
- `review_ready`
- `semantic_source_review_needed`
- `clarification_needed`
- `clarification_active`
- `mapping_review_needed`
- `compiled_preview_ready`
- `partially_ready`
- `run_ready`
- `run_in_progress`
- `completed`
- `recovery_required`

#### `RecommendedAction`
- `import_from_superset`
- `review_documentation`
- `apply_semantic_source`
- `start_clarification`
- `answer_next_question`
- `approve_mapping`
- `generate_sql_preview`
- `complete_required_values`
- `launch_dataset`
- `resume_session`
- `export_outputs`

---

## 2. Dataset Profile and Review State

### Entity: `DatasetProfile`

Consolidated interpretation of dataset meaning, semantics, filters, assumptions, and readiness.

| Field | Type | Required | Description |
|---|---|---:|---|
| `profile_id` | string (UUID) | yes | Unique profile id |
| `session_id` | string | yes | Parent session |
| `dataset_name` | string | yes | Display dataset name |
| `schema_name` | string \| null | no | Schema if available |
| `database_name` | string \| null | no | Database if available |
| `business_summary` | text | yes | Human-readable summary |
| `business_summary_source` | enum | yes | Provenance of summary |
| `description` | text \| null | no | Dataset-level description |
| `dataset_type` | enum \| null | no | `table`, `virtual`, `sqllab_view`, `unknown` |
| `is_sqllab_view` | boolean | yes | Whether dataset is SQL Lab derived |
| `completeness_score` | number \| null | no | Optional normalized completeness score |
| `confidence_state` | enum | yes | Overall confidence posture |
| `has_blocking_findings` | boolean | yes | Derived summary flag |
| `has_warning_findings` | boolean | yes | Derived summary flag |
| `manual_summary_locked` | boolean | yes | Protects user-entered summary |
| `created_at` | datetime | yes | Created timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `business_summary` must always contain a usable string; if weak, it may be skeletal but not null.
- `manual_summary_locked=true` prevents later automatic overwrite.
- `session_id` must be unique if only one active profile snapshot is stored per session, or versioned if snapshots are retained.
- `confidence_state` must reflect highest unresolved-risk posture, not just optimistic confidence.

#### `BusinessSummarySource`
- `confirmed`
- `imported`
- `inferred`
- `ai_draft`
- `manual_override`

#### `ConfidenceState`
- `confirmed`
- `mostly_confirmed`
- `mixed`
- `low_confidence`
- `unresolved`

---

## 3. Validation Findings

### Entity: `ValidationFinding`

Represents a blocking issue, warning, or informational observation.

| Field | Type | Required | Description |
|---|---|---:|---|
| `finding_id` | string (UUID) | yes | Unique finding id |
| `session_id` | string | yes | Parent session |
| `area` | enum | yes | Affected domain area |
| `severity` | enum | yes | `blocking`, `warning`, `informational` |
| `code` | string | yes | Stable machine-readable finding code |
| `title` | string | yes | Short label |
| `message` | text | yes | Actionable human-readable explanation |
| `resolution_state` | enum | yes | Current resolution status |
| `resolution_note` | text \| null | no | Optional explanation or approval note |
| `caused_by_ref` | string \| null | no | Related field/filter/mapping/question id |
| `created_at` | datetime | yes | Creation timestamp |
| `resolved_at` | datetime \| null | no | Resolution timestamp |

### Validation rules
- `severity` must be one of the allowed values.
- `resolution_state=resolved` or `approved` requires either a system resolution event or user action.
- `launch` is blocked if any open `blocking` finding remains.
- `warning` findings tied to mapping transformations require explicit approval before launch if marked launch-sensitive.

#### `FindingArea`
- `source_intake`
- `dataset_profile`
- `semantic_enrichment`
- `clarification`
- `filter_recovery`
- `template_mapping`
- `compiled_preview`
- `launch`
- `audit`

#### `ResolutionState`
- `open`
- `resolved`
- `approved`
- `skipped`
- `deferred`
- `expert_review`

---

## 4. Semantic Source and Field Decisions

### Entity: `SemanticSource`

Represents a trusted or candidate source of semantic metadata.

| Field | Type | Required | Description |
|---|---|---:|---|
| `source_id` | string (UUID) | yes | Unique source id |
| `session_id` | string | yes | Parent session |
| `source_type` | enum | yes | Origin kind |
| `source_ref` | string | yes | External reference, dataset ref, or uploaded artifact ref |
| `source_version` | string | yes | Version/Snapshot for propagation tracking |
| `display_name` | string | yes | Human-readable source name |
| `trust_level` | enum | yes | Source trust tier |
| `schema_overlap_score` | number \| null | no | Optional overlap signal |
| `status` | enum | yes | Availability/applicability status |
| `created_at` | datetime | yes | Creation timestamp |

#### `SemanticSourceType`
- `uploaded_file`
- `connected_dictionary`
- `reference_dataset`
- `neighbor_dataset`
- `ai_generated`

#### `TrustLevel`
- `trusted`
- `recommended`
- `candidate`
- `generated`

#### `SemanticSourceStatus`
- `available`
- `selected`
- `applied`
- `rejected`
- `partial`
- `failed`

---

### Entity: `SemanticFieldEntry`

Canonical semantic state for one dataset field or metric.

| Field | Type | Required | Description |
|---|---|---:|---|
| `field_id` | string (UUID) | yes | Unique field semantic id |
| `session_id` | string | yes | Parent session |
| `field_name` | string | yes | Physical field/metric name |
| `field_kind` | enum | yes | `column`, `metric`, `filter_dimension`, `parameter` |
| `verbose_name` | string \| null | no | Display label |
| `description` | text \| null | no | Human-readable description |
| `display_format` | string \| null | no | Formatting metadata such as d3 format |
| `provenance` | enum | yes | Final chosen source class |
| `source_id` | string \| null | no | Winning source |
| `confidence_rank` | integer \| null | no | Final applied ranking |
| `is_locked` | boolean | yes | Manual override protection |
| `has_conflict` | boolean | yes | Whether competing candidates exist |
| `needs_review` | boolean | yes | Whether user review is still needed |
| `last_changed_by` | enum | yes | `system`, `user`, `agent` |
| `user_feedback` | enum \| null | no | User feedback: `up` or `down`; null when no feedback was given |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `field_name` must be unique per `session_id + field_kind`.
- `is_locked=true` prevents automatic overwrite.
- `provenance=manual_override` implies `is_locked=true`.
- `has_conflict=true` requires at least one competing candidate record.
- Fuzzy/applied inferred values must keep `needs_review=true` until confirmed if policy requires explicit review.

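
The lock and provenance rules above can be sketched as two small update paths. This is an illustrative sketch only: `FieldEntry`, `apply_automatic`, and `apply_manual` are hypothetical names, and the choice to flag only `fuzzy_inferred` values for review is an assumed policy, not something the spec fixes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldEntry:
    field_name: str
    verbose_name: Optional[str] = None
    provenance: str = "unresolved"
    is_locked: bool = False
    needs_review: bool = False


def apply_automatic(entry: FieldEntry, verbose_name: str, provenance: str) -> bool:
    """Apply a system-proposed value; return False if the lock blocks it."""
    if entry.is_locked:
        return False  # is_locked=true prevents automatic overwrite
    entry.verbose_name = verbose_name
    entry.provenance = provenance
    # Assumed policy: fuzzy matches stay flagged until a user confirms them.
    entry.needs_review = provenance == "fuzzy_inferred"
    return True


def apply_manual(entry: FieldEntry, verbose_name: str) -> None:
    """A manual override replaces the value and locks the field."""
    entry.verbose_name = verbose_name
    entry.provenance = "manual_override"
    entry.is_locked = True  # manual_override implies is_locked=true
    entry.needs_review = False
```

Once `apply_manual` has run, later calls to `apply_automatic` return `False` and leave the user's value in place, which is the behavior the lock is meant to guarantee.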
#### `FieldKind`
- `column`
- `metric`
- `filter_dimension`
- `parameter`

#### `FieldProvenance`
- `dictionary_exact`
- `reference_imported`
- `fuzzy_inferred`
- `ai_generated`
- `manual_override`
- `unresolved`

---

### Entity: `SemanticCandidate`

Stores competing candidate values before or alongside final field decision.

| Field | Type | Required | Description |
|---|---|---:|---|
| `candidate_id` | string (UUID) | yes | Unique candidate id |
| `field_id` | string | yes | Parent semantic field |
| `source_id` | string \| null | no | Candidate source |
| `candidate_rank` | integer | yes | Lower is stronger |
| `match_type` | enum | yes | Match class: `exact`, `reference`, `fuzzy`, `generated` |
| `confidence_score` | number | yes | Normalized score |
| `proposed_verbose_name` | string \| null | no | Candidate verbose name |
| `proposed_description` | text \| null | no | Candidate description |
| `proposed_display_format` | string \| null | no | Candidate display format |
| `status` | enum | yes | Candidate lifecycle |
| `created_at` | datetime | yes | Creation timestamp |

#### `CandidateMatchType`
- `exact`
- `reference`
- `fuzzy`
- `generated`

#### `CandidateStatus`
- `proposed`
- `accepted`
- `rejected`
- `superseded`

---

## 5. Imported Filters and Runtime Variables

### Entity: `ImportedFilter`

Represents one recovered or user-supplied filter value.

| Field | Type | Required | Description |
|---|---|---:|---|
| `filter_id` | string (UUID) | yes | Unique filter id |
| `session_id` | string | yes | Parent session |
| `filter_name` | string | yes | Source filter name |
| `display_name` | string \| null | no | User-facing label |
| `raw_value` | json | yes | Original recovered value |
| `raw_value_masked` | boolean | yes | Whether the stored or exposed raw value has been masked/redacted for assistant or LLM-facing use |
| `normalized_value` | json \| null | no | Optional transformed value |
| `source` | enum | yes | Origin of the filter |
| `confidence_state` | enum | yes | Confidence/provenance class |
| `requires_confirmation` | boolean | yes | Whether explicit review is needed |
| `recovery_status` | enum | yes | Recovery completeness |
| `notes` | text \| null | no | Recovery explanation |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `FilterSource`
- `superset_native`
- `superset_url`
- `manual`
- `inferred`

#### `FilterConfidenceState`
- `confirmed`
- `imported`
- `inferred`
- `ai_draft`
- `unresolved`

#### `FilterRecoveryStatus`
- `recovered`
- `partial`
- `missing`
- `conflicted`

### Validation rules
- `raw_value` may be stored for audit and replay, but any context passed into assistant or LLM-facing orchestration must use a masked/redacted representation when the value may contain PII or other sensitive identifiers.
- `raw_value_masked=true` is required whenever the exported assistant context omits or redacts sensitive substrings from the original filter payload.
- Masking policy must preserve enough structure for mapping and clarification, for example key shape, value type, cardinality hints, and non-sensitive tokens.

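
One way to read the masking rules is as a structure-preserving redaction pass over the JSON payload. The sketch below is an assumption-heavy illustration: `mask_value` and `to_assistant_context` are hypothetical helpers, and the policy of treating every string and number as sensitive is deliberately conservative. A real policy would probably allow-list non-sensitive tokens, which this sketch does not attempt. Dictionary keys, list lengths, and value types survive masking, so the assistant still sees key shape and cardinality, and `raw_value_masked` is reported as `True` only when the masked view differs from the original.

```python
from typing import Any


def mask_value(value: Any) -> Any:
    """Return a redacted copy that keeps key shape, types, and list lengths."""
    if isinstance(value, dict):
        # Keys are kept so the payload shape stays visible.
        return {key: mask_value(item) for key, item in value.items()}
    if isinstance(value, list):
        # List length is kept as a cardinality hint.
        return [mask_value(item) for item in value]
    if isinstance(value, str):
        return f"<str:{len(value)}>"
    if isinstance(value, bool) or value is None:
        return value  # assumed non-sensitive
    if isinstance(value, (int, float)):
        return f"<{type(value).__name__}>"
    return "<redacted>"


def to_assistant_context(raw_value: Any) -> dict:
    """Build the assistant-facing view of a filter value."""
    masked = mask_value(raw_value)
    return {"value": masked, "raw_value_masked": masked != raw_value}
```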
---

### Entity: `TemplateVariable`

Represents a runtime variable discovered from dataset execution logic.

| Field | Type | Required | Description |
|---|---|---:|---|
| `variable_id` | string (UUID) | yes | Unique variable id |
| `session_id` | string | yes | Parent session |
| `variable_name` | string | yes | Canonical runtime variable name |
| `expression_source` | text | yes | Raw expression or snippet where variable was found |
| `variable_kind` | enum | yes | Detected variable class |
| `is_required` | boolean | yes | Whether launch requires a mapped value |
| `default_value` | json \| null | no | Optional default |
| `mapping_status` | enum | yes | Current mapping state |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `VariableKind`
- `native_filter`
- `parameter`
- `derived`
- `unknown`

#### `MappingStatus`
- `unmapped`
- `proposed`
- `approved`
- `overridden`
- `invalid`

---

## 6. Mapping Review and Warning Approvals

### Entity: `ExecutionMapping`

Represents the mapping between a recovered filter and a runtime variable.

| Field | Type | Required | Description |
|---|---|---:|---|
| `mapping_id` | string (UUID) | yes | Unique mapping id |
| `session_id` | string | yes | Parent session |
| `filter_id` | string | yes | Source imported filter |
| `variable_id` | string | yes | Target template variable |
| `mapping_method` | enum | yes | How mapping was produced |
| `raw_input_value` | json | yes | Original input |
| `effective_value` | json \| null | no | Value to send to preview/launch |
| `transformation_note` | text \| null | no | Explanation of normalization |
| `warning_level` | enum \| null | no | Warning classification if transformation is risky |
| `requires_explicit_approval` | boolean | yes | Whether launch gate applies |
| `approval_state` | enum | yes | Approval lifecycle |
| `approved_by_user_id` | string \| null | no | Approver if approved |
| `approved_at` | datetime \| null | no | Approval timestamp |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `filter_id + variable_id` must be unique per session unless versioning is used.
- `requires_explicit_approval=true` implies launch is blocked while `approval_state != approved`.
- `effective_value` is required before preview when the target variable is required.
- A user override should set `mapping_method=manual_override`.

#### `MappingMethod`
- `direct_match`
- `heuristic_match`
- `semantic_match`
- `manual_override`

#### `MappingWarningLevel`
- `low`
- `medium`
- `high`

#### `ApprovalState`
- `pending`
- `approved`
- `rejected`
- `not_required`

---

## 7. Clarification Workflow

### Entity: `ClarificationSession`

Stores resumable clarification flow state for one review session.

| Field | Type | Required | Description |
|---|---|---:|---|
| `clarification_session_id` | string (UUID) | yes | Unique clarification session id |
| `session_id` | string | yes | Parent review session |
| `status` | enum | yes | Clarification lifecycle |
| `current_question_id` | string \| null | no | Current active question |
| `resolved_count` | integer | yes | Count of answered/resolved items |
| `remaining_count` | integer | yes | Count of unresolved items |
| `summary_delta` | text \| null | no | Human-readable change summary |
| `started_at` | datetime | yes | Start time |
| `updated_at` | datetime | yes | Last update |
| `completed_at` | datetime \| null | no | End time |

#### `ClarificationStatus`
- `pending`
- `active`
- `paused`
- `completed`
- `cancelled`

---

### Entity: `ClarificationQuestion`

Represents one focused question in the clarification flow.

| Field | Type | Required | Description |
|---|---|---:|---|
| `question_id` | string (UUID) | yes | Unique question id |
| `clarification_session_id` | string | yes | Parent clarification session |
| `topic_ref` | string | yes | Related field/finding/mapping id |
| `question_text` | text | yes | Focused question |
| `why_it_matters` | text | yes | Business significance explanation |
| `current_guess` | text \| null | no | Best guess if available |
| `priority` | integer | yes | Order score |
| `state` | enum | yes | Question lifecycle |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `QuestionState`
- `open`
- `answered`
- `skipped`
- `expert_review`
- `superseded`

---

### Entity: `ClarificationOption`

Suggested selectable answer option for a question.

| Field | Type | Required | Description |
|---|---|---:|---|
| `option_id` | string (UUID) | yes | Unique option id |
| `question_id` | string | yes | Parent question |
| `label` | string | yes | UI label |
| `value` | string | yes | Stored answer payload |
| `is_recommended` | boolean | yes | Whether this is the recommended option |
| `display_order` | integer | yes | UI ordering |

---

### Entity: `ClarificationAnswer`

Stores user response to one clarification question.

| Field | Type | Required | Description |
|---|---|---:|---|
| `answer_id` | string (UUID) | yes | Unique answer id |
| `question_id` | string | yes | Parent question |
| `answer_kind` | enum | yes | How user responded |
| `answer_value` | text \| null | no | Selected/custom answer |
| `answered_by_user_id` | string | yes | Responding user |
| `impact_summary` | text \| null | no | Optional summary of resulting state changes |
| `created_at` | datetime | yes | Answer timestamp |

#### `AnswerKind`
- `selected`
- `custom`
- `skipped`
- `expert_review`

### Validation rules
- Each active question may have at most one current answer.
- `custom` answers require non-empty `answer_value`.
- `selected` answers must correspond to a valid option or normalized payload.
- `expert_review` leaves the related topic unresolved but marked intentionally deferred.

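
The per-answer rules can be expressed as a small validator. This is a sketch under stated assumptions: `validate_answer` is a hypothetical helper, `option_values` stands for the `value` fields of the question's `ClarificationOption` rows, and the "normalized payload" alternative for `selected` answers is left out for brevity. The function returns a list of violations, so an empty list means the answer passes.

```python
from typing import Optional

ANSWER_KINDS = {"selected", "custom", "skipped", "expert_review"}


def validate_answer(
    answer_kind: str,
    answer_value: Optional[str],
    option_values: set[str],
) -> list[str]:
    """Return rule violations for one clarification answer."""
    if answer_kind not in ANSWER_KINDS:
        return [f"unknown answer_kind: {answer_kind}"]
    errors = []
    if answer_kind == "custom" and not (answer_value or "").strip():
        errors.append("custom answers require a non-empty answer_value")
    if answer_kind == "selected" and answer_value not in option_values:
        # Simplification: only exact option values are accepted here.
        errors.append("selected answers must match one of the question's options")
    return errors
```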
---

## 8. Preview and Launch Audit

### Entity: `CompiledPreview`

Stores the exact Superset-returned compiled SQL preview.

| Field | Type | Required | Description |
|---|---|---:|---|
| `preview_id` | string (UUID) | yes | Unique preview id |
| `session_id` | string | yes | Parent session |
| `preview_status` | enum | yes | Preview lifecycle state |
| `compiled_sql` | text \| null | no | Exact compiled SQL if successful |
| `preview_fingerprint` | string | yes | Snapshot hash of mapping/inputs used |
| `compiled_by` | enum | yes | Must be `superset` |
| `error_code` | string \| null | no | Optional failure code |
| `error_details` | text \| null | no | Readable preview error |
| `compiled_at` | datetime \| null | no | Successful compile timestamp |
| `created_at` | datetime | yes | Record creation timestamp |

### Validation rules
- `compiled_by` must be `superset`.
- `compiled_sql` is required when `preview_status=ready`.
- `compiled_sql` must be null when `preview_status=failed` unless partial diagnostics are intentionally stored elsewhere.
- `preview_fingerprint` must be compared against current session inputs before launch.
- Launch requires `preview_status=ready` and matching current fingerprint.

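
The fingerprint comparison can be illustrated with a short sketch. The spec does not fix a hashing scheme, so the SHA-256 over canonical JSON used here is an assumption, and `preview_fingerprint` and `can_launch` are hypothetical helper names. Sorting keys makes the hash independent of key order, so only a real change to filters or parameters invalidates the preview.

```python
import hashlib
import json
from typing import Any


def preview_fingerprint(effective_filters: Any, template_params: Any) -> str:
    """Hash the inputs a preview was compiled from (assumed scheme)."""
    canonical = json.dumps(
        {"filters": effective_filters, "params": template_params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_launch(
    preview_status: str,
    stored_fingerprint: str,
    effective_filters: Any,
    template_params: Any,
) -> bool:
    """Launch needs a ready preview whose fingerprint matches current inputs."""
    current = preview_fingerprint(effective_filters, template_params)
    return preview_status == "ready" and stored_fingerprint == current
```

If a mapping value changes after the preview was compiled, the recomputed fingerprint no longer matches the stored one and `can_launch` returns `False` until a new preview is generated.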
#### `PreviewStatus`
- `pending`
- `ready`
- `failed`
- `stale`

---

### Entity: `DatasetRunContext`

Audited execution snapshot created at launch.

| Field | Type | Required | Description |
|---|---|---:|---|
| `run_context_id` | string (UUID) | yes | Unique run context id |
| `session_id` | string | yes | Parent review session |
| `dataset_ref` | string | yes | Canonical dataset identity |
| `environment_id` | string | yes | Execution environment |
| `preview_id` | string | yes | Bound compiled preview |
| `sql_lab_session_ref` | string | yes | Canonical SQL Lab reference |
| `effective_filters` | json | yes | Final filter payload |
| `template_params` | json | yes | Final template parameter object |
| `approved_mapping_ids` | json array | yes | Explicit approvals used for launch |
| `semantic_decision_refs` | json array | yes | Applied semantic decision references |
| `open_warning_refs` | json array | yes | Warnings that remained visible at launch |
| `launch_status` | enum | yes | Launch outcome |
| `launch_error` | text \| null | no | Error if launch failed |
| `created_at` | datetime | yes | Launch record timestamp |

### Validation rules
- `preview_id` must reference a `CompiledPreview` with `ready` status.
- `sql_lab_session_ref` is mandatory for successful launch.
- `effective_filters` and `template_params` must match the preview fingerprint used.
- `launch_status=started` or `success` requires a non-empty SQL Lab reference.

#### `LaunchStatus`
- `started`
- `success`
- `failed`

---

## 9. Export Projections

### Entity: `ExportArtifact`

Tracks generated exports for sharing documentation and validation outputs.

| Field | Type | Required | Description |
|---|---|---:|---|
| `artifact_id` | string (UUID) | yes | Unique artifact id |
| `session_id` | string | yes | Parent session |
| `artifact_type` | enum | yes | Export type |
| `format` | enum | yes | File/output format |
| `storage_ref` | string | yes | Storage/file reference |
| `created_by_user_id` | string | yes | Requesting user |
| `created_at` | datetime | yes | Artifact creation time |

#### `ArtifactType`
- `documentation`
- `validation_report`
- `run_summary`

#### `ArtifactFormat`
- `json`
- `markdown`
- `csv`
- `pdf`

---

## 10. Relationships

### One-to-one / aggregate-root relationships
- `DatasetReviewSession` → `DatasetProfile` (current active profile view)
- `DatasetReviewSession` → `ClarificationSession` (current or latest)
- `DatasetReviewSession` → `CompiledPreview` (latest/current preview)
- `DatasetReviewSession` → `DatasetRunContext` (latest/current launch audit)

### One-to-many relationships
- `DatasetReviewSession` → many `ValidationFinding`
- `DatasetReviewSession` → many `SemanticSource`
- `DatasetReviewSession` → many `SemanticFieldEntry`
- `SemanticFieldEntry` → many `SemanticCandidate`
- `DatasetReviewSession` → many `ImportedFilter`
- `DatasetReviewSession` → many `TemplateVariable`
- `DatasetReviewSession` → many `ExecutionMapping`
- `ClarificationSession` → many `ClarificationQuestion`
- `ClarificationQuestion` → many `ClarificationOption`
- `ClarificationQuestion` → zero/one current `ClarificationAnswer`
- `DatasetReviewSession` → many `ExportArtifact`
- `DatasetReviewSession` → many `SessionEvent`

---

## 11. Derived Rules and Invariants

### Run readiness invariant
A session is `run_ready` only if:
- no open blocking findings remain,
- all required template variables have approved/effective mappings,
- all launch-sensitive mapping warnings have been explicitly approved,
- a non-stale `CompiledPreview` exists for the current fingerprint.

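
The four readiness conditions compose into a single predicate. The sketch below is illustrative: `ReadinessInputs`, `readiness_gaps`, and `is_run_ready` are hypothetical names, and the counters are assumed to be computed elsewhere from findings, mappings, and the latest preview. Returning the list of unmet conditions, not only a boolean, also gives the UI something to show as a checklist.

```python
from dataclasses import dataclass


@dataclass
class ReadinessInputs:
    open_blocking_findings: int = 0
    required_variables_without_mapping: int = 0
    unapproved_launch_sensitive_warnings: int = 0
    preview_status: str = "pending"
    preview_matches_current_inputs: bool = False


def readiness_gaps(inputs: ReadinessInputs) -> list[str]:
    """List the run-readiness conditions that are not yet met."""
    gaps = []
    if inputs.open_blocking_findings > 0:
        gaps.append("open blocking findings remain")
    if inputs.required_variables_without_mapping > 0:
        gaps.append("required template variables lack a mapping")
    if inputs.unapproved_launch_sensitive_warnings > 0:
        gaps.append("launch-sensitive warnings await approval")
    if inputs.preview_status != "ready" or not inputs.preview_matches_current_inputs:
        gaps.append("no current compiled preview")
    return gaps


def is_run_ready(inputs: ReadinessInputs) -> bool:
    """A session is run_ready only when no gap remains."""
    return not readiness_gaps(inputs)
```

A session with default inputs reports only the missing preview; once a matching `ready` preview exists and the other counters are zero, `is_run_ready` returns `True`.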
### Manual intent invariant
If a field is manually overridden:
- `SemanticFieldEntry.is_locked = true`
- `SemanticFieldEntry.provenance = manual_override`
- later imports or inferred candidates may be recorded, but cannot replace the active value automatically.

### Progressive recovery invariant
Partial Superset recovery must preserve usable state:
- imported filters may be `partial`,
- unresolved variables may remain `unmapped`,
- findings must explain what is still missing,
- session remains resumable.

### Clarification persistence invariant
Clarification answers must be persisted before:
- finding severity is downgraded,
- profile state is updated,
- current question pointer advances.

### Preview truth invariant
Compiled preview must be:
- generated by Superset,
- tied to the exact current effective inputs,
- treated as invalid if mappings/values change afterward.

---

## 12. Migration & Evolution Strategy
- **Baseline**: The initial implementation (Milestone 1) will include the core session and profile entities.
- **Incremental Growth**: Subsequent milestones will add clarification, mapping, and launch audit entities via standard SQLAlchemy migrations.
- **Compatibility**: The `DatasetReviewSession` aggregate root will remain the stable entry point for all sub-entities to ensure forward compatibility with saved user state.

## 13. Suggested Backend DTO Grouping

The future API and persistence layers should group models roughly as follows:

### Session DTOs
- `SessionSummary`
- `SessionDetail`
- `SessionListItem`

`SessionSummary` and `SessionDetail` should both surface the current `version` so frontend workspace state, collaborator actions, and assistant-driven mutations can use the same optimistic-lock boundary.

### Review DTOs
- `DatasetProfileDto`
- `ValidationFindingDto`
- `ReadinessChecklistDto`

### Semantic DTOs
- `SemanticSourceDto`
- `SemanticFieldEntryDto`
- `SemanticCandidateDto`

### Clarification DTOs
- `ClarificationSessionDto`
- `ClarificationQuestionDto`
- `ClarificationAnswerRequest`

### Execution DTOs
- `ImportedFilterDto`
- `TemplateVariableDto`
- `ExecutionMappingDto`
- `CompiledPreviewDto`
- `LaunchSummaryDto`

### Export DTOs
- `ExportArtifactDto`

---

## 14. Open Modeling Notes Resolved

The Phase 0 research questions are considered resolved for design purposes:
- SQL preview is modeled as a first-class persisted artifact.
- SQL Lab is modeled as the only canonical launch target.
- Semantic resolution and clarification are modeled as separate domain boundaries.
- Field-level overrides and mapping approvals are first-class entities.
- Session persistence is separate from task execution state.

This model is ready to drive:
- [`contracts/modules.md`](./contracts/modules.md)
- [`contracts/api.yaml`](./contracts/api.yaml)
- [`quickstart.md`](./quickstart.md)