Tasks ready

2026-03-16 23:11:19 +03:00
parent 493a73827a
commit 9cae07a3b4
24 changed files with 10614 additions and 8733 deletions
--- a/specs/027-dataset-llm-orchestration/data-model.md
+++ b/specs/027-dataset-llm-orchestration/data-model.md
@@ -0,0 +1,764 @@
+# Data Model: LLM Dataset Orchestration
+
+**Feature**: [LLM Dataset Orchestration](./spec.md)  
+**Branch**: `027-dataset-llm-orchestration`  
+**Date**: 2026-03-16
+
+## Overview
+
+This document defines the domain entities, relationships, lifecycle states, and validation rules for the dataset review, semantic enrichment, clarification, preview, and launch workflow described in [`spec.md`](./spec.md) and grounded by the decisions in [`research.md`](./research.md).
+
+The model is intentionally split into:
+- **session aggregate** entities for resumable workflow state,
+- **semantic/provenance** entities for enrichment and conflict handling,
+- **execution** entities for mapping, preview, and launch audit,
+- **export** projections for sharing outputs.
+
+---
+
+## 1. Core Aggregate: DatasetReviewSession
+
+### Entity: `SessionCollaborator`
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `user_id` | string | yes | Collaborating user ID |
+| `role` | enum | yes | `viewer`, `reviewer`, `approver` |
+| `added_at` | datetime | yes | When they were added |
+
+### Entity: `DatasetReviewSession`
+
+Represents the top-level resumable workflow container for one dataset review/execution effort.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `session_id` | string (UUID) | yes | Stable unique identifier for the review session |
+| `user_id` | string | yes | Authenticated user ID of the session owner |
+| `collaborators` | list[SessionCollaborator] | no | Shared access and roles |
+| `environment_id` | string | yes | Superset environment context |
+| `source_kind` | enum | yes | Origin kind: `superset_link`, `dataset_selection` |
+| `source_input` | string | yes | Original link or selected dataset reference |
+| `dataset_ref` | string | yes | Canonical dataset reference used by the feature |
+| `dataset_id` | integer \| null | no | Superset dataset id when resolved |
+| `dashboard_id` | integer \| null | no | Superset dashboard id if imported from dashboard link |
+| `readiness_state` | enum | yes | Current workflow readiness state |
+| `recommended_action` | enum | yes | Explicit next recommended action |
+| `status` | enum | yes | Session lifecycle status |
+| `current_phase` | enum | yes | Active workflow phase |
+| `active_task_id` | string \| null | no | Linked long-running task if one is active |
+| `last_preview_id` | string \| null | no | Most recent preview snapshot |
+| `last_run_context_id` | string \| null | no | Most recent launch audit record |
+| `created_at` | datetime | yes | Session creation timestamp |
+| `updated_at` | datetime | yes | Last mutation timestamp |
+| `last_activity_at` | datetime | yes | Last user/system activity timestamp |
+| `closed_at` | datetime \| null | no | Terminal close/archive timestamp |
+
+### Validation rules
+- `session_id` must be globally unique.
+- `source_input` must be non-empty.
+- `environment_id` must resolve to a configured environment.
+- `readiness_state` and `recommended_action` must always be present.
+- Only the session owner (`user_id`) may perform mutations, unless a collaborator role explicitly permits them.
+- `dataset_id` becomes required before preview or launch phases.
+- `last_preview_id` must refer to a preview generated from the same session.
+
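The validation rules above could be checked roughly as in the following sketch. It covers only two of the rules (non-empty `source_input`, and `dataset_id` required before preview or launch). The `Session` dataclass, `PHASES_REQUIRING_DATASET`, and `validate_session` are hypothetical names used for illustration; only the field names and phase values come from this document.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "preview or launch phases" maps to these two SessionPhase values.
PHASES_REQUIRING_DATASET = {"preview", "launch"}


@dataclass
class Session:
    """Illustrative subset of DatasetReviewSession fields (hypothetical class)."""
    session_id: str
    source_input: str
    current_phase: str
    dataset_id: Optional[int] = None


def validate_session(session: Session) -> list[str]:
    """Return a list of rule violations; the list is empty when the session is valid."""
    errors = []
    if not session.source_input.strip():
        errors.append("source_input must be non-empty")
    if session.current_phase in PHASES_REQUIRING_DATASET and session.dataset_id is None:
        errors.append("dataset_id is required before preview or launch")
    return errors


# Placeholder values, not real links or ids.
intake = Session("s-1", "example-superset-link", "intake")
preview = Session("s-2", "example-superset-link", "preview")
print(validate_session(intake))   # no violations during intake
print(validate_session(preview))  # dataset_id is missing in the preview phase
```

Returning a list of messages rather than raising on the first failure lets a caller surface every violation at once, which fits the findings-oriented model described later in this document.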
+### Enums
+
+#### `SessionStatus`
+- `active`
+- `paused`
+- `completed`
+- `archived`
+- `cancelled`
+
+#### `SessionPhase`
+- `intake`
+- `recovery`
+- `review`
+- `semantic_review`
+- `clarification`
+- `mapping_review`
+- `preview`
+- `launch`
+- `post_run`
+
+#### `ReadinessState`
+- `empty`
+- `importing`
+- `review_ready`
+- `semantic_source_review_needed`
+- `clarification_needed`
+- `clarification_active`
+- `mapping_review_needed`
+- `compiled_preview_ready`
+- `partially_ready`
+- `run_ready`
+- `run_in_progress`
+- `completed`
+- `recovery_required`
+
+#### `RecommendedAction`
+- `import_from_superset`
+- `review_documentation`
+- `apply_semantic_source`
+- `start_clarification`
+- `answer_next_question`
+- `approve_mapping`
+- `generate_sql_preview`
+- `complete_required_values`
+- `launch_dataset`
+- `resume_session`
+- `export_outputs`
+
+---
+
+## 2. Dataset Profile and Review State
+
+### Entity: `DatasetProfile`
+
+Consolidated interpretation of dataset meaning, semantics, filters, assumptions, and readiness.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `profile_id` | string (UUID) | yes | Unique profile id |
+| `session_id` | string | yes | Parent session |
+| `dataset_name` | string | yes | Display dataset name |
+| `schema_name` | string \| null | no | Schema if available |
+| `database_name` | string \| null | no | Database if available |
+| `business_summary` | text | yes | Human-readable summary |
+| `business_summary_source` | enum | yes | Provenance of summary |
+| `description` | text \| null | no | Dataset-level description |
+| `dataset_type` | enum \| null | no | `table`, `virtual`, `sqllab_view`, `unknown` |
+| `is_sqllab_view` | boolean | yes | Whether dataset is SQL Lab derived |
+| `completeness_score` | number \| null | no | Optional normalized completeness score |
+| `confidence_state` | enum | yes | Overall confidence posture |
+| `has_blocking_findings` | boolean | yes | Derived summary flag |
+| `has_warning_findings` | boolean | yes | Derived summary flag |
+| `manual_summary_locked` | boolean | yes | Protects user-entered summary |
+| `created_at` | datetime | yes | Created timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+### Validation rules
+- `business_summary` must always contain a usable string; if weak, it may be skeletal but not null.
+- `manual_summary_locked=true` prevents later automatic overwrite.
+- `session_id` must be unique when only one active profile snapshot is stored per session; if snapshots are retained, profiles must be versioned instead.
+- `confidence_state` must reflect the highest unresolved-risk posture, not the most optimistic confidence level.
+
+#### `BusinessSummarySource`
+- `confirmed`
+- `imported`
+- `inferred`
+- `ai_draft`
+- `manual_override`
+
+#### `ConfidenceState`
+- `confirmed`
+- `mostly_confirmed`
+- `mixed`
+- `low_confidence`
+- `unresolved`
+
+---
+
+## 3. Validation Findings
+
+### Entity: `ValidationFinding`
+
+Represents a blocking issue, warning, or informational observation.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `finding_id` | string (UUID) | yes | Unique finding id |
+| `session_id` | string | yes | Parent session |
+| `area` | enum | yes | Affected domain area |
+| `severity` | enum | yes | `blocking`, `warning`, `informational` |
+| `code` | string | yes | Stable machine-readable finding code |
+| `title` | string | yes | Short label |
+| `message` | text | yes | Actionable human-readable explanation |
+| `resolution_state` | enum | yes | Current resolution status |
+| `resolution_note` | text \| null | no | Optional explanation or approval note |
+| `caused_by_ref` | string \| null | no | Related field/filter/mapping/question id |
+| `created_at` | datetime | yes | Creation timestamp |
+| `resolved_at` | datetime \| null | no | Resolution timestamp |
+
+### Validation rules
+- `severity` must be one of the allowed values.
+- `resolution_state=resolved` or `resolution_state=approved` requires either a system resolution event or a user action.
+- `launch` is blocked if any open `blocking` finding remains.
+- `warning` findings tied to mapping transformations require explicit approval before launch if marked launch-sensitive.
+
+#### `FindingArea`
+- `source_intake`
+- `dataset_profile`
+- `semantic_enrichment`
+- `clarification`
+- `filter_recovery`
+- `template_mapping`
+- `compiled_preview`
+- `launch`
+- `audit`
+
+#### `ResolutionState`
+- `open`
+- `resolved`
+- `approved`
+- `skipped`
+- `deferred`
+- `expert_review`
+
+---
+
+## 4. Semantic Source and Field Decisions
+
+### Entity: `SemanticSource`
+
+Represents a trusted or candidate source of semantic metadata.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `source_id` | string (UUID) | yes | Unique source id |
+| `session_id` | string | yes | Parent session |
+| `source_type` | enum | yes | Origin kind |
+| `source_ref` | string | yes | External reference, dataset ref, or uploaded artifact ref |
+| `source_version` | string | yes | Version or snapshot identifier used for propagation tracking |
+| `display_name` | string | yes | Human-readable source name |
+| `trust_level` | enum | yes | Source trust tier |
+| `schema_overlap_score` | number \| null | no | Optional overlap signal |
+| `status` | enum | yes | Availability/applicability status |
+| `created_at` | datetime | yes | Creation timestamp |
+
+#### `SemanticSourceType`
+- `uploaded_file`
+- `connected_dictionary`
+- `reference_dataset`
+- `neighbor_dataset`
+- `ai_generated`
+
+#### `TrustLevel`
+- `trusted`
+- `recommended`
+- `candidate`
+- `generated`
+
+#### `SemanticSourceStatus`
+- `available`
+- `selected`
+- `applied`
+- `rejected`
+- `partial`
+- `failed`
+
+---
+
+### Entity: `SemanticFieldEntry`
+
+Canonical semantic state for one dataset field or metric.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `field_id` | string (UUID) | yes | Unique field semantic id |
+| `session_id` | string | yes | Parent session |
+| `field_name` | string | yes | Physical field/metric name |
+| `field_kind` | enum | yes | `column`, `metric`, `filter_dimension`, `parameter` |
+| `verbose_name` | string \| null | no | Display label |
+| `description` | text \| null | no | Human-readable description |
+| `display_format` | string \| null | no | Formatting metadata such as d3 format |
+| `provenance` | enum | yes | Final chosen source class |
+| `source_id` | string \| null | no | Winning source |
+| `confidence_rank` | integer \| null | no | Final applied ranking |
+| `is_locked` | boolean | yes | Manual override protection |
+| `has_conflict` | boolean | yes | Whether competing candidates exist |
+| `needs_review` | boolean | yes | Whether user review is still needed |
+| `last_changed_by` | enum | yes | `system`, `user`, `agent` |
+| `user_feedback` | enum \| null | no | User feedback: `up` or `down`; null when no feedback was given |
+| `created_at` | datetime | yes | Creation timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+### Validation rules
+- `field_name` must be unique per `session_id + field_kind`.
+- `is_locked=true` prevents automatic overwrite.
+- `provenance=manual_override` implies `is_locked=true`.
+- `has_conflict=true` requires at least one competing candidate record.
+- Applied fuzzy or inferred values must keep `needs_review=true` until confirmed, if policy requires explicit review.
+
+#### `FieldKind`
+- `column`
+- `metric`
+- `filter_dimension`
+- `parameter`
+
+#### `FieldProvenance`
+- `dictionary_exact`
+- `reference_imported`
+- `fuzzy_inferred`
+- `ai_generated`
+- `manual_override`
+- `unresolved`
+
+---
+
+### Entity: `SemanticCandidate`
+
+Stores competing candidate values before or alongside the final field decision.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `candidate_id` | string (UUID) | yes | Unique candidate id |
+| `field_id` | string | yes | Parent semantic field |
+| `source_id` | string \| null | no | Candidate source |
+| `candidate_rank` | integer | yes | Lower is stronger |
+| `match_type` | enum | yes | `exact`, `reference`, `fuzzy`, `generated` |
+| `confidence_score` | number | yes | Normalized score |
+| `proposed_verbose_name` | string \| null | no | Candidate verbose name |
+| `proposed_description` | text \| null | no | Candidate description |
+| `proposed_display_format` | string \| null | no | Candidate display format |
+| `status` | enum | yes | Candidate lifecycle |
+| `created_at` | datetime | yes | Creation timestamp |
+
+#### `CandidateMatchType`
+- `exact`
+- `reference`
+- `fuzzy`
+- `generated`
+
+#### `CandidateStatus`
+- `proposed`
+- `accepted`
+- `rejected`
+- `superseded`
+
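As a rough illustration of how candidates and field entries interact, the sketch below applies the strongest proposed candidate (lowest `candidate_rank`) to a field unless the field is locked. The `Candidate` and `FieldEntry` dataclasses and the `apply_best_candidate` helper are hypothetical; the document does not prescribe a selection algorithm, so choosing purely by rank is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Illustrative subset of SemanticCandidate (hypothetical class)."""
    candidate_id: str
    candidate_rank: int  # lower is stronger, per the table above
    proposed_verbose_name: Optional[str]
    status: str = "proposed"


@dataclass
class FieldEntry:
    """Illustrative subset of SemanticFieldEntry (hypothetical class)."""
    field_name: str
    verbose_name: Optional[str] = None
    is_locked: bool = False


def apply_best_candidate(entry: FieldEntry, candidates: list[Candidate]) -> Optional[Candidate]:
    """Apply the strongest proposed candidate unless the field is locked."""
    proposed = [c for c in candidates if c.status == "proposed"]
    if entry.is_locked or not proposed:
        # Locked fields keep their value; candidates remain recorded as proposed.
        return None
    best = min(proposed, key=lambda c: c.candidate_rank)
    for c in proposed:
        c.status = "accepted" if c is best else "superseded"
    entry.verbose_name = best.proposed_verbose_name
    return best


entry = FieldEntry("order_amt")
cands = [Candidate("c1", 2, "Order amount (fuzzy)"), Candidate("c2", 1, "Order Amount")]
apply_best_candidate(entry, cands)
print(entry.verbose_name)  # Order Amount
```

Leaving candidates untouched for locked fields mirrors the rule that `is_locked=true` prevents automatic overwrite while still allowing later candidates to be recorded.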
+---
+
+## 5. Imported Filters and Runtime Variables
+
+### Entity: `ImportedFilter`
+
+Represents one recovered or user-supplied filter value.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `filter_id` | string (UUID) | yes | Unique filter id |
+| `session_id` | string | yes | Parent session |
+| `filter_name` | string | yes | Source filter name |
+| `display_name` | string \| null | no | User-facing label |
+| `raw_value` | json | yes | Original recovered value |
+| `normalized_value` | json \| null | no | Optional transformed value |
+| `source` | enum | yes | Origin of the filter |
+| `confidence_state` | enum | yes | Confidence/provenance class |
+| `requires_confirmation` | boolean | yes | Whether explicit review is needed |
+| `recovery_status` | enum | yes | Recovery completeness |
+| `notes` | text \| null | no | Recovery explanation |
+| `created_at` | datetime | yes | Creation timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+#### `FilterSource`
+- `superset_native`
+- `superset_url`
+- `manual`
+- `inferred`
+
+#### `FilterConfidenceState`
+- `confirmed`
+- `imported`
+- `inferred`
+- `ai_draft`
+- `unresolved`
+
+#### `FilterRecoveryStatus`
+- `recovered`
+- `partial`
+- `missing`
+- `conflicted`
+
+---
+
+### Entity: `TemplateVariable`
+
+Represents a runtime variable discovered from dataset execution logic.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `variable_id` | string (UUID) | yes | Unique variable id |
+| `session_id` | string | yes | Parent session |
+| `variable_name` | string | yes | Canonical runtime variable name |
+| `expression_source` | text | yes | Raw expression or snippet where variable was found |
+| `variable_kind` | enum | yes | Detected variable class |
+| `is_required` | boolean | yes | Whether launch requires a mapped value |
+| `default_value` | json \| null | no | Optional default |
+| `mapping_status` | enum | yes | Current mapping state |
+| `created_at` | datetime | yes | Creation timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+#### `VariableKind`
+- `native_filter`
+- `parameter`
+- `derived`
+- `unknown`
+
+#### `MappingStatus`
+- `unmapped`
+- `proposed`
+- `approved`
+- `overridden`
+- `invalid`
+
+---
+
+## 6. Mapping Review and Warning Approvals
+
+### Entity: `ExecutionMapping`
+
+Represents the mapping between a recovered filter and a runtime variable.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `mapping_id` | string (UUID) | yes | Unique mapping id |
+| `session_id` | string | yes | Parent session |
+| `filter_id` | string | yes | Source imported filter |
+| `variable_id` | string | yes | Target template variable |
+| `mapping_method` | enum | yes | How mapping was produced |
+| `raw_input_value` | json | yes | Original input |
+| `effective_value` | json \| null | no | Value to send to preview/launch |
+| `transformation_note` | text \| null | no | Explanation of normalization |
+| `warning_level` | enum \| null | no | Warning classification if transformation is risky |
+| `requires_explicit_approval` | boolean | yes | Whether launch gate applies |
+| `approval_state` | enum | yes | Approval lifecycle |
+| `approved_by_user_id` | string \| null | no | Approver if approved |
+| `approved_at` | datetime \| null | no | Approval timestamp |
+| `created_at` | datetime | yes | Creation timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+### Validation rules
+- `filter_id + variable_id` must be unique per session unless versioning is used.
+- `requires_explicit_approval=true` implies launch is blocked while `approval_state != approved`.
+- `effective_value` is required before preview when the target variable is required (`is_required=true`).
+- A user override should set `mapping_method=manual_override`.
+
+#### `MappingMethod`
+- `direct_match`
+- `heuristic_match`
+- `semantic_match`
+- `manual_override`
+
+#### `MappingWarningLevel`
+- `low`
+- `medium`
+- `high`
+
+#### `ApprovalState`
+- `pending`
+- `approved`
+- `rejected`
+- `not_required`
+
+---
+
+## 7. Clarification Workflow
+
+### Entity: `ClarificationSession`
+
+Stores resumable clarification flow state for one review session.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `clarification_session_id` | string (UUID) | yes | Unique clarification session id |
+| `session_id` | string | yes | Parent review session |
+| `status` | enum | yes | Clarification lifecycle |
+| `current_question_id` | string \| null | no | Current active question |
+| `resolved_count` | integer | yes | Count of answered/resolved items |
+| `remaining_count` | integer | yes | Count of unresolved items |
+| `summary_delta` | text \| null | no | Human-readable change summary |
+| `started_at` | datetime | yes | Start time |
+| `updated_at` | datetime | yes | Last update |
+| `completed_at` | datetime \| null | no | End time |
+
+#### `ClarificationStatus`
+- `pending`
+- `active`
+- `paused`
+- `completed`
+- `cancelled`
+
+---
+
+### Entity: `ClarificationQuestion`
+
+Represents one focused question in the clarification flow.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `question_id` | string (UUID) | yes | Unique question id |
+| `clarification_session_id` | string | yes | Parent clarification session |
+| `topic_ref` | string | yes | Related field/finding/mapping id |
+| `question_text` | text | yes | Focused question |
+| `why_it_matters` | text | yes | Business significance explanation |
+| `current_guess` | text \| null | no | Best guess if available |
+| `priority` | integer | yes | Order score |
+| `state` | enum | yes | Question lifecycle |
+| `created_at` | datetime | yes | Creation timestamp |
+| `updated_at` | datetime | yes | Updated timestamp |
+
+#### `QuestionState`
+- `open`
+- `answered`
+- `skipped`
+- `expert_review`
+- `superseded`
+
+---
+
+### Entity: `ClarificationOption`
+
+Suggested selectable answer option for a question.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `option_id` | string (UUID) | yes | Unique option id |
+| `question_id` | string | yes | Parent question |
+| `label` | string | yes | UI label |
+| `value` | string | yes | Stored answer payload |
+| `is_recommended` | boolean | yes | Whether this is the recommended option |
+| `display_order` | integer | yes | UI ordering |
+
+---
+
+### Entity: `ClarificationAnswer`
+
+Stores user response to one clarification question.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `answer_id` | string (UUID) | yes | Unique answer id |
+| `question_id` | string | yes | Parent question |
+| `answer_kind` | enum | yes | How user responded |
+| `answer_value` | text \| null | no | Selected/custom answer |
+| `answered_by_user_id` | string | yes | Responding user |
+| `impact_summary` | text \| null | no | Optional summary of resulting state changes |
+| `created_at` | datetime | yes | Answer timestamp |
+
+#### `AnswerKind`
+- `selected`
+- `custom`
+- `skipped`
+- `expert_review`
+
+### Validation rules
+- Each active question may have at most one current answer.
+- `custom` answers require non-empty `answer_value`.
+- `selected` answers must correspond to a valid option or normalized payload.
+- `expert_review` leaves the related topic unresolved but marked intentionally deferred.
+
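The answer rules above can be sketched as a small validator. `validate_answer` and its parameters are hypothetical names; the sketch checks `selected` answers only against option values and leaves out the "normalized payload" alternative, which this document does not define in detail.

```python
from typing import Optional


def validate_answer(
    answer_kind: str,
    answer_value: Optional[str],
    option_values: set[str],
) -> Optional[str]:
    """Return an error message, or None when the answer satisfies the rules."""
    if answer_kind == "custom":
        # Custom answers require a non-empty answer_value.
        if not (answer_value or "").strip():
            return "custom answers require a non-empty answer_value"
    elif answer_kind == "selected":
        # Selected answers must correspond to one of the question's options.
        if answer_value not in option_values:
            return "selected answers must match a valid option value"
    elif answer_kind not in {"skipped", "expert_review"}:
        return f"unknown answer_kind: {answer_kind}"
    return None


options = {"daily", "weekly"}
print(validate_answer("selected", "daily", options))    # None
print(validate_answer("custom", "   ", options))        # error message
print(validate_answer("expert_review", None, options))  # None
```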
+---
+
+## 8. Preview and Launch Audit
+
+### Entity: `CompiledPreview`
+
+Stores the exact Superset-returned compiled SQL preview.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `preview_id` | string (UUID) | yes | Unique preview id |
+| `session_id` | string | yes | Parent session |
+| `preview_status` | enum | yes | Preview lifecycle state |
+| `compiled_sql` | text \| null | no | Exact compiled SQL if successful |
+| `preview_fingerprint` | string | yes | Snapshot hash of mapping/inputs used |
+| `compiled_by` | enum | yes | Must be `superset` |
+| `error_code` | string \| null | no | Optional failure code |
+| `error_details` | text \| null | no | Readable preview error |
+| `compiled_at` | datetime \| null | no | Successful compile timestamp |
+| `created_at` | datetime | yes | Record creation timestamp |
+
+### Validation rules
+- `compiled_by` must be `superset`.
+- `compiled_sql` is required when `preview_status=ready`.
+- `compiled_sql` must be null when `preview_status=failed` unless partial diagnostics are intentionally stored elsewhere.
+- `preview_fingerprint` must be compared against current session inputs before launch.
+- Launch requires `preview_status=ready` and matching current fingerprint.
+
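One possible way to implement the fingerprint comparison is sketched below. The document only says that `preview_fingerprint` is a snapshot hash of the mapping and inputs; using SHA-256 over canonical JSON, and the helper names `preview_fingerprint` and `can_launch`, are assumptions made for illustration.

```python
import hashlib
import json


def preview_fingerprint(effective_filters: dict, template_params: dict) -> str:
    """Hash the launch inputs so that key order does not affect the result."""
    canonical = json.dumps(
        {"effective_filters": effective_filters, "template_params": template_params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_launch(preview_status: str, stored_fingerprint: str, current_fingerprint: str) -> bool:
    """Launch requires a ready preview whose fingerprint matches the current inputs."""
    return preview_status == "ready" and stored_fingerprint == current_fingerprint


# Example values are invented for the demonstration.
stored = preview_fingerprint({"region": ["EU"]}, {"start_date": "2026-01-01"})
same = preview_fingerprint({"region": ["EU"]}, {"start_date": "2026-01-01"})
changed = preview_fingerprint({"region": ["US"]}, {"start_date": "2026-01-01"})
print(can_launch("ready", stored, same))     # True
print(can_launch("ready", stored, changed))  # False
```

Serializing with `sort_keys=True` keeps the hash stable when the same inputs arrive with a different key order, so only a real change in filters or parameters invalidates the preview.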
+#### `PreviewStatus`
+- `pending`
+- `ready`
+- `failed`
+- `stale`
+
+---
+
+### Entity: `DatasetRunContext`
+
+Audited execution snapshot created at launch.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `run_context_id` | string (UUID) | yes | Unique run context id |
+| `session_id` | string | yes | Parent review session |
+| `dataset_ref` | string | yes | Canonical dataset identity |
+| `environment_id` | string | yes | Execution environment |
+| `preview_id` | string | yes | Bound compiled preview |
+| `sql_lab_session_ref` | string | yes | Canonical SQL Lab reference |
+| `effective_filters` | json | yes | Final filter payload |
+| `template_params` | json | yes | Final template parameter object |
+| `approved_mapping_ids` | json array | yes | Explicit approvals used for launch |
+| `semantic_decision_refs` | json array | yes | Applied semantic decision references |
+| `open_warning_refs` | json array | yes | Warnings that remained visible at launch |
+| `launch_status` | enum | yes | Launch outcome |
+| `launch_error` | text \| null | no | Error if launch failed |
+| `created_at` | datetime | yes | Launch record timestamp |
+
+### Validation rules
+- `preview_id` must reference a `CompiledPreview` with `ready` status.
+- `sql_lab_session_ref` is mandatory for successful launch.
+- `effective_filters` and `template_params` must match the preview fingerprint used.
+- `launch_status=started` or `success` requires a non-empty SQL Lab reference.
+
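A minimal sketch of two of the run-context rules above, assuming the caller has already loaded the status of the bound preview. `validate_run_context` is a hypothetical helper; the fingerprint comparison for `effective_filters` and `template_params` is not repeated here.

```python
from typing import Optional


def validate_run_context(
    launch_status: str,
    sql_lab_session_ref: Optional[str],
    preview_status: str,
) -> list[str]:
    """Return rule violations for a DatasetRunContext record (empty when valid)."""
    errors = []
    if preview_status != "ready":
        errors.append("preview_id must reference a CompiledPreview with ready status")
    if launch_status in {"started", "success"} and not (sql_lab_session_ref or "").strip():
        errors.append("started/success launches require a non-empty sql_lab_session_ref")
    return errors


# "sqllab-ref-1" is a placeholder reference, not a real Superset identifier.
print(validate_run_context("success", "sqllab-ref-1", "ready"))  # []
print(validate_run_context("started", "", "stale"))              # two violations
```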
+#### `LaunchStatus`
+- `started`
+- `success`
+- `failed`
+
+---
+
+## 9. Export Projections
+
+### Entity: `ExportArtifact`
+
+Tracks generated exports for sharing documentation and validation outputs.
+
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `artifact_id` | string (UUID) | yes | Unique artifact id |
+| `session_id` | string | yes | Parent session |
+| `artifact_type` | enum | yes | Export type |
+| `format` | enum | yes | File/output format |
+| `storage_ref` | string | yes | Storage/file reference |
+| `created_by_user_id` | string | yes | Requesting user |
+| `created_at` | datetime | yes | Artifact creation time |
+
+#### `ArtifactType`
+- `documentation`
+- `validation_report`
+- `run_summary`
+
+#### `ArtifactFormat`
+- `json`
+- `markdown`
+- `csv`
+- `pdf`
+
+---
+
+## 10. Relationships
+
+### One-to-one / aggregate-root relationships
+- `DatasetReviewSession` → `DatasetProfile` (current active profile view)
+- `DatasetReviewSession` → `ClarificationSession` (current or latest)
+- `DatasetReviewSession` → `CompiledPreview` (latest/current preview)
+- `DatasetReviewSession` → `DatasetRunContext` (latest/current launch audit)
+
+### One-to-many relationships
+- `DatasetReviewSession` → many `ValidationFinding`
+- `DatasetReviewSession` → many `SemanticSource`
+- `DatasetReviewSession` → many `SemanticFieldEntry`
+- `SemanticFieldEntry` → many `SemanticCandidate`
+- `DatasetReviewSession` → many `ImportedFilter`
+- `DatasetReviewSession` → many `TemplateVariable`
+- `DatasetReviewSession` → many `ExecutionMapping`
+- `ClarificationSession` → many `ClarificationQuestion`
+- `ClarificationQuestion` → many `ClarificationOption`
+- `ClarificationQuestion` → zero/one current `ClarificationAnswer`
+- `DatasetReviewSession` → many `ExportArtifact`
+- `DatasetReviewSession` → many `SessionEvent`
+
+---
+
+## 11. Derived Rules and Invariants
+
+### Run readiness invariant
+A session is `run_ready` only if:
+- no open blocking findings remain,
+- all required template variables have approved/effective mappings,
+- all launch-sensitive mapping warnings have been explicitly approved,
+- a non-stale `CompiledPreview` exists for the current fingerprint.
+
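The four conditions of the run readiness invariant could be combined into a single predicate, as in the sketch below. The dataclasses and `is_run_ready` are hypothetical names. Two assumptions are made: only findings with `resolution_state == "open"` count as open, and launch-sensitive mapping warnings are represented by `requires_explicit_approval` on the mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    severity: str          # blocking | warning | informational
    resolution_state: str  # open | resolved | approved | ...


@dataclass
class Mapping:
    variable_id: str
    effective_value: object
    requires_explicit_approval: bool
    approval_state: str    # pending | approved | rejected | not_required


@dataclass
class Preview:
    preview_status: str
    preview_fingerprint: str


def is_run_ready(
    findings: list[Finding],
    required_variable_ids: set[str],
    mappings: list[Mapping],
    preview: Optional[Preview],
    current_fingerprint: str,
) -> bool:
    # 1. No open blocking findings remain.
    if any(f.severity == "blocking" and f.resolution_state == "open" for f in findings):
        return False
    # 2. Every required variable has a mapping with an effective value.
    mapped = {m.variable_id for m in mappings if m.effective_value is not None}
    if not required_variable_ids <= mapped:
        return False
    # 3. Mappings gated by explicit approval have been approved.
    if any(m.requires_explicit_approval and m.approval_state != "approved" for m in mappings):
        return False
    # 4. A ready preview exists for the current fingerprint.
    return (
        preview is not None
        and preview.preview_status == "ready"
        and preview.preview_fingerprint == current_fingerprint
    )


mappings = [Mapping("v1", "2026-01-01", True, "approved")]
preview = Preview("ready", "abc")
print(is_run_ready([], {"v1"}, mappings, preview, "abc"))  # True
print(is_run_ready([], {"v1"}, mappings, preview, "xyz"))  # False
```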
+### Manual intent invariant
+If a field is manually overridden:
+- `SemanticFieldEntry.is_locked = true`
+- `SemanticFieldEntry.provenance = manual_override`
+- later imports or inferred candidates may be recorded, but cannot replace the active value automatically.
+
+### Progressive recovery invariant
+Partial Superset recovery must preserve usable state:
+- imported filters may be `partial`,
+- unresolved variables may remain `unmapped`,
+- findings must explain what is still missing,
+- session remains resumable.
+
+### Clarification persistence invariant
+Clarification answers must be persisted before:
+- finding severity is downgraded,
+- profile state is updated,
+- current question pointer advances.
+
+### Preview truth invariant
+Compiled preview must be:
+- generated by Superset,
+- tied to the exact current effective inputs,
+- treated as invalid if mappings/values change afterward.
+
+---
+
+## 12. Migration & Evolution Strategy
+- **Baseline**: The initial implementation (Milestone 1) will include the core session and profile entities.
+- **Incremental Growth**: Subsequent milestones will add clarification, mapping, and launch audit entities via standard SQLAlchemy migrations.
+- **Compatibility**: The `DatasetReviewSession` aggregate root will remain the stable entry point for all sub-entities to ensure forward compatibility with saved user state.
+
+## 13. Suggested Backend DTO Grouping
+
+The future API and persistence layers should group models roughly as follows:
+
+### Session DTOs
+- `SessionSummary`
+- `SessionDetail`
+- `SessionListItem`
+
+### Review DTOs
+- `DatasetProfileDto`
+- `ValidationFindingDto`
+- `ReadinessChecklistDto`
+
+### Semantic DTOs
+- `SemanticSourceDto`
+- `SemanticFieldEntryDto`
+- `SemanticCandidateDto`
+
+### Clarification DTOs
+- `ClarificationSessionDto`
+- `ClarificationQuestionDto`
+- `ClarificationAnswerRequest`
+
+### Execution DTOs
+- `ImportedFilterDto`
+- `TemplateVariableDto`
+- `ExecutionMappingDto`
+- `CompiledPreviewDto`
+- `LaunchSummaryDto`
+
+### Export DTOs
+- `ExportArtifactDto`
+
+---
+
+## 14. Open Modeling Notes Resolved
+
+The Phase 0 research questions are considered resolved for design purposes:
+- SQL preview is modeled as a first-class persisted artifact.
+- SQL Lab is modeled as the only canonical launch target.
+- semantic resolution and clarification are modeled as separate domain boundaries.
+- field-level overrides and mapping approvals are first-class entities.
+- session persistence is separate from task execution state.
+
+This model is ready to drive:
+- [`contracts/modules.md`](./contracts/modules.md)
+- [`contracts/api.yaml`](./contracts/api.yaml)
+- [`quickstart.md`](./quickstart.md)