# Data Model: LLM Dataset Orchestration

**Feature**: [LLM Dataset Orchestration](./spec.md)
**Branch**: `027-dataset-llm-orchestration`
**Date**: 2026-03-16

## Overview

This document defines the domain entities, relationships, lifecycle states, and validation rules for the dataset review, semantic enrichment, clarification, preview, and launch workflow described in [`spec.md`](./spec.md) and grounded by the decisions in [`research.md`](./research.md).

The model is intentionally split into:
- **session aggregate** entities for resumable workflow state,
- **semantic/provenance** entities for enrichment and conflict handling,
- **execution** entities for mapping, preview, and launch audit,
- **export** projections for sharing outputs.

---

## 1. Core Aggregate: DatasetReviewSession

### Entity: `SessionCollaborator`

| Field | Type | Required | Description |
|---|---|---:|---|
| `user_id` | string | yes | Collaborating user ID |
| `role` | enum | yes | `viewer`, `reviewer`, `approver` |
| `added_at` | datetime | yes | When they were added |

### Entity: `DatasetReviewSession`

Represents the top-level resumable workflow container for one dataset review/execution effort.

| Field | Type | Required | Description |
|---|---|---:|---|
| `session_id` | string (UUID) | yes | Stable unique identifier for the review session |
| `user_id` | string | yes | Authenticated User ID of the session owner |
| `collaborators` | list[SessionCollaborator] | no | Shared access and roles |
| `environment_id` | string | yes | Superset environment context |
| `source_kind` | enum | yes | Origin kind: `superset_link`, `dataset_selection` |
| `source_input` | string | yes | Original link or selected dataset reference |
| `dataset_ref` | string | yes | Canonical dataset reference used by the feature |
| `dataset_id` | integer \| null | no | Superset dataset id when resolved |
| `dashboard_id` | integer \| null | no | Superset dashboard id if imported from dashboard link |
| `readiness_state` | enum | yes | Current workflow readiness state |
| `recommended_action` | enum | yes | Explicit next recommended action |
| `version` | integer | yes | Optimistic-lock version incremented on every persisted session mutation |
| `status` | enum | yes | Session lifecycle status |
| `current_phase` | enum | yes | Active workflow phase |
| `active_task_id` | string \| null | no | Linked long-running task if one is active |
| `last_preview_id` | string \| null | no | Most recent preview snapshot |
| `last_run_context_id` | string \| null | no | Most recent launch audit record |
| `created_at` | datetime | yes | Session creation timestamp |
| `updated_at` | datetime | yes | Last mutation timestamp |
| `last_activity_at` | datetime | yes | Last user/system activity timestamp |
| `closed_at` | datetime \| null | no | Terminal close/archive timestamp |

### Validation rules
- `session_id` must be globally unique.
- `source_input` must be non-empty.
- `environment_id` must resolve to a configured environment.
- `readiness_state` and `recommended_action` must always be present.
- `version` starts at `0` on session creation and increments monotonically after every successful session mutation.
- `user_id` ownership must be enforced for all mutations, unless collaborator roles allow otherwise.
- `dataset_id` becomes required before preview or launch phases.
- `last_preview_id` must refer to a preview generated from the same session.
- Mutating requests must include the caller's last observed session version; mismatches are rejected as optimistic-lock conflicts rather than silently merged.
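
The version rules above can be sketched as a compare-then-increment check. This is a minimal in-memory illustration under stated assumptions, not the persistence implementation: `SessionRecord`, `VersionConflict`, and `apply_mutation` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Hypothetical error type for an optimistic-lock mismatch."""


@dataclass
class SessionRecord:
    session_id: str
    version: int = 0  # starts at 0 on session creation
    recommended_action: str = "import_from_superset"


def apply_mutation(session: SessionRecord, observed_version: int, **changes) -> SessionRecord:
    """Apply changes only if the caller observed the latest version, then bump it."""
    if observed_version != session.version:
        # Reject instead of silently merging stale writes.
        raise VersionConflict(
            f"stored version is {session.version}, caller observed {observed_version}"
        )
    for name, value in changes.items():
        setattr(session, name, value)
    session.version += 1  # one increment per successful mutation
    return session
```

In a SQL-backed store, the same check is commonly expressed as an `UPDATE ... WHERE version = :observed` statement that treats zero affected rows as a conflict.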

### Enums

#### `SessionStatus`
- `active`
- `paused`
- `completed`
- `archived`
- `cancelled`

#### `SessionPhase`
- `intake`
- `recovery`
- `review`
- `semantic_review`
- `clarification`
- `mapping_review`
- `preview`
- `launch`
- `post_run`

#### `ReadinessState`
- `empty`
- `importing`
- `review_ready`
- `semantic_source_review_needed`
- `clarification_needed`
- `clarification_active`
- `mapping_review_needed`
- `compiled_preview_ready`
- `partially_ready`
- `run_ready`
- `run_in_progress`
- `completed`
- `recovery_required`

#### `RecommendedAction`
- `import_from_superset`
- `review_documentation`
- `apply_semantic_source`
- `start_clarification`
- `answer_next_question`
- `approve_mapping`
- `generate_sql_preview`
- `complete_required_values`
- `launch_dataset`
- `resume_session`
- `export_outputs`

---

## 2. Dataset Profile and Review State

### Entity: `DatasetProfile`

Consolidated interpretation of dataset meaning, semantics, filters, assumptions, and readiness.

| Field | Type | Required | Description |
|---|---|---:|---|
| `profile_id` | string (UUID) | yes | Unique profile id |
| `session_id` | string | yes | Parent session |
| `dataset_name` | string | yes | Display dataset name |
| `schema_name` | string \| null | no | Schema if available |
| `database_name` | string \| null | no | Database if available |
| `business_summary` | text | yes | Human-readable summary |
| `business_summary_source` | enum | yes | Provenance of summary |
| `description` | text \| null | no | Dataset-level description |
| `dataset_type` | enum \| null | no | `table`, `virtual`, `sqllab_view`, `unknown` |
| `is_sqllab_view` | boolean | yes | Whether dataset is SQL Lab derived |
| `completeness_score` | number \| null | no | Optional normalized completeness score |
| `confidence_state` | enum | yes | Overall confidence posture |
| `has_blocking_findings` | boolean | yes | Derived summary flag |
| `has_warning_findings` | boolean | yes | Derived summary flag |
| `manual_summary_locked` | boolean | yes | Protects user-entered summary |
| `created_at` | datetime | yes | Created timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `business_summary` must always contain a usable string; if weak, it may be skeletal but not null.
- `manual_summary_locked=true` prevents later automatic overwrite.
- `session_id` must be unique if only one active profile snapshot is stored per session, or versioned if snapshots are retained.
- `confidence_state` must reflect highest unresolved-risk posture, not just optimistic confidence.

#### `BusinessSummarySource`
- `confirmed`
- `imported`
- `inferred`
- `ai_draft`
- `manual_override`

#### `ConfidenceState`
- `confirmed`
- `mostly_confirmed`
- `mixed`
- `low_confidence`
- `unresolved`

---

## 3. Validation Findings

### Entity: `ValidationFinding`

Represents a blocking issue, warning, or informational observation.

| Field | Type | Required | Description |
|---|---|---:|---|
| `finding_id` | string (UUID) | yes | Unique finding id |
| `session_id` | string | yes | Parent session |
| `area` | enum | yes | Affected domain area |
| `severity` | enum | yes | `blocking`, `warning`, `informational` |
| `code` | string | yes | Stable machine-readable finding code |
| `title` | string | yes | Short label |
| `message` | text | yes | Actionable human-readable explanation |
| `resolution_state` | enum | yes | Current resolution status |
| `resolution_note` | text \| null | no | Optional explanation or approval note |
| `caused_by_ref` | string \| null | no | Related field/filter/mapping/question id |
| `created_at` | datetime | yes | Creation timestamp |
| `resolved_at` | datetime \| null | no | Resolution timestamp |

### Validation rules
- `severity` must be one of the allowed values.
- Setting `resolution_state` to `resolved` or `approved` requires either a system resolution event or a user action.
- `launch` is blocked if any open `blocking` finding remains.
- `warning` findings tied to mapping transformations require explicit approval before launch if marked launch-sensitive.
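
The two launch-related rules above can be sketched as a single gate function. This is an illustrative sketch only: the `launch_sensitive` flag is a hypothetical attribute (the entity table does not define where that marker lives), and the sketch assumes only `resolution_state=open` counts as open; whether `skipped`, `deferred`, or `expert_review` also block launch is a policy decision this example does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    code: str
    severity: str          # "blocking" | "warning" | "informational"
    resolution_state: str  # "open", "resolved", "approved", ...
    launch_sensitive: bool = False  # hypothetical marker, not a documented field


def launch_blockers(findings):
    """Return the codes of findings that currently prevent launch."""
    blockers = []
    for f in findings:
        if f.severity == "blocking" and f.resolution_state == "open":
            blockers.append(f.code)
        elif (
            f.severity == "warning"
            and f.launch_sensitive
            and f.resolution_state != "approved"
        ):
            # Launch-sensitive warnings need explicit approval.
            blockers.append(f.code)
    return blockers
```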

#### `FindingArea`
- `source_intake`
- `dataset_profile`
- `semantic_enrichment`
- `clarification`
- `filter_recovery`
- `template_mapping`
- `compiled_preview`
- `launch`
- `audit`

#### `ResolutionState`
- `open`
- `resolved`
- `approved`
- `skipped`
- `deferred`
- `expert_review`

---

## 4. Semantic Source and Field Decisions

### Entity: `SemanticSource`

Represents a trusted or candidate source of semantic metadata.

| Field | Type | Required | Description |
|---|---|---:|---|
| `source_id` | string (UUID) | yes | Unique source id |
| `session_id` | string | yes | Parent session |
| `source_type` | enum | yes | Origin kind |
| `source_ref` | string | yes | External reference, dataset ref, or uploaded artifact ref |
| `source_version` | string | yes | Version/Snapshot for propagation tracking |
| `display_name` | string | yes | Human-readable source name |
| `trust_level` | enum | yes | Source trust tier |
| `schema_overlap_score` | number \| null | no | Optional overlap signal |
| `status` | enum | yes | Availability/applicability status |
| `created_at` | datetime | yes | Creation timestamp |

#### `SemanticSourceType`
- `uploaded_file`
- `connected_dictionary`
- `reference_dataset`
- `neighbor_dataset`
- `ai_generated`

#### `TrustLevel`
- `trusted`
- `recommended`
- `candidate`
- `generated`

#### `SemanticSourceStatus`
- `available`
- `selected`
- `applied`
- `rejected`
- `partial`
- `failed`

---

### Entity: `SemanticFieldEntry`

Canonical semantic state for one dataset field or metric.

| Field | Type | Required | Description |
|---|---|---:|---|
| `field_id` | string (UUID) | yes | Unique field semantic id |
| `session_id` | string | yes | Parent session |
| `field_name` | string | yes | Physical field/metric name |
| `field_kind` | enum | yes | `column`, `metric`, `filter_dimension`, `parameter` |
| `verbose_name` | string \| null | no | Display label |
| `description` | text \| null | no | Human-readable description |
| `display_format` | string \| null | no | Formatting metadata such as d3 format |
| `provenance` | enum | yes | Final chosen source class |
| `source_id` | string \| null | no | Winning source |
| `confidence_rank` | integer \| null | no | Final applied ranking |
| `is_locked` | boolean | yes | Manual override protection |
| `has_conflict` | boolean | yes | Whether competing candidates exist |
| `needs_review` | boolean | yes | Whether user review is still needed |
| `last_changed_by` | enum | yes | `system`, `user`, `agent` |
| `user_feedback` | enum \| null | no | User feedback: `up` or `down`; null when no feedback was given |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `field_name` must be unique per `session_id + field_kind`.
- `is_locked=true` prevents automatic overwrite.
- `provenance=manual_override` implies `is_locked=true`.
- `has_conflict=true` requires at least one competing candidate record.
- Fuzzy-matched or inferred values that have been applied must keep `needs_review=true` until confirmed, if policy requires explicit review.

#### `FieldKind`
- `column`
- `metric`
- `filter_dimension`
- `parameter`

#### `FieldProvenance`
- `dictionary_exact`
- `reference_imported`
- `fuzzy_inferred`
- `ai_generated`
- `manual_override`
- `unresolved`

---

### Entity: `SemanticCandidate`

Stores competing candidate values before or alongside final field decision.

| Field | Type | Required | Description |
|---|---|---:|---|
| `candidate_id` | string (UUID) | yes | Unique candidate id |
| `field_id` | string | yes | Parent semantic field |
| `source_id` | string \| null | no | Candidate source |
| `candidate_rank` | integer | yes | Lower is stronger |
| `match_type` | enum | yes | `exact`, `reference`, `fuzzy`, `generated` |
| `confidence_score` | number | yes | Normalized score |
| `proposed_verbose_name` | string \| null | no | Candidate verbose name |
| `proposed_description` | text \| null | no | Candidate description |
| `proposed_display_format` | string \| null | no | Candidate display format |
| `status` | enum | yes | Candidate lifecycle |
| `created_at` | datetime | yes | Creation timestamp |

#### `CandidateMatchType`
- `exact`
- `reference`
- `fuzzy`
- `generated`

#### `CandidateStatus`
- `proposed`
- `accepted`
- `rejected`
- `superseded`

---

## 5. Imported Filters and Runtime Variables

### Entity: `ImportedFilter`

Represents one recovered or user-supplied filter value.

| Field | Type | Required | Description |
|---|---|---:|---|
| `filter_id` | string (UUID) | yes | Unique filter id |
| `session_id` | string | yes | Parent session |
| `filter_name` | string | yes | Source filter name |
| `display_name` | string \| null | no | User-facing label |
| `raw_value` | json | yes | Original recovered value |
| `raw_value_masked` | boolean | yes | Whether the stored or exposed raw value has been masked/redacted for assistant or LLM-facing use |
| `normalized_value` | json \| null | no | Optional transformed value |
| `source` | enum | yes | Origin of the filter |
| `confidence_state` | enum | yes | Confidence/provenance class |
| `requires_confirmation` | boolean | yes | Whether explicit review is needed |
| `recovery_status` | enum | yes | Recovery completeness |
| `notes` | text \| null | no | Recovery explanation |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `FilterSource`
- `superset_native`
- `superset_url`
- `manual`
- `inferred`

#### `FilterConfidenceState`
- `confirmed`
- `imported`
- `inferred`
- `ai_draft`
- `unresolved`

#### `FilterRecoveryStatus`
- `recovered`
- `partial`
- `missing`
- `conflicted`

### Validation rules
- `raw_value` may be stored for audit and replay, but any context passed into assistant or LLM-facing orchestration must use a masked/redacted representation when the value may contain PII or other sensitive identifiers.
- `raw_value_masked=true` is required whenever the exported assistant context omits or redacts sensitive substrings from the original filter payload.
- Masking policy must preserve enough structure for mapping and clarification, for example key shape, value type, cardinality hints, and non-sensitive tokens.
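
A structure-preserving mask, as described in the rules above, might look like the following sketch. The email-only detector and the `<redacted:email>` token are assumptions chosen to keep the example small; the actual masking policy and the set of sensitive patterns are not defined in this document. `mask_value` is a hypothetical helper name.

```python
import re

# Assumption: emails stand in for "sensitive substrings" in this sketch.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def mask_value(value):
    """Return (masked_copy, was_masked), keeping key shape and value types."""
    if isinstance(value, dict):
        masked, changed = {}, False
        for key, item in value.items():
            masked[key], item_changed = mask_value(item)
            changed = changed or item_changed
        return masked, changed
    if isinstance(value, list):
        results = [mask_value(item) for item in value]
        # List length (a cardinality hint) is preserved.
        return [m for m, _ in results], any(c for _, c in results)
    if isinstance(value, str) and EMAIL_RE.search(value):
        return EMAIL_RE.sub("<redacted:email>", value), True
    return value, False  # numbers, booleans, None, and safe strings pass through
```

The second element of the returned pair is what would populate `raw_value_masked` for the assistant-facing copy, while the original `raw_value` stays untouched for audit and replay.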

---

### Entity: `TemplateVariable`

Represents a runtime variable discovered from dataset execution logic.

| Field | Type | Required | Description |
|---|---|---:|---|
| `variable_id` | string (UUID) | yes | Unique variable id |
| `session_id` | string | yes | Parent session |
| `variable_name` | string | yes | Canonical runtime variable name |
| `expression_source` | text | yes | Raw expression or snippet where variable was found |
| `variable_kind` | enum | yes | Detected variable class |
| `is_required` | boolean | yes | Whether launch requires a mapped value |
| `default_value` | json \| null | no | Optional default |
| `mapping_status` | enum | yes | Current mapping state |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `VariableKind`
- `native_filter`
- `parameter`
- `derived`
- `unknown`

#### `MappingStatus`
- `unmapped`
- `proposed`
- `approved`
- `overridden`
- `invalid`

---

## 6. Mapping Review and Warning Approvals

### Entity: `ExecutionMapping`

Represents the mapping between a recovered filter and a runtime variable.

| Field | Type | Required | Description |
|---|---|---:|---|
| `mapping_id` | string (UUID) | yes | Unique mapping id |
| `session_id` | string | yes | Parent session |
| `filter_id` | string | yes | Source imported filter |
| `variable_id` | string | yes | Target template variable |
| `mapping_method` | enum | yes | How mapping was produced |
| `raw_input_value` | json | yes | Original input |
| `effective_value` | json \| null | no | Value to send to preview/launch |
| `transformation_note` | text \| null | no | Explanation of normalization |
| `warning_level` | enum \| null | no | Warning classification if transformation is risky |
| `requires_explicit_approval` | boolean | yes | Whether launch gate applies |
| `approval_state` | enum | yes | Approval lifecycle |
| `approved_by_user_id` | string \| null | no | Approver if approved |
| `approved_at` | datetime \| null | no | Approval timestamp |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

### Validation rules
- `filter_id + variable_id` must be unique per session unless versioning is used.
- `requires_explicit_approval=true` implies launch is blocked while `approval_state != approved`.
- `effective_value` is required before preview when variable is required.
- A user override should set `mapping_method=manual_override`.
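
The gating rules above can be sketched with a trimmed-down mapping record. Only the fields needed for the checks are included, and the helper names (`blocks_launch`, `ready_for_preview`, `apply_user_override`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Mapping:
    filter_id: str
    variable_id: str
    mapping_method: str = "direct_match"
    effective_value: Optional[Any] = None
    requires_explicit_approval: bool = False
    approval_state: str = "not_required"


def blocks_launch(m: Mapping) -> bool:
    """Launch is blocked while an approval-gated mapping is not approved."""
    return m.requires_explicit_approval and m.approval_state != "approved"


def ready_for_preview(m: Mapping, variable_is_required: bool) -> bool:
    """Required variables need an effective value before preview."""
    return not variable_is_required or m.effective_value is not None


def apply_user_override(m: Mapping, value: Any) -> None:
    """A user-supplied value replaces the effective value and marks the method."""
    m.effective_value = value
    m.mapping_method = "manual_override"
```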

#### `MappingMethod`
- `direct_match`
- `heuristic_match`
- `semantic_match`
- `manual_override`

#### `MappingWarningLevel`
- `low`
- `medium`
- `high`

#### `ApprovalState`
- `pending`
- `approved`
- `rejected`
- `not_required`

---

## 7. Clarification Workflow

### Entity: `ClarificationSession`

Stores resumable clarification flow state for one review session.

| Field | Type | Required | Description |
|---|---|---:|---|
| `clarification_session_id` | string (UUID) | yes | Unique clarification session id |
| `session_id` | string | yes | Parent review session |
| `status` | enum | yes | Clarification lifecycle |
| `current_question_id` | string \| null | no | Current active question |
| `resolved_count` | integer | yes | Count of answered/resolved items |
| `remaining_count` | integer | yes | Count of unresolved items |
| `summary_delta` | text \| null | no | Human-readable change summary |
| `started_at` | datetime | yes | Start time |
| `updated_at` | datetime | yes | Last update |
| `completed_at` | datetime \| null | no | End time |

#### `ClarificationStatus`
- `pending`
- `active`
- `paused`
- `completed`
- `cancelled`

---

### Entity: `ClarificationQuestion`

Represents one focused question in the clarification flow.

| Field | Type | Required | Description |
|---|---|---:|---|
| `question_id` | string (UUID) | yes | Unique question id |
| `clarification_session_id` | string | yes | Parent clarification session |
| `topic_ref` | string | yes | Related field/finding/mapping id |
| `question_text` | text | yes | Focused question |
| `why_it_matters` | text | yes | Business significance explanation |
| `current_guess` | text \| null | no | Best guess if available |
| `priority` | integer | yes | Order score |
| `state` | enum | yes | Question lifecycle |
| `created_at` | datetime | yes | Creation timestamp |
| `updated_at` | datetime | yes | Updated timestamp |

#### `QuestionState`
- `open`
- `answered`
- `skipped`
- `expert_review`
- `superseded`

---

### Entity: `ClarificationOption`

Suggested selectable answer option for a question.

| Field | Type | Required | Description |
|---|---|---:|---|
| `option_id` | string (UUID) | yes | Unique option id |
| `question_id` | string | yes | Parent question |
| `label` | string | yes | UI label |
| `value` | string | yes | Stored answer payload |
| `is_recommended` | boolean | yes | Whether this is the recommended option |
| `display_order` | integer | yes | UI ordering |

---

### Entity: `ClarificationAnswer`

Stores user response to one clarification question.

| Field | Type | Required | Description |
|---|---|---:|---|
| `answer_id` | string (UUID) | yes | Unique answer id |
| `question_id` | string | yes | Parent question |
| `answer_kind` | enum | yes | How user responded |
| `answer_value` | text \| null | no | Selected/custom answer |
| `answered_by_user_id` | string | yes | Responding user |
| `impact_summary` | text \| null | no | Optional summary of resulting state changes |
| `created_at` | datetime | yes | Answer timestamp |

#### `AnswerKind`
- `selected`
- `custom`
- `skipped`
- `expert_review`

### Validation rules
- Each active question may have at most one current answer.
- `custom` answers require non-empty `answer_value`.
- `selected` answers must correspond to a valid option or normalized payload.
- `expert_review` leaves the related topic unresolved but marked intentionally deferred.
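
The answer rules above could be checked with a small validator such as the one below. It is a sketch under assumptions: `validate_answer` is a hypothetical helper, and it only checks `selected` answers against the option values, leaving the "normalized payload" alternative out of scope.

```python
ANSWER_KINDS = {"selected", "custom", "skipped", "expert_review"}


def validate_answer(answer_kind, answer_value, option_values):
    """Return a list of rule violations; an empty list means the answer is valid."""
    errors = []
    if answer_kind not in ANSWER_KINDS:
        errors.append(f"unknown answer_kind: {answer_kind}")
    elif answer_kind == "custom":
        if not (answer_value or "").strip():
            errors.append("custom answers require a non-empty answer_value")
    elif answer_kind == "selected":
        if answer_value not in option_values:
            errors.append("selected answer does not match any option")
    # "skipped" and "expert_review" carry no value requirement in this sketch.
    return errors
```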

---

## 8. Preview and Launch Audit

### Entity: `CompiledPreview`

Stores the exact Superset-returned compiled SQL preview.

| Field | Type | Required | Description |
|---|---|---:|---|
| `preview_id` | string (UUID) | yes | Unique preview id |
| `session_id` | string | yes | Parent session |
| `preview_status` | enum | yes | Preview lifecycle state |
| `compiled_sql` | text \| null | no | Exact compiled SQL if successful |
| `preview_fingerprint` | string | yes | Snapshot hash of mapping/inputs used |
| `compiled_by` | enum | yes | Must be `superset` |
| `error_code` | string \| null | no | Optional failure code |
| `error_details` | text \| null | no | Readable preview error |
| `compiled_at` | datetime \| null | no | Successful compile timestamp |
| `created_at` | datetime | yes | Record creation timestamp |

### Validation rules
- `compiled_by` must be `superset`.
- `compiled_sql` is required when `preview_status=ready`.
- `compiled_sql` must be null when `preview_status=failed` unless partial diagnostics are intentionally stored elsewhere.
- `preview_fingerprint` must be compared against current session inputs before launch.
- Launch requires `preview_status=ready` and matching current fingerprint.
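
One way to realize the fingerprint comparison above is to hash a canonical JSON serialization of the inputs. The choice of SHA-256, the exact set of hashed inputs, and the function names are assumptions made for illustration; the document only specifies that the fingerprint is a snapshot hash of the mapping and inputs used.

```python
import hashlib
import json


def preview_fingerprint(effective_filters, template_params) -> str:
    """Hash the preview inputs in a key-order-independent way."""
    canonical = json.dumps(
        {"effective_filters": effective_filters, "template_params": template_params},
        sort_keys=True,          # stable ordering of object keys
        separators=(",", ":"),   # no whitespace differences
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_launch(preview_status, stored_fingerprint, effective_filters, template_params) -> bool:
    """Launch needs a ready preview whose fingerprint matches the current inputs."""
    current = preview_fingerprint(effective_filters, template_params)
    return preview_status == "ready" and stored_fingerprint == current
```

Sorting keys makes the fingerprint insensitive to dictionary ordering, so only a real change in filters or parameters invalidates the stored preview.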

#### `PreviewStatus`
- `pending`
- `ready`
- `failed`
- `stale`

---

### Entity: `DatasetRunContext`

Audited execution snapshot created at launch.

| Field | Type | Required | Description |
|---|---|---:|---|
| `run_context_id` | string (UUID) | yes | Unique run context id |
| `session_id` | string | yes | Parent review session |
| `dataset_ref` | string | yes | Canonical dataset identity |
| `environment_id` | string | yes | Execution environment |
| `preview_id` | string | yes | Bound compiled preview |
| `sql_lab_session_ref` | string | yes | Canonical SQL Lab reference |
| `effective_filters` | json | yes | Final filter payload |
| `template_params` | json | yes | Final template parameter object |
| `approved_mapping_ids` | json array | yes | Explicit approvals used for launch |
| `semantic_decision_refs` | json array | yes | Applied semantic decision references |
| `open_warning_refs` | json array | yes | Warnings that remained visible at launch |
| `launch_status` | enum | yes | Launch outcome |
| `launch_error` | text \| null | no | Error if launch failed |
| `created_at` | datetime | yes | Launch record timestamp |

### Validation rules
- `preview_id` must reference a `CompiledPreview` with `ready` status.
- `sql_lab_session_ref` is mandatory for successful launch.
- `effective_filters` and `template_params` must match the preview fingerprint used.
- A `launch_status` of `started` or `success` requires a non-empty SQL Lab reference.

#### `LaunchStatus`
- `started`
- `success`
- `failed`

---

## 9. Export Projections

### Entity: `ExportArtifact`

Tracks generated exports for sharing documentation and validation outputs.

| Field | Type | Required | Description |
|---|---|---:|---|
| `artifact_id` | string (UUID) | yes | Unique artifact id |
| `session_id` | string | yes | Parent session |
| `artifact_type` | enum | yes | Export type |
| `format` | enum | yes | File/output format |
| `storage_ref` | string | yes | Storage/file reference |
| `created_by_user_id` | string | yes | Requesting user |
| `created_at` | datetime | yes | Artifact creation time |

#### `ArtifactType`
- `documentation`
- `validation_report`
- `run_summary`

#### `ArtifactFormat`
- `json`
- `markdown`
- `csv`
- `pdf`

---

## 10. Relationships

### One-to-one / aggregate-root relationships
- `DatasetReviewSession` → `DatasetProfile` (current active profile view)
- `DatasetReviewSession` → `ClarificationSession` (current or latest)
- `DatasetReviewSession` → `CompiledPreview` (latest/current preview)
- `DatasetReviewSession` → `DatasetRunContext` (latest/current launch audit)

### One-to-many relationships
- `DatasetReviewSession` → many `ValidationFinding`
- `DatasetReviewSession` → many `SemanticSource`
- `DatasetReviewSession` → many `SemanticFieldEntry`
- `SemanticFieldEntry` → many `SemanticCandidate`
- `DatasetReviewSession` → many `ImportedFilter`
- `DatasetReviewSession` → many `TemplateVariable`
- `DatasetReviewSession` → many `ExecutionMapping`
- `ClarificationSession` → many `ClarificationQuestion`
- `ClarificationQuestion` → many `ClarificationOption`
- `ClarificationQuestion` → zero/one current `ClarificationAnswer`
- `DatasetReviewSession` → many `ExportArtifact`
- `DatasetReviewSession` → many `SessionEvent`

---

## 11. Derived Rules and Invariants

### Run readiness invariant
A session is `run_ready` only if:
- no open blocking findings remain,
- all required template variables have approved/effective mappings,
- all launch-sensitive mapping warnings have been explicitly approved,
- a non-stale `CompiledPreview` exists for the current fingerprint.
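
The four conditions above can be evaluated as a checklist that reports what is still missing. In this sketch the inputs are pre-aggregated counts, and `ReadinessInputs`, `unmet_conditions`, and the returned labels are hypothetical names, not part of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadinessInputs:
    open_blocking_findings: int
    required_variables_without_value: int
    unapproved_launch_sensitive_warnings: int
    preview_status: Optional[str]       # None when no preview exists yet
    preview_fingerprint: Optional[str]
    current_fingerprint: str


def unmet_conditions(r: ReadinessInputs) -> list:
    """List the run-readiness conditions that do not hold yet."""
    unmet = []
    if r.open_blocking_findings > 0:
        unmet.append("open_blocking_findings")
    if r.required_variables_without_value > 0:
        unmet.append("unmapped_required_variables")
    if r.unapproved_launch_sensitive_warnings > 0:
        unmet.append("unapproved_mapping_warnings")
    if r.preview_status != "ready" or r.preview_fingerprint != r.current_fingerprint:
        unmet.append("no_current_preview")
    return unmet


def is_run_ready(r: ReadinessInputs) -> bool:
    return not unmet_conditions(r)
```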

### Manual intent invariant
If a field is manually overridden:
- `SemanticFieldEntry.is_locked = true`
- `SemanticFieldEntry.provenance = manual_override`
- later imports or inferred candidates may be recorded, but cannot replace the active value automatically.
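
A minimal sketch of this invariant follows, using a reduced field record. `FieldEntry`, `manual_override`, and `offer_candidate` are hypothetical names, and candidates are stored as plain tuples rather than full `SemanticCandidate` records.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldEntry:
    field_name: str
    verbose_name: Optional[str] = None
    provenance: str = "unresolved"
    is_locked: bool = False
    candidates: list = field(default_factory=list)


def manual_override(entry: FieldEntry, verbose_name: str) -> None:
    """A manual edit sets the value, the provenance, and the lock together."""
    entry.verbose_name = verbose_name
    entry.provenance = "manual_override"
    entry.is_locked = True


def offer_candidate(entry: FieldEntry, verbose_name: str, provenance: str) -> bool:
    """Always record the candidate; apply it only when the entry is unlocked."""
    entry.candidates.append((provenance, verbose_name))
    if entry.is_locked:
        return False  # recorded, but the active value is left alone
    entry.verbose_name = verbose_name
    entry.provenance = provenance
    return True
```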

### Progressive recovery invariant
Partial Superset recovery must preserve usable state:
- imported filters may be `partial`,
- unresolved variables may remain `unmapped`,
- findings must explain what is still missing,
- session remains resumable.

### Clarification persistence invariant
Clarification answers must be persisted before:
- finding severity is downgraded,
- profile state is updated,
- current question pointer advances.
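
The ordering constraint above amounts to "persist first, then mutate derived state". The sketch below illustrates it with a hypothetical in-memory store and a plain dictionary for clarification state; a real implementation would more likely wrap both steps in one database transaction.

```python
class InMemoryAnswerStore:
    """Hypothetical stand-in for the persistence layer."""

    def __init__(self, fail=False):
        self.answers = []
        self.fail = fail

    def save_answer(self, question_id, value):
        if self.fail:
            raise RuntimeError("store unavailable")
        self.answers.append((question_id, value))


def record_answer(store, state, question_id, value, next_question_id):
    """Persist the answer, and only then advance the clarification state."""
    # Step 1: persistence. Any exception here aborts before state changes.
    store.save_answer(question_id, value)
    # Step 2: derived state updates happen strictly after a successful save.
    state["resolved_count"] += 1
    state["remaining_count"] -= 1
    state["current_question_id"] = next_question_id
    return state
```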

### Preview truth invariant
Compiled preview must be:
- generated by Superset,
- tied to the exact current effective inputs,
- treated as invalid if mappings/values change afterward.

---

## 12. Migration & Evolution Strategy
- **Baseline**: The initial implementation (Milestone 1) will include the core session and profile entities.
- **Incremental Growth**: Subsequent milestones will add clarification, mapping, and launch audit entities via standard SQLAlchemy migrations.
- **Compatibility**: The `DatasetReviewSession` aggregate root will remain the stable entry point for all sub-entities to ensure forward compatibility with saved user state.

## 13. Suggested Backend DTO Grouping

The future API and persistence layers should group models roughly as follows:

### Session DTOs
- `SessionSummary`
- `SessionDetail`
- `SessionListItem`

`SessionSummary` and `SessionDetail` should both surface the current `version` so frontend workspace state, collaborator actions, and assistant-driven mutations can use the same optimistic-lock boundary.

### Review DTOs
- `DatasetProfileDto`
- `ValidationFindingDto`
- `ReadinessChecklistDto`

### Semantic DTOs
- `SemanticSourceDto`
- `SemanticFieldEntryDto`
- `SemanticCandidateDto`

### Clarification DTOs
- `ClarificationSessionDto`
- `ClarificationQuestionDto`
- `ClarificationAnswerRequest`

### Execution DTOs
- `ImportedFilterDto`
- `TemplateVariableDto`
- `ExecutionMappingDto`
- `CompiledPreviewDto`
- `LaunchSummaryDto`

### Export DTOs
- `ExportArtifactDto`

---

## 14. Open Modeling Notes Resolved

The Phase 0 research questions are considered resolved for design purposes:
- SQL preview is modeled as a first-class persisted artifact.
- SQL Lab is modeled as the only canonical launch target.
- Semantic resolution and clarification are modeled as separate domain boundaries.
- Field-level overrides and mapping approvals are first-class entities.
- Session persistence is separate from task execution state.

This model is ready to drive:
- [`contracts/modules.md`](./contracts/modules.md)
- [`contracts/api.yaml`](./contracts/api.yaml)
- [`quickstart.md`](./quickstart.md)