2026-03-16 23:11:19 +03:00

Semantic Module Contracts: LLM Dataset Orchestration

Feature: LLM Dataset Orchestration
Branch: 027-dataset-llm-orchestration

This document defines the semantic contracts for the core components of the Dataset LLM Orchestration feature, following the GRACE-Poly Standard.

1. Backend Modules

[DEF:DatasetReviewOrchestrator:Module]

@COMPLEXITY: 5

@PURPOSE: Coordinate the full dataset review session lifecycle across intake, recovery, semantic review, clarification, mapping review, preview generation, and launch.

@LAYER: Domain

@RELATION: [DEPENDS_ON] ->[DatasetReviewSessionRepository]

@RELATION: [DEPENDS_ON] ->[SemanticSourceResolver]

@RELATION: [DEPENDS_ON] ->[ClarificationEngine]

@RELATION: [DEPENDS_ON] ->[SupersetContextExtractor]

@RELATION: [DEPENDS_ON] ->[SupersetCompilationAdapter]

@RELATION: [DEPENDS_ON] ->[TaskManager]

@PRE: session mutations must execute inside a persisted session boundary scoped to one authenticated user.

@POST: state transitions are persisted atomically and emit observable progress for long-running steps.

@SIDE_EFFECT: creates task records, updates session aggregates, triggers upstream Superset calls, persists audit artifacts.

@DATA_CONTRACT: Input[SessionCommand] -> Output[DatasetReviewSession | CompiledPreview | DatasetRunContext]

@INVARIANT: Launch is blocked unless the current session has no open blocking findings, all launch-sensitive mappings are approved, and a non-stale Superset-generated compiled preview matches the current input fingerprint.

@TEST_CONTRACT: start_or_resume_session -> returns persisted session shell with recommended next action

@TEST_SCENARIO: launch_gate_blocks_stale_preview -> launch rejected when preview fingerprint no longer matches current mapping inputs

@TEST_EDGE: missing_dataset_ref -> blocking failure

@TEST_EDGE: stale_preview -> blocking failure

@TEST_EDGE: sql_lab_launch_failure -> terminal failed launch state with audit record

@TEST_INVARIANT: launch_gate -> VERIFIED_BY: [launch_gate_blocks_stale_preview]

ƒ start_session

@PURPOSE: Initialize a new session from a Superset link or dataset selection and trigger context recovery.

@PRE: source input is non-empty and environment is accessible.

@POST: session exists in persisted storage with intake/recovery state and task linkage when async work is required.

@SIDE_EFFECT: persists session and may enqueue recovery task.

ƒ apply_semantic_source

@PURPOSE: Apply a selected semantic source and update field-level candidate/decision state.

@PRE: source exists and session is not terminal.

@POST: semantic field entries and findings reflect selected-source outcomes without overwriting locked manual values.

@SIDE_EFFECT: updates semantic decisions and conflict findings.

ƒ record_clarification_answer

@PURPOSE: Persist one clarification answer and re-evaluate profile, findings, and readiness.

@PRE: target question belongs to the session’s active clarification session.

@POST: answer is saved before current-question pointer advances.

@SIDE_EFFECT: updates clarification and finding state.

ƒ prepare_launch_preview

@PURPOSE: Assemble effective execution inputs and trigger Superset-side preview compilation.

@PRE: all required variables have candidate values or explicitly accepted defaults.

@POST: returns preview artifact in pending, ready, failed, or stale state.

@SIDE_EFFECT: persists preview attempt and upstream compilation diagnostics.

ƒ launch_dataset

@PURPOSE: Start the approved dataset execution through SQL Lab and persist run context for audit/replay.

@PRE: session is run-ready and compiled preview is current.

@POST: returns persisted run context with SQL Lab session reference and launch outcome.

@SIDE_EFFECT: creates SQL Lab execution session and audit snapshot.

[/DEF:DatasetReviewOrchestrator:Module]
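
The launch invariant of DatasetReviewOrchestrator can be read as a pure predicate over session state. The following is a minimal sketch under stated assumptions, not the feature's implementation: `SessionState`, `Preview`, `launch_blockers`, and the blocker codes are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Preview:
    fingerprint: str
    status: str  # one of "pending", "ready", "failed", "stale"


@dataclass
class SessionState:
    # Hypothetical, flattened view of the session aggregate.
    open_blocking_findings: int
    unapproved_launch_mappings: int
    current_fingerprint: str
    preview: Optional[Preview] = None


def launch_blockers(state: SessionState) -> List[str]:
    """Return the reasons launch is blocked; an empty list means the gate passes."""
    blockers: List[str] = []
    if state.open_blocking_findings > 0:
        blockers.append("open_blocking_findings")
    if state.unapproved_launch_mappings > 0:
        blockers.append("unapproved_mappings")
    if state.preview is None or state.preview.status != "ready":
        blockers.append("preview_not_ready")
    elif state.preview.fingerprint != state.current_fingerprint:
        # The preview was compiled for inputs that have since changed.
        blockers.append("stale_preview")
    return blockers
```

Returning every blocker rather than stopping at the first one would let a caller show all outstanding launch conditions at once; whether the real orchestrator reports them this way is not stated in the contract.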

[DEF:DatasetReviewSessionRepository:Module]

@COMPLEXITY: 5

@PURPOSE: Persist and retrieve dataset review session aggregates, including readiness, findings, semantic decisions, clarification state, previews, and run contexts.

@LAYER: Domain

@RELATION: [DEPENDS_ON] ->[DatasetReviewSession]

@RELATION: [DEPENDS_ON] ->[DatasetProfile]

@RELATION: [DEPENDS_ON] ->[ValidationFinding]

@RELATION: [DEPENDS_ON] ->[CompiledPreview]

@PRE: repository operations execute within authenticated request or task scope.

@POST: session aggregate reads are structurally consistent and writes preserve ownership and version semantics.

@SIDE_EFFECT: reads/writes application persistence layer.

@DATA_CONTRACT: Input[SessionMutation] -> Output[PersistedSessionAggregate]

@INVARIANT: answers, mapping approvals, preview artifacts, and launch snapshots are never attributed to the wrong user or session.

@TEST_CONTRACT: save_then_resume -> persisted session can be reopened without losing semantic/manual/clarification state

@TEST_SCENARIO: resume_session_preserves_manual_overrides -> locked semantic fields remain active after reload

@TEST_EDGE: foreign_user_access -> rejected

@TEST_EDGE: missing_session -> not found

@TEST_EDGE: partial_preview_snapshot -> preserved but not marked launchable

@TEST_INVARIANT: ownership_scope -> VERIFIED_BY: [foreign_user_access]

ƒ create_session

@PURPOSE: Persist initial session shell.

ƒ load_session_detail

@PURPOSE: Return the full session aggregate for API/frontend use.

ƒ save_profile_and_findings

@PURPOSE: Persist profile and validation state together.

ƒ save_preview

@PURPOSE: Persist compiled preview attempt and mark older fingerprints stale.

ƒ save_run_context

@PURPOSE: Persist immutable launch audit snapshot.

[/DEF:DatasetReviewSessionRepository:Module]
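
To make the ownership invariant and the `save_preview` staleness rule concrete, here is a small in-memory sketch. It assumes a dictionary-backed store and hypothetical exception and record types (`SessionNotFound`, `ForeignUserAccess`, `PreviewRecord`, `SessionAggregate`); the actual persistence layer and signatures are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class SessionNotFound(Exception):
    """Hypothetical error for the missing_session edge case."""


class ForeignUserAccess(Exception):
    """Hypothetical error for the foreign_user_access edge case."""


@dataclass
class PreviewRecord:
    fingerprint: str
    status: str = "ready"


@dataclass
class SessionAggregate:
    session_id: str
    owner_id: str
    previews: List[PreviewRecord] = field(default_factory=list)


class InMemorySessionRepository:
    def __init__(self) -> None:
        self._sessions: Dict[str, SessionAggregate] = {}

    def create_session(self, session_id: str, owner_id: str) -> SessionAggregate:
        session = SessionAggregate(session_id, owner_id)
        self._sessions[session_id] = session
        return session

    def load_session_detail(self, session_id: str, user_id: str) -> SessionAggregate:
        session = self._sessions.get(session_id)
        if session is None:
            raise SessionNotFound(session_id)
        if session.owner_id != user_id:
            # Every read is scoped to the owning user.
            raise ForeignUserAccess(session_id)
        return session

    def save_preview(self, session_id: str, user_id: str, fingerprint: str) -> PreviewRecord:
        session = self.load_session_detail(session_id, user_id)
        for old in session.previews:
            if old.fingerprint != fingerprint:
                old.status = "stale"  # older fingerprints can no longer gate a launch
        record = PreviewRecord(fingerprint)
        session.previews.append(record)
        return record
```

Routing `save_preview` through `load_session_detail` means the write path reuses the same ownership check as the read path, which is one simple way to keep artifacts from being attributed to the wrong user or session.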

[DEF:SemanticSourceResolver:Module]

@COMPLEXITY: 4

@PURPOSE: Resolve, rank, and apply semantic metadata candidates from files, connected dictionaries, reference datasets, and AI generation fallback.

@LAYER: Domain

@RELATION: [DEPENDS_ON] ->[LLMProviderService]

@RELATION: [DEPENDS_ON] ->[SemanticSource]

@RELATION: [DEPENDS_ON] ->[SemanticFieldEntry]

@RELATION: [DEPENDS_ON] ->[SemanticCandidate]

@PRE: selected source and target field set must be known.

@POST: candidate ranking follows the configured confidence hierarchy and unresolved fuzzy matches remain reviewable.

@SIDE_EFFECT: may create conflict findings and semantic candidate records.

@INVARIANT: Manual overrides are never silently replaced by imported, inferred, or AI-generated values.

@TEST_CONTRACT: rank_candidates -> exact dictionary beats reference import beats fuzzy beats AI draft

@TEST_SCENARIO: manual_lock_survives_reimport -> locked field remains active after another source is applied

@TEST_EDGE: malformed_source_payload -> failed source application with explanatory finding

@TEST_EDGE: conflicting_sources -> conflict state preserved for review

@TEST_EDGE: no_trusted_matches -> AI draft fallback only

@TEST_INVARIANT: confidence_hierarchy -> VERIFIED_BY: [rank_candidates]

ƒ resolve_from_file

@PURPOSE: Normalize uploaded semantic file records into field-level candidates.

ƒ resolve_from_dictionary

@PURPOSE: Resolve candidates from connected tabular dictionary sources.

ƒ resolve_from_reference_dataset

@PURPOSE: Reuse semantic metadata from trusted Superset datasets.

ƒ rank_candidates

@PURPOSE: Apply confidence ordering and determine best candidate per field.

ƒ detect_conflicts

@PURPOSE: Mark competing candidate sets that require explicit user review.

ƒ apply_field_decision

@PURPOSE: Accept, reject, or manually override a field-level semantic value.

[/DEF:SemanticSourceResolver:Module]
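
The confidence hierarchy from the `rank_candidates` test contract (exact dictionary, then reference import, then fuzzy match, then AI draft) and the manual-override invariant can be sketched as follows. The origin labels, the `Candidate` and `FieldEntry` shapes, and `apply_best_candidate` are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower number = higher trust. The ordering mirrors the rank_candidates
# test contract; the string labels themselves are hypothetical.
CONFIDENCE_ORDER = {
    "exact_dictionary": 0,
    "reference_import": 1,
    "fuzzy_match": 2,
    "ai_draft": 3,
}


@dataclass
class Candidate:
    value: str
    origin: str


@dataclass
class FieldEntry:
    name: str
    active_value: Optional[str] = None
    locked: bool = False  # True once a user has manually overridden the field


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from most to least trusted (stable for equal origins)."""
    return sorted(candidates, key=lambda c: CONFIDENCE_ORDER[c.origin])


def apply_best_candidate(entry: FieldEntry, candidates: List[Candidate]) -> FieldEntry:
    """Activate the top-ranked candidate unless the field is manually locked."""
    if entry.locked or not candidates:
        return entry  # manual overrides are never silently replaced
    entry.active_value = rank_candidates(candidates)[0].value
    return entry
```

For instance, applying a new source to a locked field leaves its manual value untouched, while an unlocked field picks up the exact dictionary match ahead of any fuzzy or AI-drafted alternative.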

[DEF:ClarificationEngine:Module]

@COMPLEXITY: 4

@PURPOSE: Manage one-question-at-a-time clarification sessions, including prioritization, answer persistence, and readiness impact updates.

@LAYER: Domain

@RELATION: [DEPENDS_ON] ->[ClarificationSession]

@RELATION: [DEPENDS_ON] ->[ClarificationQuestion]

@RELATION: [DEPENDS_ON] ->[ClarificationAnswer]

@RELATION: [DEPENDS_ON] ->[ValidationFinding]

@PRE: target session contains unresolved or contradictory review state.

@POST: every recorded answer updates the clarification session and associated session state deterministically.

@SIDE_EFFECT: creates clarification questions, persists answers, updates findings/profile state.

@INVARIANT: Clarification answers are persisted before the current question pointer or readiness state is advanced.

@TEST_CONTRACT: next_question_selection -> returns only one highest-priority unresolved question at a time

@TEST_SCENARIO: save_and_resume_clarification -> reopening session restores current question and prior answers

@TEST_EDGE: skipped_question -> unresolved topic remains visible

@TEST_EDGE: expert_review_marked -> topic deferred without false resolution

@TEST_EDGE: duplicate_answer_submission -> idempotent or rejected deterministically

@TEST_INVARIANT: single_active_question -> VERIFIED_BY: [next_question_selection]

ƒ start_or_resume

@PURPOSE: Open clarification mode on the highest-priority unresolved question.

ƒ build_question_payload

@PURPOSE: Return question, why-it-matters text, current guess, and suggested options.

ƒ record_answer

@PURPOSE: Persist one answer and compute state impact.

ƒ summarize_progress

@PURPOSE: Produce the clarification change summary shown on exit or pause.

[/DEF:ClarificationEngine:Module]
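
The two ordering rules in this contract, one highest-priority question at a time and answers persisted before the pointer advances, can be illustrated with a minimal sketch. It assumes that a larger `priority` number means more urgent and that duplicate submissions are idempotent when the answer is unchanged and rejected otherwise; the contract allows either deterministic behaviour, and all type names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Question:
    question_id: str
    priority: int  # assumption: larger means more urgent


@dataclass
class ClarificationState:
    questions: List[Question]
    answers: Dict[str, str] = field(default_factory=dict)
    current_question_id: Optional[str] = None


def next_question(state: ClarificationState) -> Optional[Question]:
    """Return the single highest-priority unanswered question, if any."""
    unresolved = [q for q in state.questions if q.question_id not in state.answers]
    if not unresolved:
        return None
    return max(unresolved, key=lambda q: q.priority)


def record_answer(state: ClarificationState, question_id: str, answer: str) -> ClarificationState:
    if all(q.question_id != question_id for q in state.questions):
        raise KeyError(f"unknown question: {question_id}")
    if question_id in state.answers:
        if state.answers[question_id] != answer:
            raise ValueError("conflicting duplicate answer")
        return state  # identical resubmission is a no-op
    state.answers[question_id] = answer  # 1. persist the answer first
    upcoming = next_question(state)      # 2. only then advance the pointer
    state.current_question_id = upcoming.question_id if upcoming else None
    return state
```

In a real store, steps 1 and 2 would presumably sit inside one transaction or be ordered so that a crash between them leaves the answer saved and the pointer merely stale, never the reverse.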

[DEF:SupersetContextExtractor:Module]

@COMPLEXITY: 4

@LAYER: Infra

@RELATION: [CALLS] ->[SupersetClient]

@RELATION: [DEPENDS_ON] ->[ImportedFilter]

@RELATION: [DEPENDS_ON] ->[TemplateVariable]

@PRE: Superset link or dataset reference must be parseable enough to resolve an environment-scoped target resource.

@POST: returns the best available recovered context with explicit provenance and partial-recovery markers when necessary.

@SIDE_EFFECT: performs upstream Superset API reads.

@INVARIANT: Partial recovery is surfaced explicitly and never misrepresented as fully confirmed context.

@TEST_CONTRACT: recover_context_from_link -> output distinguishes URL-derived, native-filter-derived, and unresolved context

@TEST_SCENARIO: partial_filter_recovery_marks_recovery_required -> session remains usable but not falsely complete

@TEST_EDGE: unsupported_link_shape -> intake failure with actionable finding

@TEST_EDGE: dataset_without_filters -> successful dataset recovery with empty imported filter set

@TEST_EDGE: missing_dashboard_binding -> partial recovery only

@TEST_INVARIANT: provenance_visibility -> VERIFIED_BY: [recover_context_from_link]

ƒ parse_superset_link

@PURPOSE: Extract candidate identifiers and query state from supported Superset URLs.

ƒ recover_imported_filters

@PURPOSE: Build imported filter entries from URL state and Superset-side saved context.

ƒ discover_template_variables

@PURPOSE: Detect runtime variables and Jinja references from dataset query-bearing fields.

ƒ build_recovery_summary

@PURPOSE: Summarize recovered, partial, and unresolved context for session state and UX.

[/DEF:SupersetContextExtractor:Module]
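
The provenance requirement (URL-derived, native-filter-derived, or unresolved) can be sketched with the standard library alone. This example assumes filter values arrive as flat query parameters, which is an illustrative simplification and not a description of how Superset actually encodes filter state in its URLs; `RecoveredItem`, `recover_context`, and the `saved_filters` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class RecoveredItem:
    name: str
    value: Optional[str]
    provenance: str  # "url", "native_filter", or "unresolved"


def recover_context(link: str, saved_filters: Dict[str, str], expected: List[str]) -> dict:
    """Tag each expected filter with where its value came from.

    `saved_filters` stands in for Superset-side saved context that a real
    extractor would fetch through its client.
    """
    query = parse_qs(urlparse(link).query)
    items: List[RecoveredItem] = []
    for name in expected:
        if name in query:
            items.append(RecoveredItem(name, query[name][0], "url"))
        elif name in saved_filters:
            items.append(RecoveredItem(name, saved_filters[name], "native_filter"))
        else:
            items.append(RecoveredItem(name, None, "unresolved"))
    partial = any(item.provenance == "unresolved" for item in items)
    # Partial recovery is reported explicitly, never presented as complete.
    return {"items": items, "recovery_status": "partial" if partial else "complete"}
```

A call with one filter in the URL, one in saved context, and one found nowhere would yield three items with three different provenance values and an overall status of `partial`, so the session stays usable without being marked complete.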

[DEF:SupersetCompilationAdapter:Module]

@COMPLEXITY: 4

@PURPOSE: Interact with Superset preview compilation and SQL Lab execution endpoints using the current approved execution context.

@LAYER: Infra

@RELATION: [CALLS] ->[SupersetClient]

@RELATION: [DEPENDS_ON] ->[CompiledPreview]

@RELATION: [DEPENDS_ON] ->[DatasetRunContext]

@PRE: effective template params and dataset execution reference are available.

@POST: preview and launch calls return Superset-originated artifacts or explicit errors.

@SIDE_EFFECT: performs upstream Superset preview and SQL Lab calls.

@INVARIANT: The adapter never fabricates compiled SQL locally; preview truth is delegated to Superset only.

@TEST_CONTRACT: compile_then_launch -> launch uses the same effective input fingerprint verified in preview

@TEST_SCENARIO: preview_failure_blocks_launch -> no SQL Lab session is created after failed preview

@TEST_EDGE: compilation_endpoint_error -> failed preview artifact with readable diagnostics

@TEST_EDGE: sql_lab_creation_error -> failed launch audit state

@TEST_EDGE: fingerprint_mismatch -> launch rejected

@TEST_INVARIANT: superset_truth_source -> VERIFIED_BY: [preview_failure_blocks_launch]

ƒ compile_preview

@PURPOSE: Request Superset-side compiled SQL preview for the current effective inputs.

ƒ mark_preview_stale

@PURPOSE: Invalidate previous preview after mapping or value changes.

ƒ create_sql_lab_session

@PURPOSE: Create the canonical audited execution session after all launch gates pass.

[/DEF:SupersetCompilationAdapter:Module]
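
The contract relies on an input fingerprint to tie a launch to the preview that verified it, but does not say how the fingerprint is computed. One plausible approach, shown here purely as an assumption, is a SHA-256 hash over a canonical JSON encoding of the effective inputs; `input_fingerprint` and `assert_launchable` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any, Dict


def input_fingerprint(dataset_ref: str, template_params: Dict[str, Any]) -> str:
    """Hash the effective inputs in a key-order-independent way."""
    canonical = json.dumps(
        {"dataset_ref": dataset_ref, "template_params": template_params},
        sort_keys=True,          # same params in any order give the same hash
        separators=(",", ":"),   # no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_launchable(preview_fingerprint: str, dataset_ref: str,
                      template_params: Dict[str, Any]) -> None:
    """Reject launch when inputs changed after the preview was compiled."""
    current = input_fingerprint(dataset_ref, template_params)
    if preview_fingerprint != current:
        raise ValueError("fingerprint_mismatch: preview does not match current inputs")
```

Only the fingerprint is computed locally in this sketch. The compiled SQL itself would still come from Superset, in keeping with the invariant that the adapter never fabricates it.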

2. Frontend Components

ƒ handleSourceSubmit

ƒ handleResumeSession

ƒ handleLaunch

ƒ submitSupersetLink

ƒ submitDatasetSelection

ƒ groupFindingsBySeverity

ƒ jumpToFindingTarget

ƒ applyManualOverride

ƒ applyCandidateSelection

ƒ submitAnswer

ƒ skipQuestion

ƒ pauseClarification

ƒ approveMapping

ƒ overrideMappingValue

ƒ requestPreview

ƒ showPreviewErrorTarget

ƒ buildLaunchSummary

ƒ confirmLaunch

3. Contract Coverage Notes

The feature requires:

dedicated semantic resolution contracts instead of hiding source-ranking logic inside orchestration,
a first-class clarification engine because guided ambiguity resolution is a persisted workflow, not a simple endpoint,
a Superset extraction boundary distinct from preview/launch behavior,
UI contracts that cover the UX state machine rather than only the happy path.

These contracts are intended to align directly with:
