feat(ui): add chat-driven dataset review flow

Move dataset review clarification into the assistant workspace and rework the review page into a chat-centric layout with execution rails. Add session-scoped assistant actions for mappings, semantic fields, and SQL preview generation. Introduce optimistic locking for dataset review mutations, propagate session versions through API responses, and mask imported filter values before assistant exposure. Refresh tests, i18n, and spec artifacts to match the new workflow.

BREAKING CHANGE: dataset review mutation endpoints now require the X-Session-Version header, and clarification is no longer handled through ClarificationDialog-based flows.
2026-03-26 13:33:12 +03:00
parent d7911fb2f1
commit 7c85552132
74 changed files with 6122 additions and 2970 deletions
--- a/specs/027-dataset-llm-orchestration/tasks.md
+++ b/specs/027-dataset-llm-orchestration/tasks.md
@@ -5,105 +5,65 @@

 ---

-## Phase 1: Setup
+## Rebaseline Note

-- [x] T001 Initialize backend service directory structure for `dataset_review` in `backend/src/services/dataset_review/`
-- [x] T002 Initialize frontend component directory for `dataset-review` in `frontend/src/lib/components/dataset-review/`
-- [x] T003 Register `ff_dataset_auto_review`, `ff_dataset_clarification`, and `ff_dataset_execution` feature flags in configuration
-- [x] T004 [P] Seed new `DATASET_REVIEW_*` permissions in `backend/src/scripts/seed_permissions.py`
+This task list is rebaselined to the approved mixed-initiative assistant scope from `027-task.md`.
+
+Previously completed implementation checkboxes remain historical only. They are no longer treated as authoritative for feature acceptance until the codebase is aligned with the refreshed spec artifacts below.

 ---

-## Phase 2: Foundational Layer
+## Phase A: Spec Refresh Gate

-- [x] T005 [P] Implement Core SQLAlchemy models for session, profile, and findings in `backend/src/models/dataset_review.py`
-- [x] T006 [P] Implement Semantic, Mapping, and Clarification models in `backend/src/models/dataset_review.py`
-- [x] T007 [P] Implement Preview and Launch Audit models in `backend/src/models/dataset_review.py`
-- [x] T008 [P] Implement `DatasetReviewSessionRepository` (CRITICAL: C5, PRE: auth scope, POST: consistent aggregates, INVARIANTS: ownership scope) in `backend/src/services/dataset_review/repositories/session_repository.py`
-- [x] T009 [P] Create Pydantic schemas for Session Summary and Detail in `backend/src/schemas/dataset_review.py`
-- [x] T010 [P] Create Svelte store for session management in `frontend/src/lib/stores/datasetReviewSession.js`
+- [x] T001 Refresh `ux_reference.md` to replace modal clarification with `AssistantChatPanel` mixed-initiative behavior
+- [x] T002 Refresh `spec.md` to update FR-013 and add FR-045/FR-046
+- [x] T003 Refresh `contracts/api.yaml` to add `dataset_review_session_id` to `AssistantMessageRequest` and expose session version in session DTOs
+- [x] T004 Refresh `data-model.md` to add optimistic-lock `version` and PII masking requirements for `ImportedFilter.raw_value`
+- [x] T005 Refresh `contracts/modules.md` to route clarification through `AssistantChatPanel`, expose orchestrator state to `AssistantApi`, and retire `ClarificationDialog`

 ---

-## Phase 3: User Story 1 — Automatic Review (P1)
+## Phase B: Backend Rebaseline Work

-**Goal**: Submission of link/dataset produces immediate readable summary and semantic enrichment from trusted sources.
-
-**Independent Test**: Submit a Superset link; verify session created, summary generated, and findings populated without manual intervention.
-
-- [X] T011 [P] [US1] Implement `StartSessionRequest` and lifecycle endpoints in `backend/src/api/routes/dataset_review.py`
-- [X] T012 [US1] Implement `DatasetReviewOrchestrator.start_session` (CRITICAL: C5, PRE: non-empty input, POST: enqueued recovery, BELIEF: uses `belief_scope`) in `backend/src/services/dataset_review/orchestrator.py`
-- [X] T013 [P] [US1] Implement `SupersetContextExtractor.parse_superset_link` (CRITICAL: C4, PRE: parseable link, POST: resolved target, REL: uses `SupersetClient`) in `backend/src/core/utils/superset_context_extractor.py`
-- [X] T014 [US1] Implement `SemanticSourceResolver.resolve_from_dictionary` (CRITICAL: C4, PRE: source exists, POST: confidence-ranked candidates) in `backend/src/services/dataset_review/semantic_resolver.py`
-- [X] T015 [US1] Implement Documentation and Validation export endpoints (JSON/Markdown) in `backend/src/api/routes/dataset_review.py`
-- [X] T016 [P] [US1] Implement `SourceIntakePanel` (C3, UX_STATE: Idle/Validating/Rejected) in `frontend/src/lib/components/dataset-review/SourceIntakePanel.svelte`
-- [X] T017 [P] [US1] Implement `ValidationFindingsPanel` (C3, UX_STATE: Blocking/Warning/Info) in `frontend/src/lib/components/dataset-review/ValidationFindingsPanel.svelte`
-- [X] T018 [US1] Create main `DatasetReviewWorkspace` (CRITICAL: C5, UX_STATE: Empty/Importing/Review) in `frontend/src/routes/datasets/review/[id]/+page.svelte`
-- [x] T019 [US1] Verify implementation matches ux_reference.md (Happy Path & Errors)
-- [x] T020 [US1] Acceptance: Perform semantic audit & algorithm emulation by Tester
+- [ ] T006 Implement optimistic locking in `backend/src/services/dataset_review/repositories/session_repository.py` using `DatasetReviewSession.version` and conflict semantics
+- [ ] T007 Update dataset review session schemas and route handlers to require and return session version consistently
+- [ ] T008 Add `dataset_review_session_id` to `backend/src/api/routes/assistant.py::AssistantMessageRequest`
+- [ ] T009 Load dataset-review context into assistant planning flow when `dataset_review_session_id` is present
+- [ ] T010 Add assistant intent routing for dataset-review commands (`APPROVE_MAPPINGS`, `SET_FIELD_SEMANTICS`, `GENERATE_SQL_PREVIEW`) through `DatasetReviewOrchestrator`
+- [ ] T011 Add PII masking/redaction for imported filter values before assistant or LLM-facing context assembly
+- [ ] T012 Preserve backend mutation observability with semantic logging and conflict-safe mutation boundaries

 ---

-## Phase 4: User Story 2 — Guided Clarification (P2)
+## Phase C: Frontend Rebaseline Work

-**Goal**: Resolve ambiguities and conflicting metadata through one-question-at-a-time dialogue.
-
-**Independent Test**: Open a session with unresolved findings; answer questions one by one and verify readiness state updates in real-time.
-
-- [X] T021 [P] [US2] Implement `ClarificationEngine.build_question_payload` (CRITICAL: C4, PRE: unresolved state, POST: prioritized question) in `backend/src/services/dataset_review/clarification_engine.py`
-- [X] T022 [US2] Implement `ClarificationEngine.record_answer` (CRITICAL: C4, PRE: question active, POST: answer persisted before state advance) in `backend/src/services/dataset_review/clarification_engine.py`
-- [X] T023 [P] [US2] Implement field-level semantic override and lock endpoints in `backend/src/api/routes/dataset_review.py`
-- [X] T024 [US2] Implement `SemanticLayerReview` component (C3, UX_STATE: Conflicted/Manual) in `frontend/src/lib/components/dataset-review/SemanticLayerReview.svelte`
-- [X] T025 [P] [US2] Implement `ClarificationDialog` (C3, UX_STATE: Question/Saving/Completed, REL: binds to `assistantChat`) in `frontend/src/lib/components/dataset-review/ClarificationDialog.svelte`
-- [X] T026 [US2] Implement LLM feedback (👍/👎) storage and UI handlers in `backend/src/api/routes/dataset_review.py`
-- [x] T027 [US2] Verify implementation matches ux_reference.md (Happy Path & Errors)
-- [x] T028 [US2] Acceptance: Perform semantic audit & algorithm emulation by Tester
+- [ ] T013 Integrate `AssistantChatPanel` into `frontend/src/routes/datasets/review/[id]/+page.svelte` as the clarification surface
+- [ ] T014 Remove or retire `ClarificationDialog` usage from the dataset review workflow
+- [ ] T015 Add inline **[✨ Ask AI]** / **[✨ Improve]** triggers for findings, partial filters, mappings, and business summary surfaces
+- [ ] T016 Add workspace highlight synchronization when assistant prompts reference `field_id`, `filter_id`, mapping, or finding targets
+- [ ] T017 Render assistant confirmation cards for state-changing actions tied to dataset review sessions
+- [ ] T018 Update SQL preview refresh behavior so rapid mapping changes produce `stale` state and debounce/explicit regeneration behavior

 ---

-## Phase 5: User Story 3 — Controlled Execution (P3)
+## Phase D: Validation After Rebaseline

-**Goal**: Review mappings, generate Superset-side preview, and launch audited SQL Lab execution.
-
-**Independent Test**: Map filters to variables; trigger preview; verify launch blocked until preview succeeds; verify SQL Lab session creation.
-
-- [X] T029 [P] [US3] Implement `SupersetContextExtractor.recover_imported_filters` and variable discovery in `backend/src/core/utils/superset_context_extractor.py`
-- [X] T030 [US3] Implement `SupersetCompilationAdapter.compile_preview` (CRITICAL: C4, PRE: effective inputs available, POST: Superset-compiled SQL only) in `backend/src/core/utils/superset_compilation_adapter.py`
-- [X] T031 [US3] Implement `DatasetReviewOrchestrator.launch_dataset` (CRITICAL: C5, PRE: run-ready + preview match, POST: audited run context) in `backend/src/services/dataset_review/orchestrator.py`
-- [X] T032 [P] [US3] Implement mapping approval and preview trigger endpoints in `backend/src/api/routes/dataset_review.py`
-- [X] T033 [P] [US3] Implement `ExecutionMappingReview` component (C3, UX_STATE: WarningApproval/Approved) in `frontend/src/lib/components/dataset-review/ExecutionMappingReview.svelte`
-- [X] T034 [P] [US3] Implement `CompiledSQLPreview` component (C3, UX_STATE: Ready/Stale/Error) in `frontend/src/lib/components/dataset-review/CompiledSQLPreview.svelte`
-- [X] T035 [US3] Implement `LaunchConfirmationPanel` (C3, UX_STATE: Blocked/Ready/Submitted) in `frontend/src/lib/components/dataset-review/LaunchConfirmationPanel.svelte`
-- [x] T036 [US3] Verify implementation matches ux_reference.md (Happy Path & Errors)
-- [x] T037 [US3] Acceptance: Perform semantic audit & algorithm emulation by Tester
-
----
-
-## Final Phase: Polish & Security
-
-- [X] T038 Implement `SessionEvent` logger and persistence logic in `backend/src/services/dataset_review/event_logger.py`
-- [X] T039 Implement automatic version propagation logic for updated `SemanticSource` entities
-- [X] T040 Add batch approval API and UI actions for mapping/semantics
-- [X] T041 Add integration tests for Superset version compatibility matrix in `backend/tests/services/dataset_review/test_superset_matrix.py`
-- [X] T042 Final audit of RBAC enforcement across all session-mutation endpoints
-- [X] T043 Verify i18n coverage for all user-facing strings in `frontend/src/lib/i18n/`
+- [ ] T019 Verify feature behavior against refreshed `ux_reference.md`, especially mixed-initiative clarification and context actions
+- [ ] T020 Verify WYSIWWR, Superset-only SQL compilation, session-version conflict handling, and PII masking boundaries
+- [ ] T021 Run semantic audit for updated 027 contracts before implementation handoff closure

 ---

 ## Dependencies & Strategy

-### Story Completion Order
-1. **Foundation** (Blocking: T005-T010)
-2. **User Story 1** (Blocking for US2 and US3)
-3. **User Story 2** (Can be implemented in parallel with US3 parts, but requires US1 findings)
-4. **User Story 3** (Final terminal action)
+### Delivery Order
+1. **Spec Refresh Gate**
+2. **Backend optimistic locking and assistant routing**
+3. **Frontend assistant-panel integration and contextual UX**
+4. **Validation and semantic audit**

-### Parallel Execution Opportunities
-- T011, T013, T016 (API, Parser, UI Setup) can run simultaneously once T001-T010 are done.
-- T021 and T025 (Clarification Backend/Frontend) can run in parallel.
-- T030 and T034 (Preview Backend/Frontend) can run in parallel.
-
-### Implementation Strategy
-- **MVP First**: Implement US1 with hardcoded trusted sources to prove the session/summary lifecycle.
-- **Incremental Delivery**: Release US1 for documentation value, then US2 for metadata cleanup, finally US3 for execution.
-- **WYSIWWR Guard**: T030 must never be compromised; if Superset API fails, implementation must prioritize the "Manual Launch" fallback defined in research.
+### Scope Invariants
+- Keep feature work bounded to `027-dataset-llm-orchestration` implementation surfaces.
+- Preserve **WYSIWWR** and Superset-only SQL compilation.
+- Do not allow assistant commands to bypass explicit approval or launch gates.
+- Keep i18n, Tailwind-first UI, `requestApi` / `fetchApi`, and `TaskManager` conventions intact.
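
Illustrative note on the breaking change: T006 and T007 describe optimistic locking keyed on `DatasetReviewSession.version`, with callers supplying the version through the `X-Session-Version` header. The sketch below shows the general compare-then-increment pattern only. It is not the repository code from this commit: `ReviewSession`, `InMemorySessionStore`, `SessionVersionConflict`, and `parse_session_version` are hypothetical names, the store is an in-memory stand-in for the real repository, and mapping the conflict to HTTP 409 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


class SessionVersionConflict(Exception):
    """Caller's version is stale. An API layer might map this to HTTP 409 (assumption)."""


@dataclass
class ReviewSession:
    # Hypothetical stand-in for DatasetReviewSession; only the fields needed here.
    session_id: str
    version: int = 1
    mappings: Dict[str, str] = field(default_factory=dict)


def parse_session_version(headers: Mapping[str, str]) -> int:
    """Read the expected version from the X-Session-Version header."""
    raw = headers.get("X-Session-Version")
    if raw is None:
        raise ValueError("X-Session-Version header is required")
    return int(raw)  # raises ValueError on non-numeric input


class InMemorySessionStore:
    """Toy store illustrating compare-then-increment; not a real repository."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ReviewSession] = {}

    def add(self, session: ReviewSession) -> None:
        self._sessions[session.session_id] = session

    def get(self, session_id: str) -> ReviewSession:
        return self._sessions[session_id]

    def update_mappings(
        self, session_id: str, expected_version: int, changes: Mapping[str, str]
    ) -> ReviewSession:
        session = self._sessions[session_id]
        # Reject the write if someone else mutated the session first.
        if session.version != expected_version:
            raise SessionVersionConflict(
                f"expected version {expected_version}, current is {session.version}"
            )
        session.mappings.update(changes)
        session.version += 1  # every successful mutation bumps the version
        return session
```

A client following this pattern would read the session, send its `version` back in `X-Session-Version` with each mutation, and refetch the session whenever a conflict is reported. A database-backed repository would need the check and the increment to happen atomically (for example in a single conditional `UPDATE`), which this in-memory sketch does not attempt.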