mcp tuning

This commit is contained in:
2026-04-01 13:29:41 +03:00
parent 586229a974
commit 1e46073dd6
19 changed files with 1324 additions and 28593 deletions

File diff suppressed because it is too large

@@ -1,44 +0,0 @@
# [DEF:Project_Map:Root]
# @COMPLEXITY: 3
# @PURPOSE: Canonical ownership record for repository structure navigation and generated project-map artifacts.
# @RELATION: DEPENDS_ON -> [Project_Knowledge_Map:Root]
# @RELATION: DEPENDS_ON -> [Std:Constitution:Standard]
# @RELATION: DEPENDS_ON -> [Std:UserPersona:Standard]
# @RELATION: BINDS_TO -> [MCP_Config:Block]
# @LAST_UPDATE: 2026-03-26
## Canonical ownership
- Canonical owner for `Project_Map` is this file: `.ai/PROJECT_MAP.md`.
- Generated structural snapshot lives at `.ai/structure/PROJECT_MAP.md` and is a backing artifact, not the canonical ownership document.
- References that previously pointed directly to `.ai/structure/PROJECT_MAP.md` for `Project_Map` should normalize to this file.
## Canonical relations
- Root knowledge entry: `.ai/ROOT.md` -> `[DEF:Project_Knowledge_Map:Root]`
- Normalized project MCP configuration: `.kilo/mcp.json` -> `[DEF:MCP_Config:Block]`
- Repository constitution: `.ai/standards/constitution.md` -> `[DEF:Std:Constitution:Standard]`
- Repository persona: `.ai/PERSONA.md` -> `[DEF:Std:UserPersona:Standard]`
## Generated snapshot handoff
- Use `.ai/structure/PROJECT_MAP.md` for the expanded generated module/file inventory.
- Regeneration may replace snapshot contents without changing canonical ownership of `Project_Map`.
# [DEF:MCP_Config:Block]
# @COMPLEXITY: 3
# @PURPOSE: Canonical ownership record for normalized project MCP configuration consumed by semantic workflows.
# @RELATION: DEPENDS_ON -> [Project_Map:Root]
# @RELATION: DEPENDS_ON -> [Std:Constitution:Standard]
# @RELATION: DEPENDS_ON -> [Std:UserPersona:Standard]
# @LAST_UPDATE: 2026-03-26
## Normalized config path
- Canonical project MCP config path is `.kilo/mcp.json`.
- For this repository, new docs and workflows must reference `.kilo/mcp.json` as the normalized MCP config.
- Do not introduce new canonical references to deprecated project MCP doc paths for ownership or workflow wiring.
## Current semantic workflow binding
- AXIOM semantic workflows in `.kilocode/workflows/` bind to tools exposed through `.kilo/mcp.json`.
- The `axiom-core` server definition in `.kilo/mcp.json` is the normalized semantic-audit integration point for this repository.
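A minimal sketch of how a workflow might confirm that the normalized config declares the `axiom-core` server. The `mcpServers` top-level key is an assumption borrowed from common MCP client configs; this repository's `.kilo/mcp.json` layout may differ, and `has_server` is a hypothetical helper.

```python
import json
from pathlib import Path


def has_server(config_text: str, server_name: str = "axiom-core",
               servers_key: str = "mcpServers") -> bool:
    """Return True if the MCP config JSON declares the named server.

    `servers_key` is an assumption: many MCP clients keep server
    definitions under a top-level "mcpServers" object.
    """
    config = json.loads(config_text)
    if not isinstance(config, dict):
        return False
    servers = config.get(servers_key, {})
    return isinstance(servers, dict) and server_name in servers


if __name__ == "__main__":
    path = Path(".kilo/mcp.json")  # normalized config path named above
    if path.exists():
        print(has_server(path.read_text(encoding="utf-8")))
```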
# [/DEF:MCP_Config:Block]
# [/DEF:Project_Map:Root]

@@ -0,0 +1,555 @@
# [DEF:Axiom_Tools_Evaluation:Report]
# @COMPLEXITY: 4
# @PURPOSE: Comprehensive evaluation of all axiom-core MCP server tools across 8 UX metrics.
# @LAYER: Analysis
# @RELATION: DEPENDS_ON -> [Project_Knowledge_Map:Root]
# @PRE: All axiom-core tools have been exercised with valid and invalid inputs.
# @POST: Report file exists with per-tool scores and aggregate findings.
# @SIDE_EFFECT: Creates evaluation artifact in .ai/reports/.
# @DATA_CONTRACT: Input[Tool Suite] -> Output[Evaluation Report]
# @INVARIANT: Each tool must be scored on all 8 metrics; no tool may be omitted.
---
# Axiom-Core MCP Tools Evaluation Report
**Date:** 2026-03-31
**Workspace:** `/home/busya/dev/ss-tools`
**Evaluator:** Kilo Code (Coder Mode)
**Index Stats:** 2528 contracts, 2186 relations, 450 files
---
## Scoring Scale
| Score | Meaning |
|-------|---------|
| 5 | Excellent — no friction, best-in-class |
| 4 | Good — minor quirks, easily understood |
| 3 | Acceptable — some learning curve, works as expected |
| 2 | Poor — confusing or inconsistent behavior |
| 1 | Broken — fails to meet basic expectations |
---
## 1. reindex_workspace_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | Name is self-explanatory; purpose is obvious. |
| Predictability | 5 | Returns deterministic stats (contracts, relations, files, success). |
| Mental-Model Shift | 2 | Requires understanding of GRACE indexing concept; not intuitive for newcomers. |
| Consistency | 5 | Follows `{success, message, stats}` pattern shared by read-only tools. |
| Documentation Clarity | 4 | Parameters are clear (`workspace_path`, `schema_path` optional). |
| Error-Message Quality | 3 | No error encountered; would benefit from explicit failure modes. |
| Validation Friction | 1 | Very lenient — accepts missing workspace_path gracefully (defaults to server repo). |
| Recovery Simplicity | 5 | Pure read/index operation; re-run to refresh. No state to undo. |
**Average: 3.75 / 5**
---
## 2. search_contracts_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Search contracts by query" — crystal clear. |
| Predictability | 5 | Returns ranked contract objects with metadata, relations, file refs. |
| Mental-Model Shift | 2 | Requires understanding of semantic search vs. text search. |
| Consistency | 5 | Output shape matches `find_contract_tool` exactly. |
| Documentation Clarity | 4 | `query` param is well-defined; optional workspace/schema params documented. |
| Error-Message Quality | 3 | Empty results return nothing — could hint at re-indexing. |
| Validation Friction | 1 | Accepts any string; no pre-validation needed. |
| Recovery Simplicity | 5 | Stateless query; re-run with different query. |
**Average: 3.75 / 5**
---
## 3. read_grace_outline_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "GRACE outline" is domain-specific but clear from context. |
| Predictability | 5 | Returns file-level contract tree with metadata headers, code hidden. |
| Mental-Model Shift | 3 | Requires understanding of GRACE anchor format `[DEF:...]`. |
| Consistency | 5 | Output format is stable across files. |
| Documentation Clarity | 4 | Single required param `file_path`; straightforward. |
| Error-Message Quality | 3 | Would fail silently on non-GRACE files; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any path. |
| Recovery Simplicity | 5 | Pure read; no side effects. |
**Average: 3.75 / 5**
---
## 4. ast_search_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | AST-grep pattern search — clear to developers familiar with the tool. |
| Predictability | 5 | Returns matched nodes with text, range, metavariables. |
| Mental-Model Shift | 3 | Requires knowledge of ast-grep pattern syntax (`$NAME`). |
| Consistency | 5 | Output shape is consistent (array of match objects). |
| Documentation Clarity | 4 | `pattern`, `file_path`, `lang` are all required and clear. |
| Error-Message Quality | 3 | Invalid patterns may return empty results without explanation. |
| Validation Friction | 2 | No pattern validation before execution; silent failures possible. |
| Recovery Simplicity | 5 | Stateless; re-run with corrected pattern. |
**Average: 3.88 / 5**
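As an illustration of the pattern syntax mentioned above, here is a hedged sketch that assembles the three required arguments (`pattern`, `file_path`, `lang`). The helper name, the file path, and the pre-check are hypothetical; how the arguments reach the server depends on the MCP client in use.

```python
REQUIRED_ARGS = ("pattern", "file_path", "lang")


def build_ast_search_args(pattern: str, file_path: str, lang: str) -> dict:
    """Assemble the three required arguments, rejecting blank values up front."""
    args = {"pattern": pattern, "file_path": file_path, "lang": lang}
    missing = [name for name in REQUIRED_ARGS if not args[name].strip()]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    return args


# `$MSG` is an ast-grep metavariable: it matches any single AST node in that
# position, and the match is reported back under the name MSG.
example = build_ast_search_args(
    pattern="logger.info($MSG)",
    file_path="backend/src/app.py",  # hypothetical file
    lang="python",
)
```

Checking for blank arguments on the client side is one way to work around the silent empty results noted in the table.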
---
## 5. get_semantic_context_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Get semantic context around a contract" — clear intent. |
| Predictability | 5 | Returns contract + dependency neighborhoods with code hidden. |
| Mental-Model Shift | 3 | Requires understanding of semantic dependency graph. |
| Consistency | 5 | Output format is stable and well-structured. |
| Documentation Clarity | 4 | `contract_id` required; optional workspace/schema params. |
| Error-Message Quality | 3 | Missing contract returns empty or minimal output; could be more explicit. |
| Validation Friction | 1 | Accepts any string; no pre-validation. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 6. build_task_context_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Build task-focused context" — clear for implementation workflows. |
| Predictability | 5 | Returns contract_id, file_path, complexity, incoming/outgoing relations, neighbors. |
| Mental-Model Shift | 3 | Requires understanding of "task context" as a bounded working set. |
| Consistency | 5 | Output shape is deterministic and well-structured. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns minimal output; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Stateless; re-run anytime. |
**Average: 3.75 / 5**
---
## 7. workspace_semantic_health_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Semantic health" — clear dashboard-style summary. |
| Predictability | 5 | Returns contracts, relations, orphans, unresolved, complexity breakdown. |
| Mental-Model Shift | 2 | Requires understanding of "orphan" and "unresolved relation" concepts. |
| Consistency | 5 | Output shape is stable across invocations. |
| Documentation Clarity | 4 | No required params; optional workspace/schema. |
| Error-Message Quality | 4 | Includes `orphan_guidance` text explaining what orphans mean. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.88 / 5**
---
## 8. audit_contracts_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Audit contracts" — clear intent for quality checks. |
| Predictability | 5 | Returns warning counts by code, by file, top contracts, and sample warnings. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata requirements per complexity level. |
| Consistency | 5 | Output shape is stable; `detail_level` controls verbosity. |
| Documentation Clarity | 4 | `detail_level` (summary/full) and `warning_limit` are well-documented. |
| Error-Message Quality | 4 | Warnings include code, message, file_path, contract_id — actionable. |
| Validation Friction | 1 | No pre-validation; runs audit on any indexed workspace. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.88 / 5**
---
## 9. diff_contract_semantics_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Diff contract semantics" — clear for comparing two contract versions. |
| Predictability | 5 | Returns identity_changed, body_changed, tier_changed, metadata_changes, relation_changes. |
| Mental-Model Shift | 3 | Requires understanding that this compares semantic metadata, not just code. |
| Consistency | 5 | Output shape matches guarded_patch diff output. |
| Documentation Clarity | 4 | `before_contract_id` and `after_contract_id` are clear. |
| Error-Message Quality | 3 | Missing contracts may return empty diff; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract IDs. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 10. impact_analysis_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Impact analysis" — clear intent for dependency impact. |
| Predictability | 5 | Returns incoming, outgoing, transitive_outgoing, unresolved_outgoing. |
| Mental-Model Shift | 2 | Requires understanding of transitive dependency chains. |
| Consistency | 5 | Output shape matches guarded_patch impact output. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns empty lists; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 11. simulate_patch_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Simulate patch" — clear preview of changes without applying. |
| Predictability | 5 | Returns updated_content with full file preview, or error if invalid. |
| Mental-Model Shift | 3 | Requires understanding that new_code must include DEF anchors. |
| Consistency | 5 | Output shape is stable (success, message, updated_content, warnings). |
| Documentation Clarity | 4 | Params are clear; error message explains DEF tag requirement. |
| Error-Message Quality | 5 | **Excellent**: "new_code must contain valid [DEF:AuthService:Type] and [/DEF:AuthService:Type] tags." |
| Validation Friction | 4 | Strict validation on DEF tag format — helpful, not obstructive. |
| Recovery Simplicity | 5 | No state change; fix new_code and re-run. |
**Average: 4.38 / 5**
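The DEF tag requirement quoted in the error message can be pre-checked before calling the tool. The sketch below is an assumption about what a client-side check could look like, not the server's own validation; `has_def_anchors` and the sample contract are hypothetical.

```python
def has_def_anchors(new_code: str, contract_id: str, contract_type: str) -> bool:
    """Check that new_code contains the opening anchor before the closing one."""
    opening = f"[DEF:{contract_id}:{contract_type}]"
    closing = f"[/DEF:{contract_id}:{contract_type}]"
    start = new_code.find(opening)
    end = new_code.find(closing)
    return start != -1 and end != -1 and start < end


# Hypothetical replacement body for a contract named AuthService of type Class.
new_code = """\
# [DEF:AuthService:Class]
# @PURPOSE: Authenticate users.
class AuthService:
    pass
# [/DEF:AuthService:Class]
"""

anchors_ok = has_def_anchors(new_code, "AuthService", "Class")
```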
---
## 12. guarded_patch_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Guarded patch" — clear that validation guards are applied before changes. |
| Predictability | 5 | Returns diff, impact, and applied flag. Guards include syntax, semantic diff, impact. |
| Mental-Model Shift | 2 | Requires understanding of guard pipeline (syntax → semantic diff → impact). |
| Consistency | 5 | Output shape combines simulate_patch + impact_analysis results. |
| Documentation Clarity | 5 | `apply_patch` boolean is well-documented; all params clear. |
| Error-Message Quality | 4 | Inherits validation from simulate_patch; diff output is detailed. |
| Validation Friction | 4 | Strict but transparent — shows exactly what would change before applying. |
| Recovery Simplicity | 5 | With `apply_patch=false`, no state change. With `true`, git can revert. |
**Average: 4.38 / 5**
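A preview-first workflow built on the `apply_patch` flag might look like the sketch below. `call_tool` stands in for whatever function the MCP client provides to invoke a tool, and the `applied` result key is assumed from the description above; both are hypothetical here.

```python
from typing import Callable

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCall = Callable[[str, dict], dict]


def preview_then_apply(call_tool: ToolCall, args: dict,
                       approve: Callable[[dict], bool]) -> dict:
    """Run the guarded patch as a dry run, then apply only if approved."""
    tool = "guarded_patch_contract_tool"
    preview = call_tool(tool, {**args, "apply_patch": False})
    if not approve(preview):
        # Nothing was written; return the preview so the caller can inspect it.
        return preview
    return call_tool(tool, {**args, "apply_patch": True})
```

The `approve` callback is where a caller would inspect the returned diff and impact before committing to the change.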
---
## 13. patch_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Patch contract" — clear intent for in-place replacement. |
| Predictability | 5 | Replaces contract block with new_code; no preview (unlike guarded_patch). |
| Mental-Model Shift | 3 | Requires trust in the tool since there's no built-in preview. |
| Consistency | 4 | Simpler than guarded_patch; lacks validation pipeline. |
| Documentation Clarity | 4 | Params are clear; no apply_patch flag (always applies). |
| Error-Message Quality | 3 | Errors may be less informative than guarded_patch. |
| Validation Friction | 2 | Less strict than guarded_patch — applies directly. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert or manual fix. |
**Average: 3.50 / 5**
---
## 14. rename_contract_id_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Rename contract ID" — crystal clear. |
| Predictability | 5 | Renames identifier across indexed workspace. |
| Mental-Model Shift | 2 | Requires understanding that this updates all references, not just the definition. |
| Consistency | 5 | Follows standard {success, message} pattern. |
| Documentation Clarity | 4 | `old_contract_id` and `new_contract_id` are clear. |
| Error-Message Quality | 3 | Missing old_id may fail silently; could warn. |
| Validation Friction | 2 | Applies directly; no preview of affected files. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 15. move_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Move contract" — clear intent for relocating a contract block. |
| Predictability | 5 | Moves contract from source to destination file. |
| Mental-Model Shift | 2 | Requires understanding that this extracts and inserts, preserving anchors. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Three required params are clear. |
| Error-Message Quality | 3 | Missing files may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 16. extract_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Extract contract" — clear intent for creating new contract from code range. |
| Predictability | 5 | Extracts lines into new GRACE contract block with specified type. |
| Mental-Model Shift | 3 | Requires understanding of line-based extraction and contract types. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Five required params (file, id, type, start, end) are clear. |
| Error-Message Quality | 3 | Invalid line ranges may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 17. wrap_node_in_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Wrap node in contract" — clear intent for adding GRACE anchors to existing code. |
| Predictability | 5 | Uses ast-grep to locate node and wraps with [DEF]...[/DEF]. |
| Mental-Model Shift | 3 | Requires understanding of AST node matching and GRACE anchor format. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Params are clear; `lang` defaults to python. |
| Error-Message Quality | 3 | Missing node may fail silently. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 18. update_contract_metadata_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Update contract metadata" — crystal clear. |
| Predictability | 5 | Updates/adds tags without modifying code body. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata schema (@PURPOSE, @RELATION, etc.). |
| Consistency | 5 | Returns updated_tags list; clear feedback. |
| Documentation Clarity | 5 | `tags` dict is well-documented; keys must start with '@'. |
| Error-Message Quality | 4 | Returns success message with updated tag names. |
| Validation Friction | 3 | Validates tag key format; accepts any value. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |
**Average: 4.13 / 5**
---
## 19. rename_semantic_tag_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Rename semantic tag" — clear intent. |
| Predictability | 5 | Renames or removes a tag within a contract's metadata. |
| Mental-Model Shift | 2 | Requires understanding of tag lifecycle (rename vs. remove). |
| Consistency | 5 | Follows standard {success, message} pattern. |
| Documentation Clarity | 4 | `old_tag` required, `new_tag` optional (null = remove). |
| Error-Message Quality | 5 | **Excellent**: "Warning: Tag '@TIER' not found in contract AuthService" — precise and actionable. |
| Validation Friction | 3 | Validates tag existence before operation. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |
**Average: 4.00 / 5**
---
## 20. prune_contract_metadata_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Prune contract metadata" — clear intent for removing redundant tags. |
| Predictability | 5 | Removes tags optional for target complexity level; returns removed_tags. |
| Mental-Model Shift | 3 | Requires understanding of complexity levels (1-5) and their metadata requirements. |
| Consistency | 5 | Returns removed_tags list; clear feedback. |
| Documentation Clarity | 4 | `target_complexity` is optional; defaults inferred from contract. |
| Error-Message Quality | 4 | Returns success with removed tag names. |
| Validation Friction | 3 | Validates complexity level range (1-5). |
| Recovery Simplicity | 4 | **Low risk**: only removes metadata; easy to re-add. |
**Average: 4.00 / 5**
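To make "tags optional for the target complexity level" concrete, the sketch below encodes the requirement matrix listed in the header comments of `.axiom/axiom_config.yaml` (C1 to C5) and computes which tags would be candidates for pruning. It is an assumption about the rule, not the tool's actual logic; the UI-only `@UX_STATE` requirement at C3 is left out, and `prunable_tags` is a hypothetical helper.

```python
# Required tags per complexity level, following the C1-C5 matrix in the config comments.
REQUIRED_BY_LEVEL = {
    1: set(),
    2: {"@PURPOSE"},
    3: {"@PURPOSE", "@RELATION"},
    4: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT"},
    5: {"@PURPOSE", "@RELATION", "@PRE", "@POST", "@SIDE_EFFECT",
        "@DATA_CONTRACT", "@INVARIANT"},
}


def prunable_tags(present: set, target_complexity: int) -> list:
    """Return the tags that the target level does not require, sorted by name."""
    if target_complexity not in REQUIRED_BY_LEVEL:
        raise ValueError("target_complexity must be between 1 and 5")
    return sorted(set(present) - REQUIRED_BY_LEVEL[target_complexity])


candidates = prunable_tags({"@PURPOSE", "@RELATION", "@PRE", "@POST"}, 2)
# candidates == ["@POST", "@PRE", "@RELATION"]
```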
---
## 21. infer_missing_relations_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Infer missing relations" — clear intent for discovering implicit dependencies. |
| Predictability | 5 | Analyzes AST imports, calls, type annotations; returns proposal. |
| Mental-Model Shift | 3 | Requires understanding of AST-based dependency discovery. |
| Consistency | 5 | Returns inferred list with apply_changes flag. |
| Documentation Clarity | 4 | `apply_changes` defaults to false (dry-run). |
| Error-Message Quality | 3 | Empty results return success with empty list; could hint at why. |
| Validation Friction | 2 | Dry-run by default; applies only when explicitly requested. |
| Recovery Simplicity | 4 | **Low risk**: dry-run default; applied changes modify metadata only. |
**Average: 3.75 / 5**
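The AST-based discovery described above can be pictured with Python's standard `ast` module. This sketch only collects imported module names as relation candidates; it is an illustration of the idea under that simplification, not the server's algorithm, and `imported_modules` is a hypothetical helper.

```python
import ast


def imported_modules(source: str) -> list[str]:
    """Collect absolute module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports without a module name (from . import x) are skipped.
            found.add(node.module)
    return sorted(found)


sample = "import os\nfrom pkg.auth import AuthService\nfrom . import helpers\n"
modules = imported_modules(sample)
# modules == ["os", "pkg.auth"]
```

Mapping each module name to a contract ID would be a separate step that depends on the workspace index.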
---
## 22. trace_tests_for_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Trace tests for contract" — crystal clear. |
| Predictability | 5 | Returns list of test contracts with file_path, contract_id, tier. |
| Mental-Model Shift | 2 | Requires understanding of TESTS relation in GRACE. |
| Consistency | 5 | Output shape is stable. |
| Documentation Clarity | 4 | Single required param; output is self-explanatory. |
| Error-Message Quality | 3 | No tests found returns empty list; could hint at adding tests. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 23. scaffold_contract_tests_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Scaffold contract tests" — clear intent for generating test boilerplate. |
| Predictability | 5 | Returns pytest scaffolding with smoke + edge case tests from @TEST metadata. |
| Mental-Model Shift | 2 | Requires understanding that scaffolds are starting points, not complete tests. |
| Consistency | 5 | Output shape is stable (Python test code string). |
| Documentation Clarity | 4 | Single required param; output is ready-to-use code. |
| Error-Message Quality | 3 | Missing @TEST metadata returns minimal scaffold; could warn. |
| Validation Friction | 1 | No pre-validation; generates scaffold for any contract. |
| Recovery Simplicity | 5 | Returns code string; caller decides whether to write to file. |
**Average: 3.75 / 5**
---
## 24. find_contract_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find contract" — task-first alias for semantic lookup. |
| Predictability | 5 | Returns same output as search_contracts_tool. |
| Mental-Model Shift | 2 | Same as search_contracts_tool. |
| Consistency | 5 | Identical to search_contracts_tool output. |
| Documentation Clarity | 4 | Same params as search_contracts_tool. |
| Error-Message Quality | 3 | Same as search_contracts_tool. |
| Validation Friction | 1 | Same as search_contracts_tool. |
| Recovery Simplicity | 5 | Stateless query. |
**Average: 3.75 / 5**
---
## 25. read_outline_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Read outline" — task-first alias for file inspection. |
| Predictability | 5 | Same as read_grace_outline_tool. |
| Mental-Model Shift | 3 | Same as read_grace_outline_tool. |
| Consistency | 5 | Identical to read_grace_outline_tool output. |
| Documentation Clarity | 4 | Same params as read_grace_outline_tool. |
| Error-Message Quality | 3 | Same as read_grace_outline_tool. |
| Validation Friction | 1 | Same as read_grace_outline_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## 26. safe_patch_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Safe patch" — task-first alias for validated patching. |
| Predictability | 5 | Same as guarded_patch_contract_tool. |
| Mental-Model Shift | 2 | Same as guarded_patch_contract_tool. |
| Consistency | 5 | Identical to guarded_patch_contract_tool output. |
| Documentation Clarity | 4 | Same params as guarded_patch_contract_tool. |
| Error-Message Quality | 4 | Same as guarded_patch_contract_tool. |
| Validation Friction | 4 | Same as guarded_patch_contract_tool. |
| Recovery Simplicity | 5 | Same as guarded_patch_contract_tool. |
**Average: 4.25 / 5**
---
## 27. find_related_tests_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find related tests" — task-first alias for test lookup. |
| Predictability | 5 | Same as trace_tests_for_contract_tool. |
| Mental-Model Shift | 2 | Same as trace_tests_for_contract_tool. |
| Consistency | 5 | Identical to trace_tests_for_contract_tool output. |
| Documentation Clarity | 4 | Same params as trace_tests_for_contract_tool. |
| Error-Message Quality | 3 | Same as trace_tests_for_contract_tool. |
| Validation Friction | 1 | Same as trace_tests_for_contract_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## 28. analyze_impact_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Analyze impact" — task-first alias for dependency analysis. |
| Predictability | 5 | Same as impact_analysis_tool. |
| Mental-Model Shift | 2 | Same as impact_analysis_tool. |
| Consistency | 5 | Identical to impact_analysis_tool output. |
| Documentation Clarity | 4 | Same params as impact_analysis_tool. |
| Error-Message Quality | 3 | Same as impact_analysis_tool. |
| Validation Friction | 1 | Same as impact_analysis_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## Aggregate Summary
### Per-Metric Averages (All 28 Tools)
| Metric | Average Score | Assessment |
|--------|--------------|------------|
| **Understandability** | 4.54 | Excellent: tool names are descriptive and intent is clear. |
| **Predictability** | 5.00 | Perfect: all tools behave as expected based on their names and docs. |
| **Mental-Model Shift** | 2.43 | Moderate: requires GRACE domain knowledge; not intuitive for newcomers. |
| **Consistency** | 4.96 | Near-perfect: output shapes and patterns are uniform across the suite; only `patch_contract_tool` scores below 5. |
| **Documentation Clarity** | 4.07 | Good: parameters are well-defined; could benefit from more examples. |
| **Error-Message Quality** | 3.36 | Acceptable: some tools have excellent errors (simulate_patch, rename_semantic_tag), others are silent. |
| **Validation Friction** | 1.79 | Low friction: most tools are lenient; the patch-preview tools (simulate_patch, guarded_patch) validate strictly. |
| **Recovery Simplicity** | 4.50 | Excellent: read-only tools are stateless; mutation tools have clear recovery paths. |
### Overall Suite Average: **3.83 / 5**
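The aggregate figures can be recomputed from the per-tool tables. The sketch below transcribes the 28 score rows in report order and averages them per tool, per metric, and overall, rounding halves up to two decimals (an assumption that matches values such as 3.63 and 3.88 used in this report).

```python
from decimal import Decimal, ROUND_HALF_UP

METRICS = (
    "Understandability", "Predictability", "Mental-Model Shift", "Consistency",
    "Documentation Clarity", "Error-Message Quality", "Validation Friction",
    "Recovery Simplicity",
)

# One row per tool (1-28, report order); columns follow METRICS.
SCORES = [
    (5, 5, 2, 5, 4, 3, 1, 5), (5, 5, 2, 5, 4, 3, 1, 5), (4, 5, 3, 5, 4, 3, 1, 5),
    (4, 5, 3, 5, 4, 3, 2, 5), (4, 5, 3, 5, 4, 3, 1, 5), (4, 5, 3, 5, 4, 3, 1, 5),
    (5, 5, 2, 5, 4, 4, 1, 5), (5, 5, 2, 5, 4, 4, 1, 5), (4, 5, 3, 5, 4, 3, 1, 5),
    (5, 5, 2, 5, 4, 3, 1, 5), (4, 5, 3, 5, 4, 5, 4, 5), (5, 5, 2, 5, 5, 4, 4, 5),
    (4, 5, 3, 4, 4, 3, 2, 3), (5, 5, 2, 5, 4, 3, 2, 3), (5, 5, 2, 5, 4, 3, 2, 3),
    (4, 5, 3, 5, 4, 3, 2, 3), (4, 5, 3, 5, 4, 3, 2, 3), (5, 5, 2, 5, 5, 4, 3, 4),
    (4, 5, 2, 5, 4, 5, 3, 4), (4, 5, 3, 5, 4, 4, 3, 4), (4, 5, 3, 5, 4, 3, 2, 4),
    (5, 5, 2, 5, 4, 3, 1, 5), (5, 5, 2, 5, 4, 3, 1, 5), (5, 5, 2, 5, 4, 3, 1, 5),
    (4, 5, 3, 5, 4, 3, 1, 5), (5, 5, 2, 5, 4, 4, 4, 5), (5, 5, 2, 5, 4, 3, 1, 5),
    (5, 5, 2, 5, 4, 3, 1, 5),
]


def mean_2dp(values) -> Decimal:
    """Arithmetic mean rounded half-up to two decimal places."""
    values = list(values)
    mean = Decimal(sum(values)) / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


per_tool = [mean_2dp(row) for row in SCORES]
per_metric = {name: mean_2dp(col) for name, col in zip(METRICS, zip(*SCORES))}
overall = mean_2dp(score for row in SCORES for score in row)

if __name__ == "__main__":
    for name, value in per_metric.items():
        print(f"{name}: {value}")
    print(f"Overall: {overall}")
```

`Decimal` with `ROUND_HALF_UP` is used because Python's built-in `round` rounds exact halves such as 3.625 to the nearest even digit.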
---
## Key Findings
### Strengths
1. **Consistent Output Shapes**: All tools follow predictable response patterns (`{success, message, ...}`).
2. **Clear Naming**: Tool names are self-descriptive; aliases provide task-first convenience.
3. **Safe Defaults**: Tools that expose an apply flag default to dry-run (`apply_patch=false`, `apply_changes=false`).
4. **Excellent Validation on Patches**: `simulate_patch` and `guarded_patch` provide clear error messages when DEF tags are missing.
5. **Rich Metadata**: Tools return detailed semantic information (relations, complexity, impact).
### Areas for Improvement
1. **Mental Model Barrier**: GRACE concepts (contracts, anchors, complexity levels) require onboarding documentation.
2. **Silent Failures**: Some tools return empty results without hints (e.g., no tests found, no relations inferred).
3. **Mutation Safety**: `patch_contract_tool`, `rename_contract_id_tool`, `move_contract_tool`, `extract_contract_tool`, and `wrap_node_in_contract_tool` apply directly without preview; consider adding a `dry_run` flag.
4. **Error Specificity**: Missing contract IDs could return more specific errors instead of empty results.
5. **Documentation Examples**: Parameter docs could include concrete examples for complex patterns (ast-grep, DEF tags).
### Recommendations
1. Add a "Getting Started" guide explaining GRACE concepts (contracts, anchors, complexity).
2. Add a `dry_run` parameter to the direct mutation tools (`patch_contract`, `rename_contract_id`, `move_contract`, `extract_contract`, `wrap_node_in_contract`).
3. Improve empty-result responses with actionable hints (e.g., "No tests found — consider adding @TEST metadata").
4. Add example payloads to tool documentation for complex parameters.
5. Consider adding a `validate_only` mode to `infer_missing_relations` that explains why no relations were found.
---
# [/DEF:Axiom_Tools_Evaluation:Report]

@@ -0,0 +1,47 @@
# Axiom MCP Tools Evaluation Report
## Executive Summary
Testing of the Axiom MCP tool surface covered the main categories: Query/Search, Semantic Health & Audit, AST/Semantic Patching, Workspace Management, and Validation/Command execution.
Tool behavior proved strictly regulated and predictable within the GRACE policies.
**Greatest strengths:**
1. **Validation Friction & Recovery Simplicity:** The presence of `simulate_patch_tool`, the strict use of preview modes for mutations, and the option of automatic rollback (`rollback_workspace_change_tool`) make the system highly resilient to mistakes.
2. **Predictability:** Errors are returned as structured JSON payloads that clearly state the cause (missing anchors, forbidden path, invalid ID).
**Biggest problem areas (limitations):**
1. **Understandability / Mental-Model Shift:** High barrier to entry because of the strict GRACE requirements (contract complexity levels 1 to 5, mandatory `[DEF]...[/DEF]` anchors). Familiar patterns (shell writes) are blocked.
2. **Documentation Clarity:** Error messages are sometimes too terse or abstract (for example, "Orphans are contracts without semantic relations" does not always give a concrete recipe for external AST nodes).
---
## Tool score table (scale 1-5, where 5 is excellent)
| Tool Category | Tools Evaluated | Understandability | Predictability | Mental-Model Shift | Consistency | Doc Clarity | Error Quality | Validation Friction | Recovery Simplicity |
|---|---|---|---|---|---|---|---|---|---|
| **Query & Semantic Search** | `search_contracts`, `find_contract`, `query_workspace_semantics`, `get_semantic_context` | 4 | 5 | 3 | 5 | 4 | 5 | 5 (Low) | N/A (Read-only) |
| **Audit & Health** | `workspace_semantic_health`, `audit_contracts`, `audit_belief_protocol`, `diff_contract_semantics` | 4 | 5 | 3 | 5 | 4 | 4 | 4 (Low) | N/A (Read-only) |
| **AST & Semantic Mutators** | `patch_contract`, `guarded_patch_contract`, `wrap_node_in_contract`, `rename_semantic_tag` | 3 | 4 | 2 (High shift) | 5 | 4 | 4 | 2 (High - strict) | 5 (Easy undo) |
| **Workspace & File Ops** | `create_workspace_file`, `patch_workspace_file`, `manage_workspace_path`, `scaffold_workspace_module` | 5 | 5 | 4 | 5 | 5 | 5 | 3 (Moderate) | 5 |
| **Validation & Recovery** | `run_workspace_command`, `summarize_workspace_change`, `rollback_workspace_change`, `rebuild_workspace_semantic_index` | 4 | 5 | 5 (Native) | 5 | 5 | 5 | 5 (Low) | 5 |
---
## Детализированные заметки по категориям
### 1. Read / Search / Audit (Read-Only Tools)
- **Фактическое поведение:** Быстрое извлечение связей контрактов и AST-деревьев. `workspace_semantic_health_tool` возвращает точную структуру сложностей и "сиротские" (orphan) контракты.
- **Ошибки:** Если ID контракта не найден, возвращает пустой список или явную ошибку "Contract not found", что очень удобно для логики fallback.
- **Оценка:** Отлично работают, но требуют понимания, что поиск идет по *индексу*, а не просто по тексту (нужен актуальный индекс).
### 2. Mutation & Patching (Dangerous Tools)
- **Фактическое поведение:** Перед мутациями обязательно нужно понимать контекст (согласно Mental-Model Shift). Инструменты вроде `guarded_patch_contract_tool` сначала валидируют синтаксис (AST-check), семантические диффы и только потом применяют патч, если включен `apply_patch=True`.
- **Строгость валидации:** Крайне высокая. Попытки изменить файл без сохранения `[DEF]`-якорей отклоняются политикой или приводят к семантическим предупреждениям при следующем аудите.
- **Recovery:** Любая успешная мутация записывается в checkpoint (`.axiom/checkpoints`). Отмена через `rollback_workspace_change_tool` происходит атомарно.
### 3. Command Execution & Policy
- **Фактическое поведение:** `run_workspace_command_tool` работает в песочнице (bwrap). Запись вне `.axiom/temp` успешно пресекается политикой (Read-Only shell).
- **Ошибки:** Качество ошибок (Error-Message Quality) здесь наивысшее, так как мы получаем точные stdout/stderr процессы и код возврата.
### Conclusion
The Axiom MCP surface is designed to prioritize **recoverability (Recovery)** and **predictability (Predictability)**. The strict barriers (Validation Friction) are intentionally high in order to preserve the semantic integrity of the codebase.


280
.axiom/axiom_config.yaml Normal file

@@ -0,0 +1,280 @@
# AXIOM C.O.R.E. Unified Workspace Configuration
# Combines indexing rules and GRACE tag schema in a single file.
#
# The tag structure is split by:
#   1. Complexity level (min_complexity: 1-5)
#   2. Contract type (contract_types: Module | Function | Class | Block | Component | ADR)
#
# Requirements matrix (semantics.md Section VI):
#   C1 (ATOMIC):        anchors only, [DEF]...[/DEF]
#   C2 (SIMPLE):        + @PURPOSE
#   C3 (FLOW):          + @PURPOSE, @RELATION (UI: + @UX_STATE)
#   C4 (ORCHESTRATION): + @PURPOSE, @RELATION, @PRE, @POST, @SIDE_EFFECT
#   C5 (CRITICAL):      full L4 + @DATA_CONTRACT + @INVARIANT
indexing:
  # If empty, indexes the entire workspace (default behavior).
  # If specified, only these directories are scanned for contracts.
  include:
    - "backend/src/"
    - "frontend/src/"
    # - "tests/"
  # Excluded paths/patterns applied on top of include (or full workspace).
  # Supports directory names and glob patterns.
  exclude:
    # Directories
    - "specs/"
    - ".ai/"
    - ".git/"
    - ".venv/"
    - "__pycache__/"
    - "node_modules/"
    - ".pytest_cache/"
    - ".mypy_cache/"
    - ".ruff_cache/"
    - ".axiom/"
    # File patterns
    - "*.md"
    - "*.txt"
    - "*.log"
    - "*.yaml"
    - "*.yml"
    - "*.json"
    - "*.toml"
    - "*.ini"
    - "*.cfg"
# ============================================================
# GRACE Tag Schema: split by complexity and contract type
# ============================================================
# contract_types defines which contract types require the tag:
#   - Module: module header (file)
#   - Function: functions and methods
#   - Class: classes
#   - Block: logical blocks inside functions
#   - Component: UI components (Svelte)
#   - ADR: architectural decisions
# ============================================================
tags:
  # ----------------------------------------------------------
  # Complexity 2 (SIMPLE): requires @PURPOSE
  # ----------------------------------------------------------
  PURPOSE:
    type: string
    multiline: true
    description: "Primary purpose of the module or function"
    min_complexity: 2
    contract_types:
      - Module
      - Function
      - Class
      - Component
      - ADR
  # ----------------------------------------------------------
  # Complexity 3 (FLOW): requires @RELATION
  # ----------------------------------------------------------
  RELATION:
    type: array
    separator: "->"
    is_reference: true
    description: "Dependency graph: PREDICATE -> TARGET_ID"
    allowed_predicates:
      - DEPENDS_ON
      - CALLS
      - INHERITS
      - IMPLEMENTS
      - DISPATCHES
      - BINDS_TO
    min_complexity: 3
    contract_types:
      - Module
      - Function
      - Class
      - Component
  LAYER:
    type: string
    enum: ["Domain", "UI", "Infra"]
    description: "Architectural layer of the component"
    contract_types:
      - Module
  SEMANTICS:
    type: array
    separator: ","
    description: "Keywords for semantic search"
    contract_types:
      - Module
  # ----------------------------------------------------------
  # Complexity 3: UX Contracts (Svelte 5+)
  # ----------------------------------------------------------
  UX_STATE:
    type: string
    description: "UI states: Idle, Loading, Error, Success"
    contract_types:
      - Component
  UX_FEEDBACK:
    type: string
    description: "System feedback: Toast, Shake, RedBorder"
    contract_types:
      - Component
  UX_RECOVERY:
    type: string
    description: "Recovery path after a failure: Retry, ClearInput"
    contract_types:
      - Component
  UX_REACTIVITY:
    type: string
    description: "Explicit binding via runes: $state, $derived, $effect, $props"
    contract_types:
      - Component
  # ----------------------------------------------------------
  # Complexity 4 (ORCHESTRATION): DbC contracts
  # ----------------------------------------------------------
  PRE:
    type: string
    description: "Preconditions"
    min_complexity: 4
    contract_types:
      - Function
      - Class
      - Module
  POST:
    type: string
    description: "Postconditions"
    min_complexity: 4
    contract_types:
      - Function
      - Class
      - Module
  SIDE_EFFECT:
    type: string
    description: "Side effects: mutations, I/O, network"
    min_complexity: 4
    contract_types:
      - Function
      - Class
      - Module
  # ----------------------------------------------------------
  # Complexity 5 (CRITICAL): full contract
  # ----------------------------------------------------------
  DATA_CONTRACT:
    type: string
    description: "DTO reference: Input -> Model, Output -> Model"
    min_complexity: 5
    contract_types:
      - Function
      - Class
      - Module
  INVARIANT:
    type: string
    description: "Business invariants that must not be violated"
    min_complexity: 5
    contract_types:
      - Function
      - Class
      - Module
  # ----------------------------------------------------------
  # Decision Memory (orthogonal to complexity)
  # ----------------------------------------------------------
  RATIONALE:
    type: string
    multiline: true
    description: "Why this path was chosen and which constraint or goal it protects"
    contract_types:
      - Module
      - Function
      - Class
      - ADR
  REJECTED:
    type: string
    multiline: true
    description: "Which path is forbidden and which risk makes it unacceptable"
    contract_types:
      - Module
      - Function
      - Class
      - ADR
  # ----------------------------------------------------------
  # Test Contracts (Section X: simplified rules)
  # ----------------------------------------------------------
  TEST_CONTRACT:
    type: string
    description: "Test contract: Input -> Output"
    contract_types:
      - Function
      - Block
  TEST_SCENARIO:
    type: string
    description: "Test scenario: Name -> Expectation"
    contract_types:
      - Function
      - Block
  TEST_FIXTURE:
    type: string
    description: "Test fixture: Name -> file:[path] | INLINE_JSON"
    contract_types:
      - Block
  TEST_EDGE:
    type: string
    description: "Edge case: Name -> Failure"
    contract_types:
      - Function
      - Block
  TEST_INVARIANT:
    type: string
    description: "Test invariant: Name -> VERIFIED_BY: [scenarios]"
    contract_types:
      - Module
      - Function
  # ----------------------------------------------------------
  # Metadata / Classification
  # ----------------------------------------------------------
  TIER:
    type: string
    enum: ["CRITICAL", "STANDARD", "TRIVIAL"]
    description: "Criticality level of the component"
    contract_types:
      - Module
      - Function
      - Class
  COMPLEXITY:
    type: string
    enum: ["1", "2", "3", "4", "5"]
    description: "Complexity level of the contract"
    contract_types:
      - Module
      - Function
      - Class
      - Component
  C:
    type: string
    enum: ["1", "2", "3", "4", "5"]
    description: "Shorthand for @COMPLEXITY"
    contract_types:
      - Module
      - Function
      - Class
      - Component

Binary file not shown.

6
.gitignore vendored

@@ -78,3 +78,9 @@ node_modules/
 coverage/
 *.tmp
 logs/app.log.1
+audit_report.txt
+check_semantics.py
+docs_audit_report.txt
+run_mcp.py
+semantic_audit_report.md
+.axiom/checkpoints


@@ -0,0 +1,214 @@
---
description: MCP-only implementation specialist; writes and validates code only through AXIOM MCP tooling.
mode: subagent
model: github-copilot/gemini-3.1-pro-preview
temperature: 0.1
permission:
  edit: deny
  bash: deny
  browser: deny
  task:
    "*": deny
steps: 80
color: accent
---
You are Kilo Code, acting as the MCP Coder.
# SYSTEM DIRECTIVE: GRACE-Poly v2.3
> OPERATION MODE: MCP-ONLY IMPLEMENTATION
> ROLE: Implementation specialist restricted to AXIOM MCP mutation, validation, recovery, and semantic-query surfaces
## Core Mandate
- Read `.ai/ROOT.md` first.
- Use `.ai/standards/semantics.md` as the semantic source of truth.
- Follow `.ai/standards/constitution.md`, `.ai/standards/api_design.md`, and `.ai/standards/ui_design.md`.
- Implement code only through the AXIOM MCP server surface.
- Preserve or add required semantic anchors and metadata before changing logic.
- Keep modules under 300 lines; decompose instead of growing large files.
- Use guards or explicit errors; never use `assert` for runtime contract enforcement.
- Treat `@RATIONALE` and `@REJECTED` as hard anti-regression constraints.
- If relation, schema, dependency, path policy, or semantic target is unclear, emit `[NEED_CONTEXT: target]`.
## Hard Boundary
- Allowed mutation surface: AXIOM MCP server only.
- Forbidden: native file editing, native direct-write tools, native shell execution, browser execution, and subagent delegation.
- Never bypass an MCP policy block with a workaround outside the MCP server.
- If a persistent file change is needed, use an MCP mutation tool.
- If repository verification is needed, use the MCP sandboxed command tool.
- If the required capability does not exist in the AXIOM MCP server, stop with `[NEED_CONTEXT: mcp_surface_gap]`.
## Approved MCP Tool Graph
### Policy and semantic context
- `get_workspace_policy`
- `find_contract_tool`
- `read_outline_tool`
- `read_grace_outline_tool`
- `build_task_context_tool`
- `get_semantic_context_tool`
- `query_workspace_semantics`
- `trace_tests_for_contract_tool`
- `find_related_tests_tool`
- `analyze_impact_tool`
- `audit_contracts_tool`
- `audit_belief_protocol_tool`
### MCP mutation and scaffold surface
- `create_workspace_file`
- `patch_workspace_file`
- `manage_workspace_path`
- `scaffold_workspace_module`
- `safe_patch_tool`
- `guarded_patch_contract_tool`
- `patch_contract_tool`
- `update_contract_metadata_tool`
- `wrap_node_in_contract_tool`
- `rename_contract_id_tool`
- `move_contract_tool`
- `extract_contract_tool`
- `rename_semantic_tag_tool`
- `prune_contract_metadata_tool`
- `infer_missing_relations_tool`
- `patch_belief_protocol_tool`
### Verification, recovery, and evidence
- `run_workspace_command`
- `summarize_workspace_change`
- `rollback_workspace_change`
- `rebuild_workspace_semantic_index`
- `read_runtime_events`
## Required Workflow
1. Load the root knowledge map and semantic standards.
2. Read effective workspace policy through `get_workspace_policy` before any mutation or sandboxed verification.
3. Resolve the semantic target through contract discovery, semantic outline, task context, or bounded semantic query.
4. Prefer preview-first mutation via `patch_workspace_file`, `safe_patch_tool`, or `guarded_patch_contract_tool` whenever a target already exists.
5. Use `create_workspace_file`, `manage_workspace_path`, and `scaffold_workspace_module` only for bounded create, move, rename, delete, or bootstrap actions.
6. Preserve semantic anchors, required contracts, and decision-memory tags during every mutation.
7. Run tests, linters, searches, and build checks only through `run_workspace_command`.
8. Inspect mutation evidence through `summarize_workspace_change`, query blast radius through `query_workspace_semantics`, and use rollback through `rollback_workspace_change` if recovery is required.
9. If the semantic index is stale or degraded after major changes, use `rebuild_workspace_semantic_index` instead of guessing about impact.
10. Never translate an MCP-blocked write into shell-based write behavior.
## Complexity Contract Matrix
- Complexity 1: anchors only.
- Complexity 2: `@PURPOSE`.
- Complexity 3: `@PURPOSE`, `@RELATION`; UI also `@UX_STATE`.
- Complexity 4: `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; meaningful `logger.reason()` and `logger.reflect()` for Python.
- Complexity 5: full L4 plus `@DATA_CONTRACT` and `@INVARIANT`; `belief_scope` mandatory.
- Decision-memory overlay: `@RATIONALE` and `@REJECTED` are mandatory whenever upstream ADR or retained workaround constrains the implementation path.
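
The matrix above can be expressed as a lookup for self-checks. This is a sketch under stated assumptions: the names `REQUIRED_TAGS` and `missing_tags` are illustrative, and requiring `@UX_STATE` for UI contracts at Complexity 3 and above is an interpretation, since the matrix states it only for Complexity 3.

```python
# Required tags per complexity level, taken from the matrix above.
REQUIRED_TAGS = {
    1: [],
    2: ["PURPOSE"],
    3: ["PURPOSE", "RELATION"],
    4: ["PURPOSE", "RELATION", "PRE", "POST", "SIDE_EFFECT"],
    5: ["PURPOSE", "RELATION", "PRE", "POST", "SIDE_EFFECT",
        "DATA_CONTRACT", "INVARIANT"],
}


def missing_tags(complexity: int, present_tags, is_ui: bool = False) -> list:
    """Return required tags absent from present_tags, in matrix order."""
    required = list(REQUIRED_TAGS[complexity])
    # Assumption: UI contracts need @UX_STATE from Complexity 3 upward.
    if is_ui and complexity >= 3:
        required.append("UX_STATE")
    return [tag for tag in required if tag not in present_tags]
```

For example, `missing_tags(4, {"PURPOSE", "RELATION"})` reports the three DbC tags that still need to be added.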
## MCP-Only Mutation Rules
- Use `patch_workspace_file` for generic text, line-range, or AST-node mutation.
- Use contract-aware mutation tools when the change is naturally scoped to a GRACE contract boundary.
- Use `update_contract_metadata_tool` and related semantic tools for header-only repairs instead of broad rewrites.
- Use `manage_workspace_path` for path creation, move, rename, inspect, and delete instead of shell path commands.
- Use `scaffold_workspace_module` for new module bootstrap instead of writing starter files manually.
- Treat protected paths, checkpoint storage, semantic-index artifacts, runtime-event logs, and `.axiom/` operational state as immutable unless an MCP tool explicitly owns that path.
## Sandboxed Verification Rules
- Use `run_workspace_command` for pytest, ruff, grep, ls, cat, and other read-only command workflows.
- If a shell workflow tries to write outside `.axiom/temp/`, treat the block as correct behavior.
- Redirect persistent edits from sandboxed command flows back to MCP mutation tools.
- Prefer narrow verification commands tied to the changed scope.
## Evidence Envelope Contract
Before completion, return one bounded evidence packet containing:
- `task_scope`
- `mcp_tools_used`
- `changed_paths`
- `checkpoints`
- `symbols_added_or_modified`
- `mapped_contract_ids`
- `commands_run_via_mcp`
- `semantic_queries_used`
- `decision_memory_applied`
- `self_check_semantics`
- `self_check_dbc`
- `self_check_belief_state`
- `self_check_tests`
- `rollback_path`
- `remaining_debt`
- `known_risks`
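
A completeness check for the envelope might look like the sketch below. The field names are the ones listed above; the helper name `missing_evidence_fields` and the use of a plain dictionary for the packet are assumptions.

```python
EVIDENCE_FIELDS = (
    "task_scope", "mcp_tools_used", "changed_paths", "checkpoints",
    "symbols_added_or_modified", "mapped_contract_ids",
    "commands_run_via_mcp", "semantic_queries_used",
    "decision_memory_applied", "self_check_semantics", "self_check_dbc",
    "self_check_belief_state", "self_check_tests", "rollback_path",
    "remaining_debt", "known_risks",
)


def missing_evidence_fields(packet: dict) -> list:
    """Return envelope fields that the packet does not contain."""
    return [name for name in EVIDENCE_FIELDS if name not in packet]
```

An empty result means every field is present; it says nothing about whether the values are meaningful.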
## Self-Check Requirements
### Semantic self-check
Verify and report:
- every changed module has a valid module anchor
- every changed non-trivial boundary has required local `[DEF]...[/DEF]`
- no broken or mismatched anchors remain
- changed test files respect the simplified semantic test policy
### DbC self-check
Verify and report required tags per changed symbol according to effective complexity:
- `@PURPOSE`
- `@RELATION`
- `@PRE`
- `@POST`
- `@SIDE_EFFECT`
- `@DATA_CONTRACT`
- `@INVARIANT`
- UI-only contracts when the touched scope crosses into frontend files
### Belief-state self-check
For Complexity 4 and 5 Python paths, verify and report:
- `belief_scope(...)`
- meaningful `logger.reason(...)`
- meaningful `logger.reflect(...)`
- retained workaround handling through `logger.explore(...)` plus local `@RATIONALE` and `@REJECTED`
### Test self-check
Verify and report:
- required tests written or updated through MCP mutation tools
- required tests executed through `run_workspace_command`
- exact commands used
- exact pass or fail outcome
- any test gaps that could not be closed through the available MCP surface
## Completion Gate
You may claim completion only when:
- all persistent repository writes flowed through AXIOM MCP mutation tools
- no native direct-write or shell-write path was used
- no broken `[DEF]` anchors remain in changed scope
- no required contracts are missing for the effective complexity
- no surviving workaround ships without local `@RATIONALE` and `@REJECTED`
- every applied mutation has a checkpoint or an explicit MCP operation record
- a rollback path exists for every applied change set that should be recoverable
- the evidence envelope is complete enough for external validation
## Anti-Loop Protocol
### `[ATTEMPT: 1-2]`
- Continue with targeted MCP mutation and sandboxed verification.
- Prefer minimal patches and explicit preview/apply behavior.
### `[ATTEMPT: 3]`
- Stop trusting the current local hypothesis.
- Re-check workspace policy, target resolution, contract identity, checkpoint history, semantic freshness, and sandbox restrictions before mutating again.
- Treat the likely failure as policy, contract, path, or stale-target mismatch rather than routine logic drift.
### `[ATTEMPT: 4+]`
- Do not continue patch churn.
- Output a bounded escalation packet containing:
  - `status: blocked`
  - `task_scope`
  - `suspected_failure_layer`
  - `mcp_tools_used`
  - `what_was_tried`
  - `what_did_not_work`
  - `current_invariants`
  - `checkpoint_state`
  - `latest_blocking_error`
  - `request: re-evaluate at MCP policy, contract, or architecture level`
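
The attempt thresholds above can be sketched as a small dispatcher. The function names and the returned labels (`"patch"`, `"recheck"`, `"escalate"`) are hypothetical; only the attempt boundaries and the packet fields come from the protocol.

```python
ESCALATION_FIELDS = (
    "status", "task_scope", "suspected_failure_layer", "mcp_tools_used",
    "what_was_tried", "what_did_not_work", "current_invariants",
    "checkpoint_state", "latest_blocking_error", "request",
)


def next_step(attempt: int) -> str:
    """Map an attempt number to the protocol stage."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt <= 2:
        return "patch"      # targeted mutation and sandboxed verification
    if attempt == 3:
        return "recheck"    # re-check policy, target, and index freshness
    return "escalate"       # stop patching and emit the escalation packet


def escalation_packet(**fields) -> dict:
    """Build the bounded packet and fail loudly if a field is missing."""
    packet = {
        "status": "blocked",
        "request": "re-evaluate at MCP policy, contract, or architecture level",
    }
    packet.update(fields)
    missing = [name for name in ESCALATION_FIELDS if name not in packet]
    if missing:
        raise ValueError(f"escalation packet incomplete: {missing}")
    return packet
```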
## Output Contract
Return compactly:
- `applied`
- `evidence_envelope`
- `remaining`
- `risk`
Do not return:
- raw tool transcript
- speculative chain-of-thought
- unbounded command output
- proposals that require native write or native shell as a fallback


@@ -1 +1 @@
-{"mcpServers":{"axiom-core":{"command":"/home/busya/dev/ast-mcp-core-server/.venv/bin/python","args":["-c","from src.server import main; main()"],"env":{"PYTHONPATH":"/home/busya/dev/ast-mcp-core-server"},"alwaysAllow":["read_grace_outline_tool","ast_search_tool","get_semantic_context_tool","build_task_context_tool","audit_contracts_tool","diff_contract_semantics_tool","simulate_patch_tool","patch_contract_tool","rename_contract_id_tool","move_contract_tool","extract_contract_tool","infer_missing_relations_tool","map_runtime_trace_to_contracts_tool","scaffold_contract_tests_tool","search_contracts_tool","reindex_workspace_tool","prune_contract_metadata_tool","workspace_semantic_health_tool","trace_tests_for_contract_tool","guarded_patch_contract_tool","impact_analysis_tool","update_contract_metadata_tool","wrap_node_in_contract_tool","rename_semantic_tag_tool","scan_vulnerabilities"]},"chrome-devtools":{"command":"npx","args":["chrome-devtools-mcp@latest","--browser-url=http://127.0.0.1:9222"],"disabled":false,"alwaysAllow":["take_snapshot"]}}}
+{"mcpServers":{"axiom-core":{"command":"/home/busya/dev/ast-mcp-core-server/.venv/bin/python","args":["-c","from src.server import main; main()"],"env":{"PYTHONPATH":"/home/busya/dev/ast-mcp-core-server"},"alwaysAllow":["read_grace_outline_tool","ast_search_tool","get_semantic_context_tool","build_task_context_tool","audit_contracts_tool","diff_contract_semantics_tool","simulate_patch_tool","patch_contract_tool","rename_contract_id_tool","move_contract_tool","extract_contract_tool","infer_missing_relations_tool","map_runtime_trace_to_contracts_tool","scaffold_contract_tests_tool","search_contracts_tool","reindex_workspace_tool","prune_contract_metadata_tool","workspace_semantic_health_tool","trace_tests_for_contract_tool","guarded_patch_contract_tool","impact_analysis_tool","update_contract_metadata_tool","wrap_node_in_contract_tool","rename_semantic_tag_tool","scan_vulnerabilities","find_contract_tool","safe_patch_tool","run_workspace_command_tool"]},"chrome-devtools":{"command":"npx","args":["chrome-devtools-mcp@latest","--browser-url=http://127.0.0.1:9222"],"disabled":false,"alwaysAllow":["take_snapshot"]}}}


@@ -12,7 +12,7 @@ You **MUST** consider the user input before proceeding (if not empty).
 ## Goal
-Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`.
+Identify inconsistencies, duplications, ambiguities, underspecified items, and decision-memory drift across the core artifacts (`spec.md`, `plan.md`, `tasks.md`, and ADR sources) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`.
 ## Operating Constraints
@@ -29,6 +29,7 @@ Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --inclu
 - SPEC = FEATURE_DIR/spec.md
 - PLAN = FEATURE_DIR/plan.md
 - TASKS = FEATURE_DIR/tasks.md
+- ADR = `docs/architecture.md` and/or feature-local decision files when present
 Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command).
 For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
@@ -37,7 +38,7 @@ For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot
 Load only the minimal necessary context from each artifact:
-**From spec.md:**
+**From `spec.md`:**
 - Overview/Context
 - Functional Requirements
@@ -45,20 +46,29 @@ Load only the minimal necessary context from each artifact:
 - User Stories
 - Edge Cases (if present)
-**From plan.md:**
+**From `plan.md`:**
 - Architecture/stack choices
 - Data Model references
 - Phases
 - Technical constraints
+- ADR references or emitted decisions
-**From tasks.md:**
+**From `tasks.md`:**
 - Task IDs
 - Descriptions
 - Phase grouping
 - Parallel markers [P]
 - Referenced file paths
+- Guardrail summaries derived from `@RATIONALE` / `@REJECTED`
+**From ADR sources:**
+- `[DEF:id:ADR]` nodes
+- `@RATIONALE`
+- `@REJECTED`
+- `@RELATION`
 **From constitution:**
@@ -73,6 +83,7 @@ Create internal representations (do not include raw artifacts in output):
 - **User story/action inventory**: Discrete user actions with acceptance criteria
 - **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases)
 - **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements
+- **Decision-memory inventory**: ADR ids, accepted paths, rejected paths, and the tasks/contracts expected to inherit them
 ### 4. Detection Passes (Token-Efficient Analysis)
@@ -112,13 +123,21 @@ Focus on high-signal findings. Limit to 50 findings total; aggregate remainder i
 - Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note)
 - Conflicting requirements (e.g., one requires Next.js while other specifies Vue)
+#### G. Decision-Memory Drift
+- ADR exists in planning but has no downstream task guardrail
+- Task carries a guardrail with no upstream ADR or plan rationale
+- Task text accidentally schedules an ADR-rejected path
+- Missing preventive `@RATIONALE` / `@REJECTED` summaries for known traps
+- Rejected-path notes that contradict later plan or task language without explicit decision revision
 ### 5. Severity Assignment
 Use this heuristic to prioritize findings:
-- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality
-- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion
-- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case
+- **CRITICAL**: Violates constitution MUST, missing core spec artifact, missing blocking ADR, rejected path scheduled as work, or requirement with zero coverage that blocks baseline functionality
+- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion, ADR guardrail drift
+- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case, incomplete decision-memory propagation
 - **LOW**: Style/wording improvements, minor redundancy not affecting execution order
 ### 6. Produce Compact Analysis Report
@@ -138,6 +157,11 @@ Output a Markdown report (no file writes) with the following structure:
 | Requirement Key | Has Task? | Task IDs | Notes |
 |-----------------|-----------|----------|-------|
+**Decision Memory Summary Table:**
+| ADR / Guardrail | Present in Plan | Propagated to Tasks | Rejected Path Protected | Notes |
+|-----------------|-----------------|---------------------|-------------------------|-------|
 **Constitution Alignment Issues:** (if any)
 **Unmapped Tasks:** (if any)
@@ -150,6 +174,8 @@ Output a Markdown report (no file writes) with the following structure:
 - Ambiguity Count
 - Duplication Count
 - Critical Issues Count
+- ADR Count
+- Guardrail Drift Count
 ### 7. Provide Next Actions
@@ -179,6 +205,7 @@ Ask the user: "Would you like me to suggest concrete remediation edits for the t
 - **Prioritize constitution violations** (these are always CRITICAL)
 - **Use examples over exhaustive rules** (cite specific instances, not generic patterns)
 - **Report zero issues gracefully** (emit success report with coverage statistics)
+- **Treat missing ADR propagation as a real defect, not a documentation nit**
 ## Context


@@ -4,7 +4,7 @@ description: Generate a custom checklist for the current feature based on user r
 ## Checklist Purpose: "Unit Tests for English"
-**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.
+**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, completeness, and decision-memory readiness of requirements in a given domain.
 **NOT for verification/testing**:
@@ -20,6 +20,7 @@ description: Generate a custom checklist for the current feature based on user r
 - ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
 - ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
 - ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
+- ✅ "Do repo-shaping choices have explicit rationale and rejected alternatives before task decomposition?" (decision memory)
 **Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.
@@ -47,7 +48,7 @@ You **MUST** consider the user input before proceeding (if not empty).
 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts").
 2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
-4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
+4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria, decision-memory needs.
 5. Formulate questions chosen from these archetypes:
 - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
 - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
@@ -55,6 +56,7 @@ You **MUST** consider the user input before proceeding (if not empty).
 - Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
 - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
 - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
+- Decision-memory gap (e.g., "Do we need explicit ADR and rejected-path checks for this feature?")
 Question formatting rules:
 - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
@@ -76,9 +78,10 @@ You **MUST** consider the user input before proceeding (if not empty).
 - Infer any missing context from spec/plan/tasks (do NOT hallucinate)
 4. **Load feature context**: Read from FEATURE_DIR:
-- spec.md: Feature requirements and scope
-- plan.md (if exists): Technical details, dependencies
-- tasks.md (if exists): Implementation tasks
+- `spec.md`: Feature requirements and scope
+- `plan.md` (if exists): Technical details, dependencies, ADR references
+- `tasks.md` (if exists): Implementation tasks and inherited guardrails
+- ADR artifacts (if present): `[DEF:id:ADR]`, `@RATIONALE`, `@REJECTED`
 **Context Loading Strategy**:
 - Load only necessary portions relevant to active focus areas (avoid full-file dumping)
@@ -102,6 +105,7 @@ You **MUST** consider the user input before proceeding (if not empty).
 - **Consistency**: Do requirements align with each other?
 - **Measurability**: Can requirements be objectively verified?
 - **Coverage**: Are all scenarios/edge cases addressed?
+- **Decision Memory**: Are durable choices and rejected alternatives explicit before implementation starts?
 **Category Structure** - Group items by requirement quality dimensions:
 - **Requirement Completeness** (Are all necessary requirements documented?)
@@ -112,6 +116,7 @@ You **MUST** consider the user input before proceeding (if not empty).
 - **Edge Case Coverage** (Are boundary conditions defined?)
 - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
 - **Dependencies & Assumptions** (Are they documented and validated?)
+- **Decision Memory & ADRs** (Are architectural choices, rationale, and rejected paths explicit?)
 - **Ambiguities & Conflicts** (What needs clarification?)
 **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
@@ -127,8 +132,8 @@ You **MUST** consider the user input before proceeding (if not empty).
 - "Are hover state requirements consistent across all interactive elements?" [Consistency]
 - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
 - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
-- "Are loading states defined for asynchronous episode data?" [Completeness]
-- "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
+- "Are blocking architecture decisions recorded with explicit rationale and rejected alternatives before task generation?" [Decision Memory]
+- "Does the plan make clear which implementation shortcuts are forbidden for this feature?" [Decision Memory, Gap]
 **ITEM STRUCTURE**:
 Each item should follow this pattern:
@@ -163,6 +168,11 @@ You **MUST** consider the user input before proceeding (if not empty).
- "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]" - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
- "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]" - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
Decision Memory:
- "Do all repo-shaping technical choices have explicit rationale before tasks are generated? [Decision Memory, Plan]"
- "Are rejected alternatives documented for architectural branches that would materially change implementation scope? [Decision Memory, Gap]"
- "Can a coder determine from the planning artifacts which tempting shortcut is forbidden? [Decision Memory, Clarity]"
**Scenario Classification & Coverage** (Requirements Quality Focus): **Scenario Classification & Coverage** (Requirements Quality Focus):
- Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
- For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
@@ -171,7 +181,7 @@ You **MUST** consider the user input before proceeding (if not empty).
**Traceability Requirements**: **Traceability Requirements**:
- MINIMUM: ≥80% of items MUST include at least one traceability reference - MINIMUM: ≥80% of items MUST include at least one traceability reference
- Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` - Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`, `[ADR]`
- If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
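The 80% traceability minimum above can be checked mechanically. The following is a minimal sketch, assuming checklist items are plain strings whose trailing bracket group carries the markers; the function name `traceability_ratio` and the exact regular expression are illustrative assumptions, not part of the workflow.

```python
import re

# Assumed marker vocabulary, copied from the rule above (hypothetical regex).
TRACE_PATTERN = re.compile(
    r"\[[^\]]*(?:Spec §[\w.\-]+|Gap|Ambiguity|Conflict|Assumption|ADR)[^\]]*\]"
)


def traceability_ratio(items: list[str]) -> float:
    """Return the fraction of checklist items carrying a traceability reference."""
    if not items:
        return 0.0
    traced = sum(1 for item in items if TRACE_PATTERN.search(item))
    return traced / len(items)


items = [
    "Are hover state requirements consistent? [Consistency, Spec §FR-1]",
    "Are external podcast API requirements documented? [Dependency, Gap]",
    "Are rejected alternatives documented? [Decision Memory, ADR]",
    "Are loading states defined? [Completeness]",
    "Is 'visual hierarchy' defined with measurable criteria? [Gap]",
]
ratio = traceability_ratio(items)
print(f"{ratio:.0%} traced, meets minimum: {ratio >= 0.8}")
```

A dimension-only tag such as `[Completeness]` does not count as a traceability reference in this sketch, which matches the wording of the rule but may be stricter than intended.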
**Surface & Resolve Issues** (Requirements Quality Problems): **Surface & Resolve Issues** (Requirements Quality Problems):
@@ -181,6 +191,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
- Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]" - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
- Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
- Decision-memory drift: "Do tasks inherit the same rejected-path guardrails defined in planning? [Decision Memory, Conflict]"
**Content Consolidation**: **Content Consolidation**:
- Soft cap: If raw candidate items > 40, prioritize by risk/impact - Soft cap: If raw candidate items > 40, prioritize by risk/impact
@@ -193,7 +204,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- ❌ "Displays correctly", "works properly", "functions as expected" - ❌ "Displays correctly", "works properly", "functions as expected"
- ❌ "Click", "navigate", "render", "load", "execute" - ❌ "Click", "navigate", "render", "load", "execute"
- ❌ Test cases, test plans, QA procedures - ❌ Test cases, test plans, QA procedures
- ❌ Implementation details (frameworks, APIs, algorithms) - ❌ Implementation details (frameworks, APIs, algorithms) unless the checklist is asking whether those decisions were explicitly documented and bounded by rationale/rejected alternatives
**✅ REQUIRED PATTERNS** - These test requirements quality: **✅ REQUIRED PATTERNS** - These test requirements quality:
- ✅ "Are [requirement type] defined/specified/documented for [scenario]?" - ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
@@ -202,6 +213,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- ✅ "Can [requirement] be objectively measured/verified?" - ✅ "Can [requirement] be objectively measured/verified?"
- ✅ "Are [edge cases/scenarios] addressed in requirements?" - ✅ "Are [edge cases/scenarios] addressed in requirements?"
- ✅ "Does the spec define [missing aspect]?" - ✅ "Does the spec define [missing aspect]?"
- ✅ "Does the plan record why [accepted path] was chosen and why [rejected path] is forbidden?"
6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### <requirement item>` lines with globally incrementing IDs starting at CHK001. 6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### <requirement item>` lines with globally incrementing IDs starting at CHK001.
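The fallback structure described above (H1 title, meta lines, `##` categories, globally incrementing `CHK###` IDs) could be rendered roughly as follows. This is a sketch under assumptions: the `**Purpose**` / `**Created**` meta-line wording and the function name `render_checklist` are hypothetical, since the canonical template is not shown here.

```python
from datetime import date


def render_checklist(
    title: str, purpose: str, categories: dict[str, list[str]], created: date
) -> str:
    """Render a fallback checklist with IDs that increment across all categories."""
    lines = [
        f"# {title}",
        "",
        f"**Purpose**: {purpose}",  # assumed meta-line format
        f"**Created**: {created.isoformat()}",  # assumed meta-line format
        "",
    ]
    next_id = 1  # IDs start at CHK001 and never reset per category
    for heading, items in categories.items():
        lines.append(f"## {heading}")
        lines.append("")
        for item in items:
            lines.append(f"- [ ] CHK{next_id:03d} {item}")
            next_id += 1
        lines.append("")
    return "\n".join(lines)


example = render_checklist(
    "Architecture Checklist",
    "Validate decision-memory quality before task generation",
    {
        "Requirement Completeness": ["Are error handling requirements defined? [Gap]"],
        "Decision Memory & ADRs": [
            "Are rejected alternatives documented? [Decision Memory, Gap]",
            "Are ADR decisions traceable to the spec? [Traceability, ADR]",
        ],
    },
    date(2026, 4, 1),
)
print(example)
```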
@@ -210,6 +222,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- Depth level - Depth level
- Actor/timing - Actor/timing
- Any explicit user-specified must-have items incorporated - Any explicit user-specified must-have items incorporated
- Whether ADR / decision-memory checks were included
**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows: **Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows:
@@ -262,6 +275,15 @@ Sample items:
- "Are security requirements consistent with compliance obligations? [Consistency]" - "Are security requirements consistent with compliance obligations? [Consistency]"
- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" - "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
**Architecture Decision Quality:** `architecture.md`
Sample items:
- "Do all repo-shaping architecture choices have explicit rationale before tasks are generated? [Decision Memory]"
- "Are rejected alternatives documented for each blocking technology branch? [Decision Memory, Gap]"
- "Can an implementer tell which shortcuts are forbidden without re-reading research artifacts? [Clarity, ADR]"
- "Are ADR decisions traceable to requirements or constraints in the spec? [Traceability, ADR]"
## Anti-Examples: What NOT To Do ## Anti-Examples: What NOT To Do
**❌ WRONG - These test implementation, not requirements:** **❌ WRONG - These test implementation, not requirements:**
@@ -282,6 +304,7 @@ Sample items:
- [ ] CHK004 - Are the selection criteria for related episodes documented? [Gap, Spec §FR-005] - [ ] CHK004 - Are the selection criteria for related episodes documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] - [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001] - [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
- [ ] CHK007 - Do planning artifacts state why the accepted architecture was chosen and which alternative is rejected? [Decision Memory, ADR]
``` ```
**Key Differences:** **Key Differences:**


@@ -56,35 +56,36 @@ You **MUST** consider the user input before proceeding (if not empty).
3. Load and analyze the implementation context: 3. Load and analyze the implementation context:
- **REQUIRED**: Read `.ai/standards/semantics.md` for strict coding standards and contract requirements - **REQUIRED**: Read `.ai/standards/semantics.md` for strict coding standards and contract requirements
- **REQUIRED**: Read tasks.md for the complete task list and execution plan - **REQUIRED**: Read `tasks.md` for the complete task list and execution plan
- **REQUIRED**: Read plan.md for tech stack, architecture, and file structure - **REQUIRED**: Read `plan.md` for tech stack, architecture, and file structure
- **IF EXISTS**: Read data-model.md for entities and relationships - **REQUIRED IF PRESENT**: Read ADR artifacts containing `[DEF:id:ADR]` nodes and build a blocked-path inventory from `@REJECTED`
- **IF EXISTS**: Read contracts/ for API specifications and test requirements - **IF EXISTS**: Read `data-model.md` for entities and relationships
- **IF EXISTS**: Read research.md for technical decisions and constraints - **IF EXISTS**: Read `contracts/` for API specifications and test requirements
- **IF EXISTS**: Read quickstart.md for integration scenarios - **IF EXISTS**: Read `research.md` for technical decisions and constraints
- **IF EXISTS**: Read `quickstart.md` for integration scenarios
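Building a blocked-path inventory from `@REJECTED` might look like the sketch below. It assumes ADR blocks are written as comment-prefixed lines between `[DEF:id:ADR]` and a closing `[/DEF:...]` anchor; the closing-anchor form, the sample ADR, and the function name `blocked_path_inventory` are assumptions for illustration only.

```python
import re
from collections import defaultdict

DEF_OPEN = re.compile(r"\[DEF:([^\]]+?):ADR\]")
DEF_CLOSE = re.compile(r"\[/DEF:")  # assumed closing-anchor form
REJECTED = re.compile(r"@REJECTED:?\s*(.+)")


def blocked_path_inventory(text: str) -> dict[str, list[str]]:
    """Map each ADR id to the list of paths it rejects."""
    inventory: dict[str, list[str]] = defaultdict(list)
    current = None
    for line in text.splitlines():
        if DEF_CLOSE.search(line):
            current = None
            continue
        opened = DEF_OPEN.search(line)
        if opened:
            current = opened.group(1)
            continue
        rejected = REJECTED.search(line)
        if rejected and current is not None:
            inventory[current].append(rejected.group(1).strip())
    return dict(inventory)


# Hypothetical ADR block used only to exercise the parser.
sample = """\
# [DEF:SessionStorage:ADR]
# @COMPLEXITY: 3
# @PURPOSE: Choose where session state lives.
# @RATIONALE: A server-side store keeps tokens out of the browser.
# @REJECTED: Storing session tokens in localStorage.
# @REJECTED: Stateless JWT without revocation.
# [/DEF:SessionStorage:ADR]
"""
print(blocked_path_inventory(sample))
```

`@REJECTED` lines outside an ADR block are ignored here; local contract headers that carry their own guardrails would need a separate pass.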
4. **Project Setup Verification**: 4. **Project Setup Verification**:
- **REQUIRED**: Create/verify ignore files based on actual project setup: - **REQUIRED**: Create/verify ignore files based on actual project setup:
**Detection & Creation Logic**: **Detection & Creation Logic**:
- Check if the following command succeeds to determine if the repository is a git repo (create/verify .gitignore if so): - Check if the following command succeeds to determine if the repository is a git repo (create/verify `.gitignore` if so):
```sh ```sh
git rev-parse --git-dir 2>/dev/null git rev-parse --git-dir 2>/dev/null
``` ```
- Check if Dockerfile* exists or Docker in plan.md → create/verify .dockerignore - Check if Dockerfile* exists or Docker in `plan.md` → create/verify `.dockerignore`
- Check if .eslintrc* exists → create/verify .eslintignore - Check if `.eslintrc*` exists → create/verify `.eslintignore`
- Check if eslint.config.* exists → ensure the config's `ignores` entries cover required patterns - Check if `eslint.config.*` exists → ensure the config's `ignores` entries cover required patterns
- Check if .prettierrc* exists → create/verify .prettierignore - Check if `.prettierrc*` exists → create/verify `.prettierignore`
- Check if .npmrc or package.json exists → create/verify .npmignore (if publishing) - Check if `.npmrc` or `package.json` exists → create/verify `.npmignore` (if publishing)
- Check if terraform files (*.tf) exist → create/verify .terraformignore - Check if terraform files (`*.tf`) exist → create/verify `.terraformignore`
- Check if .helmignore needed (helm charts present) → create/verify .helmignore - Check if `.helmignore` needed (helm charts present) → create/verify `.helmignore`
**If ignore file already exists**: Verify it contains essential patterns, append missing critical patterns only **If ignore file already exists**: Verify it contains essential patterns, append missing critical patterns only
**If ignore file missing**: Create with full pattern set for detected technology **If ignore file missing**: Create with full pattern set for detected technology
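The detection logic above can be sketched as a small helper. This is an illustration rather than the workflow's actual implementation: the function names are hypothetical, and the npm, Helm, and `eslint.config.*` cases are omitted for brevity.

```python
import subprocess
from pathlib import Path


def is_git_repo(root: Path) -> bool:
    """Return True when `git rev-parse --git-dir` succeeds inside root."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--git-dir"],
            cwd=root,
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:  # git is not installed
        return False
    return result.returncode == 0


def required_ignore_files(root: Path, plan_text: str = "") -> list[str]:
    """List the ignore files that should be created or verified for root."""
    needed = []
    if is_git_repo(root):
        needed.append(".gitignore")
    if any(root.glob("Dockerfile*")) or "docker" in plan_text.lower():
        needed.append(".dockerignore")
    if any(root.glob(".eslintrc*")):
        needed.append(".eslintignore")
    if any(root.glob(".prettierrc*")):
        needed.append(".prettierignore")
    if any(root.glob("*.tf")):
        needed.append(".terraformignore")
    return needed


if __name__ == "__main__":
    print(required_ignore_files(Path(".")))
```

The helper only reports which files are needed; appending missing patterns to an existing file versus creating a full pattern set remains a separate step, as the two rules above describe.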
**Common Patterns by Technology** (from plan.md tech stack): **Common Patterns by Technology** (from `plan.md` tech stack):
- **Node.js/JavaScript/TypeScript**: `node_modules/`, `dist/`, `build/`, `*.log`, `.env*` - **Node.js/JavaScript/TypeScript**: `node_modules/`, `dist/`, `build/`, `*.log`, `.env*`
- **Python**: `__pycache__/`, `*.pyc`, `.venv/`, `venv/`, `dist/`, `*.egg-info/` - **Python**: `__pycache__/`, `*.pyc`, `.venv/`, `venv/`, `dist/`, `*.egg-info/`
- **Java**: `target/`, `*.class`, `*.jar`, `.gradle/`, `build/` - **Java**: `target/`, `*.class`, `*.jar`, `.gradle/`, `build/`
@@ -107,11 +108,12 @@ You **MUST** consider the user input before proceeding (if not empty).
- **Terraform**: `.terraform/`, `*.tfstate*`, `*.tfvars`, `.terraform.lock.hcl` - **Terraform**: `.terraform/`, `*.tfstate*`, `*.tfvars`, `.terraform.lock.hcl`
- **Kubernetes/k8s**: `*.secret.yaml`, `secrets/`, `.kube/`, `kubeconfig*`, `*.key`, `*.crt` - **Kubernetes/k8s**: `*.secret.yaml`, `secrets/`, `.kube/`, `kubeconfig*`, `*.key`, `*.crt`
5. Parse tasks.md structure and extract: 5. Parse `tasks.md` structure and extract:
- **Task phases**: Setup, Tests, Core, Integration, Polish - **Task phases**: Setup, Tests, Core, Integration, Polish
- **Task dependencies**: Sequential vs parallel execution rules - **Task dependencies**: Sequential vs parallel execution rules
- **Task details**: ID, description, file paths, parallel markers [P] - **Task details**: ID, description, file paths, parallel markers [P]
- **Execution flow**: Order and dependency requirements - **Execution flow**: Order and dependency requirements
- **Decision-memory requirements**: which tasks inherit ADR ids, `@RATIONALE`, and `@REJECTED` guardrails
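As a rough illustration of the extraction step, the sketch below parses task lines and collects any ADR ids they mention. The task-line shape (`- [ ] T001 [P] description`) and the convention of referencing ADRs inline as `[DEF:id:ADR]` are assumptions; the real `tasks.md` format may differ.

```python
import re
from dataclasses import dataclass

# Assumed task-line shape: "- [ ] T001 [P] description"
TASK_LINE = re.compile(
    r"^\s*- \[(?P<done>[ xX])\] (?P<id>T\d+)\s+(?P<parallel>\[P\]\s+)?(?P<desc>.+)$"
)
ADR_REF = re.compile(r"\[DEF:([^\]]+?):ADR\]")  # assumed inline ADR reference


@dataclass
class Task:
    task_id: str
    description: str
    parallel: bool
    done: bool
    adr_ids: list[str]


def parse_tasks(text: str) -> list[Task]:
    """Extract tasks, their parallel markers, and inherited ADR ids."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if not match:
            continue
        description = match.group("desc").strip()
        tasks.append(
            Task(
                task_id=match.group("id"),
                description=description,
                parallel=match.group("parallel") is not None,
                done=match.group("done").lower() == "x",
                adr_ids=ADR_REF.findall(description),
            )
        )
    return tasks


sample_tasks = """\
## Phase 1: Setup
- [ ] T001 Create project structure per plan.md
- [X] T002 [P] Add session store in src/session.py, guarded by [DEF:SessionStorage:ADR]
"""
for task in parse_tasks(sample_tasks):
    print(task)
```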
6. Execute implementation following the task plan: 6. Execute implementation following the task plan:
- **Phase-by-phase execution**: Complete each phase before moving to the next - **Phase-by-phase execution**: Complete each phase before moving to the next
@@ -119,6 +121,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- **Follow TDD approach**: Execute test tasks before their corresponding implementation tasks - **Follow TDD approach**: Execute test tasks before their corresponding implementation tasks
- **File-based coordination**: Tasks affecting the same files must run sequentially - **File-based coordination**: Tasks affecting the same files must run sequentially
- **Validation checkpoints**: Verify each phase completion before proceeding - **Validation checkpoints**: Verify each phase completion before proceeding
- **ADR guardrail discipline**: if a task packet or local contract forbids a path via `@REJECTED`, do not treat it as an implementation option
7. Implementation execution rules: 7. Implementation execution rules:
- **Strict Adherence**: Apply `.ai/standards/semantics.md` rules: - **Strict Adherence**: Apply `.ai/standards/semantics.md` rules:
@@ -134,8 +137,10 @@ You **MUST** consider the user input before proceeding (if not empty).
- For Python Complexity 5 modules, `belief_scope(...)` is mandatory and the critical path must be irrigated with `logger.reason()` / `logger.reflect()` according to the contract. - For Python Complexity 5 modules, `belief_scope(...)` is mandatory and the critical path must be irrigated with `logger.reason()` / `logger.reflect()` according to the contract.
- For Svelte components, require `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, and `@UX_REACTIVITY`; runes-only reactivity is allowed (`$state`, `$derived`, `$effect`, `$props`). - For Svelte components, require `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, and `@UX_REACTIVITY`; runes-only reactivity is allowed (`$state`, `$derived`, `$effect`, `$props`).
- Reject pseudo-semantic markup: docstrings containing loose `@PURPOSE` / `@PRE` text do **NOT** satisfy the protocol unless represented in canonical anchored metadata blocks. - Reject pseudo-semantic markup: docstrings containing loose `@PURPOSE` / `@PRE` text do **NOT** satisfy the protocol unless represented in canonical anchored metadata blocks.
- Preserve and propagate decision-memory tags. Upstream `@RATIONALE` / `@REJECTED` are mandatory when carried by the task packet or contract.
- If `logger.explore()` or equivalent runtime evidence leads to a retained workaround, mutate the same contract header with reactive Micro-ADR tags: `@RATIONALE` and `@REJECTED`.
- **Self-Audit**: The Coder MUST use `axiom-core` tools (like `audit_contracts_tool`) to verify semantic compliance before completion. - **Self-Audit**: The Coder MUST use `axiom-core` tools (like `audit_contracts_tool`) to verify semantic compliance before completion.
- **Semantic Rejection Gate**: If self-audit reveals broken anchors, missing closing tags, missing required metadata for the effective complexity, orphaned critical classes/functions, or Complexity 4/5 Python code without required belief-state logging, the task is NOT complete and cannot be handed off as accepted work. - **Semantic Rejection Gate**: If self-audit reveals broken anchors, missing closing tags, missing required metadata for the effective complexity, orphaned critical classes/functions, Complexity 4/5 Python code without required belief-state logging, or retained workarounds without decision-memory tags, the task is NOT complete and cannot be handed off as accepted work.
- **CRITICAL Contracts**: If a task description contains a contract summary (e.g., `CRITICAL: PRE: ..., POST: ...`), these constraints are **MANDATORY** and must be strictly implemented in the code using guards/assertions (if applicable per protocol). - **CRITICAL Contracts**: If a task description contains a contract summary (e.g., `CRITICAL: PRE: ..., POST: ...`), these constraints are **MANDATORY** and must be strictly implemented in the code using guards/assertions (if applicable per protocol).
- **Setup first**: Initialize project structure, dependencies, configuration - **Setup first**: Initialize project structure, dependencies, configuration
- **Tests before code**: If tests are required, write them for contracts, entities, and integration scenarios before the corresponding implementation - **Tests before code**: If tests are required, write them for contracts, entities, and integration scenarios before the corresponding implementation
@@ -150,11 +155,13 @@ You **MUST** consider the user input before proceeding (if not empty).
- Provide clear error messages with context for debugging. - Provide clear error messages with context for debugging.
- Suggest next steps if implementation cannot proceed. - Suggest next steps if implementation cannot proceed.
- **IMPORTANT** For completed tasks, mark as [X] only AFTER local verification and self-audit. - **IMPORTANT** For completed tasks, mark as [X] only AFTER local verification and self-audit.
- If blocked because the only apparent fix is listed in upstream `@REJECTED`, escalate for decision revision instead of silently overriding the guardrail.
9. **Handoff to Tester (Audit Loop)**: 9. **Handoff to Tester (Audit Loop)**:
- Once a task or phase is complete, the Coder hands off to the Tester. - Once a task or phase is complete, the Coder hands off to the Tester.
- Handoff includes: file paths, declared complexity, expected contracts (`@PRE`, `@POST`, `@SIDE_EFFECT`, `@DATA_CONTRACT`, `@INVARIANT` when applicable), and a short logic overview. - Handoff includes: file paths, declared complexity, expected contracts (`@PRE`, `@POST`, `@SIDE_EFFECT`, `@DATA_CONTRACT`, `@INVARIANT` when applicable), and a short logic overview.
- Handoff MUST explicitly disclose any contract exceptions or known semantic debt. Hidden semantic debt is forbidden. - Handoff MUST explicitly disclose any contract exceptions or known semantic debt. Hidden semantic debt is forbidden.
- Handoff MUST disclose decision-memory changes: inherited ADR ids, new or updated `@RATIONALE`, new or updated `@REJECTED`, and any blocked paths that remain active.
- The handoff payload MUST instruct the Tester to execute the dedicated testing workflow [`.kilocode/workflows/speckit.test.md`](.kilocode/workflows/speckit.test.md), not just perform an informal review. - The handoff payload MUST instruct the Tester to execute the dedicated testing workflow [`.kilocode/workflows/speckit.test.md`](.kilocode/workflows/speckit.test.md), not just perform an informal review.
10. **Tester Verification & Orchestrator Gate**: 10. **Tester Verification & Orchestrator Gate**:
@@ -164,11 +171,12 @@ You **MUST** consider the user input before proceeding (if not empty).
- Reject code that only imitates the protocol superficially, such as free-form docstrings with `@PURPOSE` text but without canonical `[DEF]...[/DEF]` anchors and header metadata. - Reject code that only imitates the protocol superficially, such as free-form docstrings with `@PURPOSE` text but without canonical `[DEF]...[/DEF]` anchors and header metadata.
- Verify that effective complexity and required metadata match [`.ai/standards/semantics.md`](.ai/standards/semantics.md). - Verify that effective complexity and required metadata match [`.ai/standards/semantics.md`](.ai/standards/semantics.md).
- Verify that Python Complexity 4/5 implementations include required belief-state instrumentation (`belief_scope`, `logger.reason()`, `logger.reflect()`). - Verify that Python Complexity 4/5 implementations include required belief-state instrumentation (`belief_scope`, `logger.reason()`, `logger.reflect()`).
- Verify that upstream rejected paths were not silently restored.
- Emulate algorithms "in mind" step-by-step to ensure logic consistency. - Emulate algorithms "in mind" step-by-step to ensure logic consistency.
- Verify unit tests match the declared contracts. - Verify unit tests match the declared contracts.
- If Tester finds issues: - If Tester finds issues:
- Emit `[AUDIT_FAIL: semantic_noncompliance | contract_mismatch | logic_mismatch | test_mismatch | speckit_test_not_run]`. - Emit `[AUDIT_FAIL: semantic_noncompliance | contract_mismatch | logic_mismatch | test_mismatch | speckit_test_not_run | rejected_path_regression]`.
- Provide concrete file-path-based reasons, for example: missing anchors, module/class contract mismatch, missing `@DATA_CONTRACT`, missing `logger.reason()`, illegal docstring-only annotations, or missing execution of [`.kilocode/workflows/speckit.test.md`](.kilocode/workflows/speckit.test.md). - Provide concrete file-path-based reasons, for example: missing anchors, module/class contract mismatch, missing `@DATA_CONTRACT`, missing `logger.reason()`, illegal docstring-only annotations, missing decision-memory tags, re-enabled upstream rejected path, or missing execution of [`.kilocode/workflows/speckit.test.md`](.kilocode/workflows/speckit.test.md).
- Notify the Orchestrator. - Notify the Orchestrator.
- Orchestrator redirects the feedback to the Coder for remediation. - Orchestrator redirects the feedback to the Coder for remediation.
- Orchestrator green-status rule: - Orchestrator green-status rule:
@@ -187,7 +195,9 @@ You **MUST** consider the user input before proceeding (if not empty).
- class/function-level docstring contracts standing in for canonical anchors, - class/function-level docstring contracts standing in for canonical anchors,
- missing closing anchors, - missing closing anchors,
- missing required metadata for declared complexity, - missing required metadata for declared complexity,
- Complexity 5 repository/service code using only `belief_scope(...)` without explicit `logger.reason()` / `logger.reflect()` checkpoints. - Complexity 5 repository/service code using only `belief_scope(...)` without explicit `logger.reason()` / `logger.reflect()` checkpoints,
- retained workarounds missing local `@RATIONALE` / `@REJECTED`,
- silent resurrection of paths already blocked by upstream ADR or task guardrails.
- Report final status with summary of completed and audited work. - Report final status with summary of completed and audited work.
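The audit failure marker used in the Tester gate can be modeled as a closed set of reasons so that a misspelled code cannot be emitted. This is a minimal sketch: the class and function names are hypothetical, and joining several reasons with ` | ` in one marker is an assumption about how multiple findings would be reported.

```python
from enum import Enum


class AuditFailure(Enum):
    """Failure codes listed in the Tester gate, including the new regression code."""

    SEMANTIC_NONCOMPLIANCE = "semantic_noncompliance"
    CONTRACT_MISMATCH = "contract_mismatch"
    LOGIC_MISMATCH = "logic_mismatch"
    TEST_MISMATCH = "test_mismatch"
    SPECKIT_TEST_NOT_RUN = "speckit_test_not_run"
    REJECTED_PATH_REGRESSION = "rejected_path_regression"


def format_audit_fail(reasons: list[AuditFailure]) -> str:
    """Build the marker string, keeping the first occurrence of each reason."""
    if not reasons:
        raise ValueError("an audit failure needs at least one reason")
    unique = list(dict.fromkeys(reasons))  # de-duplicate, preserve order
    return "[AUDIT_FAIL: " + " | ".join(reason.value for reason in unique) + "]"


print(format_audit_fail([AuditFailure.REJECTED_PATH_REGRESSION]))
```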
Note: This command assumes a complete task breakdown exists in tasks.md. If tasks are incomplete or missing, suggest running `/speckit.tasks` first to regenerate the task list. Note: This command assumes a complete task breakdown exists in `tasks.md`. If tasks are incomplete or missing, suggest running `/speckit.tasks` first to regenerate the task list.


@@ -28,12 +28,13 @@ You **MUST** consider the user input before proceeding (if not empty).
- Fill Technical Context (mark unknowns as "NEEDS CLARIFICATION") - Fill Technical Context (mark unknowns as "NEEDS CLARIFICATION")
- Fill Constitution Check section from constitution - Fill Constitution Check section from constitution
- Evaluate gates (ERROR if violations unjustified) - Evaluate gates (ERROR if violations unjustified)
- Phase 0: Generate research.md (resolve all NEEDS CLARIFICATION) - Phase 0: Generate `research.md` (resolve all NEEDS CLARIFICATION)
- Phase 1: Generate data-model.md, contracts/, quickstart.md - Phase 1: Generate `data-model.md`, `contracts/`, `quickstart.md`
- Phase 1: Generate global ADR artifacts and connect them to the plan
- Phase 1: Update agent context by running the agent script - Phase 1: Update agent context by running the agent script
- Re-evaluate Constitution Check post-design - Re-evaluate Constitution Check post-design
4. **Stop and report**: Command ends after Phase 2 planning. Report branch, IMPL_PLAN path, and generated artifacts. 4. **Stop and report**: Command ends after Phase 2 planning. Report branch, IMPL_PLAN path, generated artifacts, and ADR decisions created.
## Phases ## Phases
@@ -58,9 +59,9 @@ You **MUST** consider the user input before proceeding (if not empty).
- Rationale: [why chosen] - Rationale: [why chosen]
- Alternatives considered: [what else evaluated] - Alternatives considered: [what else evaluated]
**Output**: research.md with all NEEDS CLARIFICATION resolved **Output**: `research.md` with all NEEDS CLARIFICATION resolved
### Phase 1: Design & Contracts ### Phase 1: Design, ADRs & Contracts
**Prerequisites:** `research.md` complete **Prerequisites:** `research.md` complete
@@ -72,7 +73,23 @@ You **MUST** consider the user input before proceeding (if not empty).
1. **Extract entities from feature spec** → `data-model.md`: 1. **Extract entities from feature spec** → `data-model.md`:
- Entity name, fields, relationships, validation rules. - Entity name, fields, relationships, validation rules.
2. **Design & Verify Contracts (Semantic Protocol)**: 2. **Generate Global ADRs (Decision Memory Root Layer)**:
- Read `spec.md`, `research.md`, and the technical context to identify repo-shaping decisions: storage, auth pattern, framework boundaries, integration patterns, deployment assumptions, failure strategy.
- For each durable architectural choice, emit a standalone semantic ADR block using `[DEF:DecisionId:ADR]`.
- Every ADR block MUST include:
- `@COMPLEXITY: 3` or `4` depending on blast radius
- `@PURPOSE`
- `@RATIONALE`
- `@REJECTED`
- `@RELATION` back to the originating spec/research/plan boundary or target module family
- Preferred destinations:
- `docs/architecture.md` for cross-cutting repository decisions
- feature-local design docs when the decision is feature-scoped
- root module headers only when the decision scope is truly local
- **Hard Gate**: do not continue to task decomposition until the blocking global decisions have been materialized as ADR nodes.
- **Anti-Regression Goal**: a later orchestrator must be able to read these ADRs and avoid creating tasks for rejected branches.
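The required ADR fields listed in this step lend themselves to a simple presence check before the hard gate is passed. The sketch below assumes an ADR block is available as plain text; the function name `adr_problems` and the sample block are hypothetical.

```python
import re

REQUIRED_ADR_TAGS = ("@COMPLEXITY", "@PURPOSE", "@RATIONALE", "@REJECTED", "@RELATION")


def adr_problems(block: str) -> list[str]:
    """Return the required tags missing from an ADR block, plus complexity issues."""
    problems = [
        tag for tag in REQUIRED_ADR_TAGS if not re.search(re.escape(tag) + r"\b", block)
    ]
    complexity = re.search(r"@COMPLEXITY:\s*(\d+)", block)
    if complexity and complexity.group(1) not in {"3", "4"}:
        problems.append("@COMPLEXITY must be 3 or 4")
    return problems


# Hypothetical ADR block; the closing anchor form is an assumption.
good_adr = """\
# [DEF:SessionStorage:ADR]
# @COMPLEXITY: 3
# @PURPOSE: Choose where session state lives.
# @RATIONALE: A server-side store keeps tokens out of the browser.
# @REJECTED: Storing session tokens in localStorage.
# @RELATION: DEPENDS_ON ->[AuthService:Module]
# [/DEF:SessionStorage:ADR]
"""
print(adr_problems(good_adr))
```

A check like this only proves the tags exist; whether the rationale is meaningful still needs human or reviewer judgment.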
3. **Design & Verify Contracts (Semantic Protocol)**:
- **Drafting**: Define semantic headers, metadata, and closing anchors for all new modules strictly from `.ai/standards/semantics.md`. - **Drafting**: Define semantic headers, metadata, and closing anchors for all new modules strictly from `.ai/standards/semantics.md`.
- **Complexity Classification**: Classify each contract with `@COMPLEXITY: [1|2|3|4|5]` or `@C:`. Treat `@TIER` only as a legacy compatibility hint and never as the primary rule source. - **Complexity Classification**: Classify each contract with `@COMPLEXITY: [1|2|3|4|5]` or `@C:`. Treat `@TIER` only as a legacy compatibility hint and never as the primary rule source.
- **Adaptive Contract Requirements**: - **Adaptive Contract Requirements**:
@@ -81,34 +98,42 @@ You **MUST** consider the user input before proceeding (if not empty).
- **Complexity 3**: require `@PURPOSE` and `@RELATION`; UI also requires `@UX_STATE`. - **Complexity 3**: require `@PURPOSE` and `@RELATION`; UI also requires `@UX_STATE`.
- **Complexity 4**: require `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python modules must define a meaningful `logger.reason()` / `logger.reflect()` path or equivalent belief-state mechanism. - **Complexity 4**: require `@PURPOSE`, `@RELATION`, `@PRE`, `@POST`, `@SIDE_EFFECT`; Python modules must define a meaningful `logger.reason()` / `logger.reflect()` path or equivalent belief-state mechanism.
- **Complexity 5**: require full level-4 contract plus `@DATA_CONTRACT` and `@INVARIANT`; Python modules must require `belief_scope`; UI modules must define UX contracts including `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, and `@UX_REACTIVITY`. - **Complexity 5**: require full level-4 contract plus `@DATA_CONTRACT` and `@INVARIANT`; Python modules must require `belief_scope`; UI modules must define UX contracts including `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY`, and `@UX_REACTIVITY`.
- **Decision-Memory Propagation**:
- If a module/function/component realizes or is constrained by an ADR, add local `@RATIONALE` and `@REJECTED` guardrails before coding begins.
- Use `@RELATION: IMPLEMENTS ->[AdrId]` when the contract realizes the ADR.
- Use `@RELATION: DEPENDS_ON ->[AdrId]` when the contract is merely constrained by the ADR.
- Record known LLM traps directly in the contract header so the implementer inherits the guardrail from the start.
- **Relation Syntax**: Write dependency edges in canonical GraphRAG form: `@RELATION: [PREDICATE] ->[TARGET_ID]`. - **Relation Syntax**: Write dependency edges in canonical GraphRAG form: `@RELATION: [PREDICATE] ->[TARGET_ID]`.
- **Context Guard**: If a target relation, DTO, or required dependency cannot be named confidently, stop generation and emit `[NEED_CONTEXT: target]` instead of inventing placeholders. - **Context Guard**: If a target relation, DTO, required dependency, or decision rationale cannot be named confidently, stop generation and emit `[NEED_CONTEXT: target]` instead of inventing placeholders.
- **Testing Contracts**: Add `@TEST_CONTRACT`, `@TEST_SCENARIO`, `@TEST_FIXTURE`, `@TEST_EDGE`, and `@TEST_INVARIANT` when the design introduces audit-critical or explicitly test-governed contracts, especially for Complexity 5 boundaries. - **Testing Contracts**: Add `@TEST_CONTRACT`, `@TEST_SCENARIO`, `@TEST_FIXTURE`, `@TEST_EDGE`, and `@TEST_INVARIANT` when the design introduces audit-critical or explicitly test-governed contracts, especially for Complexity 5 boundaries.
- **Self-Review**: - **Self-Review**:
- *Complexity Fit*: Does each contract include exactly the metadata and contract density required by its complexity level? - *Complexity Fit*: Does each contract include exactly the metadata and contract density required by its complexity level?
- *Completeness*: Do `@PRE`/`@POST`, `@SIDE_EFFECT`, `@DATA_CONTRACT`, and UX tags cover the edge cases identified in Research and UX Reference? - *Completeness*: Do `@PRE`/`@POST`, `@SIDE_EFFECT`, `@DATA_CONTRACT`, UX tags, and decision-memory tags cover the edge cases identified in Research and UX Reference?
- *Connectivity*: Do `@RELATION` tags form a coherent graph using canonical `@RELATION: [PREDICATE] ->[TARGET_ID]` syntax? - *Connectivity*: Do `@RELATION` tags form a coherent graph using canonical `@RELATION: [PREDICATE] ->[TARGET_ID]` syntax?
- *Compliance*: Are all anchors properly opened and closed, and does the chosen comment syntax match the target medium? - *Compliance*: Are all anchors properly opened and closed, and does the chosen comment syntax match the target medium?
- *Belief-State Requirements*: Do Complexity 4/5 Python modules explicitly account for `logger.reason()`, `logger.reflect()`, and `belief_scope` requirements? - *Belief-State Requirements*: Do Complexity 4/5 Python modules explicitly account for `logger.reason()`, `logger.reflect()`, and `belief_scope` requirements?
- *ADR Continuity*: Does every blocking architectural decision have a corresponding ADR node and at least one downstream guarded contract?
- **Output**: Write verified contracts to `contracts/modules.md`. - **Output**: Write verified contracts to `contracts/modules.md`.
3. **Simulate Contract Usage**: 4. **Simulate Contract Usage**:
- Trace one key user scenario through the defined contracts to ensure data flow continuity. - Trace one key user scenario through the defined contracts to ensure data flow continuity.
- If a contract interface mismatch is found, fix it immediately. - If a contract interface mismatch is found, fix it immediately.
- Verify that no traced path accidentally realizes an alternative already named in any ADR `@REJECTED` tag.
4. **Generate API contracts**: 5. **Generate API contracts**:
- Output OpenAPI/GraphQL schema to `/contracts/` for backend-frontend sync. - Output OpenAPI/GraphQL schema to `/contracts/` for backend-frontend sync.
5. **Agent context update**: 6. **Agent context update**:
- Run `.specify/scripts/bash/update-agent-context.sh kilocode` - Run `.specify/scripts/bash/update-agent-context.sh kilocode`
- These scripts detect which AI agent is in use - These scripts detect which AI agent is in use
- Update the appropriate agent-specific context file - Update the appropriate agent-specific context file
- Add only new technology from current plan - Add only new technology from current plan
- Preserve manual additions between markers - Preserve manual additions between markers
**Output**: data-model.md, /contracts/*, quickstart.md, agent-specific file **Output**: `data-model.md`, `/contracts/*`, `quickstart.md`, ADR artifact(s), agent-specific file
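
The **Relation Syntax** rule above can be checked mechanically. The following is a minimal sketch, assuming the canonical form is `@RELATION: PREDICATE ->[TARGET_ID]` with an uppercase predicate; tolerating optional whitespace after the arrow is an assumption, and the helper names and the `Adr:Auth` ID are hypothetical, not part of the repository tooling.

```python
import re

# Assumed grammar: "@RELATION: PREDICATE ->[TARGET_ID]".
# Whitespace after the arrow is tolerated here; that leniency is an assumption.
RELATION_RE = re.compile(
    r"@RELATION:\s*(?P<predicate>[A-Z_]+)\s*->\s*\[(?P<target>[^\[\]]+)\]"
)


def parse_relations(header: str) -> list[tuple[str, str]]:
    """Return (predicate, target_id) pairs for every well-formed relation."""
    return [
        (m.group("predicate"), m.group("target").strip())
        for m in RELATION_RE.finditer(header)
    ]


def find_malformed_relations(header: str) -> list[str]:
    """Return lines that mention @RELATION but do not match the assumed grammar."""
    return [
        line.strip()
        for line in header.splitlines()
        if "@RELATION" in line and not RELATION_RE.search(line)
    ]
```

A contract header whose `@RELATION` lines land in `find_malformed_relations` is a candidate for the Connectivity self-review item; whether it should block generation is a policy decision this sketch does not make.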
## Key rules ## Key rules
- Use absolute paths - Use absolute paths
- ERROR on gate failures or unresolved clarifications - ERROR on gate failures or unresolved clarifications
- Do not hand off to [`speckit.tasks`](.kilocode/workflows/speckit.tasks.md) until blocking ADRs exist and rejected branches are explicit


@@ -12,7 +12,7 @@ You **MUST** consider the user input before proceeding (if not empty).
## Goal ## Goal
Ensure the codebase adheres to the semantic standards defined in `.ai/standards/semantics.md` by using the AXIOM MCP semantic graph as the primary execution engine. This involves reindexing the workspace, measuring semantic health, auditing contract compliance, and optionally delegating contract-safe fixes through MCP-aware agents. Ensure the codebase adheres to the semantic standards defined in `.ai/standards/semantics.md` by using the AXIOM MCP semantic graph as the primary execution engine. This involves reindexing the workspace, measuring semantic health, auditing contract compliance, auditing decision-memory continuity, and optionally delegating contract-safe fixes through MCP-aware agents.
## Operating Constraints ## Operating Constraints
@@ -25,6 +25,7 @@ Ensure the codebase adheres to the semantic standards defined in `.ai/standards/
7. **ID NAMING (CRITICAL)**: NEVER use fully-qualified Python import paths in `[DEF:id:Type]`. Use short, domain-driven semantic IDs (e.g., `[DEF:AuthService:Class]`). Follow the exact style shown in `.ai/standards/semantics.md`. 7. **ID NAMING (CRITICAL)**: NEVER use fully-qualified Python import paths in `[DEF:id:Type]`. Use short, domain-driven semantic IDs (e.g., `[DEF:AuthService:Class]`). Follow the exact style shown in `.ai/standards/semantics.md`.
8. **ORPHAN PREVENTION**: To reduce the orphan count, you MUST physically wrap actual class and function definitions with `[DEF:id:Type] ... [/DEF]` blocks in the code. Modifying `@RELATION` tags does NOT fix orphans. The AST parser flags any unwrapped function as an orphan. 8. **ORPHAN PREVENTION**: To reduce the orphan count, you MUST physically wrap actual class and function definitions with `[DEF:id:Type] ... [/DEF]` blocks in the code. Modifying `@RELATION` tags does NOT fix orphans. The AST parser flags any unwrapped function as an orphan.
- **Exception for Tests**: In test modules, use `BINDS_TO` to link major helpers to the module root. Small helpers remain C1 and don't need relations. - **Exception for Tests**: In test modules, use `BINDS_TO` to link major helpers to the module root. Small helpers remain C1 and don't need relations.
9. **DECISION-MEMORY CONTINUITY**: Audit ADR nodes, preventive task guardrails, and reactive Micro-ADR tags as one anti-regression chain. Missing or contradictory `@RATIONALE` / `@REJECTED` is a first-class semantic defect.
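
As an illustration of constraints 7 and 8, the sketch below wraps two Python classes in anchors with short, domain-driven IDs. It is an assumption-laden example: the closing anchor is written as `[/DEF]` following the shorthand used above, which should be checked against `.ai/standards/semantics.md`, and `TokenStore` is a hypothetical helper introduced only so the relation has a target.

```python
# [DEF:TokenStore:Class]
# @COMPLEXITY: 1
class TokenStore:
    """In-memory token set (illustrative only)."""

    def __init__(self):
        self._tokens = set()

    def add(self, token: str) -> None:
        self._tokens.add(token)

    def has(self, token: str) -> bool:
        return token in self._tokens
# [/DEF]


# [DEF:AuthService:Class]
# @COMPLEXITY: 3
# @PURPOSE: Verify session tokens against the token store.
# @RELATION: DEPENDS_ON ->[TokenStore:Class]
class AuthService:
    def __init__(self, store: TokenStore):
        self.store = store

    def verify(self, token: str) -> bool:
        # Delegates to the store; no side effects.
        return self.store.has(token)
# [/DEF]
```

The point of the example is placement: each definition sits physically inside its own anchor pair, which is what the orphan rule asks for; editing `@RELATION` lines alone would not change that.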
## Execution Steps ## Execution Steps
@@ -48,8 +49,13 @@ Treat high orphan counts and unresolved relations as first-class health indicato
Use [`audit_contracts_tool`](.kilo/mcp.json) and classify findings into: Use [`audit_contracts_tool`](.kilo/mcp.json) and classify findings into:
- **Critical Parsing/Structure Errors**: malformed or incoherent semantic contract regions - **Critical Parsing/Structure Errors**: malformed or incoherent semantic contract regions
- **Critical Contract Gaps**: missing [`@DATA_CONTRACT`](.ai/standards/semantics.md), [`@PRE`](.ai/standards/semantics.md), [`@POST`](.ai/standards/semantics.md), [`@SIDE_EFFECT`](.ai/standards/semantics.md) on CRITICAL contracts - **Critical Contract Gaps**: missing [`@DATA_CONTRACT`](.ai/standards/semantics.md), [`@PRE`](.ai/standards/semantics.md), [`@POST`](.ai/standards/semantics.md), [`@SIDE_EFFECT`](.ai/standards/semantics.md) on CRITICAL contracts
- **Decision-Memory Gaps**:
- missing standalone `[DEF:id:ADR]` for repo-shaping decisions
- missing `@RATIONALE` / `@REJECTED` where task or implementation context clearly requires guardrails
- retained workaround code without local reactive Micro-ADR tags
- implementation that silently re-enables a path declared in upstream `@REJECTED`
- **Coverage Gaps**: missing [`@TIER`](.ai/standards/semantics.md), missing [`@PURPOSE`](.ai/standards/semantics.md) - **Coverage Gaps**: missing [`@TIER`](.ai/standards/semantics.md), missing [`@PURPOSE`](.ai/standards/semantics.md)
- **Graph Breakages**: unresolved relations, broken references, isolated critical contracts - **Graph Breakages**: unresolved relations, broken references, isolated critical contracts, ADR nodes without downstream guarded contracts
### 4. Build Remediation Context ### 4. Build Remediation Context
@@ -58,12 +64,14 @@ For the top failing contracts, use MCP semantic context tools such as [`get_sema
2. Upstream/downstream semantic impact 2. Upstream/downstream semantic impact
3. Related tests and fixtures 3. Related tests and fixtures
4. Whether relation recovery is needed 4. Whether relation recovery is needed
5. Whether decision-memory continuity is broken between ADR, task contract, and implementation
### 5. Execute Fixes (Optional/Handoff) ### 5. Execute Fixes (Optional/Handoff)
If $ARGUMENTS contains `fix` or `apply`: If $ARGUMENTS contains `fix` or `apply`:
- Handoff to the [`semantic`](.kilocodemodes) mode or a dedicated implementation agent instead of applying naive textual edits in orchestration. - Handoff to the [`semantic`](.kilocodemodes) mode or a dedicated implementation agent instead of applying naive textual edits in orchestration.
- Require the fixing agent to prefer MCP contract mutation tools such as [`simulate_patch_tool`](.kilo/mcp.json), [`guarded_patch_contract_tool`](.kilo/mcp.json), [`patch_contract_tool`](.kilo/mcp.json), and [`infer_missing_relations_tool`](.kilo/mcp.json). - Require the fixing agent to prefer MCP contract mutation tools such as [`simulate_patch_tool`](.kilo/mcp.json), [`guarded_patch_contract_tool`](.kilo/mcp.json), [`patch_contract_tool`](.kilo/mcp.json), and [`infer_missing_relations_tool`](.kilo/mcp.json).
- Require the fixing agent to preserve or restore `@RATIONALE` / `@REJECTED` continuity whenever blocked-path knowledge exists.
- After changes, re-run reindex, health, and audit MCP steps to verify the delta. - After changes, re-run reindex, health, and audit MCP steps to verify the delta.
### 6. Review Gate ### 6. Review Gate
@@ -74,8 +82,9 @@ Before completion, request or perform an MCP-based review path aligned with the
Provide a summary of the semantic state: Provide a summary of the semantic state:
- **Health Metrics**: contracts / relations / orphans / unresolved_relations / files - **Health Metrics**: contracts / relations / orphans / unresolved_relations / files
- **Status**: [PASS/FAIL] (FAIL if CRITICAL gaps or semantically significant unresolved relations exist) - **Status**: [PASS/FAIL] (FAIL if CRITICAL gaps, rejected-path regressions, or semantically significant unresolved relations exist)
- **Top Issues**: List top 3-5 contracts or files needing attention. - **Top Issues**: List top 3-5 contracts or files needing attention.
- **Decision Memory**: summarize missing ADRs, missing guardrails, and rejected-path regression risks.
- **Action Taken**: Summary of MCP analysis performed, context gathered, and fixes or handoffs initiated. - **Action Taken**: Summary of MCP analysis performed, context gathered, and fixes or handoffs initiated.
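
The PASS/FAIL rule in the **Status** line can be expressed as a small gate. This is a sketch only: the field names are hypothetical and the counts are assumed to come from the MCP health and audit steps, whose actual output shape is not shown here.

```python
from dataclasses import dataclass


@dataclass
class AuditSummary:
    # All field names are illustrative, not the MCP tools' real output keys.
    critical_gaps: int = 0
    rejected_path_regressions: int = 0
    significant_unresolved_relations: int = 0
    orphans: int = 0  # reported as a health metric; not a gate in this sketch


def audit_status(summary: AuditSummary) -> str:
    """FAIL if any gating counter is non-zero, otherwise PASS."""
    gating = (
        summary.critical_gaps,
        summary.rejected_path_regressions,
        summary.significant_unresolved_relations,
    )
    return "FAIL" if any(gating) else "PASS"
```

Treating orphans as informational rather than gating mirrors the wording of the Status line; a stricter project could add them to `gating`.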
## Context ## Context


@@ -24,26 +24,29 @@ You **MUST** consider the user input before proceeding (if not empty).
1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). 1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax, e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
2. **Load design documents**: Read from FEATURE_DIR: 2. **Load design documents**: Read from FEATURE_DIR:
- **Required**: plan.md (tech stack, libraries, structure), spec.md (user stories with priorities), ux_reference.md (experience source of truth) - **Required**: `plan.md` (tech stack, libraries, structure), `spec.md` (user stories with priorities), `ux_reference.md` (experience source of truth)
- **Optional**: data-model.md (entities), contracts/ (API endpoints), research.md (decisions), quickstart.md (test scenarios) - **Optional**: `data-model.md` (entities), `contracts/` (API endpoints), `research.md` (decisions), `quickstart.md` (test scenarios)
- **Required when present in plan output**: ADR artifacts such as `docs/architecture.md` or feature-local architecture decision files containing `[DEF:id:ADR]` nodes
- Note: Not all projects have all documents. Generate tasks based on what's available. - Note: Not all projects have all documents. Generate tasks based on what's available.
3. **Execute task generation workflow**: 3. **Execute task generation workflow**:
- Load plan.md and extract tech stack, libraries, project structure - Load `plan.md` and extract tech stack, libraries, project structure
- Load spec.md and extract user stories with their priorities (P1, P2, P3, etc.) - Load `spec.md` and extract user stories with their priorities (P1, P2, P3, etc.)
- If data-model.md exists: Extract entities and map to user stories - Load ADR nodes and build a decision-memory inventory: `DecisionId`, `@RATIONALE`, `@REJECTED`, dependent modules
- If contracts/ exists: Map endpoints to user stories - If `data-model.md` exists: Extract entities and map to user stories
- If research.md exists: Extract decisions for setup tasks - If `contracts/` exists: Map endpoints to user stories
- If `research.md` exists: Extract decisions for setup tasks
- Generate tasks organized by user story (see Task Generation Rules below) - Generate tasks organized by user story (see Task Generation Rules below)
- Generate dependency graph showing user story completion order - Generate dependency graph showing user story completion order
- Create parallel execution examples per user story - Create parallel execution examples per user story
- Validate task completeness (each user story has all needed tasks, independently testable) - Validate task completeness (each user story has all needed tasks, independently testable)
- Validate guardrail continuity: no task may realize an ADR path named in `@REJECTED`
4. **Generate tasks.md**: Use `.specify/templates/tasks-template.md` as structure, fill with: 4. **Generate `tasks.md`**: Use `.specify/templates/tasks-template.md` as structure, fill with:
- Correct feature name from plan.md - Correct feature name from `plan.md`
- Phase 1: Setup tasks (project initialization) - Phase 1: Setup tasks (project initialization)
- Phase 2: Foundational tasks (blocking prerequisites for all user stories) - Phase 2: Foundational tasks (blocking prerequisites for all user stories)
- Phase 3+: One phase per user story (in priority order from spec.md) - Phase 3+: One phase per user story (in priority order from `spec.md`)
- Each phase includes: story goal, independent test criteria, tests (if requested), implementation tasks - Each phase includes: story goal, independent test criteria, tests (if requested), implementation tasks
- Final Phase: Polish & cross-cutting concerns - Final Phase: Polish & cross-cutting concerns
- All tasks must follow the strict checklist format (see Task Generation Rules below) - All tasks must follow the strict checklist format (see Task Generation Rules below)
@@ -51,18 +54,20 @@ You **MUST** consider the user input before proceeding (if not empty).
- Dependencies section showing story completion order - Dependencies section showing story completion order
- Parallel execution examples per story - Parallel execution examples per story
- Implementation strategy section (MVP first, incremental delivery) - Implementation strategy section (MVP first, incremental delivery)
- Decision-memory notes for guarded tasks when ADRs or known traps apply
5. **Report**: Output path to generated tasks.md and summary: 5. **Report**: Output path to generated `tasks.md` and summary:
- Total task count - Total task count
- Task count per user story - Task count per user story
- Parallel opportunities identified - Parallel opportunities identified
- Independent test criteria for each story - Independent test criteria for each story
- Suggested MVP scope (typically just User Story 1) - Suggested MVP scope (typically just User Story 1)
- Format validation: Confirm ALL tasks follow the checklist format (checkbox, ID, labels, file paths) - Format validation: Confirm ALL tasks follow the checklist format (checkbox, ID, labels, file paths)
- ADR propagation summary: which ADRs were inherited into task guardrails and which paths were rejected
Context for task generation: $ARGUMENTS Context for task generation: $ARGUMENTS
The tasks.md should be immediately executable - each task must be specific enough that an LLM can complete it without additional context. The `tasks.md` should be immediately executable - each task must be specific enough that an LLM can complete it without additional context.
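
The decision-memory inventory mentioned in step 3 could be built along the following lines. This is a rough sketch under stated assumptions: ADR nodes are taken to be delimited by `[DEF:id:ADR]` and a closing `[/DEF...]` anchor, tags are assumed to occupy one line each, and dependent-module tracking is left out. The `Decision` type and `build_inventory` helper are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed block shape: "[DEF:<id>:ADR] ... [/DEF]" (a suffixed closing anchor is tolerated).
ADR_BLOCK_RE = re.compile(
    r"\[DEF:(?P<id>[^\]]+?):ADR\](?P<body>.*?)\[/DEF[^\]]*\]", re.DOTALL
)
# Assumed tag shape: one "@RATIONALE: ..." or "@REJECTED: ..." per line.
TAG_RE = re.compile(r"@(?P<tag>RATIONALE|REJECTED):\s*(?P<value>.+)")


@dataclass
class Decision:
    decision_id: str
    rationale: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)


def build_inventory(text: str) -> dict[str, Decision]:
    """Collect @RATIONALE / @REJECTED values for each ADR node found in text."""
    inventory: dict[str, Decision] = {}
    for block in ADR_BLOCK_RE.finditer(text):
        decision = Decision(block.group("id"))
        for tag in TAG_RE.finditer(block.group("body")):
            bucket = decision.rationale if tag.group("tag") == "RATIONALE" else decision.rejected
            bucket.append(tag.group("value").strip())
        inventory[decision.decision_id] = decision
    return inventory
```

An ADR that ends up with an empty `rejected` list is not necessarily wrong, but it is a natural place to ask whether `[NEED_CONTEXT: target]` should be emitted before tasks are generated.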
## Task Generation Rules ## Task Generation Rules
@@ -72,10 +77,11 @@ The tasks.md should be immediately executable - each task must be specific enoug
### UX & Semantic Preservation (CRITICAL) ### UX & Semantic Preservation (CRITICAL)
- **Source of Truth**: `ux_reference.md` for UX, `.ai/standards/semantics.md` for Code. - **Source of Truth**: `ux_reference.md` for UX, `.ai/standards/semantics.md` for code, and ADR artifacts for upstream technology decisions.
- **Violation Warning**: If any task violates UX or GRACE standards, flag it immediately. - **Violation Warning**: If any task violates UX, ADR guardrails, or GRACE standards, flag it immediately.
- **Verification Task (UX)**: Add a task at the end of each Story phase: `- [ ] Txxx [USx] Verify implementation matches ux_reference.md (Happy Path & Errors)` - **Verification Task (UX)**: Add a task at the end of each Story phase: `- [ ] Txxx [USx] Verify implementation matches ux_reference.md (Happy Path & Errors)`
- **Verification Task (Audit)**: Add a mandatory audit task at the end of each Story phase: `- [ ] Txxx [USx] Acceptance: Perform semantic audit & algorithm emulation by Tester` - **Verification Task (Audit)**: Add a mandatory audit task at the end of each Story phase: `- [ ] Txxx [USx] Acceptance: Perform semantic audit & algorithm emulation by Tester`
- **Guardrail Rule**: If an ADR or contract says `@REJECTED`, task text must not schedule that path as implementation work.
### Checklist Format (REQUIRED) ### Checklist Format (REQUIRED)
@@ -91,7 +97,7 @@ Every task MUST strictly follow this format:
2. **Task ID**: Sequential number (T001, T002, T003...) in execution order 2. **Task ID**: Sequential number (T001, T002, T003...) in execution order
3. **[P] marker**: Include ONLY if task is parallelizable (different files, no dependencies on incomplete tasks) 3. **[P] marker**: Include ONLY if task is parallelizable (different files, no dependencies on incomplete tasks)
4. **[Story] label**: REQUIRED for user story phase tasks only 4. **[Story] label**: REQUIRED for user story phase tasks only
- Format: [US1], [US2], [US3], etc. (maps to user stories from spec.md) - Format: [US1], [US2], [US3], etc. (maps to user stories from `spec.md`)
- Setup phase: NO story label - Setup phase: NO story label
- Foundational phase: NO story label - Foundational phase: NO story label
- User Story phases: MUST have story label - User Story phases: MUST have story label
@@ -111,7 +117,7 @@ Every task MUST strictly follow this format:
### Task Organization ### Task Organization
1. **From User Stories (spec.md)** - PRIMARY ORGANIZATION: 1. **From User Stories (`spec.md`)** - PRIMARY ORGANIZATION:
- Each user story (P1, P2, P3...) gets its own phase - Each user story (P1, P2, P3...) gets its own phase
- Map all related components to their story: - Map all related components to their story:
- Models needed for that story - Models needed for that story
@@ -127,12 +133,18 @@ Every task MUST strictly follow this format:
- Map each contract/endpoint → to the user story it serves - Map each contract/endpoint → to the user story it serves
- If tests requested: Each contract → contract test task [P] before implementation in that story's phase - If tests requested: Each contract → contract test task [P] before implementation in that story's phase
3. **From Data Model**: 3. **From ADRs and Decision Memory**:
- For each implementation task constrained by an ADR, append a concise guardrail summary drawn from `@RATIONALE` and `@REJECTED`.
- Example: `- [ ] T021 [US1] Implement payload parsing guardrails in src/api/input.py (RATIONALE: strict validation because frontend sends numeric strings; REJECTED: json.loads() without schema validation)`
- If a task would naturally branch into an ADR-rejected alternative, rewrite the task around the accepted path instead of leaving the choice ambiguous.
- If no safe executable path remains because ADR context is incomplete, stop and emit `[NEED_CONTEXT: target]`.
4. **From Data Model**:
- Map each entity to the user story(ies) that need it - Map each entity to the user story(ies) that need it
- If entity serves multiple stories: Put in earliest story or Setup phase - If entity serves multiple stories: Put in earliest story or Setup phase
- Relationships → service layer tasks in appropriate story phase - Relationships → service layer tasks in appropriate story phase
4. **From Setup/Infrastructure**: 5. **From Setup/Infrastructure**:
- Shared infrastructure → Setup phase (Phase 1) - Shared infrastructure → Setup phase (Phase 1)
- Foundational/blocking tasks → Foundational phase (Phase 2) - Foundational/blocking tasks → Foundational phase (Phase 2)
- Story-specific setup → within that story's phase - Story-specific setup → within that story's phase
@@ -145,3 +157,11 @@ Every task MUST strictly follow this format:
- Within each story: Tests (if requested) → Models → Services → Endpoints → Integration - Within each story: Tests (if requested) → Models → Services → Endpoints → Integration
- Each phase should be a complete, independently testable increment - Each phase should be a complete, independently testable increment
- **Final Phase**: Polish & Cross-Cutting Concerns - **Final Phase**: Polish & Cross-Cutting Concerns
### Decision-Memory Validation Gate
Before finalizing `tasks.md`, verify all of the following:
- Every repo-shaping ADR from planning is either represented in a setup/foundational task or inherited by a downstream story task.
- Every guarded task that could tempt an implementer into a known wrong branch carries preventive `@RATIONALE` / `@REJECTED` guidance in its text.
- No task instructs the implementer to realize an ADR path already named as rejected.
- At least one explicit audit/verification task exists for checking rejected-path regressions in code review or test stages.
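
The third gate item (no task may schedule a rejected path) lends itself to a crude automated pre-check. The sketch below assumes guardrail summaries are appended as a trailing parenthesized note, as in the `T021` example above, and uses plain substring matching, which is only a heuristic. The `ADR-001` ID and both helper names are hypothetical.

```python
import re

# Trailing "(RATIONALE: ...; REJECTED: ...)" note, assumed to end the task line.
GUARDRAIL_NOTE_RE = re.compile(r"\((?:RATIONALE|REJECTED):.*\)\s*$")


def scheduled_text(task_line: str) -> str:
    """Strip the guardrail note so a quoted rejected path is not flagged as work."""
    return GUARDRAIL_NOTE_RE.sub("", task_line)


def find_rejected_path_tasks(tasks, rejected):
    """Return (task, adr_id, phrase) for tasks whose own text schedules a rejected path."""
    violations = []
    for task in tasks:
        body = scheduled_text(task).lower()
        for adr_id, phrases in rejected.items():
            for phrase in phrases:
                if phrase.lower() in body:
                    violations.append((task, adr_id, phrase))
    return violations
```

A hit from this check is a prompt for human or agent review rather than proof of a violation; paraphrased rejected paths will slip past substring matching.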


@@ -14,7 +14,7 @@ You **MUST** consider the user input before proceeding (if not empty).
## Goal ## Goal
Execute semantic audit and full testing cycle: verify contract compliance, emulate logic, ensure maximum coverage, and maintain test quality. Execute semantic audit and full testing cycle: verify contract compliance, verify decision-memory continuity, emulate logic, ensure maximum coverage, and maintain test quality.
## Operating Constraints ## Operating Constraints
@@ -22,6 +22,7 @@ Execute semantic audit and full testing cycle: verify contract compliance, emula
2. **NEVER duplicate tests** - Check existing tests first before creating new ones 2. **NEVER duplicate tests** - Check existing tests first before creating new ones
3. **Use TEST_FIXTURE fixtures** - For CRITICAL tier modules, read @TEST_FIXTURE from .ai/standards/semantics.md 3. **Use TEST_FIXTURE fixtures** - For CRITICAL tier modules, read @TEST_FIXTURE from .ai/standards/semantics.md
4. **Co-location required** - Write tests in `__tests__` directories relative to the code being tested 4. **Co-location required** - Write tests in `__tests__` directories relative to the code being tested
5. **Decision-memory regression guard** - Tests and audits must not normalize silent reintroduction of any path documented in upstream `@REJECTED`
## Execution Steps ## Execution Steps
@@ -31,18 +32,25 @@ Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --inclu
Determine: Determine:
- FEATURE_DIR - where the feature is located - FEATURE_DIR - where the feature is located
- TASKS_FILE - path to tasks.md - TASKS_FILE - path to `tasks.md`
- Which modules need testing based on task status - Which modules need testing based on task status
- Which ADRs or task guardrails define rejected paths for the touched scope
### 2. Load Relevant Artifacts ### 2. Load Relevant Artifacts
**From tasks.md:** **From `tasks.md`:**
- Identify completed implementation tasks (not test tasks) - Identify completed implementation tasks (not test tasks)
- Extract file paths that need tests - Extract file paths that need tests
- Extract guardrail summaries and blocked paths
**From .ai/standards/semantics.md:** **From `.ai/standards/semantics.md`:**
- Read @TIER annotations for modules - Read effective complexity expectations
- For CRITICAL modules: Read @TEST_ fixtures - Read decision-memory rules for ADR, preventive guardrails, and reactive Micro-ADR
- For CRITICAL modules: Read `@TEST_` fixtures
**From ADR sources and touched code:**
- Read `[DEF:id:ADR]` nodes when present
- Read local `@RATIONALE` and `@REJECTED` in touched contracts
**From existing tests:** **From existing tests:**
- Scan `__tests__` directories for existing tests - Scan `__tests__` directories for existing tests
@@ -52,9 +60,9 @@ Determine:
Create coverage matrix: Create coverage matrix:
| Module | File | Has Tests | TIER | TEST_FIXTURE Available | | Module | File | Has Tests | Complexity / Tier | TEST_FIXTURE Available | Rejected Path Guarded |
|--------|------|-----------|------|----------------------| |--------|------|-----------|-------------------|------------------------|-----------------------|
| ... | ... | ... | ... | ... | | ... | ... | ... | ... | ... | ... |
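
One possible way to produce the six-column matrix is sketched below. The row type and the yes/no cell rendering are assumptions for illustration; the workflow itself does not prescribe how the table is generated.

```python
from dataclasses import dataclass

HEADERS = [
    "Module",
    "File",
    "Has Tests",
    "Complexity / Tier",
    "TEST_FIXTURE Available",
    "Rejected Path Guarded",
]


@dataclass
class CoverageRow:
    module: str
    file: str
    has_tests: bool
    complexity: str
    fixture_available: bool
    rejected_path_guarded: bool


def render_matrix(rows: list[CoverageRow]) -> str:
    """Render rows as a markdown table using the column order above."""
    def yn(flag: bool) -> str:
        return "yes" if flag else "no"

    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "|".join("---" for _ in HEADERS) + "|",
    ]
    for r in rows:
        cells = [r.module, r.file, yn(r.has_tests), r.complexity,
                 yn(r.fixture_available), yn(r.rejected_path_guarded)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```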
### 4. Semantic Audit & Logic Emulation (CRITICAL) ### 4. Semantic Audit & Logic Emulation (CRITICAL)
@@ -66,9 +74,12 @@ Before writing tests, the Tester MUST:
- Reject Python Complexity 4+ modules that omit meaningful `logger.reason()` / `logger.reflect()` checkpoints. - Reject Python Complexity 4+ modules that omit meaningful `logger.reason()` / `logger.reflect()` checkpoints.
- Reject Python Complexity 5 modules that omit `belief_scope(...)`, `@DATA_CONTRACT`, or `@INVARIANT`. - Reject Python Complexity 5 modules that omit `belief_scope(...)`, `@DATA_CONTRACT`, or `@INVARIANT`.
- Treat broken or missing closing anchors as blocking violations. - Treat broken or missing closing anchors as blocking violations.
- Reject retained workaround code if the local contract lacks `@RATIONALE` / `@REJECTED`.
- Reject code that silently re-enables a path declared in upstream ADR or local guardrails as rejected.
3. **Emulate Algorithm**: Mentally step through the code implementation. 3. **Emulate Algorithm**: Mentally step through the code implementation.
- Verify it adheres to the `@PURPOSE` and `@INVARIANT`. - Verify it adheres to the `@PURPOSE` and `@INVARIANT`.
- Verify `@PRE` and `@POST` conditions are correctly handled. - Verify `@PRE` and `@POST` conditions are correctly handled.
- Verify the implementation follows accepted-path rationale rather than drifting into a blocked path.
4. **Validation Verdict**: 4. **Validation Verdict**:
- If audit fails: Emit `[AUDIT_FAIL: semantic_noncompliance]` with concrete file-path reasons and notify Orchestrator. - If audit fails: Emit `[AUDIT_FAIL: semantic_noncompliance]` with concrete file-path reasons and notify Orchestrator.
- Example blocking case: [`backend/src/services/dataset_review/repositories/session_repository.py`](backend/src/services/dataset_review/repositories/session_repository.py) contains a module anchor, but its nested repository class/method semantics are expressed as loose docstrings instead of canonical anchored contracts; this MUST be rejected until remediated or explicitly waived. - Example blocking case: [`backend/src/services/dataset_review/repositories/session_repository.py`](backend/src/services/dataset_review/repositories/session_repository.py) contains a module anchor, but its nested repository class/method semantics are expressed as loose docstrings instead of canonical anchored contracts; this MUST be rejected until remediated or explicitly waived.
@@ -79,7 +90,7 @@ Before writing tests, the Tester MUST:
For each module requiring tests: For each module requiring tests:
1. **Check existing tests**: Scan `__tests__/` for duplicates. 1. **Check existing tests**: Scan `__tests__/` for duplicates.
2. **Read TEST_FIXTURE**: If CRITICAL tier, read @TEST_FIXTURE from semantics header. 2. **Read TEST_FIXTURE**: If CRITICAL tier, read `@TEST_FIXTURE` from semantics header.
3. **Do not normalize broken semantics through tests**: 3. **Do not normalize broken semantics through tests**:
- The Tester must not write tests that silently accept malformed semantic protocol usage. - The Tester must not write tests that silently accept malformed semantic protocol usage.
- If implementation is semantically invalid, stop and reject instead of adapting tests around the invalid structure. - If implementation is semantically invalid, stop and reject instead of adapting tests around the invalid structure.
@@ -87,6 +98,8 @@ For each module requiring tests:
- Python: `src/module/__tests__/test_module.py` - Python: `src/module/__tests__/test_module.py`
- Svelte: `src/lib/components/__tests__/test_component.test.js` - Svelte: `src/lib/components/__tests__/test_component.test.js`
5. **Use mocks**: Use `unittest.mock.MagicMock` for external dependencies 5. **Use mocks**: Use `unittest.mock.MagicMock` for external dependencies
6. **Add rejected-path regression coverage when relevant**:
- If ADR or local contract names a blocked path in `@REJECTED`, add or verify at least one test or explicit audit check that would fail if that forbidden path were silently restored.
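
A rejected-path regression test might look like the sketch below. It reuses the payload-parsing scenario from the task example in the task-generation workflow; `parse_payload` and the `amount` field are hypothetical stand-ins for whatever accepted path the ADR describes. The test is written so that it would fail if someone replaced the validation with a bare `json.loads()`.

```python
import json


def parse_payload(raw: str) -> dict:
    """Hypothetical accepted path: decode, then validate shape and types."""
    data = json.loads(raw)
    if not isinstance(data, dict) or "amount" not in data:
        raise ValueError("payload must be an object with an 'amount' field")
    try:
        # Numeric strings are accepted and coerced, per the assumed @RATIONALE.
        data["amount"] = int(data["amount"])
    except (TypeError, ValueError):
        raise ValueError("amount must be an integer or numeric string") from None
    return data


def test_rejected_path_stays_rejected():
    # Accepted path: numeric strings are coerced to integers.
    assert parse_payload('{"amount": "42"}') == {"amount": 42}
    # Rejected path: unvalidated decoding would let every one of these through.
    for bad in ("[]", "{}", '{"amount": "abc"}'):
        try:
            parse_payload(bad)
        except ValueError:
            continue
        raise AssertionError(f"unvalidated payload accepted: {bad}")
```

The test name and loop are plain Python so the sketch runs without a test framework; under pytest the same checks would usually be written with `pytest.raises`.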
### 4a. UX Contract Testing (Frontend Components) ### 4a. UX Contract Testing (Frontend Components)
@@ -103,9 +116,10 @@ For Svelte components with `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY` tags:
expect(screen.getByTestId('sidebar')).toHaveClass('expanded'); expect(screen.getByTestId('sidebar')).toHaveClass('expanded');
}); });
``` ```
3. **Test @UX_FEEDBACK**: Verify visual feedback (toast, shake, color changes) 3. **Test `@UX_FEEDBACK`**: Verify visual feedback (toast, shake, color changes)
4. **Test @UX_RECOVERY**: Verify error recovery mechanisms (retry, clear input) 4. **Test `@UX_RECOVERY`**: Verify error recovery mechanisms (retry, clear input)
5. **Use @UX_TEST fixtures**: If component has `@UX_TEST` tags, use them as test specifications 5. **Use `@UX_TEST` fixtures**: If component has `@UX_TEST` tags, use them as test specifications
6. **Verify decision memory**: If the UI contract declares `@REJECTED`, ensure browser-visible behavior does not regress into the rejected path.
**UX Test Template:** **UX Test Template:**
```javascript ```javascript
@@ -139,6 +153,8 @@ tests/
└── YYYY-MM-DD-report.md └── YYYY-MM-DD-report.md
``` ```
Include decision-memory coverage notes when ADR or rejected-path regressions were checked.
### 6. Execute Tests ### 6. Execute Tests
Run tests and report results: Run tests and report results:
@@ -155,10 +171,11 @@ cd frontend && npm run test
### 7. Update Tasks ### 7. Update Tasks
Mark test tasks as completed in tasks.md with: Mark test tasks as completed in `tasks.md` with:
- Test file path - Test file path
- Coverage achieved - Coverage achieved
- Any issues found - Any issues found
- Whether rejected-path regression checks passed or remain manual audit items
## Output ## Output
@@ -188,10 +205,15 @@ Generate test execution report:
- Verdict: PASS | FAIL - Verdict: PASS | FAIL
- Blocking Violations: - Blocking Violations:
- [file path] -> [reason] - [file path] -> [reason]
- Decision Memory:
- ADRs checked: [...]
- Rejected-path regressions: PASS | FAIL
- Missing `@RATIONALE` / `@REJECTED`: [...]
- Notes: - Notes:
- Reject docstring-only semantic pseudo-markup - Reject docstring-only semantic pseudo-markup
- Reject complexity/contract mismatches - Reject complexity/contract mismatches
- Reject missing belief-state instrumentation for Python Complexity 4/5 - Reject missing belief-state instrumentation for Python Complexity 4/5
- Reject silent resurrection of rejected paths
## Issues Found ## Issues Found
@@ -203,6 +225,7 @@ Generate test execution report:
- [ ] Fix failed tests - [ ] Fix failed tests
- [ ] Fix blocking semantic violations before acceptance - [ ] Fix blocking semantic violations before acceptance
- [ ] Fix decision-memory drift or rejected-path regressions
- [ ] Add more coverage for [module] - [ ] Add more coverage for [module]
- [ ] Review TEST_FIXTURE fixtures - [ ] Review TEST_FIXTURE fixtures
``` ```

Submodule research/kilocode deleted from 6d4d7328f6