mcp tuning

This commit is contained in:
2026-04-01 13:29:41 +03:00
parent 586229a974
commit 1e46073dd6
19 changed files with 1324 additions and 28593 deletions

File diff suppressed because it is too large

@@ -1,44 +0,0 @@
# [DEF:Project_Map:Root]
# @COMPLEXITY: 3
# @PURPOSE: Canonical ownership record for repository structure navigation and generated project-map artifacts.
# @RELATION: DEPENDS_ON -> [Project_Knowledge_Map:Root]
# @RELATION: DEPENDS_ON -> [Std:Constitution:Standard]
# @RELATION: DEPENDS_ON -> [Std:UserPersona:Standard]
# @RELATION: BINDS_TO -> [MCP_Config:Block]
# @LAST_UPDATE: 2026-03-26
## Canonical ownership
- Canonical owner for `Project_Map` is this file: `.ai/PROJECT_MAP.md`.
- Generated structural snapshot lives at `.ai/structure/PROJECT_MAP.md` and is a backing artifact, not the canonical ownership document.
- References that previously pointed directly to `.ai/structure/PROJECT_MAP.md` for `Project_Map` should normalize to this file.
## Canonical relations
- Root knowledge entry: `.ai/ROOT.md` -> `[DEF:Project_Knowledge_Map:Root]`
- Normalized project MCP configuration: `.kilo/mcp.json` -> `[DEF:MCP_Config:Block]`
- Repository constitution: `.ai/standards/constitution.md` -> `[DEF:Std:Constitution:Standard]`
- Repository persona: `.ai/PERSONA.md` -> `[DEF:Std:UserPersona:Standard]`
## Generated snapshot handoff
- Use `.ai/structure/PROJECT_MAP.md` for the expanded generated module/file inventory.
- Regeneration may replace snapshot contents without changing canonical ownership of `Project_Map`.
# [DEF:MCP_Config:Block]
# @COMPLEXITY: 3
# @PURPOSE: Canonical ownership record for normalized project MCP configuration consumed by semantic workflows.
# @RELATION: DEPENDS_ON -> [Project_Map:Root]
# @RELATION: DEPENDS_ON -> [Std:Constitution:Standard]
# @RELATION: DEPENDS_ON -> [Std:UserPersona:Standard]
# @LAST_UPDATE: 2026-03-26
## Normalized config path
- Canonical project MCP config path is `.kilo/mcp.json`.
- For this repository, new docs and workflows must reference `.kilo/mcp.json` as the normalized MCP config.
- Do not introduce new canonical references to deprecated project MCP doc paths for ownership or workflow wiring.
## Current semantic workflow binding
- AXIOM semantic workflows in `.kilocode/workflows/` bind to tools exposed through `.kilo/mcp.json`.
- The `axiom-core` server definition in `.kilo/mcp.json` is the normalized semantic-audit integration point for this repository.
# [/DEF:MCP_Config:Block]
# [/DEF:Project_Map:Root]

@@ -0,0 +1,555 @@
# [DEF:Axiom_Tools_Evaluation:Report]
# @COMPLEXITY: 4
# @PURPOSE: Comprehensive evaluation of all axiom-core MCP server tools across 8 UX metrics.
# @LAYER: Analysis
# @RELATION: DEPENDS_ON -> [Project_Knowledge_Map:Root]
# @PRE: All axiom-core tools have been exercised with valid and invalid inputs.
# @POST: Report file exists with per-tool scores and aggregate findings.
# @SIDE_EFFECT: Creates evaluation artifact in .ai/reports/.
# @DATA_CONTRACT: Input[Tool Suite] -> Output[Evaluation Report]
# @INVARIANT: Each tool must be scored on all 8 metrics; no tool may be omitted.
---
# Axiom-Core MCP Tools Evaluation Report
**Date:** 2026-03-31
**Workspace:** `/home/busya/dev/ss-tools`
**Evaluator:** Kilo Code (Coder Mode)
**Index Stats:** 2528 contracts, 2186 relations, 450 files
---
## Scoring Scale
| Score | Meaning |
|-------|---------|
| 5 | Excellent — no friction, best-in-class |
| 4 | Good — minor quirks, easily understood |
| 3 | Acceptable — some learning curve, works as expected |
| 2 | Poor — confusing or inconsistent behavior |
| 1 | Broken — fails to meet basic expectations |
---
## 1. reindex_workspace_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | Name is self-explanatory; purpose is obvious. |
| Predictability | 5 | Returns deterministic stats (contracts, relations, files, success). |
| Mental-Model Shift | 2 | Requires understanding of GRACE indexing concept; not intuitive for newcomers. |
| Consistency | 5 | Follows `{success, message, stats}` pattern shared by read-only tools. |
| Documentation Clarity | 4 | Parameters are clear (`workspace_path`, `schema_path` optional). |
| Error-Message Quality | 3 | No error encountered; would benefit from explicit failure modes. |
| Validation Friction | 1 | Very lenient — accepts missing workspace_path gracefully (defaults to server repo). |
| Recovery Simplicity | 5 | Pure read/index operation; re-run to refresh. No state to undo. |
**Average: 3.75 / 5**
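
The per-tool average above is the unweighted mean of the eight metric scores, shown to two decimals. The sketch below recomputes it for the scores in this table; the helper name `tool_average` is hypothetical, and half-up rounding is an assumption inferred from how the report prints values such as 3.88.

```python
from decimal import Decimal, ROUND_HALF_UP


def tool_average(scores):
    """Unweighted mean of the metric scores, rounded half-up to two decimals."""
    if len(scores) != 8:
        raise ValueError("expected one score per metric (8 total)")
    mean = Decimal(sum(scores)) / Decimal(len(scores))
    # Decimal avoids the round-half-even behaviour of the built-in round().
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Scores for reindex_workspace_tool, in table order.
reindex_scores = [5, 5, 2, 5, 4, 3, 1, 5]
print(f"Average: {tool_average(reindex_scores)} / 5")  # Average: 3.75 / 5
```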
---
## 2. search_contracts_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Search contracts by query" — crystal clear. |
| Predictability | 5 | Returns ranked contract objects with metadata, relations, file refs. |
| Mental-Model Shift | 2 | Requires understanding of semantic search vs. text search. |
| Consistency | 5 | Output shape matches `find_contract_tool` exactly. |
| Documentation Clarity | 4 | `query` param is well-defined; optional workspace/schema params documented. |
| Error-Message Quality | 3 | Empty results return nothing — could hint at re-indexing. |
| Validation Friction | 1 | Accepts any string; no pre-validation needed. |
| Recovery Simplicity | 5 | Stateless query; re-run with different query. |
**Average: 3.75 / 5**
---
## 3. read_grace_outline_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "GRACE outline" is domain-specific but clear from context. |
| Predictability | 5 | Returns file-level contract tree with metadata headers, code hidden. |
| Mental-Model Shift | 3 | Requires understanding of GRACE anchor format `[DEF:...]`. |
| Consistency | 5 | Output format is stable across files. |
| Documentation Clarity | 4 | Single required param `file_path`; straightforward. |
| Error-Message Quality | 3 | Would fail silently on non-GRACE files; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any path. |
| Recovery Simplicity | 5 | Pure read; no side effects. |
**Average: 3.75 / 5**
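
To make the `[DEF:...]` anchor format concrete, here is a minimal sketch of how a file-level outline could be derived from paired anchors. It is not the tool's implementation: the function `outline` and the assumption that anchors always sit on their own `#` comment line are illustrative only.

```python
import re

# Matches "# [DEF:Id:Type]" and "# [/DEF:Id:Type]" on a line of their own.
ANCHOR = re.compile(r"^#\s*\[(/?)DEF:([^\]]+)\]\s*$")


def outline(text):
    """Return (depth, contract_id) pairs for each opening anchor, in order.

    Raises ValueError when a closing anchor does not match the innermost
    open contract, or when a contract is left unclosed.
    """
    stack, result = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = ANCHOR.match(line.strip())
        if not match:
            continue
        closing, contract_id = match.group(1) == "/", match.group(2)
        if closing:
            if not stack or stack[-1] != contract_id:
                raise ValueError(f"line {lineno}: unexpected [/DEF:{contract_id}]")
            stack.pop()
        else:
            result.append((len(stack), contract_id))
            stack.append(contract_id)
    if stack:
        raise ValueError(f"unclosed contract: {stack[-1]}")
    return result


sample = """\
# [DEF:Project_Map:Root]
# @COMPLEXITY: 3
# [DEF:MCP_Config:Block]
# @COMPLEXITY: 3
# [/DEF:MCP_Config:Block]
# [/DEF:Project_Map:Root]
"""
print(outline(sample))  # [(0, 'Project_Map:Root'), (1, 'MCP_Config:Block')]
```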
---
## 4. ast_search_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | AST-grep pattern search — clear to developers familiar with the tool. |
| Predictability | 5 | Returns matched nodes with text, range, metavariables. |
| Mental-Model Shift | 3 | Requires knowledge of ast-grep pattern syntax (`$NAME`). |
| Consistency | 5 | Output shape is consistent (array of match objects). |
| Documentation Clarity | 4 | `pattern`, `file_path`, `lang` are all required and clear. |
| Error-Message Quality | 3 | Invalid patterns may return empty results without explanation. |
| Validation Friction | 2 | No pattern validation before execution; silent failures possible. |
| Recovery Simplicity | 5 | Stateless; re-run with corrected pattern. |
**Average: 3.88 / 5**
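
Because invalid patterns can come back as empty results, a caller may want a cheap pre-check before invoking the tool. The sketch below is an assumption-laden example: `check_ast_search_args` is a hypothetical client-side helper, the file path is made up, and the only rule it encodes is that ast-grep metavariables are written in upper case (`$NAME`, `$$$ARGS`).

```python
import re

REQUIRED = ("pattern", "file_path", "lang")
# Flags "$"-prefixed tokens that contain a lower-case letter, e.g. $name.
SUSPECT_METAVAR = re.compile(r"\$+[A-Za-z_0-9]*[a-z][A-Za-z_0-9]*")


def check_ast_search_args(args):
    """Return a list of problems found in an ast_search_tool argument dict."""
    problems = [f"missing required parameter: {key}" for key in REQUIRED if not args.get(key)]
    for token in SUSPECT_METAVAR.findall(args.get("pattern") or ""):
        problems.append(f"{token} contains lower-case letters and will not act as a metavariable")
    return problems


good = {"pattern": "logger.info($NAME)", "file_path": "src/app.py", "lang": "python"}
bad = {"pattern": "logger.info($name)", "file_path": "src/app.py"}
print(check_ast_search_args(good))  # []
print(check_ast_search_args(bad))
```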
---
## 5. get_semantic_context_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Get semantic context around a contract" — clear intent. |
| Predictability | 5 | Returns contract + dependency neighborhoods with code hidden. |
| Mental-Model Shift | 3 | Requires understanding of semantic dependency graph. |
| Consistency | 5 | Output format is stable and well-structured. |
| Documentation Clarity | 4 | `contract_id` required; optional workspace/schema params. |
| Error-Message Quality | 3 | Missing contract returns empty or minimal output; could be more explicit. |
| Validation Friction | 1 | Accepts any string; no pre-validation. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 6. build_task_context_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Build task-focused context" — clear for implementation workflows. |
| Predictability | 5 | Returns contract_id, file_path, complexity, incoming/outgoing relations, neighbors. |
| Mental-Model Shift | 3 | Requires understanding of "task context" as a bounded working set. |
| Consistency | 5 | Output shape is deterministic and well-structured. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns minimal output; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Stateless; re-run anytime. |
**Average: 3.75 / 5**
---
## 7. workspace_semantic_health_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Semantic health" — clear dashboard-style summary. |
| Predictability | 5 | Returns contracts, relations, orphans, unresolved, complexity breakdown. |
| Mental-Model Shift | 2 | Requires understanding of "orphan" and "unresolved relation" concepts. |
| Consistency | 5 | Output shape is stable across invocations. |
| Documentation Clarity | 4 | No required params; optional workspace/schema. |
| Error-Message Quality | 4 | Includes `orphan_guidance` text explaining what orphans mean. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.88 / 5**
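
The "orphan" and "unresolved relation" concepts can be illustrated with a small sketch. It assumes an orphan is a contract that takes part in no resolved relation and that a relation is unresolved when its target is not an indexed contract; the real tool may define both differently, and `semantic_health` is a hypothetical name.

```python
def semantic_health(contracts, relations):
    """contracts: list of contract ids; relations: list of (source, kind, target)."""
    known = set(contracts)
    connected = set()
    unresolved = []
    for source, kind, target in relations:
        if target not in known:
            # The target is not indexed, so the relation cannot be resolved.
            unresolved.append((source, kind, target))
            continue
        connected.update((source, target))
    return {
        "contracts": len(known),
        "relations": len(relations),
        "orphans": sorted(known - connected),
        "unresolved": unresolved,
    }


report = semantic_health(
    ["Project_Map:Root", "MCP_Config:Block", "Axiom_Tools_Evaluation:Report"],
    [
        ("Project_Map:Root", "BINDS_TO", "MCP_Config:Block"),
        ("MCP_Config:Block", "DEPENDS_ON", "Project_Map:Root"),
        ("Axiom_Tools_Evaluation:Report", "DEPENDS_ON", "Project_Knowledge_Map:Root"),
    ],
)
print(report["orphans"])
print(report["unresolved"])
```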
---
## 8. audit_contracts_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Audit contracts" — clear intent for quality checks. |
| Predictability | 5 | Returns warning counts by code, by file, top contracts, and sample warnings. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata requirements per complexity level. |
| Consistency | 5 | Output shape is stable; `detail_level` controls verbosity. |
| Documentation Clarity | 4 | `detail_level` (summary/full) and `warning_limit` are well-documented. |
| Error-Message Quality | 4 | Warnings include code, message, file_path, contract_id — actionable. |
| Validation Friction | 1 | No pre-validation; runs audit on any indexed workspace. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.88 / 5**
---
## 9. diff_contract_semantics_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Diff contract semantics" — clear for comparing two contract versions. |
| Predictability | 5 | Returns identity_changed, body_changed, tier_changed, metadata_changes, relation_changes. |
| Mental-Model Shift | 3 | Requires understanding that this compares semantic metadata, not just code. |
| Consistency | 5 | Output shape matches guarded_patch diff output. |
| Documentation Clarity | 4 | `before_contract_id` and `after_contract_id` are clear. |
| Error-Message Quality | 3 | Missing contracts may return empty diff; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract IDs. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 10. impact_analysis_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Impact analysis" — clear intent for dependency impact. |
| Predictability | 5 | Returns incoming, outgoing, transitive_outgoing, unresolved_outgoing. |
| Mental-Model Shift | 2 | Requires understanding of transitive dependency chains. |
| Consistency | 5 | Output shape matches guarded_patch impact output. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns empty lists; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
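
Transitive dependency chains are easier to reason about with a worked example. The following sketch computes direct and transitive outgoing neighbours with a breadth-first walk; it assumes `transitive_outgoing` excludes direct neighbours and the contract itself, which the report does not confirm, and `impact` is a hypothetical function name.

```python
from collections import deque


def impact(contract_id, edges):
    """edges: list of (source, target) pairs describing outgoing relations."""
    outgoing_map, incoming = {}, []
    for source, target in edges:
        outgoing_map.setdefault(source, []).append(target)
        if target == contract_id:
            incoming.append(source)
    outgoing = list(outgoing_map.get(contract_id, []))
    seen, queue = set(outgoing), deque(outgoing)
    while queue:
        node = queue.popleft()
        for nxt in outgoing_map.get(node, []):
            # Skip nodes already visited and cycles back to the start.
            if nxt not in seen and nxt != contract_id:
                seen.add(nxt)
                queue.append(nxt)
    return {
        "incoming": incoming,
        "outgoing": outgoing,
        "transitive_outgoing": sorted(seen - set(outgoing)),
    }


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
print(impact("A", edges))
```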
---
## 11. simulate_patch_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Simulate patch" — clear preview of changes without applying. |
| Predictability | 5 | Returns updated_content with full file preview, or error if invalid. |
| Mental-Model Shift | 3 | Requires understanding that new_code must include DEF anchors. |
| Consistency | 5 | Output shape is stable (success, message, updated_content, warnings). |
| Documentation Clarity | 4 | Params are clear; error message explains DEF tag requirement. |
| Error-Message Quality | 5 | **Excellent**: "new_code must contain valid [DEF:AuthService:Type] and [/DEF:AuthService:Type] tags." |
| Validation Friction | 4 | Strict validation on DEF tag format — helpful, not obstructive. |
| Recovery Simplicity | 5 | No state change; fix new_code and re-run. |
**Average: 4.38 / 5**
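
The DEF tag requirement on `new_code` can be pictured as a simple containment check. This is only a sketch under the assumption that the tool looks for a matching opening and closing anchor; `check_def_tags` is a hypothetical helper, and the error text is modelled on the message quoted in the table.

```python
def check_def_tags(new_code, contract_id, contract_type):
    """Return None when new_code holds matching anchors, else an error string."""
    opening = f"[DEF:{contract_id}:{contract_type}]"
    closing = f"[/DEF:{contract_id}:{contract_type}]"
    start, end = new_code.find(opening), new_code.rfind(closing)
    # Both anchors must exist, and the closing one must come after the opening one.
    if start == -1 or end == -1 or end < start:
        return f"new_code must contain valid {opening} and {closing} tags."
    return None


patch = (
    "# [DEF:AuthService:Class]\n"
    "class AuthService:\n"
    "    pass\n"
    "# [/DEF:AuthService:Class]\n"
)
print(check_def_tags(patch, "AuthService", "Class"))  # None
print(check_def_tags("class AuthService: pass", "AuthService", "Type"))
```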
---
## 12. guarded_patch_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Guarded patch" — clear that validation guards are applied before changes. |
| Predictability | 5 | Returns diff, impact, and applied flag. Guards include syntax, semantic diff, impact. |
| Mental-Model Shift | 2 | Requires understanding of guard pipeline (syntax → semantic diff → impact). |
| Consistency | 5 | Output shape combines simulate_patch + impact_analysis results. |
| Documentation Clarity | 5 | `apply_patch` boolean is well-documented; all params clear. |
| Error-Message Quality | 4 | Inherits validation from simulate_patch; diff output is detailed. |
| Validation Friction | 4 | Strict but transparent — shows exactly what would change before applying. |
| Recovery Simplicity | 5 | With `apply_patch=false`, no state change. With `true`, git can revert. |
**Average: 4.38 / 5**
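
The guard pipeline (syntax, then semantic diff, then impact) can be sketched as a sequence of checks that stops at the first failure. Everything below is hypothetical: the function name, the reduced "semantic diff" (a plain body comparison), and the empty impact result stand in for logic the report does not describe. Only `ast.parse` is a real standard-library call.

```python
import ast


def guarded_patch(old_code, new_code, apply_patch=False):
    """Run the guards in order and report whether the patch would be applied."""
    # Guard 1: syntax. ast.parse raises SyntaxError on invalid Python.
    try:
        ast.parse(new_code)
    except SyntaxError as exc:
        return {"success": False, "message": f"syntax guard failed: {exc.msg}", "applied": False}
    # Guard 2: semantic diff, reduced here to "did the body change at all".
    diff = {"body_changed": old_code.strip() != new_code.strip()}
    # Guard 3: impact analysis placeholder; a real guard would query the index.
    impact = {"incoming": [], "outgoing": []}
    applied = bool(apply_patch and diff["body_changed"])
    return {"success": True, "message": "guards passed", "diff": diff, "impact": impact, "applied": applied}


preview = guarded_patch("x = 1\n", "x = 2\n")
print(preview["applied"])  # False
print(guarded_patch("x = 1\n", "x = (\n")["success"])  # False
```

Keeping `apply_patch` false by default means a caller always sees the diff and impact before anything is written.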
---
## 13. patch_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Patch contract" — clear intent for in-place replacement. |
| Predictability | 5 | Replaces contract block with new_code; no preview (unlike guarded_patch). |
| Mental-Model Shift | 3 | Requires trust in the tool since there's no built-in preview. |
| Consistency | 4 | Simpler than guarded_patch; lacks validation pipeline. |
| Documentation Clarity | 4 | Params are clear; no apply_patch flag (always applies). |
| Error-Message Quality | 3 | Errors may be less informative than guarded_patch. |
| Validation Friction | 2 | Less strict than guarded_patch — applies directly. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert or manual fix. |
**Average: 3.50 / 5**
---
## 14. rename_contract_id_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Rename contract ID" — crystal clear. |
| Predictability | 5 | Renames identifier across indexed workspace. |
| Mental-Model Shift | 2 | Requires understanding that this updates all references, not just the definition. |
| Consistency | 5 | Follows standard {success, message} pattern. |
| Documentation Clarity | 4 | `old_contract_id` and `new_contract_id` are clear. |
| Error-Message Quality | 3 | Missing old_id may fail silently; could warn. |
| Validation Friction | 2 | Applies directly; no preview of affected files. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 15. move_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Move contract" — clear intent for relocating a contract block. |
| Predictability | 5 | Moves contract from source to destination file. |
| Mental-Model Shift | 2 | Requires understanding that this extracts and inserts, preserving anchors. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Three required params are clear. |
| Error-Message Quality | 3 | Missing files may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 16. extract_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Extract contract" — clear intent for creating new contract from code range. |
| Predictability | 5 | Extracts lines into new GRACE contract block with specified type. |
| Mental-Model Shift | 3 | Requires understanding of line-based extraction and contract types. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Five required params (file, id, type, start, end) are clear. |
| Error-Message Quality | 3 | Invalid line ranges may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
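
Line-based extraction can be illustrated as wrapping a line range in a new pair of anchors. The sketch assumes 1-based, inclusive line numbers and `#`-style anchor comments; neither is confirmed by the report, and `extract_contract` here is a stand-in, not the tool itself. It also raises a specific error for bad ranges, which is the behaviour the Error-Message Quality note asks for.

```python
def extract_contract(source, contract_id, contract_type, start_line, end_line):
    """Wrap lines start_line..end_line (1-based, inclusive) in DEF anchors."""
    lines = source.splitlines()
    if not (1 <= start_line <= end_line <= len(lines)):
        raise ValueError(
            f"invalid line range {start_line}-{end_line} for a {len(lines)}-line file"
        )
    tag = f"{contract_id}:{contract_type}"
    wrapped = [f"# [DEF:{tag}]", *lines[start_line - 1:end_line], f"# [/DEF:{tag}]"]
    return "\n".join(lines[:start_line - 1] + wrapped + lines[end_line:]) + "\n"


source = "import os\ndef f():\n    return 1\nx = 2\n"
print(extract_contract(source, "f", "Function", 2, 3))
```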
---
## 17. wrap_node_in_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Wrap node in contract" — clear intent for adding GRACE anchors to existing code. |
| Predictability | 5 | Uses ast-grep to locate node and wraps with [DEF]...[/DEF]. |
| Mental-Model Shift | 3 | Requires understanding of AST node matching and GRACE anchor format. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Params are clear; `lang` defaults to python. |
| Error-Message Quality | 3 | Missing node may fail silently. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |
**Average: 3.63 / 5**
---
## 18. update_contract_metadata_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Update contract metadata" — crystal clear. |
| Predictability | 5 | Updates/adds tags without modifying code body. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata schema (@PURPOSE, @RELATION, etc.). |
| Consistency | 5 | Returns updated_tags list; clear feedback. |
| Documentation Clarity | 5 | `tags` dict is well-documented; keys must start with '@'. |
| Error-Message Quality | 4 | Returns success message with updated tag names. |
| Validation Friction | 3 | Validates tag key format; accepts any value. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |
**Average: 4.13 / 5**
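
The rule that tag keys must start with `@` is simple enough to check before calling the tool. The helper below is a hypothetical client-side sketch; the tag values are made-up examples, and the real tool may apply further checks.

```python
def validate_tags(tags):
    """Split a tags dict into accepted entries and rejected keys."""
    accepted, rejected = {}, []
    for key, value in tags.items():
        # A valid key is a string such as "@PURPOSE": "@" plus at least one character.
        if isinstance(key, str) and key.startswith("@") and len(key) > 1:
            accepted[key] = value
        else:
            rejected.append(key)
    return accepted, rejected


accepted, rejected = validate_tags({"@PURPOSE": "Authenticate users.", "COMPLEXITY": 3})
print(accepted)  # {'@PURPOSE': 'Authenticate users.'}
print(rejected)  # ['COMPLEXITY']
```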
---
## 19. rename_semantic_tag_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Rename semantic tag" — clear intent. |
| Predictability | 5 | Renames or removes a tag within a contract's metadata. |
| Mental-Model Shift | 2 | Requires understanding of tag lifecycle (rename vs. remove). |
| Consistency | 5 | Follows standard {success, message} pattern. |
| Documentation Clarity | 4 | `old_tag` required, `new_tag` optional (null = remove). |
| Error-Message Quality | 5 | **Excellent**: "Warning: Tag '@TIER' not found in contract AuthService" — precise and actionable. |
| Validation Friction | 3 | Validates tag existence before operation. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |
**Average: 4.00 / 5**
---
## 20. prune_contract_metadata_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Prune contract metadata" — clear intent for removing redundant tags. |
| Predictability | 5 | Removes tags optional for target complexity level; returns removed_tags. |
| Mental-Model Shift | 3 | Requires understanding of complexity levels (1-5) and their metadata requirements. |
| Consistency | 5 | Returns removed_tags list; clear feedback. |
| Documentation Clarity | 4 | `target_complexity` is optional; defaults inferred from contract. |
| Error-Message Quality | 4 | Returns success with removed tag names. |
| Validation Friction | 3 | Validates complexity level range (1-5). |
| Recovery Simplicity | 4 | **Low risk**: only removes metadata; easy to re-add. |
**Average: 4.00 / 5**
---
## 21. infer_missing_relations_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Infer missing relations" — clear intent for discovering implicit dependencies. |
| Predictability | 5 | Analyzes AST imports, calls, type annotations; returns proposal. |
| Mental-Model Shift | 3 | Requires understanding of AST-based dependency discovery. |
| Consistency | 5 | Returns inferred list with apply_changes flag. |
| Documentation Clarity | 4 | `apply_changes` defaults to false (dry-run). |
| Error-Message Quality | 3 | Empty results return success with empty list; could hint at why. |
| Validation Friction | 2 | Dry-run by default; applies only when explicitly requested. |
| Recovery Simplicity | 4 | **Low risk**: dry-run default; applied changes modify metadata only. |
**Average: 3.75 / 5**
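
AST-based dependency discovery can be sketched with Python's standard `ast` module. The example covers imports only (the tool is said to also inspect calls and type annotations), and the function names, the `DEPENDS_ON` relation kind for imports, and the proposal shape are assumptions for illustration.

```python
import ast


def imported_modules(source):
    """Return the sorted module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return sorted(modules)


def propose_relations(contract_id, source, declared):
    """Propose relations for imports that are not already declared (dry run)."""
    already = set(declared)
    missing = [name for name in imported_modules(source) if name not in already]
    return {
        "contract_id": contract_id,
        "inferred": [("DEPENDS_ON", name) for name in missing],
        "apply_changes": False,
    }


code = "import json\nfrom auth.tokens import issue\n"
print(propose_relations("AuthService", code, declared=["json"]))
```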
---
## 22. trace_tests_for_contract_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Trace tests for contract" — crystal clear. |
| Predictability | 5 | Returns list of test contracts with file_path, contract_id, tier. |
| Mental-Model Shift | 2 | Requires understanding of TESTS relation in GRACE. |
| Consistency | 5 | Output shape is stable. |
| Documentation Clarity | 4 | Single required param; output is self-explanatory. |
| Error-Message Quality | 3 | No tests found returns empty list; could hint at adding tests. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |
**Average: 3.75 / 5**
---
## 23. scaffold_contract_tests_tool
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Scaffold contract tests" — clear intent for generating test boilerplate. |
| Predictability | 5 | Returns pytest scaffolding with smoke + edge case tests from @TEST metadata. |
| Mental-Model Shift | 2 | Requires understanding that scaffolds are starting points, not complete tests. |
| Consistency | 5 | Output shape is stable (Python test code string). |
| Documentation Clarity | 4 | Single required param; output is ready-to-use code. |
| Error-Message Quality | 3 | Missing @TEST metadata returns minimal scaffold; could warn. |
| Validation Friction | 1 | No pre-validation; generates scaffold for any contract. |
| Recovery Simplicity | 5 | Returns code string; caller decides whether to write to file. |
**Average: 3.75 / 5**
---
## 24. find_contract_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find contract" — task-first alias for semantic lookup. |
| Predictability | 5 | Returns same output as search_contracts_tool. |
| Mental-Model Shift | 2 | Same as search_contracts_tool. |
| Consistency | 5 | Identical to search_contracts_tool output. |
| Documentation Clarity | 4 | Same params as search_contracts_tool. |
| Error-Message Quality | 3 | Same as search_contracts_tool. |
| Validation Friction | 1 | Same as search_contracts_tool. |
| Recovery Simplicity | 5 | Stateless query. |
**Average: 3.75 / 5**
---
## 25. read_outline_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Read outline" — task-first alias for file inspection. |
| Predictability | 5 | Same as read_grace_outline_tool. |
| Mental-Model Shift | 3 | Same as read_grace_outline_tool. |
| Consistency | 5 | Identical to read_grace_outline_tool output. |
| Documentation Clarity | 4 | Same params as read_grace_outline_tool. |
| Error-Message Quality | 3 | Same as read_grace_outline_tool. |
| Validation Friction | 1 | Same as read_grace_outline_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## 26. safe_patch_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Safe patch" — task-first alias for validated patching. |
| Predictability | 5 | Same as guarded_patch_contract_tool. |
| Mental-Model Shift | 2 | Same as guarded_patch_contract_tool. |
| Consistency | 5 | Identical to guarded_patch_contract_tool output. |
| Documentation Clarity | 4 | Same params as guarded_patch_contract_tool. |
| Error-Message Quality | 4 | Same as guarded_patch_contract_tool. |
| Validation Friction | 4 | Same as guarded_patch_contract_tool. |
| Recovery Simplicity | 5 | Same as guarded_patch_contract_tool. |
**Average: 4.25 / 5**
---
## 27. find_related_tests_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find related tests" — task-first alias for test lookup. |
| Predictability | 5 | Same as trace_tests_for_contract_tool. |
| Mental-Model Shift | 2 | Same as trace_tests_for_contract_tool. |
| Consistency | 5 | Identical to trace_tests_for_contract_tool output. |
| Documentation Clarity | 4 | Same params as trace_tests_for_contract_tool. |
| Error-Message Quality | 3 | Same as trace_tests_for_contract_tool. |
| Validation Friction | 1 | Same as trace_tests_for_contract_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## 28. analyze_impact_tool (alias)
| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Analyze impact" — task-first alias for dependency analysis. |
| Predictability | 5 | Same as impact_analysis_tool. |
| Mental-Model Shift | 2 | Same as impact_analysis_tool. |
| Consistency | 5 | Identical to impact_analysis_tool output. |
| Documentation Clarity | 4 | Same params as impact_analysis_tool. |
| Error-Message Quality | 3 | Same as impact_analysis_tool. |
| Validation Friction | 1 | Same as impact_analysis_tool. |
| Recovery Simplicity | 5 | Pure read. |
**Average: 3.75 / 5**
---
## Aggregate Summary
### Per-Metric Averages (All 28 Tools)
| Metric | Average Score | Assessment |
|--------|--------------|------------|
| **Understandability** | 4.54 | Excellent: tool names are descriptive and intent is clear. |
| **Predictability** | 5.00 | Perfect: all tools behave as expected based on their names and docs. |
| **Mental-Model Shift** | 2.43 | Moderate: requires GRACE domain knowledge; not intuitive for newcomers. |
| **Consistency** | 4.96 | Near-perfect: output shapes and patterns are uniform across the suite; only `patch_contract_tool` scores below 5. |
| **Documentation Clarity** | 4.07 | Good: parameters are well-defined; could benefit from more examples. |
| **Error-Message Quality** | 3.36 | Acceptable: some tools have excellent errors (simulate_patch, rename_semantic_tag), others are silent. |
| **Validation Friction** | 1.79 | Low friction: most tools are lenient (on this metric a low score means little pre-validation); mutation tools have appropriate strictness. |
| **Recovery Simplicity** | 4.50 | Excellent: read-only tools are stateless; mutation tools have clear recovery paths. |
### Overall Suite Average: **3.83 / 5**
---
## Key Findings
### Strengths
1. **Consistent Output Shapes**: Most tools follow predictable response patterns (`{success, message, ...}`); the exceptions, such as the match array from `ast_search_tool`, are stable across invocations.
2. **Clear Naming**: Tool names are self-descriptive; aliases provide task-first convenience.
3. **Safe Defaults**: Tools that expose a preview flag default to dry-run (`apply_patch=false` on `guarded_patch_contract_tool`, `apply_changes=false` on `infer_missing_relations_tool`).
4. **Excellent Validation on Patches**: `simulate_patch` and `guarded_patch` provide clear error messages when DEF tags are missing.
5. **Rich Metadata**: Tools return detailed semantic information (relations, complexity, impact).
### Areas for Improvement
1. **Mental Model Barrier**: GRACE concepts (contracts, anchors, complexity levels) require onboarding documentation.
2. **Silent Failures**: Some tools return empty results without hints (e.g., no tests found, no relations inferred).
3. **Mutation Safety**: `patch_contract_tool`, `rename_contract_id_tool`, `move_contract_tool`, `extract_contract_tool`, and `wrap_node_in_contract_tool` apply changes directly without a preview; consider adding a `dry_run` flag.
4. **Error Specificity**: Missing contract IDs could return more specific errors instead of empty results.
5. **Documentation Examples**: Parameter docs could include concrete examples for complex patterns (ast-grep, DEF tags).
### Recommendations
1. Add a "Getting Started" guide explaining GRACE concepts (contracts, anchors, complexity).
2. Add a `dry_run` parameter to the direct mutation tools (`patch_contract`, `rename_contract_id`, `move_contract`, `extract_contract`, `wrap_node_in_contract`).
3. Improve empty-result responses with actionable hints (e.g., "No tests found — consider adding @TEST metadata").
4. Add example payloads to tool documentation for complex parameters.
5. Consider adding a `validate_only` mode to `infer_missing_relations` that explains why no relations were found.
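
The third recommendation can be pictured as a thin wrapper that attaches a hint to empty results. This is a sketch of one possible response shape, not an existing API: `with_hint`, the `hint` field, and the hint wording are all assumptions built from the notes in this report.

```python
def with_hint(tool_name, result):
    """Attach an actionable hint when a read-only tool returns an empty list."""
    hints = {
        "trace_tests_for_contract_tool": "No tests found. Consider adding @TEST metadata.",
        "search_contracts_tool": "No contracts matched. The index may be stale; try reindex_workspace_tool.",
    }
    if isinstance(result, list) and not result and tool_name in hints:
        return {"success": True, "results": [], "hint": hints[tool_name]}
    # Non-empty results, or tools without a registered hint, pass through unchanged.
    return {"success": True, "results": result}


print(with_hint("trace_tests_for_contract_tool", []))
print(with_hint("search_contracts_tool", [{"contract_id": "AuthService"}]))
```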
---
# [/DEF:Axiom_Tools_Evaluation:Report]

@@ -0,0 +1,47 @@
# Axiom MCP Tools Evaluation Report
## Executive Summary
Testing of the Axiom MCP tool surface covered the main categories: Query/Search, Semantic Health & Audit, AST/Semantic Patching, Workspace Management, and Validation/Command execution.
Tool behavior proved strictly regulated and predictable within the GRACE policies.
**Key strengths:**
1. **Validation Friction & Recovery Simplicity:** The presence of `simulate_patch_tool`, the strict use of preview modes for mutations, and the option of automatic rollback (`rollback_workspace_change_tool`) make the system highly resilient to mistakes.
2. **Predictability:** Errors are returned as structured JSON payloads that clearly state the cause (missing anchors, forbidden path, invalid ID).
**Main problem areas (limitations):**
1. **Understandability / Mental-Model Shift:** The barrier to entry is high because of strict GRACE requirements (contract complexity levels 1 through 5, mandatory `[DEF]...[/DEF]` anchors). Familiar patterns (shell writes) are blocked.
2. **Documentation Clarity:** Error messages are sometimes too terse or abstract (for example, "Orphans are contracts without semantic relations" does not always give a concrete recipe for external AST nodes).
---
## Tool Score Table (scale 1-5, where 5 is excellent)
| Tool Category | Tools Evaluated | Understandability | Predictability | Mental-Model Shift | Consistency | Doc Clarity | Error Quality | Validation Friction | Recovery Simplicity |
|---|---|---|---|---|---|---|---|---|---|
| **Query & Semantic Search** | `search_contracts`, `find_contract`, `query_workspace_semantics`, `get_semantic_context` | 4 | 5 | 3 | 5 | 4 | 5 | 5 (Low) | N/A (Read-only) |
| **Audit & Health** | `workspace_semantic_health`, `audit_contracts`, `audit_belief_protocol`, `diff_contract_semantics` | 4 | 5 | 3 | 5 | 4 | 4 | 4 (Low) | N/A (Read-only) |
| **AST & Semantic Mutators** | `patch_contract`, `guarded_patch_contract`, `wrap_node_in_contract`, `rename_semantic_tag` | 3 | 4 | 2 (High shift) | 5 | 4 | 4 | 2 (High - strict) | 5 (Easy undo) |
| **Workspace & File Ops** | `create_workspace_file`, `patch_workspace_file`, `manage_workspace_path`, `scaffold_workspace_module` | 5 | 5 | 4 | 5 | 5 | 5 | 3 (Moderate) | 5 |
| **Validation & Recovery** | `run_workspace_command`, `summarize_workspace_change`, `rollback_workspace_change`, `rebuild_workspace_semantic_index` | 4 | 5 | 5 (Native) | 5 | 5 | 5 | 5 (Low) | 5 |
---
## Detailed Notes by Category
### 1. Read / Search / Audit (Read-Only Tools)
- **Actual behavior:** Fast retrieval of contract relations and AST trees. `workspace_semantic_health_tool` returns an exact complexity breakdown and the orphan contracts.
- **Errors:** If a contract ID is not found, the tool returns an empty list or an explicit "Contract not found" error, which is very convenient for fallback logic.
- **Assessment:** These tools work very well, but they require understanding that search runs against the *index*, not the raw text, so the index must be up to date.
### 2. Mutation & Patching (Dangerous Tools)
- **Actual behavior:** Before mutating, you must understand the context (per Mental-Model Shift). Tools such as `guarded_patch_contract_tool` first validate syntax (AST check) and semantic diffs, and only then apply the patch if `apply_patch=True` is set.
- **Validation strictness:** Extremely high. Attempts to modify a file without preserving the `[DEF]` anchors are rejected by policy or lead to semantic warnings in the next audit.
- **Recovery:** Every successful mutation is recorded in a checkpoint (`.axiom/checkpoints`). Undo via `rollback_workspace_change_tool` is atomic.
### 3. Command Execution & Policy
- **Actual behavior:** `run_workspace_command_tool` runs in a sandbox (bwrap). Writes outside `.axiom/temp` are blocked by policy (read-only shell).
- **Errors:** Error-Message Quality is highest here, because the caller receives the exact stdout/stderr of the process and its return code.
### Conclusion
The Axiom MCP surface is designed with priority on **recoverability (Recovery)** and **predictability (Predictability)**. The strict barriers (Validation Friction) are intentionally high to maintain the semantic integrity of the codebase.
