# [DEF:Axiom_Tools_Evaluation:Report]
# @COMPLEXITY: 4
# @PURPOSE: Comprehensive evaluation of all axiom-core MCP server tools across 8 UX metrics.
# @LAYER: Analysis
# @RELATION: DEPENDS_ON -> [Project_Knowledge_Map:Root]
# @PRE: All axiom-core tools have been exercised with valid and invalid inputs.
# @POST: Report file exists with per-tool scores and aggregate findings.
# @SIDE_EFFECT: Creates evaluation artifact in .ai/reports/.
# @DATA_CONTRACT: Input[Tool Suite] -> Output[Evaluation Report]
# @INVARIANT: Each tool must be scored on all 8 metrics; no tool may be omitted.

---
# Axiom-Core MCP Tools Evaluation Report

**Date:** 2026-03-31
**Workspace:** `/home/busya/dev/ss-tools`
**Evaluator:** Kilo Code (Coder Mode)
**Index Stats:** 2528 contracts, 2186 relations, 450 files

---
## Scoring Scale

| Score | Meaning |
|-------|---------|
| 5 | Excellent: no friction, best-in-class |
| 4 | Good: minor quirks, easily understood |
| 3 | Acceptable: some learning curve, works as expected |
| 2 | Poor: confusing or inconsistent behavior |
| 1 | Broken: fails to meet basic expectations |

---
## 1. reindex_workspace_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | Name is self-explanatory; purpose is obvious. |
| Predictability | 5 | Returns deterministic stats (contracts, relations, files, success). |
| Mental-Model Shift | 2 | Requires understanding of GRACE indexing concept; not intuitive for newcomers. |
| Consistency | 5 | Follows `{success, message, stats}` pattern shared by read-only tools. |
| Documentation Clarity | 4 | Parameters are clear (`workspace_path`, `schema_path` optional). |
| Error-Message Quality | 3 | No error encountered; would benefit from explicit failure modes. |
| Validation Friction | 1 | Very lenient: accepts a missing `workspace_path` gracefully (defaults to server repo). |
| Recovery Simplicity | 5 | Pure read/index operation; re-run to refresh. No state to undo. |

**Average: 3.75 / 5**

---
## 2. search_contracts_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Search contracts by query": crystal clear. |
| Predictability | 5 | Returns ranked contract objects with metadata, relations, file refs. |
| Mental-Model Shift | 2 | Requires understanding of semantic search vs. text search. |
| Consistency | 5 | Output shape matches `find_contract_tool` exactly. |
| Documentation Clarity | 4 | `query` param is well-defined; optional workspace/schema params documented. |
| Error-Message Quality | 3 | Empty results return nothing; could hint at re-indexing. |
| Validation Friction | 1 | Accepts any string; no pre-validation needed. |
| Recovery Simplicity | 5 | Stateless query; re-run with different query. |

**Average: 3.75 / 5**

---
## 3. read_grace_outline_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "GRACE outline" is domain-specific but clear from context. |
| Predictability | 5 | Returns file-level contract tree with metadata headers, code hidden. |
| Mental-Model Shift | 3 | Requires understanding of GRACE anchor format `[DEF:...]`. |
| Consistency | 5 | Output format is stable across files. |
| Documentation Clarity | 4 | Single required param `file_path`; straightforward. |
| Error-Message Quality | 3 | Would fail silently on non-GRACE files; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any path. |
| Recovery Simplicity | 5 | Pure read; no side effects. |

**Average: 3.75 / 5**

---
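To make the outline idea behind `read_grace_outline_tool` concrete, here is a minimal sketch of an outline pass. It assumes anchors appear as `[DEF:Name:Type]` and `[/DEF:Name:Type]` with `@TAG: value` lines in between, as in this report's own header. The `outline` function and its regular expressions are hypothetical and are not the server's implementation.

```python
import re

# Assumed anchor and tag shapes, modelled on this report's own header lines.
OPEN_RE = re.compile(r"\[DEF:([^:\]]+):([^\]]+)\]")
CLOSE_RE = re.compile(r"\[/DEF:([^:\]]+):([^\]]+)\]")
TAG_RE = re.compile(r"(@[A-Z_]+):\s*(.*)")


def outline(text: str) -> list[dict]:
    """Return one entry per [DEF:...] anchor with its metadata tags; code lines are skipped."""
    contracts, stack = [], []
    for line in text.splitlines():
        if CLOSE_RE.search(line):
            if stack:
                stack.pop()  # leave the innermost open contract
        elif m := OPEN_RE.search(line):
            entry = {"id": m.group(1), "type": m.group(2), "depth": len(stack), "tags": {}}
            contracts.append(entry)
            stack.append(entry)
        elif stack and (m := TAG_RE.search(line)):
            # Attach the tag to the innermost open contract.
            stack[-1]["tags"][m.group(1)] = m.group(2).strip()
    return contracts
```

A file without any anchors yields an empty list here, which is the kind of silent result the Error-Message Quality note refers to.

---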
## 4. ast_search_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | ast-grep pattern search; clear to developers familiar with the tool. |
| Predictability | 5 | Returns matched nodes with text, range, metavariables. |
| Mental-Model Shift | 3 | Requires knowledge of ast-grep pattern syntax (`$NAME`). |
| Consistency | 5 | Output shape is consistent (array of match objects). |
| Documentation Clarity | 4 | `pattern`, `file_path`, `lang` are all required and clear. |
| Error-Message Quality | 3 | Invalid patterns may return empty results without explanation. |
| Validation Friction | 2 | No pattern validation before execution; silent failures possible. |
| Recovery Simplicity | 5 | Stateless; re-run with corrected pattern. |

**Average: 3.88 / 5**

---
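Because `ast_search_tool` may return an empty result for a malformed pattern, a caller can run a cheap pre-check before sending the request. The sketch below is a hypothetical client-side helper, not part of axiom-core or ast-grep. It relies on the ast-grep convention that metavariables are `$` followed by an upper-case name, with `$$$` matching multiple nodes, and it does not understand brackets inside string literals.

```python
import re

# "$$$NAME" (multi-node) is tried first so it is not split into single "$" matches.
METAVAR_RE = re.compile(r"\$\$\$[A-Z_0-9]*|\$[A-Z_][A-Z_0-9]*")
PAIRS = {")": "(", "]": "[", "}": "{"}


def precheck_pattern(pattern: str) -> dict:
    """Cheap client-side checks to run before sending `pattern` to the tool."""
    problems = []
    if not pattern.strip():
        problems.append("pattern is empty")
    stack = []
    for ch in pattern:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                problems.append(f"unbalanced '{ch}'")
                break
    else:
        # Loop finished without a mismatch; anything left open is unclosed.
        if stack:
            problems.append(f"unclosed '{stack[-1]}'")
    return {
        "ok": not problems,
        "problems": problems,
        "metavariables": METAVAR_RE.findall(pattern),
    }
```

For example, `precheck_pattern("def $NAME($$$ARGS): $$$BODY")` reports no problems and lists the three metavariables, so a typo such as a lower-case `$name` shows up as a missing entry.

---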
## 5. get_semantic_context_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Get semantic context around a contract": clear intent. |
| Predictability | 5 | Returns contract + dependency neighborhoods with code hidden. |
| Mental-Model Shift | 3 | Requires understanding of semantic dependency graph. |
| Consistency | 5 | Output format is stable and well-structured. |
| Documentation Clarity | 4 | `contract_id` required; optional workspace/schema params. |
| Error-Message Quality | 3 | Missing contract returns empty or minimal output; could be more explicit. |
| Validation Friction | 1 | Accepts any string; no pre-validation. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.75 / 5**

---
## 6. build_task_context_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Build task-focused context": clear for implementation workflows. |
| Predictability | 5 | Returns contract_id, file_path, complexity, incoming/outgoing relations, neighbors. |
| Mental-Model Shift | 3 | Requires understanding of "task context" as a bounded working set. |
| Consistency | 5 | Output shape is deterministic and well-structured. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns minimal output; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Stateless; re-run anytime. |

**Average: 3.75 / 5**

---
## 7. workspace_semantic_health_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Semantic health": clear dashboard-style summary. |
| Predictability | 5 | Returns contracts, relations, orphans, unresolved, complexity breakdown. |
| Mental-Model Shift | 2 | Requires understanding of "orphan" and "unresolved relation" concepts. |
| Consistency | 5 | Output shape is stable across invocations. |
| Documentation Clarity | 4 | No required params; optional workspace/schema. |
| Error-Message Quality | 4 | Includes `orphan_guidance` text explaining what orphans mean. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.88 / 5**

---
## 8. audit_contracts_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Audit contracts": clear intent for quality checks. |
| Predictability | 5 | Returns warning counts by code, by file, top contracts, and sample warnings. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata requirements per complexity level. |
| Consistency | 5 | Output shape is stable; `detail_level` controls verbosity. |
| Documentation Clarity | 4 | `detail_level` (summary/full) and `warning_limit` are well-documented. |
| Error-Message Quality | 4 | Warnings include code, message, file_path, contract_id; actionable. |
| Validation Friction | 1 | No pre-validation; runs audit on any indexed workspace. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.88 / 5**

---
## 9. diff_contract_semantics_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Diff contract semantics": clear for comparing two contract versions. |
| Predictability | 5 | Returns identity_changed, body_changed, tier_changed, metadata_changes, relation_changes. |
| Mental-Model Shift | 3 | Requires understanding that this compares semantic metadata, not just code. |
| Consistency | 5 | Output shape matches guarded_patch diff output. |
| Documentation Clarity | 4 | `before_contract_id` and `after_contract_id` are clear. |
| Error-Message Quality | 3 | Missing contracts may return empty diff; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract IDs. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.75 / 5**

---
## 10. impact_analysis_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Impact analysis": clear intent for dependency impact. |
| Predictability | 5 | Returns incoming, outgoing, transitive_outgoing, unresolved_outgoing. |
| Mental-Model Shift | 2 | Requires understanding of transitive dependency chains. |
| Consistency | 5 | Output shape matches guarded_patch impact output. |
| Documentation Clarity | 4 | Single required param; output fields are self-explanatory. |
| Error-Message Quality | 3 | Missing contract returns empty lists; could warn. |
| Validation Friction | 1 | No pre-validation; accepts any contract_id. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.75 / 5**

---
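The four fields returned by `impact_analysis_tool` can be pictured as a breadth-first walk over the relation graph. The sketch below is an illustration under two assumptions that the report does not confirm: `transitive_outgoing` excludes direct dependencies, and `unresolved_outgoing` means targets that are absent from the index. The `impact` function and the adjacency-dict input are hypothetical.

```python
from collections import deque


def impact(graph: dict[str, list[str]], contract_id: str) -> dict[str, list[str]]:
    """graph maps each indexed contract ID to the IDs it depends on (outgoing relations)."""
    outgoing = list(graph.get(contract_id, []))
    incoming = sorted(src for src, targets in graph.items() if contract_id in targets)
    unresolved = [target for target in outgoing if target not in graph]

    # Breadth-first walk from the direct dependencies to collect indirect ones.
    seen, queue = set(outgoing), deque(outgoing)
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen and nxt != contract_id:
                seen.add(nxt)
                queue.append(nxt)

    return {
        "incoming": incoming,
        "outgoing": outgoing,
        "transitive_outgoing": sorted(seen - set(outgoing)),
        "unresolved_outgoing": unresolved,
    }
```

An unknown `contract_id` produces four empty lists in this sketch, which mirrors the "empty lists" behavior noted under Error-Message Quality.

---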
## 11. simulate_patch_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Simulate patch": clear preview of changes without applying. |
| Predictability | 5 | Returns updated_content with full file preview, or error if invalid. |
| Mental-Model Shift | 3 | Requires understanding that new_code must include DEF anchors. |
| Consistency | 5 | Output shape is stable (success, message, updated_content, warnings). |
| Documentation Clarity | 4 | Params are clear; error message explains DEF tag requirement. |
| Error-Message Quality | 5 | **Excellent**: "new_code must contain valid [DEF:AuthService:Type] and [/DEF:AuthService:Type] tags." |
| Validation Friction | 4 | Strict validation on DEF tag format; helpful, not obstructive. |
| Recovery Simplicity | 5 | No state change; fix new_code and re-run. |

**Average: 4.38 / 5**

---
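The DEF tag requirement enforced by `simulate_patch_tool` can be approximated with a small check. This is a sketch under assumptions: the helper name `check_def_tags` is hypothetical, the exact rules the server applies are not documented in this report, and only the error text is taken from the observed message.

```python
import re


def check_def_tags(new_code: str, contract_id: str) -> dict:
    """Check that new_code opens and closes the same [DEF:<id>:<Type>] anchor, in order."""
    cid = re.escape(contract_id)
    open_m = re.search(rf"\[DEF:{cid}:([^\]]+)\]", new_code)
    close_m = re.search(rf"\[/DEF:{cid}:([^\]]+)\]", new_code)
    ok = bool(
        open_m
        and close_m
        and open_m.group(1) == close_m.group(1)  # same contract type on both anchors
        and open_m.start() < close_m.start()     # opening anchor comes first
    )
    message = "ok" if ok else (
        f"new_code must contain valid [DEF:{contract_id}:Type] "
        f"and [/DEF:{contract_id}:Type] tags."
    )
    return {"success": ok, "message": message}
```

Passing a bare `class AuthService: ...` body without anchors reproduces the message quoted in the table.

---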
## 12. guarded_patch_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Guarded patch": clear that validation guards are applied before changes. |
| Predictability | 5 | Returns diff, impact, and applied flag. Guards include syntax, semantic diff, impact. |
| Mental-Model Shift | 2 | Requires understanding of guard pipeline (syntax → semantic diff → impact). |
| Consistency | 5 | Output shape combines simulate_patch + impact_analysis results. |
| Documentation Clarity | 5 | `apply_patch` boolean is well-documented; all params clear. |
| Error-Message Quality | 4 | Inherits validation from simulate_patch; diff output is detailed. |
| Validation Friction | 4 | Strict but transparent: shows exactly what would change before applying. |
| Recovery Simplicity | 5 | With `apply_patch=false`, no state change. With `true`, git can revert. |

**Average: 4.38 / 5**

---
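The guard pipeline of `guarded_patch_contract_tool` (syntax → semantic diff → impact) follows a common pattern: run checks in order, stop at the first failure, and write only when `apply_patch` is set. The sketch below illustrates that pattern only. `guarded_patch`, `syntax_guard`, and the response fields other than the applied flag are hypothetical, and the syntax check uses Python's `ast` module as a stand-in.

```python
import ast
from typing import Callable

# A guard inspects the proposed code and returns a list of problems (empty means pass).
Guard = Callable[[str], list[str]]


def syntax_guard(new_code: str) -> list[str]:
    """Stand-in syntax check for Python sources."""
    try:
        ast.parse(new_code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    return []


def guarded_patch(
    new_code: str,
    guards: list[tuple[str, Guard]],
    write: Callable[[str], None],
    apply_patch: bool = False,
) -> dict:
    """Run guards in order, stop at the first failure, write only when asked to."""
    report: dict[str, list[str]] = {}
    for name, guard in guards:
        report[name] = guard(new_code)
        if report[name]:
            return {"success": False, "applied": False, "failed_guard": name, "guards": report}
    if apply_patch:
        write(new_code)
    return {"success": True, "applied": apply_patch, "guards": report}
```

With the default `apply_patch=False` the `write` callback is never invoked, which is what keeps the dry-run path free of side effects.

---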
## 13. patch_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Patch contract": clear intent for in-place replacement. |
| Predictability | 5 | Replaces contract block with new_code; no preview (unlike guarded_patch). |
| Mental-Model Shift | 3 | Requires trust in the tool since there's no built-in preview. |
| Consistency | 4 | Simpler than guarded_patch; lacks validation pipeline. |
| Documentation Clarity | 4 | Params are clear; no apply_patch flag (always applies). |
| Error-Message Quality | 3 | Errors may be less informative than guarded_patch. |
| Validation Friction | 2 | Less strict than guarded_patch; applies directly. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert or manual fix. |

**Average: 3.50 / 5**

---
## 14. rename_contract_id_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Rename contract ID": crystal clear. |
| Predictability | 5 | Renames identifier across indexed workspace. |
| Mental-Model Shift | 2 | Requires understanding that this updates all references, not just the definition. |
| Consistency | 5 | Follows standard `{success, message}` pattern. |
| Documentation Clarity | 4 | `old_contract_id` and `new_contract_id` are clear. |
| Error-Message Quality | 3 | Missing old_id may fail silently; could warn. |
| Validation Friction | 2 | Applies directly; no preview of affected files. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |

**Average: 3.63 / 5**

---
## 15. move_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Move contract": clear intent for relocating a contract block. |
| Predictability | 5 | Moves contract from source to destination file. |
| Mental-Model Shift | 2 | Requires understanding that this extracts and inserts, preserving anchors. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Three required params are clear. |
| Error-Message Quality | 3 | Missing files may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |

**Average: 3.63 / 5**

---
## 16. extract_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Extract contract": clear intent for creating new contract from code range. |
| Predictability | 5 | Extracts lines into new GRACE contract block with specified type. |
| Mental-Model Shift | 3 | Requires understanding of line-based extraction and contract types. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Five required params (file, id, type, start, end) are clear. |
| Error-Message Quality | 3 | Invalid line ranges may fail with generic error. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |

**Average: 3.63 / 5**

---
## 17. wrap_node_in_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Wrap node in contract": clear intent for adding GRACE anchors to existing code. |
| Predictability | 5 | Uses ast-grep to locate node and wraps with [DEF]...[/DEF]. |
| Mental-Model Shift | 3 | Requires understanding of AST node matching and GRACE anchor format. |
| Consistency | 5 | Follows standard pattern. |
| Documentation Clarity | 4 | Params are clear; `lang` defaults to python. |
| Error-Message Quality | 3 | Missing node may fail silently. |
| Validation Friction | 2 | Applies directly; no preview. |
| Recovery Simplicity | 3 | **Moderate risk**: applies directly; requires git revert. |

**Average: 3.63 / 5**

---
## 18. update_contract_metadata_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Update contract metadata": crystal clear. |
| Predictability | 5 | Updates/adds tags without modifying code body. |
| Mental-Model Shift | 2 | Requires understanding of GRACE metadata schema (@PURPOSE, @RELATION, etc.). |
| Consistency | 5 | Returns updated_tags list; clear feedback. |
| Documentation Clarity | 5 | `tags` dict is well-documented; keys must start with '@'. |
| Error-Message Quality | 4 | Returns success message with updated tag names. |
| Validation Friction | 3 | Validates tag key format; accepts any value. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |

**Average: 4.13 / 5**

---
## 19. rename_semantic_tag_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Rename semantic tag": clear intent. |
| Predictability | 5 | Renames or removes a tag within a contract's metadata. |
| Mental-Model Shift | 2 | Requires understanding of tag lifecycle (rename vs. remove). |
| Consistency | 5 | Follows standard `{success, message}` pattern. |
| Documentation Clarity | 4 | `old_tag` required, `new_tag` optional (null = remove). |
| Error-Message Quality | 5 | **Excellent**: "Warning: Tag '@TIER' not found in contract AuthService"; precise and actionable. |
| Validation Friction | 3 | Validates tag existence before operation. |
| Recovery Simplicity | 4 | **Low risk**: only modifies metadata; easy to revert. |

**Average: 4.00 / 5**

---
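The rename-or-remove behavior of `rename_semantic_tag_tool` (`new_tag` of null means remove) can be sketched over a plain metadata dict. The function name `rename_tag`, the dict representation, and the choice to report a missing tag with `success: False` are assumptions made for illustration; only the warning text comes from the observed output.

```python
from typing import Optional


def rename_tag(
    contract_id: str,
    metadata: dict[str, str],
    old_tag: str,
    new_tag: Optional[str] = None,
) -> dict:
    """Rename old_tag to new_tag in place, or remove it when new_tag is None."""
    if old_tag not in metadata:
        return {
            "success": False,
            "message": f"Warning: Tag '{old_tag}' not found in contract {contract_id}",
        }
    value = metadata.pop(old_tag)
    if new_tag is None:
        return {"success": True, "message": f"Removed {old_tag}"}
    metadata[new_tag] = value  # keep the original value under the new key
    return {"success": True, "message": f"Renamed {old_tag} to {new_tag}"}
```

Checking for the tag before mutating is what makes the precise warning possible; nothing is changed when the tag is absent.

---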
## 20. prune_contract_metadata_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Prune contract metadata": clear intent for removing redundant tags. |
| Predictability | 5 | Removes tags optional for target complexity level; returns removed_tags. |
| Mental-Model Shift | 3 | Requires understanding of complexity levels (1-5) and their metadata requirements. |
| Consistency | 5 | Returns removed_tags list; clear feedback. |
| Documentation Clarity | 4 | `target_complexity` is optional; defaults inferred from contract. |
| Error-Message Quality | 4 | Returns success with removed tag names. |
| Validation Friction | 3 | Validates complexity level range (1-5). |
| Recovery Simplicity | 4 | **Low risk**: only removes metadata; easy to re-add. |

**Average: 4.00 / 5**

---
## 21. infer_missing_relations_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Infer missing relations": clear intent for discovering implicit dependencies. |
| Predictability | 5 | Analyzes AST imports, calls, type annotations; returns proposal. |
| Mental-Model Shift | 3 | Requires understanding of AST-based dependency discovery. |
| Consistency | 5 | Returns inferred list with apply_changes flag. |
| Documentation Clarity | 4 | `apply_changes` defaults to false (dry-run). |
| Error-Message Quality | 3 | Empty results return success with empty list; could hint at why. |
| Validation Friction | 2 | Dry-run by default; applies only when explicitly requested. |
| Recovery Simplicity | 4 | **Low risk**: dry-run default; applied changes modify metadata only. |

**Average: 3.75 / 5**

---
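AST-based dependency discovery, as used by `infer_missing_relations_tool`, can be illustrated with Python's standard `ast` module. The sketch covers imports only (the tool is also described as analyzing calls and type annotations), and `infer_import_relations` is a hypothetical helper rather than the tool's actual logic. Like the tool's dry-run default, it only proposes relations and writes nothing.

```python
import ast


def infer_import_relations(source: str, declared: set[str]) -> list[str]:
    """List imported module names that are not already declared as relations."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports without a module name (from . import x) are skipped.
            found.add(node.module)
    return sorted(found - declared)
```

An empty return value can mean either "no imports" or "everything is already declared", which is the ambiguity the Error-Message Quality note points at.

---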
## 22. trace_tests_for_contract_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Trace tests for contract": crystal clear. |
| Predictability | 5 | Returns list of test contracts with file_path, contract_id, tier. |
| Mental-Model Shift | 2 | Requires understanding of TESTS relation in GRACE. |
| Consistency | 5 | Output shape is stable. |
| Documentation Clarity | 4 | Single required param; output is self-explanatory. |
| Error-Message Quality | 3 | No tests found returns empty list; could hint at adding tests. |
| Validation Friction | 1 | No pre-validation needed. |
| Recovery Simplicity | 5 | Pure read; no state to undo. |

**Average: 3.75 / 5**

---
## 23. scaffold_contract_tests_tool

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Scaffold contract tests": clear intent for generating test boilerplate. |
| Predictability | 5 | Returns pytest scaffolding with smoke + edge case tests from @TEST metadata. |
| Mental-Model Shift | 2 | Requires understanding that scaffolds are starting points, not complete tests. |
| Consistency | 5 | Output shape is stable (Python test code string). |
| Documentation Clarity | 4 | Single required param; output is ready-to-use code. |
| Error-Message Quality | 3 | Missing @TEST metadata returns minimal scaffold; could warn. |
| Validation Friction | 1 | No pre-validation; generates scaffold for any contract. |
| Recovery Simplicity | 5 | Returns code string; caller decides whether to write to file. |

**Average: 3.75 / 5**

---
## 24. find_contract_tool (alias)

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find contract": task-first alias for semantic lookup. |
| Predictability | 5 | Returns same output as search_contracts_tool. |
| Mental-Model Shift | 2 | Same as search_contracts_tool. |
| Consistency | 5 | Identical to search_contracts_tool output. |
| Documentation Clarity | 4 | Same params as search_contracts_tool. |
| Error-Message Quality | 3 | Same as search_contracts_tool. |
| Validation Friction | 1 | Same as search_contracts_tool. |
| Recovery Simplicity | 5 | Stateless query. |

**Average: 3.75 / 5**

---
## 25. read_outline_tool (alias)

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 4 | "Read outline": task-first alias for file inspection. |
| Predictability | 5 | Same as read_grace_outline_tool. |
| Mental-Model Shift | 3 | Same as read_grace_outline_tool. |
| Consistency | 5 | Identical to read_grace_outline_tool output. |
| Documentation Clarity | 4 | Same params as read_grace_outline_tool. |
| Error-Message Quality | 3 | Same as read_grace_outline_tool. |
| Validation Friction | 1 | Same as read_grace_outline_tool. |
| Recovery Simplicity | 5 | Pure read. |

**Average: 3.75 / 5**

---
## 26. safe_patch_tool (alias)

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Safe patch": task-first alias for validated patching. |
| Predictability | 5 | Same as guarded_patch_contract_tool. |
| Mental-Model Shift | 2 | Same as guarded_patch_contract_tool. |
| Consistency | 5 | Identical to guarded_patch_contract_tool output. |
| Documentation Clarity | 4 | Same params as guarded_patch_contract_tool. |
| Error-Message Quality | 4 | Same as guarded_patch_contract_tool. |
| Validation Friction | 4 | Same as guarded_patch_contract_tool. |
| Recovery Simplicity | 5 | Same as guarded_patch_contract_tool. |

**Average: 4.25 / 5**

---
## 27. find_related_tests_tool (alias)

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Find related tests": task-first alias for test lookup. |
| Predictability | 5 | Same as trace_tests_for_contract_tool. |
| Mental-Model Shift | 2 | Same as trace_tests_for_contract_tool. |
| Consistency | 5 | Identical to trace_tests_for_contract_tool output. |
| Documentation Clarity | 4 | Same params as trace_tests_for_contract_tool. |
| Error-Message Quality | 3 | Same as trace_tests_for_contract_tool. |
| Validation Friction | 1 | Same as trace_tests_for_contract_tool. |
| Recovery Simplicity | 5 | Pure read. |

**Average: 3.75 / 5**

---
## 28. analyze_impact_tool (alias)

| Metric | Score | Notes |
|--------|-------|-------|
| Understandability | 5 | "Analyze impact": task-first alias for dependency analysis. |
| Predictability | 5 | Same as impact_analysis_tool. |
| Mental-Model Shift | 2 | Same as impact_analysis_tool. |
| Consistency | 5 | Identical to impact_analysis_tool output. |
| Documentation Clarity | 4 | Same params as impact_analysis_tool. |
| Error-Message Quality | 3 | Same as impact_analysis_tool. |
| Validation Friction | 1 | Same as impact_analysis_tool. |
| Recovery Simplicity | 5 | Pure read. |

**Average: 3.75 / 5**

---
## Aggregate Summary

### Per-Metric Averages (All 28 Tools)

| Metric | Average Score | Assessment |
|--------|--------------|------------|
| **Understandability** | 4.54 | Excellent: tool names are descriptive and intent is clear. |
| **Predictability** | 5.00 | Perfect: all tools behave as expected based on their names and docs. |
| **Mental-Model Shift** | 2.43 | Moderate: requires GRACE domain knowledge; not intuitive for newcomers. |
| **Consistency** | 4.96 | Near-perfect: output shapes and patterns are uniform across the suite; only `patch_contract_tool` scored below 5. |
| **Documentation Clarity** | 4.07 | Good: parameters are well-defined; could benefit from more examples. |
| **Error-Message Quality** | 3.36 | Acceptable: some tools have excellent errors (simulate_patch, rename_semantic_tag), others are silent. |
| **Validation Friction** | 1.79 | Low friction: most tools are lenient; mutation tools have appropriate strictness. |
| **Recovery Simplicity** | 4.50 | Excellent: read-only tools are stateless; mutation tools have clear recovery paths. |

### Overall Suite Average: **3.83 / 5**

---
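The per-tool, per-metric, and overall figures appear to be plain arithmetic means of the 1-5 scores. The sketch below recomputes them from the per-tool tables, rounding half up to two decimals. The `avg` helper and the `SCORES` layout are illustrative choices and are not part of axiom-core.

```python
from decimal import Decimal, ROUND_HALF_UP

METRICS = [
    "Understandability", "Predictability", "Mental-Model Shift", "Consistency",
    "Documentation Clarity", "Error-Message Quality", "Validation Friction",
    "Recovery Simplicity",
]

# One row per tool (sections 1-28), columns in METRICS order, copied from the tables.
SCORES = [
    [5, 5, 2, 5, 4, 3, 1, 5],  # 1 reindex_workspace_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 2 search_contracts_tool
    [4, 5, 3, 5, 4, 3, 1, 5],  # 3 read_grace_outline_tool
    [4, 5, 3, 5, 4, 3, 2, 5],  # 4 ast_search_tool
    [4, 5, 3, 5, 4, 3, 1, 5],  # 5 get_semantic_context_tool
    [4, 5, 3, 5, 4, 3, 1, 5],  # 6 build_task_context_tool
    [5, 5, 2, 5, 4, 4, 1, 5],  # 7 workspace_semantic_health_tool
    [5, 5, 2, 5, 4, 4, 1, 5],  # 8 audit_contracts_tool
    [4, 5, 3, 5, 4, 3, 1, 5],  # 9 diff_contract_semantics_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 10 impact_analysis_tool
    [4, 5, 3, 5, 4, 5, 4, 5],  # 11 simulate_patch_tool
    [5, 5, 2, 5, 5, 4, 4, 5],  # 12 guarded_patch_contract_tool
    [4, 5, 3, 4, 4, 3, 2, 3],  # 13 patch_contract_tool
    [5, 5, 2, 5, 4, 3, 2, 3],  # 14 rename_contract_id_tool
    [5, 5, 2, 5, 4, 3, 2, 3],  # 15 move_contract_tool
    [4, 5, 3, 5, 4, 3, 2, 3],  # 16 extract_contract_tool
    [4, 5, 3, 5, 4, 3, 2, 3],  # 17 wrap_node_in_contract_tool
    [5, 5, 2, 5, 5, 4, 3, 4],  # 18 update_contract_metadata_tool
    [4, 5, 2, 5, 4, 5, 3, 4],  # 19 rename_semantic_tag_tool
    [4, 5, 3, 5, 4, 4, 3, 4],  # 20 prune_contract_metadata_tool
    [4, 5, 3, 5, 4, 3, 2, 4],  # 21 infer_missing_relations_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 22 trace_tests_for_contract_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 23 scaffold_contract_tests_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 24 find_contract_tool
    [4, 5, 3, 5, 4, 3, 1, 5],  # 25 read_outline_tool
    [5, 5, 2, 5, 4, 4, 4, 5],  # 26 safe_patch_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 27 find_related_tests_tool
    [5, 5, 2, 5, 4, 3, 1, 5],  # 28 analyze_impact_tool
]


def avg(values) -> Decimal:
    """Arithmetic mean, rounded half up to two decimals."""
    values = list(values)
    mean = Decimal(sum(values)) / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


per_tool = [avg(row) for row in SCORES]
per_metric = {name: avg(col) for name, col in zip(METRICS, zip(*SCORES))}
overall = avg(score for row in SCORES for score in row)
```

`Decimal` with `ROUND_HALF_UP` is used instead of the built-in `round`, which rounds exact halves such as 3.625 to the nearest even digit.

---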
## Key Findings

### Strengths
1. **Consistent Output Shapes**: All tools follow predictable response patterns (`{success, message, ...}`).
2. **Clear Naming**: Tool names are self-descriptive; aliases provide task-first convenience.
3. **Safe Defaults**: Tools that expose an apply flag default to dry-run (`apply_patch=false`, `apply_changes=false`).
4. **Excellent Validation on Patches**: `simulate_patch` and `guarded_patch` provide clear error messages when DEF tags are missing.
5. **Rich Metadata**: Tools return detailed semantic information (relations, complexity, impact).
### Areas for Improvement
1. **Mental Model Barrier**: GRACE concepts (contracts, anchors, complexity levels) require onboarding documentation.
2. **Silent Failures**: Some tools return empty results without hints (e.g., no tests found, no relations inferred).
3. **Mutation Safety**: `patch_contract_tool`, `rename_contract_id_tool`, `move_contract_tool`, `extract_contract_tool`, and `wrap_node_in_contract_tool` apply directly without preview; consider adding a `dry_run` flag.
4. **Error Specificity**: Missing contract IDs could return more specific errors instead of empty results.
5. **Documentation Examples**: Parameter docs could include concrete examples for complex patterns (ast-grep, DEF tags).
### Recommendations
1. Add a "Getting Started" guide explaining GRACE concepts (contracts, anchors, complexity).
2. Add a `dry_run` parameter to direct mutation tools (`patch_contract`, `rename_contract_id`, `move_contract`, `extract_contract`, `wrap_node_in_contract`).
3. Improve empty-result responses with actionable hints (e.g., "No tests found; consider adding @TEST metadata").
4. Add example payloads to tool documentation for complex parameters.
5. Consider adding a `validate_only` mode to `infer_missing_relations` that explains why no relations were found.

---
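Recommendation 3 can be pictured as a thin wrapper around existing responses. The sketch below is one possible shape, not a proposal for the actual server code: the `add_hint` function, the `hint` field, the `HINTS` table, and the list key names in the usage are all hypothetical.

```python
# Hypothetical hint texts keyed by tool name; the wording is illustrative only.
HINTS = {
    "trace_tests_for_contract_tool": "No tests found; consider adding @TEST metadata.",
    "search_contracts_tool": "No matches; try a broader query or run reindex_workspace_tool.",
}


def add_hint(tool_name: str, response: dict, list_key: str) -> dict:
    """Return a copy of response with a 'hint' field when response[list_key] is empty."""
    if response.get("success") and not response.get(list_key):
        hint = HINTS.get(tool_name)
        if hint:
            return {**response, "hint": hint}
    return response
```

Returning a copy keeps the original response untouched, so callers that ignore the extra field see no change in behavior.

---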
# [/DEF:Axiom_Tools_Evaluation:Report]