mcp tuning

This commit is contained in:
2026-04-01 13:29:41 +03:00
parent 586229a974
commit 1e46073dd6
19 changed files with 1324 additions and 28593 deletions

View File

@@ -4,7 +4,7 @@ description: Generate a custom checklist for the current feature based on user r
## Checklist Purpose: "Unit Tests for English"
**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.
**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, completeness, and decision-memory readiness of requirements in a given domain.
**NOT for verification/testing**:
@@ -20,6 +20,7 @@ description: Generate a custom checklist for the current feature based on user r
- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
- ✅ "Do repo-shaping choices have explicit rationale and rejected alternatives before task decomposition?" (decision memory)
**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.
@@ -47,7 +48,7 @@ You **MUST** consider the user input before proceeding (if not empty).
1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts").
2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria, decision-memory needs.
5. Formulate questions chosen from these archetypes:
- Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
- Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
@@ -55,6 +56,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
- Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
- Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
- Decision-memory gap (e.g., "Do we need explicit ADR and rejected-path checks for this feature?")
Question formatting rules:
- If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
@@ -76,9 +78,10 @@ You **MUST** consider the user input before proceeding (if not empty).
- Infer any missing context from spec/plan/tasks (do NOT hallucinate)
4. **Load feature context**: Read from FEATURE_DIR:
- spec.md: Feature requirements and scope
- plan.md (if exists): Technical details, dependencies
- tasks.md (if exists): Implementation tasks
- `spec.md`: Feature requirements and scope
- `plan.md` (if exists): Technical details, dependencies, ADR references
- `tasks.md` (if exists): Implementation tasks and inherited guardrails
- ADR artifacts (if present): `[DEF:id:ADR]`, `@RATIONALE`, `@REJECTED`
**Context Loading Strategy**:
- Load only necessary portions relevant to active focus areas (avoid full-file dumping)
@@ -102,6 +105,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- **Consistency**: Do requirements align with each other?
- **Measurability**: Can requirements be objectively verified?
- **Coverage**: Are all scenarios/edge cases addressed?
- **Decision Memory**: Are durable choices and rejected alternatives explicit before implementation starts?
**Category Structure** - Group items by requirement quality dimensions:
- **Requirement Completeness** (Are all necessary requirements documented?)
@@ -112,6 +116,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- **Edge Case Coverage** (Are boundary conditions defined?)
- **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
- **Dependencies & Assumptions** (Are they documented and validated?)
- **Decision Memory & ADRs** (Are architectural choices, rationale, and rejected paths explicit?)
- **Ambiguities & Conflicts** (What needs clarification?)
**HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
@@ -127,8 +132,8 @@ You **MUST** consider the user input before proceeding (if not empty).
- "Are hover state requirements consistent across all interactive elements?" [Consistency]
- "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
- "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
- "Are loading states defined for asynchronous episode data?" [Completeness]
- "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
- "Are blocking architecture decisions recorded with explicit rationale and rejected alternatives before task generation?" [Decision Memory]
- "Does the plan make clear which implementation shortcuts are forbidden for this feature?" [Decision Memory, Gap]
**ITEM STRUCTURE**:
Each item should follow this pattern:
@@ -163,6 +168,11 @@ You **MUST** consider the user input before proceeding (if not empty).
- "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
- "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
Decision Memory:
- "Do all repo-shaping technical choices have explicit rationale before tasks are generated? [Decision Memory, Plan]"
- "Are rejected alternatives documented for architectural branches that would materially change implementation scope? [Decision Memory, Gap]"
- "Can a coder determine from the planning artifacts which tempting shortcut is forbidden? [Decision Memory, Clarity]"
**Scenario Classification & Coverage** (Requirements Quality Focus):
- Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
- For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
@@ -171,7 +181,7 @@ You **MUST** consider the user input before proceeding (if not empty).
**Traceability Requirements**:
- MINIMUM: ≥80% of items MUST include at least one traceability reference
- Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`
- Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`, `[ADR]`
- If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
**Surface & Resolve Issues** (Requirements Quality Problems):
@@ -181,6 +191,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
- Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
- Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
- Decision-memory drift: "Do tasks inherit the same rejected-path guardrails defined in planning? [Decision Memory, Conflict]"
**Content Consolidation**:
- Soft cap: If raw candidate items > 40, prioritize by risk/impact
@@ -193,7 +204,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- ❌ "Displays correctly", "works properly", "functions as expected"
- ❌ "Click", "navigate", "render", "load", "execute"
- ❌ Test cases, test plans, QA procedures
- ❌ Implementation details (frameworks, APIs, algorithms)
- ❌ Implementation details (frameworks, APIs, algorithms) unless the checklist is asking whether those decisions were explicitly documented and bounded by rationale/rejected alternatives
**✅ REQUIRED PATTERNS** - These test requirements quality:
- ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
@@ -202,6 +213,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- ✅ "Can [requirement] be objectively measured/verified?"
- ✅ "Are [edge cases/scenarios] addressed in requirements?"
- ✅ "Does the spec define [missing aspect]?"
- ✅ "Does the plan record why [accepted path] was chosen and why [rejected path] is forbidden?"
6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### <requirement item>` lines with globally incrementing IDs starting at CHK001.
@@ -210,6 +222,7 @@ You **MUST** consider the user input before proceeding (if not empty).
- Depth level
- Actor/timing
- Any explicit user-specified must-have items incorporated
- Whether ADR / decision-memory checks were included
**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows:
@@ -262,6 +275,15 @@ Sample items:
- "Are security requirements consistent with compliance obligations? [Consistency]"
- "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
**Architecture Decision Quality:** `architecture.md`
Sample items:
- "Do all repo-shaping architecture choices have explicit rationale before tasks are generated? [Decision Memory]"
- "Are rejected alternatives documented for each blocking technology branch? [Decision Memory, Gap]"
- "Can an implementer tell which shortcuts are forbidden without re-reading research artifacts? [Clarity, ADR]"
- "Are ADR decisions traceable to requirements or constraints in the spec? [Traceability, ADR]"
## Anti-Examples: What NOT To Do
**❌ WRONG - These test implementation, not requirements:**
@@ -282,6 +304,7 @@ Sample items:
- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005]
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
- [ ] CHK007 - Do planning artifacts state why the accepted architecture was chosen and which alternative is rejected? [Decision Memory, ADR]
```
**Key Differences:**