---
description: Generate tests, manage test documentation, and ensure maximum code coverage
---
## User Input
$ARGUMENTS
You MUST consider the user input before proceeding (if not empty).
## Goal
Execute semantic audit and full testing cycle: verify contract compliance, verify decision-memory continuity, emulate logic, ensure maximum coverage, and maintain test quality.
## Operating Constraints

- NEVER delete existing tests - Only update if they fail due to bugs in the test or implementation
- NEVER duplicate tests - Check existing tests first before creating new ones
- Use TEST_FIXTURE fixtures - For CRITICAL tier modules, read `@TEST_FIXTURE` from `.ai/standards/semantics.md`
- Co-location required - Write tests in `__tests__` directories relative to the code being tested
- Decision-memory regression guard - Tests and audits must not normalize silent reintroduction of any path documented in upstream `@REJECTED`
## Execution Steps

### 1. Analyze Context

Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` from repo root and parse `FEATURE_DIR` and `AVAILABLE_DOCS`.

Determine:

- FEATURE_DIR - where the feature is located
- TASKS_FILE - path to `tasks.md`
- Which modules need testing based on task status
- Which ADRs or task guardrails define rejected paths for the touched scope
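The JSON-parsing half of this step can be sketched as below. The script path and the `FEATURE_DIR` / `AVAILABLE_DOCS` keys come from this document; the function name and sample payload are illustrative assumptions.

```python
import json

def load_prerequisites(raw: str) -> tuple[str, list[str]]:
    """Parse the JSON emitted by check-prerequisites.sh."""
    data = json.loads(raw)
    return data["FEATURE_DIR"], data.get("AVAILABLE_DOCS", [])

# Real flow (run from repo root), sketched as a comment:
#   import subprocess
#   raw = subprocess.run(
#       [".specify/scripts/bash/check-prerequisites.sh",
#        "--json", "--require-tasks", "--include-tasks"],
#       capture_output=True, text=True, check=True,
#   ).stdout
sample = '{"FEATURE_DIR": "specs/001-example", "AVAILABLE_DOCS": ["tasks.md"]}'
feature_dir, docs = load_prerequisites(sample)
```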
### 2. Load Relevant Artifacts

From `tasks.md`:

- Identify completed implementation tasks (not test tasks)
- Extract file paths that need tests
- Extract guardrail summaries and blocked paths

From `.ai/standards/semantics.md`:

- Read effective complexity expectations
- Read decision-memory rules for ADR, preventive guardrails, and reactive Micro-ADR
- For CRITICAL modules: read `@TEST_FIXTURE` fixtures

From ADR sources and touched code:

- Read `[DEF:id:ADR]` nodes when present
- Read local `@RATIONALE` and `@REJECTED` in touched contracts

From existing tests:

- Scan `__tests__` directories for existing tests
- Identify test patterns and coverage gaps
### 3. Test Coverage Analysis

Create a coverage matrix:
| Module | File | Has Tests | Complexity / Tier | TEST_FIXTURE Available | Rejected Path Guarded |
|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... |
### 4. Semantic Audit & Logic Emulation (CRITICAL)

Before writing tests, the Tester MUST:

- Run `axiom-core.audit_contracts_tool`: identify semantic violations.
- Run a protocol-shape review on touched files:
  - Reject non-canonical semantic markup, including docstring-only annotations such as `@PURPOSE`, `@PRE`, or `@INVARIANT` written inside class/function docstrings without canonical `[DEF]...[/DEF]` anchors and header metadata.
  - Reject files whose effective complexity contract is under-specified relative to `.ai/standards/semantics.md`.
  - Reject Python Complexity 4+ modules that omit meaningful `logger.reason()` / `logger.reflect()` checkpoints.
  - Reject Python Complexity 5 modules that omit `belief_scope(...)`, `@DATA_CONTRACT`, or `@INVARIANT`.
  - Treat broken or missing closing anchors as blocking violations.
  - Reject retained workaround code if the local contract lacks `@RATIONALE` / `@REJECTED`.
  - Reject code that silently re-enables a path declared in upstream ADR or local guardrails as rejected.
- Emulate Algorithm: step through the code implementation mentally.
  - Verify it adheres to the `@PURPOSE` and `@INVARIANT`.
  - Verify `@PRE` and `@POST` conditions are correctly handled.
  - Verify the implementation follows accepted-path rationale rather than drifting into a blocked path.
- Validation Verdict:
  - If audit fails: emit `[AUDIT_FAIL: semantic_noncompliance]` with concrete file-path reasons and notify the Orchestrator.
    - Example blocking case: `backend/src/services/dataset_review/repositories/session_repository.py` contains a module anchor, but its nested repository class/method semantics are expressed as loose docstrings instead of canonical anchored contracts; this MUST be rejected until remediated or explicitly waived.
  - If audit passes: proceed to writing/verifying tests.
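The closing-anchor check in the protocol-shape review can be mechanized with a simple scan. This is an illustrative sketch, not part of `axiom-core`; the regex assumes the `[DEF:...]` / `[/DEF:...]` anchor syntax shown elsewhere in this document.

```python
import re

def audit_anchors(source_text: str) -> list[str]:
    """Flag [DEF:...] anchors that lack a matching [/DEF:...] close."""
    opens = re.findall(r"\[DEF:([^\]]+)\]", source_text)
    closes = set(re.findall(r"\[/DEF:([^\]]+)\]", source_text))
    return [
        f"[AUDIT_FAIL: semantic_noncompliance] unclosed anchor [DEF:{anchor}]"
        for anchor in opens
        if anchor not in closes
    ]
```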
### 5. Write Tests (TDD Approach)

For each module requiring tests:

- Check existing tests: scan `__tests__/` for duplicates.
- Read TEST_FIXTURE: if CRITICAL tier, read `@TEST_FIXTURE` from the semantics header.
- Do not normalize broken semantics through tests:
  - The Tester must not write tests that silently accept malformed semantic protocol usage.
  - If the implementation is semantically invalid, stop and reject instead of adapting tests around the invalid structure.
- Write test: follow the co-location strategy.
  - Python: `src/module/__tests__/test_module.py`
  - Svelte: `src/lib/components/__tests__/test_component.test.js`
- Use mocks: use `unittest.mock.MagicMock` for external dependencies.
- Add rejected-path regression coverage when relevant:
  - If an ADR or local contract names a blocked path in `@REJECTED`, add or verify at least one test or explicit audit check that would fail if that forbidden path were silently restored.
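A co-located test using mocks might look like the following; `SessionService` and its repository interface are hypothetical names used only to illustrate the `MagicMock` pattern.

```python
# src/module/__tests__/test_session_service.py (illustrative path)
from unittest.mock import MagicMock

class SessionService:
    """Hypothetical service under test; normally imported from the module."""
    def __init__(self, repo):
        self.repo = repo

    def close_session(self, session_id):
        session = self.repo.get(session_id)
        session.closed = True
        self.repo.save(session)
        return session

def test_close_session_marks_closed_and_persists():
    repo = MagicMock()  # stands in for the external dependency
    service = SessionService(repo)
    result = service.close_session("s-1")
    assert result.closed is True
    repo.get.assert_called_once_with("s-1")
    repo.save.assert_called_once_with(result)
```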
### 5a. UX Contract Testing (Frontend Components)

For Svelte components with `@UX_STATE`, `@UX_FEEDBACK`, `@UX_RECOVERY` tags:

- Parse UX tags: read the component file and extract all `@UX_*` annotations.
- Generate UX tests: create tests for each UX state transition.

  ```js
  // Example: Testing @UX_STATE: Idle -> Expanded
  it('should transition from Idle to Expanded on toggle click', async () => {
    render(Sidebar);
    const toggleBtn = screen.getByRole('button', { name: /toggle/i });
    await fireEvent.click(toggleBtn);
    expect(screen.getByTestId('sidebar')).toHaveClass('expanded');
  });
  ```

- Test `@UX_FEEDBACK`: verify visual feedback (toast, shake, color changes).
- Test `@UX_RECOVERY`: verify error recovery mechanisms (retry, clear input).
- Use `@UX_TEST` fixtures: if the component has `@UX_TEST` tags, use them as test specifications.
- Verify decision memory: if the UI contract declares `@REJECTED`, ensure browser-visible behavior does not regress into the rejected path.
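Extracting the `@UX_*` annotations can be sketched with a simple regex pass; the function name and the exact annotation grammar assumed here (one `@UX_*: value` pair per line) are illustrative.

```python
import re

def extract_ux_tags(component_source: str) -> dict[str, list[str]]:
    """Collect @UX_STATE / @UX_FEEDBACK / @UX_RECOVERY / @UX_TEST annotations."""
    tags: dict[str, list[str]] = {}
    for kind, body in re.findall(r"@(UX_[A-Z]+):\s*(.+)", component_source):
        tags.setdefault(kind, []).append(body.strip())
    return tags
```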
UX Test Template:

```js
// [DEF:ComponentUXTests:Module]
// @C: 3
// @RELATION: VERIFIES -> ../Component.svelte
// @PURPOSE: Test UX states and transitions

describe('Component UX States', () => {
  // @UX_STATE: Idle -> {action: click, expected: Active}
  it('should transition Idle -> Active on click', async () => { ... });

  // @UX_FEEDBACK: Toast on success
  it('should show toast on successful action', async () => { ... });

  // @UX_RECOVERY: Retry on error
  it('should allow retry on error', async () => { ... });
});
// [/DEF:ComponentUXTests:Module]
```
### 6. Test Documentation

Create/update documentation in `specs/<feature>/tests/`:

```text
tests/
├── README.md       # Test strategy and overview
├── coverage.md     # Coverage matrix and reports
└── reports/
    └── YYYY-MM-DD-report.md
```

Include decision-memory coverage notes when ADR or rejected-path regressions were checked.
### 7. Execute Tests

Run tests and report results.

Backend:

```shell
cd backend && .venv/bin/python3 -m pytest -v
```

Frontend:

```shell
cd frontend && npm run test
```
### 8. Update Tasks

Mark test tasks as completed in `tasks.md` with:

- Test file path
- Coverage achieved
- Any issues found
- Whether rejected-path regression checks passed or remain manual audit items
## Output

Generate a test execution report:
```markdown
# Test Report: [FEATURE]

**Date**: [YYYY-MM-DD]
**Executed by**: Tester Agent

## Coverage Summary

| Module | Tests | Coverage % |
|--------|-------|------------|
| ... | ... | ... |

## Test Results

- Total: [X]
- Passed: [X]
- Failed: [X]
- Skipped: [X]

## Semantic Audit Verdict

- Verdict: PASS | FAIL
- Blocking Violations:
  - [file path] -> [reason]
- Decision Memory:
  - ADRs checked: [...]
  - Rejected-path regressions: PASS | FAIL
  - Missing `@RATIONALE` / `@REJECTED`: [...]
- Notes:
  - Reject docstring-only semantic pseudo-markup
  - Reject complexity/contract mismatches
  - Reject missing belief-state instrumentation for Python Complexity 4/5
  - Reject silent resurrection of rejected paths

## Issues Found

| Test | Error | Resolution |
|------|-------|------------|
| ... | ... | ... |

## Next Steps

- [ ] Fix failed tests
- [ ] Fix blocking semantic violations before acceptance
- [ ] Fix decision-memory drift or rejected-path regressions
- [ ] Add more coverage for [module]
- [ ] Review TEST_FIXTURE fixtures
```
## Context for Testing
$ARGUMENTS