
AI Verification

AI-generated code needs verification. ACT builds verification into every stage of the workflow so problems are caught early, not after deployment.

ACT uses three layers of automated verification: static analysis, automated tests, and visual checks.

Static Analysis

Every phase of /act:workflow:work runs:

```sh
flutter analyze
```

This catches:

  • Type errors and null safety violations
  • Unused imports and variables
  • Lint rule violations
  • Breaking API changes from dependency upgrades

When it runs: After every phase completion, before committing.
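The rules that flutter analyze enforces live in the project's analysis_options.yaml. A minimal sketch of such a file follows; the specific lints shown are illustrative choices, not ACT requirements:

```yaml
# analysis_options.yaml — example configuration, not an ACT requirement
include: package:flutter_lints/flutter.yaml

analyzer:
  language:
    strict-casts: true      # surface implicit downcasts as errors

linter:
  rules:
    - unused_import         # flag imports that are never used
    - avoid_print           # keep debug prints out of committed code
```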

Automated Testing

ACT supports three levels of testing:

Unit tests: Test business logic, state management, and services in isolation.

```sh
flutter test
```
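A unit test at this level might look like the following sketch. The CartTotal service and its behavior are hypothetical, invented for illustration:

```dart
import 'package:flutter_test/flutter_test.dart';

// Hypothetical service under test — not part of ACT itself.
class CartTotal {
  double total(List<double> prices) =>
      prices.fold(0.0, (sum, p) => sum + p);
}

void main() {
  test('sums item prices', () {
    final cart = CartTotal();
    expect(cart.total([1.5, 2.5]), 4.0);
  });

  test('empty cart totals zero', () {
    expect(CartTotal().total([]), 0.0);
  });
}
```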

Widget tests: Test UI behavior, rendering, and interaction at the widget level.
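A widget-level test can pump a single widget and assert on what renders. In this sketch, the Greeting widget and its key are hypothetical:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget under test.
class Greeting extends StatelessWidget {
  const Greeting({super.key, required this.name});
  final String name;

  @override
  Widget build(BuildContext context) =>
      Text('Hello, $name', key: const Key('greeting-text'));
}

void main() {
  testWidgets('renders the greeting', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(home: Greeting(name: 'Ada')),
    );

    // A stable Key-based selector is resilient to copy changes.
    expect(find.byKey(const Key('greeting-text')), findsOneWidget);
    expect(find.text('Hello, Ada'), findsOneWidget);
  });
}
```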

End-to-end tests: Test complete user journeys across screens with stable selectors and deterministic test seams. See the Robot Testing playbook.
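An end-to-end journey can be sketched with Flutter's integration_test package. The app entry point, keys, and login flow below are all hypothetical:

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

import 'package:my_app/main.dart' as app; // hypothetical app entry point

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('user can log in and reach the home screen', (tester) async {
    app.main();
    await tester.pumpAndSettle();

    // Stable Key selectors keep the journey resilient to copy changes.
    await tester.enterText(find.byKey(const Key('email-field')), 'a@b.c');
    await tester.tap(find.byKey(const Key('login-button')));
    await tester.pumpAndSettle();

    expect(find.byKey(const Key('home-screen')), findsOneWidget);
  });
}
```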

When they run: Continuously during implementation. ACT’s TDD discipline encourages tests to be written before implementation, so they run at every RED → GREEN cycle.

Visual Verification

The /flutter-screenshot skill captures screenshots from running apps:

```sh
/flutter-screenshot ./screenshots/home-screen.png
```

Claude reads the screenshot and verifies the UI matches expectations. This catches:

  • Layout issues that pass analysis but look wrong
  • Color and styling mismatches
  • Missing or misplaced UI elements

When it runs: On demand during implementation, typically after UI changes.

Test-Driven Development

ACT encourages vertical-slice TDD for tasks marked with TDD: in the plan:

  1. RED — Write one failing test for the next behavior
  2. GREEN — Write the minimum code to pass that test
  3. REFACTOR — Clean up while all tests remain green
  4. Repeat for the next behavior
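One RED → GREEN iteration might look like this sketch for a hypothetical formatPrice function; the name and behavior are invented for illustration:

```dart
import 'package:flutter_test/flutter_test.dart';

// GREEN: the minimum implementation that passes the test below.
String formatPrice(int cents) =>
    '\$${(cents / 100).toStringAsFixed(2)}';

void main() {
  // RED: this test is written first and fails until
  // formatPrice exists and behaves as expected.
  test('formats cents as dollars', () {
    expect(formatPrice(1999), '\$19.99');
  });
  // Next cycle: add one more failing test (e.g. negative
  // amounts), then the minimum code to make it pass.
}
```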

This discipline ensures:

  • Tests verify real behavior, not imagined behavior
  • Implementation is minimal — no over-engineering
  • Every change is backed by a test

The key difference from typical AI testing: ACT writes one test at a time, not a batch of tests followed by a batch of implementation. This produces honest tests that actually catch regressions.

| Stage | Verification |
| --- | --- |
| Spec | Clarifying questions catch ambiguity early |
| Refine Spec | Adversarial review catches gaps and wrong assumptions |
| Plan | Codebase research ensures plan follows existing patterns |
| Work (each phase) | flutter analyze + flutter test |
| Work (TDD tasks) | RED → GREEN → REFACTOR cycles |
| Work (UI tasks) | Optional screenshot verification |
| Ship | Full test suite + analysis before PR |

When AI can verify its own work through a feedback loop, the quality of the final result improves by 2-3x.

The cost of running flutter analyze and flutter test after each phase is trivial. The cost of shipping broken code is not. ACT errs on the side of verification.