AI Verification
If there is one thing that makes AI agents dramatically more effective, it is giving them a verification loop.
When an agent can check the results of its own work against its expectations, it can catch mistakes early, adjust course, and keep improving the output instead of blindly moving forward.
That idea is especially important in app development, where code can be technically correct and still be wrong in practice.
Why Flutter is a great fit
One reason Flutter is such a good fit for AI-assisted development is that strong verification can happen very quickly.
If your codebase has good unit test and widget test coverage, agents can validate a large share of their work in headless mode:
```sh
flutter test
```

This has two major advantages:
- No emulator or simulator is required
- Feedback is fast enough to support tight AI development loops
That speed matters. Fast tests make it practical for an agent to try something, verify it, fix issues, and repeat many times in a single session.
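As a minimal sketch of what that looks like, here is the kind of pure-logic unit test an agent can run headlessly in milliseconds (the `portfolioTotal` function is a hypothetical example, not part of ACT):

```dart
import 'package:flutter_test/flutter_test.dart';

// Hypothetical business logic, used only for illustration.
double portfolioTotal(List<double> holdings) =>
    holdings.fold(0.0, (sum, value) => sum + value);

void main() {
  test('portfolio total sums all holdings', () {
    expect(portfolioTotal([1.5, 2.5]), 4.0);
  });
}
```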
How ACT uses verification loops
ACT is designed to push agents toward workflows where verification happens continuously, not at the very end.
Static analysis
Section titled “Static analysis”Every phase of /act-workflow-work runs:
```sh
flutter analyze
```

This catches issues such as:
- Type errors and null safety violations
- Unused imports and variables
- Lint rule violations
- Breaking API changes from dependency upgrades
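As an illustration, here is a hypothetical snippet showing the kind of null safety violation the analyzer reports, along with a version that passes:

```dart
String greet(String? name) {
  // flutter analyze rejects this line: 'name' might be null here.
  // return 'Hello, ${name.toUpperCase()}';

  // Null-aware access with a fallback passes analysis.
  return 'Hello, ${name?.toUpperCase() ?? 'GUEST'}';
}
```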
Test-driven implementation
ACT workflows encourage agents to take a TDD approach during implementation, and the act-flutter-tdd skill is specifically designed to apply that during the work phase.
In practice, that means writing code in a way that is covered by tests as it is being built, which helps prevent regressions later.
For tasks marked with `TDD:` in the plan, the intended loop is:
- RED - Write one failing test for the next behavior
- GREEN - Write the minimum code needed to pass
- REFACTOR - Clean up while keeping tests green
- Repeat for the next behavior
The intended behavior is one test at a time, followed by the minimum implementation needed to make that test pass. This creates a real feedback loop instead of generating a large batch of tests and code all at once.
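As a sketch of a single iteration, assuming a hypothetical Counter class: the test is written first and fails, and the minimal implementation then makes it pass:

```dart
import 'package:flutter_test/flutter_test.dart';

// GREEN: the minimum implementation that makes the test below pass.
// (In the actual loop, this class does not exist yet when the test is written.)
class Counter {
  int value = 0;
  void increment() => value++;
}

void main() {
  // RED: one failing test for the next behavior.
  test('increment raises the value by one', () {
    final counter = Counter()..increment();
    expect(counter.value, 1);
  });
}
```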
Unit and widget tests
ACT supports unit and widget tests as the foundation of fast verification.
- Unit tests validate business logic, services, and state management in isolation
- Widget tests validate rendering, interaction, and UI behavior
```sh
flutter test
```

Because both run headlessly, they are ideal for AI workflows.
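For example, a widget test can drive real interactions without any device; the counter widget here is a hypothetical stand-in:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('tapping the button increments the label', (tester) async {
    var count = 0;
    await tester.pumpWidget(MaterialApp(
      home: StatefulBuilder(
        builder: (context, setState) => Scaffold(
          body: Center(child: Text('Count: $count')),
          floatingActionButton: FloatingActionButton(
            onPressed: () => setState(() => count++),
            child: const Icon(Icons.add),
          ),
        ),
      ),
    ));

    // Simulate the user tapping the button, then rebuild the widget tree.
    await tester.tap(find.byIcon(Icons.add));
    await tester.pump();

    expect(find.text('Count: 1'), findsOneWidget);
  });
}
```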
Robot journey tests
Unit and widget tests are powerful, but they do not cover the entire testing spectrum. You may also want to verify complete user journeys across multiple screens.
For that, ACT includes a dedicated act-flutter-robot-testing skill. Robot tests are a higher-level way to structure widget tests so that complete journeys can be tested with stable selectors and deterministic test seams.
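To make that concrete, here is a minimal sketch of what one robot might look like; the class, keys, and method names are hypothetical, not ACT's actual API:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// A hypothetical robot: stable key-based selectors behind intent-named methods,
// so journey tests read as user actions rather than widget lookups.
class OnboardingRobot {
  OnboardingRobot(this.tester);
  final WidgetTester tester;

  void expectVisible() {
    expect(find.byKey(const Key('onboarding_screen')), findsOneWidget);
  }

  Future<void> tapContinue() async {
    await tester.tap(find.byKey(const Key('onboarding_continue_button')));
    await tester.pumpAndSettle();
  }
}
```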
The key benefit is that they still run as widget tests, so they remain fast and headless. In the example below, notice how the test reads like a user journey.
Here’s an example of a robot test that was generated by ACT for one of my apps:
```dart
import 'package:flutter_test/flutter_test.dart';

import '../harness/app_harness.dart';
import '../robots/app_robot.dart';

void main() {
  testWidgets('onboarding first launch flow', (tester) async {
    final harness = await createAppHarness();
    addTearDown(() async => harness.dispose());

    final robot = AppRobot(tester);
    await robot.pumpApp(harness.container);

    robot.onboarding.expectVisible();
    robot.onboarding.expectContinueDisabled();

    await robot.onboarding.tapAddInvestment();
    await robot.chooseInvestment.selectXau();
    await robot.investmentForm.addApiBackedInvestment(name: 'Gold');

    robot.onboarding.expectInvestmentVisible('Gold');
    await robot.onboarding.tapInvestment('Gold');
    robot.onboarding.expectEditInvestmentVisible();
    await robot.onboarding.closeEditInvestment();

    await robot.onboarding.tapContinue();

    robot.portfolio.expectNoSnapshotsState();
  });
}
```

What ACT does not fully solve yet
Robot tests are useful, but the long-term goal is true end-to-end verification.
That is the holy grail for AI automation: an agent making changes, exercising the real app end to end, and confirming that the result matches the intended behavior in production-like conditions.
ACT does not fully provide that yet, and that limitation is worth stating clearly. It is an important area for future work.
ACT also does not currently support a true UI testing workflow. As a fallback, you can take a screenshot of the running app, drop it into Claude or OpenCode, and provide guidance so the model can tweak the UI. In practice, this still requires a human to guide the agent one step at a time, so it is not yet a fully automated verification loop.
Verification across the workflow
Verification shows up in different forms throughout the ACT workflow:
- Spec - clarifying questions catch ambiguity early
- Refine Spec - adversarial review catches gaps and weak assumptions
- Plan - codebase research helps the plan match existing patterns
- Work - `flutter analyze` and `flutter test` create fast feedback loops
- Ship - full analysis and tests run before creating a PR
The exact tools vary by task, but the principle stays the same: the more an agent can verify its own work, the more reliable the final result becomes.
The core idea
If AI can verify its own work with a feedback loop, it can 2-3x the quality of the final result.
The cost of running fast checks repeatedly is small. The cost of shipping broken behavior, regressions, or bad UI is not. ACT is designed to err on the side of verification.
Next steps
- Learn about Context Management: keeping AI focused