AI Verification
If there is one thing that makes AI agents dramatically more effective, it is giving them a verification loop.
When an agent can check the results of its own work against its expectations, it can catch mistakes early, adjust course, and keep improving the output instead of blindly moving forward.
That idea is especially important in app development, where code can be technically correct and still be wrong in practice.
Why Flutter is a great fit
One reason Flutter is such a good fit for AI-assisted development is that strong verification can happen very quickly.
If your codebase has good unit test and widget test coverage, agents can validate a large share of their work in headless mode:
```sh
flutter test
```

This has two major advantages:
- No emulator or simulator is required
- Feedback is fast enough to support tight AI development loops
That speed matters. Fast tests make it practical for an agent to try something, verify it, fix issues, and repeat many times in a single session.
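As a minimal sketch of what that looks like, here is the kind of pure-logic unit test an agent can run headlessly in milliseconds (the `portfolioTotal` function is a hypothetical example, not part of ACT):

```dart
import 'package:flutter_test/flutter_test.dart';

// Hypothetical business logic, used only for illustration.
double portfolioTotal(List<double> holdings) =>
    holdings.fold(0.0, (sum, value) => sum + value);

void main() {
  test('portfolio total sums all holdings', () {
    expect(portfolioTotal([1.5, 2.5]), 4.0);
  });
}
```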
How ACT uses verification loops
ACT is designed to push agents toward workflows where verification happens continuously, not at the very end.
Static analysis
Section titled “Static analysis”Every phase of /act-workflow-work runs:
```sh
flutter analyze
```

This catches issues such as:
- Type errors and null safety violations
- Unused imports and variables
- Lint rule violations
- Breaking API changes from dependency upgrades
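As an illustration, here is a hypothetical snippet showing the kind of null safety violation the analyzer reports, along with a version that passes:

```dart
String greet(String? name) {
  // flutter analyze rejects this line: 'name' might be null here.
  // return 'Hello, ${name.toUpperCase()}';

  // Null-aware access with a fallback passes analysis.
  return 'Hello, ${name?.toUpperCase() ?? 'GUEST'}';
}
```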
Test-driven implementation
ACT workflows encourage agents to take a TDD approach during implementation, and the act-flutter-tdd skill is specifically designed to apply that during the work phase.
In practice, that means writing code in a way that is covered by tests as it is being built, which helps prevent regressions later.
For tasks marked with `TDD:` in the plan, the intended loop is:
- RED - Write one failing test for the next behavior
- GREEN - Write the minimum code needed to pass
- REFACTOR - Clean up while keeping tests green
- Repeat for the next behavior
The intended behavior is one test at a time, followed by the minimum implementation needed to make that test pass. This creates a real feedback loop instead of generating a large batch of tests and code all at once.
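As a sketch of a single iteration, assuming a hypothetical Counter class: the test is written first and fails, and the minimal implementation then makes it pass:

```dart
import 'package:flutter_test/flutter_test.dart';

// GREEN: the minimum implementation that makes the test below pass.
// (In the actual loop, this class does not exist yet when the test is written.)
class Counter {
  int value = 0;
  void increment() => value++;
}

void main() {
  // RED: one failing test for the next behavior.
  test('increment raises the value by one', () {
    final counter = Counter()..increment();
    expect(counter.value, 1);
  });
}
```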
Unit and widget tests
ACT supports unit and widget tests as the foundation of fast verification.
- Unit tests validate business logic, services, and state management in isolation
- Widget tests validate rendering, interaction, and UI behavior
```sh
flutter test
```

Because both run headlessly, they are ideal for AI workflows.
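For example, a widget test can drive real interactions without any device; the counter widget here is a hypothetical stand-in:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('tapping the button increments the label', (tester) async {
    var count = 0;
    await tester.pumpWidget(MaterialApp(
      home: StatefulBuilder(
        builder: (context, setState) => Scaffold(
          body: Center(child: Text('Count: $count')),
          floatingActionButton: FloatingActionButton(
            onPressed: () => setState(() => count++),
            child: const Icon(Icons.add),
          ),
        ),
      ),
    ));

    // Simulate the user tapping the button, then rebuild the widget tree.
    await tester.tap(find.byIcon(Icons.add));
    await tester.pump();

    expect(find.text('Count: 1'), findsOneWidget);
  });
}
```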
Robot journey tests
Unit and widget tests are powerful, but they do not cover the entire testing spectrum. You may also want to verify complete user journeys across multiple screens.
For that, ACT includes a dedicated act-flutter-robot-testing skill. Robot tests are a higher-level way to structure widget tests so that complete journeys can be tested with stable selectors and deterministic test seams.
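To make that concrete, here is a minimal sketch of what one robot might look like; the class, keys, and method names are hypothetical, not ACT's actual API:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// A hypothetical robot: stable key-based selectors behind intent-named methods,
// so journey tests read as user actions rather than widget lookups.
class OnboardingRobot {
  OnboardingRobot(this.tester);
  final WidgetTester tester;

  void expectVisible() {
    expect(find.byKey(const Key('onboarding_screen')), findsOneWidget);
  }

  Future<void> tapContinue() async {
    await tester.tap(find.byKey(const Key('onboarding_continue_button')));
    await tester.pumpAndSettle();
  }
}
```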
The key benefit is that they still run as widget tests, so they remain fast and headless. In the example below, notice how the test reads like a user journey.
Here’s an example of a robot test that was generated by ACT for one of my apps:
```dart
import 'package:flutter_test/flutter_test.dart';

import '../harness/app_harness.dart';
import '../robots/app_robot.dart';

void main() {
  testWidgets('onboarding first launch flow', (tester) async {
    final harness = await createAppHarness();
    addTearDown(() async => harness.dispose());

    final robot = AppRobot(tester);
    await robot.pumpApp(harness.container);

    robot.onboarding.expectVisible();
    robot.onboarding.expectContinueDisabled();

    await robot.onboarding.tapAddInvestment();
    await robot.chooseInvestment.selectXau();
    await robot.investmentForm.addApiBackedInvestment(name: 'Gold');

    robot.onboarding.expectInvestmentVisible('Gold');
    await robot.onboarding.tapInvestment('Gold');
    robot.onboarding.expectEditInvestmentVisible();
    await robot.onboarding.closeEditInvestment();

    await robot.onboarding.tapContinue();

    robot.portfolio.expectNoSnapshotsState();
  });
}
```

What ACT does not fully solve yet
Robot tests are useful, but the long-term goal is true end-to-end verification.
That is the holy grail for AI automation: an agent making changes, exercising the real app end to end, and confirming that the result matches the intended behavior in production-like conditions.
ACT does not fully provide that yet, and that limitation is worth stating clearly. It is an important area for future work.
ACT also does not currently support a true UI testing workflow. As a fallback, you can take a screenshot of the running app, drop it into Claude or OpenCode, and provide guidance so the model can tweak the UI. In practice, this still requires a human to guide the agent one step at a time, so it is not yet a fully automated verification loop.
Verification across the workflow
Verification shows up in different forms throughout the ACT workflow:
- Spec - clarifying questions catch ambiguity early
- Refine Spec - adversarial review catches gaps and weak assumptions
- Plan - codebase research helps the plan match existing patterns
- Work - `flutter analyze` and `flutter test` create fast feedback loops
- Ship - full analysis and tests run before creating a PR
The exact tools vary by task, but the principle stays the same: the more an agent can verify its own work, the more reliable the final result becomes.
The core idea
If AI can verify its own work with a feedback loop, it can 2-3x the quality of the final result.
The cost of running fast checks repeatedly is small. The cost of shipping broken behavior, regressions, or bad UI is not. ACT is designed to err on the side of verification.
Next steps
- Learn about Context Management: keeping AI focused