How we use AI-Native and human expertise in 4 stages for unmatched test accuracy

John Gluck
October 8, 2024

We’ve all heard the hype that AI is coming for our jobs, software testers especially. For testers, the fear is that AI could not only run tests but also create, maintain, and self-heal them as systems change. The people pushing this idea cite how fast AI is developing across fields like machine learning, natural language processing, and self-driving cars. The implication is that similar progress could lead to fully autonomous testing systems.

At QA Wolf, we firmly believe in the power of AI automation; we’ve been putting it to work ourselves. But we also know that AI can’t do it all. The unique challenges of software testing, like understanding user intent and interpreting business logic, require a human touch that current AI technologies still lack. And when AI struggles, it might file bugs where there are simply misunderstandings, apply suboptimal fixes that don’t consider long-term stability, or miss coverage by mistakenly conflating separate user journeys that are subtly different.

Don’t get us wrong; AI is great for handling repetitive tasks and streamlining much of the testing process. That said, we fully expect testers to serve as teachers and supervisors while AI does the repetitive grunt work. This collaboration between tester and AI, where each complements the other’s strengths, pays off daily. In the process we use to create and maintain tests, repair failures, and report bugs, humans and AI play a role at each stage.

Stage 1: Use AI to turn video recordings into tests instantly

Here’s a breakdown of this process:

  1. A human tester records a video of a customer workflow, narrating each step while interacting with the application.
  2. At the same time, our tool captures the Document Object Model (DOM) at each critical step.
  3. Once the recording is complete, the AI analyzes the actions and the narration.
  4. It uses this information to generate an outline that follows the Arrange, Act, Assert (AAA) framework.
  5. Finally, it generates executable Playwright code automatically.

This saves us a ton of time. What would take minutes to hours to code manually, AI accomplishes in seconds. The DOM state capture helps AI understand the page structure by identifying elements, their hierarchy, and their relationships. This allows the AI to quickly adjust to changes in the UI, zeroing in on relevant components for testing without needing human intervention for every modification.

At the same time, the human narration component provides context, making sure the AI doesn’t miss the test’s goal or duplicate tests that have already been created. Like our testers, our AI uses the AAA framework to organize the test code efficiently and make the test easy to read for testers and customers.
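
For illustration, here’s a minimal sketch of what a generated test might look like once it’s organized with the AAA structure. The workflow, URLs, labels, and credentials below are invented for this example rather than taken from a real customer application.

```typescript
import { test, expect } from "@playwright/test";

// Hypothetical generated test for an "invite a teammate" workflow.
// Every selector and URL below is illustrative.
test("admin can invite a teammate", async ({ page }) => {
  // Arrange: put the app into a known state before exercising the feature
  await page.goto("https://app.example.com/login");
  await page.getByLabel("Email").fill("admin@example.com");
  await page.getByLabel("Password").fill("example-password"); // placeholder credential
  await page.getByRole("button", { name: "Log in" }).click();

  // Act: perform the user journey captured in the recording
  await page.getByRole("link", { name: "Team settings" }).click();
  await page.getByRole("button", { name: "Invite teammate" }).click();
  await page.getByLabel("Teammate email").fill("new.hire@example.com");
  await page.getByRole("button", { name: "Send invite" }).click();

  // Assert: verify the outcome the narration described
  await expect(
    page.getByText("Invitation sent to new.hire@example.com")
  ).toBeVisible();
});
```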

Without human oversight, AI-generated code might have hidden issues or inefficiencies that lead to faulty tests and more maintenance work over time. That's why, after AI creates and tests the code, human testers step in to review it — making sure it’s accurate, handles data properly, and aligns with business goals.

Stage 2: Move beyond basic retries with smarter failure resolution

Our original flake detection system, while effective, was pretty basic. It assumed any test failure was a flake and limited retries to three attempts. If one of those retries succeeded, the system would mark the test as flaky and move on. As a test built up a pass/fail history, we would flag individual tests with a long-term failure rate of 30% for manual investigation by our testers. Beyond that, the system didn't dig deeper. However, using AI allows us to be much more sophisticated about resolving failures.
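
As a rough sketch of that original heuristic (the function names and exact thresholds here are illustrative, not our production code):

```typescript
const MAX_ATTEMPTS = 3;              // original retry cap
const INVESTIGATION_THRESHOLD = 0.3; // 30% long-term failure rate

type Verdict = "passed" | "flaky" | "failed" | "needs-human-investigation";

// runTest executes the test once; failureRate comes from the test's pass/fail history.
async function classifyFailure(
  runTest: () => Promise<"passed" | "failed">,
  failureRate: number
): Promise<Verdict> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    if ((await runTest()) === "passed") {
      // Any pass after a failed attempt marks the failure as a flake and moves on
      return attempt === 1 ? "passed" : "flaky";
    }
  }
  // Persistent failures on tests with a poor history get routed to a human tester
  return failureRate >= INVESTIGATION_THRESHOLD ? "needs-human-investigation" : "failed";
}
```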

Resolve flaky tests quickly with AI detection and fixes

With AI, failure resolution gets a lot smarter. AI doesn’t just rerun the test a few times and hope for the best. It understands the context of a failure and takes action based on that. Say a test fails because of a flake: the AI provides proactive suggestions to improve its stability over time, and while humans review those recommendations, the test keeps running alongside the review process. This approach reduces long-term failures, ultimately increasing the overall stability of the test suite. Customers get fewer disruptions, faster feedback cycles, and a more reliable testing process, which lets development teams focus on building features instead of constantly troubleshooting flaky tests.

Find and fix issues faster with AI-powered diagnosis

Instead of giving up after three retries, AI attempts different code modifications to fix the issue on its own. Human approval is still required every time AI modifies any code to fix a test; that review makes sure the solution is correct and won’t lead to hidden bugs or recurring failures. Human testers also verify that AI-managed test data is handled correctly. Without this oversight, AI could miss subtle data management details, leading to conflicts in future test executions.
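
A simplified sketch of that fix loop with the human approval gate might look like the following; the helper functions (proposeFix, runWithPatch, queueForReview) are hypothetical stand-ins, not a real API.

```typescript
interface FixProposal {
  description: string; // what the AI changed and why
  patch: string;       // the proposed code modification
}

// attemptAiFix tries a handful of AI-proposed patches; nothing is merged
// until a human tester approves the one that worked.
async function attemptAiFix(
  testId: string,
  failureContext: string,
  proposeFix: (ctx: string, previous: FixProposal[]) => Promise<FixProposal | null>,
  runWithPatch: (patch: string) => Promise<boolean>,
  queueForReview: (testId: string, fix: FixProposal) => Promise<void>
): Promise<boolean> {
  const attempted: FixProposal[] = [];
  for (let i = 0; i < 5; i++) {
    const fix = await proposeFix(failureContext, attempted);
    if (!fix) break;                     // the AI has no further ideas; escalate
    attempted.push(fix);
    if (await runWithPatch(fix.patch)) {
      await queueForReview(testId, fix); // human approval gate before anything ships
      return true;
    }
  }
  return false; // the `attempted` history goes to the human reviewer (see Stage 4)
}
```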

Stage 3: Keep tests up-to-date as code changes with AI

Unlike a human tester, our AI doesn’t just wait for manual updates when the application evolves — it adapts after detecting a test failure caused by changes in the DOM. When an element’s attributes or structure changes and causes a test to break, the AI recognizes the issue and adjusts the test to match the new layout.

In other words, our AI is dynamic: its element adaptation kicks in quickly to resolve issues by realigning the test with the altered DOM. This approach is much more effective than traditional “auto-healing” because it doesn’t blindly patch over issues without understanding why the failure occurred.


Auto-healing often focuses on getting the test to pass quickly by guessing at the element changes, but that occasionally introduces hidden problems or creates unstable fixes. Dynamic element adaptation, by contrast, actively investigates the root cause, making informed adjustments based on what failed and making sure the changes are aligned with the test’s original intent. Ultimately, human testers still review the adaptation to guarantee long-term reliability, reducing the risk of temporary fixes that lead to recurring failures.
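
To make the distinction concrete, here’s a rough sketch of the adaptation step using some invented types. The point is that a new selector is only proposed when the failure can be tied back to the element’s original intent, and the reason is recorded for the human reviewer.

```typescript
// Hypothetical types; the real system works against full DOM snapshots.
interface DomElement {
  role: string;   // e.g. "button"
  label: string;  // accessible name, e.g. "Save changes"
  testId?: string;
}

interface Adaptation {
  newSelector: string;
  reason: string; // why the old selector broke, recorded for the human reviewer
}

function adaptSelector(
  intendedTarget: DomElement | undefined, // element matched against the recorded intent
  oldSelector: string
): Adaptation | null {
  // Blind auto-healing would grab whatever element happens to make the test pass.
  // Adaptation only proceeds when the new element can be tied back to the
  // original intent (same role and accessible name from the recording).
  if (!intendedTarget) {
    return null; // root cause unclear: escalate to a human instead of guessing
  }
  const newSelector = intendedTarget.testId
    ? `[data-testid="${intendedTarget.testId}"]`
    : `role=${intendedTarget.role}[name="${intendedTarget.label}"]`;
  return {
    newSelector,
    reason: `"${oldSelector}" no longer matches; the same ${intendedTarget.role} was found by its accessible name "${intendedTarget.label}".`,
  };
}
```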

Stage 4: Streamline bug reporting and analysis with AI

If a test continues to fail after AI’s attempts to fix it, the AI keeps a detailed record of all the changes it made. This way, when humans review the failure, they can quickly see a history of what’s already been tried, which helps them identify the root cause faster than if the AI hadn’t intervened.

AI also generates a detailed bug report, and here human oversight is also required: AI might flag non-critical issues or miss important context that only a human can identify. Human testers review and refine these reports, making sure they contain all the information developers need to tackle the real problem.
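
As an illustration, the record handed to a reviewer might have a shape like this; the field names are invented for the example.

```typescript
// Illustrative shape of the failure record the AI hands to a human reviewer.
interface AttemptedFix {
  description: string;            // what was changed
  result: "passed" | "failed";    // whether the modified test passed
}

interface DraftBugReport {
  testName: string;
  failingStep: string;            // the Act or Assert step that broke
  errorMessage: string;
  attemptedFixes: AttemptedFix[]; // everything the AI already tried (from Stage 2)
  suspectedCause: string;         // the AI's best guess, to be confirmed or corrected
  reviewedByHuman: boolean;       // stays false until a tester signs off
}
```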

Combining AI and human expertise for better results

AI isn't coming for testers' jobs — at least not yet. Understandably, many testers are nervous about AI taking over, especially when you consider most of QA focuses on functional and performance testing.

Still, AI can handle the basics, sure, but that’s where it stops. It’s great at creating routine tests and catching obvious bugs. It quickly adapts to minor changes in the UI, automates repetitive tasks, and reduces the burden of test maintenance. But when it comes to understanding context or spotting edge cases, AI is still no match for a human tester.

Most testers spend too much time on repetitive tasks, which leaves them with little room to focus on improving quality beyond the basics. We think there’s a lot more to the testing job than the basics. In practice, with AI handling the grunt work, testers get more cycles to dig deeper, tackle complex issues, and boost overall quality.

It helps to think of AI as the robot assistant rather than the robot overlord. It’s here to make your job easier, not to replace you. At least not entirely. Yet.

