Parallel testing: what it is, why it's hard, and why you should do it

John Gluck
Noah Sussman
October 4, 2023

To understand the potential of fully parallel testing, imagine you're an owner of a restaurant. The most time-consuming aspect of your business is preparing the food, be it chopping vegetables, simmering a sauce, caramelizing onions, or what have you. Now, picture having the superpower to get multiple pairs of hands to work on these tasks simultaneously. The result? Faster meal prep and happier customers. 

The same principle applies to full parallelization of automated end-to-end testing. Teams that use fully parallel testing can find bugs and validate features sooner, which increases the rate at which they deliver software. So would you be surprised to learn that the tech industry largely overlooks parallel testing and just 7% of companies can run more than 50 tests in parallel (“State of Software Quality”, SmartBear, 2022)? Probably not. In fact, trends over the last five years show the number of tests companies run in parallel is declining.

At QA Wolf, we are dismayed but not surprised by this trend. Parallel-testing adoption should be increasing, not decreasing, which is why we built full test parallelization into our product. We understand that building a system for running tests in parallel can be cost-prohibitive, but the hidden costs of the alternatives to parallelism are just as high, and possibly higher.

Our service minimizes these costs for our customers. But before we talk about the cost details and the reasons why your team should consider fully parallel testing if they want to increase the speed and reliability of their software delivery, let’s dive into the history and challenges of parallel testing.

Parallel testing is hard

Parallel testing isn't losing popularity without reason. It's a tough nut to crack. Designing any system is hard enough, but fully parallel testing comes with unique and specialized problems distinct from those of typical application design. Tests aren't services; they are scripts, ideally short-lived. 

This short life cycle calls for specific infrastructure design and configuration, which is the domain of rare specialists. Moreover, if those scripts aren't wholly independent, running them in parallel can lead to sporadic or consistent failures.

Even if you believe you're running tests in parallel, if you're sharding, you're essentially running them sequentially on multiple execution nodes. Sharding, while a form of parallel testing, is not full parallelism. If you have 10 nodes and 200 tests, you are still running your tests sequentially, 10 at a time. This distinction will be important in a moment.
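To make the distinction concrete, here is a minimal sketch that simulates wall-clock time for the same suite run serially, sharded across 10 nodes, and fully in parallel. The five-minute test duration and the function name `wall_clock_minutes` are assumptions for illustration, and the simulation simply hands each free node the next test in line.

```python
import heapq

def wall_clock_minutes(durations, nodes):
    """Simulate a run in which each free node picks up the next queued test."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Each heap entry is the minute at which a node becomes free.
    free_at = [0.0] * min(nodes, len(durations))
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available node
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

suite = [5.0] * 200  # 200 tests, five minutes each (assumed duration)

serial = wall_clock_minutes(suite, nodes=1)            # one node, one test at a time
sharded = wall_clock_minutes(suite, nodes=10)          # 10 shards, each running 20 tests in sequence
full = wall_clock_minutes(suite, nodes=len(suite))     # one node per test
```

Under these assumptions the serial run takes 1,000 minutes, the 10-shard run takes 100, and the fully parallel run takes 5, which is the length of the longest test.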

A brief history of parallel testing

Both the sequential and sharding methods of test execution are remnants of the past, a time before the cloud and containerization. Back then, parallelizing tests often meant investing in lots of expensive hardware, which made the ROI hard to justify.

The emergence of cloud CI services paved the way for sharded solutions in testing. Since many of these services offered sharding as a feature, teams quickly realized that adding nodes could speed up test runs. 

Even today, most end-to-end tests rely on unit-test executors. Since most unit-test execution frameworks have built-in parallelism, enterprising testers have experimented with it over the years, but those attempts have largely failed.

The reason they have failed is that unit tests differ from end-to-end tests in one key aspect: they are stateless. Ideally, when you execute a unit test, nothing about the application changes. On the other hand, when dealing with end-to-end tests, state change is the whole point; we are running the test to observe the state of the application changing to what we expect.

To make parallel testing work for end-to-end tests, testers needed to design their tests carefully to maintain isolation, so that no test would modify the state of an object another test might be consuming at the same time. If you are running all your tests in parallel, that other test is potentially any test in the project.
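One common way to get that isolation is for every test to create the data it needs under a unique name, so no two tests ever share an object. The sketch below illustrates the idea; `FakeApp` is a hypothetical in-memory stand-in for the application under test, and the test itself is invented for illustration.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

class FakeApp:
    """Hypothetical in-memory stand-in for the application under test."""

    def __init__(self):
        self._lock = threading.Lock()
        self._carts = {}

    def create_user(self, name):
        with self._lock:
            self._carts[name] = []
        return name

    def add_to_cart(self, user, item):
        with self._lock:
            self._carts[user].append(item)

    def cart(self, user):
        with self._lock:
            return list(self._carts[user])

def isolated_cart_test(app):
    # A unique user per run means no other test can touch this cart.
    user = app.create_user(f"user-{uuid.uuid4().hex}")
    app.add_to_cart(user, "widget")
    return app.cart(user) == ["widget"]

def run_all_in_parallel(app, count):
    # Every copy of the test runs at once against the same application.
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(lambda _: isolated_cart_test(app), range(count)))

results = run_all_in_parallel(FakeApp(), count=50)
```

If every copy of the test logged in as the same shared user instead, each would see the others' widgets in the cart and the assertion would fail intermittently, which is the kind of collision isolation is meant to prevent.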

Implementing test isolation is a complex task, even for someone who is a true believer in the value of parallelism. This approach becomes increasingly challenging to maintain when multiple team members aren’t aligned with parallelism as a priority. Some testers link tests together to "save time," forcing these tests to run sequentially and defeating the purpose of parallelism. Some testers don’t value or prioritize isolated tests because they have never seen parallelism work.

Enter containers

Fully parallel testing became more feasible and cost-effective around 2017, thanks to AWS Fargate, EKS, and similar technologies. Before this, some enterprising testers attempted to achieve isolation in testing by running each test in an isolated process. While this solved part of the problem, designing such a system was as tricky as, if not trickier than, ensuring isolation across hundreds of tests.

Before the wide acceptance of cloud-based processing, implementing such a system was nearly impossible. Doing so involved either hardware labs, which were super expensive, or virtual machines, which would still run into the processor’s limitations because, let’s face it, tests are scripts, and scripts are processor-bound.

Containers, when implemented thoughtfully, address the challenge of test execution process isolation, especially when deployed in the cloud. They work as the individual sets of hands from our opening analogy, spinning up when needed, running tests in isolation without interfering with others, and disappearing when done. Containerized cloud execution solves most test parallelization issues, making you wonder why it's not more common.
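As a rough sketch of the one-container-per-test idea, the snippet below launches a fresh container for each test file and collects the exit codes. It uses the local Docker CLI as a stand-in for a cloud container service, and the image name `example/e2e-runner:latest` and the `run-test` command inside it are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

IMAGE = "example/e2e-runner:latest"  # hypothetical image name

def container_command(test_file):
    # `docker run --rm` starts a fresh container and removes it when it exits.
    # `run-test` is a hypothetical entrypoint inside the assumed image.
    return ["docker", "run", "--rm", IMAGE, "run-test", test_file]

def run_suite(test_files, runner=subprocess.run):
    """Launch one container per test file and return each exit code."""
    def run_one(test_file):
        return runner(container_command(test_file)).returncode

    # One worker per test, so every container starts at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(test_files))) as pool:
        return dict(zip(test_files, pool.map(run_one, test_files)))
```

The `runner` parameter is there so the launcher can be exercised without Docker installed; a real system would also need to handle timeouts, logs, and capacity limits, which this sketch leaves out.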

The primary reason might be that companies underestimate the value of time savings. Let's talk more about that.

The longer you wait, the more you pay

Teams often optimize everything but their end-to-end test execution, unaware they are at war with math.

Risk of not parallelizing: Serialization (sequential) wastes time

The chart above demonstrates the rate of growth of our total test execution time running sequentially on a single processor. For instance, if one test takes five minutes, 200 tests take over 16.5 hours, and even small companies usually require around 200 tests for meaningful coverage. On the other hand, fully parallel testing keeps test execution times in check. A fully serialized approach is impractical for most companies needing substantial test coverage.

Cost of longer serialized builds

Our in-house calculator estimates the cost of the 200-test scenario above at $440K, which includes babysitting (i.e., developer capacity wasted waiting for test results). As Google's research shows, the longer your build takes, the more developer efficiency you lose. Conversely, a 15% reduction in build times resulted in one more deployment per week per developer.

If we scale this approach out, we can see that the growth is exponential.

Sharding: A costly proposition

Most teams these days take a sharded approach to test execution. Most modern CI servers have at least some built-in support for this approach and let you specify how many parallel nodes (read: shards) you want to run. Some go so far as to offer tips on tools that distribute your tests evenly across those shards to balance the execution time.

As the number of tests grows, the number of shards needed also increases. The more shards you add, the faster your tests complete, but maintaining your test execution time will cost you.

If you want your tests to finish in 10 minutes, you must continue adding shards as the number of tests grows. By the time you have 200 tests, you need 100 shards. If your company can’t or doesn’t want to afford the additional shards, you have no choice but to increase your build time. 
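The arithmetic behind that shard count can be sketched in a few lines. The five-minute test duration carries over from the serial example earlier, and the function name `shards_needed` is ours, not part of any CI tool.

```python
import math

def shards_needed(test_count, minutes_per_test, target_minutes):
    """Shards required to finish within the target when each shard runs tests in sequence."""
    if minutes_per_test > target_minutes:
        raise ValueError("a single test already exceeds the target time")
    # How many tests fit back to back on one shard within the target.
    tests_per_shard = math.floor(target_minutes / minutes_per_test)
    return math.ceil(test_count / tests_per_shard)

shards_for_200 = shards_needed(200, minutes_per_test=5, target_minutes=10)
shards_for_400 = shards_needed(400, minutes_per_test=5, target_minutes=10)
```

With five-minute tests and a 10-minute target, each shard fits two tests, so 200 tests need 100 shards and doubling the suite doubles the shard count.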

QA Wolf’s calculator estimates the scenario above at $55K annually. Our investigation shows that vendors charge an average of $130 per node for 25 or fewer parallel nodes and about $100 for teams with over 25 nodes. Again, we baked in the cost of babysitting, but we haven’t included the cost of creating and maintaining your tests. While this scenario is cheaper than a purely serialized approach, it is optimistic and doesn’t account for growth.

But the growth math in this scenario is straightforward. You need another node at an additional $100 for every two tests you add. What is not visible is that if you don’t add that node, you pay $234 for every two tests, assuming a 15% reduction in developer capacity. By not adding a node and putting that cost onto your developer team, you lose $134. It’s cheaper to add a node. That’s how they get you.
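Here is that comparison as a small sketch, using only the $100 and $234 figures quoted above. The constant and function names are ours, and the model assumes both costs scale linearly with every pair of tests added.

```python
NODE_COST = 100      # dollars for one extra node, per the figure above
WAITING_COST = 234   # dollars of lost developer capacity per two tests without that node

def growth_costs(tests_added):
    """Compare paying for nodes against absorbing the wait, per pair of tests added."""
    pairs = tests_added // 2
    with_nodes = pairs * NODE_COST
    without_nodes = pairs * WAITING_COST
    return with_nodes, without_nodes, without_nodes - with_nodes

with_nodes, without_nodes, loss = growth_costs(tests_added=20)
```

Adding 20 tests works out to $1,000 in nodes versus $2,340 in lost capacity, a $1,340 difference, or $134 for each pair of tests.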

The value of fully parallel testing

Organizations that use a sharded approach to test execution often discover that their application teams are contending with each other for nodes. Especially in companies with several teams that release together, if each team attempts to release and test multiple features and last-minute patches, such activity can exhaust the organization's available test nodes, causing long wait times as the various tests queue up. Often this scenario results in unnecessary friction between teams. 

However, teams that use fully parallel testing don’t have to fight with each other for resources. Fully parallel testing makes the release schedule more predictable and reliable. Teams can be confident that their tests will only take so long, no matter where they are in the release process.

When the tests in CI are slow, it impacts not only the speed of deployment but also that of development. Since developers run the tests to regress the code base when they add new features, the speed of the tests in CI affects the rate at which new code can be created and merged. Faster tests mean developers can experiment with ideas, seek feedback, and iterate quickly. Slow tests hinder development and extend feedback loops. Fully parallel testing eliminates test resource contention.

Fully parallel testing also supports continuous delivery. Slow tests result in long wait times, leading to a delivery schedule that moves in increments that could be weeks or even months long. To achieve push-button deployment or true CD, teams should aim for a test suite that finishes running within minutes, probably less than 10. Continuous delivery is impossible when the team has to wait for extended periods before releasing code.

Ultimately, fully parallel testing gives the product a competitive edge. Faster time to market for features means greater revenue potential. Every hour spent waiting for the tests in CI to pass and every additional day the team delays the release date to accommodate slow tests increases the potential for and likelihood of lost revenue.

We’ve got the solution

Whether you are still running your tests serially or using sharding, transitioning to fully parallel testing is a must for any team hoping to increase delivery speed while maintaining quality. With fully parallel testing, your test results are available in the time it takes to run your longest test. Fast test feedback opens the door to push-button deployment, unlocking various competitive advantages.

However, building a system that runs every test in parallel isn't a walk in the park. It requires highly specialized developers, a robust infrastructure and careful test scripting to avoid collisions. At QA Wolf, we've built a solution to tackle these challenges. We know that teams aim to boost the speed and reliability of their tests, whether they have 200 or 20,000 end-to-end tests. Our product handles them all in the time it takes to run your longest test. Say goodbye to slow tests and hello to efficient, rapid deployment.
