Notes from the field: Why engineering teams struggle to scale their test coverage

Kirk Nathanson
April 6, 2023

Everyone knows that test automation (QA as a whole, really) is important, but it usually goes under-resourced until something happens. Maybe a major bug gets out and damages the brand or costs a sale; maybe the team misses a release milestone; or maybe engineering leadership just wants to shift to continuous deployment.

Whatever the reason, when companies decide that it's time to scale up automated testing, a directive comes down. Additional QA engineers are hired. New tools are procured. A company memo might be involved. And for a little while it seems like everything's working — this time we will reach 80% coverage.

Unfortunately, the initial enthusiasm can't sustain the scale of coverage teams want. As the product continues to grow, test coverage tends to plateau. Understanding why that happens (and how QA Wolf can help) will help you reach your coverage goals and stay there for the long term.

Obstacle #1: Human capacity

In their 2022 State of Software Quality Testing report, SmartBear found that “time and resources” is the biggest challenge facing 37% of QA teams. For an additional 23% of teams, “company priorities” were the biggest problem. 

Those numbers point to the trickiest part of scaling an automated test suite: you don't get economies of scale. Triage and maintenance time increases linearly as the test suite grows. A suite of 100 tests that runs once a day will need 20–30 hours of maintenance each week; a suite of 200 tests will need 40–60 hours.1 So you have to add QA capacity in proportion to the core product: roughly one QA engineer for every three front-end developers.
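If you want to sanity-check those numbers against your own suite, here's a back-of-the-envelope version of the math from the footnote at the end of this post. The failure rate, broken-test share, and fix times are assumptions taken from that footnote, and the five runs per week assumes the suite runs once each weekday:

```typescript
// Back-of-the-envelope weekly maintenance estimate for a test suite.
// Assumptions (from the footnote): ~7.5% of runs fail, ~80% of those
// failures are broken tests rather than product bugs, and each broken
// test takes 45–60 minutes to investigate and fix.
function weeklyMaintenanceHours(testCount: number, runsPerWeek: number) {
  const failureRate = 0.075;
  const brokenTestShare = 0.8;
  const brokenTests = testCount * runsPerWeek * failureRate * brokenTestShare;
  return {
    low: (brokenTests * 45) / 60,  // optimistic: 45 minutes per fix
    high: (brokenTests * 60) / 60, // pessimistic: 60 minutes per fix
  };
}

console.log(weeklyMaintenanceHours(100, 5)); // { low: 22.5, high: 30 }
console.log(weeklyMaintenanceHours(200, 5)); // { low: 45, high: 60 }
```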

We’ve noticed that when there isn’t sustained investment in new capacity, QA teams tend to shift their focus to high-profile features, or to reactively building tests wherever bugs are being reported. That leaves legacy tests untended, and as they break down they get pulled from the test suite so they don’t block releases. Inevitably, that’s when the bugs start coming back.

The lesson is that QA is a marathon, not a sprint. To scale their test coverage, teams need to invest in QA proportionally to their investment in the core product. Forever. (Dun dun dun!) 

QA Wolf unlocks the real value of in-house QA teams

The fact is, QA teams are hamstrung by the ongoing maintenance needs of a test suite when there’s more value they could be adding to the company. QA Wolf removes that day-to-day burden by building and maintaining automated tests for 80%+ of user flows. That frees up in-house QA engineers to execute a larger strategic vision for product quality: developing and implementing QA policies, incorporating customer feedback into product development, managing risk, or driving general continuous-improvement initiatives.

Obstacle #2: Technical complexity

Another obstacle that prevents teams from scaling up their test coverage is the technical complexity of the tests themselves. Modern web apps are built on dozens (sometimes hundreds) of third-party integrations, extensions, APIs and microservices. User flows are more complicated than ever. And automated tests need to generate test data from multiple sources. 

Coding these tests takes specialized skills, and older frameworks like Cypress, Selenium, or Cucumber don’t support newer technology. In fact, Tricentis found that 71% of QA engineers automate tests on the basis of technical feasibility alone (Tricentis 2021), which means that some (maybe most) user flows go untested.

If a user can do it, QA Wolf can test it

The QA Wolf platform uses Microsoft Playwright, which adds native support for a whole lot of things that older frameworks don’t handle very well, like multi-user and multi-tab flows. We also have experience building tests outside of and around Playwright; blockchain and browser extensions are big ones here. Visual diffing, video and audio accuracy, and email and SMS verification are all pretty standard for us at this point.
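To make that concrete, here’s a minimal sketch of a multi-user flow in Playwright: two isolated browser contexts acting as two different users in the same test. The URL and selectors are placeholders, not a real customer app:

```typescript
import { test, expect } from '@playwright/test';

// Two isolated browser contexts act as two different users in one test.
// The URL and selectors below are placeholders for illustration.
test('recipient sees a message sent by another user', async ({ browser }) => {
  // Each context has its own cookies and storage, like two separate people.
  const senderContext = await browser.newContext();
  const recipientContext = await browser.newContext();
  const senderPage = await senderContext.newPage();
  const recipientPage = await recipientContext.newPage();

  await senderPage.goto('https://app.example.com/chat/room-1');
  await recipientPage.goto('https://app.example.com/chat/room-1');

  await senderPage.getByRole('textbox', { name: 'Message' }).fill('Hello!');
  await senderPage.getByRole('button', { name: 'Send' }).click();

  // The second user should see the message appear without reloading.
  await expect(recipientPage.getByText('Hello!')).toBeVisible();

  await senderContext.close();
  await recipientContext.close();
});
```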

You should challenge us if you think there’s a user flow we can’t automate. 

Obstacle #3: Infrastructure limits

The dream of automated testing is that developers get instant feedback on their code, every time they merge and deploy. Much easier said than done. Many of the companies we talk to don’t have parallel test run infrastructure in place, which means one of two things happens as the test suite scales:

  1. Developers spend hours babysitting builds when they should be working on new features; or
  2. Teams run tests on a complete release candidate at the end of a sprint, waterfall style.
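For context, parallel test run infrastructure starts with the test runner itself. In Playwright, for instance, turning on parallelism takes a few lines of config; the worker and retry counts below are illustrative, and the real numbers depend on your CI hardware:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Run test files, and the tests within each file, in parallel.
  fullyParallel: true,
  // Worker count is illustrative; it's bounded by CI CPU and memory.
  workers: process.env.CI ? 4 : undefined,
  // Retry once in CI so a single flaky run doesn't block the build.
  retries: process.env.CI ? 1 : 0,
});
```

Even with parallelism switched on, though, someone has to provision and pay for the machines those workers run on.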

The challenge with infrastructure is similar to human capacity: the bigger the test suite, the more it costs to run and the more complex it is to maintain.

  • Server and compute costs: Browsers are resource-intensive, and even well-optimized clusters add up fast. As teams increase their coverage, they run up big bills with their cloud hosting provider.
  • Test run fees: Most testing solutions charge by the test run, so teams have to cap their monthly runs, which blocks developers from shipping.
  • SDETs and DevOps: Someone has to manage the infrastructure. Maybe more than one person. Sometimes a whole team. 

Unlimited runs in 100% parallel are included with QA Wolf tests

All the tests that QA Wolf builds run in the QA Wolf cloud against whatever environments you want to test. Developers just make an API call or run a ‘curl’ command any time they want to run the test suite. That gives them instant feedback as they’re building, and constant assurance that they’re not shipping bugs or introducing a regression anywhere in the product. There are no overage fees for test runs or hidden costs for large test suites, just predictable pricing based on the number of tests.
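For illustration only, triggering the suite from a CI pipeline might look something like the sketch below; the endpoint, payload, and QA_WOLF_API_KEY variable are hypothetical placeholders, not QA Wolf’s actual API:

```typescript
// Hypothetical example of kicking off a suite run after a deploy.
// The endpoint, payload shape, and API key variable are placeholders.
async function triggerSuiteRun(): Promise<void> {
  const response = await fetch('https://api.example.com/v1/suites/trigger', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.QA_WOLF_API_KEY}`,
      'Content-Type': 'application/json',
    },
    // Point the run at whichever environment was just deployed.
    body: JSON.stringify({ environment: 'staging' }),
  });
  console.log(`Suite run request returned ${response.status}`);
}

triggerSuiteRun();
```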

We’re happy to chat any time

Come talk to us about your challenges. Even if we don’t work together, we’re happy to talk about ways you could optimize your QA practice so that you’re shipping faster with fewer bugs. 

•••

1 30 hours of maintenance per week is based on QA Wolf data showing a 7.5% failure rate, plus the assumption that 20% of failures are real bugs and 80% are broken tests. A 100-test suite run once each weekday produces 500 runs per week; 500 runs × 7.5% failure rate ≈ 37.5 failures per week, and 80% of those ≈ 30 broken tests. At 45–60 minutes of maintenance per broken test, that works out to roughly 22–30 hours.
