Devs should own white-box testing; here’s why they shouldn’t own black-box, too

Kirk Nathanson
October 14, 2022

For readers who aren’t familiar with the lingo, white-box tests (also called clear-box or open-box tests) are automated tests that are aware of and interact with the underlying code. They’re the opposite of black-box tests, which exercise the product the way an end user would experience it, through the rendered UI or the APIs.

  • White-box: The test can see the code underneath, like unit and integration tests.
  • Black-box: The code is a total mystery to the test, like end-to-end or visual regression tests.
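To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `apply_discount` function, the tiny `PriceHandler` API, and the `/price` path. The white-box test calls the function directly; the black-box test only knows a URL and the response a user should get back.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business logic: integer price after a percentage discount."""
    return price_cents - (price_cents * percent) // 100


class PriceHandler(BaseHTTPRequestHandler):
    """Hypothetical API that exposes the logic over HTTP."""

    def do_GET(self):
        body = json.dumps({"price_cents": apply_discount(2000, 25)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def white_box_test() -> bool:
    # Calls the function directly, so it must know its name and signature.
    return apply_discount(2000, 25) == 1500


def black_box_test() -> bool:
    # Knows only the public URL and the response a user should get.
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with urlopen(f"http://127.0.0.1:{port}/price") as response:
            return json.load(response) == {"price_cents": 1500}
    finally:
        server.shutdown()
        server.server_close()
```

If `apply_discount` were renamed or rewritten, the white-box test would have to change with it, while the black-box test would keep passing as long as the response stayed the same.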

White-box tests improve code quality, but code quality is just one part of total product quality

Let’s start with what makes high-quality code. Every team will have slightly different definitions, and this list is far from exhaustive, but we can say that good code is…

  • Functional. The code executes the specific task it was written to do. 
  • Efficient. It should run quickly, and shouldn’t hog resources.
  • Maintainable. The code should be easy to modify without unintentionally affecting other parts of the system.
  • Clean and readable. It adheres to the language’s standards and conventions; the variables, functions, and classes are self-documenting; and the code base is organized.
  • Robust. It’s stable and can recover from unexpected errors.

We write white-box tests by the thousands to validate that our code meets those standards. This is work that developers have to do themselves, because the tests need direct access to the underlying functions and test-driven development means we structure our code around the tests. 

Testing for product quality with black-box E2E tests

Ironically, or just unfortunately, what we call “good code” is mostly invisible to the user. Yes, they notice inefficient code when it slows down their computer and sends their fans whirring, but what they’re really looking for is a product that is…

  • Easy to use. It’s intuitive and you can pick it up quickly. 
  • Responsive. It accurately responds to inputs and provides constant feedback. 
  • Well designed. It’s clean, attractive, focused, and uncluttered. 
  • Accessible. Usable by anyone, regardless of their physical abilities and limitations. 

We do E2E testing because unit and integration tests can all pass on their own while the final product still has bugs. In a high-quality product, everything works together to create something greater than the sum of its parts.
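Here is one way that can happen, as a small Python sketch. The functions and prices are hypothetical: one unit returns a total in cents, the other expects dollars, and each passes its own test. The bug only appears when the two are wired together and you look at what the customer would see.

```python
def cart_total_cents(prices_cents: list[int]) -> int:
    """Hypothetical unit A: sums item prices and returns the total in cents."""
    return sum(prices_cents)


def format_receipt(total_dollars: float) -> str:
    """Hypothetical unit B: formats a total that it expects in dollars."""
    return f"Total: ${total_dollars:.2f}"


def checkout(prices_cents: list[int]) -> str:
    """The wiring between the units; this is where the bug lives."""
    return format_receipt(cart_total_cents(prices_cents))  # cents passed as dollars


# Each unit passes its own test in isolation.
assert cart_total_cents([1250, 750]) == 2000
assert format_receipt(20.0) == "Total: $20.00"

# An end-to-end check on what the customer sees exposes the mismatch.
receipt = checkout([1250, 750])
print(receipt)  # Total: $2000.00 (the customer expected Total: $20.00)
```

Neither unit test is wrong. The defect lives in the seam between the units, which is exactly the part that only a test of the whole flow exercises.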

Why effective teams offload black-box E2E testing to dedicated experts

For end-to-end testing to be most effective, teams need at least 80% test coverage. When you look at how low test coverage actually is across all industries, you start to appreciate how difficult that level is to reach.

In 2021, 90% of companies had less than 75% coverage. Two-thirds of companies had less than 50% coverage. And those numbers are down since 2018 (SmartBear, 2021). 

Let’s talk about why that is.

Teams drown under the volume of E2E tests that need maintenance 

Maintaining white-box tests is pretty simple. If there’s a change to the code, only the tests that are related to that code need to be updated—if anything. E2E tests are another animal altogether. 

E2E tests are full-stack tests. They validate that everything is working as it should, from the front end to the APIs to the integrations. And the tests often overlap one another, with multiple tests running through the same functionality to cover different use cases. 

That means if there’s a change to any part of the stack, no matter how small, you could break a couple or a couple dozen of your E2E tests. It puts an enormous maintenance burden on the whole organization to keep the E2E tests running. And the more tests you create, the greater the burden.

Writing and maintaining end-to-end tests limits the ability to work on new features

There are lots of hidden costs to shifting all the way left and putting QA responsibilities onto developers. The biggest is probably productivity loss.

Maintaining end-to-end tests can take 20–40% of a developer’s time. And in our experience it’s the first thing that overworked development teams neglect. This causes a chain reaction of delays: The flaky tests hold up deployments, then bugs get into production, and teams spend even more time re-deploying fixes. 

Engineering time is expensive and is much better spent building new features and adding value for customers than maintaining end-to-end tests. 

End-to-end testing is rarely a developer’s core competency

This is pretty evident just from the coverage levels that we see, but it’s also shown in research: Just 45% of engineering teams believe they have the right testing strategy or process in place (World Quality Report, 2021). 

Another survey found that 70% of teams approach test design by intuition; just 46% have a methodical approach that provides efficient and effective test coverage (Tricentis, 2021). 

There’s an art and science to E2E testing and it’s not something that most front-end developers have learned to do. In fact, one of the most common things we hear from clients is that they simply don’t have the expertise to write E2E tests for their own applications:

The real benefit of QA Wolf is that we can test far more than we were able to test before — and more than we realistically ever would have been able to do — which has led to a much more stable and reliable application.
Collin Palmer, Product Manager, Padlet 

Let developers focus on white-box testing. QA Wolf will handle black-box testing. With full coverage, we’ll make magic. 

The QA Wolf platform and our in-house QA engineers ramp clients to 80%+ test coverage in four months. When a test fails, your QA concierge will investigate 24 hours a day. Flaky tests get fixed and re-run automatically, while bug reports get flagged for your developers or ticketed. 

As your product grows, we grow with you, keeping you at a minimum of 80% coverage at all times. If you ever want to leave, our test code is written in Microsoft’s open-source Playwright framework, so you can export it and use it yourself. 
