It seems like all the cool kids are running their end-to-end test suites on pull requests these days. And why not? At first glance, it sounds great: run your entire test suite on every pull request, catch issues early, and ship confidently.
We think there are better ways to ensure quality while maintaining release velocity, which we cover in another post. But if you’re dead set on running your tests on every PR, you’ll want to make sure you have a few things in order before you transition. Otherwise, PR testing turns from a velocity multiplier into a test maintenance nightmare.
Here are just some of the reasons the best-laid plans to test PRs often go sideways.
A 10–15% flake rate when you’re running your test suite once a day is manageable. Out of a 200-test suite, that’s 20–30 tests. Keep the same flake rate but run the suite five times a day, and your team now needs to investigate and re-run up to 150 flaky test results per day. Without a surge in headcount, that’s not sustainable. Suddenly, the irritating hum of test maintenance becomes an inescapable cacophony.
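If it helps to see that arithmetic laid out, here’s the same back-of-the-envelope math as a quick sketch; the suite size, flake rate, and run count are just the illustrative numbers from above:

```typescript
// Back-of-the-envelope triage load: same flake rate, more runs per day.
const suiteSize = 200;   // tests in the suite
const flakeRate = 0.15;  // ~15% of results flake
const runsPerDay = 5;    // roughly one full run per merged PR

const flakyResultsPerRun = suiteSize * flakeRate;           // ~30 per run
const flakyResultsPerDay = flakyResultsPerRun * runsPerDay; // ~150 per day

console.log(`${flakyResultsPerDay} flaky results to investigate every day`);
```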
But in fact, you won’t keep the same flake rate; it’ll go up. Keep reading to learn why.
When you increase the number of tests you run each day without increasing the number of CI servers to run them, you end up clogging your pipeline with builds stuck in the queue with nowhere to go, a problem called “node contention.” When your build servers are constantly at capacity, PR merges stall, developers get impatient, and your sprint velocity grinds down. PR testing becomes a speed bump instead of an accelerator. Since you can’t shorten the tests themselves, the only option is to increase concurrency and bring the total run time down.
On the other side of the system, you have the testing environment: maintaining persistent environments for testing is expensive, while spinning up ephemeral environments on demand requires even more complex DevOps work. Be prepared to bring on at least one, if not several, SDETs or DevOps engineers to maintain this new infrastructure.
Running E2E tests on PRs requires a wholesale rethinking of the tests themselves so they can run concurrently without tripping over each other. Shared data and shared environments cause tests to collide and fail in unpredictable ways that are difficult to diagnose. These intermittent, seemingly random failures are tough to debug, which is precisely the opposite of the rapid, clear feedback PR testing promises.
The volume of test runs that comes with PR testing exposes hidden tech debt in the test suite itself. What used to be small things that were easy to ignore or bypass on daily runs—flaky assertions, brittle data setups, inadequate teardowns—become glaring issues that QA engineers or developers need to address to keep the build pipeline moving.
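To make one of those concrete, here’s a minimal sketch of the kind of teardown a test needs before it can survive PR-level volume. It assumes a Playwright-style suite; createProject and deleteProject are hypothetical helpers standing in for whatever seeding code your suite actually uses:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical helpers -- stand-ins for your own API or database seeding.
import { createProject, deleteProject } from './helpers/api';

let projectId: string;

test.beforeEach(async () => {
  // Seed fresh data for this test instead of reusing a shared record.
  projectId = await createProject({ name: `pr-test-${Date.now()}` });
});

test.afterEach(async () => {
  // Clean up what the test created so the next run starts from a known
  // state, even when the assertions below have failed.
  await deleteProject(projectId);
});

test('renames a project', async ({ page }) => {
  await page.goto(`/projects/${projectId}`);
  await page.getByLabel('Project name').fill('Renamed project');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Renamed project')).toBeVisible();
});
```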
There’s the dream of PR testing, where developers confidently merge bug-free code and move closer to continuous deployment; and then there’s the reality. If the team isn’t prepared for the volume of test runs (and the accompanying flakes and failures), they’ll burn out before liftoff.
Test data collisions happen when concurrent tests modify the same data simultaneously. These collisions usually surface as race conditions that look like flaky tests. They’re avoidable with proper data isolation, but most test suites weren’t designed for concurrency, so they collide frequently when run against PRs.
When flakes block merges, teams often disable flaky tests and revert to manual spot-checking. This manual approach is slower, riskier, and prone to missing critical bugs, undermining the initial goal of faster delivery. To avoid this, teams must refactor for data isolation—often a significant effort involving new testing practices and even application modifications.
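What that refactor looks like varies by application, but the core move is giving every test its own data. Here’s a minimal sketch, again assuming a Playwright-style suite and a hypothetical createTask seeding helper; the names are illustrative, not a prescription:

```typescript
import { test, expect } from '@playwright/test';
import { randomUUID } from 'node:crypto';

// Hypothetical seeding helper -- substitute your own API or fixture code.
import { createTask } from './helpers/api';

test('completes a task', async ({ page }) => {
  // A unique title per run means two PR builds executing at the same time
  // can never grab, edit, or delete each other's records.
  const title = `pay invoice ${randomUUID()}`;
  await createTask({ title });

  await page.goto('/tasks');
  const row = page.getByRole('row', { name: title });
  await row.getByRole('checkbox').check();
  await expect(row.getByRole('checkbox')).toBeChecked();
});
```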
Many teams respond to flakiness by running smaller sets of tests, such as smoke or sanity tests. While running fewer tests reduces immediate noise, it also increases the risk of untested bugs slipping through. Without a comprehensive follow-up regression, bugs masked as flakes reach production, causing confusion, costly escapes, and customer frustration.
Running tests sequentially or in limited parallel batches often leads to severe infrastructure bottlenecks, creating test node contention. Think traffic jams—limited nodes cause tests to queue up, drastically slowing merges. Effective parallelization, running all tests simultaneously, is essential. While initially challenging, fully parallel testing vastly improves speed and reliability, transforming PR testing from a bottleneck into a true productivity boost.
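What “running all tests simultaneously” looks like depends on your runner. As one hedged example, these are roughly the relevant knobs in a Playwright config; the worker count and retry policy here are placeholders to tune against your own CI fleet:

```typescript
// playwright.config.ts -- a minimal sketch, assuming a Playwright suite;
// other runners expose equivalent settings.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                      // parallelize tests within each file, not just across files
  workers: process.env.CI ? 8 : undefined,  // scale worker count to what your CI nodes can handle
  retries: process.env.CI ? 1 : 0,          // one retry absorbs infrastructure blips, not real flakes
});
```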
Perhaps the biggest misconception about PR testing is believing it’s a cure-all. Yes, running tests early catches issues sooner, but it doesn’t fix them automatically. Without deliberate investments in test quality, infrastructure, and maintenance, PR testing just moves your problems earlier in your workflow without truly addressing them.
PR testing is a mirror, not a fix. That doesn’t mean teams that do PR testing are doomed: done right, it can dramatically reduce production bugs and improve confidence. But success demands strategic preparation.
Recognizing why PR testing often stumbles is the first step to ensuring your implementation succeeds rather than becoming another cautionary tale.