It seems like all the cool kids are running their end-to-end test suites on pull requests these days. And why not? At first glance, it sounds great: run your entire test suite on every pull request, catch issues early, and ship confidently.
We think there are better ways to ensure quality while maintaining release velocity, which we cover in another post. But if you’re dead set on running your tests on every PR, you’ll want to make sure you have a few things in order before you transition. Otherwise, PR testing turns from a velocity multiplier into a test maintenance nightmare.
Here are just some of the reasons the best-laid plans to test PRs often go sideways.
A 10–15% flake rate when you’re running your test suite once a day is manageable. Out of a 200-test suite, that’s 20–30 tests. Keep the same flake rate but run the suite five times a day, and your team now needs to investigate and re-run up to 150 flaky test results per day. Without a surge in headcount, that’s not sustainable. Suddenly, the irritating hum of test maintenance becomes an inescapable cacophony.
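If it helps to see that arithmetic laid out, here’s the same back-of-the-envelope math as a quick sketch; the suite size, flake rate, and run count are just the illustrative numbers from above:

```typescript
// Back-of-the-envelope triage load: same flake rate, more runs per day.
const suiteSize = 200;   // tests in the suite
const flakeRate = 0.15;  // ~15% of results flake
const runsPerDay = 5;    // roughly one full run per merged PR

const flakyResultsPerRun = suiteSize * flakeRate;           // ~30 per run
const flakyResultsPerDay = flakyResultsPerRun * runsPerDay; // ~150 per day

console.log(`${flakyResultsPerDay} flaky results to investigate every day`);
```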
But in fact, you won’t keep the same flake rate; it’ll go up. Keep reading to learn why.
When you increase the number of tests you run each day without increasing the number of CI servers to run them, you end up clogging your pipeline with builds stuck in the queue with nowhere to go, a problem called “node contention.” When your build servers are constantly at capacity, PR merges stall, developers get impatient, and your sprint velocity grinds down. PR testing becomes a speed bump instead of an accelerator. Since you can’t shorten the tests themselves, the only option is to increase concurrency and bring the total run time down.
On the other side of the system, you have the testing environment: maintaining persistent environments for testing is expensive, while spinning up ephemeral environments on demand requires even more complex DevOps work. Be prepared to bring on at least one, if not several, SDETs or DevOps engineers to maintain this new infrastructure.
Running E2E tests on PRs requires a wholesale rethinking of the tests themselves so they can run concurrently without tripping over each other. Shared data and shared environments cause tests to collide and fail in unpredictable ways that are difficult to diagnose. These intermittent, seemingly random failures are tough to debug, which is precisely the opposite of the rapid, clear feedback PR testing promises.
The volume of test runs that comes with PR testing exposes hidden tech debt in the test suite itself. What used to be small things that were easy to ignore or bypass on daily runs—flaky assertions, brittle data setups, inadequate teardowns—become glaring issues that QA engineers or developers need to address to keep the build pipeline moving.
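To make one of those concrete, here’s a minimal sketch of the kind of teardown a test needs before it can survive PR-level volume. It assumes a Playwright-style suite; createProject and deleteProject are hypothetical helpers standing in for whatever seeding code your suite actually uses:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical helpers -- stand-ins for your own API or database seeding.
import { createProject, deleteProject } from './helpers/api';

let projectId: string;

test.beforeEach(async () => {
  // Seed fresh data for this test instead of reusing a shared record.
  projectId = await createProject({ name: `pr-test-${Date.now()}` });
});

test.afterEach(async () => {
  // Clean up what the test created so the next run starts from a known
  // state, even when the assertions below have failed.
  await deleteProject(projectId);
});

test('renames a project', async ({ page }) => {
  await page.goto(`/projects/${projectId}`);
  await page.getByLabel('Project name').fill('Renamed project');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Renamed project')).toBeVisible();
});
```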
There’s the dream of PR testing, where developers confidently merge bug-free code and move closer to continuous deployment; and then there’s the reality. If the team isn’t prepared for the volume of test runs (and the accompanying flakes and failures), they’ll burn out before liftoff.
Test data collisions happen when concurrent tests modify the same data simultaneously. These collisions usually surface as race conditions that look like flaky tests. They’re avoidable with proper data isolation, but most test suites weren’t designed for concurrency, so they collide frequently when run against PRs.
When flakes block merges, teams often disable flaky tests and revert to manual spot-checking. This manual approach is slower, riskier, and prone to missing critical bugs, undermining the initial goal of faster delivery. To avoid this, teams must refactor for data isolation—often a significant effort involving new testing practices and even application modifications.
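What that refactor looks like varies by application, but the core move is giving every test its own data. Here’s a minimal sketch, again assuming a Playwright-style suite and a hypothetical createTask seeding helper; the names are illustrative, not a prescription:

```typescript
import { test, expect } from '@playwright/test';
import { randomUUID } from 'node:crypto';

// Hypothetical seeding helper -- substitute your own API or fixture code.
import { createTask } from './helpers/api';

test('completes a task', async ({ page }) => {
  // A unique title per run means two PR builds executing at the same time
  // can never grab, edit, or delete each other's records.
  const title = `pay invoice ${randomUUID()}`;
  await createTask({ title });

  await page.goto('/tasks');
  const row = page.getByRole('row', { name: title });
  await row.getByRole('checkbox').check();
  await expect(row.getByRole('checkbox')).toBeChecked();
});
```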
Many teams respond to flakiness by running smaller sets of tests, such as smoke or sanity tests. While running fewer tests reduces immediate noise, it also increases the risk of untested bugs slipping through. Without a comprehensive follow-up regression, bugs masked as flakes reach production, causing confusion, costly escapes, and customer frustration.
Running tests sequentially or in limited parallel batches often leads to severe infrastructure bottlenecks, creating test node contention. Think traffic jams—limited nodes cause tests to queue up, drastically slowing merges. Effective parallelization, running all tests simultaneously, is essential. While initially challenging, fully parallel testing vastly improves speed and reliability, transforming PR testing from a bottleneck into a true productivity boost.
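What “running all tests simultaneously” looks like depends on your runner. As one hedged example, these are roughly the relevant knobs in a Playwright config; the worker count and retry policy here are placeholders to tune against your own CI fleet:

```typescript
// playwright.config.ts -- a minimal sketch, assuming a Playwright suite;
// other runners expose equivalent settings.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                      // parallelize tests within each file, not just across files
  workers: process.env.CI ? 8 : undefined,  // scale worker count to what your CI nodes can handle
  retries: process.env.CI ? 1 : 0,          // one retry absorbs infrastructure blips, not real flakes
});
```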
Perhaps the biggest misconception about PR testing is believing it’s a cure-all. Yes, running tests early catches issues sooner, but it doesn’t fix them automatically. Without deliberate investments in test quality, infrastructure, and maintenance, PR testing just moves your problems earlier in your workflow without truly addressing them.
PR testing is a mirror, not a fix. That doesn’t mean teams that do PR testing are doomed: done right, it can dramatically reduce production bugs and improve confidence. But success demands strategic preparation.
Recognizing why PR testing often stumbles is the first step to ensuring your implementation succeeds rather than becoming another cautionary tale.