The Test Pyramid is a relic of a bygone era

John Gluck
July 19, 2025

If you’ve been working on software teams for any length of time, you’ve probably heard someone reference the Test Pyramid. You may have even said, “We follow the Pyramid,” when asked why E2E coverage is low. It’s time to stop. The Test Pyramid was designed at a time when E2E testing was too expensive and too slow to justify comprehensive coverage. Those days are long gone.

The Pyramid was first introduced by Mike Cohn in his book Succeeding with Agile. It illustrates a hierarchy of automated tests: unit tests at the bottom, integration tests above them, and end-to-end (E2E) tests at the top. The width of each level represents the number of tests you’re likely to have and is meant to guide how teams allocate their developer and QA resources.

That structure looks tidy in a diagram, but it overlooks how most systems are built. Frontend engineers deal with highly dynamic components that are hard to isolate. Backend engineers often write integration tests but skip failure paths, as triggering edge cases requires extra setup and adds little immediate value. The advice baked into the Pyramid clashes with team intuition. Developers know bugs tend to surface in full workflows, across services, and under real conditions. They reach for E2E tests when they need confidence that a feature works.

Most teams still lack strong coverage. The Pyramid doesn’t help them get there. It was designed around old constraints and points teams toward test types that don’t catch the failures that matter most today.

Why 80% E2E coverage matters

E2E tests exercise real workflows across multiple services, environments, and interfaces. Unlike unit or integration tests, they verify whether a feature works the way the customer experiences it—end to end.

E2E coverage is the only way to catch failures that happen when systems don’t communicate correctly. A unit test won’t catch a payment that looks successful but fails because a backend process never completed. An integration test won’t catch a login that breaks when a session cookie expires mid-run. These failures only surface in real-world conditions through full-system workflows.
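To make the payment example concrete, here’s a small, self-contained sketch (all names invented) of how a checkout can look successful at the unit level while the end-to-end outcome is wrong:

```python
# Hypothetical sketch: each component passes its own unit tests, but the
# end-to-end outcome is wrong because the two components disagree about
# the token format -- a bug only a full-workflow check would catch.

class PaymentGateway:
    def charge(self, amount):
        # Passes its unit tests: charging always yields an auth token.
        return {"status": "authorized", "token": "tok_123"}

class FulfillmentQueue:
    def __init__(self):
        self.jobs = []

    def enqueue(self, token):
        # Passes its unit tests too -- those tests use "job_"-prefixed
        # fixtures. Gateway tokens ("tok_...") are silently dropped.
        if token.startswith("job_"):
            self.jobs.append(token)

def checkout(gateway, queue, amount):
    result = gateway.charge(amount)
    queue.enqueue(result["token"])
    return result["status"]

gateway, queue = PaymentGateway(), FulfillmentQueue()
status = checkout(gateway, queue, 49.99)

assert status == "authorized"   # what the customer (and a unit test) sees
assert queue.jobs == []         # the order was never fulfilled
```

Unit tests on each class pass, and even an integration test with each component’s own fixtures can pass. Only a check that walks the whole workflow and verifies the final state surfaces the dropped order.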

From direct experience maintaining thousands of tests across hundreds of teams, we’ve found that about 80% coverage of critical user flows is the right target. That level reliably catches high-impact regressions without creating unnecessary maintenance burden. Below that, gaps start to show—especially in flows that cross services or rely on stateful data.

Why E2E testing feels expensive

E2E tests get a bad rap for being hard to build and maintain. Historically, that reputation came from real pain—but not because the tests themselves are overly complex. The cost came from the systems they had to run in.

Most test environments aren’t built for E2E reliability. Tests run in sequence, so feedback is slow. Environments are shared, so leftover data or session state from one test can break another. Logs are noisy. When a test fails, there’s no clear signal whether it’s a product bug, a timing issue, or a flake caused by the environment. Debugging takes time and guesswork.

That instability forces teams to manage the test suite full-time. Engineers write scripts to reset users, rerun flaky jobs, and clean up polluted data. They chase failures caused by state bleed, misconfigured flags, or broken selectors—issues that have nothing to do with the product itself. Each new test increases the risk that something else breaks.

The cost stacks up and feedback loops slow down. Eventually, the team's confidence erodes and they stop expanding coverage because keeping the system running takes too much effort. E2E feels expensive because the environment keeps making it harder than it needs to be.

What testing looks like when infrastructure isn’t the bottleneck

Modern teams don’t wait for scheduled releases. They ship behind feature flags, merge dozens of changes a day, and deploy automatically when checks pass. Testing has to keep up—and that only works when the infrastructure behind it is built for speed, stability, and scale.

Parallel execution runs every test at once, not one at a time. Pre-booted environments guarantee clean state, so tests don’t interfere with each other. Data resets automatically, removing the need for brittle cleanup scripts. Flake handling systems retry unstable tests and surface real bugs with clear signals.

That foundation lowers the cost of running end-to-end tests—and raises the ceiling on what you can cover. Complex workflows, edge cases, mobile flows, interdependent user roles, third-party integrations—everything that used to feel too fragile or time-consuming becomes testable. You get fast, trustworthy results across the entire system.

When the system handles state, retries, and timing automatically, the test layer starts behaving like real infrastructure. It self-recovers. It scales. It stays out of the way. That’s what makes high coverage sustainable. That’s how teams reduce flakes, shorten feedback loops, and ship confidently every day.

The Test Pyramid assumes E2E testing is too expensive to scale. That’s no longer true.

The Test Pyramid wasn’t wrong. It reflected what was practical at the time. When end-to-end tests were brittle, expensive, and slow, teams had to be strategic in their approach. Given the costs, you got more value from investing in unit and integration tests. E2E testing was too heavy a lift to justify broad coverage.

But ultimately, that was a question of ROI. Just because something made sense under old constraints doesn’t mean it was the right outcome; it was just the achievable one. The Pyramid’s advice boiled down to this: given what E2E tests cost, keep them limited.

That’s the part that no longer applies. Modern infrastructure changes both sides of the equation. Input costs have dropped. Parallel execution, automated retries, pre-booted environments, and AI-powered maintenance make high-coverage E2E suites reliable and cheap to run. At the same time, the value of those tests has gone up. They catch the real failures—system-level, cross-service, regression-prone bugs—before customers see them.

So throw out your allegiance to the Pyramid. The math has changed. The new rule is simple: run as many E2E tests as you can, as long as you can run them all in parallel with minimal flakes and automated upkeep. When the infrastructure makes that possible, the returns speak for themselves.
