So you want to build end-to-end tests like a QA Wolf

Rebecca Stone
April 27, 2023

Start with narrowly-focused tests

By narrowly-focused, we mean tests that validate just a single action rather than complex, compound tests that check several things as they run. Yes, you’ll end up with more individual tests, but think about all the benefits (there’s a minimal example after this list):

  • Fewer false positives
    End-to-end tests are flaky by nature. So many things that aren’t bugs can cause a test to fail: network hiccups, changes in the test environment, and problems with third-party APIs will all throw false positives. When tests are designed to be short and quick with minimal dependencies, there are fewer opportunities for something to go wrong, which means fewer false positives to investigate.
  • Faster run times
    Narrowly-written tests finish faster, which means that if you’re concerned about bugs in one specific area of the application, you can run just those tests and get the results you need. There’s a reason Netflix has a “skip intro” button: you just want to get to the good stuff.
  • Faster bug spotting
    When a complex, compound test fails, someone has to go in and figure out where it failed before you can determine if it was a bug or a flake. There’s just more to wade through. We write narrowly-focused tests so that when one fails we know exactly where things went wrong from the test name alone.
  • Simpler maintenance
    Products grow and change, and those changes aren’t always communicated ahead of time. When a change breaks a narrowly-focused test, you don’t lose whole areas of coverage while it’s being fixed. It’s also easier to update smaller tests and keep the test suite free from outdated or unnecessary tests.
  • Easier long-term support
    Just as products change, so do team members. When tests are kept small, it’s easier for a new person to come in and figure out what was going on. 
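
Here’s what a narrowly-focused test might look like in Playwright (which matches the selector syntax used later in this post). The URL and selectors are hypothetical; the point is that the test checks exactly one action, logging in, and nothing else:

const { test, expect } = require('@playwright/test');

test('user can log in', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.fill('#email', 'wolf@example.com');
  await page.fill('#password', 'not-a-real-password');
  await page.click('#login-button');
  // One assertion, one behavior under test.
  await expect(page.locator('#dashboard')).toBeVisible();
});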

Prevent test collisions 

When running tests in parallel, like we do at QA Wolf, it’s possible that one test could change a piece of data or site configuration while another test is running, causing one or more tests to fail and creating a lot of false-positive noise. There are a couple of ways to avoid collisions, but you have to decide what trade-offs work best for the environment you’re testing in and your testing goals.

Like everything else in life, there’s no single right answer for every test suite and environment. QA Wolf works with QA teams, developers, and SDETs to find the right mix of accounts and in-test triggers to balance speed with reliability.

Use multiple testing accounts

The benefit of having multiple test accounts is that they can each interact with the same systems at the same time. The downside is a lot more data being created and manipulated. Test environments aren’t normally as robust as production, so you may need to prepare your testing environments to handle the additional data and traffic.
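
One way to do this, sketched here in Playwright, is to key the choice of account to the parallel worker running the test. `testInfo.workerIndex` is a real Playwright value, but the account list and login flow are hypothetical:

const { test } = require('@playwright/test');

const accounts = [
  { email: 'wolf1@example.com', password: 'pass-one' },
  { email: 'wolf2@example.com', password: 'pass-two' },
];

test.beforeEach(async ({ page }, testInfo) => {
  // Each parallel worker gets its own account, so tests don't collide.
  const account = accounts[testInfo.workerIndex % accounts.length];
  await page.goto('https://app.example.com/login');
  await page.fill('#email', account.email);
  await page.fill('#password', account.password);
  await page.click('#login-button');
});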

You can minimize the impact of data bloat with self-cleaning tests, which I’ll get to below. 

Pace out your test runs 

If your test environment is just too wobbly to run hundreds of tests at the same time, simply pace them out so they run in batches. Alternatively, you could have one test trigger the next. The test suite will take longer to run, but for most teams, 80%+ test coverage is more valuable than truly continuous deployment. 
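
If you’re using Playwright, pacing can be as simple as capping the worker count in the config file. The numbers here are assumptions; tune them to what your environment can handle:

// playwright.config.js
module.exports = {
  workers: 5,   // run at most 5 tests at a time
  retries: 1,   // absorb the occasional environment hiccup
};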

Run clean up steps to minimize junk data in the test environment 

Automated tests can generate a lot of data. If you let it accumulate, it will quickly overwhelm the test environments. That’s why QA Wolf writes self-cleaning tests. Before these tests do anything else, they run through code that deletes any data that a previous test run might have left behind. And when they’re done, they delete anything they created. Just like hiking in a national park, we prefer to leave the testing environment cleaner than we found it.

Have each run create and delete the necessary data

Let’s look at an example: A social media company wants to test their post-editing functionality. A QA Wolf test would create a new post first rather than hunt for an existing one, which makes the test more reliable because it always knows what it’s working with. But if we didn’t delete that post at the end, endless duplicates would clog up the database.

While it’s easier to use the UI for all this, you may need to have your tests access an API or admin panel depending on your unique situation.
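
Here’s a sketch of that self-cleaning post-editing test. The `createPost` and `deletePost` helpers are hypothetical stand-ins for whatever API or admin access your application provides, and all the selectors are made up:

const { test, expect } = require('@playwright/test');
const { createPost, deletePost } = require('./helpers/api'); // hypothetical helpers

test('user can edit a post', async ({ page }) => {
  // Create the data we need rather than hunting for an existing post.
  const post = await createPost({ body: 'Original text' });
  try {
    await page.goto(`https://app.example.com/posts/${post.id}`);
    await page.click('#edit-post');
    await page.fill('#post-body-input', 'Edited text');
    await page.click('#save-post');
    await expect(page.locator('#post-body')).toHaveText('Edited text');
  } finally {
    // Delete what we created, even if an assertion failed above.
    await deletePost(post.id);
  }
});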

Clean up before you test

Sometimes a test will fail before it can clean up the data it created; that’s just the nature of automated tests. To avoid collisions the next time around, each of our tests looks for (and deletes, if necessary) data that shouldn’t be there. That creates a clean slate on which to run and a consistent, reliable result each time. This is also essential for high-quality bug reports: a bug should be reproducible by any user, free from possible errors that come from duplicate data, long load times, and mismatched filters.
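
A minimal sketch of that pre-test cleanup, again with hypothetical API helpers:

const { test } = require('@playwright/test');
const { findPosts, deletePost } = require('./helpers/api'); // hypothetical helpers

test.beforeEach(async () => {
  // Delete anything a previous failed run may have left behind.
  const leftovers = await findPosts({ body: 'Edited text' });
  for (const post of leftovers) {
    await deletePost(post.id);
  }
});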

Use resilient HTML attributes

The one thing you can count on in software testing is that the product will change and your tests will break. To prevent minor changes from wreaking havoc on your test suite and reduce the amount of triaging and maintenance, use the most stable selectors you can. 

As a rule of thumb, any element that users interact with, or that might carry a meaningful state, should have one of the stable attributes described below. And if there’s a critical mass of them throughout the application, they can be used as “anchors” for a relative path to neighboring, ancestor, and child elements.
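
For example, with a stable (hypothetical) ID on a form, you can reach descendants that have no stable attributes of their own:

const checkoutForm = page.locator('#checkout-form');      // stable anchor
const submitButton = checkoutForm.locator('button');      // child of the anchor
const orderTotal = checkoutForm.locator('.order-total');  // another descendant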

Unique element IDs make the best selectors

As the name implies, HTML element IDs identify specific HTML elements. The HTML specification says that all element IDs should be unique, and when used properly they make the code much more testable and much more reliable. 
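
When a unique ID exists, the selector is a one-liner (the ID here is hypothetical):

const checkoutButton = page.locator('#checkout-button');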

But that’s not what usually happens in practice. Many developers just don’t use unique IDs because browsers are so good at recovering from any errors. Or they may simply miss a duplicated ID created by a loop or by conditionally rendered elements, which is fairly common with React and the Material UI library.

Aria-labels make a good back-up

Aria-labels are element attributes for accessibility. They provide additional context for screen readers and other assistive technologies to help people with disabilities navigate the web. The HTML specification is a lot less strict about each of these being unique, but in our experience they tend to be, and they’re less likely to change, which we love. We normally see the `aria-label` attribute being used because it has no external dependencies (unlike `aria-labelledby`, which points at another element’s ID). Again, not as good as unique element IDs, but more stable than the alternatives.
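
A hypothetical example: a label added for screen readers doubles as a sturdy handle for the test.

<button aria-label="Close dialog">✕</button>

const closeButton = page.locator('[aria-label="Close dialog"]');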

Data-* attributes that are specific for QA testing

Critics might argue there’s a real-world cost to having aria-labels all over the codebase. We'd argue that the overall impact is positive, but for the purists who maintain test attributes in complete isolation, there's the data-* attribute, which allows developers to embed custom data into HTML elements. Like unique element IDs and aria-labels, a `data-test-id` (or `data-testid` if you’re saucy) provides a solid handhold for automated tests to grab onto. 
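
A hypothetical example:

<button data-test-id="submit-btn">Submit</button>

const submitButton = page.locator('[data-test-id="submit-btn"]');

If you’re on Playwright and use the default `data-testid` attribute, the built-in `page.getByTestId('submit-btn')` shorthand does the same thing (the attribute name is configurable via the `testIdAttribute` option).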

CSS selectors make your test code simple and sweet

When the product uses stable and reliable attributes, you’re all set. But as any QA engineer knows, the codebase is rarely as well organized as we would like. When you’re testing in the real world, more complex CSS selectors, known as substring-matching attribute selectors, can help you target specific elements more accurately and make your tests more resilient to changes in the UI.

The caret (^) symbol 

Matches elements whose attribute value begins with a given string. This could be helpful when all button IDs start with their action and the rest is dynamically generated. For example, let’s say this is the markup for a “Submit” button:

<button data-test-id="submit-btn_6473286547382">Submit</button>

Since we know that the Submit button will start with ‘submit’ we can use the caret to match for that prefix like this:

const submitButton = page.locator('[data-test-id^="submit"]');

Much easier to read and generally less affected by changes in the code. 

The dollar ($) symbol

Matches elements whose attribute value ends with a given string. This solves the opposite problem, when the important part of the value is at the end. Take this:

<button data-test-id="action_67382568-submit">Submit</button>

You can use the $ modifier to shorten the name of the selector thusly:

const actionButtons = page.locator('[data-test-id$="submit"]');

Oooh. Ahhh. 

The asterisk (*) symbol

Matches elements whose attribute value contains a given substring. This super modifier is helpful when the important part of the value is jammed in between a bunch of other stuff, like this:

<button data-test-id="btn_6473286547382-submit_43875">Submit</button>

Here’s the selector using the * symbol to simplify that element name: 

const submitButton = page.locator('[data-test-id*="submit"]');

Way to go, *! 

Avoid location-based selectors 

A lot of web apps dynamically generate element names and IDs in the DOM tree as the page loads, so many people fall back on location-based selectors. Don’t be that person. Location-based selectors are flaky by nature because the DOM isn’t the same on every load. And because the location is constantly changing, the tests are difficult to triage and slower to run. These are the selectors to avoid.

Selectors defined relative to a position within a parent element change too frequently to be useful, and text-based selectors will lead to false positives if there’s more than one instance of the asserted text on the page.

.first(), .last(), and :nth-child() selectors

These do not refer to anything static on the page. They’re defined relative to the position of a parent or other element, and that relative position can change even mid-test, causing false positives and flakes.
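
A quick before-and-after, with a hypothetical stable attribute standing in for the position-based selector:

// Flaky: breaks as soon as a new row is prepended to the list.
const firstInvoice = page.locator('.invoice-row').first();

// Stable: survives reordering because it targets the row itself.
const marchInvoice = page.locator('[data-test-id="invoice-march"]');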

The parent/:text selector combination

Sometimes you have to work with what you’ve got. This selector will work if the text content is stable and unique, but the combination targets an element based on its relationship to a parent element — which still creates a risk that things will move unexpectedly and break the test.

`:text` selectors specifically are also incredibly vague. They apply to every matching element on a page and aren’t case-sensitive. For instance, `:text("Purchase")` could match the page’s header (as in, “Select something to purchase”) or a button on the page with the text “Purchase item.”

You can improve the reliability somewhat with a selector like `button:has-text("Purchase")`, but it’s best just to use element IDs wherever possible.
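
If you’re using Playwright, a role-based locator is another step up from raw text matching, since it constrains both the element type and its accessible name:

const purchaseButton = page.getByRole('button', { name: 'Purchase item' });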

Now you’re ready to write tests like a QA Wolf! 🐶

It takes work and planning to build a comprehensive test suite that runs quickly and reliably. Follow our advice and you can have a suite with thousands of tests running every time a developer deploys. 

Or make life easier on yourself: Have QA Wolf handle your testing needs! We’ll build out comprehensive coverage in less than 4 months and handle all the maintenance for you. You won’t even have to think about flaky tests — we investigate each failure, and only pass the verified bugs into your issue tracker.

Schedule a demo below and we’ll talk about your needs.
