The QA Wolf guide to continuous deployment

The roadmap to safe, sustainable continuous deployment

Level 0: What even is a CI/CD pipeline?

Level 1: Unit and integration tests run before merging

Level 2: Automatic deployment to a stable, pre-production environment

Level 3: Concurrent E2E regression tests for 80% of workflows

Level 4: Automatic deployment to production

Level 5: PR validation in ephemeral preview environments

Let’s talk about your continuous deployment pipeline

The roadmap to safe, sustainable continuous deployment

Because there are better ways of releasing software than “move fast and break things.” That might work in the fast and loose start-up days, but once you’ve found your market, your users expect a certain level of reliability. Continuous deployment speeds up developers but bakes in safety and sustainability.

Teams practicing continuous deployment reap all manner of benefits:

Fewer bugs
Increased number of deployments
Faster recovery from production incidents
Higher job satisfaction
Better organizational culture
Greater levels of psychological safety
More mission-driven culture

Testing is the key

On top of the system checks, the CI/CD pipeline executes automated tests that determine if the build can go to production. Stringent tests give teams confidence that an application is production-ready.

When you make testing the center of your approach, you’ll be able to increase your speed sustainably; if you just focus on the quickest way to release to production, you’ll hit a wall because you’ll release bugs that need constant fixing.

📍You are here

Every team and company evolves a DevOps process that fits their working style, tech stack, and budget, but generally, we see a Crawl → Walk → Run evolution. No matter where your team is on that journey, this guide will help you reach the next level and the one after that.

Feel free to start at the top or skip ahead to the description of your pipeline.

Level 0: What is a CI/CD pipeline?
🙃 ‍
Level 1: Unit and integration tests run before merging
Also known as continuous integration and the starting point for continuous delivery and deployment.
Level 2: Automatic deployment to a stable, pre-production environment
A staging or preview environment that's as close to production as possible.
Level 3: Concurrent end-to-end regression tests for 80% of workflows
If your test suite takes less than an hour, then congratulations on reaching continuous delivery.
Level 4: Automatic deployment to production
If your fully-tested build reaches production with no human involvement, you're a continuous deployment shop. Good on ya!
Level 5: PR validation in ephemeral preview environments
If you have ephemeral environments for lambdas you can use the same technology to shift left. If you don't have them, here's how to get them.

Level 0: What even is a CI/CD pipeline?

Modern software teams use a CI/CD pipeline to manage an application build’s progress. In the continuous deployment model, the pipeline guides the build all the way to production through a series of automated checks and gates. In contrast, continuous integration and continuous delivery have similar checks, but the team manually advances the build through each.

Regardless of the model, the CI/CD pipeline consists of various system checks that work incrementally. Generally speaking, the pipeline waits to advance to the next check until it performs the current one successfully.

The CI/CD pipeline has four main phases:

Run unit and integration tests

The pipeline checks out the repository with the merged branch from the SCM (source-code manager, like GitHub or GitLab). Then, the build tool pulls all relevant dependencies and, if necessary, compiles the application. Once the pipeline builds the application, it executes unit and component integration tests. Test failures cause the pipeline to stop.

When all the tests have passed, the pipeline creates an application package (e.g., an archive file or a container). When successful, the pipeline labels any artifacts and pushes them to an artifact repository or archive, such as Artifactory or ECR.

Teams that have this in place are practicing continuous integration.

Deploy

When the application is successfully packaged and archived, the pipeline pulls the artifacts, deploys them to the target environments, and attempts to start the application.

Run E2E tests

This phase is arguably the most crucial stage in the continuous deployment pipeline. For continuous deployment to work, the application released to users must be stable and defect-free. The lower-level testing done up until this point has validated that the new code is functional, but only E2E testing can validate that the new code plays nice with what’s already live.

Teams that can automatically deploy and immediately run E2E tests in a pre-production environment but stop short of deploying to production can say they practice continuous delivery.

Release

Once this last testing phase has passed, teams have a choice about how to deploy the application to production. The critical distinction is whether this is done automatically or manually.

Teams with such an automated progression can say they practice continuous deployment because there’s no human intervention once the code is committed.

Continuous deployment doesn’t define the specific deployment approach—blue/green, canary, dark, shadow, recreate, A/B, etc.—and each team will decide which approach is best for them.
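As a rough illustration of the gating behavior described in this section, the sketch below runs a list of stages in order and stops at the first failure. The Stage class, the run_pipeline function, and the stage names are hypothetical stand-ins for this example, not the API of any particular CI system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage's checks pass


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failing stage."""
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            # A failed gate halts the pipeline; later stages never start.
            return False, completed
        completed.append(stage.name)
    return True, completed


# Hypothetical stand-ins for the four phases described above.
stages = [
    Stage("unit-and-integration-tests", lambda: True),
    Stage("deploy", lambda: True),
    Stage("e2e-tests", lambda: False),  # simulate an E2E failure
    Stage("release", lambda: True),
]

ok, completed = run_pipeline(stages)
print(ok, completed)
```

Because the simulated E2E stage fails, the release stage never runs. In a continuous deployment setup every gate is automated; in continuous delivery, the final gate would wait for a person instead.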

Level 1: Unit and integration tests run before merging

What it looks like

Each commit to your application’s repository triggers a build of the software. With the build complete, the pipeline runs unit and component integration tests that provide feedback in a few minutes.

Best practices

Teams just starting with continuous integration these days execute unit and integration tests on the build server. Many cloud-based SCMs, such as GitHub, have add-ons teams can use (e.g., GitHub Actions) to accomplish this task.

Standalone CI cloud services like CircleCI can work for teams whose SCMs don’t include CI add-ons.

Packaging is a separate set of complexities, and teams typically tailor their approach to their tech stack.

Cloud services like those mentioned above almost certainly have APIs that teams can use to package their application.

And, of course, containers are all the rage, and for good reason. They require a little setup but make it much easier for teams to execute their applications.

👉 Read more: Continuous Integration (CI) Explained

👉 Read more: Continuous Integration

Definition of done

Congratulations! You’ve reached continuous integration. 🎉

You should now have the following elements in your pipeline:

An automated build process.
A suite of automated tests.
A CI system that runs the build and automated tests on every check-in.

You’re ready to ensure all new product changes work as intended with all your live dependencies before they hit production.

Level 2: Automatic deployment to a stable, pre-production environment

What it looks like

Once the pipeline runs the unit tests and builds and uploads the package to the repository, it downloads that application package from the artifact repository onto the target instance in the pre-production environment and starts the application.

Best practices

Bottom line: teams should test in an environment that is as production-like as possible.

Roadmap

Usually, teams start in a staging environment that is stable enough for them to run their E2E tests regularly without false positives. From there, teams step up to deploying to ephemeral preview environments.

In the interest of building testing into the DNA of your DevOps process, we suggest starting slowly. We’ve encountered teams that want to go straight to production for testing, and while we admire their enthusiasm, we encourage teams to test production safely.

Considerations when testing in staging

Testing in an isolated staging/pre-production environment is an excellent way to start practicing continuous deployment. Teams can make staging environments stable enough for them to run E2E tests against deployed target applications; for example, teams might consider limiting automatic deployment only to those applications that meet specific coverage requirements or deploying to a dark pool where they can test the application before it starts receiving traffic.

The main drawback of testing on staging environments, apart from their potential instability, is that there are always bound to be some differences between staging and production. Some of those differences are expected and put in place specifically for testing purposes.

The staging environment is (probably) under-provisioned
Which might mean that you have to run fewer tests simultaneously. Throttling your test suite will slow down your deployment and impact velocity across the organization.

It’s almost certainly under-attended
If staging uses older versions of software than production, or there are outages with some of the dependencies, it will affect how the tests perform and could mask defects that will later appear in production.

Data may be different
And not always a good reflection of the data in production.

Access permission levels could affect what tests pass
The DevOps people probably use the principle of least privilege, which says (in this case) that the relatively free-for-all staging environment will be more permissive than production, which is secured for conducting business.

That said, teams should strive to minimize those differences where feasible, lest the staging environment become too uncontrollable and impede the team’s progress to continuous deployment.

Considerations when testing with preview environments

From a testing perspective, preview environments are the best of both staging and production environments. They’re isolated from the user, like staging. Still, they can be provisioned to use the same configurations and resources as production environments, and they're ephemeral—created and destroyed as needed—saving money in the long run. That said, there are some challenges with setting up preview environments.

Time and resource needs
Preview environments require significant time and resources to create and maintain.

You’re racing the clock
To be cost-effective, preview environments must be torn down instantly or given a time-to-live (TTL), which will be used to tear down the environment after a specified amount of time, typically a few hours. Any implementation of a TTL strategy should be tested to confirm it removes all data associated with the instance to avoid cost overruns.
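A minimal sketch of the TTL check, assuming each preview environment records its creation time and its TTL. The PreviewEnv class and the expired function are hypothetical names for this example, and the teardown step itself (which should also remove the environment's data) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class PreviewEnv:
    name: str
    created_at: datetime
    ttl: timedelta


def expired(envs: List[PreviewEnv], now: datetime) -> List[PreviewEnv]:
    """Return the environments whose time-to-live has elapsed."""
    return [env for env in envs if now - env.created_at >= env.ttl]


# Example with a fixed clock so the result does not depend on when it runs.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
envs = [
    PreviewEnv("pr-101", now - timedelta(hours=5), timedelta(hours=4)),
    PreviewEnv("pr-102", now - timedelta(hours=1), timedelta(hours=4)),
]
to_tear_down = expired(envs, now)
```

A scheduled job could run a check like this and hand the expired environments to whatever teardown mechanism the team uses.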

The tooling for preview environments is still young
Setting up your application to integrate with the environment may require additional work. You may need to build systems for configuration management, service discovery, secrets management, test-data management, and dependency management for preview environments to work automatically.

👉 Read more: Deployment automation

Definition of done

You should now have the following elements in your pipeline:

Universally deployable packages created by the continuous integration (CI) process
Scripts to configure the environment, deploy the packages, and perform a deployment test (sometimes known as a smoke test)
Environment-specific configuration information

Now that the team has the environment to test in, you’re ready to start gaining the confidence that your application is production-ready.

Level 3: Concurrent E2E regression tests for 80% of workflows

What it looks like

Once the pipeline deploys the application on the target environment, it executes a set of meaningful, reliable, and robust E2E regression tests. The pipeline stops just before deploying the application to production.

Best practices

You want 80% of your application workflows covered by E2E tests.

Those tests should take 30-60 minutes to complete.

Use full test parallelization with QA Wolf to run your tests quickly.

Do not use a sharded (or per-node) approach because it’s not fast enough for continuous delivery without the team skimping on coverage. If the team skimps on coverage, bugs are bound to escape to production.
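To make the difference concrete, here is a back-of-the-envelope estimate. It assumes every test takes the same time and ignores start-up overhead, and the numbers are illustrative only, not measurements. The wall_clock_minutes function is a hypothetical helper.

```python
import math


def wall_clock_minutes(num_tests: int, minutes_per_test: float, workers: int) -> float:
    """Rough estimate: tests run in waves of `workers` at a time."""
    waves = math.ceil(num_tests / workers)
    return waves * minutes_per_test


# Illustrative numbers: 400 tests at 5 minutes each.
sharded = wall_clock_minutes(400, 5, workers=10)          # 40 waves of 10 tests
fully_parallel = wall_clock_minutes(400, 5, workers=400)  # a single wave
print(sharded, fully_parallel)
```

Under these assumptions, ten shards need 200 minutes, well past the 30-60 minute target, while one worker per test finishes in roughly the time of a single test.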

Roadmap

Start with simple smoke tests. You can use these later when you start testing in production. Build up to a few more extended acceptance tests that exercise critical functionality. Keep up with test maintenance while expanding the test suite to cover new workflows.

Once the team has mapped out all their workflows and automated at least 80% of those flows with a meaningful, reliable, and robust set of easily maintained tests that don’t yield false positives, they’ve probably noticed that executing those tests takes too long. From there, they can move on to full test parallelization. Once the team has all the tests running in parallel in 30-60 minutes, they are ready for automatic deployment to production.

Strategies for getting meaningful, reliable, robust coverage

Write robust, easily maintainable tests

You will never eliminate test maintenance—even (especially) with self-healing AI—but you can take steps to reduce the maintenance burden.

Prevent test collisions
When running tests in parallel, there’s a risk that two or more tests can attempt to access or modify the same data, causing one or both to fail. Using unique data per test will prevent these collisions.
Add clean-up steps
Frequently, E2E tests fail because the system is not in the expected state. Have each test run clean-up steps at the beginning and end to ensure the system is always ready for deployment.
Use resilient locators
As part of a culture of quality, developers should ensure that there are element attributes with a unique value for that state of the DOM.
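The first two practices above can be sketched together: each test generates its own data and cleans up before and after it runs. FakeUserStore is a hypothetical in-memory stand-in for the application under test, and unique_email and signup_test are illustrative names. Locators are not covered here.

```python
import uuid
from typing import Dict


class FakeUserStore:
    """In-memory stand-in for the application under test (hypothetical)."""

    def __init__(self) -> None:
        self.users: Dict[str, str] = {}

    def create(self, email: str, name: str) -> None:
        if email in self.users:
            raise ValueError(f"duplicate user: {email}")
        self.users[email] = name

    def delete(self, email: str) -> None:
        self.users.pop(email, None)


def unique_email(prefix: str = "e2e") -> str:
    """Generate data that no other concurrent test will reuse."""
    return f"{prefix}-{uuid.uuid4().hex}@example.test"


def signup_test(store: FakeUserStore) -> bool:
    email = unique_email()
    store.delete(email)  # clean-up before: harmless if nothing is there
    try:
        store.create(email, "Test User")
        return store.users[email] == "Test User"
    finally:
        store.delete(email)  # clean-up after: leave no data behind
```

Because every run uses a fresh email address and removes it afterwards, the same test can run many times, or alongside other tests, without colliding.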

Flake detection

Even the best-built tests have the potential to flake. Environment issues, network goofs, and inexplicable gremlins are part of testing. Automatically re-running failing tests will significantly reduce the human investigation work your team needs to do and help your test automators focus on actual bugs and test maintenance.

Some teams use flake detection to rerun failing tests automatically. Calculations for determining a flake in automation can get pretty nuanced, but suffice it to say that any test should pass after a couple of retries without human intervention.
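One simple way to express the retry idea, as a sketch only: rerun a failing test a couple of times and label it a flake if a retry passes. Real flake detection is more nuanced, as noted above. The classify function and its labels are hypothetical.

```python
from typing import Callable


def classify(test: Callable[[], bool], max_retries: int = 2) -> str:
    """Return 'pass', 'flaky', or 'fail' for a single test."""
    if test():
        return "pass"
    for _ in range(max_retries):
        if test():
            # Failed at first but passed on a retry: likely a flake, not a bug.
            return "flaky"
    # Failed every attempt: worth a human investigation.
    return "fail"
```

Only the tests labeled "fail" need a person to look at them; the "flaky" ones can be tracked over time to see which tests need maintenance.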

👉 Read more: Flaky coverage is fake coverage

AAA framework

Arrange, Act, Assert is a pattern for writing narrowly focused tests that are easy to understand, write, and debug. Maintenance is more sustainable with smaller tests. AAA provides a standard structure for tests, allowing teams to identify gaps in coverage and isolate bugs quickly. If everyone uses it, it can become a lingua franca, enabling team members to communicate more effectively about test structure.
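Here is what the pattern can look like in a small test, using Python's built-in unittest module. The Cart class is a hypothetical example, not part of any real application.

```python
import unittest


class Cart:
    """Tiny hypothetical class under test."""

    def __init__(self) -> None:
        self.items = []

    def add(self, name: str, price_cents: int) -> None:
        self.items.append((name, price_cents))

    def total_cents(self) -> int:
        return sum(price for _, price in self.items)


class CartTotalTest(unittest.TestCase):
    def test_total_sums_item_prices(self) -> None:
        # Arrange: put the system into a known starting state.
        cart = Cart()
        cart.add("notebook", 450)
        cart.add("pen", 150)

        # Act: perform the single behavior under test.
        total = cart.total_cents()

        # Assert: check the one outcome this test cares about.
        self.assertEqual(total, 600)
```

Keeping one Act step per test is what keeps the test narrow: when it fails, there is only one behavior to investigate.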

Considerations for full test parallelization

For teams who want to build their own fully parallel testing system, there are three primary areas of concern: design, infrastructure, and test concurrency.

Design and Infrastructure

There is no one way to design and build the infrastructure teams need to run their tests fully in parallel. But, tracking down information on how to build it can be frustrating because there aren’t many organizations doing it. We’re happy to share a few things about our system, and you can check that out below.

Test concurrency

If the team wants to run all of their tests concurrently, they need to write those tests in a particular way. Tests must be isolated so they don’t share data or state with other concurrent tests. They also need to be autonomous and not dependent on other tests. Lastly, they should be built in such a way that they return the same result every time the pipeline executes them; in other words, they need to be idempotent.
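The sketch below shows these properties on a toy scale, using Python's standard thread pool. Each test builds its own state, depends on no other test, and returns the same result on every run. The make_test and run_all functions are hypothetical and stand in for a real test runner.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_test(name: str) -> Callable[[], bool]:
    def test() -> bool:
        # State is created inside the test, so nothing is shared between tests.
        state = {"owner": name, "count": 0}
        state["count"] += 1
        return state == {"owner": name, "count": 1}

    return test


def run_all(tests: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Start every test at once, one worker per test, and collect results."""
    with ThreadPoolExecutor(max_workers=max(1, len(tests))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tests.items()}
        return {name: future.result() for name, future in futures.items()}


tests = {f"test-{i}": make_test(f"test-{i}") for i in range(20)}
first_run = run_all(tests)
second_run = run_all(tests)
```

Running the suite twice and getting identical results is a quick check that the tests do not depend on execution order or on leftovers from a previous run.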

Preparing your environment for parallelization

Keep in mind that the environment you’re testing in has to be able to handle the volume. When the team runs all their tests in parallel, the system needs to handle the load; otherwise, the team will end up with test failures and probably some other problems.

For more on that, go back to testing in a staging environment.

Definition of done

Congratulations! You’ve reached continuous delivery. 🎉

Your team should now have the following elements in the pipeline:

A set of meaningful, reliable, robust E2E regression tests
A method for running all these tests so they are completed in 30-60 minutes
Dashboards where teams can easily view build progress

Teams that make it here are ready for pain-free releases and increased deployment frequency.

Level 4: Automatic deployment to production

What it looks like

Once the pipeline has executed the E2E tests and they’ve passed, it deploys the same application artifact onto the production environment. From there, it automatically executes another set of tests. If those are successful, it performs actions to direct some traffic to the new application version.
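The flow above, together with automatic rollback when the post-deployment tests fail, might be sketched like this. The release function, its callback parameters, and the 10% starting traffic are all assumptions for illustration; the real steps depend on the team's deployment tooling.

```python
from typing import Callable


def release(
    deploy: Callable[[], None],
    post_deploy_tests: Callable[[], bool],
    shift_traffic: Callable[[int], None],
    rollback: Callable[[], None],
    initial_percent: int = 10,
) -> str:
    """Deploy, verify in production, then route traffic or roll back."""
    deploy()
    if not post_deploy_tests():
        # The new version never receives user traffic if its tests fail.
        rollback()
        return "rolled-back"
    shift_traffic(initial_percent)
    return "serving"
```

Passing the steps in as callables keeps the decision logic separate from the tooling, so the same flow can be exercised in a test with fake steps.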

Best practices

Teams should execute regular health checks every minute or so in production to ensure the applications are running smoothly.
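A minimal sketch of such a check loop, assuming the team supplies a probe function that returns True when the application is healthy. The monitor function, the three-failure threshold, and the max_checks limit are hypothetical choices for this example.

```python
import time
from typing import Callable


def monitor(
    probe: Callable[[], bool],
    on_unhealthy: Callable[[], None],
    interval_seconds: float = 60.0,
    failure_threshold: int = 3,
    max_checks: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe at a fixed interval; alert after consecutive failures."""
    failures = 0
    for i in range(max_checks):
        if probe():
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= failure_threshold:
                on_unhealthy()
                return False
        if i < max_checks - 1:
            sleep(interval_seconds)
    return True
```

Requiring several consecutive failures before alerting is one way to avoid paging someone over a single network blip; the sleep parameter is injectable so the loop can be tested without waiting.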

Roadmap

Before releasing a feature directly to production, teams should start with automated sanity tests. As the team improves, they build up to automated acceptance tests.

Testing strategies for production environments

The production environment is tied to business systems, like accounting or shipping systems, so your test suite needs to be able to run transactions without generating a lot of noise for other parts of the business.

Avoid business systems with “harmless” smoke tests
Running tests that don’t impact back-end business systems conveniently avoids creating noise in other parts of the business. Still, those workflows are often the most critical to revenue and the user experience—not the kinds of things you want to leave untested.

Work around business systems with blue/green deployment
In a blue/green deployment pipeline, there are two identical production applications: one receiving internal traffic and one receiving external traffic. Public traffic is switched over to the fully-tested pool when testing is complete. This strategy also allows for fast recovery since the pools can be swapped out easily until a new release candidate arrives.

Chaperone business systems with synthetic transactions
Synthetic (fake) transactions are like continuously-executing tests. The team can monitor log outputs for these transactions as new releases are deployed and watch for defects.

Hide business systems with feature flagging
Feature flagging allows teams to push features into production in a disabled state. Teams can turn on flagged features for a limited set of customers, such as test users. Feature flagging can enable canary releases, ramped releases, and A/B testing.
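As one example from the strategies above, a feature flag check might look like the sketch below. It assumes a flag can be switched on for an explicit list of test users and for a percentage of everyone else. The is_enabled function and the hashing scheme are illustrative, not the API of any particular flagging product.

```python
import hashlib
from typing import AbstractSet


def is_enabled(
    flag: str,
    user_id: str,
    rollout_percent: int,
    allowlist: AbstractSet[str] = frozenset(),
) -> bool:
    """Decide whether a flagged feature is on for this user."""
    if user_id in allowlist:
        return True  # e.g., test users always see the feature
    # Hash the flag and user together so each user lands in a stable bucket.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent
```

With rollout_percent at 0, the feature ships disabled for everyone except the allowlist. Raising the percentage gives a ramped release, and because buckets are stable, users who already have the feature keep it as the percentage grows.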

👉 Read more: Testing in Production: The safe way

Definition of done

Congratulations! You’ve reached continuous deployment. 🎉

At this point, your pipeline should have the following elements:

Automatic deployment of the application artifact to the production environment after all E2E tests have passed
Automatic testing once the application is in production and automatic application rollback if any of those tests fail
One or more mechanisms for shielding customers from the risks of new applications

It would seem like releasing to production after the E2E test phase would be the end of the team’s journey. But teams that continuously improve keep going by shifting left.

Level 5: PR validation in ephemeral preview environments

What it looks like

Once a pull request is created for a new feature, a series of tests and checks are executed against the feature branch. If those checks pass, the pipeline merges the feature, triggering the CI/CD pipeline.

Best practices

Teams should implement PR validation. To do so, they create an additional pull-request pipeline, including testing and static analysis.

Testing results should be published to the pull request so they’re available to reviewers.

Any failures in this pipeline should block the pull request to prevent merging it until the developer resolves the failures.

To run E2E tests in the pull-request pipeline, teams should create a build artifact from the feature branch and deploy it to ephemeral preview environments because these environments are isolated and stable.
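The blocking rule and the published results can be sketched as a small merge gate. The check names, merge_decision, and summary_comment are hypothetical; a real setup would rely on the SCM's own required-checks mechanism.

```python
from typing import Dict, List, Tuple


def merge_decision(checks: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Allow the merge only when checks were reported and all of them passed."""
    failed = sorted(name for name, passed in checks.items() if not passed)
    mergeable = bool(checks) and not failed
    return mergeable, failed


def summary_comment(checks: Dict[str, bool]) -> str:
    """Build a plain-text summary that could be posted to the pull request."""
    lines = [
        f"{'PASS' if passed else 'FAIL'} {name}"
        for name, passed in sorted(checks.items())
    ]
    mergeable, _ = merge_decision(checks)
    lines.append("Merge allowed" if mergeable else "Merge blocked")
    return "\n".join(lines)


checks = {"unit": True, "static-analysis": True, "e2e-preview": False}
decision = merge_decision(checks)
comment = summary_comment(checks)
```

Treating an empty set of results as a block is a deliberate choice in this sketch: a pull request with no reported checks has not been validated.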

Testing pre-merge

Teams typically deploy the feature branch on a preview environment because such environments are isolated and stable. If they want to test before merging, teams need to have a reasonably sophisticated system in place, which might include an orchestrator, a configuration discovery mechanism, a secrets management mechanism, etc.

The advantage of pre-merge testing is that it allows developers to audition changes and increase their confidence before closing their eyes and pushing the button that kicks off the chain reaction and sends their code change straight to production.

Definition of done

Your team can declare victory when you have:

A pre-merge pipeline that triggers when a developer opens a pull request
Ephemeral preview instances
A mechanism to build an artifact from feature branches
A mechanism to deploy said builds onto preview instances and configure them correctly

Let’s talk about your continuous deployment pipeline

QA Wolf provides a complete solution for teams who want to get to continuous deployment with comprehensive E2E testing. When you let us handle the creation, execution, and maintenance of your E2E tests, you’ll free up cycles on your team that will allow them to start delivering higher-quality releases more quickly.

the complete and indispensable guide to reaching continuous deployment

“Continuous delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes, and experiments—into production, or the hands of users, safely and quickly in a sustainable way.” —Jez Humble
