The three main challenges of testing Salesforce, explained

John Gluck
Lauren Gibson
June 4, 2025

Q: What’s worse: doing your taxes or waiting at the DMV?

A: Testing Salesforce.

Your testers aren’t exaggerating. Testing Salesforce apps, whether manually or automated, is painstaking compared to testing other apps. Features like the shadow DOM and extremely configurable UIs aren’t just hard to test around, they affect how you build, run, and maintain tests at every level. 

Let’s break down where those challenges show up, why they matter, and how they push against automation strategies that work fine elsewhere.

Challenge 1: Salesforce’s DOM structure works against test reliability

Stable end-to-end tests depend on the ability to locate UI elements predictably. Salesforce makes that hard. Its DOM structure is massive, opaque, and constantly shifting based on user state or data. Test frameworks have to traverse DOMs that contain layers of templates, shadow roots, and conditionally rendered elements. That slows tests and raises the risk of failures from mismatches, over-matches, or elements that aren’t visible when needed—especially with XPath or dynamic selectors.

No access to static locators

QA engineers know that the most reliable way to identify elements in a test is with custom attributes like data-testid. These attributes are static, uniquely scoped, and decoupled from layout changes—ideal for automation.

But Salesforce doesn’t expose them. Most of its DOM elements are generated dynamically, and developers don’t have a way to inject persistent IDs or test-specific attributes. That forces teams to rely on less stable fallback options.

Ever-changing DOM makes it difficult to write stable CSS selectors

In typical web applications, CSS selectors are the second-best option. They can target elements based on tag names, classes, or hierarchy, and are generally fast to evaluate.

But Salesforce complicates that too. Elements built from managed packages or custom components often include auto-generated prefixes in their id or class attributes. These prefixes can differ between environments and between updates, which means selectors have to be rewritten for each environment after each release.

On top of that, Salesforce components are built from reusable templates. These templates can appear multiple times on a page, depending on the data, user role, or app state. That means the same selector might match several elements—each from a different instance of the template—causing tests to fail, take longer, or mask bugs.
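To make the over-matching problem concrete, here's a minimal sketch using Python's standard-library XML parser as a stand-in for a rendered page. The markup and attribute names are hypothetical, not real Salesforce output; the point is that a page-wide query matches every instance of a repeated template, while a query scoped to one container instance does not.

```python
# Sketch: a reusable template rendered twice means a page-wide selector
# over-matches. Scoping the query to one instance disambiguates.
# Element and attribute names here are hypothetical.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<page>
  <card data-record="Acme"><button class="save">Save</button></card>
  <card data-record="Globex"><button class="save">Save</button></card>
</page>
""")

# Page-wide query: matches every instance of the template.
all_saves = page.findall(".//button")
print(len(all_saves))  # 2 -- ambiguous: which Save did the test mean?

# Scoped query: anchor on an instance-specific attribute first.
acme_card = page.find(".//card[@data-record='Acme']")
acme_save = acme_card.find("./button")
print(acme_save is not None)  # True -- exactly one unambiguous match
```

Real test frameworks express the same idea with chained or nested locators: find the container by something stable, then search within it.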

XPath is a last resort and a maintenance trap

Without stable attributes or consistent structure, testers fall back to XPath, which is more flexible than CSS and can match elements based on text content or position. But that power comes with trade-offs.

XPath is slower to evaluate and more brittle. The more deeply nested the DOM, the longer it takes to traverse. Worse, XPath selectors are tightly coupled to the structure of the page, which means even minor layout shifts can break them.

In a typical application, XPath is the selector of last resort. In Salesforce, XPath is often the first. That reverses the usual maintenance curve. Instead of using XPath to cover edge cases, teams end up rebuilding XPath locators every time the UI moves.
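The brittleness is easy to demonstrate. The sketch below (again using the standard-library XML parser, with hypothetical markup) compares a position-based path with an attribute-based one after the layout gains one extra wrapper element:

```python
# Sketch: a position-based XPath breaks when the layout shifts;
# an attribute-based one survives. Markup is hypothetical.
import xml.etree.ElementTree as ET

before = ET.fromstring(
    "<form><div>Name</div><div><button name='save'>Save</button></div></form>")
# After an update, an extra wrapper div appears ahead of the button.
after = ET.fromstring(
    "<form><div>Name</div><div>Banner</div>"
    "<div><button name='save'>Save</button></div></form>")

positional = "./div[2]/button"            # coupled to page structure
by_attribute = ".//button[@name='save']"  # coupled to a stable attribute

print(before.find(positional) is not None)   # True
print(after.find(positional) is not None)    # False -- layout shift broke it
print(after.find(by_attribute) is not None)  # True -- still matches
```

When XPath is the first-choice locator rather than the last, every layout shift like this one becomes a maintenance task.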

Shadow DOMs and delayed rendering block element access

Many Salesforce components hide their internal structure behind the shadow DOM. Older versions of legacy frameworks like Selenium couldn't reach these elements without workarounds or additional libraries. Newer frameworks like Playwright support shadow DOM interaction natively, but switching frameworks is rarely easy for any team.

Salesforce often delays rendering until a component is scrolled into view or triggered by interaction. To handle that, your test needs to scroll or interact in just the right way to get the component to load where it’s expected, anticipating where the element will appear and setting up the conditions for it to render in the first place.
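The usual answer is an explicit wait: trigger the render condition, then poll until the element exists instead of asserting immediately. Here's a minimal, self-contained sketch of that pattern; LazyComponent is a hypothetical stand-in for a Salesforce component that only renders after being scrolled into view.

```python
# Sketch: explicit-wait logic for a lazily rendered component.
# LazyComponent and its methods are hypothetical stand-ins.
import time

def wait_for(predicate, timeout=2.0, interval=0.05):
    """Poll predicate() until it returns truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

class LazyComponent:
    """Simulates a component that renders only after a scroll trigger."""
    def __init__(self):
        self.rendered = False
    def scroll_into_view(self):
        self.rendered = True  # the render trigger, like scrolling in the UI
    def query(self):
        return "<lightning-datatable>" if self.rendered else None

component = LazyComponent()
component.scroll_into_view()         # set up the render condition first...
element = wait_for(component.query)  # ...then wait for the element to exist
print(element)  # <lightning-datatable>
```

Modern frameworks bake auto-waiting into their locators, but the test still has to perform the triggering interaction in the right order.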

Challenge 2: Salesforce environments limit test speed and data reliability

Salesforce testing environments (sandboxes and scratch orgs) aren't built for speed or stability. They're designed for safety and isolation to protect production data, but they slow tests down and make realistic data harder to manage.

Environments are under-provisioned by design

Salesforce sandboxes and scratch orgs run on shared infrastructure, with deliberately limited compute and memory allocation. They’re meant for internal development, not high-throughput automation.

As a result, the application responds more slowly, especially under load. Every action—navigation, form submission, query—takes longer, increasing test runtime and the likelihood of flakes in otherwise stable flows. 

Rate limits and governor limits throttle execution

Salesforce enforces strict resource limits to protect its multi-tenant architecture. These include:

  • API rate limits: Your test suite can only make a finite number of API calls in a given window. Exceed that, and requests start failing.
  • Governor limits: These cap things like CPU time, database rows queried, and DML operations within a single transaction.

These limits force test setup steps that minimize record creation, avoid chained queries, and batch writes carefully. Even moderate complexity—like creating related objects or loading reference metadata—can trigger throttling and cause unpredictable failures.
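One common mitigation is to chunk record writes so no single transaction exceeds a per-transaction row budget. The sketch below shows the batching idea; the cap value and record shape are assumptions for illustration, since the real limits vary by API and edition.

```python
# Sketch: batching record writes so each chunk fits one transaction.
# DML_ROW_CAP is an assumed budget for this sketch, not an official value.
DML_ROW_CAP = 200

def batch_records(records, cap=DML_ROW_CAP):
    """Split records into chunks that each fit within one transaction."""
    return [records[i:i + cap] for i in range(0, len(records), cap)]

records = [{"Name": f"Lead {n}"} for n in range(450)]
batches = batch_records(records)

print(len(batches))               # 3 -- three transactions instead of one
print([len(b) for b in batches])  # [200, 200, 50]
```

The same chunking logic applies to API calls: spreading work across requests keeps any one window under the rate limit.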

If your test code doesn’t explicitly account for these thresholds, you’ll hit flakes that look random but aren’t.

Shared infrastructure means unpredictable noise

Even though Salesforce enforces strong tenant isolation within Apex, the underlying infrastructure is still shared. That means activity from other orgs (even in different regions) can introduce background load that slows down your environment.

This is especially noticeable in lower-tier sandboxes and scratch orgs. These environments cost less because they run on limited resources, but in automated testing, that constraint magnifies the risk of timeouts, flaky interactions, and misleading failures.

Test data is constrained by data integrity rules

Salesforce enforces strict data integrity at the object level. Validation rules, workflows, triggers, and security constraints all run automatically when you create or modify records, even in test environments.

That makes test data setup complicated. You can’t just update fields in place like you might in a relational test database. Instead, most teams:

  • Export sanitized snapshots of production data.
  • Subset and migrate them into test environments.
  • Run synthetic transactions to generate valid records.

Each of these paths comes with risk:

  • Modifying the snapshot before import can break relationships between records or cause type mismatches.
  • Changing records post-import may trigger business logic that skews the environment or introduces performance lag.
  • Running transactions in production can backfire if they touch shared objects or pollute analytics.
  • Building synthetic data from scratch offers the most control—but it’s expensive and brittle to maintain.

The net result: Even getting your data into place takes planning, tooling, and review. That’s overhead before your first assertion runs.
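The subsetting step above is where relationships most often break: pulling child records without the parents their lookups point at produces import failures. A minimal sketch of the fix, using hypothetical Account and Contact records, is to walk the references and keep every parent the subset actually needs:

```python
# Sketch: subsetting exported data while preserving lookup relationships.
# Record shapes and field names are hypothetical.
accounts = {
    "A1": {"Name": "Acme"},
    "A2": {"Name": "Globex"},
}
contacts = {
    "C1": {"Name": "Pat", "AccountId": "A1"},
    "C2": {"Name": "Sam", "AccountId": "A2"},
    "C3": {"Name": "Lee", "AccountId": "A1"},
}

def subset_with_parents(contact_ids):
    """Select the given contacts plus the accounts their lookups reference."""
    picked_contacts = {cid: contacts[cid] for cid in contact_ids}
    needed_accounts = {c["AccountId"] for c in picked_contacts.values()}
    picked_accounts = {aid: accounts[aid] for aid in needed_accounts}
    return picked_accounts, picked_contacts

accts, cons = subset_with_parents(["C1", "C3"])
print(sorted(accts))  # ['A1'] -- only the parent actually referenced
print(sorted(cons))   # ['C1', 'C3']
```

Real schemas have many more levels of master-detail and lookup fields, so production subsetting tools perform this closure walk recursively.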

Poor test data strategies hurt performance and reliability

Test data determines both how tests are set up and how they perform. On Salesforce, poor data design leads to:

  • Longer load times, especially for objects with complex master-detail or lookup relationships.
  • Slow evaluation of sharing rules and security filters.
  • Cascading validation delays across related records.

And since many of these structures are inherited from production schemas, testers don’t always have the option to simplify them. Instead, they’re stuck working around the delays—with retries, waits, or redundant validations that drag out the suite.

Challenge 3: Salesforce’s surface area explodes faster than testers can track

Salesforce multiplies complexity through configuration. Even simple workflows surface hundreds of conditional paths depending on user profiles, record types, object permissions, and interface variants. That makes it hard to scope what to test, harder to test it efficiently, and nearly impossible to guarantee complete coverage.

Configuration multiplies test cases fast

Configuration settings such as profiles, layout rules, record types, and other app-specific settings all influence what a user can see and do. These settings interact in ways that change how a test behaves—sometimes requiring different selectors, flows, or expectations for each combination.  

It’s not uncommon for a basic lead flow to require dozens of permutations to cover all relevant profiles and field-level behaviors. That’s before accounting for automation like validation rules, flows, or triggers, which may fire only under specific data or context conditions.
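The multiplication is just a Cartesian product of configuration axes. This sketch counts permutations for a hypothetical lead flow; the axis values are invented, and real orgs typically have far more of each:

```python
# Sketch: configuration axes multiply into test permutations.
# Axis values are hypothetical examples.
from itertools import product

profiles = ["Sales Rep", "Sales Manager", "Admin"]
record_types = ["Standard Lead", "Partner Lead"]
ui_variants = ["Lightning", "Classic"]
field_rules = ["strict validation", "relaxed validation"]

permutations = list(product(profiles, record_types, ui_variants, field_rules))
print(len(permutations))  # 24 paths for one "basic" lead flow
print(permutations[0])    # ('Sales Rep', 'Standard Lead', 'Lightning', 'strict validation')
```

Adding a single new record type or profile multiplies the total again, which is why coverage planning on Salesforce usually involves pruning the matrix rather than enumerating it.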

Compatibility testing includes more layers than most apps

Salesforce doesn’t just run in browsers or on mobile; it connects to other systems, layers legacy interfaces over new ones, and depends on configuration-specific APIs. That means compatibility testing must cover more than the usual browser and device matrix.

At a minimum, teams need to test:

  • Third-party integrations (payment systems, marketing platforms, analytics tools).
  • Salesforce APIs (REST, SOAP, Bulk, Streaming) across versions.
  • Lightning vs. Classic UI compatibility, where both are still in use.
  • Org-specific features, which may behave differently based on cloud (Sales, Service, etc.) or edition (Professional, Enterprise, etc.).

For ISVs (independent software vendors building packaged apps for Salesforce), the surface area is even larger. Tests must also cover:

  • Managed and unmanaged package behavior across installs and upgrades.
  • Namespace versioning impacts.
  • AppExchange security review scenarios and constraints.

In short: You’re not just testing one app. You’re testing a shifting patchwork of UI, API, configuration, and integration logic—each with its own lifecycle and context.

Variants overwhelm manual testing and lead to missed coverage 

Testers usually start automation planning by manually going through each flow to capture all the variations. In Salesforce, this gets tedious fast. The number of configurations—based on profiles, record types, field-level rules, and UI changes—makes it hard to test every path reliably.  

This complexity increases the risk of missing coverage. If something looks fine under one profile or data set, it’s easy to overlook how it behaves elsewhere.

Coverage also suffers when developers use Flow Builder or Process Builder to add business logic behind the scenes. These tools often run logic that isn’t visible in the UI and may not be documented. If that logic or those use cases aren’t exposed during normal use, testers won’t know to account for them. 

Even well-planned test suites can miss logic that’s buried in configuration or only shows up under specific conditions. 

What to do with all this

Salesforce introduces challenges most test frameworks weren’t built to handle: complex DOMs, rigid data rules, and an explosion of variants. If you’re running into flakes, latency, or escapes caused by coverage gaps, it’s not just you. These are built-in hazards of testing on Salesforce. 

QA Wolf builds, runs, and maintains black-box tests for complex stacks, including Salesforce. We handle the entire test lifecycle so your team can stay focused on shipping. If you’re ready to offload the hard parts, let’s talk.

