Sunday, 3 July 2016

Why we're switching to Selenium Grid

The department that I am part of has gone through a big growth spurt recently. When I started in my role, just over a year ago, there were 20 testers. Now there are 30. That jump is indicative of what has happened in all disciplines of software delivery.

This growth is starting to create some interesting problems in the execution of our test automation. In particular for our web-based retail banking application, which is a relatively young product that has had test automation embedded in the development approach since the very beginning.

Alongside a comprehensive unit test suite, we've been using Selenium WebDriver to execute tests against Firefox. We call these tests our "automated acceptance suite" (AAS) or "node tests", which is a reference to the mock server technology that these tests execute against.

In the beginning the application was small and the node tests that ran alongside it were quick. As the product has grown we've added more tests, so they take longer to execute. When the fast feedback provided by our automation was no longer fast enough, we switched our tests from single thread to parallel execution.

In the beginning there was just a single development team and the node tests ran every time that a change was made. As the number of teams has grown the number of changes being made has increased, so the tests are being executed more frequently. When our build queues started to exceed reasonable lengths, we switched from a dedicated continuous integration hardware to docker containers that increased the number of builds we could execute in parallel.

Our solution to problems introduced by growth has been to do more things at once.

To get the tests to run faster we switched the test implementation to parallel execution.

To get the build queues to be shorter we switched the infrastructure to parallel execution.

These were good solutions for us. But now we're coming to the point where we can't do any more things at once with what we have. To illustrate, compare what was running on our build server against what is running there now:

In the beginning we had dedicated hardware. It ran a node server to return mock responses, a web server for our product, and the tests that opened a single Firefox window to execute against.

In our current state we have four active docker containers. Each runs a node server, a web server, and the tests that open four Firefox windows to execute against.

In our current state we're hitting the limits of what our infrastructure can do. This is manifesting in two types of problem that are causing a lot of frustration as they fundamentally impact on two key measures for the usefulness of automation: speed and stability.

Our current state can be slow, particularly when there are four builds executing at once and the hardware is fully loaded. Our overnight build time is approximately 30 minutes. By contrast, when a build executes during business hours it takes approximately 50 minutes.

I find it easiest to explain why this happens using an analogy. Imagine a horse towing a cart with four large pumpkins in it. The horse can trot down the street quite happily, relatively unencumbered by its load. Now imagine the same horse towing a cart with 28 large pumpkins in it. The horse can still move the cart, but it won't be able to travel at the same pace that it did with a lighter load. It may trudge rather than trot.

Our overnight build is carried by the lightly loaded horse as it may be the only build active on our hardware. Our build during business hours is carried by the heavily-laden horse as many builds run at once. The time taken to complete a build alters accordingly.

The instability we've seen comes partly from this variable speed. There's a particular case where we look for a success notification that is only displayed for a fixed duration. When the timing to complete the action that triggers this notification is variable, it becomes frustrating to verify.

But we've also had stability problems with the four Firefox browsers running on a single display. Some failures are caused by tests running in parallel that fight for focus e.g. attempting to confirm a payment via a modal dialog. Others are attributed to two different tests that simultaneously attempt to hover and click the mouse e.g. editing an account image. When these clashes occur, one of the tests involved will usually fail.

Our operations team ran some diagnostics on the existing hardware to determine what made it slow. They identified which processes were chewing up the most system resources or the largest pumpkins on the cart. It turned out that there was a clear single culprit: Firefox.

Enter Selenium Grid.

Selenium Grid enables a distributed test execution environment. What this means in our case is that we can move all of the Firefox instances out of our docker containers. This will significantly lighten the load on our existing continuous integration infrastructure:

In the proposed future state, our tests will trigger to the Selenium Grid Hub on our cloud-based infrastructure. The hub will have connectivity to a pool of Selenium Grid Nodes. Instead of having multiple Firefox windows open on a single display, we're provisioning each node in a dedicated container with a single browser.

Each grid node will know where it was triggered from, as the browser will still open the web application that is running on the existing docker architecture. This does mean that we are introducing network latency into each of our WebDriver interactions, so they'll be slower than on local hardware. But the distributed architecture should give us enough advantages that we still end up with a faster solution overall.

Our hope is that this proposed future will address our existing speed and stability issues. Increasing the system resource available through the introduction of hardware should help us to get consistent build times, regardless of the time of day. And having each Firefox browser in its own dedicated container should avoid any display contention.

We have a working prototype for the proposed future state and early signs are promising. I'm looking forward to turning the vision into reality and hope that it will bring benefits that we are searching for.


  1. This is a great solution, but I've got to ask a few follow-up questions.

    1) Who is maintaining the Selenium Grid?
    2) With the limitation of the Selenium nodes to only handle certain browser types (Chrome/Firefox) since because Docker. How does one acquire full cross coverage between Mac/Windows and all major browser versions?
    3) What's the benefit of spinning up your company's grid vs. utilizing a SaaS (Selenium As A Service) solution a la BrowserStack or SauceLabs? Cost? Privacy? Assorted Amalgamation?

    1. 1) The grid is currently maintained by a developer as we prototype but will be owned by the operations team who look after all of our infrastructure.

      2 & 3) We are looking to establish an instance of Browser Stack within our internal network, but this work isn't scheduled to be completed for a while. This is an interim step towards that.

  2. Wouldn't executing your tests on a headless browser solve your issue of browser confilcts i.e the one where it was trying to steal focus?

    1. Hi Vishal, we have some of these issues with browser conflicts. Also in some occasion the selenium grid fails to instantiate a browser and the requests gets queued up. The selenium node needs to restarted in order to clear the requests. Do you suggest this issue can be addressed using headless browsers. Thanks