Monday, 25 April 2016

What problems do we have with our test automation?

One of the things I like about my role is that I get to work with different teams who are testing different products. The opportunities and challenges in each of our teams are very different, which gives me a lot of material for problem solving and experimenting as a coach.

I recently spoke with a colleague from another department in my organisation who wanted to know what problems we were currently experiencing with our test automation. It was something I hadn't had to articulate before. As I answered, I grouped my thoughts into four distinct contexts.

I'd like to share what we are currently struggling with to illustrate the variety of challenges in test automation, even within a single department of a single organisation.

Maintenance at Maturity

We have an automation suite that's over four years old. It has grown alongside the product under development, which is a single page JavaScript web application. The test suite has been contributed to by in excess of 50 people, including both testers and developers.

This suite is embedded in the development lifecycle of the product. It runs every time code is merged into the master branch of the application. Testers and developers are in the habit of contributing code as part of their day-to-day activities and examine the test results several times daily.

In the past four months we have made a concerted effort to improve our execution stability and speed. We undertook a large refactoring exercise to get the tests executing in parallel; they now take approximately 30 minutes to run.

We want to keep this state while continuing to adapt our coverage to the growing application. We want to stay sensible about what we use the tool to check, keep using robust coding practices that succeed when tests execute in parallel, and keep good logging messages and failure screenshots that help us accurately identify why a test failed.

There's no disagreement on these points. The challenge is in continued collective ownership of this work. It can be hard to keep the bigger picture of our automation strategy in sight when working day-to-day on stories. And it's easy to think that you can be lazy just once.

To help, we try to keep our maintenance needs visible. Every build failure will create a message in the testing team chat. All changes to the test code go through the same code review mechanism as changes to the application code, but the focus is on sharing between testers rather than between developers.

Keeping shared ownership of maintenance requires ongoing commitment from the whole team.

Targeted Tools

Another team is working with a dynamic website driven by a content management system. They have three separate tools that each provide a specific type of checking:

  1. Scenario based tests that examine user flows through specific functions of the site
  2. Scanner that checks different pages for specific technical problems e.g. JavaScript errors
  3. Visual regression tool that performs image comparisons on page layout

The information provided by each tool is very different, which means that each will detect different types of potential problems. Together they provide useful coverage for the site.
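
To illustrate the third kind of check, here is a minimal Python sketch of pixel-level image comparison. It is an illustration under stated assumptions, not the tool this team uses: images are plain nested lists of RGB tuples rather than screenshots loaded from disk, and the names changed_pixel_ratio and layout_matches, along with the 1% default threshold, are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def changed_pixel_ratio(baseline: Image, candidate: Image,
                        channel_tolerance: int = 0) -> float:
    """Return the fraction of pixels that differ between two same-sized images."""
    same_shape = len(baseline) == len(candidate) and all(
        len(b_row) == len(c_row) for b_row, c_row in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for b_row, c_row in zip(baseline, candidate):
        for b_px, c_px in zip(b_row, c_row):
            # A pixel counts as changed if any channel moves beyond the tolerance.
            if any(abs(b - c) > channel_tolerance for b, c in zip(b_px, c_px)):
                changed += 1
    return changed / total


def layout_matches(baseline: Image, candidate: Image,
                   max_changed_ratio: float = 0.01) -> bool:
    """Hypothetical pass/fail rule: tolerate a small share of changed pixels."""
    return changed_pixel_ratio(baseline, candidate) <= max_changed_ratio
```

A real visual regression tool would typically also produce a highlighted difference image. The threshold is a trade-off: too strict and anti-aliasing or dynamic content causes noise, too loose and small layout shifts slip through.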

The scanner and visual regression tool are relatively quick to adapt to changes in the site itself. The scenario based tests are targeted in very specific areas that rarely change. This means that this suite doesn't require a lot of maintenance.

Because the test code isn't touched often, it can be challenging when it does need to be updated. It's difficult to remember how the code is structured, how to run tests locally, and the idiosyncrasies in each of the three tools.

All of the tests are run frequently and are generally stable. When they do fail, it's often due to environmental issues in the test environments. This means that when something really does go wrong, it takes time to work out what.

It sounds strange, but part of the challenge is debugging unfamiliar code and interpreting unfamiliar log output. It's our code, but we are hands-on with it so infrequently that there's a bit of a learning curve every time.

Moving to Mock

In a third area of my department we've previously done a lot of full stack automation. We tested through the browser-based front-end, but then went through the middleware, down to our mainframe applications, out to databases, etc.

To see a successful execution in this full stack approach we needed everything in our test environment to be working and stable, not just the application being tested. This was sometimes a difficult thing to achieve.

In addition to occasionally flaky environments, there were challenges with test data. The information in every part of the environment had to be provisioned and aligned. Each year all of the test environments go through a mandatory data refresh, which means starting from scratch.

We're moving to a suite that runs against mocked data. Now when we test the browser-based front-end, that's all we're testing. This has been a big change in both mindset and implementation. Over the past six months we've slowly turned a prototype into a suite that's becoming more widely adopted.
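
As a rough illustration of what running against mocked data can look like, the sketch below stands in for a services layer by returning canned JSON responses. It uses Python's standard library purely for illustration and is not the suite described here; the paths, payloads, and the start_mock_server helper are all hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses, keyed by request path.
MOCK_RESPONSES = {
    "/api/accounts": {"accounts": [{"id": 1, "name": "Everyday"}]},
    "/api/profile": {"name": "Test User"},
}


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = MOCK_RESPONSES.get(self.path)
        if payload is None:
            # Unknown paths fail loudly so missing mock data is easy to spot.
            status, payload = 404, {"error": "no mock for " + self.path}
        else:
            status = 200
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock_server(port: int = 0) -> HTTPServer:
    """Start the mock server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the front-end pointed at a server like this, a test run depends only on the canned responses, not on the state of the middleware, mainframe, or databases behind the real services.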

The biggest challenge has been educating the teams so that they feel comfortable with the new suite. How to install it, how to configure it, how to write tests, how to capture test data, how to troubleshoot problems, etc. It's been difficult to capture all of this information in a way that's useful, then propagate it through the teams who work with this particular product.

Getting people comfortable isn't just about providing information. It's been challenging to persuade key people of the benefits of switching tack, offer one-on-one support to people as they learn, and embed this change in multiple development teams.


Smokin'

The final area where we are using automation is our mobile testing. We develop four native mobile applications: two on iOS and two on Android. In the mobile team the pace of change is astonishing. The platforms shift underneath our product on a regular basis due to both device and operating system upgrades.

We've had various suites in our mobile teams but their shelf life seems to be very short. Rather than pour effort into maintenance we've decided on more than one occasion to start again. Now our strategy in this space is driven by quick wins.

We're working to automate simple smoke tests that cover at least a "Top 10" of the actions our users complete in each of the applications, according to our analytics. These tests will then run against a set of devices, e.g. four different Android devices for tests of an Android application.

Our challenge is alignment. We have four native mobile applications. At the moment the associated automation is in different stages of this boom-and-bust cycle. We have useful and fast feedback, but the coverage is inconsistent.

To achieve alignment, we need to be better about an equal time investment in developing and maintaining these suites. Given the rate of change, this is an ongoing challenge.


That's where we're at right now. I hasten to add that there are a lot of excellent things happening with our automation too, but that wasn't the question I was asked!

I'm curious as to whether any of these problems resonate with others, how the challenges you face differ, or if you're trying solutions that differ from what we're attempting.


  1. Thank you for the informative article, I would love to know more about the visual regression tool.
    We have tried many different ones for image comparison, but none worked as we hoped.

    1. We've written our own image comparison in Java that makes use of ImageMagick compare via command line to highlight visual differences between versions.

  2. Hi Katrina, your post is coming at the right time. Just yesterday I was finally having some time to sort my thoughts on risks in implementing test automation, as I have no experience with a running test automation suite at all. I "just" need to implement one. And I don't need to implement it, I have someone to do that for me. You can find my thoughts here:

    But I wanted to thank you for describing your four real-life challenges. I'd like to know, for those scenarios, how much do you trust your tests?
    Example 1 sounds interesting, as it treats test code like production code, has many contributors and people are informed about the problems, people double-check what has been implemented, etc. Would you say that that team trusts their automated tests?
    The team with the targeted tools and the smoke test suite, I'd be interested in how much they actually double-check, unintentionally of course, but covering the same ground twice, once per script, once "manually".

    Thanks again for sharing your, as always, well analyzed insights.

  3. Interesting post, but can you elaborate a bit more on mocking the test data? In what are you doing it? That's certainly one of the problems that we face, someone messes with the test data (for whatever reason) and tests fail.

    1. We've written our own lightweight web server in node.js that returns mock responses for testing in place of our normal web services layer.

  4. Thanks for that Katrina - good to hear real experiences and well done on mocking out the services. cheers, J

  5. "To see a successful execution in this full stack approach we needed everything in our test environment to be working and stable, not just the application being tested. This was sometimes a difficult thing to achieve."

    In my view, this is a hugely important thing for testers to notice, for several reasons. Let me offer three.

    First, it is a test result in its own right. It may point to testability problems intrinsic to the product, or to project-related problems.

    Second, problems in the test environment may reveal problems in the product that manifest when the production environment is not running perfectly well.

    Third, dealing with these problems reduces the amount of scrutiny you can give to the product; that is, dealing with these problems reduces coverage.

    This third problem is a particularly big deal, because in any project, we have a finite amount of time to develop and test the product. Out of the universe of all the problems that we could find, the problems that we do find in that constrained time are the easiest ones. We don't find the harder, deeper, more rare, more subtle, more intermittent problems. Every minute we struggle against the current is another minute that we can't spend finding deeper bugs. My observation is that, in general, testers aren't terribly great at relating such problems to undiscovered bugs and the associated business risk.

    ---Michael B.

  6. Hey Katrina,

    These are all tough problems and I don't think I can claim to have full solutions, but I'll share my experience:

    Maintenance at Maturity - shared ownership is worst ownership. "Every build failure will create a message in the testing team chat" - this brought back memories for me. I remember being in a team like this - we didn't want to bother the developers with all the failures because they would complain it was "too noisy", and we'd only just managed to get them to pay attention to test failures at all. In my experience, if you want developers to help with the maintenance then it requires a massive cultural shift in the development team. The only way I've seen this shift work is when management decisively announces that developers are the sole owners of the test suite, and testers are now only responsible for the tooling. I can't say it was popular, but it did work. It required a firm hand and a lot of buy-in from the teams.

    Targeted Tools - "When they do fail, it's often due to environmental issues in the test environments." I bet at some point one of you has questioned if these tests even have value at all, and if you should just retire them. Where did you land on that and why? What kind of real bugs are they detecting and is there a cheaper way to detect these issues?

    Moving to Mock - This is an awesome move, and well worth the adoption pain in my opinion. I'm going to guess that part of the reason it's so hard to convince people to move to them is because the dev teams are not already feeling the pain of full stack testing because the test team absorbs it for them. This might be an easier move if you can tie it in with the ownership change as a kind of gift - "yes we know these tests are currently hard to maintain but we ultimately want to increase your productivity, not decrease it. To that end, we are working on making it easier for you by providing the following solutions: mocked data, etc etc"

    Smokin' - Tricky, but I've got a few ideas:
    - How complex are these mobile apps? Given the high rate of change, how much time does it take to do a smoke test manually? Is it really worth automating these smoke tests if it only takes 5-10 minutes to do by hand (especially if you're spending much more time maintaining the automated tests)? Can you rotate the manual smoke test among team members, or outsource it to someone cheap?
    - Do you have access to any crawler-type tools that will do simple crawling / crash detection on your apps? This might be a half-decent type of automated smoke test that doesn't require as much maintenance.
    - If manual testing takes too long, can you create a halfway solution and use tool-assisted manual testing? e.g. Seed data to create forced states, automatic user setup, automatic build installation, etc.

    Hope that helps!