Thursday, 18 August 2016

Post-merge test automation failures

Recently we implemented Selenium Grid for one of our automated suites. I've written about our reasons for this change, but in short we wanted to improve the speed and stability of our automation. Happily we've seen both those benefits.

We've also seen a noticeable jump in the number of pull requests that are successfully merged back to our master branch each day. This gives some weight to the idea that our rate of application code change was previously impeded by our test infrastructure.

The increase in volume occasionally causes a problem when two feature branches are merged back to master in quick succession. Our tests fail on the second build of the master branch post-merge.

To illustrate, imagine that there are two open pull requests for two feature branches: orange and purple. We can trigger multiple pull request (PR) builds in parallel, so the two delivery teams who are behind these feature branches can receive feedback about their code simultaneously.

When a PR build passes successfully and the code has been through peer review, it can be merged back to the master branch. Each time the master branch changes it triggers the same test suite that executes for a pull request.

We do not trigger multiple builds against master in parallel. If two pull requests are merged in quick succession the first will build immediately and the second will trigger a build that waits for the first to complete before executing. Sometimes the second build will fail.

1. Failing tests after multiple PR merges to master

As the person who had driven sweeping test infrastructure changes, when this happened the first time I assumed that the test automation was somehow faulty. The real issue was that the code changes in orange and purple, while not in conflict with each other at a source code level, caused unexpected problems when put together. The failing tests reflected this.
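To make that kind of failure concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `orange` and `purple` changes, the function names and the prices are invented for illustration, not taken from our code base). It models two branches that touch different functions, so they merge without a textual conflict, yet the combined result fails a test that each branch passes on its own.

```python
# Hypothetical illustration: two branches that merge cleanly but break together.

def base():
    """The code on master before either branch is merged."""
    code = {}
    code["get_price"] = lambda: 1000                         # price in cents
    code["show_price"] = lambda: code["get_price"]() // 100  # price in dollars
    return code

def orange(code):
    """Orange changes get_price to return dollars and updates its only caller."""
    code["get_price"] = lambda: 10
    code["show_price"] = lambda: code["get_price"]()

def purple(code):
    """Purple adds a new feature, written while get_price still returned cents."""
    code["show_discounted"] = lambda: code["get_price"]() // 100 - 1

def suite_passes(code):
    """The shared test suite: checks every feature present in the build."""
    checks = [code["show_price"]() == 10]
    if "show_discounted" in code:
        checks.append(code["show_discounted"]() == 9)
    return all(checks)

def build(*branches):
    """Apply the given branches to a fresh copy of master and run the suite."""
    code = base()
    for branch in branches:
        branch(code)
    return suite_passes(code)

print(build(orange))          # True: the orange PR build passes
print(build(purple))          # True: the purple PR build passes
print(build(orange, purple))  # False: master fails after both are merged
```

Neither branch edits a line that the other touched, so a source-level merge has nothing to complain about. Only a test run against the combined code shows the problem.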

We hadn't seen this problem previously because our pull requests were rarely merged in such quick succession. They were widely spaced, which meant that when the developer pulled from master to their branch at the beginning of the merge process these types of failures were discovered and resolved.

I raised this as a topic of conversation during Lean Coffee at CAST2016 to find out how other teams move quickly with continuous integration. Those present offered up some possible options to resolve the problem as I described it.

Trunk based development

Google and Facebook move a lot faster than my organisation. Someone suggested that I research these companies to learn about their branching and merging strategy.

I duly found Google's vs Facebook's Trunk Based Development by Paul Hammant and was slightly surprised to see a relevant visualisation at the very top of the article:

2. Google's vs Facebook's Trunk Based Development by Paul Hammant

It seems that, to move very quickly with a large number of people contributing to a code base, trunk-based development is preferred. As the previous diagram illustrates, we currently use a mainline approach with feature branches. This creates larger opportunities for conflicts due to merging.

I had assumed that all possible solutions to these tests failing on master would be testing-focused. However, a switch to trunk-based development would be a significant change to our practices for every person writing code. I think this solution is too big for the problem.

Sequential build

Someone else suggested that perhaps we were just going faster than we should be. If we weren't running any build requests in parallel and instead triggered everything sequentially, would there still be a problem?

I don't think that switching to sequential builds would fix our issue as the step to trigger the merge is a manual one. A pull request might have successfully passed tests but be waiting on peer review from other developers. In the event that no changes are required by reviewers, the pull request could be merged to master at a time that still creates conflict:

3. Sequential PR build with rapid merge timing

The pull request build being sequential would slow our feedback loop to the delivery teams with no certain benefit.

Staged Build

Another suggestion was to look at introducing an interim step to our branching strategy. Instead of feature branches to master, we'd have a staging zone that might work something like this:

4. Introducing a staging area

The staging branch would use sequential builds. If a test passes there, then it can go to master. If a test fails there, then it doesn't go to master. The theory is that master is always passing.

Where this solution gets a little vague is how the staging branch might automatically roll back a merge. I'm not sure whether it's possible to automatically back changes off a branch based on a test result from continuous integration. If this were possible, why wouldn't we just do this with master instead of introducing an interim step?

I'm relatively sure that the person who suggested this hadn't seen such an approach work in practice.

Do Nothing

After querying the cost of the problem that we're experiencing, the last suggestion that I received was to do nothing. This is the easiest suggestion to implement but one that I find challenging. It feels like I'm leaving a problem unresolved.

However, I know that the build can't always pass successfully. Test automation that is meaningful should fail sometimes and provide information about potential problems in the software. I'm coming to terms with the idea that perhaps the failures we see post-merge are valuable, even though they have become more prevalent since we picked up our pace.

While frustrating, the failures are revealing dependencies between teams that might have been hidden. They also encourage collaboration as people from across the product work together on rapid solutions once the master branch is broken.

While I still feel like there must be a better way, for now it's likely that we will do nothing.

  1. Would you really want to solve this "problem", knowing that:
    a) It is not a test infrastructure or flakiness problem
    b) It is actually exposing dependency/coexistence issues between feature branches (on Mainline)

    I used to work on a multi-discipline, multi-product Mainline where we were keen for failures like these to be caught consistently. We used to have a lot of coexistence issues between application layer feature merges and low layer changes (it was an embedded product), usually because, while the application layer feature branches were happily churning away on their own branch, the low layer feature branches were short twigs that were merged quickly. So, by the time an application layer feature branch got merged, the low layer code base had moved into a state of incompatibility with the changes being merged.
    To mitigate some of these failures our dev manager enforced the following:
    1. Shorter feature branches: not more than a couple of days.
    2. Rebase before merge for "big" feature branches that had been alive for more than a couple of days. We used Mercurial (but through manual commands).

    I have not tried this in Git but this might be an equivalent ...

    However, just to reiterate, ultimately you always want such conflicts to be caught, i.e. failed tests are a success in this case :)

    1. Thanks for the suggestions Sunjeet. Reducing the size of our feature branches is already a goal; I should add this problem to the rationale for why.

      The problem for me isn't so much that the tests fail, but where the failure is caught. I feel like we shouldn't be breaking master and instead fix problems on the branch pre-merge. But, for now, it looks like we'll continue to catch things a little later than we perhaps should.

  2. Hi Katrina!

    Nice post and community contribution. I have several follow up questions:

    1. If a test fails, does this always indicate a true positive?
    2. Does your test run always include all the tests that you have for the product?


    Regards, Karlo.

    1. Thanks for your comment Karlo.

      1. Not always, but a majority of the time.

      2. Yes.

  3. Another possibility is to reconsider the peer review process to try and reduce wait states. If a developer has gone through some reasonable number of pull requests (say, 4, for lack of a better number) with no issues spotted during peer review, then maybe that developer has earned the right to skip peer review 9 out of 10 times. Developers new to the project could be subject to peer review at first. This could reduce wait states without losing the value of peer review.

  4. I've seen this done with tags: prior to merge, tag the branch based on whether it has passed automated testing and review. The CI service picks up unmerged branches that have both tags, merges the branch into a fresh working copy of master, and runs the tests. If the tests pass, it merges the branch to master, then rinses and repeats.

  5. Hi Katrina, I would be interested in understanding why you think that changing to development on trunk could be too difficult for the team?
    Believe it or not, in my last 20 years I have tried all of the approaches you describe, but today, with the tools that exist, I wouldn't look at anything other than trunk-based development.