Katrina the Tester: Strategies for automated visual regression

In my organisation we have adopted automated visual regression in the test strategy for three of our products. We have made different choices in implementing our frameworks, as we use automated visual regression testing for a slightly different purpose in each team. In this post I introduce the concept of automated visual regression then give some specific examples of how we use it.

What is visual regression?

The appearance of a web application is usually defined by a cascading style sheet (CSS) file. Your product might use a flavour of CSS that is compiled by a preprocessor, like SCSS, Sass, or Less. They all describe the format and layout of your web-based user interface.

When you make a change to your product, you are likely to change how it looks. You might intentionally be working on a design task, e.g. fixing the display of a modal dialog. Or you might be working on a piece of functionality that is driven through the user interface, which means that you need to edit the content of a screen, e.g. adding a nickname field to a bank account. In both cases you probably need to edit the underlying style sheet.

A problem can arise when the piece of the style sheet that you are editing is used in more than one place within the product, which is often the case. The change that you make will look great in the particular screen that you're working in, but cause a problem in another screen in another area of the application. We call these types of problems visual regression.

It is not always easy to determine where these regression issues might appear because style sheets are applied in a cascade. An element on your page may inherit a number of display properties from parent elements. Imagine a blue button with a label in Arial font where the colour of the button is defined for that element but the font of the button label is defined for the page as a whole. Changing the font of that button by editing the parent definition could have far-reaching consequences.

We use automated visual regression to quickly identify differences in the appearance of our product. We compare a snapshot taken prior to our change with a snapshot taken after our change, then highlight the differences between the two. A person can look through the results of these image comparisons to determine what is expected and what is a problem.

[Figure: manufactured example to illustrate image comparison]
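
The before-and-after comparison described above can be sketched in a few lines. This is only an illustration, not the code any of our teams run: it assumes each snapshot has already been decoded into rows of (red, green, blue) tuples, and the names `diff_mask` and `changed_region` and the tiny 3x2 images are made up for the example. A real framework would load screenshots with an imaging library.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_mask(before: Image, after: Image) -> List[List[bool]]:
    """Return a grid that is True wherever the two snapshots differ."""
    if len(before) != len(after) or any(
        len(b) != len(a) for b, a in zip(before, after)
    ):
        raise ValueError("snapshots must have the same dimensions")
    return [
        [b != a for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def changed_region(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom) of changed pixels, or None."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, hit in enumerate(row)
        if hit
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical snapshots: one pixel of a blue "button" has turned red.
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)
before = [[WHITE, BLUE, WHITE],
          [WHITE, BLUE, WHITE]]
after = [[WHITE, BLUE, WHITE],
         [WHITE, RED, WHITE]]

print(changed_region(diff_mask(before, after)))  # (1, 1, 1, 1)
```

The bounding box is what a report would highlight for the person reviewing the results; deciding whether the change is expected is still a human judgement.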

Team One Strategy

The first team to adopt automated visual regression in my organisation was the team that looks after our public website, a product with a constantly evolving user interface.

The test automation strategy for this product includes a number of targeted suites. There are functional tests written in Selenium that examine the application forms, calculators, and other tools that require user interaction. There are API tests that check the integration of our website to other systems. We have a good level of coverage for the behaviour of the product.

Historically, none of our suites specifically tested the appearance of the product. The testers in the team found it frustrating to repetitively tour the site, in different browsers, to try to detect unexpected changes in how the website looked. Inattentional blindness meant that problems were missed.

The team created a list of the most popular pages in the site based on our analytics. This list was extended, so that it included at least one page within each major section of the website, to define an application tour for the automated suite to capture screenshots for comparison.
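
One way to build such a tour is sketched below. It is a guess at the selection logic rather than the team's actual implementation: the page paths, view counts, and the rule that the first path segment identifies a section are all assumptions for illustration.

```python
from typing import Dict, List


def section_of(path: str) -> str:
    """Treat the first path segment as the section, e.g. '/loans/home' -> 'loans'."""
    parts = path.strip("/").split("/")
    return parts[0] or "home"


def build_tour(page_views: Dict[str, int], top_n: int) -> List[str]:
    """Take the top_n most viewed pages, then add the most viewed page
    from every section that is not yet represented."""
    ranked = sorted(page_views, key=lambda p: (-page_views[p], p))
    tour = ranked[:top_n]
    covered = {section_of(p) for p in tour}
    for page in ranked[top_n:]:
        if section_of(page) not in covered:
            tour.append(page)
            covered.add(section_of(page))
    return tour


# Hypothetical analytics data: page path -> number of views.
views = {
    "/": 900,
    "/accounts/savings": 500,
    "/accounts/cheque": 450,
    "/loans/home": 120,
    "/contact": 80,
}

print(build_tour(views, top_n=2))
# ['/', '/accounts/savings', '/loans/home', '/contact']
```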

The automated visual regression framework was implemented to complete this tour of the application against a configurable list of browsers. It launches BrowserStack, which means that it is able to capture images against desktop, tablet, and mobile browsers. The automated checks replace a large proportion of the cross-browser regression testing that the testers were performing themselves.

The team primarily use the suite at release, though occasionally make use of it during the development process. The tool captures a set of baseline images from the existing production version of the product and compares these to images from the release candidate. The image comparison is made at a page level: a pixel-by-pixel comparison with a fuzz tolerance for small changes.
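
A page-level comparison with a fuzz tolerance might look like the sketch below. How the tolerance is defined in the real framework is not covered here, so this example assumes two hypothetical settings: `channel_tolerance`, the largest per-channel colour difference that still counts as the same pixel, and `max_changed_ratio`, the fraction of pixels allowed to differ before the page is flagged. Images are again plain rows of RGB tuples.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def pixels_match(a: Pixel, b: Pixel, channel_tolerance: int) -> bool:
    """True when every colour channel is within the tolerance."""
    return all(abs(x - y) <= channel_tolerance for x, y in zip(a, b))


def pages_match(baseline: Image, candidate: Image,
                channel_tolerance: int = 8,
                max_changed_ratio: float = 0.01) -> bool:
    """Compare two page screenshots pixel by pixel, forgiving small changes."""
    if len(baseline) != len(candidate) or any(
        len(rb) != len(rc) for rb, rc in zip(baseline, candidate)
    ):
        return False  # a page that changed size is treated as a difference
    total = sum(len(row) for row in baseline)
    if total == 0:
        return True  # assumption: two empty images are considered equal
    changed = sum(
        not pixels_match(pb, pc, channel_tolerance)
        for rb, rc in zip(baseline, candidate)
        for pb, pc in zip(rb, rc)
    )
    return changed / total <= max_changed_ratio


# Hypothetical 10x10 screenshots.
GREY = (200, 200, 200)
baseline = [[GREY] * 10 for _ in range(10)]
slightly_off = [[(203, 198, 200)] * 10 for _ in range(10)]  # rendering noise
broken = [row[:] for row in baseline]
broken[0][0] = broken[0][1] = (0, 0, 0)  # two clearly different pixels

print(pages_match(baseline, slightly_off))  # True
print(pages_match(baseline, broken))        # False
```

The tolerance is what makes a whole-page comparison practical across browsers, where anti-aliasing and font rendering introduce small differences that nobody needs to review.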

Team Two Strategy

The second team to adopt automated visual regression was our UI toolkit team. This team develop a set of reusable user interface components so that all of our products have a consistent appearance. The nature of their product means that display problems are important. Even a difference of a single pixel can be significant.

The tester in this team made automated visual regression the primary focus of their test strategy. They explored the solution that the first team had created, but decided to implement their own framework in a different way.

In our toolkit product, we have pages that display a component in different states e.g. the button page has examples of a normal button, a disabled button, a button that is being hovered on, etc. Rather than comparing the page as a whole with a fuzz tolerance, this tester implemented an exact comparison at a component level. This meant that the tests were targeted and would fail with a specific message e.g. the appearance of the disabled button has changed.
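
The difference from a whole-page comparison can be sketched as follows. This is an assumption-laden illustration, not the toolkit team's code: it supposes each component's position on the page is known as a hypothetical (left, top, right, bottom) box, with right and bottom exclusive, whereas a real framework would more likely capture each element directly through the browser driver.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (exclusive)


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by one component out of a page image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def changed_components(baseline: Image, candidate: Image,
                       components: Dict[str, Box]) -> List[str]:
    """Exact comparison per component, with one message per failure."""
    failures = []
    for name, box in components.items():
        if crop(baseline, box) != crop(candidate, box):
            failures.append(f"the appearance of the {name} has changed")
    return failures


# Hypothetical button page: two 2x2 components side by side.
WHITE, GREY, DARK = (255, 255, 255), (200, 200, 200), (90, 90, 90)
baseline = [[WHITE, WHITE, GREY, GREY],
            [WHITE, WHITE, GREY, GREY]]
candidate = [row[:] for row in baseline]
candidate[1][3] = DARK  # a single pixel of the disabled button differs

components = {
    "normal button": (0, 0, 2, 2),
    "disabled button": (2, 0, 4, 2),
}

print(changed_components(baseline, candidate, components))
# ['the appearance of the disabled button has changed']
```

With no tolerance, a single changed pixel is enough to fail, which suits a product where a one-pixel difference matters.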

The initial focus for this framework was getting component level coverage across the entire toolkit with execution in a single browser. This suite was intended to run after every change, not just at release. The tester also spent some time refining the reporting for the suite, to usefully abstract the volume of image comparisons being undertaken.

Once the tests were reliable and the reporting succinct, the tester extended the framework to run against different browsers. Cross-browser capability was a lower priority than in Team One.

Team Three Strategy

A third team are starting to integrate automated visual regression into their test strategy. They work on one of our authenticated banking channels, a relatively large application with a lot of different features.

This product has mature functional test automation. There are two suites that execute through the user interface: a large suite with mocked back-end functionality and a small suite that exercises the entire application stack.

For this product, implementing automated visual regression for a simple application tour is not enough. We want to examine the appearance of the application through different workflows, not just check the display of static content. Rather than repeating the coverage provided by the existing large test suite, the team extended that suite's framework to add automated visual regression.

This suite is still under development and, of the three solutions, it is the largest, the slowest, and requires the most intervention by people to execute. The team created a configuration option to switch on screenshot collection as part of the existing functional tests. This generates a set of images that will either represent the 'before' or the 'after' state, depending on which version of the application is under test.
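
A configuration switch of this kind could be sketched as below. Everything named here is hypothetical: the setting names `VISUAL_SCREENSHOTS`, `VISUAL_OUTPUT_DIR`, and `VISUAL_LABEL`, the `ScreenshotCollector` class, and the idea that the functional tests hand over a callable returning PNG bytes from whatever browser driver they use.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Mapping, Optional


class ScreenshotCollector:
    """Saves a screenshot per test step, but only when switched on."""

    def __init__(self, enabled: bool, output_dir: Path, label: str) -> None:
        self.enabled = enabled
        self.output_dir = output_dir
        self.label = label  # 'before' or 'after', chosen by whoever runs the suite

    def capture(self, step_name: str,
                take_screenshot: Callable[[], bytes]) -> Optional[Path]:
        if not self.enabled:
            return None  # functional tests run exactly as they did before
        target = self.output_dir / self.label / f"{step_name}.png"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(take_screenshot())
        return target


def collector_from_env(env: Mapping[str, str] = os.environ) -> ScreenshotCollector:
    """Read the (hypothetical) settings that control screenshot collection."""
    return ScreenshotCollector(
        enabled=env.get("VISUAL_SCREENSHOTS", "off") == "on",
        output_dir=Path(env.get("VISUAL_OUTPUT_DIR", "screenshots")),
        label=env.get("VISUAL_LABEL", "before"),
    )


# Usage with stand-in settings and a fake screenshot function.
with tempfile.TemporaryDirectory() as tmp:
    settings = {
        "VISUAL_SCREENSHOTS": "on",
        "VISUAL_OUTPUT_DIR": tmp,
        "VISUAL_LABEL": "after",
    }
    collector = collector_from_env(settings)
    saved = collector.capture("login-page", lambda: b"fake png bytes")
    saved_name = saved.relative_to(tmp).as_posix()

print(saved_name)  # after/login-page.png
```

Running the same functional suite twice, once labelled 'before' against the current version and once labelled 'after' against the new one, produces two matching sets of files for a later comparison.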

Separate to the collection of images is a comparison program that takes the two sets of screenshots and determines where there are differences. The large suite of functional tests means that there are many images to compare, so the developers came up with an innovative approach to perform these comparisons quickly. They first compare a hash string of the image then, in the event that these differ, they perform the pixel-by-pixel comparison to determine what has changed.
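
The hash-first idea can be sketched like this. Which hash the developers use is not stated, so SHA-256 from Python's standard library is an assumption, as are the function names and the stand-in byte strings used in place of real image files.

```python
import hashlib
from typing import Dict, List


def digest(image_bytes: bytes) -> str:
    """Hash string for one screenshot file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(image_bytes).hexdigest()


def needs_pixel_comparison(before: Dict[str, bytes],
                           after: Dict[str, bytes]) -> List[str]:
    """Names of screenshots present in both sets whose hashes differ."""
    changed = []
    for name in sorted(before.keys() & after.keys()):
        if digest(before[name]) != digest(after[name]):
            changed.append(name)
    return changed


# Stand-in file contents keyed by screenshot name.
before = {"login": b"aaa", "accounts": b"bbb", "payments": b"ccc"}
after = {"login": b"aaa", "accounts": b"bbX", "payments": b"ccc"}

print(needs_pixel_comparison(before, after))  # ['accounts']
```

Matching hashes mean the files are byte-for-byte identical, so the expensive pixel-by-pixel comparison only has to run on the screenshots returned here. Differing hashes do not yet say what changed, or whether the change is visible at all, which is why the second stage is still needed.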

In this team the automated visual regression has a fractured implementation. The collection and comparison happen separately. The focus remains on a single browser and the team continue to iterate their solution, particularly by improving the accuracy and aesthetics of their reporting.

Conclusion

We use automated visual regression to quickly detect changes in the appearance of our product. Different products will require different strategies, because we are looking to address different types of risk with this tool.

The three examples that I've provided, from real teams in my organisation, illustrate this variety in approach. We use visual regression to target:

cross-browser testing,
specific user interface components, and
consistent display of functional workflows.

As with any test automation, if you're looking to implement automated visual regression consider the problem that you're trying to solve and target your framework to that risk.

Katrina the Tester

Sunday, 29 October 2017
