Katrina the Tester: July 2016

Thursday 14 July 2016

Test-Infected Developers

This article was originally published in the June edition of Testing Trapeze

At my workplace there is a culture of shared ownership in software delivery. We develop our products in cross-functional agile teams who work together to achieve a common business goal. However, it’s still relatively rare for specialists to be proactive about picking up work in areas outside of their own discipline. For example, you don’t often see business analysts seeking out test execution tasks and prioritising those above work to refine stories in the product backlog.

That said, I’ve recently noticed an increase in the number of developers who are voluntarily engaging in test-related activities. They’re not jumping forward to think about test planning or getting excited about exploring the application. But they are diving into our automation by helping the testers to improve the coverage it provides, or working to enhance the framework on which our tests run.

As a coach, part of my role is to foster cross-discipline collaboration. I confess that I haven’t been putting any active focus on the relationships between developers and testers. It is something that has changed as a byproduct of other activities that I’ve been part of. I’ve been reflecting on what’s behind this shift and the reasons why I believe the developers are getting more involved.

Better Test Code

In the past our test code has occupied a dark corner of our collective psyche. Everyone knows that it is there, but most people don’t want to engage with it directly. In particular, I have felt that developers were reluctant to get involved in a code base that was littered with poor coding practices and questionable implementation decisions. In instances where a developer did contribute, it was often a cause of frustration rather than satisfaction.

The test team have recently undertaken a lot of work to improve the quality of code that runs our automation. In one product we started the year with a major refactoring exercise that allowed us to run tests in parallel, which more than halved our execution time. In another we’ve launched a brand new suite with agreed coding standards from the beginning.

The experience for a developer who opens our automation is a lot less jarring than perhaps it has been in the past. As the skills of the testers improve, and the approach that we take to writing code becomes closely aligned with the way that developers are used to working, it’s no longer traumatic for a developer to delve into the test suites.

In addition, all of the changes to the test code now go through the same peer review process as the application code. We use pull requests to facilitate discussion on new code. There is a level of expectation: it’s not “just test code”. We want to write automation that is as maintainable as our application.

The developers have started to participate more in peer review of test code. There’s a two-way exchange of information in asking a developer to review the automation. The tester gains a lot of instruction on their coding practices. However, the developer also gains by having to completely understand the test coverage in order to offer their feedback on it.

Imperfect Test Framework

On the flip side of the previous point, there are still a number of very clear opportunities for enhancing our automation frameworks and extending the coverage that they offer. The testers don’t always have the capacity, skills or inclination to undertake this work.

I can think of a few occasions where a developer has been hooked into the test automation by an interesting problem in the supporting structure that needed a solution. Examples include specific technical jobs like setting up an automated script for post-release database changes or tweaking configuration in the continuous integration builds. These tasks improve their understanding of the framework and may mean that the developer ends up contributing to the test code too.

Within the tests, there are application behaviours that are challenging to check automatically. Particularly in our JavaScript-heavy applications we often have to wait for different aspects of the screen to update during a sequence of user actions. Developers who contribute by writing the helper methods required for testing in these areas will often end up having a deeper understanding and closer involvement in all of the associated test code.
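To make the idea of a helper method concrete, here is a minimal sketch of a polling wait written in plain Python. It is not our actual helper code: the names `wait_until` and `WaitTimeout`, the default timings, and the stand-in `balance_label_text` function are all assumptions for illustration. Selenium ships its own `WebDriverWait` for this job; the version below just shows the underlying pattern without needing a browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never returned a truthy value in time."""


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> T:
    """Poll `condition` until it returns a truthy value, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for part of the screen that updates after a delay:
# it returns nothing for the first two polls, then the updated label text.
polls = {"count": 0}


def balance_label_text() -> Optional[str]:
    polls["count"] += 1
    return "Balance: $120.00" if polls["count"] >= 3 else None


text = wait_until(balance_label_text, timeout=2.0, interval=0.01)
```

In a real suite the condition would query the page through WebDriver rather than a counter, but the test that calls the helper reads the same way: wait for the screen to settle, then assert on what it shows.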

I believe the key here is providing specific tasks where the developers can engage in the test code with a clear purpose and feel a sense of accomplishment at their conclusion. In some instances, the developer will complete a single task then withdraw from the testing realm. In others, it’s a first step towards a deeper involvement in the test code and subsequently testing.

Embedded In Development

In almost every instance, a developer who is making a change to one of our applications will need to raise a pull request to have their code merged back to our master branch for release. As part of the process enforced by our tools, the code to be merged must have passed all of our automated checks. Every change. All of the automation.

We’ve always run our automation regularly, but it’s only relatively recently that it has become mandated on every merge. This change has largely been driven by the developers themselves, who are keen to improve the quality of code prior to testing the integrated code base.

Now that our automation runs many times per day it is in the best interests of the developers to be engaged in improving the framework. If it is unreliable or the tests are slow to execute, it has an immediate negative impact on the developers as they are unable to deliver changes to our applications. They want our automation to be robust and speedy.

The new build schedule has helped to flush out pain points in the test code and engaged a wider audience in fixing the root causes of these issues by necessity. Now most of the developers have experienced a failing build and had to personally debug one or more of the tests to fix the problem. The developers are actively monitoring test results and analysing failures, which means that they are a lot more familiar with the test code.

Conclusion

I see automation as a gateway to getting developers engaged in testing more broadly. When collaborating on coverage in automation, there is the opportunity to discuss the testing that will occur outside of the coded checks. The conversation about what to automate vs. what to explore is a valuable one for both disciplines to engage in.

We’ve taken three steps down the path to having our developers excited about picking up tasks in our test automation. We’ve made the suites a pleasant place to spend time by applying coding standards and ensuring that changes are peer reviewed. We’ve provided opportunities for developers to contribute to the framework or helper methods instead of asking them to write the tests themselves. And we’ve embedded the automation in the development process to create a vested interest in rapid and reliable test execution.

Developers can become test-infected. I am seeing evidence of this in the collaborative environment that is continuing to evolve in my organisation.

Monday 11 July 2016

A community discussion

A while back I put out a tweet request:

Challenge: Describe the context-driven testing community in a single tweet. #research @CPHcontext
— Katrina Clokie (@katrina_tester) February 19, 2016

I spoke about the responses to this tweet during my talk titled "A Community Discussion" at Copenhagen Context. Somewhat ironically I've been reluctant to share the feedback that I received in writing. There have been exchanges in the testing community recently that make me feel now is the time.

I had a lot of responses to my original request on Twitter. About half tried to explain context-driven testing rather than the community. Those who did speak about the people and environment gave responses like:

A bunch of supportive, challenging and engaged people full of questions, support and understanding.
Warm and welcoming, literally the best thing that I've come across in my career.
People who insist on a human perspective on testing
A community of people who constantly asks the question how can we be (test) better?
A group of people not restricted by a so called set of best practices and a one size fits all approach
A world-wide support network of people who share the same fundamental principles as me

I also had a lot of responses via private channels: direct messages, email and Skype. In many instances they were from people who no longer felt that they were part of the community. They gave responses like:

The Cult/Church of CDT due to the rhetoric used by CDT to describe their heroic and righteous fight against evil
The Test Police because they feel the need to correct the terminology and thinking of everyone else regardless of whether they share the same world-view.
They are an academic think-tank that is out of step with modern business needs
CDT is RST, it’s all just RST stuff, RST is the new best practice
If you don’t beat your drum to the CDT Rhythm they’ll beat you down hard
The Anti-ISO group, The Anti-ISTQB people, the Anti-anyone not CDT people etc.
Not a safe place to share and explore

Are you surprised by this?

I was surprised by the stark polarity in what was shared openly and what was shared privately. I was surprised by who responded and who chose not to. I was surprised by specific individuals who held different opinions to what I had expected. However, I wasn't surprised to see these two views emerge.

What bothers me is that these two viewpoints seem to be a taboo topic to have a conversation about.

On Twitter there has been activity that feels like warfare. Grenades are launched from both sides, loud voices shout at one another, misunderstandings create friendly fire, and when the smoke clears no one is sure what the outcome was.

What I wanted to do in my talk at Copenhagen Context was start a dialog. I talked about an inclusive context-driven testing community by sharing the model I created almost two years ago. I suggested some ways in which we could alter our behaviour. I was part of an Open Season discussion where those present shared their views.

Since then?

I continue to focus on making the New Zealand testing community as inclusive as possible. I believe that WeTest, Testing Trapeze and even this blog are making a difference in spreading the ideas from the context-driven school without the labels. I strive to be approachable, humble and open to questions.

I hope that I am setting an example as someone making a positive difference through action. My personal role model in this space is Rosie Sherry, who is the "Boss Boss" at Ministry of Testing. I observe that she has her own style of quiet leadership and a practical approach to change.

But the wider conversation is still adversarial or hidden. I'd like to see that change.

What are your thoughts?

T-Shirt print from Made in Production

Sunday 3 July 2016

Why we're switching to Selenium Grid

The department that I am part of has gone through a big growth spurt recently. When I started in my role, just over a year ago, there were 20 testers. Now there are 30. That jump is indicative of what has happened in all disciplines of software delivery.

This growth is starting to create some interesting problems in the execution of our test automation. In particular for our web-based retail banking application, which is a relatively young product that has had test automation embedded in the development approach since the very beginning.

Alongside a comprehensive unit test suite, we've been using Selenium WebDriver to execute tests against Firefox. We call these tests our "automated acceptance suite" (AAS) or "node tests", which is a reference to the mock server technology that these tests execute against.

In the beginning the application was small and the node tests that ran alongside it were quick. As the product has grown we've added more tests, so they take longer to execute. When the fast feedback provided by our automation was no longer fast enough, we switched our tests from single thread to parallel execution.
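As a rough illustration of that switch from single-threaded to parallel execution, here is a small sketch using Python's standard thread pool. It is not how our suite is built: the runner functions `run_one` and `run_suite` and the two example checks are hypothetical, and a real runner would also need each worker to own its own browser session.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[str, Callable[[], None]]


def run_one(test: TestCase) -> Tuple[str, str]:
    """Run one test and report 'pass' or 'fail' instead of raising."""
    name, body = test
    try:
        body()
        return name, "pass"
    except Exception as error:  # AssertionError is an Exception too
        return name, f"fail: {error}"


def run_suite(tests: List[TestCase], workers: int = 1) -> Dict[str, str]:
    """Run tests on `workers` threads; workers=1 is the single-threaded case."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, tests))


# Two hypothetical checks standing in for real node tests.
def login_check() -> None:
    assert "welcome".startswith("wel")


def payment_check() -> None:
    assert 2 + 2 == 5, "payment total mismatch"


results = run_suite(
    [("login", login_check), ("payment", payment_check)], workers=4
)
```

Changing `workers` from 1 to 4 is the whole difference between the two modes in this sketch, which is also why parallel execution only helps while the machine underneath has capacity to spare.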

In the beginning there was just a single development team and the node tests ran every time that a change was made. As the number of teams has grown the number of changes being made has increased, so the tests are being executed more frequently. When our build queues started to exceed reasonable lengths, we switched from dedicated continuous integration hardware to Docker containers, which increased the number of builds we could execute in parallel.

Our solution to problems introduced by growth has been to do more things at once.

To get the tests to run faster we switched the test implementation to parallel execution.

To get the build queues to be shorter we switched the infrastructure to parallel execution.

These were good solutions for us. But now we're coming to the point where we can't do any more things at once with what we have. To illustrate, compare what was running on our build server against what is running there now:

In the beginning we had dedicated hardware. It ran a node server to return mock responses, a web server for our product, and the tests that opened a single Firefox window to execute against.

In our current state we have four active docker containers. Each runs a node server, a web server, and the tests that open four Firefox windows to execute against.

In our current state we're hitting the limits of what our infrastructure can do. This is manifesting in two types of problem that are causing a lot of frustration as they fundamentally impact on two key measures for the usefulness of automation: speed and stability.

Our current state can be slow, particularly when there are four builds executing at once and the hardware is fully loaded. Our overnight build time is approximately 30 minutes. By contrast, when a build executes during business hours it takes approximately 50 minutes.

I find it easiest to explain why this happens using an analogy. Imagine a horse towing a cart with four large pumpkins in it. The horse can trot down the street quite happily, relatively unencumbered by its load. Now imagine the same horse towing a cart with 28 large pumpkins in it. The horse can still move the cart, but it won't be able to travel at the same pace that it did with a lighter load. It may trudge rather than trot.

Our overnight build is carried by the lightly loaded horse as it may be the only build active on our hardware. Our build during business hours is carried by the heavily-laden horse as many builds run at once. The time taken to complete a build alters accordingly.

The instability we've seen comes partly from this variable speed. There's a particular case where we look for a success notification that is only displayed for a fixed duration. When the timing to complete the action that triggers this notification is variable, it becomes frustrating to verify.

But we've also had stability problems with the four Firefox browsers running on a single display. Some failures are caused by tests running in parallel that fight for focus e.g. attempting to confirm a payment via a modal dialog. Others are attributed to two different tests that simultaneously attempt to hover and click the mouse e.g. editing an account image. When these clashes occur, one of the tests involved will usually fail.

Our operations team ran some diagnostics on the existing hardware to determine what made it slow. They identified which processes were chewing up the most system resources or, in terms of the analogy, the largest pumpkins on the cart. It turned out that there was a clear single culprit: Firefox.

Enter Selenium Grid.

Selenium Grid enables a distributed test execution environment. What this means in our case is that we can move all of the Firefox instances out of our docker containers. This will significantly lighten the load on our existing continuous integration infrastructure:

In the proposed future state, our tests will send their requests to the Selenium Grid Hub on our cloud-based infrastructure. The hub will have connectivity to a pool of Selenium Grid Nodes. Instead of having multiple Firefox windows open on a single display, we're provisioning each node in a dedicated container with a single browser.

Each grid node will know where it was triggered from, as the browser will still open the web application that is running on the existing docker architecture. This does mean that we are introducing network latency into each of our WebDriver interactions, so they'll be slower than on local hardware. But the distributed architecture should give us enough advantages that we still end up with a faster solution overall.
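From the test code's point of view, the change looks something like the sketch below. This is an illustration rather than our implementation: both hostnames, the port and the function names are made up, and the `webdriver.Remote(command_executor=..., options=...)` call assumes the Selenium 4 Python bindings, which are newer than the setup described here.

```python
from urllib.parse import urlunsplit

# Hypothetical addresses: neither hostname comes from our real environment.
HUB_HOST = "selenium-hub.example.internal"
HUB_PORT = 4444
APP_URL = "http://build-container-1.example.internal:8080/"


def hub_url(host: str = HUB_HOST, port: int = HUB_PORT) -> str:
    """Build the WebDriver endpoint exposed by a Selenium Grid hub."""
    return urlunsplit(("http", f"{host}:{port}", "/wd/hub", "", ""))


def open_app_on_grid(app_url: str = APP_URL):
    """Ask the hub for a Firefox session, then load the app under test."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    # The hub picks a free node; the browser runs there, not in this container.
    driver = webdriver.Remote(command_executor=hub_url(), options=Options())
    # The remote browser still opens the app served from the build container.
    driver.get(app_url)
    return driver
```

Every call on the returned `driver` travels over the network to the node, which is where the extra latency comes from. The caller is responsible for calling `driver.quit()` so that the node is released back to the pool.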

Our hope is that this proposed future will address our existing speed and stability issues. Increasing the system resource available through the introduction of hardware should help us to get consistent build times, regardless of the time of day. And having each Firefox browser in its own dedicated container should avoid any display contention.

We have a working prototype for the proposed future state and early signs are promising. I'm looking forward to turning the vision into reality and hope that it will bring the benefits that we are searching for.