Sunday, 23 February 2014

Non-functional testing in continuous delivery

Last year I worked in an organisation that did twice-daily production releases with very little non-functional testing. Often the litmus test of whether a release met non-functional requirements was to push it out to users and closely monitor the results. There was an amazing operations team who had a rapid automated production rollback procedure, which could clean things up in a flash, if required.

On joining this organisation as a tester I was astounded by their approach. On my first day a bad release went to production and was rolled back off the live platform. I felt that I had entered the Wild West of software development; it appeared reckless.

As time went on I saw the process repeated; there were more good releases than bad, and things started to make a certain amount of sense. In a twice-daily release cycle there isn't time to spend on detailed performance testing, usability sessions, or security audits. Though I became more comfortable, I had a nagging doubt that this was not the way that others were solving this problem.

At CITCON this weekend I had the opportunity to find out, by facilitating a session titled "How do you incorporate non-functional testing in continuous delivery?".

The first thing that struck me, after sharing what I just described, was that no-one jumped to volunteer a better solution. People started to talk around the topic, but not to it. I had to repeat my question several times before one attendee said "We focus on functionality and kind of ignore non-functional testing". I felt this statement reflected the position of many in the group.

Someone proposed that the first problem in incorporating non-functional testing is a lack of written non-functional requirements. People can quickly tell when something is not working because it is too slow, difficult to use, or has succumbed to malicious infiltration. Defining what is expected from the application for performance, usability, and security is much more difficult. The rapid pace of continuous delivery, coupled with a relatively robust process for testing in production, creates a compelling excuse not to stop and think about non-functional requirements.

Where requirements are present, how do testers find time to test them? The general consensus was that the requirements would form the basis of a suite of discrete automated checks designed to alert the tester: a prompt to hold the release while the tester investigated the problem. Pre-release non-functional testing would be driven by a failing check.

In the case of performance, the check may fire when a threshold is exceeded, or highlight a marked degradation that still falls within the threshold, e.g. if the page load time jumps from 0.3s to 2s and the threshold is set at 4s, we would still want to know about this change. Some in the audience had already implemented lightweight, targeted, automated performance checks that were running in their continuous integration environment.
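
If it helps to picture it, a check along these lines could live in the pipeline as a small script. The sketch below is only an illustration of the principle, not something demonstrated in the session; the URL, the 4s threshold, the 0.3s baseline, and the factor used to flag a marked degradation are all assumed values.

    import time
    import urllib.request

    # Assumed values for illustration; real figures would come from your
    # own non-functional requirements and from recent builds.
    URL = "https://example.com/"   # hypothetical page under test
    THRESHOLD_SECONDS = 4.0        # hard limit: fail the check above this
    BASELINE_SECONDS = 0.3         # typical load time seen in recent builds
    DEGRADATION_FACTOR = 3.0       # warn if load time grows by this factor

    def measure_load_time(url):
        """Fetch the page once and return the elapsed time in seconds."""
        start = time.monotonic()
        urllib.request.urlopen(url).read()
        return time.monotonic() - start

    def check_page_load(elapsed):
        """Fail above the threshold, warn on a marked degradation that is
        still within the threshold, otherwise pass."""
        if elapsed > THRESHOLD_SECONDS:
            return "fail"
        if elapsed > BASELINE_SECONDS * DEGRADATION_FACTOR:
            return "warn"  # e.g. a jump from 0.3s to 2s, still under 4s
        return "pass"

    if __name__ == "__main__":
        elapsed = measure_load_time(URL)
        print(f"{URL} loaded in {elapsed:.2f}s: {check_page_load(elapsed)}")

A warn result would hold the release and prompt an investigation, as described above, rather than failing the build outright.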

As the conversation turned to security there was doubt that the same principle could be applied. However, one tester in the audience was doing just this by using the results of security audits to create scripted security checks. Though vigilance is required to keep up with evolving security threats, he felt that the maintenance overhead was no different to any other automated test suite.
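
To illustrate that approach in spirit, an audit finding such as a missing security header could be turned into a repeatable check. The URL and header names below are assumptions for the sake of example; a real suite would be derived from your own audit results.

    import urllib.request

    # Hypothetical URL and headers that a past audit might have flagged as
    # missing; a real check would be built from your own audit findings.
    URL = "https://example.com/"
    REQUIRED_HEADERS = [
        "Strict-Transport-Security",
        "X-Content-Type-Options",
        "X-Frame-Options",
    ]

    def missing_security_headers(url):
        """Return the audit-derived headers absent from the response."""
        response = urllib.request.urlopen(url)
        present = {name.lower() for name in response.headers.keys()}
        return [h for h in REQUIRED_HEADERS if h.lower() not in present]

    if __name__ == "__main__":
        missing = missing_security_headers(URL)
        if missing:
            print("Security check failed, missing headers:", ", ".join(missing))
        else:
            print("All audited security headers are present")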

Finally we spoke about usability. The first thought from the audience was that perhaps A/B testing is how most companies achieve this in a continuous delivery environment. Those assembled were familiar with the concept as New Zealand is often used as the trial region for new Facebook features. Some used this approach, though others argued that if your focus is user loyalty or sales you may not want to risk alienating a proportion of your clientele by giving them a weaker design.
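
For anyone picturing how a trial group is carved out, the usual approach is to route a small, stable proportion of users to the new design and compare their behaviour with everyone else's. The sketch below is a generic illustration rather than how Facebook or anyone in the room does it; the 5% split and the hashing scheme are assumptions.

    import hashlib

    TRIAL_PERCENTAGE = 5  # assumed size of the trial group

    def variant_for(user_id, experiment="new-checkout-design"):
        """Deterministically assign a user to 'B' (the trial design) or 'A'.

        Hashing the user id with the experiment name keeps each user in
        the same group on every visit without storing any state."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return "B" if bucket < TRIAL_PERCENTAGE else "A"

    if __name__ == "__main__":
        for user in ["alice", "bob", "carol"]:
            print(user, variant_for(user))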

Interestingly, there were those who thought that the same principle of checks may even work for usability, in particular the accessibility aspects, which often require that the application can be used by a machine. Tools to check for tab order, alternate text in images, appropriate colour and contrast, and valid HTML were all mentioned.
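
The alternate text check in particular is straightforward to express as a machine check. The sketch below looks only for img tags without an alt attribute, using the standard library HTML parser; the tools mentioned in the session cover much more, such as tab order, contrast, and HTML validity.

    from html.parser import HTMLParser

    class MissingAltChecker(HTMLParser):
        """Collect the img tags that have no alt attribute."""

        def __init__(self):
            super().__init__()
            self.missing_alt = []

        def handle_starttag(self, tag, attrs):
            attributes = dict(attrs)
            if tag == "img" and "alt" not in attributes:
                self.missing_alt.append(attributes.get("src", "<no src>"))

    if __name__ == "__main__":
        # Hypothetical fragment of the page under test.
        page = '<img src="logo.png" alt="Company logo"><img src="chart.png">'
        checker = MissingAltChecker()
        checker.feed(page)
        if checker.missing_alt:
            print("Images missing alt text:", ", ".join(checker.missing_alt))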

The session finished with a conversation about whether this would really work. The arguments against seem to be invalidated by the type of organisations that choose continuous delivery. Organisations that make frequent releases a priority and pride themselves on responsiveness must acknowledge that this comes at the expense of quality. It's fine if a user sees something that isn't quite right, so long as it's only there briefly. I found it interesting that those with real-world experience in continuous delivery often worked in an iconic or monopolistic organisation where the user has strong brand loyalty and little choice.

Are you using continuous delivery? How do you incorporate non-functional testing?

Wednesday, 19 February 2014

A culture challenge

A comment on my previous post reads:

I'm currently in one of those places where finger pointing and politics are the norm (at higher levels - my team itself is great). While proactively accepting blame might sound like a great & noble thing to do, that's like voluntarily putting your head on the chopping block when no one is even asking that of you. Not gonna happen! We just talk amongst ourselves about what we can do better next time.

I felt that I needed a whole post to respond to this one, because it really made me wonder.

Where do you think culture comes from?

If you're in an organisation where finger pointing and politics are the norm, then ask yourself why that is. Try walking in the shoes of the person who is behaving in this way. Imagine being in a management position with responsibilities that straddle a number of teams; you try to manage risk, ensure that mistakes aren't repeated, and report on your department to higher levels of the organisation.

Now imagine that the teams you are responsible for are insular. That they talk amongst themselves, but they won't tell you anything. You know that anything you do hear is only part of the story. What do you do? Without accurate and complete information you cannot do your job.

Finger pointing happens when your manager is frustrated. When there is an endemic lack of ownership, finger pointing feels like the only way to assign and action improvement. I believe that finger pointing is not a reflection on your manager, it's a reflection of your behaviour as a team.

Who do you think changes your culture?

A manager is not going to change their approach when it feels as though a witch-hunt is the only way to find out what is going on. If you want to stop being persecuted, start taking responsibility for failures and communicating the things that you plan to improve as a result. Own it. Take away the reason that they behave the way that they do.

It's scary to be the first person to stick your neck out. I don't deny that. But if you want to build a relationship of trust, then you have to act like someone who can be trusted. The culture of an organisation is the result of the behaviour of every individual in it. If you want to see change, you have to make it.

Sunday, 16 February 2014

Own it

It feels like testing suffers as a profession because we fail to own our failures. We are quick to point out the plethora of reasons that something is not our fault. Where a product is released with problems, we didn't have enough time, or we weren't listened to, and anyway, we didn't write the bugs into the code in the first place, so why are you blaming us?

Testing, perhaps more than any other discipline in software development, includes a number of pretenders. These people may be called fake testers, possums, or zombies; there is no shortage of names for a problem that is widely acknowledged. Yet they remain sheltered in software development teams throughout the world, pervasive in an industry that allows them to survive and thrive. Why?

We don't take the blame.

Think of a retrospective or post-project review where a tester took ownership of a problem, identified their behaviour as a cause, and actively worked to prevent recurrence in their work. Now think of the problems identified in testing that would be gone if the developer had delivered the code earlier, or if the project manager had allowed more time for defect fixing, or if the business analyst had identified the requirement properly. It seems that more often we attribute our problems elsewhere; fingers are pointed.

It is a brave thing to claim a failure. In doing so we acknowledge our imperfection and expose our flaws. I think that testers do not show this bravery enough. Instead, criticism of a poor product, or a failed project, is water off a tester's back. We escape the review unscathed. We cheer our unblemished test process. We see this as a victory to be celebrated with other testers.

This is what allows bad testers to hide. Where a tester leaves a review without ownership of any problems, they are warranted in considering their contribution successful. A tester may then consider themselves associated with any number of "successful" projects, simply because none of the failures were attributed to them.

How do we fix this? By considering how everything could be our fault.

Imagine a project that goes live in production with a high number of defects. In the review meeting, one tester claims that the project manager did not allow enough time in her schedule for defect fixing. An action is taken by the project manager to allow more time for this activity in the next project.

Another tester on the project thinks about how this same problem could be their fault using the test of reasonable opposites, the idea that for every proposition you come up with there are contrasting explanations that are just as plausible. In this example the proposition is that the project goes live with a high number of defects because the project manager did not allow enough time in her schedule for defect fixing. A reasonable opposite may be that the project goes live with a high number of defects because the testers raised many minor problems that the business did not need to see resolved before release.

From a reasonable opposite we now have an action to consider: should the testers treat these minor problems differently in future? The tester is prompted to think about how their behaviour could have contributed to a perceived failure. Once you start imagining ownership, it becomes easier to take it, where appropriate.

As good testers start to claim problems and action change, we erode the position of bad testers who consider themselves above reproach. When we stop finger pointing, we stop enabling others to do so. To change the culture of our industry and expose those who hide among us, we need to be comfortable in accepting that sometimes the things that go wrong on a project are because of things that we did badly.

The test of reasonable opposites: a good tool for testers in a review meeting.