Sunday 2 March 2014

Reporting session based testing

I was recently asked about the reporting available for session based testing at a test management and senior management level. By discarding the metric that many managers use to track their teams, test case count, I had created uncertainty about what they would use instead. Though I could demonstrate the rich information available from visual modeling and session based test reports, I was caught short by a lack of example management reporting artifacts. I set out to fix this by creating a set of high-level reports for a sample project.


I was the test manager of a project for five half-days of four hours each. The project was a learning exercise and the product under test was the TradeMe Sandbox, a test version of the most popular online auction site in New Zealand.

I had a team of nine testers, six of whom were recent graduates. They were split into three groups of three testers and each group was assigned a specific area of functionality; the red team for selling an item, green team for feedback on a purchase, and purple team for search.

The test process was straightforward. Each trio developed a visual model of their piece of the system and I reviewed it. They then estimated how much testing would be required by breaking their model into 60-minute sessions, each with a specific charter. Testing was divided among the team, with each tester using a basic session based test management report to record their activities. At the end of each test session this report was reviewed during a short debrief.

The entire project can be summarised at a high-level by this timeline:

Task Management

Due to the tight timeframe, I didn't want to support 3M by creating a physical visual board for task management. Instead I set up a virtual board in Trello for the team to work from. The board showed which pieces of functionality were being tested, and the number of charters awaiting execution, in progress, and completed. All cards on the board were colour coded to the trio of testers responsible for that area.

By using this tool it was easy for me to see, as a test manager, what my team were doing. At the end of each day of the project I archived the Tasks - Done list as Charters Complete - YYYY/MM/DD. By doing this I retained a separate list of the cards that were completed on any given day, and kept the board usable for the team.


The first high-level report that I wanted to create was a daily dashboard. I started with the Low Tech Testing Dashboard by James Bach and the Test Progress Report by Leah Stockley. Each day I shared the dashboard with a group of people who were not involved with the project. Their feedback from a senior management perspective altered the content and format of the report from its origins to what you see below.

The first column presents the functional area under test with a star coloured to match the associated card in Trello. Each piece of the system has a priority. The quality assessment is exactly as described by James Bach in his Low Tech Dashboard; the key at the top of the column expands to explain each value that may appear. The progress indicator is a simple pie graph to reflect how close to complete testing is in each area. Bugs are presented in a small mind map where the root node includes a count and each branch within has an indication of severity and the bug number. A brief comment is provided where necessary.

The dashboard is designed to present information in a succinct and visual fashion. It was created in XMind, the same tool that was used by the testers for their visual modeling. This allowed the dashboard to integrate directly with the work of the testers, making a very useful one-stop-shop from a test management perspective.


The last level of reporting I wanted was a way to anticipate the future; the crystal ball of session based test management. I found a great resource on Michael Kelly's blog from which I pulled the following two spreadsheets.

The first tracked the percentage of charters completed by each team:

This gave me some idea of the effort remaining, which would be familiar to a test manager who tracks based on test case count. The nice thing here is that the unit of measure is consistent. Each charter is one hour, as opposed to test cases that can vary in duration. The only annoyance I found with this spreadsheet was that the end goal of total charters changed each day as scope was discovered.
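The arithmetic behind a sheet like this is simple, and a few lines of code make it concrete. The daily figures below are invented for illustration; the calculation is just completed charters divided by the total identified to date, which is why the denominator can shift as new charters are discovered.

```python
# Hypothetical daily snapshots: (day, charters identified so far, charters completed so far).
# With one-hour charters, "completed" is also a direct count of testing hours spent.
snapshots = [
    ("Day 1", 30, 4),
    ("Day 2", 38, 11),
    ("Day 3", 41, 20),
    ("Day 4", 42, 29),
    ("Day 5", 42, 38),
]

for day, identified, completed in snapshots:
    # Percentage complete against the total known at that point in time.
    pct = 100 * completed / identified
    print(f"{day}: {completed}/{identified} charters complete ({pct:.0f}%)")
```

Because each charter is a fixed one-hour session, the same numbers double as an effort burn-up, which is what makes this comparable to (but more consistent than) a test case count.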

This brings me to charter velocity:

The blue line on the graph showed me that our scope was settling as the charters created dropped each day. As the team settled into a rhythm of execution the green line leveled out. The orange line shows work remaining; by extending the trend to where it crosses the X-axis, we might guess when testing will be complete.
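The extrapolation on the orange line can be sketched as a simple least-squares fit over the remaining-work counts, solving for where the trend line crosses zero. The daily figures here are invented for illustration only.

```python
# Hypothetical "work remaining" counts per day (charters identified minus completed).
days = [1, 2, 3, 4, 5]
remaining = [26, 27, 21, 13, 4]

# Least-squares line fit: remaining ~= slope * day + intercept.
n = len(days)
mean_x = sum(days) / n
mean_y = sum(remaining) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, remaining)) / \
        sum((x - mean_x) ** 2 for x in days)
intercept = mean_y - slope * mean_x

# The trend crosses zero remaining work at day = -intercept / slope.
finish_day = -intercept / slope
print(f"Projected completion around day {finish_day:.1f}")
```

As with any trend projection, this is only a guess; the early days are noisy because new charters are still being created, so the estimate becomes more credible as scope settles.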

I found both of these spreadsheets necessary as a test manager to feel comfortable that the project team were heading in the right direction in a timely fashion. That said, I did not share these measures with my senior management team for fear they would be misinterpreted. They received the dashboard alone.

Is this approach similar to your own? Do you have any thoughts?


  1. Hi Katrina,
    thanks for your great insight. I will definitely draw some improvements from your report.
    I am currently in the end phase of my first SBTM project. It grew from 220 man-days to 350, because my team was too successful in finding bugs. But I definitely need to improve my reporting and keep better track of the session lengths. Some team members took a session planned for 1.5 hours and spent a complete day executing it.
    Next project will be smaller and good to improve all those things.

    Thanks for that post. Another good source for this topic.

  2. Wonderful article!

    This approach differs from my team's because - although we use Trello in a similar way - the cards are all development cards, and the testing tasks take place when the cards move (In Progress -> In Test -> Merged to Release branch). We attach the testing documentation to the Trello cards for the dev work, and whatever estimation happens is included with the estimation of the dev task.

    -Were the visual test models essentially like the ones in Aaron Hodder's article?
    -How detailed were the session reports? I'd love to see an example.
    -You mentioned that some of the testers involved were new to testing. Is it risky to put all the peer-review of the session reports off until after the product "ships"?

    1. Thanks for your comment Clint. In response to your questions:

      -Were the visual test models essentially like the ones in Aaron Hodder's article?

      Yes, very similar.

      -How detailed were the session reports? I'd love to see an example.

      Based on your first question and this one, I may write another post about the reporting from the testers themselves (visual models and their reports). My direction regarding detail was that the report should contain sufficient information so that another tester could repeat the intent of their session, but not so detailed that they would reproduce it exactly.

      -You mentioned that some of the testers involved were new to testing. Is it risky to put all the peer-review of the session reports off until after the product "ships"?

      After each 60 minute session they had a one-on-one debrief with me, as test manager, to talk through what they'd tested, what they'd discovered, and how they'd recorded these things. The separated peer review at the end was to have them share the report they were most proud of with a tester outside of their own squad, so that they could get some feedback on how easy it was for an "outsider" to understand.

  3. Wonderful Article: I really liked this post. Will keep as reference on my Blog pages.

    Need to know:
    1. If it's the first time a team is using Trello or a dashboard, how can management be persuaded to approve the process?

    2. What does Priority mean here on the dashboard?

    1. Thanks for your comment Srinivas. In answer to your questions:

      1. I guess it's the same as the first time that you do anything. You can show management examples like this one and hope that they see the potential in a different type of reporting, but there'll always be a little bit of risk in not knowing how it will work for your project until you try it out.

      2. Priority reflects the order in which we should go about testing based on the opinion of the business, where priority 1 is the most important and priority 3 the least.

  4. Hi Katrina - great post! For some recent testing I've been using Google Docs to manage a process similar to SBTM, although this was more of a UAT kind of exercise, so a lightweight testing solution was needed so as not to scare off the business folk. In essence, our tests were mapped out onto a Google Docs spreadsheet and then transitioned through the various RAG statuses until complete, with bugs & questions logged in a notes column that could then be managed in an external tool (Trac). There was also plenty of room for the business testers to create additional tests if they wanted to.

    I found google docs pretty useful since it meant that we were all working on the same (change tracked) document, but it clearly has its limitations. One of which was pulling metrics out for reporting purposes. I had hoped that I could just turn it into a kind of heat map, but this didn't really work out. I found your examples and links really useful anyway, and I'll be thinking about how to modify my own approach in the future.

  5. Hi Katrina,

    Good one!

    How do you manage time metrics if sessions are different durations? As we know, one charter is not equal to another.


    1. Thanks for your question Radek.

      In this particular case I dictated that sessions should have equal length. As I mention in the response to James below, this seemed to work well for this particular context and is something I would try again in future.

      I don't see that having sessions of variable duration would alter much in how I reported to senior management (the dashboard). It would limit your ability as a test manager to predict outcomes based on the tracking spreadsheets, which I believe work best where sessions are of equal length.

  6. Hi Katrina,

    "Each charter is one hour, as opposed to test cases that can vary in duration."

    Although we have a default length too, we typically let session length vary as in the original Bach paper, which gives the tester the ability in-session to take a decision to pursue a promising line or abort one that looks less valuable. They'll review the decision in debrief.

    How aggressively did you enforce the session timing in your experiment? If you were strict about spending an hour per session, did you find positives/negatives from that?


    1. Thanks for your question James.

      I did enforce a session time limit of one hour. The testers were expected to explore promising lines of enquiry, but the time box allowed for check-in and discussion to happen early. I wanted to influence the decision about whether further investigation was warranted, and how to treat scope remaining from the original mission of the session.

      In only one case did a tester finish their session early (45 minutes instead of 60). I agreed that he had covered the purpose and he moved on.

      The positives from this approach, aside from the early feedback that I just mentioned, were that the testers learned how much they were capable of achieving in an hour. The time management skills came through strongly in the "Learned" section of our 4-Ls retrospective. Though many of the team were graduate testers, by day three their estimation of a one-hour session in this application was pretty accurate.

      I think having a consistent session length also helped for reporting. Not just the quote that you mentioned, where a count of charters is consistent, but also for the percentage breakdown of where time was spent in a session (setup, test design & execution, bug analysis & reporting). I like comparing apples with apples.

      The negatives. In one case where testing took longer than anticipated, the distinction in charter between the two sessions required was not very clear. Though this only happened once, it was a small project; I think that this could be a problem in a larger scale testing effort.

  7. I love this approach to manual testing. I attended Leah Stockley's course on Context Driven Testing and I've never looked back. I use Jira and mind maps, as well as screen capture software for recording our test sessions, and I've found the following to be true:

    Test is no longer a blocker.

    SIT finds 99% of the bugs because we never wind down (any defects found in UAT are trivial at best); if anything, we ramp up effort on finding defects because we aren't tied to scripts.

    UAT runs like a dream.

    My SIT team regularly send me emails to say how happy they are in this approach (that’s a new one on me).

    My stress levels are greatly reduced because of the confidence we have in the system as a group
    We never roll back.

  8. Interesting post and many thanks for writing it. In my method of reporting session reports, I make sure to mention the type of session that I perform. This helps me to better understand how much time is spent on what feature/charter, and for what purpose. For example, if the requirement is new, I spend some time on an 'Analysis' type session; if the product that I test is new, then I choose a 'Survey' session. All my sessions are generally followed by PCO and Deep Testing sessions, and a summary of all such session types with their frequency and length gives me enough information to use for test estimation. It certainly takes a period of time, but once you spend enough time testing the SBTM way, further things become pretty easy, especially the estimation part, which is less likely to be accurate if done based simply on the number of test cases (or charters) and their complexity.

    I was curious to know: do you also generate TBS metrics in any form based on the information you extract from session reports? I'm trying something around it and would love to know how you do it.

    Lalitkumar Bhamare
    Tea-time with Testers

  9. Hello Katrina, looks like I'm well behind the 8-ball in reading this; however, I found it very interesting. How do you see test charters differing from traditional test scenarios?


  10. Hi Katrina,

    Many years later...and this post holds a wealth of information for me...

    Thanks :)