Focus on key examples

It can be tempting to add a ton of scenarios and test cases to acceptance criteria for a story, and look at all possible variations for the sake of completeness. Teams who automate most of their acceptance testing frequently end up doing that. Although it might seem counter-intuitive, this is a sure way to destroy a good user story.

The core of the problem is that a large number of complex scenarios is OK for a machine to process, but not for humans to understand easily. Fast iterative work does not allow time for unnecessary documentation, so acceptance criteria in user stories often end up being the only detailed specification most teams produce. If that specification is complex and difficult to understand, it is unlikely to lead to good results. Complex specifications don’t invite discussion – people tend to read them in isolation and selectively ignore parts they feel are less important. This can lead to an illusion of precision and completeness. The overwhelming amount of data makes everyone feel secure that someone has the entire picture, but in fact people can easily understand things differently.

Here is a typical example:

    
    Feature: Payment routing
      In order to execute payments efficiently
      As a shop owner
      I want the payments to be routed using the best gateway

      Scenario: Visa Electron, Austria
        Given the card is 4568 7197 3938 2020
        When the payment is made
        Then the selected gateway is Enterpayments-V2

      Scenario: Visa Electron, Germany
        Given the card is 4468 7197 3939 2928
        When the payment is made
        Then the selected gateway is Enterpayments-V1

      Scenario: Visa Electron, UK
        Given the card is 4218 9303 0309 3990
        When the payment is made
        Then the selected gateway is Enterpayments-V1

      Scenario: Visa Electron, UK, over 50
        Given the card is 4218 9303 0309 3990
        And the amount is 100
        When the payment is made
        Then the selected gateway is RBS

      Scenario: Visa, Austria
        Given the card is 4991 7197 3938 2020
        When the payment is made
        Then the selected gateway is Enterpayments-V1
      ....

… followed by ten more pages of similar stuff.

The team that implemented the related story suffered from a ton of bugs and difficult maintenance, in my opinion largely caused by the way they captured their examples. This long, monolithic list of examples was difficult to break up, so only one pair of developers could work on it, instead of the team splitting the work. Because of that, it took several weeks for the implementation to become available for business users to try out. The large chunk of work needed to be properly tested, so another week or so passed until it was ready to go live, and then the fireworks started. The false sense of completeness prevented the delivery team from discussing important boundary conditions with business stakeholders, and several important cases seemed obvious to different people, but in different ways. This surfaced only after a few weeks of running in production, when someone spotted increased transaction costs.

Although each individual scenario might seem understandable, pages and pages of this make it hard to see the forest for the trees. Of course, this wasn’t implemented as one huge If-Else statement, but some critical variations only appeared on page six or seven. Combined with hotfixing under pressure to resolve production issues, the result was pretty much a patchwork of special cases.

These examples tried to show that, when several processors could potentially execute a card transaction, high-risk transactions should be sent to a more expensive processor with better fraud controls, and low-risk transactions should go to a cheaper processor even if it has worse fraud controls. Important business concepts such as transaction risk score, processor cost or fraud capabilities were not captured in the examples – the technical model differed from the business model. Changing small thresholds for risk scoring required huge changes to a complex network of special cases in software, and caused a ton of unexpected consequences. When a cheaper payment gateway suddenly had better fraud control than a more expensive one, several weeks of testing were needed to adjust the system to benefit from it.
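The routing rule described above can be sketched in Python, assuming each processor advertises a per-transaction cost and the highest risk score its fraud controls can handle. The `Processor` type, the processor names, the costs and the 0 to 100 risk scale are hypothetical, not taken from the original system. The point of the sketch is that cost and fraud capability become explicit data instead of being buried in special cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    cost_cents: int  # hypothetical per-transaction cost, in cents
    max_risk: int    # hypothetical: highest risk score (0 to 100) its fraud controls cover


def route(risk_score: int, processors: list[Processor]) -> Processor:
    """Choose the cheapest processor whose fraud controls cover the risk score."""
    capable = [p for p in processors if p.max_risk >= risk_score]
    if not capable:
        raise ValueError(f"no processor can handle risk score {risk_score}")
    return min(capable, key=lambda p: p.cost_cents)


# Hypothetical processors: a cheap one with weaker fraud controls,
# and an expensive one with stronger fraud controls.
PROCESSORS = [
    Processor("cheap-gateway", cost_cents=10, max_risk=30),
    Processor("premium-gateway", cost_cents=45, max_risk=90),
]

print(route(10, PROCESSORS).name)  # low risk: cheap-gateway
print(route(70, PROCESSORS).name)  # high risk: premium-gateway
```

If the cheap gateway later improves its fraud controls, only its `max_risk` value changes, for example to 95, and high-risk transactions start going to it without any change to `route`.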

An overly complex specification is often a sign that the technical model is misaligned with the business model, or that the specifications are described at the wrong level of abstraction. Even when correctly understood, such specifications lead to software that is hard to maintain, because small changes in the business can lead to disproportionately huge changes in software.

It is far better to focus on illustrating user stories with key examples – a small number of relatively simple scenarios that will be easy to understand, evaluate for completeness and criticise. This doesn’t mean throwing away precision – quite the opposite – it means finding the right level of abstraction and the right mental model that would allow a complex situation to be described better.

The payment routing case could be broken down into several groups of smaller examples. One group would describe transaction risk based on country of residence – something that business users intuitively had in their minds without ever expressing it directly. Another group of examples would describe how to score a transaction based on payment amount and country of purchase. Several more groups of examples would describe individual transaction scoring rules, focused only on the relevant characteristics of a purchase. Then one overall group of examples would describe how to combine different scores – regardless of how they were calculated. A final group of examples would describe how to match the transaction risk score with the compatible payment processors, based on processing cost and fraud capabilities. Each of these groups might have five to ten important examples, and would be much easier to understand. Taken together, these key examples would allow the team to describe the same set of rules, much more precisely but with far fewer examples than before.
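A minimal Python sketch of this decomposition, with each scoring rule as its own function and a separate combining step that only sees numbers, not how they were produced. The country code `ZZ`, the amount limits, the score values and the capped-sum combining rule are all hypothetical placeholders; the article does not say how the real scores were calculated or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    amount_usd: int
    residence_country: str
    purchase_country: str


HIGH_RISK_COUNTRIES = {"ZZ"}  # hypothetical placeholder list


def residence_score(tx: Transaction) -> int:
    """Group 1: risk based on the card holder's country of residence."""
    return 40 if tx.residence_country in HIGH_RISK_COUNTRIES else 0


def amount_score(tx: Transaction) -> int:
    """Group 2: risk based on payment amount and country of purchase."""
    limit = 100 if tx.purchase_country in HIGH_RISK_COUNTRIES else 500
    return 30 if tx.amount_usd >= limit else 0


def combine(scores: list[int]) -> int:
    """Overall group: combine individual scores, whatever produced them."""
    return min(100, sum(scores))


RULES = [residence_score, amount_score]


def risk_score(tx: Transaction) -> int:
    return combine([rule(tx) for rule in RULES])


print(risk_score(Transaction(20, "AT", "AT")))   # 0
print(risk_score(Transaction(600, "ZZ", "AT")))  # 70
```

Because `combine` only receives numbers, its key examples (say, that scores of 80 and 50 are capped at 100) stay valid when the threshold inside one rule changes, and each rule's examples can be discussed on their own.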

Key benefits

Describing stories with several simple groups of key, focused examples leads to specifications that are easier to understand and implement. Smaller numbers of examples make it easier to evaluate completeness and argue about boundary conditions, so they make it easier to discover and resolve inconsistencies and differences in understanding. Stakeholders can have a better, more focused discussion with several smaller groups of focused examples than with a huge spreadsheet of endless possibilities. As a result, the outcome is much more likely to satisfy the original business needs, and to do so sooner; unexpected consequences are also less likely to be discovered only in production.

Breaking down complex examples into several smaller and focused groups leads to more modularised software, which reduces future maintenance costs. Continuing with the transaction processing example, modelling examples that explain individual scoring rules would give enough hints to the delivery team to capture those rules as separate functions, so that changes to an individual scoring threshold do not impact all the other rules. This would avoid unexpected consequences when rules change. Adding another processor, or changing the preferred processor after a cheaper one gets better fraud control, would require small localised changes instead of causing weeks of confusion.

Describing different aspects of a story with smaller and focused groups of key examples allows the team to divide work better. Two people can take the country-based scoring rules, and two others can take the routing based on the final score. When everything is described as a combinatorial explosion of examples, it’s difficult to divide work. Smaller groups also become a natural way of slicing the story – for example, some more complex rules could be postponed for a future iteration, while a basic set of rules could go live in a week and provide some nice business value.

Finally, focusing on key examples significantly reduces the sheer volume of scenarios that need to be checked. Assuming that there are six or seven different scoring rules in combination, and that each has five key examples, describing the entire problem space with examples that combine all variations would take roughly eighty thousand examples (five to the power of seven is 78,125). Breaking it down into groups would allow us to describe the same problem space with forty or so examples (five times seven, plus a few overall examples to show that everything holds together). This significantly reduces the time required not only for describing and discussing the examples, but also for checking the implementation, and provides a much better starting point for any further exploratory testing.
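The arithmetic above can be checked with a few lines of Python. Seven rules with five key examples each are the figures from the paragraph; the five overall examples are an assumed number standing in for "a few".

```python
RULES = 7              # scoring rules, from the paragraph above
EXAMPLES_PER_RULE = 5  # key examples per rule
OVERALL_EXAMPLES = 5   # assumption: "a few" examples showing the rules hold together

# Every variation of every rule combined with every other: 5 ** 7
combinatorial = EXAMPLES_PER_RULE ** RULES
# Each rule illustrated in its own group, plus a small overall group
grouped = EXAMPLES_PER_RULE * RULES + OVERALL_EXAMPLES

print(combinatorial)  # 78125
print(grouped)        # 40
```

With six rules instead of seven, the combinatorial figure drops to 15,625, which is still hundreds of times more than the grouped approach needs.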

How to make this work

The most important thing to remember is that if the examples are too complex, your work on refining a story isn’t done. There are many good strategies for dealing with complexity. Here are four that I often use as a good starting point:

  • Look for missing concepts
  • Group by commonality and focus only on variations
  • Split validation and processing
  • Summarise and explore important boundaries

Overly complex examples, or too many examples, are often a sign that some important business concepts are not explicitly described. Risk score is a good example of that in the payment routing case. Discovering these concepts will allow you to offer alternative models and break down both the specification and the overall story into more manageable chunks. You can use one set of examples to describe how to calculate the risk score, and another to describe how to use the score once it is calculated.

A very common case of hidden business concepts is mixing validation with usage – for example, if the same set of examples describes how to process a transaction and all the ways to reject a transaction without processing (card number in incorrect format, invalid card type based on first set of digits, incomplete user information etc). The hidden business concept in that case is ‘valid transaction’, and it would be much better to split a single large set of complex examples into two groups – determining if a transaction is valid, and working with a valid transaction. Those groups can then be broken down further based on structure.
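The split between determining whether a transaction is valid and working with a valid transaction can be sketched in Python as two separate steps, each with its own small group of examples. The `CardPayment` fields, the 16-digit format check and the rule that only card numbers starting with 4 (the Visa range) are accepted are hypothetical simplifications for illustration, not the real validation rules.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CardPayment:
    card_number: str
    holder_name: str
    billing_address: str


ACCEPTED_PREFIXES = ("4",)  # hypothetical: accept only Visa-range card numbers


def validation_errors(payment: CardPayment) -> list[str]:
    """First group: is this a valid transaction? Returns reasons for rejection."""
    errors = []
    digits = payment.card_number.replace(" ", "")
    if not re.fullmatch(r"\d{16}", digits):
        errors.append("card number in incorrect format")
    elif not digits.startswith(ACCEPTED_PREFIXES):
        errors.append("invalid card type")
    if not payment.holder_name.strip() or not payment.billing_address.strip():
        errors.append("incomplete user information")
    return errors


def handle(payment: CardPayment, process_valid: Callable[[CardPayment], str]) -> tuple:
    """Reject invalid payments; hand valid ones to the second group of rules."""
    errors = validation_errors(payment)
    if errors:
        return ("rejected", errors)
    return ("processed", process_valid(payment))


ok = CardPayment("4568 7197 3938 2020", "A. Buyer", "1 Main Street")  # hypothetical data
print(handle(ok, lambda p: "routed"))  # ('processed', 'routed')
```

Examples for the processing step then never need to mention malformed card numbers, because `handle` filters those out before processing starts.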

Long lists often contain groups of examples similar in structure or values. For example, in the payment routing story there were several pages of scenarios with card numbers and country of purchase, a group of examples involving two countries (registration and purchase), and some examples where the actual transaction value was important. Identifying commonalities in structure is often a good first step to discovering meaningful groups. Each group can then be restructured to show only the important differences between examples, reducing the cognitive load.
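Spotting groups and hiding what does not vary can even be done mechanically. This Python sketch takes the five scenarios shown in the payment routing example, groups them by card type, and keeps only the fields whose values differ within each group. The field names are an assumption about how those scenarios might be recorded; `amount` appears only where the original scenario mentions it.

```python
from collections import defaultdict

# The scenarios shown in the payment routing feature at the start of the article.
SCENARIOS = [
    {"card": "Visa Electron", "country": "Austria", "gateway": "Enterpayments-V2"},
    {"card": "Visa Electron", "country": "Germany", "gateway": "Enterpayments-V1"},
    {"card": "Visa Electron", "country": "UK", "gateway": "Enterpayments-V1"},
    {"card": "Visa Electron", "country": "UK", "amount": 100, "gateway": "RBS"},
    {"card": "Visa", "country": "Austria", "gateway": "Enterpayments-V1"},
]


def group_by(scenarios, field):
    """Group scenarios that share the same value for a field."""
    groups = defaultdict(list)
    for scenario in scenarios:
        groups[scenario[field]].append(scenario)
    return dict(groups)


def compact_table(group):
    """Keep only the fields that vary within a group (missing fields count as None)."""
    fields = []
    for scenario in group:
        fields += [f for f in scenario if f not in fields]
    varying = [f for f in fields if len({s.get(f) for s in group}) > 1]
    return varying, [tuple(s.get(f) for f in varying) for s in group]


columns, rows = compact_table(group_by(SCENARIOS, "card")["Visa Electron"])
print(columns)  # ['country', 'gateway', 'amount']
```

The card type drops out of the Visa Electron table, and the single scenario in which the amount matters stands out instead of hiding among near-identical rows.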

The fourth good strategy is to identify important boundary conditions and focus on them – ignoring examples that do not increase our understanding. For example, if the risk threshold for low-risk countries is 50 USD, and 25 USD for high risk countries, then the important boundaries are:

  1. 24.99 USD from a high risk country
  2. 25 USD from a high risk country
  3. 25 USD from a low risk country
  4. 49.99 USD from a low risk country
  5. 50 USD from a low risk country
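These five boundary examples translate directly into a small table of checks. The article lists the values but not which side of each threshold they fall on, so this Python sketch assumes an amount counts as over the threshold when it is equal to or greater than it; the function name `over_threshold` is hypothetical. `Decimal` avoids floating-point surprises at values like 49.99.

```python
from decimal import Decimal

# Risk thresholds from the article, keyed by the country's risk level.
THRESHOLD_USD = {"low": Decimal("50"), "high": Decimal("25")}


def over_threshold(amount_usd: str, country_risk: str) -> bool:
    """Assumption: reaching the threshold exactly counts as over it."""
    return Decimal(amount_usd) >= THRESHOLD_USD[country_risk]


# The five key boundary examples: (amount, country risk, expected result).
BOUNDARY_EXAMPLES = [
    ("24.99", "high", False),
    ("25", "high", True),
    ("25", "low", False),
    ("49.99", "low", False),
    ("50", "low", True),
]

for amount, risk, expected in BOUNDARY_EXAMPLES:
    assert over_threshold(amount, risk) == expected, (amount, risk)
```

If the business decides that exactly 25 or 50 USD should not count, only the comparison operator and two expected values change, which is the kind of question these examples are meant to provoke.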

Discussing and documenting these five examples will probably give the team 90% of the value they could get from twenty pages of similar examples. Don’t aim to replace testing fully with examples in user stories – aim to create a good shared understanding, and give people the context to do a good job. Five examples that are easy to understand and at the right level of abstraction will do a much better job at that than hundreds of very complex test cases.

I'm Gojko Adzic, author of Impact Mapping and Specification by Example. I'm currently working on 50 Quick Ideas to Improve Your User Stories.

7 thoughts on “Focus on key examples”

  1. I’d like to use only 5 small examples, but I work in the real world, so my system is complex and messy, and to test it fully we need to use lots of complex scenarios.

  2. The only problem I see is that the whole ATDD approach is limited by tools and technical stuff.
    Sometimes you need to write such copy-pasted scenarios just because you have a test that you want to map to Gherkin lines.
    The technical limitations and complications lead us to write crappy, verbose scenarios.

  3. Some people, myself included, try to test every possible condition in scenarios. This leads to over-the-top, hard-to-maintain scenarios that are not clear. It’s much better, as the article states, to define boundary conditions in the scenarios and push more exhaustive tests down to the service or unit level.

  4. Dmytro – whenever I hear this argument, it turns out that people actually write scenarios after the code is built, so perhaps this is the case with your team as well. Cucumber, FitNesse, Concordion and similar tools aren’t particularly good for test automation after the fact. If you use them as intended – to capture examples that drive development – then the software gets designed and developed so that automation can be wired in nicely.

  5. Gojko, I agree with you. In our team, we ended up writing specifications, mostly as Excel tables, which describe only the expected application behavior, without examples. Sometimes the requirements are not clear to everyone on the team, so we found it very useful to illustrate those ambiguous requirements as Examples in plain English. That helps a lot.

    Project management actually does not care whether our automated tests pass or fail. They care only that the application works correctly and is implemented according to the defined requirements.

    We refused to use SpecFlow (a Cucumber-like framework for .NET) or any other ATDD framework, and automated our checks “separately” from the requirements.

  6. Hi Gojko

    How would you apply this pattern to a re-engineering project – one where an existing system has grown over a number of years and now contains 100s (if not 1000s) of scenarios (except that there were no tests in the existing system)?

    The problem I’m facing is that the mental model of the project is now fixed in people’s heads so the existing solution is driving the new one (with certain hot-spots being changed). That’s leading to 100s of test cases now being written to cover the functionality that, because it was there before, must be worth testing now.

    I’ve tried to use examples to identify modules and push back to the real business goal, but “we need that scenario (and therefore that test)” is a typical response.

    Have you applied your techniques on legacy re-engineering projects?

    Thanks for any advice.

    Carl

  7. Hi Carl,

    we’ve been doing what you are trying to do for the last 2 years (and still are). Here is some of my experience, for whatever it’s worth.

    Regarding your problem, having 100s of scenarios is not a bad smell per se. If your software is big, it’s big. We currently have about 450 lovingly hand-crafted scenarios in about 70 feature description files.

    When people say “we need that scenario”, then that’s so much more valuable information than people saying “I don’t care. Make it work.”

    When re-engineering a large legacy code base, you shouldn’t expect to be able to identify modules from whatever scenarios or examples you manage to amass. We certainly weren’t.

    What you need to do instead is classify and group and classify and group again. And again. Start by picking a feature of your software. Imagine you start over from scratch and your only requirement is that feature. How would you describe it? Can you give an example that shows that feature at work? Can you give another example that shows that same feature, but behaving differently? How many of these examples do you have? No more than 7? Then you’re done for now. More than 7? Then break it up into smaller features and repeat. Once you’re done, ask yourself if you can find another feature of your software that covers similar ground. Proceed as before. How many of these can you find? Several? Group them into a feature set. How many of these feature sets can you find that cover similar ground? Several? Group them again.

    If you follow the above procedure you will end up with a tree of the feature sets and features that your software supports. Three things you should focus on while you do that: go breadth-first, not depth-first; choose features first that are most important to the business (e.g. login usually is not, but selling a product usually is); use the language of the business, as it will help you later sort your feature sets into the different domains the business is trying to address with your software.

    During the whole process I described you will learn a lot about what the business really wants to achieve, which in turn will help you modularize the software into meaningful, cohesive packets. Talk to the business people about your scenarios and see if they understand you. If they don’t, adapt your language until they do. Refactor your feature set tree down to each scenario relentlessly with every new insight.

    In the end, writing down scenarios is not about testing. That’s only a bonus. Writing down scenarios is about enabling business people and tech people to figure out together what they want to achieve and how the software can help them do it.

    One last thing: our feature descriptions with all their scenarios turned out not to be sufficient to describe the whole system. The things they can’t describe are long processes. We have lots of those, i.e. a lot of different things happen during one request-response cycle. At one point we refactored our feature descriptions to only describe cohesive parts of these processes and resorted to activity diagrams to capture how these parts connect to form a long process.

    Hope that helps.

    Giso
