Software planning stories must be conversation starters. The quality of a story is directly related to how well it drives people towards the right discussions. Far too often, story refinement sessions are only about figuring out how to break down a solution someone has already decided on, and delivering it iteratively. But that is just putting lipstick on a pig. Breaking a solution into pieces requires most of the bits to be implemented before any serious feedback can be given. It also assumes that the solution is the right one for the job. That’s why lots of teams struggle to come up with something ‘small but valuable’, as if the two were irreconcilable.

Instead, the trick is to solve a smaller problem — ideally, in a way that contributes to the larger vision. I figured out a while ago, as an outsider, that my biggest contribution to the story refinement conversation is offering a different perspective on the problem the team is trying to solve. Can we solve the same thing for a subgroup of the users? Can we solve the problem only for the most common case, and simply delegate to an older system for all the other cases? Can we solve the problem in a way that’s releasable to a small group of early adopters first? Can we get people halfway there and let them manually finish the job? All those deliverables might be valuable, but they do not require the full infrastructure. At the Agile Testing Days conference in Berlin, I finally came up with a memorable acronym for the key components needed for the job.

The next time you need to break down product backlog ideas, make them TOO BIG first. Make sure you capture Testable Outputs and Outcomes, Behavioural Impacts and a Goal. With those aspects in place, story splitting takes a completely different direction.

Testable Outputs

Getting customers involved is absolutely crucial to any successful delivery. But customer collaboration, the holy grail of iterative delivery, might be misleading — or even damaging — if we seek feedback on the wrong things. In most cases, the software delivery team is responsible for this mistake.

A common problem with the misuse of user stories today is that they turn into a shopping list of features. This is not what they were originally intended for, but it is how people often use them in the wild. Quite often, this pushes the responsibility of designing a solution away from the software delivery team and into the hands of a customer representative, such as a product owner. In extreme cases, that’s like asking a driver to design a car. It’s one thing to poll someone on whether they’d like to have a large or a small trunk, two doors or five, and whether they need a four-wheel drive or to save fuel. It’s a completely different thing to ask them to design the transmission mechanism, or to decide on the compromises needed to make the combustion engine run efficiently. With software, the line between needs and design can be very difficult to establish.

An example of this is a team I visited about six months ago. With their customers, they decided on the look and feel of the screens of the application they were going to build, and then proceeded to implement screen after screen. They got frequent user feedback throughout the implementation process. How things worked below the screens was up to the development team. Unfortunately, they painted themselves into a corner. The customers were complaining about constant delays, the developers were trying to keep up with all the changes the customers requested and everyone felt they were running in circles. Worse yet, the estimate for the ‘final delivery date’ spiralled out of control.

When people break down a predetermined solution into parts for iterative delivery, the sequence almost always follows the data. Screens to set up payment information come before the screens to process payments. Data loaders come before reports. Registration comes before sending messages. This means that data input features are implemented first, and delivery teams try to get feedback on those things before creating outputs. Unfortunately, value rarely comes from putting bits and bytes into a computer system. Until some data comes out of the software, it’s literally useless. Once the data starts coming out, someone can potentially use it for something, and we can get good feedback on the whole piece of work from start to finish. Feedback on inputs in isolation mostly ends up being on how nice something looks, or how much it fits the mental model of the person asking for it. And when the implementation sequence follows the data, the first two-thirds of stories will be just about inputs.

Back to the team I visited — they were asking for feedback on stories about putting the data in, which often led to cosmetic changes to screens. The customers asked them to add more fields and information to forms, and expand data loaders with support for more types of records. It was a vicious cycle that just added more work and delayed the point when someone could actually start using the software.

This is why good stories need to discuss the outputs of a system more than the inputs. It’s only when we can create some kind of an output that we can actually get good feedback. If we can support creating a payment report, and it’s good enough to allow accountants to report tax, that is valuable even if all the screens don’t look 100% pixel perfect.

The output allows us to slice the solution in many positive ways — ways that are small but still potentially valuable. For example, perhaps the output supports enough reporting to allow accountants to file only quarterly VAT returns first, then yearly corporation tax reports. Alternatively, perhaps the accountants could export only the individual transactions with enough information to calculate VAT using Excel. Those slices will lead us to a discussion on how the input screens need to look and how much data the loaders need to understand for each slice.

To start a good conversation, we need to understand the outputs people expect to see. That’s it as far as the solution space is concerned. Forget everything else about the solution — it will come naturally as we discuss how to create those outputs.

There are three good ways to address an output for a story that’s TOO BIG:

  • Create a new output (for example, sending notifications using browser push)
  • Modify an existing output (for example, change purchase orders to include phone numbers for delivery)
  • Deliver the same output, but in a different way (for example, using cloud instead of local infrastructure)

Before someone complains that changing how inputs are collected for an existing system can also be valuable: that fits into the third category above. For example, improving the usability of data entry for an order management system is potentially valuable. But that’s really about producing the same set of outputs in a different way. Evaluating the story exclusively on the input screens, without considering the consistency and completeness of the order reports, is dangerous. The ‘testable outputs’ in this case will stay the same as before the story, but the process of producing them will change.

Testable Outcomes

This brings us to the question of what makes things valuable. How do we know that a new output, or modifying an existing output, actually adds value? Is improving the usability of the order management screen really valuable, or is it just a waste of time? This is another area where the now de facto standard user story card format ‘As a … In order to … I want …’ falls tragically short.

The question of value is difficult to answer because there is rarely only one level of value. Improving the usability of the purchase screen might help users complete transactions faster, but it may also reduce opportunities for cross-selling and damage the overall profitability of the product. Capturing better end-user analytics might be valuable for the operators of an online shop so they can optimise their inventory, but it might work directly against end-user privacy concerns. Allowing users to share digital goods they have bought might be valuable to end-users because they get free stuff, and to system operators because they create a popular platform, but the copyright owners will probably have something to say about the whole idea. On the other hand, the lawyers working for the copyright representatives could find the whole thing incredibly valuable for their own financial needs. And when those roles start mixing, the situation gets even more complex, as the recent Prenda Law Saga shows so colourfully.

To start a conversation about a story, we need to understand the different viewpoints on what’s valuable and for whom. Capturing just one level of value is rarely enough to even start that conversation. In The Art of Business Value, Mark Schwartz talks about three potential viewpoints:

  • Customer (paying for the software)
  • User (using the software)
  • Business (making/operating the software)

Sometimes these overlap or even represent the same person. For example, on internal software for small companies, all three views are embodied by a single product owner. Sometimes these viewpoints are completely different. For example, the users of the Google homepage are people who come searching for information. The customers, on the other hand, are people advertising there — people who want to direct the users’ attention away from the search results and towards their products. The business is Google itself, trying to find a good balance between the two. Too much advertising and the users will go away. Too few people clicking on adverts and the business dies.

Out of the three types of value, the customer perspective is often the easiest to discuss. Going back to the idea that the outputs of a software system might be valuable, it’s the outcomes of those outputs that matter rather than the outputs themselves. Nobody really cares about a dashboard showing product sales. They care about being able to stock up on top-sellers and spot new trends before their competition. In User Story Mapping, Jeff Patton states it clearly: ‘At the end of the day, your job is to minimize output, and maximize outcome and impact.’ Ultimately, we must get the outcome as fast as we can. And the key to having a good conversation is to express those outcomes in a testable way. What do the people paying for software expect to get out of it when it’s done, and how will they measure those outcomes?

Testable outcomes are critical for slicing a story. Once we know the expected testable outcome, we can talk about a smaller problem. For example, if the expected outcome is doubling the number of active users, how about we increase it by 10% first? Can we satisfy a subgroup of customers first instead of trying to make everyone happy from the start?

Behavioural Impacts

Unfortunately, simply defining outputs and outcomes leaves a big gap between the two, famously known as the ‘Underpants Gnomes Profit Plan’. Between a delivery team shipping an updated screen, and the active usage of a system doubling over time, there’s a person-shaped hole populated by users with their own free will, and a lot of unpredictability that comes from human DNA. That’s where the ‘user’ perspective of value comes in.

On the ‘user’ level, the best way to describe value is to consider a change in behaviour. Improve usability on a screen, and users will be able to complete the checkout process faster. Modify a report, and they will be able to make inventory predictions more accurately. Capturing the expected change, or the impact on users’ behaviour, is critical for understanding the problem before we start considering a solution. It’s the behavioural impacts that bridge the gap between a software output and a business outcome. Behavioural impacts are the answer to the phase 2 question of the Underpants Gnomes plan.

Behavioural impacts are, by their nature, testable. That is because they describe the behaviour (‘purchasing’), but also the impact the behaviour has on the process (‘purchasing more frequently’). And behavioural impacts can typically be measured very soon after a piece of software goes live, unlike outcomes that tend to be measurable only six months too late. This provides much-needed feedback not just on whether something was done according to plan, but also on whether it was a good idea.
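As a sketch of what ‘measurable soon after a piece of software goes live’ can mean in practice, the following compares purchase frequency before and after a release from raw event logs. The event format, the metric, and all the names here are illustrative assumptions, not any particular analytics tool’s API.

```python
from collections import Counter

# Hypothetical (user_id, week) purchase events, as might be exported
# from an event log for the periods before and after a release.
events_before = [("alice", 1), ("bob", 1), ("alice", 2)]
events_after = [("alice", 1), ("alice", 1), ("bob", 1), ("bob", 2), ("alice", 2)]

def purchases_per_user_week(events):
    """Average number of purchases per (user, week) pair."""
    counts = Counter(events)
    return sum(counts.values()) / len(counts)

before = purchases_per_user_week(events_before)  # 1.0
after = purchases_per_user_week(events_after)    # 1.25

# The behavioural impact: are people purchasing more frequently?
print(f"purchase frequency changed by {after / before - 1:.0%}")  # prints "25%"
```

The point is not the arithmetic but the timescale: an event-based measure like this can be checked within days of a release, long before the business outcome (say, doubled revenue) could possibly show up.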

There are several effective ways of capturing a behavioural impact:

  • Start doing something (for example, start buying digital downloads from our shop).
  • Stop doing something (for example, stop calling Customer Service for failed login attempts).
  • Do something in a different way (for example, check out faster, or read our news more frequently).
  • Choose to do something over something else (for example, have customers update their personal details themselves instead of calling the Customer Service department).

Knowing behavioural impacts is critical for slicing stories, as it opens up several ways of slicing the problem. We can cause a smaller change in behaviour. For example, instead of having people check out twice as fast, first get them to check out 20% faster. We can also cause a change in the behaviour of a sub-segment of users. For example, get users from Italy to start updating their personal details themselves, and only then expand that to other regions. There are plenty of good ways to slice user segments that significantly reduce technical complexity, but still provide value in the right direction. For example, change the behaviour of users with Android phones first, and then address the iOS group. All these slices are still valuable.
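A segment-based slice such as the Italy-first example can be gated with a trivial flag check. This is a minimal sketch; the names (`User`, `ROLLOUT_COUNTRIES`, `can_self_edit_profile`) are made up for illustration, not taken from any feature-flag library.

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    country: str
    platform: str

# The first story slice: enable self-service profile editing only
# for users in Italy, before expanding to other regions.
ROLLOUT_COUNTRIES = {"IT"}

def can_self_edit_profile(user: User) -> bool:
    """Gate the new behaviour to a sub-segment of users."""
    return user.country in ROLLOUT_COUNTRIES

italian_user = User(id=1, country="IT", platform="android")
german_user = User(id=2, country="DE", platform="android")

print(can_self_edit_profile(italian_user))  # True
print(can_self_edit_profile(german_user))   # False
```

The same shape works for any segmentation criterion (platform, tenure, opt-in early adopters), which is what makes these slices cheap to ship and cheap to expand later.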

Goal

Testable outcomes and behavioural impacts are good for slicing the problem into smaller pieces, but we also need to know that those smaller steps are going in the right direction. This is the Goal statement that stories need.

Marty Cagan suggested an alternative to roadmaps that includes a holistic Product Vision, and specific Objectives supported by Key Results (OKR). The vision isn’t necessarily something measurable in this case.

Melissa Perri talks about Good Product Strategy containing a vision (‘In five years, we’ll be a top dinner option’), supported by challenges (‘In order to achieve our vision, we need to double acquisitions by December 2016’), and a set of target conditions (‘In order to overcome our challenge, we first need to increase conversion rates across all platforms by X% by the end of Q2’). The vision is a bit lofty, acting as a guide, and the challenge and target conditions are measurable and testable.

Henrik Kniberg recently wrote about how Spotify stopped using OKRs, and moved to ‘North Star Goals’ and Prioritised Bets. The ‘North Star Goals’ are a vision helping to prioritise and select bets the company is making, not necessarily a measurable target state.

All three approaches use a similar concept, helping the teams align with the big picture, and provide visionary drive for strategic and tactical actions. Understanding that vision or goal is crucial to making important trade-offs between different outcomes and behavioural impacts, and balancing the needs of the various stakeholder groups.

This is the third type of value, the value for the business itself. It tells us what kind of initiatives are aligned with the vision of our organisation — in other words, where the organisation wants to go next. What does the organisation making this software get out of the planned delivery? Why is it worth investing time and money into trying to satisfy that particular group of customers? Are we making the ship go faster, or just rearranging the deck chairs?

Knowing a goal is crucial for slicing stories because it helps us evaluate whether we are slicing the problem well. A goal can help us decide if the smaller problem is a step towards the big vision, or just a local optimisation. For example, if the overall value of a legacy migration is to reduce operational costs, and we split that into smaller problems that create the required outcomes and behavioural changes, we can compare those ideas with the bigger goal statement. A story that solves the smaller problem but increases overall costs is bad. It might create the desired outcome, but it is not taking us where we need to be.

TOO BIG – Elevator Pitch

To slice stories, don’t break down the solution, but try to solve a smaller problem. To facilitate a good discussion on discovering a smaller problem, identify these aspects of the story:

  • Testable Outputs tell us how to measure if the story was done well.
  • Testable Outcomes tell us how we’ll measure whether the story was a good idea from a business perspective.
  • Behavioural Impacts tell us how to measure if the story was actually useful to our users.
  • Goal statements tell us if this initiative is aligned with the vision of the organisation.

Knowing the expectations for each of these aspects is crucial to understanding the problem at hand, and then breaking it down into smaller problems that guide us towards the big vision.