How do we prioritise technical debt against business features? How do we convince stakeholders that refactoring is important? How can anyone estimate the value of technical stories?

Lots of teams today struggle with scheduling something usually called technical work: system improvements that the team finds important, but nobody actually asked for them. Do too little technical work, and you’re actively damaging the product by slowly turning it into an unmaintainable mess. Do too much of it, and you’re actively damaging the product by delaying important business features. It’s very difficult to get this just right. Some teams blame micro-management caused by task management systems. Others claim that close customer collaboration turned the delivery team into a feature factory, taking orders from people who do not understand the importance of technical infrastructure. Of course, it’s easy to blame someone else, be it tools or ignorant stakeholders, but the cause of this problem is more fundamental. On the other hand, this is not a particularly new problem, and good solutions have existed for a long time.

Modern software delivery relies on close customer collaboration and transparency. Teams with remote members need digital work tracking tools. None of those factors cause problems with planning technical work; they just expose those problems more clearly. This is even more drastic with outcome-driven planning and measurable customer experiments from lean methods. With a laser-focus on a business goal, tech work will apparently always get postponed in favour of the stuff that actually brings in the money. This is a naive argument, of course, but after hearing it so many times following conference talks on impact mapping, it’s something I feel needs to be addressed.

Splitting work into technical and business is wrong on many levels, but let’s disregard that for the moment. The core of the problem is the fact that a lot of work looks important to people working on software delivery, but it doesn’t seem that important to those paying for the whole process. Folks controlling the budget are also in charge of prioritisation, which is quite justified, so the prioritisation game is often dominated by tasks that easily translate to money. As a result, teams look for ways to visualise, sell, explain, and convince stakeholders that technical stuff is important, usually bundled under the mystical spell of quality. This is a completely wrong approach, so no wonder it rarely brings results.

Metaphors taken out of context

A big part of this problem, in its latest incarnation, is caused by a bad analogy. Good metaphors help people relate to a new concept easily, so they can be quite useful for learning new things, but looking at an analogy too literally is counterproductive. One particular metaphor that outlived its usefulness is thinking about software development sprints.

At the turn of the century, during peak popularity of UML, RUP and CASE tools, software projects were built on templates from Soviet five-year economy plans (with matching success rates). Sprints were a useful metaphor in that context, because they forced people to think about shorter targets and smaller objectives. Two decades later, as two-weekly phases became the norm, planning cycle length is no longer a problem for most teams. The sprint metaphor has now taken on a dangerous literal meaning.

The word sprint came into English from Nordic languages, as a dialect term meaning to startle, jump or leap. Around the mid-nineteenth century it started to mean a short burst of running at full speed. Say the word sprint today, and most people will imagine Usain Bolt, flying out of the starting position as if a pack of hungry wolves were on his tail, zooming past the finish line in a blink.

To sprint successfully, Olympic racers need to shut off everything and push their capacity beyond what normal people consider humanly possible. As far as the sprinters are concerned, the world stands still while they run. In the heat of the race, the only thing that matters is crossing the finish line. Usain Bolt can fly with a blatant disregard for traffic regulations and office politics, and not even once think about the difference between operational and capital costs. But that’s not how software delivery works.

Software teams can rarely disregard the world around them, and the game is far from over once a sprint is done. Racing sprints never continue into the next race straight away. Even Olympic champions can only sprint for a very short time, and then need comparatively long periods of rest before the next engagement. Software sprints are continual, with rarely any breaks or recovery time.

The two types of sprints share almost nothing apart from being relatively short. Running unsustainably is a perfect strategy for sprint racing, but a total disaster for software work. Feature factories, micro-management and always postponing infrastructure work force software teams to sprint unsustainably for a long time, to run at full speed on a continuous treadmill. All that stuff about refactoring, improving architecture or upgrading the infrastructure in a feature backlog is not about product quality, at least not the type of quality that business stakeholders care about. That is all about long term sustainability. It’s no wonder teams have trouble putting those things on a roadmap, when the prevailing metaphor about the software process is a mad dash. That is why thinking about sprints is dangerous.

If there is a racing metaphor for software product delivery, it would be a marathon, not a sprint. Building a successful product is a long and exhausting process. Most of the time, nothing major happens. Instead of taking giant leaps in short bursts, people make small continuous progress most of the time.

I like the idea of marathons much more than sprints because marathon runners need to care about long-term sustainability. Just like standing and breathing in the right rhythm isn’t going to win a race, dashing without the right breathing technique won’t finish well either. Similarly, all the software craftsmanship in the world won’t help if our work isn’t actually bringing value, but focusing on the business features alone will cause the team to fall out of the race before they even see the finish line. Those two aspects shouldn’t be competing for priority, they are fundamentally tied together.

Of course, this metaphor isn’t perfect either, no analogy is. Marathons have a well known end state, most software delivery does not (with the notable exception of pumping up a product just to sell it to a competitor, and watch it burn as they struggle to integrate it later). In the classic book on agile delivery, Agile Software Development: The Cooperative Game, Alistair Cockburn tried to destroy the whole racing analogy before it even took hold, but Scrum had a far better marketing model than any other agile process. Cockburn wrote about finite and infinite games, those that have a clear completion state and those that are played for the purpose of continuing the game. He also categorised games into competitive or cooperative, concluding that software development is a cooperative infinite game. Races are competitive finite games, completely on the other end of the spectrum. Give up on the whole idea of sprints and marathons please, software delivery is nothing like racing. But keep the idea of sustainability in mind when planning.

Sustainability isn’t just about the pace

The need for a sustainable and predictable pace is nothing new of course, it is one of the original Extreme Programming rules. Sprints, in the early days of agile while the metaphor still made sense, actually did work towards sustainability. Shorter plans with frequent reviews help teams understand their realistic capacity, so thinking about sprints as short races helped people avoid taking on too much work.

However, sustainability requires a lot more than just ensuring that all the planned work can be completed, tested and integrated within the iteration. The eighth principle of the Agile manifesto cleverly doesn’t talk about sustainable pace. It talks about sustainable development and constant pace. I frequently work with teams that made their pace sustainable, but periodically need to stop completely in order to pay back technical debt, run a hardening sprint or a refactoring iteration. (Of course, the whole idea of a refactoring iteration is an oxymoron. Refactoring was originally the process of introducing small design improvements without changing the functionality, but words have a way of wiggling out of their intended meaning. Refactoring sprints are a lot easier to sell than what they actually are, a pause for a major redesign. That’s like a Formula 1 pit stop during which the mechanics try to install air-conditioning.)

To keep the pace constant, we need the process to be sustainable, of course, but the product needs to be sustainable as well. That second part of sustainability is often neglected. That is where all that work wanted by the delivery team comes in, even if it’s not necessarily wanted or understood by the stakeholders. Brady, the cleaning supplies company, sells warning labels that perfectly explain this problem: “If You Don’t Schedule Time for Maintenance, Your Equipment Will Schedule It for You”.

The key problem with prioritising sustainability tasks is that their value models are quite different from those of business features. For typical business features, people can at least make some solid assumptions about the expected value. The money usually comes at a known point after delivery, rises gradually as users adopt the feature, and then tends to drop off as competitors catch up, processes become obsolete and the needs of customers change. For sustainability features, the value may come at some unknown point in the future, mostly due to preventing unknown problems, so it’s very difficult to quantify.

Performance improvements bring zero value until the number of users outgrows the capacity, then teams need to scramble to troubleshoot the system during a total outage. Improving internal system observability brings no visible value until some heisenbug causes data loss for an important user, when it might make the difference between a quick resolution and an unsolvable problem. Upgrading technical dependencies is another great case in point. Many teams use third-party open-source libraries for infrastructure. Open-source projects move at their own pace, independent from the products that use them, and may introduce breaking changes in major updates. Keeping all those libraries up to date takes time and effort, and rarely brings business value immediately, so it can be difficult to justify when compared to business ideas. On the other hand, keeping up with small changes continually makes each individual upgrade easy. If someone discovers a critical vulnerability in a third-party dependency, teams that use a recent version will easily install the patch. Teams that are still using an ancient version might find it very difficult to switch quickly. A great example of that is the CVE-2017-5638 vulnerability in Apache Struts, which allowed hackers to steal data about 150 million people from Equifax in 2017. The bug was fixed by the Struts team two months before the hack, but Equifax teams couldn’t upgrade their product in time to prevent the blunder.

Create a budget instead of planning

Many teams insist on having all tasks in a single list and want to use the same process for everything, mostly because they think this is how to follow Scrum by the book. Then they struggle to describe acceptance criteria for sustainability work, estimate its value or predict how long it would take to complete those tasks. Stakeholders rarely have an opinion about when such tasks are done, or how to compare them to other things. But people rarely question why this is needed in the first place.

Instead of persuading stakeholders to see something that you can’t even put into words, just ask them whether the product needs to be sustainable in the medium to long term. If not, don’t worry about sustainability tasks. If stakeholders expect the product to stay around, then ask for a budget to make it sustainable. Deduct that budget from the overall capacity when planning business features. That way, you can have two categories of tasks, and they will not compete.
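As a rough illustration of deducting the budget from capacity, here is a minimal sketch. The function name, the person-day unit and the 10% share are assumptions made up for the example, not a prescribed formula.

```python
def plan_iteration(capacity_days: float, sustainability_share: float) -> dict:
    """Split one iteration's capacity into two pools that do not compete.

    capacity_days: total person-days available in the iteration (assumed unit).
    sustainability_share: agreed budget as a fraction between 0 and 1.
    """
    if not 0 <= sustainability_share <= 1:
        raise ValueError("sustainability_share must be between 0 and 1")
    sustainability_days = capacity_days * sustainability_share
    return {
        "sustainability_days": sustainability_days,
        # Business features are planned only against what is left.
        "feature_days": capacity_days - sustainability_days,
    }


# Hypothetical numbers: five people, ten working days, a 10% budget.
plan = plan_iteration(capacity_days=50, sustainability_share=0.1)
print(plan)
```

The point of the split is that feature planning never sees the sustainability pool, so there is nothing to trade off item by item.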

You don’t have to keep sustainability tasks in the backlog or track them in the task management tool, just do them as much as the budget allows every iteration. The team will mostly know what the next few critical improvements are; they don’t need to keep them in a list visible to anyone else. In theory, it may be good to understand the impact of sustainability tasks in order to prioritise them, but in most cases this is obvious. Urgent bugs come first, otherwise they aren’t that urgent, and the remaining sustainability tasks all tend to be gradual improvements, so it’s not that critical to select any particular order.

If keeping technical work out of the main product plan sounds scary, let me convince you that you’re already planning something much more important than your software product, without having most sustainability tasks in the roadmap. Continuing with the trend of metaphors that can easily be taken out of context, check your calendar for the next month. Unless you suffer from serious OCD, it’s likely that the calendar doesn’t contain any entries for taking a shower or brushing teeth. Yet most people still do those tasks regularly, without the need to track and prioritise them.

When things run smoothly, sustainability tasks don’t need to be in the plan, they just get done. Of course, in an emergency, it’s perfectly OK to skip brushing teeth for a day or two. But if you stop doing that completely, it’s almost certain that you’ll find yourself in agony, interrupting your normal schedule to visit a dentist. Sustainability tasks appear in the plan mostly when things go bad, and they usually take priority above everything else. And if you’ve really neglected the warning signs long enough, instead of a simple filling, you’ll need root canal treatment.

Putting regular sustainability tasks into a calendar would just create noise. Putting sustainability tasks into a product roadmap has the same effect. Having a budget removes the pressure to plan, prioritise or estimate such tasks, and avoids the need to compare them to features that bring immediate value. A sustainability budget gives teams slack to do necessary improvement work, but it also sets the upper limit for those tasks. A transparent agreement on the budget helps to ensure that the sustainability work does not impact regular product work. This reassures stakeholders that their priorities will not suffer because someone might be gold-plating technical features, removing the need to have everything in a single list and tracked in the same way.

Getting to a budget

Most teams actually already have a sustainability budget, but hidden in plain sight. Ad-hoc cleanups, technical debt sprints and refactoring iterations are effectively using that same budget, just assigned when things become unsustainable. Add up the cleanup time your team spent over the last year, and just consciously declare it as a budget. For example, if a team had two cleanup sprints of two weeks each last year, they effectively spent about 8% of the year improving the technical side of the product anyway, while causing a major disruption to regular delivery flow. Plan to use the same time, but spread it out so that major disruptions do not need to happen. Brush your team’s teeth regularly so you do not have to get root canal treatment. You’ll spend the same amount of time, and it will be significantly less painful.
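The arithmetic behind declaring last year's cleanup time as a budget can be sketched in a few lines. The 52-week year and the 40-hour week are assumptions for illustration; use whatever your team actually works.

```python
def cleanup_share(cleanup_weeks: float, weeks_per_year: float = 52) -> float:
    """Fraction of the year that went into ad-hoc cleanup work."""
    return cleanup_weeks / weeks_per_year


def weekly_budget_hours(share: float, hours_per_week: float = 40) -> float:
    """The same time, spread evenly over every working week."""
    return share * hours_per_week


# Two cleanup sprints of two weeks each, as in the example above.
share = cleanup_share(cleanup_weeks=2 * 2)
hours = weekly_budget_hours(share)
print(f"{share:.1%} of the year, roughly {hours:.1f} hours per person per week")
```

Counting all 52 calendar weeks gives a little under 8%, or about three hours per person per week. Counting only actual working weeks pushes the share somewhat higher.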

One option for spending the budget is to create a strict schedule. That’s the equivalent of having a daily routine for personal hygiene. Perhaps schedule sustainability work at times that are usually quieter, such as the start of an iteration. For example, reserve Monday mornings for sustainability tasks, and allow teams to do whatever they feel is important to improve the product during that period.

Another way of spending the budget is to introduce the Batman role into the team. I first came across this idea in Jim Shore’s book The Art of Agile Development. (It is in the Iteration Planning chapter, available from Jim’s web site.) Jim defines the role like this: “On an XP team, the batman deals with organizational emergencies and support requests so the other programmers can focus on programmering [sic].” The Batman is usually a rotating role, with different people taking the responsibility of doing everything that falls outside the regular work, so that the rest of the team can work uninterrupted. For teams with a reasonably stable product, the Batman is not doing much most of the time. While waiting for urgent support requests, the Batman for that week can work on the top priority sustainability tasks, and drop them on short notice if required.
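If the rotation itself needs a nudge, a simple round-robin is enough. This is only a sketch: the team names and the one-week rotation period are invented for the example, and nothing here comes from Jim's book.

```python
from itertools import cycle, islice


def batman_rota(team: list[str], weeks: int) -> list[str]:
    """Assign the Batman role round-robin, one person per week."""
    if not team:
        raise ValueError("team must have at least one member")
    return list(islice(cycle(team), weeks))


# Hypothetical three-person team, planned five weeks ahead.
rota = batman_rota(["Ana", "Ben", "Chloe"], weeks=5)
for week, person in enumerate(rota, start=1):
    print(f"Week {week}: {person}")
```

A fixed order keeps the role predictable, so everyone knows when their uninterrupted weeks are coming.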

To sum things up, the answer to the opening questions consists of two important steps:

  • Don’t think about internal team tasks as technical work, think about them as sustainability work.
  • Set up the planning process so that sustainability doesn’t compete with progress.

For a bonus tip, please stop thinking about sprints and go back to the more neutral idea of iterations from XP, which suggest an ongoing cyclic process.
