Don’t deal with problems like Gaggia
From /dev/coffee to modeling transaction processing based on Starbucks shops, the world of coffee has often inspired programmers. Here is a not-so-bright example from the world of coffee, one that hints at how problems should not be solved. We recently bought a new coffee machine, and after a glance at the user manual, I was amazed to find out that our new Gaggia suffers from a common flaw of enterprise software: ignoring “stupid” problems.
On page ten of the user manual, a bordered paragraph stands out and simply draws attention. Running the pump without water in the tank will seriously damage the machine, and that misuse is not covered by the warranty. This got me thinking: why on earth would someone ruin this machine (which is not at all cheap) and void the warranty by making coffee from thin air? Although it sounds insane, it must have happened relatively often, or else it would not be in the user manual. The fact that this case is excluded from the warranty tells me that consequences are dire, probably rendering the machine useless.
I can think of quite a few parallel situations in the software world, and I am sure that most people in the industry have gone through something similar at least a few times in their careers. The users do something incredibly “stupid”, and it is beyond anyone’s imagination why they did it. Maybe they started a database upgrade while people were still working on live data, or maybe they skipped a version and installed incompatible applications. A few years ago, I had a case of time-travelling: a trader executed transit optimisations for past contracts, causing havoc because past transactions no longer matched stored contract data.
After a problem like this, programmers often play the blame game. “How could someone be so stupid as to run the settlement on past transactions?” If the problem appears a few more times, it gets added to the user manual, as if that is going to solve anything. I guess that is how the warning and the warranty disclaimer found their way into the Gaggia manual.
Are the customers to blame because they recalculated past transactions, or are programmers to blame because the operation was not reversible? Adding a warning to the documentation (and maybe modifying the terms of service) puts all the blame on the customers, especially if the problem is “obvious”. That is really comforting from the programmer’s perspective, but it leaves customers with a real problem.
Documenting a problem does not solve it
Going back to the coffee machine, I can imagine designers pulling their hair out and wondering how anyone could be so stupid as to turn on the pump without any water in the tank. After all, that does sound completely insane. However, looking at it from a customer’s perspective, I can also understand why this might happen. The machine has a slick design, with a dark plastic water tank tucked in under a big metal head. If the machine is on a table, it is fairly hard to spot the water level from a standing height. The metal head blocks the view, and the dark plastic does not show much under typical kitchen light. So it is hard to spot the water line even when it is there. And finally, because the lower third of the water tank is blocked by the coffee cup holder, it is impossible to know if there is any water left in the tank without dismantling half of the machine.
I am not in the business of designing coffee machines, but checking the water level seems relatively easy. Instead of just blaming the customers for doing something “stupid”, the machine could check for water in the tank and refuse to turn on the pump if necessary. Flashing a light to signal the problem would be even better, but the fact that I don’t hear the pump would be enough to tell me that something is wrong.
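To make the idea concrete in software terms, here is a minimal sketch of such an interlock. Everything in it is hypothetical: the `Pump` class, the `read_water_level_ml` sensor callback and the 50 ml threshold are invented for illustration and have nothing to do with how a real Gaggia works.

```python
from typing import Callable


class EmptyTankError(Exception):
    """Raised when the pump refuses to start because the tank is too low."""


class Pump:
    """Hypothetical pump controller with a built-in water check."""

    def __init__(self, read_water_level_ml: Callable[[], int],
                 min_level_ml: int = 50) -> None:
        # read_water_level_ml stands in for a level sensor (an assumption).
        self.read_water_level_ml = read_water_level_ml
        self.min_level_ml = min_level_ml
        self.running = False

    def start(self) -> None:
        level = self.read_water_level_ml()
        if level < self.min_level_ml:
            # Refuse the operation instead of damaging the machine.
            raise EmptyTankError(
                f"Tank has {level} ml, need at least {self.min_level_ml} ml."
            )
        self.running = True
```

The point of the sketch is only that the check lives in the device, not in the manual: the dangerous state simply cannot be reached.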
Going back to software examples, issues like these are often very easy to solve. If a database upgrade would corrupt active transactions, we can check for activity and prevent the upgrade. If version 15 cannot be installed unless version 14 was previously deployed, we can check that before the installation. If recalculating past transits would break the system, we can skip contracts that have already been settled.
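The three checks above might look something like the following sketch. All names here (`PreconditionError`, `check_upgrade_allowed`, `check_install_allowed`, `Contract`, `contracts_to_recalculate`) are made up for illustration, and the version check assumes simple sequential integer version numbers; a real system would get these facts from its own database and deployment records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


class PreconditionError(Exception):
    """Raised when an operation is refused because it would cause damage."""


def check_upgrade_allowed(active_transactions: int) -> None:
    # Refuse a database upgrade while people are still working on live data.
    if active_transactions > 0:
        raise PreconditionError(
            f"Upgrade refused: {active_transactions} transaction(s) still active."
        )


def check_install_allowed(deployed_versions: Set[int], new_version: int) -> None:
    # Assumption: version N requires version N - 1 to have been deployed.
    required = new_version - 1
    if required not in deployed_versions:
        raise PreconditionError(
            f"Cannot install version {new_version}: "
            f"version {required} was never deployed."
        )


@dataclass
class Contract:
    contract_id: str
    settled: bool


def contracts_to_recalculate(contracts: Iterable[Contract]) -> List[Contract]:
    # Skip settled contracts so that past transactions keep matching them.
    return [c for c in contracts if not c.settled]
```

None of these checks is clever, and that is the point: each one is a few lines of code, while the damage it prevents can take days to repair.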
Spotting the problem requires an open mind
Part of the problem, of course, is that these issues cannot always be foreseen. Programmers typically know their system much better than the customers do, and from a programmer’s perspective such operations simply do not make sense. But all that intrinsic knowledge blinds us to obvious problems. I have learned the value of inviting people who were not involved in development to test-drive the applications before a big release. Customers and experienced testers still manage to surprise me from time to time. Taking the Poka-Yoke (mistake-proofing) approach to design helps a lot to prevent such problems from blowing up the system. There is a whole new field of interaction design emerging, aiming to reduce the friction in user interfaces, so including interaction designers on the team can help produce software that is less easily misused. But even with all that, we cannot anticipate everything, and these problems are going to pop up occasionally; that is simply part of our job.
A much bigger issue is that our mind is often closed to the real problem. Declaring someone stupid or insane and adding an obvious warning to the user manual is an easy way out, but it does not solve anything. If people are repeatedly doing something “obviously stupid”, the cause might be a bigger design flaw. To spot that, we have to keep an open mind. Solving those bigger problems will surely pay off, as it will make our products better and easier to use. And, of course, it will save enormous effort on support and make customers happier, which is always nice.
What can we do about it?
Putting a warning in the manual, especially if it is something obvious, definitely will not help. Check the user manual, release notes and installation instructions for such warnings, because each one may point to a problem worth fixing. Next time a customer does something “obviously stupid”, try to understand what they wanted to achieve and why the system allowed them to be so wrong. Think hard about whether something can be done to prevent the problem in the first place, be it a warning at the moment of the operation, blocking the operation, or changing the design of the system to support the offending operation gracefully.