Gojko Adzic's blog. Specification by Example, Impact Mapping, and more... https://gojko.net/ The key first step for successful organisational change <p>Last week, while helping a group of product managers learn how to get more out of user stories, I asked the participants to list the key challenges they’ll face trying to bring all the new techniques back to their organisations. “We’ve always done it this way”, “We already know how to do it” and “If it ain’t broken, don’t fix it” kept popping up. People who suffer from those problems then often explain how they’ve tried to propose good ideas, but their colleagues seemed uninterested, defiant, obstructive, resistant, ignorant, or any of the more colourful NSFW attributes that I’ll leave out of this post. And even when their colleagues commit to trying a new idea, changes in internal processes rarely happen as quickly and as successfully as people expect.</p> <p>Whether you subscribe to Adam Smith’s invisible hand or Herman Daly’s invisible foot, something magical directs the behaviour of a whole group of people, even if each individual’s contribution can be rationally explained. In many cases, the key issue is that various participants in the group don’t actually agree on a single problem. That’s why offering a solution in the form of good practices rarely creates big results. Sure, any process change in a larger organisation suffers from complacency, fear and petty kingdom politics, but it also competes with people’s individual priorities, short-term goals and division targets. While someone is arguing for iterative delivery, other people care more about reducing operational cost or meeting sales quotas. And so, instead of an overnight success, change initiatives end up being much more like Groundhog Day.</p> <p>The <a href="http://amzn.to/1RYergx">FutureSmart Firmware</a> story is one of the best-documented successful large-scale organisational agile adoptions. Gary Gruver, Mike Young and Pat Fulghum wrote about it in <a href="http://amzn.to/1RYergx">A Practical Approach to Large-Scale Agile Development</a>. Unlike unknown thousands of mediocre attempts that resulted in water-scrum-fall and desperation, or installing pre-packaged four-letter acronyms that don’t really change anything, the LaserJet group didn’t start by trying to ‘become agile’. They visualised the problem, in a compelling way that sparked action. By adding up all the different categories where the delivery teams spent time, the authors came to a chilling conclusion: the firmware development group spent 95% of their time just on getting the basic product out, and invested only 5% in ‘adding innovation’. The numbers shocked the stakeholders, and it was pretty clear to everyone that continuing on the same path wouldn’t allow the organisation to stay competitive. Visualising the problem also provided something that everyone could rally around. People didn’t waste time on stupid questions such as whether the daily scrum should be at 10AM or 4PM, or whether pair programming was required for everything. Instead, people started thinking about how to change the process so that they could get the basic product out cheaper and faster, and leave more time for new product development. All the tactical decisions could be measured against that. In the end, the FutureSmart group adopted a process that doesn’t sound like any other pre-packaged scaled version of Scrum.
If you’ve been doing agile delivery for a while, you’ll probably disagree with a lot of the terminology in the book, and occasionally think that their mini-milestones stink of mini-waterfall, but all that is irrelevant. What matters is that, in the end, they got the results. Gruver, Young and Fulghum describe the outcome: development costs per program are down 78%, the number of programs under development more than doubled, and the maintenance/innovation balance moved to 60-40, increasing the capacity for innovation by a factor of eight.</p> <p>Visualising the problem is a necessary first step to get buy-in from people, but not all visualisations are the same. Raw data, budgets and plans easily get ignored. Too much data confuses people. To move people to action, choose something that can’t easily be ignored. Based on research into about eighty highly successful large-scale organisational changes, John Kotter writes in the <a href="http://amzn.to/1SH2rzI">Heart of Change</a> that ‘people change when they are <em>shown</em> a truth that influences their <em>feelings</em>’. Kotter advocates creating a sense of urgency using a pattern he called ‘See-Feel-Change’. The best way to do that, according to Kotter, is to create a ‘dramatic, look-at-this presentation’. One of the most inspirational stories from Kotter’s book is ‘Gloves on the boardroom table’, about Jon Stegner, a procurement manager at a large US manufacturing company. Stegner calculated that his employer could save over a billion dollars by consolidating procurement. He put together a business case, which was promptly ignored and forgotten by the company leadership. Then he tried something radical – instead of selling the solution, he found a typical example of poor purchasing decisions. One of his assistants identified that the company bought 424 types of work gloves, at vastly different prices. The same pair that cost one factory $5 could cost another factory three times as much. Stegner then bought 424 different pairs of gloves, put all the different price tags on them, dumped the whole collection on the main boardroom table, and invited all the division presidents to visit the exhibition. The executives first stared at it silently, then started discovering pairs that looked alike but had huge differences in price, and then agreed that they needed to stop this from ever happening again. The gloves were sent on a road show to the major factories, and Stegner soon had the commitment to consolidate purchasing decisions.</p> <p>One of the most effective ways to shake up a system is to create new feedback loops. Kotter’s ‘See-Feel-Change’ is great for providing an initial spark. Fast and relevant feedback motivates people to continue to change their behaviour. In <a href="http://amzn.to/1qfyYpp">Thinking in Systems</a>, Donella H. Meadows tells a story about a curious difference in electricity consumption during the 1970s OPEC oil embargo in the Netherlands. In a suburb near Amsterdam, built out of almost identical houses, some households were consistently using roughly 30% less energy than the others, and nobody could explain the difference. Similar families lived in all the houses, and they were all paying the same prices for electricity. The big difference, discovered later, was the position of the electricity meters. In some houses the meters were in the basements, difficult to see. In the other houses, the meters were in the main doorway, easily observable as people passed during the day.
The simple feedback loop, immediately visible, stimulated energy savings without any coercion or enforcement.</p> <p>The next time you feel the organisation around you is ignoring a brilliant idea, instead of selling a solution, try selling the problem first. More practically, visualise the problem so it can sell itself, allowing people to see and feel the issue. For the best results, try visualising the problem in a way that closes a feedback loop. Create a way for people to influence the results and see how your visualisation changes.</p> Tue, 19 Apr 2016 00:00:00 +0200 https://gojko.net/2016/04/19/visualise-problem/ favourites agile software-profession The most important lesson to improve software delivery <p>Richard Tattersall of the parish of St. George-in-the-Fields, liberty of Westminster, gentleman, as he liked to be described, had a lot of interesting claims to his name. In his youth, in mid-18th century Lancashire, he wanted to join the Jacobite rebels. After a family intervention, he ran away from home, and ended up becoming a stud-groom in the service of the Duke of Kingston-upon-Hull. A unique business sense led Tattersall to start his own race-horse farm, then establish an auction house. Tattersalls Auctioneers ended up at Hyde Park Corner, and their clients included the key British nobility and even the King of France. Two and a half centuries later, France is no longer a kingdom, the dukedom of Kingston is extinct, but Tattersalls auctioneers still runs. It is the oldest such institution currently operating in the world, and Europe’s largest. Tattersall was not just a good businessman; he also knew how to entertain. Even the future king George IV was known to frequently visit Tattersall at Highflyer Hall, to enjoy ‘the best port in the land’. In his late years, Tattersall was so universally known and respected throughout the realm that <a href="https://repositories.tdl.org/ttu-ir/bitstream/handle/2346/46457/NewNo339.pdf?sequence=1">Charles Dickens wrote</a> of him that ‘no highwayman would molest him, and even a pickpocket returned his handkerchief, with compliments’. It would, then, come as a great surprise to the gentleman how often software developers curse his name today.</p> <p>In an epic twist of irony, people often don’t even know that Tattersall, the person, had no influence on the cause of all that pain. Tattersall had just the right mix of business sense and social skills. At his Hyde Park Corner venue, he reserved two ‘subscription rooms’ for the members of the Jockey Club. The two rooms quickly became the centre of all horse betting in the UK. The Tattersalls Committee, a successor organisation to the informal clubs from the two subscription rooms, had the legal ability to exclude people from the sport or from racecourses in Britain all the way up to 2008. The rules and regulations they used to settle disputes still more or less govern British horse racing today. And among those rules, there is the famed 4(c), the destroyer of software models. For all the pain it caused me over the years, it’s also responsible for one of the most important lessons in improving software delivery I’ve ever come across.</p> <p>For teams working on horse racing software, ‘Rule 4’ is the one thing you can always mention to disrupt a meeting, cause everyone to tell you to go to hell, and then spend the rest of the afternoon playing Desktop Tower Defence.
It’s the edge case a tester can easily invoke to break almost anything. It’s the reason why people phone in the next day and insist on staying home because of some unforeseen illness. That’s because Rule 4 messes with the one thing developers hate to touch — time.</p> <p>Horse racing bets are priced either at the time when a bet is placed, or at the time when the race starts. Punters generally prefer the first, called ‘fixed odds’, because they know what to expect. The price and the potential winnings are fixed, hence the name, and that’s the rock-solid agreement between a punter and a bookie. Such a premise allows developers to design some nice and elegant settlement models, and then deal with the difficult stuff they like to solve, such as latency, throughput, and performance optimisations. But then Tattersall’s Rule 4 kicks in. It controls what happens when one of the horses doesn’t show up for a race. If, for example, the favourite ends up in a ditch somewhere ten minutes before the race, all the people betting on the second favourite now stand to win a lot more than the bookies expected. Rule 4 allows a bookie to pay out the bets on the other horses as if the favourite never existed. This means going back in time, recalculating the odds for all the other runners, and applying the new price, with weird math that isn’t exactly logical, for each single bet, at the time when that bet was placed. This is where the stomach problems start. Developers realise they need to start saving a lot more information at the time when a bet is placed, mess up the nice elegant settlement models, break system performance, and a ton of other things. That’s why, generally, once Rule 4 is finally implemented correctly, that piece of code is off-limits. Nobody gets to even view it on a screen, to prevent an accidental Heisenbug.</p> <p>Rule 4 makes a lot of sense when one of the favourites misses the race, but it’s also applied to Limpy Joe, the horse that almost died of old age and boredom in the previous race. It can lead to weird and pointless complaints about why someone got 5 pence less than expected, and it can cost more in wasted customer support than it protects against fraud. I once worked with a company that tried to save a bit of money on customer servicing, and do something nice for its punters at the same time, by not taking advantage of Rule 4 below a certain threshold. Switching rules on and off wasn’t such a big deal; it turned out that they could actually do it themselves, and everyone was happy. No need for anyone to stay home playing Desktop Tower Defence.</p> <p>But then, one day, they asked us to change the monthly customer statements, and print out the bet results as if Rule 4 was still applied, then add the deduction back. Not only did this mess with time, but it messed with it twice. It required us to record something that didn’t happen as if it happened, combine it with a ton of other rules that did actually happen, and then flip back the whole thing again. And they asked us to do that not at the point when the bets were recorded or settled, but when the monthly statements were produced. This required keeping a lot more information so we could settle all the other rules backwards and forwards in time, break system performance, and change the one part of the system that nobody wanted to touch. Some of the additional rules were implemented by third parties, so we’d have to chase them to change their code as well.
Of course, the whole thing needed to be done as quickly as possible, ideally yesterday. Lots of towers were successfully defended that day.</p> <p>The next week, on a visit to the call centre, I sat next to an operator who was one of the people insisting we ‘fix the statements’, and watched him deal with a disgruntled punter. The person on the other end of the line had heard about the Rule 4 promotion, and called in to complain that he was short-changed. In fact, some other rules produced an odd value for the final payout. The operator was stuck explaining that the amount on the statement was OK, and he had to take the punter through the whole weird math required to settle a bet according to all the other ongoing promotions. A few minutes after that call, another one came in. Instead of saving money on customer support, the Rule 4 voiding idea made it worse. It was clear that the interaction of rules was causing the confusion, not Rule 4 itself. I asked why they had singled out that one, and the operator said that they were going to ask for all the other promotions to be pulled out into separate line items as well, just later. They didn’t want to overload us with work, so that Rule 4 could get done quickly.</p> <p>Observing the problem first-hand, I could see the whole mess, and why they wanted something done about it. But just thinking about the implications for our nice, clean, elegant software models made my stomach turn. Luckily, being next to the one person who suffered the most, and finally understanding what was going on, I could propose an alternative. What if, instead of actually redoing the numbers to list individual calculation components, we just listed the names of the special promotions applied to a bet? Punters would immediately see that there was more than one thing going on, and that their Rule 4 promotion still applied. That information was already available in the database. The fact that the solution could be deployed in a few days sold it easily. The trick with labels reduced the confusion, and not just for Rule 4, but for all the other promotions they were going to ask about later as well.</p> <p>That day, I learnt how important it is to spend time observing people actually using the software we build. Sitting next to a user allowed us to flush out all sorts of weird and wonderful assumptions together, and to work together on ideas that solve real problems, not just plaster over a huge crack. Together, we discovered insights that nobody could predict. That’s just common sense, so surely everyone is doing it by now, right? Not so much.</p> <p>In the previous post, I asked people to fill in a quick questionnaire about how frequently they observe people using their software. With slightly more than 700 responses, I can’t really claim any kind of universal statistical relevance for the whole industry. On the other hand, given that you’ve self-selected into a group by reading this, the data should be relevant for teams similar to yours.</p> <p>Roughly fifty percent of the survey participants said that they had no direct interaction with end-users, ever. They’ve never seen an actual user work with their software.
Only about 10% of teams, across the whole group, actually engage with their users every week.</p> <p><a style="display: block; text-align: center; float: none; clear: both;" href="/assets/feb-survey-evaluation.png"> <img style="width: 100%; max-width: 100%" src="/assets/feb-survey-evaluation.png" /><br /> (click for a larger version) </a></p> <p>I asked separately about observing end-users testing future software ideas, pretty much the key aspect of getting any sort of user experience research executed. The numbers are even worse. Only about 6.4% of the respondents said that they do that on a weekly basis. In a fast moving industry such as software today, where bad assumptions and communication problems can cause serious damage, that’s just depressing.</p> <p><a style="display: block; text-align: center; float: none; clear: both;" href="/assets/feb-survey-research.png"> <img style="width: 100%; max-width: 100%" src="/assets/feb-survey-research.png" /><br /> (click for a larger version) </a></p> <p>This is both good and bad news. For most people reading this, the bad news is that you’re not benefitting from all the insight that you could easily collect. The good news is that the bar is so low at the moment that you can easily change your delivery process a bit and be far better than the competition.</p> <p>If you’re a developer or a tester, figure out an excuse to sit with the actual users for a few hours every week. If you’re managing a team, think about sending the group to observe actual users periodically. I guarantee this will significantly improve the software you deliver. It doesn’t matter if your company already uses a UX specialist agency to deliver the decisions, or if you have someone else already tracking the usage patterns and results. Go and see things for yourself; you will learn stuff you could never have predicted.</p> <p>For example, we recently added text notes to MindMup 2.0. This was one of the most requested features on the user forums, so we had a ton of data to start with. Instead of just charging ahead with the ideas collected through initial requests, we spent a week building a rough version, and then invited actual users to try things out. The results were surprising. We got a bunch of assumptions about ordering and exporting wrong, and our users wanted something significantly simpler than what we planned to develop. As a result, we were able to launch probably a month ahead of schedule, and make the feature a lot more intuitive.</p> <p>Of course, there will be plenty of excuses why spending time with users isn’t possible, especially if they are not easily available. The survey results broken down by type of software delivery clearly show that. Looking only at software delivered internally, roughly 27% of the participants said that they observe users working with their software at least once a month. This is significantly better than the 16.5% of those working on consumer software. Enterprise B2B, of course, ends up in the last spot with roughly 14%.</p> <p>Don’t let the fact that your users are remote, or numerous, stop you from talking to them. Even for consumer-oriented products, getting this kind of feedback is easier than it seems. I go to lots of software conferences, and I always try to get a few people to try out MindMup between conference sessions. It doesn’t cost us anything, and most of the people I approach are willing to spend a few minutes helping us out, especially during long boring lunch breaks.
If such direct contact isn’t possible for your team, think about remote screen-sharing sessions. It’s not the best way to do user research, and I’d love to have one of those hi-tech rigs that track eye movement and blood pressure, like the ones ad agencies use, but that’s far beyond my budget. However, even a simple screen-sharing session opens up an incredible amount of insight. The text notes research we did for MindMup 2.0 was done 100% over remote screen sharing, and we had people participating from all over the world. Just get people to think out loud while they are clicking around the screen, and be ready to be surprised.</p> Mon, 14 Mar 2016 00:00:00 +0100 https://gojko.net/2016/03/14/most-important-lesson/ favourites agile software-profession Introducing Claudia.js &ndash; deploy Node.js microservices to AWS easily <p>I’m proud to announce the 1.0 release of Claudia.js, a new open-source deployment tool for JavaScript developers interested in running microservices in AWS. AWS Lambda and API Gateway offer scalability on demand, zero operations overhead and almost free execution, priced per use, so they are a very compelling way to run server-side code. However, they can be tedious to set up, especially for simple scenarios. The runtime is oriented towards executing Java code, so running Node.js functions requires you to iron out quite a few issues that aren’t exactly well documented. Claudia.js automates and simplifies deployment workflows and error-prone tasks, so you can focus on important problems and not have to worry about AWS service quirks. Even better, it sets everything up the way JavaScript developers expect, so you’ll feel right at home.</p> <p>Check out the video below for an example of how you can set up and deploy a new API in less than five minutes!</p> <iframe src="https://player.vimeo.com/video/156232471" width="500" height="281" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe> <p>Claudia.js <a href="https://www.npmjs.com/package/claudia">is available from NPM</a>, and the <a href="https://github.com/claudiajs/claudia">source code</a> is on Github!</p> Mon, 22 Feb 2016 00:00:00 +0100 https://gojko.net/2016/02/22/introducing-claudia/ news Potentially shippable is no longer good enough <p>Potentially shippable software is the holy grail of agile delivery, according to anyone out there with enough patience to sit through two days of Scrum conditioning. Ten years ago, most of the industry probably wasn’t capable of living up even to that benchmark. But today, potentially shippable software by the end of each iteration should be taken for granted, the same way you expect your next hamburger to be asbestos-free. That’s the bare minimum, but far from good enough.</p> <p>In fact, the current thinking around potentially shippable software severely limits what teams could achieve. The move to frequent releases is causing a fundamental change for consumers. Companies that can spot this in their market segments, and adapt quickly, will start running circles around the competition. Those that don’t will be left trying to play bowling on a basketball court.</p> <p>For an example, just look at transportation. The entire car industry seems to shake and tumble with problems.
The data for 2015 isn’t out yet, but just for comparison, <a href="http://fortune.com/2015/10/21/toyota-recall-window-6-million/" target="_blank">Toyota recalled 6.5 million cars last year</a> to deal with switch malfunctions. In 2014 alone, the recalls ordered by the US NHTSA agency involved <a href="http://www.usatoday.com/story/money/cars/2015/03/03/nhtsa-recalls-2014/24323855/" target="_blank">63.9 million vehicles</a>. General Motors had to fix 5.8 million vehicles in 2014 to deal with faulty ignition switches that could cause fires. In a similar situation, NHTSA ordered Tesla to deal with problems in almost 30,000 vehicles. The NEMA 14-50 Universal Mobile Connectors could overheat, and potentially cause a fire. Pretty serious stuff, with faulty hardware. Judging by the rest of the industry, this should have been a crisis that would cost at least a few million. Instead, it turned into a ton of free press.</p> <p>The Tesla UMC recall didn’t require car owners to waste time driving their vehicles to a service shop. It didn’t require the manufacturer to cash out for new parts, or to pay for mechanics’ time. Instead, <a href="http://www.carscoops.com/2014/01/almost-30000-teslas-at-risk-of-fire.html" target="_blank">someone pushed a button</a>. An over-the-air software update remedied the problem until the next time a car was brought in for a regular check-up.</p> <p>When one player in a market can respond to a major problem with an automatic software patch, while the others have to pay for parts and labour, they aren’t playing the same game any more. The costs of servicing, of course, plummet. But the impact goes far beyond that.</p> <p>In similar situations, the owners of the other vehicles had to make a difficult choice, trading their short-term plans against security and safety risks. Tesla’s customers were not affected at all. For Tesla, continuous software delivery isn’t just a technical practice, it’s a way to change consumer expectations and open up marketing opportunities. You can dismiss this as an isolated incident, but ten years from now, consumer expectations for cars will have completely turned. People will expect to have that level of service, and anyone unable to provide it will be out of business. And it will happen to many other industries as well.</p> <h2 id="disrupting-business-models">Disrupting business models</h2> <p>When user expectations and perceptions change, business models have to change as well. Just consider how the typical software sales models changed over the last twenty years. Before web services became ubiquitous, it was quite normal for consumers to buy a particular version of software. When a new box of their favourite software came out, complete with a stack of 3.5” floppy disks and a printed user guide, consumers would pay for upgrades. This model was sensible when new versions came out every year. But as the Internet took off, and people requested higher bandwidth to enjoy the ever-increasing quality of funny kitten videos, it became possible to distribute software updates more frequently.</p> <p>Consumer expectations significantly changed. Technically, it makes a lot of sense to release software several times a month, especially to fix security risks. So people got used to upgrading frequently. Consumers might even like the new features enough to want to install a new version, but nobody wants to pay for software that often.
The whole concept of selling versions didn’t make much sense at a higher frequency, so companies started to offer free upgrades, and users started to become more entitled. The average internet consumer today expects to get web services for free. Free e-mail, free photo storage, free news. On mobile platforms, apps still sell, but users who pay $0.99 expect to get all future new features free, forever. That literally requires a pyramid scheme, where early backers benefit from latecomers, and an ever-increasing user base. When the growth stops, commercial models like that fail, much like a Ponzi scheme.</p> <p>Just look at operating systems. After OS X went free, the game changed. Microsoft had to make Windows 10 free as well, and move away from versions. <a href="https://redmondmag.com/blogs/the-schwartz-report/2015/07/windows-10-last-os-version.aspx" target="_blank">Now all Windows will be 10</a>, and instead of a major version every three years, consumers can expect a continuous stream of updates. At the same time, because it can’t be sold any more, Windows <a href="http://thenextweb.com/microsoft/2015/07/29/wind-nos" target="_blank">collects private information and phones back home with advertising identifiers</a>, so it can make up for the lost revenue.</p> <p>Changing the expected delivery frequency pretty much killed the old software business model. Instead of charging for new features with paid upgrades, software companies had to come up with completely different ways of financing development. The wholesale sleaze-ball privacy invasion of ad networks is just a way to pass the need for payments from consumers to third parties. Some companies decided to constantly harass their users with micro-payments to unlock individual features. Zynga was one of the first to spot the game change, and it was one of the rising stars at the turn of the decade. But consumers can only suffer constant harassment for so long, <a href="http://arstechnica.com/business/2013/09/how-zynga-went-from-social-gaming-powerhouse-to-has-been/" target="_blank">and it took only a few years for the whole pyramid to collapse</a>. On the other hand, companies that could charge rent, such as Dropbox or Github, flourished in the new game. The expectations in the market changed. People want software for free, but they seem happy to pay for a service.</p> <p>This user entitlement caused by more frequent delivery expectations isn’t a problem just for software. As continuous delivery crosses more into product strategy, it starts to affect customers of all types of products. In October 2014, Tesla announced that new cars coming out of the factory would have a forward radar, ultrasonic sensors and cameras, all wired up to a lane-changing autopilot and a high-precision digital braking system. Although the news was amazing, not everyone was happy. Richard Wolpert from Los Angeles, for example, had bought an older model just a few months before, and to him <a href="http://gas2.org/2014/10/21/tesla-upgrades-anger-existing-owners/" target="_blank">the world just seemed unfair</a>. Until then, he had got all the new car features for free, magically. But this one wasn’t coming. So he <a href="https://www.change.org/p/tesla-motors-petition-to-force-tesla-to-come-up-with-a-retrofit-for-the-new-autopilot-features-for-existing-owners-who-were-early-supporters-of-the-company-product" target="_blank">started a petition</a> to force Tesla to retro-fit radars and sonars into older cars.
Dag Rinden of Oslo, Norway, pleaded that lane switching and automatic braking are important for driver safety, and that they should be provided for free to all existing owners.</p> <p>Now let's just take a moment to consider this. Someone bought a car, and later complained that new hardware did not magically appear overnight when it was announced in the news. Continuous delivery doesn't solve this problem; Star Trek replicators do. You and I can laugh about it, because we can distinguish hardware from software, but Richard and Dag don’t care about that. They only see a car, and they got used to getting the new stuff for free. Plus, the new features are potentially life-saving, so surely they are entitled to those. Disappointing users is never good, even when they are clearly wrong. But giving away free radars also isn’t good for business. And the whole mess is a consequence of the fundamental game change.</p> <p>I’ve never heard of anyone with similar complaints about any other car manufacturer. When you buy a car, it's pretty much clear that it won't one day just get a radar and a sonar. But Tesla trained their users to expect more. They aren’t playing the same game. For all other manufacturers, a car model is something with a fixed design, produced a particular year. For Tesla, the concept of car models just doesn’t work that way. And once there are no more models and yearly versions, people just feel a lot more entitled and expect to get things for free.</p> <h2 id="disrupting-marketing">Disrupting marketing</h2> <p>Another major side-effect of frequent delivery is that it removes the drama. The more often software ships, the less risky each update becomes. Small changes mean quick testing, and small potential problems. Continuous delivery pipelines help companies deploy with more confidence, prevent surprises, and generally make releases uneventful. But making the releases uneventful also causes problems for marketing. I’ve learnt this in the most stupid way possible, on my own skin.</p> <p>MindMup is a bootstrapped product, and we don’t have a lot of cash to spend on advertising. Apart from slow and steady word of mouth, the typical way for such products to get new users is press coverage. Indeed, the three biggest spikes of user traffic for MindMup over the last three years came from news sites — spending a day on the front page of HackerNews after we open-sourced it, and getting reviewed on LifeHacker and PCWorld. However, after those early successes, it took over a year and a half until we could get another big spike. Meanwhile, we shipped a ton of useful stuff, but nobody took notice. By having a continuous stream of small changes instead of big versions, we scored a marketing own-goal. Sure, technically there was no drama in any of the hundreds of releases. But there was also no excitement. No single change was ever big, important and newsworthy enough to be covered by a major channel.</p> <h2 id="potentially-shippable-in-a-changed-game">Potentially Shippable in a changed game</h2> <p>Changed consumer expectations, across industries, will put more pressure on companies to roll out software with increasing frequency. That’s a given. Yet the more frequently software ships, the more it has an impact on marketing, business models and consumer expectations. People who design continuous delivery pipelines, and people who break down features into iterative deliveries, now have a magic wand that can disrupt sales or marketing and disorient users.
</p> <p>A nice example of that is how PayPal changed their business dashboard last year. One day, trying to pay for something using PayPal, I panicked after several thousand pounds disappeared from our company account. My first thoughts were that our PayPal account got hacked, or that the funds were frozen for some bizarre reason. PayPal is famous for being hostile to digital goods merchants, and frozen funds were an even scarier scenario than getting hacked. I looked through the recent transactions, and I couldn’t see any transfers or withdrawals. In fact, there was nothing suspicious in the list. While I was trying to call customer service, I spotted a link saying something similar to ‘how do you like our new business dashboard?’ Anyone who has ever done serious software testing would start guessing what happened there. And in fact, it took only three link clicks to find the money. My company has a multi-currency account with PayPal. The old dashboard converted all the money into an approximate value in the primary currency, but the new dashboard only showed the money actually in the primary currency. Someone did an incremental development change, and they either intentionally or mistakenly disregarded multiple currency accounts. I can only assume that most people with multiple currency accounts didn’t think like a software tester that morning, and that the PayPal customer service didn’t exactly have a pleasant day. At the same time, to get software potentially shippable, someone had to cut a huge piece of work into smaller batches. And they made the wrong choice.</p> <p>As an industry, we need to move the discussion away from ensuring things are potentially shippable towards how exactly that’s achieved. The choice can have a ton of unexpected negative effects on sales and marketing, or it can open up new business opportunities and help companies run much faster than the competition. That’s why software planning and releases have to be driven more by market cycles and marketing opportunities than by arbitrary iterations.</p> <p>And that’s where the problem with the concept of 'Potentially Shippable' starts. Does it mean the software could potentially be deployed to production? Or that it could potentially be released to users? Who determines if potentially should be turned into actually? Or when that should happen?</p> <h2 id="deployments-are-not-the-same-as-releases">Deployments are not the same as Releases</h2> <p>When we started fixing this problem for MindMup, one thing became painfully clear. We thought about deploying and releasing as the same thing, but it's much more useful to look at them separately. Deployment is a technical event, bits and bytes of software being moved to production servers or users’ devices. Release is a marketing event, where a new version becomes available to a group of end-users. Think of ‘Deployment’ as the part when an Amazon courier brings a box of cardboard-packed toys, you wrap them up nicely, and hide them in a cupboard. But ‘Release’ is when your children find the toys under a Christmas tree, at exactly the right moment to believe in Santa Claus.</p> <p>For MindMup, potentially shippable stuff turned into actually shipped almost all the time, in order to reduce technical deployment risks. We mentally coupled deployments and releases, and by doing that, we forced a technical event to have a marketing impact.
Going back to the example with presents, it’s as if the children intercepted the couriers and took the presents themselves, along with the delivery slips and the receipts. Sure, in the end everyone got a toy, but the magic of Christmas is gone. And they’ll start arguing about who got a more expensive present and who got shorted. Our software releases were driven by technical cycles, not marketing cycles. No wonder nobody wanted to pick up on any important news.</p> <p>Once I could spot this in our software, it became easy to see it with many of my consulting clients as well. I don’t have any statistically relevant data to claim an industry-wide pattern, but it looks as if this is quite a common self-inflicted handicap. Deployments and releases are tightly coupled in our minds; it’s just the way we were conditioned to think. I assume that nobody reading this article primarily distributes software on floppy disks in boxes, physically shipped to consumers. Yet that’s still how most people think about releases and deployments.</p> <p>The solution is quite simple: <em>Decouple deployments and releases</em>. This effectively means being able to put software on production systems that is not necessarily generally available, running alongside software that is visible. It’s the nicely wrapped present, without the receipts or any other controversial crap, waiting for the right moment to make a big impact. That way, the marketing stakeholders can decide on their own when they are going to release it and how. Software releases can be organised around important marketing opportunities, while software deployments can still happen frequently to reduce technical risk. Jez Humble <a href="http://www.informit.com/articles/article.aspx?p=1833567">wrote about that in 2012</a>.</p> <h2 id="the-key-is-in-multi-versioning">The key is in multi-versioning</h2> <p>The problem, of course, is that simple is not the same as easy. Although I can suggest the solution in one sentence, it is quite difficult to pull off in practice. Feature toggles, ever more present in software, <a href="http://martinfowler.com/articles/feature-toggles.html#ImplementationTechniques" target="_blank">lead to an unmaintainable spaghetti of code, configuration, and magic</a>. To truly get the benefits of continuous delivery, most companies will likely need a completely different approach to technical architecture and design. Instead of simple toggles and flags, software will need to be designed from the ground up for a multi-tenant, multi-versioned, multi-interface world. This means that every layer of the stack will need to accept calls from potentially different versions of things above it and know how to reply accordingly. It also means that almost every piece of data in transport will need to be tagged with the appropriate version. This will significantly increase the complexity of testing and operating software. But companies that don’t do that will end up playing bowling on a basketball court and wondering why they are not scoring.</p> <p>Once the capability for running multiple concurrent versions is in place, it becomes quite easy to make some versions of software available only to certain subsets of users. And so, it becomes easy to minimise the potential negative effects of small incremental changes. Imagine if the new PayPal business dashboard was only shown to customers with a single currency account. Instead of giving all the users a small increment of the improvement, this would give a small group of users 100% of what they need.
There would be no user confusion, and the “new” business dashboard would actually be better for whoever could see it. Over time, as the features build up, more users could be brought over to the new system, and then finally, the old version completely retired. Ironically, I’m pretty sure that PayPal has the capability to deploy and release gradually to subsets of users, but they didn’t coordinate it well with the rest of the business.</p> <p>Once the capability for running multiple concurrent versions is in place, it becomes much easier to decide what and how to sell, and what, how and when to open up. Continuous delivery pipelines don’t need to have a negative impact on sales or marketing, and the decisions around those aspects can go back to the people that should be making them. Even more importantly, with proper multi-versioning in place, it becomes a lot easier to make better-informed decisions. Focus groups, prototype experiments and customer research can only suggest that people might potentially be able to do something, not that they will actually do it, or get the expected benefits. But with multi-versioned systems, companies don’t have to rely on potential usage data — they can look at actual, real user trends, and weed out bad ideas before they become cemented. At Google, <a href="http://www.theguardian.com/technology/2014/feb/05/why-google-engineers-designers" target="_blank">one such test apparently led to an extra $200m a year in revenue</a>.</p> <p>Ron Kohavi, Thomas Crook, and Roger Longbotham have some chilling statistics in their paper <a href="http://www.exp-platform.com/Pages/expMicrosoft.aspx" target="_blank">Online Experimentation at Microsoft</a>, where they claim that only about one third of analysed ideas actually achieved what was expected once implemented in software. They also cite a source from Amazon, where the success rate is higher, but still less than 50%.</p> <p>This means that for the average software company out there, getting multi-versioning right can reduce maintenance costs by fifty to seventy percent, just by helping them drop deadwood, and not waste time on implementing things that just won’t fly. The additional cost of operation and testing can then easily be recovered through a significant reduction in maintenance costs.</p> <p>So, if you’re still late making your 2016 resolutions, or if all the ones you made already turned out to be unachievable, here's an idea for the next year: <em>push your organisation slightly more towards thinking that continuous delivery isn’t just a technical thing</em>. It’s a game-changer that has massive side-effects on business models and customer expectations. And design your pipeline so that you can decouple deployments from releases. Run the former based on technical risk, and coordinate the latter with marketing cycles.
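</p> <p>To make the multi-versioning idea a little more concrete, here is a minimal sketch of what version-tagged requests could look like in a Node.js service. It is purely illustrative; the header name, the account structure and the dashboard functions are made-up examples, not how PayPal or anyone else actually does it:</p> <pre><code>// Minimal sketch: run two versions of the same feature side by side and
// pick one based on an explicit version tag on every request.
// The header name and data shapes below are hypothetical examples.
const express = require('express');
const app = express();

// Two concurrent implementations of the same dashboard.
const dashboards = {
  v1: (account) => ({ version: 'v1', balance: account.totalInPrimaryCurrency }),
  v2: (account) => ({ version: 'v2', balances: account.balancesByCurrency })
};

// Stand-in for a real account lookup.
function loadAccount(req) {
  return { totalInPrimaryCurrency: 1450, balancesByCurrency: { GBP: 1000, USD: 450 } };
}

app.get('/dashboard', (req, res) => {
  // Every request carries a version tag; anyone without one gets the old behaviour.
  const requested = req.get('X-Api-Version') || 'v1';
  const render = dashboards[requested] || dashboards.v1;
  // Releasing v2 to more users is then a change in who gets tagged with it,
  // not a new deployment.
  res.json(render(loadAccount(req)));
});

app.listen(3000);
</code></pre> <p>The same kind of tag can travel with queue messages or stored records, which is what lets every layer of the stack answer in the format its caller expects.</p>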
Mon, 01 Feb 2016 00:00:00 +0100 https://gojko.net/2016/02/01/potentially-shippable/ favourites agile Test automation without a headache <p>Here’s a video of my talk on Test automation without a headache, from Agile Tour Vienna 2015:</p> <iframe src="https://player.vimeo.com/video/146987369" width="500" height="281" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe> Sat, 21 Nov 2015 00:00:00 +0100 https://gojko.net/2015/11/21/test-automation-without-headache/ presentations spec-by-example Automated testing: back to the future <p>During the last few years, the world of computing evolved far beyond what even the writers of Back to the Future imagined. Fair enough, we still don’t have flying cars, but Marty McFly of 2015 got his termination letter by fax. In the real 2015, he would be more likely to get a smartwatch notification – and for an added insult, also a small electric shock. All this new technology is creating some incredible opportunities for software testing. Prohibitively expensive testing strategies are becoming relatively cheap, and things that we didn’t even consider automating will become quite cheap and easy, kind of like McFly’s self-tying shoelaces. If your organisation is suffering from the high cost of testing, here are a few tools and trends that may help you outrun the competition.</p> <p>Three trends currently reshaping the software landscape are important to consider before we jump into the new opportunities. Much like Doc Brown’s DeLorean, they both create problems and open up new possibilities.</p> <p>The first factor is the surge in platform fragmentation. The variety of devices running software today is just amazing, but it’s also a massive cause of pain for testing. Mobile is the new desktop, but the number of screen resolutions, devices, OS combinations and outside influences makes it horribly difficult to actually test even for the major mobile options. And the situation is about to get a lot more difficult. Predicting the Internet of Things, various analysts are throwing around different figures, but they are all in billions. Gartner last year estimated that there will be <a href="http://www.gartner.com/newsroom/id/2905717">4.9 billion connected ‘things’</a> on the Internet by the end of 2015, and that the number will jump to 20 billion by 2020. ABI Research thinks that the 2020 number will <a href="https://www.abiresearch.com/press/the-internet-of-things-will-drive-wireless-connect/">actually be closer to 40 billion</a>. IDC forecasts that the worldwide market for IoT will be worth around <a href="http://www.forbes.com/sites/gilpress/2014/08/22/internet-of-things-by-the-numbers-market-estimates-and-forecasts/">$7.1 trillion in 2020</a>. Regardless of how many billions you subscribe to, one thing is clear. It may be difficult to keep up with platform fragmentation today, but by 2020 it’s going to be insane. If you think testing on various Android phone versions and resolutions is painful, wait until people start trying to use your software on smart toilet paper, just because there is nothing else around to read.</p> <p>The second factor is cloud hosting crossing the chasm from early adopters to the late majority. I remember reading a theoretical article about utility computing in 2001, announcing that HP would offer data-processing on tap, similar to municipal water or electric grids.
Fifteen years ago that sounded more bonkers than being able to travel through time at 88 MPH. Apart from the fact that HP is not really a key player, that article was closer to predicting the future than Robert Zemeckis. Even if the majority of companies won’t shut down their data centres and fire all the sysadmins in the next five years, IDC claims that in 2017 roughly <a href="https://www.idc.com/getdoc.jsp?containerId=prUS25350114">65% of enterprise IT organisations will have some kind of hybrid</a> half-cloud half-on-premise monster. For testing, of course, this creates a challenge. Companies no longer control all the hardware or the data. There are many more assumptions in play. The industry is moving away from expensive kits that rarely break, to virtualised, improvised, magicked-up systems running on commodity hardware and likely to blow up at any time. This completely changes the risk profile for software architectures and tests.</p> <p>The third big ongoing trend is the push towards the front end. Ten years ago, most companies I worked with were content with making the back-end bullet-proof, and doing just a bit of exploratory testing on the front-end to cover the minuscule risk of something not being connected correctly. With mobile apps, single-page web apps, cross-platform app generators using HTML5 and the general commercial push towards consumer applications, the front-end is no longer a tiny risk. The top part runs most of the logic for many applications, which means it carries the majority of risk. The testing pyramid, the cornerstone of test automation strategies in the last 10 years, is getting flipped on its head.</p> <p>Luckily, these new trends do not just bring challenges, they also create amazing new opportunities.</p> <h2 id="changing-the-balance-of-expected-and-unexpected-risks">Changing the balance of expected and unexpected risks</h2> <p>News travels fast on the Internet, especially news about seemingly stupid mistakes. For example, in June this year, it was possible to completely crash a Skype client by sending a message containing the string ‘http://’. In 2013, some clever music lovers found a way to <a href="https://labs.spotify.com/2013/06/18/creative-usernames/">hijack Spotify accounts using Unicode variants of Latin letters</a>. Once such problems are widely reported in the news, claiming that they are unexpected in our software is just irresponsible. Yet almost nobody checks actively for such problems using automated tools. Even worse, <a href="https://www.sans.org/top25-software-errors/">four out of the top 25 security errors</a> are caused by input formats and problematic values, well documented and widely published. Max Woolf compiled a list of <a href="http://github.com/minimaxir/big-list-of-naughty-strings">600 known problematic strings</a> and published it on Github. I wrote the <a href="https://gojko.github.io/bugmagnet/">BugMagnet Chrome extension</a> that makes typical problems with names, strings and numbers available on right click for any input box. It’s 2015, let’s please stop calling an apostrophe or an umlaut in someone’s name unexpected.</p> <p>Many teams still hunt for these problems only with manual exploratory tests. And there are plenty of good resources out there that help to speed up manual testing, but why not just automate the whole thing and run it frequently, on all the input fields? It’s not that the tools to check for such problems don’t exist – in fact, they are all too easy to find.
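</p> <p>To show how little code this kind of automation actually needs, here is a rough sketch that pushes a list of known problematic strings through a single input field using the selenium-webdriver package for Node.js. The file name, URL and selectors are made-up placeholders, and a real check would assert something much more precise than ‘no server error page appeared’:</p> <pre><code>// Sketch of automated input mutation testing: feed every entry from a list
// of known problematic strings into one input field and check that the
// application survives. File name, URL and selectors are hypothetical.
const fs = require('fs');
const { Builder, By } = require('selenium-webdriver');

const naughtyStrings = JSON.parse(fs.readFileSync('./blns.json', 'utf8'));

async function run() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    for (const value of naughtyStrings) {
      await driver.get('https://example.com/signup');
      const field = await driver.findElement(By.css('#full-name'));
      await field.clear();
      await field.sendKeys(value);
      await driver.findElement(By.css('#submit')).click();
      // Crude smoke check: the application should not show a server error.
      const errors = await driver.findElements(By.css('.server-error'));
      if (errors.length > 0) {
        console.log('Problematic input:', JSON.stringify(value));
      }
    }
  } finally {
    await driver.quit();
  }
}

run();
</code></pre> <p>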
Security testing groups, both white and black hat, have long had automated tools that grind through thousands of known problems for hours, trying to find an exploit. Many recent high-profile security hacks were actually caused by easily predictable mistakes, which just require time to detect. The real issue is that executing all those checks takes too long to be viable for every single software change, or even every single release in a frequent deployment cycle. Combine that with the increasing fragmentation of tools and platforms, and the push towards the front-end, and the future doesn’t look very bright.</p> <p>Yet there are services emerging that have great potential to change the balance in our favour. With the abundance of cheap processing resources in the cloud, the time required to run a large-scale mutation test using known problematic values, such as Max Woolf’s list, is dropping significantly. For example, Amazon’s <a href="https://aws.amazon.com/device-farm">AWS Device Farm</a> can run tests in parallel over lots of real devices, reporting aggregated results in minutes. Services such as <a href="https://www.browserstack.com">BrowserStack</a> allow us to quickly see how a web site looks in multiple browsers, on multiple operating systems or devices. <a href="https://saucelabs.com">Sauce Labs</a> can run a Selenium test across 500 browser/OS combinations. With the increasing fragmentation of devices, I expect many more such services to start appearing, offering to execute an automated test across the entire landscape of platforms and devices in a flash.</p> <p>My first prediction for 2020 is this: Combining cloud device farms and browser farms with existing exploratory testing heuristics will lead to tools for quick, economic and efficient input mutation testing across user interfaces. Evaluating all input fields against a list of several thousand known problematic formats or strings won’t be prohibitively expensive any more. This will shift the balance of what we regard as expected or unexpected in software testing. As a result, human testers will get more time to focus on discovering genuinely new issues and hunting for really unexpected things, not just umlauts and misplaced quotes.</p> <p>The other interesting trend emerging in this space is automated layout testing. Component layouts, especially across different resolutions and responsive design needs, are now almost impossible to test automatically. But there is a new set of tools emerging that aims to change that. James Shore’s <a href="http://www.letscodejavascript.com/">TDD JavaScript screencast</a> led to the development of <a href="http://github.com/jamesshore/quixote">Quixote</a>, a unit-testing tool for CSS. Quixote makes it relatively easy to check for actual alignment of UI components, similar to xUnit. <a href="http://galenframework.com">The Galen Framework</a> helps to specify layout expectations and run the tests using a browser farm, such as BrowserStack or Sauce Labs. The technical capability of executing layout tests is here today, but the tools are mostly developer-oriented. It’s still not easy for people who care about the layouts the most – the designers – to describe and run such tests themselves. On the other hand, there is a whole host of easy prototyping tools emerging for designers. <a href="http://popapp.in">PopApp</a> allows people to draw sketches on paper and post-its and then create an interactive app prototype.
<a href="http://www.invisionapp.com/">InVision</a> aims to make web and mobile wireframing easy, integrating design with team workflows, collaboration and project management. Now imagine the future and all those tools combined.</p> <p>My second prediction for 2020 is this: We’ll see a new set of visual languages for specifying automated tests for layouts and application workflows. Instead of clunky textual descriptions, these tools will use digital wireframes, or even hand-drawn pictures, to specify expected page formats, component orientation and alignment, and progression through an application. Teams will be able to move quickly from a whiteboard discussion to automated tests, enabling a true test-driven-development flow for front-end layouts and workflows.</p> <h2 id="assisting-humans-in-making-testing-decisions">Assisting humans in making testing decisions</h2> <p>The combination of cloud and microservice deployments, together with putting more logic into the front-end and the fragmentation of platforms, makes it increasingly difficult to describe all expectations for large-scale processes. Because of that, completely different testing approaches have started to gain popularity. For example, instead of being able to decide what’s right upfront, a new generation of tools helps humans to quickly approve or reject the results of a test execution. Such tools will have a profound effect on making exploratory testing faster and easier.</p> <p>One particularly extreme case of this phenomenon is the upcoming generative space game <a href="http://www.no-mans-sky.com/">No Man’s Sky</a>. The game developers are creating a procedural universe of 18 quintillion worlds, but instead of making them boring and repetitive, they are making each world unique. Players will be able to, given several millennia of time, land on each one of those worlds and explore it. Each generated world will be grounded in reality. For example, planets at a specific distance from their suns will have moisture and water. On those planets, the buildings will have doors and windows. Animals will have a bone structure inspired by Earth’s animals. These rules are fed into what the art director Grant Duncan calls the ‘big box of maths’, which then creates variety. Each world is different and unique. The big box of maths stretches the legs and arms of animals, paints them in different patterns, and so on. With 18 quintillion worlds, how does anyone test this model properly? You can see one world, or two, but each of those single worlds is supposed to be interesting enough for players to explore for years. The solution the developers came up with is pretty much the same as today’s space exploration. They built probes that <a href="http://www.polygon.com/2015/3/3/8140343/no-mans-sky-space-probes-gdc-quintillion-worlds">fly around and take pictures and short videos</a>, and the designers then look at the results to see if things are OK. It’s not perfect, but it speeds up significantly what humans would have to do anyway.</p> <p>Rudimentary tools that help with this approval-style testing have been around for a while. <a href="http://texttest.sourceforge.net/">TextTest</a> allows teams to quickly run a process and then compare log files, text outputs or console dumps to old baseline values. BBC News created <a href="https://bbc-news.github.io/wraith">Wraith</a>, a tool that efficiently creates screenshots of web sites in different environments or over time, compares them and quickly highlights the differences for humans to approve.
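</p> <p>The core mechanic behind most of these approval tools is the same: render the page, compare the result against an approved baseline, and queue anything that differs for a human decision. As a rough sketch of that idea, using the pngjs and pixelmatch packages for Node.js (the file paths are made up, and real tools add cropping, thresholds and review workflows on top):</p> <pre><code>// Sketch of approval-style screenshot testing: compare a fresh screenshot
// against an approved baseline and save a highlighted diff for a human
// to review. The file locations are hypothetical examples.
const fs = require('fs');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');

const baseline = PNG.sync.read(fs.readFileSync('approved/home-page.png'));
const current = PNG.sync.read(fs.readFileSync('latest/home-page.png'));
const { width, height } = baseline;
const diff = new PNG({ width, height });

const changedPixels = pixelmatch(
  baseline.data, current.data, diff.data, width, height, { threshold: 0.1 }
);

if (changedPixels === 0) {
  console.log('No visual changes, nothing to approve.');
} else {
  // Save the highlighted difference for a human to approve or reject.
  fs.writeFileSync('review/home-page-diff.png', PNG.sync.write(diff));
  console.log(changedPixels + ' pixels changed, diff saved for review.');
}
</code></pre> <p>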
<a href="https://github.com/xebia/VisualReview">Xebia VisualReview</a> highlights visual differences between screenshots and even provides some basic workflow for accepting or rejecting differences. There are already new cloud-based services emerging in this space. <a href="https://domreactor.com">DomReactor</a> is a cloud-based service that compares layouts across different browsers. <a href="http://applitools.com">Applitools</a> provides screenshot comparison and visual playback, and integrates with Selenium, Appium and Protractor, and even more enterprise-friendly technologies such as QTP and MS Coded UI. And that’s just the start. Over the next few years, we’ll see a lot more of that.</p> <p>My third prediction for 2020 is this: there will be a new set of automated cloud services to run probes through user interfaces, and provide a selection of screenshot or behaviour differences as videos for approvals. The current generation of tools might be a bit clunky to use or configure; they lack nice workflows and require scripting. But the next generation will be able to move around apps and sites more intelligently and easily, and make better decisions about what to offer for approval.</p> <h2 id="dealing-with-things-that-are-impossible-to-predict">Dealing with things that are impossible to predict</h2> <p>Highly complex systems often suffer from the butterfly effect of small changes. I still remember a panicked day of troubleshooting about five years ago, when a seemingly simple change to a database view caused a chain reaction through a set of network services. A user with more than 20,000 Facebook friends tried to log in, the system attempted to capture the network of relationships, but the new view didn’t work well for that particular case. The database server decided to run a full table scan instead of using an index, clogged the database pipe and caused the page to seem unresponsive. The user refreshed the page a few times, taking out all the database connections from the login service connection pool. That caused other things to start failing in random ways. It’s theoretically possible to test for such things with automated tools upfront, but it’s just not economically viable in most cases. And not just in the software industry.</p> <p>In 2011, Mary Poppendieck wrote the fantastic <a href="http://www.leanessays.com/2011/01/tale-of-two-terminals.html">Tale of Two Terminals</a>, comparing the launch of two airport terminals – Terminal 5 at Heathrow and Terminal 3 in Beijing. Despite months of preparations, the UK terminal ended up in chaos on the first day, having to cancel dozens of flights. During the first week, a <a href="http://news.bbc.co.uk/2/hi/uk_news/7322453.stm">backlog of 15,000 bags piled up</a>, which ended up being shipped to a completely different airport for sorting. The Beijing terminal, however, opened without a glitch. This is because the Chinese authorities organised several drills before the launch, the final one including <a href="http://news.xinhuanet.com/english/2008-02/23/content_7654965.htm">8000 fake passengers</a> trying to check in for 146 flights, requiring 7000 pieces of luggage to be processed during a three-hour exercise. Now, of course, the cynics were quick to say that the Chinese government can do this because it doesn’t cost them anything to use their army for such experiments, and that the cost of running an equivalent drill would be prohibitively expensive in the UK.
Yet the <a href="http://www.publications.parliament.uk/pa/cm200708/cmselect/cmtran/543/543.pdf">UK parliamentary enquiry</a> revealed that the owners of T5 engaged 15,000 volunteers in 66 trials prior to the opening of the terminal. But they weren’t monitoring the right things. Similarly, most software stress tests and load tests today involve predictable, deterministic and repeatable scripts. Although such tests don’t necessarily reflect real-world usage, and may not trigger the same bottlenecks as thousands of people who are trying to achieve different things at the same time, designing and coordinating more realistic automated tests just costs too much.</p> <p>The most common solution today is to gradually release to production. For example, Facebook first <a href="http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/">exposes features to a small number of random users</a> to evaluate if everything is going smoothly, then gradually extends the availability and monitors the performance. Lots of smaller organisations rely on services such as Google Analytics with A/B deployments to evaluate trends and figure out whether something unexpectedly bad happened.</p> <p>Theoretically, crowdsourcing should enable us to reduce the cost of such tests before production, and engage real humans to behave in unexpected ways. The reach of the Internet is far and wide, and there are lots of idle people out there who can trade a bit of their time for peanuts. But coordinating those people is a challenge. Amazon started offering the Mechanical Turk computer interface to <a href="http://www.mturk.com">humans performing micro-tasks</a> almost a decade ago. And there are some niche testing crowd-sourcing services already emerging. For example, <a href="http://www.usertesting.com">UserTesting</a> enables scheduling easy hallway-type usability testing, recording videos and comments during testing sessions. But crowd-sourced testing hasn’t really taken off, for the same reason the T5 launch failed. It’s difficult to look for the right things. Or more precisely, it’s difficult to process thousands of test reports and conclude anything useful. There is just too much data, and the signal-to-noise ratio isn’t that good. Application state often depends on things that are difficult to replicate, and that means that confusing test reports would just take too much of our time to consume.</p> <p>Another emerging trend might turn the situation in our favour. Crash reports, made ubiquitous with mobile apps, have now become the norm for desktop applications and web sites as well. Things will go wrong, so when they do, it’s important that the people in charge quickly know about it. Even more importantly, it’s crucial to be able to sort out relevant information from accidental flukes. Problems can happen in end-users’ systems for a variety of reasons, from network glitches, through unrelated third-party software, to malice and stupidity. And the more popular an application, the riskier it is to assume everything will be OK, and the more difficult it is to actually separate signal from noise with crash analytics. As an industry, we’ve learned to collect historical user interaction, network events, and a lot of other inputs to help with crash analytics over the last few years. And that propagated back into testing.
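</p> <p>The mechanics behind that idea are simple to sketch, even though production-grade telemetry services do far more. The snippet below is a minimal illustration only, with invented event names and report fields: keep a bounded buffer of recent user and network events, and attach it to every crash report so that whoever reads the report can reconstruct what led up to the failure.</p> <pre><code>
# Minimal sketch of session telemetry attached to a crash report.
# Event names and the report structure are invented for illustration.
import json
import time
from collections import deque

class SessionTelemetry:
    def __init__(self, max_events=1000):
        self.events = deque(maxlen=max_events)  # old events fall off automatically

    def record(self, kind, **details):
        self.events.append({"t": time.time(), "kind": kind, "details": details})

    def crash_report(self, error):
        # everything needed to act on the report: the error plus recent history
        return json.dumps({"error": repr(error), "recent_events": list(self.events)})

telemetry = SessionTelemetry()
telemetry.record("click", element="save-button")
telemetry.record("request", url="/api/save", status=503)

try:
    raise RuntimeError("save failed")
except RuntimeError as err:
    payload = telemetry.crash_report(err)  # ship this to the aggregation service
    print(payload)
</code></pre> <p>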
In the book <a href="http://amzn.to/1WUhzRW">How Google Tests Software</a>, Whittaker and colleagues talk about BITE – Browser Integrated Test Environment – a browser extension that collects a ton of telemetry and records all user interactions to make it easy for developers to act on a bug report. This tool was originally developed for Google Maps, where the application state depends on an ever-changing dataset and a ton of user actions such as pinching, sliding and zooming. Too many variables to control easily. BITE was actually <a href="https://code.google.com/p/bite-project/">open sourced and out there for a while</a>, but the public version is now deprecated and no longer maintained. But combining something like that with the Mechanical Turk could make crowd-sourced testing easy to consume. A whole new set of tools and services is emerging to provide operational awareness and help with trend reporting. Two nice examples are <a href="http://hotjar.com">HotJar</a>, an analytics service that combines trend analysis, heatmaps and user feedback, and <a href="https://www.trackjs.com">Track.JS</a>, a cloud-based error aggregation and reporting system for JavaScript that collects a ton of data to help with root-cause analysis.</p> <p>My fourth prediction for 2020 is this: a new class of services will combine crowd-sourced coordination with powerful telemetry, analytics and visual session comparisons, to enable testing for behaviour changes and detecting unexpected problems. Such services will enable us to request an army of users to poke around, then provide a good filter to separate signal from noise, to support quick session review and decision making. The new services will also record a ton of useful information about individual crowd-sourced sessions to help with analysis and reproducing state. These new services will make it cheap to schedule sessions with real humans, on real devices, at statistically significant volumes, and make those sessions easy to control and coordinate. Imagine the combination of Mechanical Turk, Applitools and HotJar, recording user interactions and network traffic, and everything else you need to reproduce any particular testing session quickly.</p> <p>Some crowd-sourcing services will no doubt claim that they have real testers on stand-by somewhere half-way around the world, for a fraction of the price, but commoditising testers is not the point of my prediction. Bleak results with offshore testing have hopefully shown most companies by now that this is a false economy. I’d really love to see value-added services that allow a small number of expert testers to coordinate and direct large crowds and conduct experiments. Think about instant focus groups, or smoke testing as a service. A tool will schedule and coordinate this for you, and you just get the results back in 30 minutes. And it will be cheap enough so you can run it multiple times per day.</p> <h2 id="test-automation-with-artificial-intelligence">Test automation with artificial intelligence</h2> <p>At the moment, most test automation is relatively unintelligent. Automation makes things faster, but humans need to decide what to test. Yet, machines have gotten a lot smarter. Using statistics to predict trends and optimise workflows has been around for at least a hundred years, but it’s really taken off in the last few years.
In 2012, big data made big news when the US retail chain Target apparently <a href="http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/">guessed that a teenage girl in Minnesota was pregnant even before her family knew</a>. Models, tools and skills are rapidly evolving in this area.</p> <p>Even if you’re not a government-funded nuclear research institute, cheap cloud processing power and open source tools will make machine learning and big data analytics accessible. Google engineers recently open sourced <a href="http://tensorflow.org/">TensorFlow</a>, a library developed to conduct machine learning and deep neural network research for the Google Brain team. Microsoft recently made its <a href="http://www.winbeta.org/news/microsoft-makes-distributed-machine-learning-toolkit-open-source">Distributed Machine Learning Toolkit open source</a>. Such systems are re-shaping how YouTube offers the next video to play, how Netflix recommends titles on the homepage, and how Amazon offers related items to buy. Combined with the ton of analytic data collected by application telemetry today, this could be a powerful source of insight for testing. Combine production analytics, version control changes and bug reports and let a machine learning system loose on all of it. It may not be able to explain why a problem is likely to happen somewhere, but it should be pretty good at guessing where to look for issues.</p> <p>My fifth prediction for 2020 is this: machine learning and AI tools will emerge to direct exploratory testing. Imagine adding a button on a page, and a helpful AI proposing that you should check how it impacts an obscure back-office report implemented five years ago. Or even better, add crowd-sourcing and coordination services to the equation, and the AI offering to automatically schedule a usability test for a particular flow through the site. Alternatively, combine machine learning conclusions with cloud-based mutation tests, to narrow down the area for automated mutation testing. Even better, machine learning could be used to predict new problematic values to add to mutation experiments. Imagine changing a piece of middleware, and pushing the source code up to the version control system. As part of the CI build, an AI model could come up with a hypothesis that ‘http:’ without the rest of the URL could crash an app, run a data-grid mutation test to prove it, and report back two minutes later, much as unit test results do today. Wouldn’t that be powerful?
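</p> <p>As a toy illustration of what ‘letting a machine learning system loose on it’ might mean in practice, the sketch below ranks source files by how likely a change is to be followed by a bug report, using simple version-control features. The feature values, file names and labels are entirely made up; a real system would mine them from the repository history and the bug tracker, and would need far more careful validation before anyone trusted its guesses.</p> <pre><code>
# Toy sketch: rank files by estimated defect risk from simple history features.
# All data here is invented for illustration.
from sklearn.ensemble import RandomForestClassifier

# features per file: [commits in last 90 days, distinct authors, past bug fixes, lines of code]
history = [
    [24, 6, 9, 4100],   # busy file, many authors, many past fixes
    [ 3, 1, 0,  300],
    [15, 4, 5, 2200],
    [ 1, 1, 0,  120],
    [30, 8, 12, 5300],
    [ 5, 2, 1,  800],
]
had_bug_next_release = [1, 0, 1, 0, 1, 0]  # labels mined from the bug tracker

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history, had_bug_next_release)

candidates = {
    "middleware/url_parser.py": [12, 3, 4, 1500],
    "reports/backoffice_summary.py": [2, 1, 3, 2600],
}
for name, features in candidates.items():
    risk = model.predict_proba([features])[0][1]  # probability of the 'buggy' class
    print(f"{name}: {risk:.0%} estimated chance of a defect - explore here first")
</code></pre> <p>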
And please, if you ever decide to create an AI proposing exploratory tests, just for old times’ sake, make it look like Clippy.</p> Mon, 16 Nov 2015 00:00:00 +0100 https://gojko.net/2015/11/16/automated-testing-future/ https://gojko.net/2015/11/16/automated-testing-future/ favourites testing Turning continuous delivery into a business advantage <p>Here’s a video of my talk on Turning continuous delivery into a business advantage, from Oredev 2015:</p> <iframe src="https://player.vimeo.com/video/144802179" width="500" height="281" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe> Fri, 06 Nov 2015 00:00:00 +0100 https://gojko.net/2015/11/06/continuous-delivery-business-advantage/ https://gojko.net/2015/11/06/continuous-delivery-business-advantage/ presentations agile software-profession Avoiding the most common pitfall of large-scale agile <p>From the time of the ancient Greeks who marvelled at Hercules, to today’s Hollywood fans who marvel at, I guess, Marvel’s superheroes, everyone is inspired and captivated by tales of laborious tasks against difficult odds. Simply put, great effort makes great stories. And in many of those stories, the greatness of the effort is actually more important than the outcome. Almost forty years after Rocky Balboa first ran up the flight of stairs below the Philadelphia Museum of Art, millions of people still play that scene in their heads for motivation. It doesn’t matter that Rocky actually lost the fight at the end of the first film. The eagle that brought Gandalf to Rivendell could have easily air-lifted Frodo to Mordor, but then three glorious books would never see the light of day, and instead we’d get a short uninteresting pamphlet. Perhaps someone from the New Zealand tourist office bribed the birds to conveniently disappear at the start of the Fellowship, or they just worked for Lufthansa and that was their usual time of year to take industrial action.</p> <p>With such a cultural obsession with effort, it’s no wonder that the software industry is also enchanted by it. Late-night heroism earns respect, and nobody asks if the heroes were the ones that designed the server code to crash so often in the first place. Products delivered with crazy working hours earn promotions for managers, without anyone even asking if those same managers committed to a complete fantasy. This obsession with great effort is most dangerous when it affects groups of teams. It can easily cause months of people’s lives to go down the drain with nothing to show for it. And for any organisation thinking about scaling agile, avoiding this mass-hypnosis is crucial for actually doing something better, rather than just doing it differently.</p> <p>Let me digress for a moment, because I don’t want to build my case on magical characters such as Saruman, Sauron or Stallone. I want to tell you a story that you probably already know, but bear with me for a moment, so we can examine it from a slightly unusual perspective.</p> <p>Seventy-one years ago, more than 600 courageous airmen risked life and limb to dig not one, but three tunnels under Stalag Luft III, the infamous Nazi prison camp. The Nazis specifically designed the camp to prevent tunnelling. They built it on sandy ground, which was likely to cave in. They placed seismographs to detect digging. They raised sleeping barracks about half a meter above the ground, and placed them far from the fences.
To avoid detection, the escapees had to dig human-wide pipes nine meters below ground, through a hundred meters of sand. Tunnels so long and so deep require a lot of supporting infrastructure, or an escape route can quickly turn into a mass grave. Unfortunately, Amazon Prime didn’t exist back then, so the clandestine Bob-the-Builder crew had to steal wires and building materials from the prison camp infrastructure. That kind of inventive supply-chain management is risky even during peaceful times, but back then discovery must have meant certain death. Three times a hundred meters of human-wide pipe is a lot of dirt to hide, so two hundred people took almost 25,000 trips up and down, carrying sand in their trousers and scattering it inconspicuously. Finally, remember that this wasn’t a movie, so none of the prisoners had the lung capacity of Tim Robbins from The Shawshank Redemption. Bob Nelson, the leader of 37 Squadron, earned a place in engineering history for inventing an air pump system built out of bed pieces, hockey sticks and knapsacks. On March 24th 1944, a group of 76 prisoners escaped through one of the tunnels, nicknamed Harry (yes, of course, the other two tunnels were called Dick and Tom). This is the stuff legends are made from, so no wonder the whole story was immortalised by Hollywood as The Great Escape, in the only way that was reasonable in the sixties — in DeLuxe colour, starring the indestructible Steve McQueen. The movie brought in millions at the box office (which was a huge thing in 1963), it still has a 93% fresh score on Rotten Tomatoes (which is a huge thing in 2015), and it left a lasting impression on modern culture. This, of course, includes the animated children’s version, Chicken Run, which itself earned 250 million dollars.</p> <p>The sheer scale of this effort completely overshadowed another concurrent action in the same prison camp. Faced with the same constraints, and the same external parameters, a smaller group of prisoners came up with an alternative plan. They built a gymnastics vaulting horse out of plywood and repurposed Red Cross parcels. Under the guise of exercising, they placed the wooden horse close to the perimeter fence every day and dug a tunnel below. They worked in shifts of one or two at a time, digging using only food bowls. At the end of each day, the prisoners placed a board on top of the tunnel entrance and masked it with surface dirt, using the hollow wooden horse to take away the dirt. Because the tunnel started close to the fence, it didn’t have to be very long. The fake exercising also provided cover from the seismographs in the area, so the tunnel did not have to be very deep. This allowed them to just poke holes through the surface for fresh air. In the end, the tunnel was roughly 30 meters long. Though no small feat to achieve with bowls, it was just one third of the length of a single tunnel for The Great Escape. The plan did not require anyone to steal materials from the prison. There was no need for inventive ventilation engineering. It took only three months to build, compared to almost a year for Tom, Dick and Harry. To the best of my knowledge, the tunnel didn’t even have a nickname. Only three people escaped through it, so no wonder the whole enterprise isn’t that well known. Someone made a film about it, of course, because WWII scripts were a license to print money back in the sixties, similar to superhero movies today.
With a feeble plot and such a small effort, Steve McQueen wasn’t even on the cards, and the movie didn’t come close to the cult status of The Great Escape. This is why, I guess, Tolkien didn’t consider my avian solution.</p> <p>The Wooden Horse plot was a tiny undertaking, allowing only three people to escape, but all three of them reached freedom. The Great Escape engaged hundreds of workers for a year and allowed seventy-six of them to escape. With so many variables in play, something just had to go wrong. The tunnel exit was too close to the fence, so the escapees were spotted by the guards on their way out. Most were caught the next day and returned to the camp. The majority of them, around fifty, were executed. In the end, only three out of the seventy-six reached safety. Counting people who actually escaped, the outcome of both these attempts was the same. Counting the cost, both in effort and human life, the Wooden Horse is a clear winner. Yet our society celebrates and glorifies the second one, which is even today known as The Great Escape. Somehow, this feels like the completely wrong way to measure greatness.</p> <p>Unfortunately, it’s scaling effort, not outcome, that makes good stories. And that’s a reality that can’t be ignored. ‘Managed a 10MM project involving 300 people on two continents’ makes a resume stand out, regardless of whether the client got squat for all that money. That kind of scaling scares me quite a lot, and if you’re working in an organisation that is thinking about rolling out some variant of large-scale agile at the moment, it should scare you too. In most of the discussions on scaling agile at conferences and in books, the word ‘scaling’ just means doing things with more effort: distributing work to more locations, engaging larger groups, involving more teams. There’s a naive assumption that more effort brings better outcomes, but that is rarely the case. And the reason people make that assumption is important for understanding why it’s so often wrongly taken for granted.</p> <p>On a small scale, effort does boost outcome. If a single person puts in two hours of work, they are likely to get twice as much done as in just one hour. If a team puts in hard work over a few weekends, they might hit a critical deadline. On a larger scale, effort no longer directly relates to results. With dozens or hundreds of people, and months of available time, Parkinson’s law kicks in. The small-scale visible impact of effort, and the illusion that it brings, make people delusional. The poster-child for this in software delivery is the FBI Sentinel project. For two years, the Sentinel programme managers had bi-monthly meetings with senior FBI stakeholders, showing status charts and a project thermometer, which unsurprisingly always showed yellow trending towards green. In October 2009, the main contractor missed its deadline for delivery of Phase 2, and all the critical problems ‘came crashing down on the project’. An independent audit concluded that there were more than ‘10,000 inefficiencies’. The project management approach, as it turns out, focused solely on tracking activity, leading to more than 450 million USD being spent before anyone noticed that something wasn’t quite right.</p> <p>Because more effort works on a small scale, it allows people to feel a progression of small wins, and that blinds them to the overall mess.
Organisations become like Batman, who defeats bad guys in each of the hundreds of graphic novels and dozens of films, but doesn’t realise that his strategy never even comes close to solving the overall problem. Investing some of Bruce Wayne’s apparently unlimited cash in social services and city infrastructure would surely have a much larger effect on Gotham than all that crime fighting. Then again, great effort makes good stories, and philanthropy is a lot less interesting than going around and punching individual villains in the face.</p> <p>So here we are in 2015, in an industry that mostly equates effort with progress. Novice Scrum teams seem to be particularly obsessed with velocity and story points, and sell them to gullible stakeholders as indicators of value. Unfortunately, both of those metrics just show the effort spent. To put it plainly, they show money flushed down the toilet. And the easiest way to scale that is to buy more toilet seats. Doug Hubbard noted a long time ago that organisations prefer metrics that are easy to measure, without even considering whether they are important or not. Outcomes are viciously problematic to nail down, even outside software. They come on a delayed feedback cycle and depend on too many external factors. Plus it’s very difficult to agree on what actually to measure. That’s why large companies have highly creative earnings/profit accounting schemes. On the other hand, effort can be defined and measured cheaply. It can easily be added across teams and time. Even more importantly, it can easily be multiplied with money.</p> <p>Assuming that the easiest thing people will try to do is to increase effort, how do we make that effort relate to outcomes in a way that doesn’t taper off? There are plenty of theoretical biology quotes out there attributed to Fred Brooks and Warren Buffett on the combination of one child, nine months and multiple mothers. They all come down to the critical factor of inter-team dependencies. On the far end of that scale, if teams are perfectly independent, each one can run their own show in parallel. People working on different products just don’t have to wait for each other. Though better outcomes are not guaranteed, the organisation as a whole at least gets a fighting chance to achieve more. But there are plenty of ambitious software products that can’t be written by a single team within the half-life of today’s technology, so completely getting rid of dependencies often isn’t possible. But we should at least try to minimise them. You don’t need a smart Fred Brooks quote to tell you that — anyone could come up with that in their sleep. The problem is that there are multiple ways to minimise dependencies, and our cultural obsession with effort often makes organisations choose the wrong one.</p> <p>To over-simplify the situation, let’s agree for a moment that there are at least two types of inter-team dependencies. I’m sure you can think of more, but these two will be enough for now. One limits how much work can start independently of other teams. The other type limits how much work can be finished independently. And they are often not even closely related. The first group of dependencies is mostly solved by management action — reorganising where people sit, who they report to, and what they work on. Hiring a few more people is often an easy way to unlock those dependencies, and as an added bonus it increases the amount of effort people can put in.
The second group of dependencies mostly requires technical solutions, and it’s a lot more difficult to act on or communicate. At some point, it becomes incredibly difficult to explain why the entire organisation has just one production-like testing environment, but everyone takes it for granted that installing a second one would cause the universe to end in a singularity event. It’s just easier to hire more people than to deal with that risk.</p> <p>Because idle time is worse than heresy in most organisations, the first type of dependencies typically gets all the attention. A while ago most organisations were dividing work based on technical areas of expertise. All the database developers could sit together, and have a single line manager who can then adequately evaluate and reject their holiday requests. That made management easy, but it made starting to work difficult — analysts were too busy dealing with other projects, and most days testers could easily turn off the QA environment and go out for a long lunch. Cross-functional teams solved that problem, so people can be busy all the time. Five guys can safely be imprisoned in the basement to develop an obscure C++ API and forgotten about for a few months. Everyone knows they wouldn’t use a shower even if it was available. Timmy from the third floor can be the only person maintaining the all-important interest rate calculator, and guarantee his mortgage payments for at least a few more years. They don’t compete for the same management resources, client time, or anything else that would prevent them from starting to work. Teams can and will move at different speeds, even when working on the same customer deliverable. Problem solved!</p> <p>Well, perhaps not… Remember the second type of dependencies, those that prevent work from actually being finished? They get overlooked. People can start working in parallel, but they often can’t deliver independently. Timmy can change his calculator, but without the server API, the clients cannot use it. When teams depend on each other to actually ship stuff, the slowest team determines the speed of the entire pipeline. That pretty much guarantees that the majority of people will either sit idle for most of the time (heresy again) or they will be pushed to start some new work. Product managers then have to run several parallel streams of work, so it’s easy to keep them busy as well. Parallel work makes the organisation accumulate a ton of almost-finished stuff that isn’t exactly ready to go live, so there are even more dependencies to manage. Slow teams suffer from ever-changing dependencies, and have to rework things many times before releasing. Faster teams often have to go back and redo things that were supposed to be finished, but turn out not to be 100% complete. Effort breeds more effort. This is the software equivalent of building deep fragile tunnels through sand. But it’s all incredibly efficient, and allows organisations to keep many people busy at the same time, spending a great deal of effort. And we’ve already established that great effort makes great stories. Managing The Last War of the Ring is a lot better for career progression than booking an air-eagle ticket.</p> <p>I strongly believe that starting from the other side of dependencies is a lot more effective long-term. Imagine for a moment seven teams, all interdependent and with all the possible excuses why they couldn’t release software on their own. Their QA platform is built from unobtainium.
They have legacy software components that are difficult to package and install automatically. The last person who knew how to configure the database died in Stalag Luft III. The architecture is based on Stanley Kubrick’s black brick from 2001. But let’s for a moment consider that these are problem statements, not conclusions. Instead of scaling the process so that all seven teams can start working on production-bound software, they choose people for only one team that can ship end-to-end. That team gets exclusive access to the priceless QA environment. They get the complete product management attention. And they start running. Yes, I know, they can’t slay the whole big bad wolf on their own fast enough, but they can at least release stuff without waiting for anyone else, so the customer feedback loop gets closed. Instead of working on customer features, the other teams form so they can start addressing some of those reasons that make independent shipping impossible. They investigate configuration management and automate it. They set up another QA environment, and the universe does not end. They automate critical tests. Some components get pulled out of the monolithic architecture, and get ready for independent deployment. As the entanglement around shipping parts separately recedes, more teams can be reformed to join and work directly on client software. In a few months, all those insurmountable problems will be gone. And during that time, the single team that was actually shipping will probably achieve more than the entire previous group anyway, just because they didn’t have to wait for anyone to deploy.</p> <p>If deployment dependencies can be reduced, the excuses for large bureaucratic processes, release trains, QA cycles and all those fantastic effort-generators just disappear. Oddly enough, when those dependencies are addressed, the work-starting dependencies seem to magically vaporise as well. Companies can easily find ways to keep people busy. So the one thing to remember from all this, if you’re thinking of restructuring the process in a multi-team environment, is to deal with the far end of the cycle first. It’s not intuitive, but it’s highly effective. At the end of the day, the effort the whole group can put in is always the same — it’s determined by the number of people and the available time. We should really try to improve our processes on the scale of outcomes, not the scale of effort.</p> Thu, 10 Sep 2015 00:00:00 +0200 https://gojko.net/2015/09/10/avoiding-pitfall-large-scale-agile/ https://gojko.net/2015/09/10/avoiding-pitfall-large-scale-agile/ favourites agile software-profession To improve testing, snoop on the competition <blockquote> <p>A more polished version of this article is in my book <a href="//books.gojko.net/fifty-quick-ideas-to-improve-your-tests/">Fifty Quick Ideas To Improve Your Tests</a></p> </blockquote> <p>As a general rule, teams focus the majority of testing activities on their zone of control, on the modules they develop, or the software that they are directly delivering. But it’s just as irresponsible not to consider competition when planning testing as it is in the management of product development in general, whether the field is software or consumer electronics.</p> <p>Software products that are unique are very rare, and it’s likely that someone else is working on something similar to the product or project that you are involved with at the moment.
Although the products might be built using different technical platforms and address different segments, key usage scenarios probably translate well across teams and products, as do the key risks and major things that can go wrong.</p> <p>When planning your testing activities, look at the competition for inspiration – the cheapest mistakes to fix are the ones already made by other people. Although it might seem logical that people won’t openly disclose information about their mistakes, it’s actually quite easy to get this data if you know where to look.</p> <p>Teams working in regulated industries typically have to submit detailed reports on problems caught by users in the field. Such reports are kept by the regulators and can typically be accessed in their archives. Past regulatory reports are a priceless treasure trove of information on what typically goes wrong, especially because of the huge financial and reputation impact of incidents that are escalated to such a level.</p> <p>For teams that do not work in regulated environments, similar sources of data could be news websites or even social media networks. Users today are quite vocal when they encounter problems, and a quick search for competing products on Facebook or Twitter might uncover quite a few interesting testing ideas.</p> <p>Lastly, most companies today operate free online support forums for their customers. If your competitors have a publicly available bug tracking system or a discussion forum for customers, sign up and monitor it. Look for categories of problems that people typically inquire about and try to translate them to your product, to get more testing ideas.</p> <p>For high-profile incidents that have happened to your competitors, especially ones in regulated industries, it’s often useful to conduct a fake post-mortem. Imagine that a similar problem was caught by users of your product in the field and reported to the news. Try to come up with a plausible excuse for how it might have happened, and hold a fake retrospective about what went wrong and why such a problem would be allowed to escape undetected. This can help to significantly tighten up testing activities.</p> <h2>Key benefits</h2> <p>Investigating competing products and their problems is a cheap way of getting additional testing ideas, not about theoretical risks that might happen, but about things that actually happened to someone else in the same market segment. This is incredibly useful for teams working on a new piece of software or an unfamiliar part of the business domain, when they can’t rely on their own historical data for inspiration.</p> <p>Running a fake post-mortem can help to discover blind spots and potential process improvements, both in software testing and in support activities. High-profile problems often surface because information falls through the cracks in an organisation, or people do not have sufficiently powerful tools to inspect and observe the software in use. Thinking about a problem that happened to someone else and translating it to your situation can help establish checks and make the system more supportable, so that problems do not escalate to that level. 
Such activities also communicate potential risks to a larger group of people, so developers can be more aware of similar risks when they design the system, and testers can get additional testing ideas to check.</p> <p>The post-mortem suggestions, especially around improving the support procedures or observability, help the organisation to handle ‘black swans’ – unexpected and unknown incidents that won’t be prevented by any kind of regression testing. We can’t know upfront what those risks are (otherwise they wouldn’t be unexpected), but we can train the organisation to react faster and better to such incidents. This is akin to government disaster relief organisations holding simulations of floods and earthquakes to discover facilitation and coordination problems. It’s much cheaper and less risky to discover things like this in a safe simulated environment than to learn about organisational cracks when the disaster actually happens.</p> <h2>How to make it work</h2> <p>When investigating support forums, look for patterns and categories rather than individual problems. Due to different implementations and technology choices, it’s unlikely that third-party product issues will directly translate to your situation, but problem trends or areas of influence will probably be similar.</p> <p>One particularly useful trick is to look at the root cause analyses in the reports, and try to identify similar categories of problems in your software that could be caused by the same root causes.</p> Thu, 23 Apr 2015 00:00:00 +0200 https://gojko.net/2015/04/23/to-improve-testing-snoop-on-the-competition/ https://gojko.net/2015/04/23/to-improve-testing-snoop-on-the-competition/ testing Explore capabilities, not features <blockquote> <p>A more polished version of this article is in my book <a href="//books.gojko.net/fifty-quick-ideas-to-improve-your-tests/">Fifty Quick Ideas To Improve Your Tests</a></p> </blockquote> <p>Exploratory testing requires a clear mission. The mission statement provides focus and enables teams to triage what is important and what is out of scope. A clear mission prevents exploratory testing sessions from turning into unstructured playing with the system. As software features are implemented, and user stories get ready for exploratory testing, it’s only logical to set the mission for exploratory testing sessions around new stories or changed features. Although it might sound counter-intuitive, story-oriented missions lead to tunnel vision and prevent teams from getting the most out of their testing sessions.<!--more--></p> <p>Stories and features are a solid starting point for coming up with good deterministic checks. However, they aren’t so good for exploratory testing missions. When exploratory testing is focused on a feature, or a set of changes delivered by a user story, people end up evaluating if the feature works, and rarely stray off the path. In a sense, teams end up proving what they expect to see. However, exploratory testing is most powerful when it deals with the unexpected and the unknown. For that, we need to allow tangential observations and insights, and design new tests around unexpected discoveries. To achieve that, the mission teams set for exploratory testing can’t be focused purely on features.</p> <p>Good exploratory testing deals with unexpected risks, and for that we need to look beyond the current piece of work. On the other hand, we can’t cast the net too widely, because testing will lack focus.
A good perspective to investigate, one that balances wider scope with focus, is user capabilities. Features give users the capability to do something useful, or take away the capability to do something dangerous or damaging. A good way to look for unexpected risks is to avoid exploring features, and to explore the related capabilities instead.</p> <h2 id="key-benefits">Key benefits</h2> <p>Focusing exploratory testing on capabilities instead of features leads to better insights and prevents tunnel vision.</p> <p>A nice example of that is the contact form we built for MindMup last year. The related software feature was sending support requests when users fill in the form. We could have explored that feature using multiple vectors, such as field content length, e-mail formats, international character sets in the name or the message, but ultimately this would only focus on proving that the form works. Casting the net a bit wider, we identified two capabilities related to the contact form. People should be able to contact us for support easily in case of trouble. We should be able to support them easily, and solve their problems. Likewise, there is a capability we wanted to prevent. Nobody should be able to block or break the contact channels for other users through intentional or unintentional misuse. We set those capabilities as the mission of our exploratory testing session, and that led us to look at the accessibility of the contact form in case of trouble, and the ease of reporting typical problem scenarios. We discovered two critically important insights.</p> <p>The first one is that a major cause of trouble would not be covered by the initial solution. Flaky and unreliable network access was responsible for a lot of incoming support requests. But when a user’s internet connection goes down randomly, even though the form is filled in correctly, the browser might fail to connect to our servers. If someone suddenly goes completely offline, the contact form won’t actually help at all. People might fill in the form, but lack of reliable network access will still disrupt their capability to contact us. The same goes for our servers suddenly dropping offline. None of those situations should happen in an ideal world, but when they do, that’s when users actually need support. So the feature was implemented correctly, but there was still a big capability risk. This led us to offer an alternative contact channel when the network is not accessible. We displayed the contact e-mail prominently on the form, and also repeated it in the error message if the form submission failed.</p> <p>The second big insight was that people might be able to contact us, but without knowing the internals of the application, they wouldn’t be able to provide information for troubleshooting in case of data corruption or software bugs. That would pretty much leave us in the dark, and disrupt our capability to provide support. As a result, we decided to not even ask for common troubleshooting info (such as browser and operating system version), but instead read it and send it automatically in the background. We also pulled out the last 1000 events that happened in the user interface, and sent them automatically with the support request, so that we could replay and investigate what exactly happened.</p> <h2 id="how-to-make-it-work">How to make it work</h2> <p>To get to good capabilities for exploring, brainstorm what a feature allows users to do, or what it prevents them from doing.
When exploring user stories, try to focus on the user value part (‘In order to…’) rather than the feature description (‘I want …’).</p> <p>If you use impact maps for planning work, the third level of the map (actor impacts) is a good starting point for discussing capabilities. Impacts will typically be changes to a capability. If you use user story maps, then the top-level item in the user story map spine related to the current user story is a nice starting point for discussion.</p> Thu, 12 Mar 2015 00:00:00 +0100 https://gojko.net/2015/03/12/explore-capabilities-not-features/ https://gojko.net/2015/03/12/explore-capabilities-not-features/ testing