At the agile testing user group meeting yesterday in London, Hemal Kuntawala talked about development strategies and process improvements at uSwitch.com over the last year or so. The start of his experience report reminded me very much of several teams I’ve seen over the last few years, with a process that is seemingly agile, with lots of good ideas but also a ton of problems. Judging by the reactions from the audience, the situation is very common in development shops today, so his success story is sure to be encouraging to lots of people out there.
What was really interesting for me in this report was that the change was driven by the demand for better quality. Hemal joined uSwitch as a QTP automation specialist a year ago, when the process in place already had quite a few agile elements. Developers were doing TDD, continuous integration was in place, they were doing three-week sprints and had a Scrum wall. Developers worked in small teams with testers on the team and “never more than two or three seats away from the business”. Two days at the end of every iteration were left for cleanup and testing. But they still had lots of quality problems, which led to a company-wide effort to focus on quality. Out of that, little by little, they fundamentally changed the development process and how they think about software development.
Fat-er wall
Kuntawala said that although developers and testers were working closely together, there was still an invisible wall between the two groups. Developers would develop and testers would test, throwing features over that invisible wall to each other. As there was one tester for every three developers, testers quickly became a bottleneck. “QA and development were ping-ponging each other. When the time is up, deadlines have to be met and the code gets shipped regardless of quality”, said Kuntawala. This phased approach, albeit done in short iterations, still did not give them the quality they expected. So they started a company-wide initiative to focus on quality and get everyone thinking about how to improve it.
The first thing they did was remove the testing bottleneck and get rid of the imaginary barrier between developers and testers (Kuntawala called it the Fat-er wall, an anagram of “waterfall”). Developers were asked to start thinking about testing, and were taught both the theory and the practice of how to test. Likewise, testers learned about development, started writing code and began looking at the system underneath the user interface. “You get a better perspective of what needs to be tested in the right place”, said Kuntawala. This all resulted in code that was more testable and much better tested. Instead of a massive number of QTP tests, which were poorly documented and used only by testers due to licensing costs, they switched to Selenium for UI testing.
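The talk didn’t show code, but for a flavour of that switch, a minimal Selenium check written with the Ruby bindings might look something like the sketch below. The URL, CSS selector and expected behaviour are invented for illustration; they are not uSwitch’s actual tests.

```ruby
require 'selenium-webdriver'

# Hypothetical UI check for a comparison results page. The URL and
# selector are stand-ins, not uSwitch's real suite.
driver = Selenium::WebDriver.for :firefox
begin
  driver.get 'http://www.example.com/energy/compare'
  wait = Selenium::WebDriver::Wait.new(timeout: 10)
  # Wait for at least one tariff to appear in the results list
  tariff = wait.until { driver.find_element(css: '#results .tariff') }
  raise 'Results page showed no tariffs' unless tariff.displayed?
  puts 'UI check passed'
ensure
  driver.quit
end
```

Unlike QTP, tests like this cost nothing to run on a developer’s machine, which is what made it practical for everyone, not just testers, to own them.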
The tester role was dropped completely. Instead of talking about testing, they just called everything development. Testers got their job roles changed to reflect this. One of the developers in the audience said that dropping the testing phase made people work much more carefully because they knew that “nobody is going to look at this after they commit and it will go to production”.
Sorting out deployment
At this point, developers were effectively releasing only to the source code control system. Once the code was in the release branch, they considered their work done. The operations team would take the code from there and try to release it to production. This process was error-prone, and it became their next area of focus. The operations team already had some tests to verify the deployment, but these were unreliable (often timing out). After doing a few deployments together with the operations team, the developers started writing better tests to verify the post-deployment status of the application. These tests were written with Selenium, and the operations team used them to quickly verify newly deployed code. Developers also made sure that the tests passed before handing over to operations. Resolving the deployment issues and test timeouts together led to much shorter test runs and more reliable deployment verification. It’s also worth noting that they deploy to half of the server farm, verify that it is working correctly, and then deploy to the other half. This allows them to roll back quickly and to deploy without downtime.
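To make the half-farm rollout concrete, here is a minimal sketch of the pattern in Ruby. The server names, the status page and the deploy_to stand-in are all hypothetical; the report describes the verify-then-continue idea, not the tooling behind it.

```ruby
require 'net/http'

# Illustrative rolling deployment: push to half the farm, verify it,
# then do the other half. Every name here is a made-up stand-in.
SERVERS = %w[web1.internal web2.internal web3.internal web4.internal]

def deploy_to(server)
  puts "deploying to #{server}..." # stand-in for the real deployment step
end

def verified?(server)
  # Hit a hypothetical status page and expect an HTTP 200
  response = Net::HTTP.start(server, 80, open_timeout: 5, read_timeout: 10) do |http|
    http.get('/status')
  end
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

SERVERS.each_slice(SERVERS.size / 2) do |half|
  half.each { |server| deploy_to(server) }
  half.each do |server|
    abort "#{server} failed verification - roll back here" unless verified?(server)
  end
end
puts 'Deployment verified across the whole farm'
```

Because the first half is verified while the second half still runs the old code, a bad release never takes the whole site down.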
Moving from tasks to stories
The next part of the process they optimised was delivering smaller chunks of software. They had previously used technical tasks to plan and divide work, which made it hard to work out specific acceptance criteria for each task. It also caused delays in deployment. “Tasks depend on each other so developers were reluctant to deploy it until everything is done”, said Kuntawala. So they moved to working on user stories instead of tasks. Stories reflect user value, they are individually deployable, and it is easy to get a good definition of when a story is done. As part of the change, they started using acceptance tests as a guide for development and introduced specification workshops to get better involvement from the business side. To help people write better acceptance tests, they developed cheat-sheets, taking ideas from Jerry Weinberg and James Whittaker (Does it do what we want it to do? Does it do what we don’t want it to do? Think about where bugs can lurk: input, state, environment and data). Selenium tests were replaced with Cucumber and Watir.
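The cheat-sheet questions map naturally onto Cucumber scenarios. As a purely hypothetical illustration (the feature, postcodes and wording are invented, not taken from uSwitch’s suite), a story’s acceptance tests might read:

```gherkin
Feature: Compare energy tariffs by postcode

  # “Does it do what we want it to do?”
  Scenario: A valid postcode returns tariffs
    Given I am on the energy comparison page
    When I enter the postcode "SW1A 1AA"
    Then I should see a list of available tariffs

  # “Does it do what we don’t want it to do?”
  Scenario: An invalid postcode is rejected
    Given I am on the energy comparison page
    When I enter the postcode "NOT-A-POSTCODE"
    Then I should see a message asking for a valid postcode
```

The step definitions behind such a feature would drive the browser through Watir; again, the names and selectors below are invented for the sketch:

```ruby
require 'watir'

Given('I am on the energy comparison page') do
  @browser = Watir::Browser.new :firefox
  @browser.goto 'http://www.example.com/energy/compare'
end

When('I enter the postcode {string}') do |postcode|
  @browser.text_field(name: 'postcode').set postcode
  @browser.button(name: 'compare').click
end

Then('I should see a list of available tariffs') do
  raise 'No tariffs listed' unless @browser.div(id: 'results').present?
end

Then('I should see a message asking for a valid postcode') do
  raise 'No validation message' unless @browser.div(id: 'error').present?
end

After { @browser&.close }
```

The appeal of this style is that the plain-language feature file is something the business side can read and challenge in a specification workshop, before any code is written.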
Avoiding mass sign-off and testing
Once the deliverables were broken down into stories that could be shipped sooner, the sign-off cycle at the end of the iteration became a bottleneck and was seen as an unnecessary delay. Instead of running the tests at the end, they started running them immediately. Instead of one big demonstration at the end, they started demonstrating new features to the business and getting them signed off as soon as they were done. Finally, they started deploying on demand. To help with this, they introduced continuous monitoring of production, choosing Tealeaf for user-experience monitoring such as tracking error rates and the usage of new features. This added visibility provides a safety net against unnoticed deployment problems.
Restructuring the Scrum wall
After all this, they restructured the Scrum wall to reflect their new process, moving in the Kanban direction. It now has four columns: Ready, In Progress, Inventory and Done. A story becomes Ready when it is added to the release plan. It is In Progress during development (including testing) and goes to Inventory when it is ready for deployment. It goes to Done once it is in production, making money. What’s really interesting about this is the limit of three stories in Inventory: in order to move anything there, you first have to deploy what is already in Inventory, which keeps the difference between the development and production systems small. The limit on Done is nine, meaning that once nine stories are deployed they hold a retrospective. Instead of time-based iterations, they effectively have work-based iterations.
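As a toy illustration of those pull rules (nothing the team actually automated, as far as the talk went), the two limits could be modelled like this:

```ruby
# Toy model of the wall's limits: Inventory holds at most three stories
# awaiting deployment, and nine stories reaching Done trigger a
# retrospective. Purely illustrative.
INVENTORY_LIMIT = 3
DONE_LIMIT      = 9

def move_to_inventory(story, inventory)
  raise 'Inventory full: deploy before pulling more work' if inventory.size >= INVENTORY_LIMIT
  inventory << story
end

def deploy!(inventory, done)
  done.concat(inventory) # stories go live and start making money
  inventory.clear
  return unless done.size >= DONE_LIMIT
  puts 'Nine stories in production: time for a retrospective'
  done.clear # the "iteration" is work-based, not time-based
end
```

The Inventory cap is what forces frequent deployment; the Done cap is what replaces the calendar as the retrospective trigger.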
The result
The expected turnaround time for a feature a year ago was six to nine weeks. Today, the average turnaround time from requirement to production is just four days. And things are coming out without problems - apparently there hasn’t been a serious issue in production for over six months. One of the development managers from uSwitch was in the audience and said that “quality has increased substantially and our conversion rates have grown”. “We now see work hit production a lot sooner. It might seem we’re working faster but we’re doing things that are smaller and smaller”, commented one of Kuntawala’s colleagues during the discussion.
Kuntawala shared some telling quotes from around the company: a colleague from the operations team has started asking “is there a better way to test this”, a business colleague asked “how are we going to test this”, and the business is now telling them “we can’t keep up”. The company no longer thinks about development in terms of projects and iterations, but in terms of business value.
A very interesting observation from the discussion after Kuntawala’s presentation was that these changes were not driven by a big management decision. They were all implemented as small, gradual improvements. Small changes and continuous problem solving made the transition smooth. “When we got comfortable, we started thinking about what’s the next thing we can do”, said Kuntawala, suggesting that the next things they will look into are fully automated deployments and better integration between the development teams and the business. He concluded that in twelve months’ time the process will probably look a lot different. I look forward to hearing an update on this next year.