Building software that matters, part two

This week at the XPDay 09 conference in London, I facilitated a discussion on practices, ideas and tools that help us focus on building software that matters. We started by quickly going over the conclusions from a similar workshop held in August during the Alt.NET UK conference, then opened the floor to new ideas. Unlike the Alt.NET workshop, where most people in the room seemed to be server-side developers, this time web developers were in the majority, so the discussion often centred on mass-market software. The main theme of this workshop turned out to be feedback as a tool for focusing projects on the things that matter.

Shock therapy agile adoption at 7Digital

Dan Rough, Rob Bowley, Hibri Marzook and others from 7Digital presented an experience report of agile adoption by the services team at 7Digital, a media distribution company, today at XPDay 09. What’s really interesting about this experience report, compared to others at technical conferences, is that their CEO and commercial director were sitting on the panel.

Software process improvements with Lean and Kanban at BNP Paribas

Benjamin Mitchell from BNP Paribas presented an experience report on moving from Scrum to Lean and Kanban yesterday at XPDay 2009 in London. During the initial phases of the project, before he joined, consultants were called in to help implement Scrum. Soon after the consultants left, there was a backlash among business users against anything labelled “agile”. Mitchell gave the example of a trader who was furious after his 50-page requirements document was thrown in the bin. From what I could make out from the presentation, there was a general distrust between the business and the development team. By the point Mitchell took over the project, it was so “politically toxic” that the only way to run it was to make everything very visible, to mitigate political risks.

Improving testing practices at Google

Mark Striebeck from Google opened XPDay 2009 today with a talk titled “Developer testing, from the dark ages to the age of enlightenment”. Suggesting that software testing is today in a renaissance stage, Striebeck said that the community is now rediscovering “ancient” practices. Most things we use in testing today were invented a long time ago, and then forgotten, said Striebeck. In the last fifteen years, the community started rediscovering these practices, and people focused on advancing the art rather than teaching it. As a result, there are many good testing practices out there, but writing testable code is still more an art than a science, according to Striebeck.

Google had a team of Test Mercenaries, who joined different teams for a short period of time to help them with testing. In most cases they could see what was wrong after a few days and started helping the teams, but the effort wasn’t a success: once the mercenaries left, teams did not improve significantly. Striebeck said that testing “wasn’t effective to teach”, because knowing what makes a good test often relied on personal opinion and gut feel. Doing what they often do in similar situations, Striebeck said, they decided to collect data. They wanted to figure out the characteristics of good tests and testable code, and how to know in the first place whether a test is effective. They decided to use a return-on-investment criterion: low investment (easy to write, easy to maintain), high return (alerts to real problems when it fails). According to Striebeck, Google spends $100M per year on test automation and wanted to know whether it is actually getting a good return on that investment. They estimated that a bug found during TDD costs $5 to fix. That surges to $50 if the bug is caught by tests during a full build, $500 during an integration test, and $5000 during a system test. Fixing bugs earlier would save them an estimated $160M per year.
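The cost figures above follow a simple order-of-magnitude progression. A minimal sketch, encoding just the per-stage numbers quoted in the talk (the dictionary keys and function name are my own illustration, not anything from Google's system):

```python
# Cost-per-bug figures quoted in Striebeck's talk, by the stage at
# which the bug is found. Each stage is 10x the previous one.
STAGE_COST = {
    "tdd": 5,
    "full_build": 50,
    "integration_test": 500,
    "system_test": 5000,
}

def cost_ratio(later_stage: str, earlier_stage: str) -> float:
    """How many times more expensive a bug is to fix at a later stage."""
    return STAGE_COST[later_stage] / STAGE_COST[earlier_stage]

# A bug that slips all the way to a system test costs 1000x more
# than one caught during TDD.
print(cost_ratio("system_test", "tdd"))  # → 1000.0
```

The thousandfold spread between the first and last stage is what makes the claimed $160M in annual savings from earlier detection plausible at Google's scale.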

To collect data, they set up a code-metrics store to put all test execution analytics in a single place. Striebeck pointed out that Google has a single code repository, completely open to all of its 10000 developers. Although all systems are released independently (with release cycles ranging from a week to a month), everything is built from HEAD without any binary releases, and the repository receives several thousand changes per day, with spikes of 20+ changes per minute. This results in 40+ million test executions per day from a continuous integration service, plus IDE and command-line runs. They collected test results, coverage, build time, binary size, static analysis and complexity analysis.

Instead of anyone deciding whether a test is good or not, the system observed what people do with tests in order to rank them. It looked at what a developer does after a test fails. If the production code was changed or added to, the test was marked as good. If people changed the test code when it failed, it was marked as a bad test (especially if everyone was changing it): the test was brittle and had a high maintenance cost. They also measured which tests were ignored in releases and which tests often failed in the continuous build but weren’t executed during development.
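The ranking heuristic described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Google's actual system; the event names, the data shape and the majority threshold are all my assumptions:

```python
# Hypothetical sketch of the behavioral test-ranking heuristic:
# after each failure of a test, record whether the developer resolved
# it by changing production code or by changing the test itself.
def classify_test(events: list[str]) -> str:
    """events: observations recorded after failures of one test,
    each either "code_changed" or "test_changed"."""
    if not events:
        return "unknown"
    test_edits = sum(1 for e in events if e == "test_changed")
    # If failures are mostly resolved by editing the test itself, the
    # test is brittle (high maintenance cost). If they are resolved by
    # fixing production code, the test caught a real problem.
    return "bad" if test_edits / len(events) > 0.5 else "good"

print(classify_test(["code_changed", "code_changed", "test_changed"]))  # → good
print(classify_test(["test_changed", "test_changed"]))                  # → bad
```

The appeal of this approach is that no one has to hand-label tests: the classification falls out of behaviour the version-control and CI systems already record.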

The first step was to give developers reactive feedback on tests. For example, the system suggested deleting tests that teams spent a lot of time maintaining. They then collected metrics on whether people actually acted on those suggestions. The system also provided metrics to tech leads and managers to show how teams were doing with tests.

The second step, which is in progress at the moment, is to find patterns and indicators. As they have now identified lots of good and bad tests, the system is looking for common characteristics among them. Once these patterns are collected, algorithms will be designed to identify good and bad tests, manually calibrated by experts.

The third step will be to provide constructive feedback to developers, telling them how to improve tests, what tests to write and how to make the code more testable.

The fourth step in this effort will be to provide prognostic feedback, analysing code evolution patterns and warning developers that their change might result in a particular problem later on.

I will be covering XPDay 2009 on this blog in detail. Subscribe to my RSS feed to get notified when I post new articles.