A lot of my consulting work lately has been around helping teams see software quality more holistically – that it’s not something only testers (or only developers) should be concerned about. Doing that I’ve started formulating an idea that isn’t fully baked yet – but it helped me explain things better – and I’d love your comments on this.
A lot of the confusion about software quality is that it means different things to different people, and the best definitions so far are zen-like, eg Weinberg’s ‘value to someone (who matters)’. They don’t describe things like technical code quality, which I intuitively know matters but doesn’t directly provide value.
Lacking a holistic definition of quality, teams measure things like bug trends, code coverage etc. What gets measured gets optimised, so we end up overly optimising things beyond the point that it makes sense. Air and food are a necessity, but more air and food than we need do not really improve the quality of life. Similar to that, technical correctness and performance are necessary but going beyond a certain point gives us diminishing returns. As with any local optimisation, there is a potential that we can hurt the whole pipeline by working on the wrong thing. As I was explaining this comparison to a client, it hit me that it might be worth trying to build a parallel between Abraham Maslow’s hierarchy of needs and software quality.
The famous Maslow’s pyramid lists human needs as a stack from physiological – necessary for basic functions (such as food, water), safety (personal security, health, financial security), love and belonging (friendship, intimacy), esteem (competence, respect) to self-actualisation (fulfilling potential). The premise of the hierarchy of needs is that when a lower level need is lacking, we disregard higher level needs. For example, when a person doesn’t have enough food, intimacy and respect, food is the most pressing thing. Another premise is that satisfying needs on the lower levels of the pyramid brings diminishing returns after some point. Our quality of life improves by satisfying higher level needs more. Eating more food than I really need brings obesity. More airport security than needed becomes a hassle. The key idea of the pyramid model is that once the basics are satisfied, we should work towards satisfying higher level goals.
Maybe software quality isn’t as simple as a zen-like sentence, and maybe measuring bugs actually does provide some value. Maybe we should stop trying to make it simple and instead model quality on different levels. I tried modelling this with a client – and this is their particular case – but it might apply to others as well.
The lowest level of quality is physiological needs – if these things aren’t satisfied, software is completely useless. For this particular situation, we identified two things: that the software has to be deployable and it has to satisfy the minimum functionality so that “it works”. Activities such as TDD, functional testing (automated + manual), post-deployment testing and similar help us prove that this category of needs is satisfied. Measuring bug counts, code coverage and so on also works on this level. And similar to the human physiological needs, enough is enough and after some point there is no more value to be gained. Newer versions of Microsoft Office have thousands of features that nobody ever uses, because even Word 95 had enough. Investing in more features, developing, testing, maintaining them, is overkill.
Once the software “works”, the second thing we need from it is to “work well”. Looking at parallels between human security needs and what we need from software, this is where a lot of what people typically call non-functionals goes in: performance, reliability, security etc. Activities such as architectural design and performance optimisations come into this level, and performance testing, penetration analysis, stress tests etc might prove that we have satisfied enough. And similar to human security needs, enough is enough. Recent examples of Apple iTunes store security questions demonstrate that more features than needed in this space piss people off, or waste money. Building a system that can handle millions of concurrent users when in the next year or so we’ll only have thousands is a horrible waste of time. I lost a lot of my own money on this a few years ago – we gold-plated a system architecturally instead of shipping more stuff that makes money.
Providing the software performs well and is secure enough, the next level above is love and belonging – and now we cross over to the users of the software. Case in point is Twitter. Famous for its fail whale, twitter’s second-level qualities are often just good enough, but that hasn’t stopped them from building a huge community of loyal users. Activities such as user interaction design, graphical design, community engagement and similar support software in fulfilling the needs on the level of love/belonging. Usability testing proves it. Of course, different types of software need different levels of this. In-house admin software requires a lot less love because people are forced to use it. On the other hand, user devotion and interaction can make or break consumer products.
So far this is nothing revolutionary. If I look back at most of my client engagements, these three levels are where most of the investment in specifications, development and testing was, and roughly proportional to the levels as well. Most of the investment goes in building in functionality and testing it. Performance and things like that are sometimes planned for, often reactively built in, and tested less frequently. I rarely worked with teams that were serious about usability and invested a lot of time or money designing for it and testing it. The pyramid model told this concrete client that I worked with that maybe they should start shifting their investments a bit. But the real surprise came out when we started looking at the top two levels.
Usability on its own can be overkill. No doubt about it. In Change by Design: How Design Thinking Transforms Organizations and Inspires Innovation, Tim Brown presents Nokia Ovi as a key success story to showcase design thinking.
Barely a year later, Nokia announced Ovi, a new service offering that
could be accessed through any of its multimedia devices. Design
thinking had enabled Nokia not only to explore new possibilities but
also to convince itself that these possibilities were sufficiently
compelling to move away from its strongly entrenched and previously
successful approach. The timing was right. Today Ovi is one of the
operating business divisions of the company, and Nokia – a technology
leader – has reinvented itself as a service provider.
Oh, the irony. If ever there was an IT equivalent of harakiri, this was it. Nokia – a technology leader – has “reinvented itself as a service provider” but nobody wants their service. Don’t get me wrong, I am not saying that user interaction design is bad, or that design thinking is wrong – just that it’s not the end. It is a need to be satisfied, but brings diminishing returns after a point. There are two more levels on the pyramid above it.
The key thing missing from the Nokia Ovi story is the fulfilment of the potential. Usability marks potential, and if nobody is using it do we care? The level above usability is usefulness. I did a study with a client recently and looking at the log files we determined that roughly 60% of the features aren’t used enough to justify investment in further maintenance. Maybe instead of investing a lot of money in functional testing we can invest in measuring the usefulness of software? One team I interviewed for Specification by Example: How Successful Teams Deliver the Right Software use automated tests as a target for development, but then disable most tests after they pass the first time. Instead, they protect against problems by delivering in small batches and measuring production use. They define quality more at the fourth level than the first level, which enables them to be much more productive. If the indicators expected by the business users aren’t there, the feature is taken out. Think of this as a business-bug.
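To make the idea of measuring usefulness from log files a bit more concrete, here is a minimal sketch in Python. It is not the analysis done for the client mentioned above; the log format (`<timestamp> <user> <feature>`), the feature names and the `min_share` threshold are all assumptions made up for illustration.

```python
from collections import Counter


def underused_features(log_lines, all_features, min_share=0.01):
    """Return the features whose share of logged usage is below min_share.

    Assumes a hypothetical log format: "<timestamp> <user> <feature>".
    Lines that do not match that shape are ignored.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 3:
            counts[parts[2]] += 1

    total = sum(counts.values())
    if total == 0:
        # No usage recorded at all: every feature counts as underused.
        return sorted(all_features)

    return sorted(f for f in all_features if counts[f] / total < min_share)


# Made-up sample data, purely for illustration.
sample_log = [
    "2012-05-01T10:00 alice search",
    "2012-05-01T10:01 bob search",
    "2012-05-01T10:02 alice export_pdf",
    "2012-05-01T10:03 carol search",
]
features = ["search", "export_pdf", "bulk_import"]

unused = underused_features(sample_log, features, min_share=0.3)
print(unused)
```

What counts as “not used enough” is a business decision, so the threshold is left as a parameter rather than hard-coded.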
Finally, the fact that someone uses a feature doesn’t necessarily mean that the feature was right for the business in the first place. There is one more level above – corresponding to self-actualisation. Does the software achieve what it was originally intended to? Does it save money, earn money or protect money? Or whatever the key business goals were originally. If not, then it doesn’t really matter that people use it, that it is usable or performant or that all unit tests pass. This is the space where activities such as Impact Mapping and Feature Injection come in, along with the ideas of Build-Measure-Learn cycles and Actionable Metrics from The Lean Startup to prove them.
The top level is really where “more is better”, with perhaps a gradual transition to “good enough is good enough” on the two levels below, with the lowest two levels definitely falling into the “good enough” category. Yet from what I see most software teams invest, build and test only at the lowest two levels, gold-plating things without a way to explain why that is bad. Breaking things down in a visual model such as the one with the five levels of the pyramid here helped me get one team to start thinking better about what they really want to achieve. Your turn next!



Are you saying that it’s my job as a tester to measure if the software makes money? How am I responsible for that?
This is such a great way to show how different types of quality apply. We were talking this weekend at TelSum about visualizing quality, and I pointed folks to visualisingquality.org. I look forward to discussing this ‘quality hierarchy pyramid’ idea w/ others (do you have a catchy name for it btw?)
Interesting transition from the Maslow’s pyramid to a quality pyramid.
@Rajnish I don’t think that Gojko makes testers responsible for measuring if the software makes money. He talks about “the team”, which in my opinion is a broad definition in this case.
I think I must agree that in a lot of cases too much testing is done in the lower levels of this pyramid. It actually all boils down to a sound business case and risk based testing. The only question is how to do the measuring of the higher levels of the pyramid before investing too much time at the bottom…
@Rajnish if you see your role as someone who measures quality to inform better stakeholder decisions, then absolutely yes. If you see your role as a keyboard automation tool, then no.
Excellent article…nice to see something about quality that’s grounded in reality.
@Rajnish testers aren’t the only ones responsible for quality. I’d imagine the consensus is that you’re responsible for that which you can test.
This is terrific. It supports an approach to testing that I emphasize in my company, which is that teams take responsibility for identifying practices which not only ensure that software works, but that it solves real problems in friendly ways. I do think that testers (by which I mean both those who have “test*” in their job title and those who don’t) are responsible for profitability, to the extent that every month I give all profits from our own products’ sales growth back to the teams to use as they see fit. In the end, the product you’re building has to pay your salary, although for some of you the connection might be more indirect than for others.
Weinberg’s ‘value to someone (who matters)’. They don’t describe things like technical code quality, which I intuitively know matters but doesn’t directly provide value.
But it does directly provide value to someone: to the programmer who wrote the code, and to the programmers who will maintain it. As such, it indirectly provides business value. If there’s a problem in the code, or if the business wants programmers to extend or enhance the product, that value will be realized and recognized.
That doesn’t mean that doing everything to a Platonically perfect standard is essential, so your cautionary notes about gold-plating are well-taken. Quality is not merely about value to some person who matters, although that’s important. It’s also about deciding whose values matter, which is fraught with politics and feelings, as Jerry points out.
With respect to the measurement problem, if you haven’t read Austin’s Measuring and Managing Performance in Organizations, I’m confident that you’ll find it valuable.
Cheers,
—Michael B.
It looks like this is very useful The human test triangle
I am currently working on how to define quality for a project which has different stakeholders and therefore different views / reports.
I would like to give this a try.
Thanks for posting
I like using Maslow’s hierarchy of needs as a platform for establishing what is important to an organization. I’ve used Maslow before with respect to Sales organizations but hadn’t thought of it directly as a holistic approach to quality.
What could be enlightening from a team exercise perspective would be for everyone to create their own hierarchy and see where they differ. This could help each member of the team see how the others holistically view quality.
Great start on an interesting approach
I disagree that Weinberg’s quote doesn’t apply to internal quality. Internal quality is valuable to developers, because it makes code easier to understand and maintain (ah, I see Michael Bolton makes the same point).
I do agree though that the top three levels of the pyramid deserve much more attention than most teams give them. But why is this the case? Usually because development teams are so cut off from “the business” or their customers. It’s important to examine this root cause because teams aren’t going to care about the upper levels of the pyramid unless they have some ability to influence the decisions on the trade-offs between these qualities.
That brings me to my last point – I think it’s more of a trade-off question than the pyramid model suggests. I might actually tolerate bugs if the software is useful enough to me.
That’s not necessarily a trade-off I’d support in most circumstances except as a short-term tactical decision, but people make it all the time. I think this is because the metaphor doesn’t quite work – we don’t ignore value and usability when software is buggy or not terribly performant – in fact, we might tolerate it for a while if there’s no easy alternative.
Nevertheless I think it’s an interesting and thought-provoking tool – thanks.
Rajnish S. “Are you saying that it’s my job as a tester to measure if the software makes money? How am I responsible for that?”
I think we testers are sometimes too willing to say “I can’t test that, it’s not a testable requirement”, then we move on as if we’re off the hook.
If we see that a requirement cannot readily be tested then we have a duty to query and challenge the requirement so that it is framed in a way that allows us to perform some useful testing.
In the example you cite, if the business wants the product to make money then they must have some idea how the product should do that. It is up to the BAs to turn that aspiration into a meaningful, testable requirement. The developers should then have built something we can test.
If that’s not happening we should be screaming, even at the risk of making ourselves unpopular. I hate the lousy attitude some testers have that leads them to say: “whew, these requirements aren’t testable, the users will have to see how it goes when we ship it, this is going to save us a whole load of time”.
I find the pyramid interesting, but probably not definitive. In other words, whether each stage is of a ‘higher order’ than the one below will change on a product-by-product basis. That said, though, it highlights the importance of understanding which measures are important in a software release. I think instead of value, maybe the pyramid should be more along the lines of “I can’t really measure X until Y meets some minimum threshold.” So, before I can measure performance, the software needs to meet some minimum functional standard. And usability measurement implies a minimum level of security/performance. It’s not usable if it takes 60 seconds per page load. There is no point in measuring. But even then, we are just talking about sequences of measuring, not about the importance of certain testing levels.
I think software quality does mean value, and in my role as tester, I should be able to translate my work (both the time I spend working as well as the artifacts I produce) into value to the business.
Certainly, not all value is direct: but that is why I am an engineer, not a test-monkey. I am challenged to connect the software process to quality. Reducing the number of help-desk calls means we only need to pay 1 support person. Well-designed code means that long-term maintenance costs will be lower. Fewer functional bugs means higher customer satisfaction. Ambiguous requirements means that we don’t understand the customer need and will probably deliver something they don’t want/need, resulting in re-work costs, etc….
The danger of over-testing or pushing for optimal metrics is that the cost of accomplishing that goal is greater than the value received by accomplishing it. Sure, I could use a dictionary/grammar program to load every piece of text from my site and generate a bug report for every misspelled word/sentence. But does that really hurt the value all that much?
I think we are still a ways away from the time when we really have figured out the Key Indicators in software quality, and come up with approaches to optimize against those indicators.
Terrific. I’m pleased that it seems to work for commercial-off-the-shelf implementations too.
Great viewpoint, thanks for sharing your thoughts on a holistic model – IMO too much software quality effort is focussed on ‘acceptance’ rather than ‘success’
The model looks like a useful tool to support requirements definition as well as testing because quality is an ongoing cycle – asking ‘how do we measure Useful? How do we test that?’ may have saved Nokia Ovi a lot of pain.
While I like the simplicity of the model, an extended version explaining/overlaying existing test concepts could prove useful – how far up the triangle does software testing take you? – where does market analysis fit in?
Yes! I like this.
It feels real and like something that most people can relate to. Especially with the example (^^) with Maslow to compare with.
So it’s The Adzic Software Quality Needs Pyramid then? ASONP…
Thanks Gojko
Interesting! Just a few comments after a quick browsing (I’ll get back to reading the article more in depth later):
- The Maslow’s pyramid displays a hierarchy of needs, where we need to achieve the lowest before we can get any value from the level above (or at least, that is how I perceive it). Do you view the quality pyramid the same way?
- An observation is that the different levels correspond to the need of communication on different levels. The lack of the appropriate communication constitutes a “communicational debt” for the different levels. I’ve experimented with that concept a little, it would be interesting to hear your take on that! See http://johannesnordh.wordpress.com/
Cheers!
Good idea but solely looking at the diagrams it gives the impression that the process is linear.
I think as you work on your new model and illustrations you could perhaps include parallel, self-informing processes (also similar to psychology).
Nice article Gojko. I also think that having a visual model is just great. The only thing that bothers me is “useful” staying so high in the pyramid. It doesn’t seem right to me to care about performance and security (above the very minimum levels) before finding out its usefulness. Perhaps I didn’t grasp the proposal properly.
@Jose,
the model is suggesting that the system has to be performant enough, otherwise it won’t be useful, but once it is performant enough more performance isn’t better. Delivering features that are more useful is better instead.
I like it. It’s simple, high level and realistic. It seems relevant (especially to leaders and managers) when discussing vision, technology and where manpower will be spent over time.
Excellent format to engage in good quality conversation (pun intended).
Looking at the pyramid, I get the feeling that there is a disconnect between successful & useful and the other levels of the pyramid. The first remark is that if the system is not useful, then the lower levels of the pyramid do not make sense – i.e., a system that does not satisfy some need would not be used, even if all previous levels are achieved. Secondly, in order to be successful, a useful system should appear in a suitable time frame – later the place could be busy. IMHO, I would suggest changing the diagram to a tree with several levels: (I = Successful -> (II = Useful -> (III = Usable, Performant, Deployable functionality OK), Timely)). Useful seems to me like a prerequisite, and Usable, Performant, functionality like the limiting factors – i.e., lack of them limits the usefulness of the software. The initial model looks like a good starting point. Thanks for publishing it.
My experience in the small business setting shows that there is nothing more important than making sure your cash tunnels are taken care of first. Developing “maybes” and “nice to haves” is the biggest loss for startup companies, that end up investing time and money into features and services that fall behind on demand and profitability.
Yes, features that make money should be maintained first.
Successful product helps your clients save money and that is usually the most profitable segment of your business, the one that helps your clients SAVE money. The reason people use software is to save time and money.
1. Identify the key, most useful components of your product. Be clear about how it saves your clients money.
2. Identify any ways you can make software more cost-effective for you and for your client: Is there a cheaper data source or service than you are currently using? Are there settings, shortcuts or configurations that can help your client view only relevant tools and information? Is the application workflow lean?
3. Everyone in the company, including testers, receptionists, and company executives is responsible for understanding the client’s needs and always looking for opportunities to make software more useful and successful.
Thanks Gojko – Nice article,
For me it reemphasizes older terms of Validation & Verification (in your pyramid these seem to be in different ends of the pyramid), and the 80/20 rule.
I think some warning should be made, to verify that people do not consider the levels as a binding execution order, as that might cause raising of higher level issues much too late.
This deserves much more discussion, and might very well become a very useful ground rule.
@halperinko – Kobi Halperin
I would like to see the connection to Dimensional Planning. Nice work.
I like this pyramid based on Maslow’s Hierarchy. The upper level is basically hypothesis tests, much akin to what a Lean Startup may do.
Way to go!
Paul
Interesting point of view. However I do not believe it is a productive view when you are trying to create a successful product.
Prioritizing Usefulness and Usability lower than functionality/performance is a dangerous thought process, because it automatically gets delayed to a later phase of product development. It even sounds like classical waterfall product development:
“First we think of what we want to accomplish and later we are going to make it good (or useful/usable).”
Great products incorporate a holistic approach to product development. You will not be successful just because you deploy performant software, nor should this be your first step. You need to think about value creation in terms of usefulness and usability not as a commodity or something you can do later, but as a core part of your product.
Your twitter example shows this perfectly. Even more so for young products: They rely on early adopters more so than established products. And early adopters are much more forgiving than mainstream users.
The iphone, ipad and so forth are other great examples: Usability, Usefulness, Commodity, User Experience come first. Performance, speed, bugs come second.
@Andrej – I think that you got the wrong impression, I was not implying priority
Gojko, this is a great picture, but despite your response to Andrej your text (e.g. “when a lower level need is lacking, we disregard higher level needs”, and “if these things aren’t satisfied, software is completely useless”) can be read as setting very black-and-white priorities. And when you say, for example in your response to Jose, “once it is performant enough more performance isn’t better” you could be giving the impression that “performant enough” is not a time-varying criterion.
The Lean Startup approach is more like asking “what are the thinnest possible legs we can put in at the lower levels to enable us to start to test our assumptions in the top levels”. When you know what people find useful enough to pay for, you are in a new context, and the whole pyramid can be re-evaluated.
@Justin,
thanks for the slap, it serves me well for answering comments on my mobile phone.
I was not implying priority in the sense of order of implementation. I don’t think that we can first build functionality, then performance, then usability etc. I do think, however, that there is a progression of need on those levels. If twitter doesn’t allow me to log in, no point even building in or testing performance. We can release software with useless functionality (I’ve seen this too many times), but we can’t release useful software without any functionality. My idea with the model is to show that there are different levels of quality and that, while planning implementation and testing, we should be considering all those levels, and prevent gold-plating or over-focusing on only the lower ones.
I proposed the model as having something to look at more holistically – I was not implying that this was by any criteria static. All these things are of course varying and we need to reevaluate those definitions periodically. What’s good now even from a functional perspective might be useless tomorrow, or something deemed not important now might be a core thing when situation changes.
Thanks, Gojko. As a result of this conversation I’m watching Joshua Kerievsky’s presentation Lean Startup: Why It Rocks Far More Than Agile Development for a second time. If you haven’t seen it, I recommend it.
When I wrote, above, about the “thinnest possible legs we can put in at the lower levels”, that description would apply to a Minimum Viable Product thrown together without much regard for quality or performance to see if it will fly. But I was missing an important point – there are ways of testing ideas at the upper levels (useful, usable, and likely to succeed) without building the product itself. Alberto Savoia calls this “pretotyping”, short for “pretend prototyping” (i.e. a prototype that is faked in some way, rather than being an early implementation using the intended product technology). “Make sure you are building the right ‘it’ before you build it right.”
(By the way, it would be nice if links in comments were distinct from the surrounding text. There are three in my comment above.)
Sure, there is a lot more to lean startup than just software delivery. This post, though, is about software.
I appreciate that. For example, a paper prototype addresses usability of software that hasn’t been built yet.
I think the criticism of this pyramid is also reflected in the criticism of Maslow’s pyramid. Everyone can imagine cases of people whose basic needs are not satisfied, but who have their self-esteem, live in a loving family, or even get to self-actualisation (artists living in poverty).
In fact the same applies to this post as to user stories. The document is not important, but the conversation about it is. Interesting!
Inge, there is also an argument that artists living in poverty have their money needs satisfied to the level important to them.
I’m looking at this and trying to relate it to the Software Testing Quadrants that Brian Marick came up with, and some ideas from Lean Startup. It seems to me that it could help teams to identify which kinds of testing to invest more in or when they have enough of one kind of testing. I see a correspondence like this:
Successful – Net promoter score
Useful – A/B testing
Usable – Q3
Performant/Secure – Q4
Deployable Functionality OK – Q1, Q2
Gojko, using a Maslow-like pyramid to set quality levels can only be helpful if the priorities mentioned correspond to the business case you’re assessing.
If the default layering is correct for a specific customer, that would be pure luck. IMHO.
@J Huisman
of course, I propose this as a model for further investigation, not as a universal solution to everyone’s problems. this hierarchy worked for one client once, I’ve not made any universality claims.
Arguably all successful solutions have been underpinned by a real need. Having “useful” near the top of your pyramid is getting it very wrong, i.e. it’s the same as saying your delivery team lacks a basic understanding of the requirements. No client or business sponsor will sign off on the extra budget to refine a product if you can’t demonstrate early on that the solution will be useful!
Nice write up. The hard part is knowing how much is enough on each item.
What about trying to apply the same logic one (abstraction) level up to the development of software development methodologies?
Sometimes I have the impression we (a collective we, the United Software Developers of this Planet) reached the point of diminishing returns on trying to define, refine, structure and improve the development methodologies we use (of course to make them more useful to develop useful etc. code).
What do you think?
@ Rajnish S.:
It’s not the tester who has to measure it, the whole team has to!
This includes the product owner (who will mostly do the actual measuring).
But some tracking links in the application will also help to get information.
I think this “could” be a valuable tool for one single team instance. The problem with quality is that it can’t be defined. Quality is different for every subject, object and context. It disappears as soon as you try to define it. Dependent on where I am in a project/team I would most likely move things around in that pyramid, that doesn’t say that it’s wrong, it’s just not defining anything.
Quality is a direct experience independent of and prior to intellectual abstractions.
What I mean (and everybody else means) by the word ‘quality’ cannot be broken down into subjects and predicates. This is not because Quality is so mysterious but because Quality is so simple, immediate and direct. – Robert M. Pirsig
(It’s awesomely hard to preview this text on this site… I just hope I come through in a good way)
@Rajnish S
You should know the concept that “Quality Is Money”.
And the main goal of testing is to improve/maintain the quality.
So indirectly the tester is responsible….