Let’s break the Agile Testing Quadrants

Five years ago, Lisa Crispin and Janet Gregory brought testing kicking and screaming into agile with their insanely influential Agile Testing book. They are now working on a follow-up. This got me thinking that it’s about time we remodelled one of our sacred cows: the Agile Testing Quadrants. Although Brian Marick introduced the quadrants a few years earlier, it is undoubtedly Crispin and Gregory who gave the Agile Quadrants their wings. The Quadrants were the centre-piece of the book, the one thing everyone easily remembered. Now is the right time to forget them.

XP is primarily a methodology invented by developers for developers. Everything outside of development was boxed into the role of the XP Customer, which translates loosely from devspeak to plain English as “not my problem”. So it took a while for the other roles to start trying to fit in. Roughly ten years ago, companies at large started renaming business analysts to product owners and project managers to scrum masters, trying to put them into agile boxes. Testers, forever the poor cousins, were not an interesting target group for expensive certification. So they were left utterly confused about their role in the brave new world. For example, upon hearing that their company was adopting Scrum, the entire testing department of one of our clients quit within a week. Developers worldwide, including me, secretly hoped that they would be able to replace those pesky pedants from the basement with a few lines of JUnit. And for many people out there, Crispin and Gregory saved the day. As the community started re-learning that there is a lot more to quality than just unit testing, the Quadrants became my primary conversation tool to reduce confusion. I was regularly using that model to explain, in less than five minutes, that there is still a place for testers, and that only one of the four quadrants is really about rapid automation with unit testing tools. The Quadrants helped me facilitate many useful discussions on the big picture missing from typical developers’ view of quality, and helped many testers figure out what to focus on.

The Quadrants were an incredibly useful thinking model for 200x. However, I’m finding it increasingly difficult to fit the software world of 201x into the same model. With shorter iterations and continuous delivery, it’s difficult to draw the line between activities that support the team and those that critique the product. Why would performance tests not be aimed at supporting the team? Why are functional tests not critiquing the product? Why would exploratory tests be only for business stuff? Why is UAT separate from functional testing? I’m not sure if the original intention was to separate things into those during development and after development, but most people out there seem to think about the horizontal Quadrants axis in terms of time (there is nothing in the original picture that suggests that, although Marick talks about a “finished product”). This creates some unjustifiable conclusions – for example that exploratory testing has to happen after development. The axis also creates a separation that I always found difficult to justify, because critiquing the product can support the team quite effectively, if it is done in a timely manner. Taking that to the extreme, with lean startup methods, a lot of critiquing the product should happen before a single line of production code is written.

The Quadrants don’t fit well with all the huge changes that happened in the last five years, including the surge in popularity of continuous delivery, devops, build-measure-learn, the big-data analytics obsession of product managers, and exploratory and context driven testing. Because of that, a lot of the stuff teams do now spans several quadrants. The more I try to map things that we do now, the more the picture looks like a crayon self-portrait that my three-year-old daughter drew on our living room wall.

The vertical axis of the Quadrants is still useful to me. Separation of business oriented tests and technology oriented tests is a great rule of thumb, as far as I’m concerned. But the horizontal axis is no longer relevant. Iterations are getting shorter, delivery is becoming more continuous, and a lot of the stuff is just merging across that line. For example, Specification by Example helps teams to completely merge functional tests and UAT into something that is continuously checked during development. Many teams I worked with recently run performance tests during development, primarily to avoid messing things up with frequent changes – more to support the team than anything else.

Dividing tests into those that support the team and those that evaluate the product is not really helping to facilitate useful discussions any more, so it’s time to break that model.

The context driven testing community argues very hard that looking for expected results isn’t really testing – instead they call that checking. Without getting into an argument about what is or isn’t testing, the division was quite useful to me for many recent discussions with clients. Perhaps that is a more useful second axis for the model: the difference between looking for expected outcomes and analysing aspects without a definite yes/no answer, where results require skilful analytic interpretation. Most of the innovation these days seems to happen in the second part anyway. Checking for expected results, both from a technical and business perspective, is now pretty much a solved problem.

Thinking about checking expected vs analysing outcomes that weren’t pre-defined helps to explain several important issues:

  • We can split security into penetration/investigations (not pre-defined) and a lot of functional tests around compliance such as encryption, data protection, authentication etc (essentially all checking for pre-defined expected results), debunking the stupid myth that security is “non-functional”.
  • We can split performance into load tests (where will it break?) and running business scenarios to prove agreed SLAs and capacity, continuous delivery style, debunking the stupid myth that performance is a technical concern.
  • We can have a nice box for ACC-matrix driven exploration of capabilities, as well as a meaningful discussion about having separate technical and business oriented exploratory tests.
  • We can have a nice box for build-measure-learn product tests, and have a meaningful discussion on how those tests require a defined hypothesis, and how that is different from just pushing stuff out and seeing what happens through usage analytics.
  • We can have a nice way of discussing production log trends as a way of continuously testing technical stuff that’s difficult to automate before deployment, but still useful to support the team. We can also have a nice way of differentiating those tests from business-oriented production usage analytics.
  • We could avoid silly discussions on whether usability testing is there to support the team or evaluate the product.
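
For readers who think better in code, here is a minimal Python sketch of the two axes as a data structure. It is only an illustration: the names (`Activity`, `group_by_quadrant`, `empty_quadrants`) are made up for this example, and where I placed each activity is an assumption for the sake of the demo, not a definitive mapping.

```python
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    """Vertical axis: who the test primarily speaks to."""
    BUSINESS = "business"
    TECHNOLOGY = "technology"


class Outcome(Enum):
    """Proposed second axis: is there a pre-defined expected result?"""
    CHECKING = "checking expected results"
    ANALYSING = "analysing undefined outcomes"


@dataclass(frozen=True)
class Activity:
    name: str
    orientation: Orientation
    outcome: Outcome


# Hypothetical placements, loosely following the bullets above.
ACTIVITIES = [
    Activity("SLA and capacity scenarios", Orientation.BUSINESS, Outcome.CHECKING),
    Activity("Compliance checks", Orientation.BUSINESS, Outcome.CHECKING),
    Activity("Usage analytics", Orientation.BUSINESS, Outcome.ANALYSING),
    Activity("Load tests (where will it break?)", Orientation.TECHNOLOGY, Outcome.ANALYSING),
    Activity("Penetration testing", Orientation.TECHNOLOGY, Outcome.ANALYSING),
    Activity("Production log trends", Orientation.TECHNOLOGY, Outcome.ANALYSING),
]


def group_by_quadrant(activities):
    """Return a dict mapping every (orientation, outcome) pair to activity names."""
    quadrants = {(o, r): [] for o in Orientation for r in Outcome}
    for activity in activities:
        quadrants[(activity.orientation, activity.outcome)].append(activity.name)
    return quadrants


def empty_quadrants(activities):
    """List the quadrants that have no activity mapped to them."""
    return [key for key, names in group_by_quadrant(activities).items() if not names]
```

Calling `empty_quadrants(ACTIVITIES)` on this sample list flags the technology/checking box as empty, which is the kind of gap a team would then want to discuss explicitly.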

Most importantly, by using that horizontal axis, we can raise awareness about a whole category of things that don’t fit into typical test plans or test reports, but are still incredibly valuable. The 200x quadrants were useful because they raised awareness about a whole category of things in the upper left corner that most teams weren’t really thinking of, but are now taken as common sense. The 201x quadrants can help us raise awareness about some more important issues for today.

That’s my current thinking about it. Perhaps the model can look similar to the picture below.


What do you think?

I'm Gojko Adzic, author of Impact Mapping and Specification by Example. My latest book is Fifty Quick Ideas to Improve Your User Stories.

28 thoughts on “Let’s break the Agile Testing Quadrants”

  1. Your reconfiguration of the quadrant reminds me of Elisabeth Hendrickson’s variation at CAST 2012, as mentioned by Markus here:
    including slideshare and video, so dig in – it seems you are on the same page ;-)

    So setting an axis of expected vs. unknown is a good question for challenging testing activities. I’m tricked though by the position of the specific .. erhm .. techniques … a lot of “load testing” could be to confirm and API testing to investigate unknowns.

    Perhaps consider spectrums of Break vs Confirm | Skip vs Deep dive | prepared vs explored | .. (http://jlottosen.wordpress.com/2013/04/12/3d-for-context/) – there are probably more than 3 dimensions ;-)

    I’ve used the original inspiration for mapping competencies, mentioned here:
    As you can see, the key vertical axis is kept :) In mapping competencies (not only test activities) – it can be a good discussion trick to explore the difference in HOW we critique the product compared to how we manage the team.

    Do break models – that’s what we do to learn more about the limits of our craft. /Jesper

  2. My initial thought is that calling it the agile testing quadrant is very restrictive. I would prefer it to adopt a general name such as “Testing Quadrant”. Bringing all types of testing into an agile framework would be welcome to cut down delivery time, but this isn’t the norm. It would be too expensive and resource hungry to bring in skilled people at sprint level who can carry out all types of testing. The axes on the right and left can have arrows pointing down and up to show a decrease in checking and in undefined/unexpected analysis respectively. The structure of testing is so complex that it is difficult to fit into a single simple model. I hope you will get feedback that helps improve it further and makes it universal enough to be adopted by teams across the spectrum.

  3. As processes evolve, so must the models. And, we must remember it is a model, not an absolute. I find that I adjust how I talk about the model depending on who my audience is. For example, you mentioned it was useful 10 years ago – this is true, and there are still many companies in that position so it is useful now. I do like your wording on the vertical axis, although it might be even more of a mouthful when trying to describe :-) As I mentioned in a tweet yesterday, I think that the left side (looking for expected outputs) is about preventing defects, while the right side is about finding them – hopefully as early as possible.

  4. Janet – yes it’s a mouthful, but if it’s going in the right direction we’ll find a nice name for it :) Finding defects doesn’t resonate well with me, because I explore as I develop to understand things better as well. If I do something differently during development surely that’s preventing defects, not discovering them.

  5. Ah, Gojko! I thought you were gonna scrap the rectangle completely and give us some clouds or a hexagon =) Seriously, if we are redefining the model, we need to have another shape because otherwise people will miss the change.

    Ok, so you took away the possible misunderstanding of left to right being time related, which is good. Having checking vs analysing on the two axis instead seems like a reasonable thing as well.

    So my question is what this helps you with? When I talk to customers about the agile testing quadrants, they help me visualise the test “coverage” in terms of what types of testing they are doing and more importantly not doing. Any side step in terms of an empty quadrant for the product or feature needs to be considered and decided upon explicitly. Do you think these changes in the model will help me more and better with this? and how?

  6. Sigge – they help me do pretty much that, but I can fit technical exploratory testing better, move usability where it belongs etc. The six bullet-points I listed at the end of the post are the potential discussion advantages for me.

  7. I love how people have been evolving and adapting the Quadrants over the years. You make some compelling arguments here, and I like your drawing of the model.

    I’m not ready to let go of the quadrants. The taxonomy helps me think through all the testing we’ll need to do on a particular feature or change.

    Unfortunately, a lot of people look at the quadrants but didn’t read our book carefully, and totally misunderstand them. (Not you, Gojko, but some folks). I wrote this rant a couple years ago in response to that, and I hope people will use the quadrants as they were intended – and continue to make their own changes to it.


  8. Great post, Gojko! In “Explore It!”, Elisabeth described two sides of testing – checking vs. exploring. Does that similarly represent the other dimension besides business vs. technology? Thanks! Yi

  9. Gojko, I am not sure we can put usage analytics and A/B in the same box as Exploratory and Usability.
    The former require interaction with the end customers and their feedback to provide value, hence I believe they should have a new separate space somewhere to the right. We might need a brand new shape all together here.

  10. Hi Gojko,

    I thought that it was only me that ran into problems explaining where to draw the line… thanks for sharing.

    Regarding the name on the vertical axis; how about just “Expected” and “Unknown” (or maybe “Expected outcome”, “Unknown outcome”) as short names, with the other things you mention implied.
    I mean “Technology” and “Business” is not telling the complete story either, right?

  11. I think your model and Elisabeth’s are essentially saying the same thing, and I like what they’re saying.

    I think the thing I most prefer over the original quadrants is that unit and other lower-level tests are not classified as “supporting the team” (i.e. developers write them for the sake of developers), rather “checking for expected results” (i.e. developers write them to verify aspects of the product). I personally feel that all tests (including unit tests) should be checking or analysing something meaningful about the product; I’ve seen too many cases where lower-level tests seem to have been written just for the sake of coverage or to satisfy team process. (TDD helps with this, but not everyone uses it.)

    So for me this new model seems to fit a bit better with my own view of testing.

    In fact, I think it’s *also* worth re-evaluating the technology vs business distinction – since the reason almost all software projects exist is to meet business needs (except perhaps things like research projects at universities, where the motivation might purely be exploring technology). But in general I don’t really agree with the idea that there are tests which are needed only for the sake of technology and not to help solve a business problem.

    I think to solve this I would simply rename “business” to “user” (and maybe draw some sort of “business” cloud around the whole thing!)

  12. What are “production log trends”? What type of tests use them? Can you give an example of a Production Trend test?

  13. Thanks for the clarification.
    With ‘Stakeholder Impact’ you refer to ‘Impact Mapping’ ?
    I first thought ‘A/B’ testing was just an abbreviation for ‘Alpha/Beta’ but then realized it is a completely different test method.

  14. Wait a minute! I just reread this and noticed you placed TDD in the check column.

    I would like to put it in the other column. If you’re using tests (checks) to drive your design, then yes, there is the word check in there but it isn’t so much about checking the expected results as it is exploration of the design and the unknown.

    Writing unit tests belongs there though because then the behavior is known and your goal is to check it.

    Or am I reading too much into TDD? :)

  15. I was thinking more about this as “being able to define an expected outcome”. When doing TDD, I define an expected outcome, and I check for that. The thing on the right is much more about not being driven by a particular expected outcome, looking for the unexpected and investigating tangential stuff.

  16. Hi Gojko,
    I love the quadrants as you describe them.

    As I was previously the solo QA on a project I’ve found that initially investing in the automation stack gave me the capacity to extend the scope of my testing beyond what I might have been able to do historically. The non functional areas are now some of my biggest areas of focus. I use a radar chart to map out where I’m at in a given area and where I aspire to be. Which helps in those times where you feel bogged in feature work but know there are some real gains to be made elsewhere too.

    I try to “automate” the views over these lower right quadrant elements so as to discover any possible insights to further improve quality. For instance, using Splunk logs to create charts of our most used endpoints and most common filter parameters.

    Thanks for taking the time to write this up! Really nice read!

  17. A few years back I had an email exchange with Tom Poppendieck, where he stated that they thought about the left side of the Quadrants as “Test to Specification”, and the right side as “Test to Failure”. This is in line with the concept described in Allen C. Ward’s book “Lean Product and Process Development”, and I’ve kept this in my presentations of the Quadrants since. This also matches what you’ve put on the left and right sides of your new version of the quadrants. I like this improvement on the quadrants, but I guess you are intentionally overstating in the title about the breaking of the quadrants. This model is still relevant and useful, but can certainly be enhanced as you point out.

  18. One expansion of the original quadrant was by Gerard Meszaros into a six-box grid. It added a middle row that included component tests and exploratory tests.
    Using a variation of your idea of changing the axis, one might expand the six-box to a nine-box grid.

               | Functional       | Pre-determined                | Ad-hoc
    Outward    | Acceptance Tests | Use / appearance, Performance | Usability
    In-Between | Module           | Pre-scripted exploratory      | Exploratory
    Inward     | Unit Tests       | Known security                | Penetration

    The Outward / In-Between / Inward and Functional / Pre-determined / Ad-hoc are possible scales. The “pre-determined” category includes the non-functional tests whose outcome is pretty well-fixed. It could also include scripted exploratory tests. For example, for a functional test like “add item to order”, the scripted exploratory test might be “add item, delete it, add three more, delete two, …”

    Just a thought.

  19. Gojko, I agree with your reasoning. But one statement made me wonder. You write “Checking for expected results, both from a technical and business perspective, is now pretty much a solved problem.”

    Does that mean you think most teams are successful at checking, in the sense of letting very few defects slip through automated testing, er, checking?
    And are efficient at the same time, not spending too much effort on inventing and implementing tests?
    And this without getting bogged down by masses of tests that are hard to maintain upon changes and/or that run too long?

    For what fraction of teams do you think this is true?
    And what fraction of individual developers have all those skills?

  20. Hi,

    I meant that for anyone willing to research the topic, there is plenty of information out there on how to do it right. People doing it wrong or still suffering are doing so because of their ignorance.

    I don’t have any industry statistics about percentages.
