Anatomy of a good acceptance test

The long-term benefits of agile acceptance testing come from live documentation: a description of the system functionality which is reliable, easily accessible and much easier to read and understand than the code. To be effective as a live specification, acceptance tests have to be written so that others can pick them up months or even years later and easily understand what they do, why they are there and what they describe. Here are some simple heuristics that will help you measure and improve your tests so that they work better as a live specification.

The five most important things to think about are:

  • It needs to be self-explanatory
  • It needs to be focused
  • It needs to be a specification, not a script
  • It needs to be in domain language
  • It needs to be about business functionality, not about software design

Here is a really good example:
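In outline it looks like this: a descriptive title, a short paragraph stating the rule, and a table of key examples covering the edge cases (the rule wording and the numbers below are illustrative):

Free delivery

Free delivery is offered to VIP customers, but only when the order consists of nothing but books and contains at least five of them. Regular customers and mixed orders do not qualify.

|free delivery availability|
|customer type|books in order|other items in order|free delivery offered?|
|VIP|5|0|yes|
|VIP|4|0|no|
|Regular|10|0|no|
|VIP|5|1|no|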

It has all the elements listed above. Whenever I've shown it to people in workshops, I haven't had to use a single word to explain it. The title and the introductory paragraph explain the structure of the test data well enough that readers don't need to work back from the data to understand the rule. The examples are there to make the rule actually testable and to explain the behaviour in edge cases. It is focused on one very particular rule, free delivery availability; it does not explain how the books are purchased, only which delivery mechanisms are available, and it does not talk about any implementation specifics.

Here is a really bad example (taken from http://fitnesse.org/FitNesse.UserGuide.PayrollTests.AddAndPayTest)
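Roughly (the fixture and column names here are approximations rather than the exact tables), it goes like this:

Simple Payroll Test

First we add a couple of employees:

|add employee|
|id|name|address|salary|
|1|Jeff Languid|10 Adamant St; Laurel, MD 20707|1005.00|
|2|Kelp Holland|12B Baker St; Cottonmouth, IL 60066|2000.00|

Next we pay them:

|pay day|
|date|10/15/2010|
|cheque number|1000|

Then we inspect the paycheques:

|paycheck inspector|employee id|1|
|number|name|date|amount|
|1000|Jeff Languid|||

|paycheck inspector|employee id|2|
|number|name|date|amount|
|1001|Kelp Holland|||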

This test has so many bad things in it that it is actually a good example of what happens when people don't take care to write good tests.

First of all, although it has a title and some text around the tables that seemingly explains what's going on, the effect of that is marginal. Why is this test “simple”, and what exactly in payroll is it checking? It fails the self-explanatory litmus test straight away.

Second, it’s not really clear what this test is checking. We need to work backwards. It seems to verify that the cheques are printed with unique numbers, starting from the next available number that’s configurable in the system. It also seems to validate the data printed on each cheque. And that there is one cheque printed per employee (I’ll come back to this later).

There is a lot of seemingly incidental complexity there: names and addresses aren't really used anywhere in the test apart from setting up the employees. There are database IDs which are completely irrelevant to the business rules, but they are used to match up the employees with the paycheck inspector tables.

The paycheck inspector is obviously a thing invented just for testing. No company is going to have Peter Sellers in a Clouseau outfit inspecting cheques as they go out. If you have enough employees that you have to print cheques, you don't want to inspect them manually; that's what this test is for, anyway.

There is also a very interesting issue with those blank cells in the assertion part of the test, and with the two paycheck inspector tables, which seem unrelated. Blank cells in FitNesse don't check anything; they just print the actual values, for debugging and troubleshooting. So this is an automated test that a human has to look over, which pretty much defeats the purpose of automation. Blank cells are typically a sign of instability in tests (more on this a bit later), and they are often a signal that you're missing something: either you are testing in the wrong place, or you are missing a rule that would make the system's behaviour repeatable and testable.

The language is inconsistent, which at first makes it hard to connect inputs and outputs. What is the 1001 value in the table below? The column header tells us that it is a “number”; well, thanks for that, I thought it was a sausage. There is a “cheque number” above, but what kind of cheque number is that? What is the relationship between these two things?

So this test is very, very bad.

Presuming that the addresses are there because cheques are printed as part of a statement, with an address ready for automated envelope packaging, this test fails to check at least one very important thing (and I'll come back to this later as well): that the right people got paid the right amounts. If the first person got both cheques, this test would happily pass. If they both got each other's salaries, this test would pass. If a date far in the future were printed on the cheques, our employees might not be able to cash them, but the test would still pass. The reason those cells are blank is that there is another hidden rule: the order in which the cheques come out. Nothing specifies that, so the technical workaround for a functional gap was to create a test that gives us loads of false positives.

Is this test checking one thing or many things? Without the context information it’s hard to tell. If the cheque printing system is used for anything else, I’d pull out the fact that cheque numbers are unique and start from a configured number into a separate page. If we only ever print salary cheques, it’s probably part of a single thing (salary cheque printing).

Now, let’s clean it up. Let’s try to work back and drop all the incidental stuff. Let’s use a nice descriptive title, such as “Payroll Cheque Printing”. And let’s add a paragraph that explains the structure of the test.

A cheque has a payee name, an amount and a payment date. A cheque does not have a “name” and a “salary”. If the cheque is printed on a statement, it also has an address that will be used for automatic envelope packaging. A combination of a name and an address should be enough for us to match an employee with their cheque; we don't really need the database IDs. We can make the system more testable by agreeing on an ordering rule, whatever it is. For example, alphabetical by name.

Let’s also pull the context to the start of the test. Our context is the payroll date and the next available cheque number, along with employee salary data. Let’s make it explicit what the number is for, so that the readers don’t have to figure this out for themselves in the future. We can also make this block stick out visually to show that it is about the context.

The action that gets kicked off doesn’t necessarily need to be listed in the test. A payroll run can be executed implicitly by the table that checks payroll results. This is an example of focusing on what’s being tested instead of how it is being checked. There is no need to have a separate step that says “Next we pay them”.

Let's also rename that paycheck inspector to something that makes more sense. Because we want whoever automates this to ensure that we check all the cheques that are printed, let's put that in the header. Otherwise, someone might use subset matching, the system might print every cheque twice, and we would never notice.

And here is the cleaned version.
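In outline, it could look something like this (the table and column names are illustrative; the data is the same as in the original test):

Payroll Cheque Printing

One cheque is printed per employee for each payroll run. Cheques are numbered sequentially, starting from the next available cheque number. Each cheque shows the payee name, the amount and the payment date, and it is printed on a statement with the employee's address for automated envelope packaging. Cheques come out ordered alphabetically by payee name.

|payroll run|
|payment date|10/15/2010|
|next available cheque number|1000|

|employees|
|name|address|salary|
|Jeff Languid|10 Adamant St; Laurel, MD 20707|1005.00|
|Kelp Holland|12B Baker St; Cottonmouth, IL 60066|2000.00|

|all printed cheques|
|cheque number|payment date|payee name|address|amount|
|1000|10/15/2010|Jeff Languid|10 Adamant St; Laurel, MD 20707|1005.00|
|1001|10/15/2010|Kelp Holland|12B Baker St; Cottonmouth, IL 60066|2000.00|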

Much easier to understand, without all that incidental stuff. And now comes the punchline. When we look at the test like this, without database IDs and without all the unnecessary clutter, we can have a good shot at answering the question “Are we missing something?”. What are the edge cases that might break this? We don't really need to ensure the validity of the employee data; hopefully that's done in another part of the system. But is there any kind of valid employee data that would be an edge case for this test? Can we play with the numbers to make this illogical in some way?

An obvious answer to that is: what happens if an employee has a salary of 0? Do we still print the cheque? The rule as we described it says “one cheque per employee”, so any employees who were fired years ago and no longer receive salaries would still get cheques printed, with zeroes on them. We could then have a discussion with the business about making this rule stronger and ensuring that cheques don't go out when they don't need to.

FitNesse gets a lot of bad press because of this kind of broken test; Concordion was built as an answer to it. Some people at my recent workshop, looking at this example, pointed out that the Given-When-Then structure of Cucumber would be better at preventing some of these problems. I don't think so. It's not about the tools: people can produce this kind of bad test design with any tool, and likewise they could write nice, clean tests with FitNesse. I guess the fact that the basic examples shipped with FitNesse are so bad doesn't help, but the problem is not the tool. It's about the process and the effort put into making the tests easy to understand. The funny thing is that it doesn't take much more effort to make a test nice and clean, but it brings a lot more value that way.

If you found this article helpful, you'll love my three-day workshop on effective specification by example and agile acceptance testing.

11 thoughts on “Anatomy of a good acceptance test”

  1. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #622

  2. Excellent article, Gojko. Your examples of “good” acceptance tests were a great reminder of how much more effective they can be than “bad” ones, and how, like several agile development techniques, using “bad” examples can actually be counterproductive to getting folks on board with using a new practice or understanding the value a practice provides.

    I hope the folks at FitNesse might consider “cleaning and refactoring” (sorry, Uncle Bob) the example given as FitNesse, for many, is the “face of automated specifications”; it should have great examples :)

    Anyhow, a great reminder as to how powerful great acceptance test specifications can be at challenging assumptions.

  3. Pingback: Fitnesse Smell – Executable Requirements that look like Scripts : David Laing’s blog

  4. Hi Gojko,

    Great article. We have used FitNesse for quite a few projects, and we deal daily with problems like the ones you mentioned. Great to see an explanation and an example of how to address those issues.

  5. Nice article. The amusing thing to me is that I recognize some of the data from fixtures in that test, stuff used for a hand-grown acceptance test framework that OM originally constructed pre-Fitnesse. So the example really dates back almost ten years. (I didn’t translate it into fitnesse though, so I take only half-blame.)

    You're absolutely right: examples suggest how people should code things; that's why there is no end of bad Java code (due to the horrid examples used in Java certification tests) and no end of bad FitNesse tests. I helped review Cohn's book Succeeding With Agile, and noted similar problems with an example he used. I ended up writing a companion article that digs into other things to look for in constructing tests that are quickly understandable: http://www.informit.com/articles/article.aspx?p=1393274

    BTW, as lame as this test may be, I usually see far worse.

  6. Pingback: Anonymous

  7. Great post Gojko! I really like how you described the pitfalls of the original test and demonstrated a better way.

    I find the given/when/then language adds clarity by making it obvious what is an input and what is an expectation. This might be the current fad, but here is a semantic alternative.

    |Given|Employees|
    |name|address|salary|
    |Jeff Languid|10 Adamant St; Laurel, MD 20707|1005.00|
    |Kelp Holland|12B Baker St; Cottonmouth, IL 60066|2000.00|

    |When|Payroll Runs|
    |on|10/15/2010|
    |with next check number|1000|

    |Then|Printed Cheques|
    |cheque number|cheque date|payee|address|amount|
    |1000|10/15/2010|Jeff Languid|10 Adamant St; Laurel, MD 20707|1005.00|
    |1001|10/15/2010|Kelp Holland|12B Baker St; Cottonmouth, IL 60066|2000.00|

  8. Pingback: ParkCalc automation – Refactoring a data-driven test : Clevertester

  9. Pingback: blog.codecentric.de

  10. Pingback: Iterative Development, Manual Testing and Frustrations « Managing Software Development

  11. Pingback: Contract and Integration Service Testing in a Componentized Enterprise Model « Manage to Test
