How to implement UI testing without shooting yourself in the foot

I’m currently interviewing lots of teams that have implemented acceptance testing for my new book. A majority of those interviewed so far have at some point shot themselves in the foot with UI test automation. After speaking to several people at the Agile Acceptance Testing Days in Belgium a few weeks ago who were about to do exactly that, I’d like to present what I consider a very good practice for doing UI test automation efficiently.

I’ve written against UI test automation several times, so I won’t repeat myself. However, many of the teams I interviewed seem to prefer UI-level automation, or think that testing at that level is necessary to prove the required business functionality. Almost all of them realised, six to nine months after starting the effort, that the cost of maintaining UI-level tests was higher than the benefit they brought. Many threw the tests away at that point and effectively lost all the effort they had put into them. If you have to do UI test automation (which I’d challenge in the first place), here is how to go about it so that the cost of maintenance doesn’t kill you later.

Three levels of UI test automation

A very good idea when designing UI level functional tests is to think about describing the test and the automation at these three levels:

  • Business rule/functionality level: what is this test demonstrating or exercising? For example: Free delivery is offered to customers who order two or more books.
  • User interface workflow level: what does a user have to do to exercise the functionality through the UI, at a higher activity level? For example: put two books in a shopping cart, enter address details, verify that the delivery options include free delivery.
  • Technical activity level: what are the technical steps required to exercise the functionality? For example: open the shop homepage, log in with “testuser” and “testpassword”, go to the “/book” page, click on the first image with the “book” CSS class, wait for the page to load, click on the “Buy now” link… and so on.
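
The three levels form a simple layering that can be sketched in code. The following is purely illustrative: the shop, the element names and the steps are invented, and the “driver” is an in-memory stub standing in for a real browser driver such as Selenium/WebDriver.

```python
class StubDriver:
    """Pretends to be a browser driver; records clicks and tracks a cart."""
    def __init__(self):
        self.cart = []
    def open(self, url): pass
    def type_into(self, field, text): pass
    def click(self, element):
        if element.startswith("add-to-cart"):
            self.cart.append(element)
    def read(self, element):
        # Hypothetical business rule: free delivery for two or more books
        if element == "delivery-options":
            return ["free delivery", "courier"] if len(self.cart) >= 2 else ["courier"]

# --- Level 3: technical activity (knows about pages and elements) ---
def log_in(driver, user, password):
    driver.open("/login")
    driver.type_into("username", user)
    driver.type_into("password", password)
    driver.click("login-button")

def add_book_to_cart(driver, title):
    driver.open("/book")
    driver.click("add-to-cart:" + title)

# --- Level 2: user interface workflow (combines technical steps) ---
def order_books(driver, titles):
    log_in(driver, "testuser", "testpassword")
    for title in titles:
        add_book_to_cart(driver, title)

def offered_delivery_options(driver):
    driver.open("/checkout")
    return driver.read("delivery-options")

# --- Level 1: business rule (reads like the first bullet above) ---
def free_delivery_offered_for(book_count):
    driver = StubDriver()
    order_books(driver, ["book %d" % i for i in range(book_count)])
    return "free delivery" in offered_delivery_options(driver)
```

The business-rule function reads close to the sentence in the first bullet, while all the clicking is confined to the bottom layer, so a layout change only ever touches the level-3 functions.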


At the point where they figured out that UI testing was not paying off, most teams I interviewed were describing tests at the technical level only (an extreme case of this is recorded test scripts, where not even the third level is human-readable). Such tests are very brittle, and many of them tend to break with even the smallest change in the UI. The third level is quite verbose as well, so it is often hard to understand what is broken when a test fails. Some teams were describing tests at the workflow level, which was a bit more stable. These tests weren’t bound to a particular layout, but they were bound to the user interface implementation. When the page workflow changes, or when the underlying technology changes, such tests break.

Before anyone starts writing an angry comment about the technical level being the only thing that works, I want to say: Yes, we do need the third level. It is where the automation really happens and where the test exercises our web site. But there are serious benefits to not having only the third level.

The stability in acceptance tests comes from the fact that business rules don’t change as much as technical implementations. Technology moves much faster than business. The closer your acceptance tests are to the business rules, the more stable they are. Note that this doesn’t necessarily mean that these tests won’t be executed through the user interface – just that they are defined in a way that is not bound to a particular user interface.

The idea of thinking about these different levels is good because it allows us to write UI-level tests that are easy to understand, efficient to write and relatively inexpensive to maintain. This is because there is a natural hierarchy of concepts on these three levels. Checking that delivery is available for two books involves putting a book in a shopping cart. Putting a book in a shopping cart involves a sequence of technical steps. Entering address details does as well. Breaking things down like that and combining lower level concepts into higher level concepts reduces the cognitive load and promotes reuse.

Easy to understand

From the bottom up, the clarity of the tests increases. At the technical activity level, tests are very technical and full of clutter – it’s hard to see the forest for the trees. At the user interface workflow level, tests describe how something is done, which is easier to understand but still contains too much detail to efficiently describe several possibilities. At the business rule level, the intention of the test is described in a relatively terse form. We can use that level to effectively communicate all the different possibilities through important example cases. It is much more efficient to give another example as “Free delivery is not offered to customers who order only one book” than to talk about logging in, putting only a single book in a cart, checking out and so on. I’m not even going to mention how much cognitive overload a description of that same thing would require if we were to talk about clicking checkboxes and links.

Efficient to write

From the bottom up, the tests become less technical. At the technical activity level, you need people who understand the design of the system, HTTP calls, the DOM and so on to write the tests. To write tests at the user interface workflow level, you only need to understand the web site workflow. At the business rule level, you need to understand what the business rule is. Given a set of third-level components (e.g. login, adding a book), testers who are not automation specialists and business users can happily write the definitions of second-level steps. This allows them to engage more efficiently during development and reduces the automation load on developers.

More importantly, the business rule and the workflow level can be written before the UI is actually there. Tests at these levels can be written before the development starts, and be used as guidelines for development and as acceptance criteria to verify the output.

Relatively inexpensive to maintain

The business rule level isn’t tied to any particular web site design or activity flow, so it remains stable and unchanged during most web user interface changes, be it layout or workflow improvements. The user interface workflow level is tied to the activity workflow, so when the flow for a particular action changes we need to rewrite only that action. The technical level is tied to the layout of the pages, so when the layout changes we need to rewrite or re-record only the implementation of particular second-level steps affected by that (without changing the description of the test at the business or the workflow level).

To continue with the free delivery example from above: if the login form suddenly changed to use an image instead of a button, we only need to rewrite the “login” action at the technical level. In my experience, the technical level is where changes happen most frequently – layout, not the activity workflow. So by breaking the implementation into this hierarchy, we create several layers of insulation and limit the propagation of changes. This reduces the cost of maintenance significantly.
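
As a hypothetical sketch of that login change (element names invented, a plain dictionary standing in for a real browser driver): only the body of the bottom-level step changes, while the workflow that calls it stays intact.

```python
# Level 3: two interchangeable technical implementations of "log in".
def log_in_via_button(driver, user, password):
    driver["username"] = user
    driver["password"] = password
    driver["clicked"] = "login-button"   # old layout: a submit button

def log_in_via_image(driver, user, password):
    driver["username"] = user
    driver["password"] = password
    driver["clicked"] = "login-image"    # new layout: a clickable image

# The workflow level depends only on the *concept* of logging in,
# so adapting to the new layout is a one-line change here.
log_in = log_in_via_image

# Level 2: the workflow never mentions buttons or images.
def order_two_books(driver):
    log_in(driver, "testuser", "testpassword")
    driver["cart"] = ["book one", "book two"]

driver = {}
order_two_books(driver)
```

The business-level description (“free delivery is offered for two books”) and the workflow above do not change at all; only the binding of `log_in` does.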

Implementing this in practice

There are many good ways to implement this idea in practice. Most test automation tools provide one or two levels of indirection that can be used for this. In fact, this is why I think Cucumber found such a sweet spot for browser-based user interface testing. With Cucumber, step definitions implemented in a programming language naturally sit with developers, and this is where the technical activity level can be described. These step definitions can then be reused to create scenarios (the user interface workflow level), and scenario outlines can be used to efficiently describe tests at the business rule level.
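
To illustrate (the wording here is invented, not a real project's feature file), a Cucumber scenario outline at the business rule level might look like this, with each Given/When/Then line mapping to a reusable workflow step, which in turn is automated by a technical step definition in code:

```gherkin
Feature: Free delivery

  Scenario Outline: Free delivery depends on the number of books ordered
    Given a customer orders <books> books
    When the customer proceeds to checkout
    Then free delivery is <offered>

    Examples:
      | books | offered     |
      | 2     | offered     |
      | 1     | not offered |
```

The Examples table is the terse business-rule view; all the logging in and clicking lives in the step definitions and never appears in the feature file.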

The new SLIM test runner for FitNesse provides similar levels of isolation. The bottom fixture layer naturally maps to the technical activity level. Scenario definitions can be used to describe the workflow level, and scenario tables then present a nice, concise view at the business rule level.

Robot Framework uses “keywords” to describe tests, and allows us to define keywords either directly in code (which becomes the technical level) or by combining existing keywords (which becomes the workflow and business rule level).

The Page Object idea from Selenium and WebDriver is a good start, but it stops short of finishing the job. It encourages us to encapsulate the technical activity level into higher-level “page” functionality, which can then be used to describe business workflows. It lacks the consolidation of workflows into the top business rule level, so make sure to create that level yourself in the code. (Antony Marcano also raised a valid point at CITCON Europe 09 that users think about business activities, not page functionality, so page objects might not be the best way to go anyway.)
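
A minimal sketch of that extra layer, with invented page and element names and a plain dictionary standing in for a real WebDriver instance: the page objects wrap the technical level, and a business-activity class on top consolidates them into workflows.

```python
class ShoppingCartPage:
    """Page object: encapsulates the technical activity level for one page."""
    def __init__(self, driver):
        self.driver = driver
        self.driver.setdefault("cart", [])
    def add_book(self, title):
        # In a real test this would locate and click an element via WebDriver
        self.driver["cart"].append(title)

class CheckoutPage:
    """Page object for the checkout page."""
    def __init__(self, driver):
        self.driver = driver
    def delivery_options(self):
        # Hypothetical rule: free delivery for two or more books
        if len(self.driver["cart"]) >= 2:
            return ["free delivery", "courier"]
        return ["courier"]

class Shopper:
    """Business-activity layer: consolidates page objects into workflows,
    so tests talk about what a shopper does, not which page does it."""
    def __init__(self, driver):
        self.driver = driver
    def order_books(self, titles):
        cart = ShoppingCartPage(self.driver)
        for title in titles:
            cart.add_book(title)
    def is_offered_free_delivery(self):
        return "free delivery" in CheckoutPage(self.driver).delivery_options()
```

A business-rule test then only ever touches `Shopper`, matching Marcano’s point that users think in activities rather than pages.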

TextTest works with xUseCase recorders, an interesting twist on this concept that allows you to record the technical level of step definitions without having to program them manually. This might be interesting for thick-client UIs, where automation scripts are not as developed as in the web browser space.

With Twist, you can record the technical level and it will create fixture definitions for you. Instead of using that directly in the test, you can use “abstract concepts” to combine steps into workflow activities and then use that for business level testing. Or you can add fixture methods to produce workflow activities in code.

Beware of programming in text

Looking at UI tests at these three levels is, I think, generally a good practice. Who takes responsibility for automation at the user interface workflow level is something that each team needs to decide depending on its circumstances.

Implementing the workflow level in plain text test scripts (Robot Framework higher level keywords, Twist abstract concepts, SLIM scenario tables) allows business people and testers who aren’t automation specialists to write and maintain them. For some teams, this is a nice benefit because developers can then focus on other things and testers can engage earlier. That does mean, however, that there is no automated refactoring, syntax checking or anything like that at the user interface automation level.

Implementing the workflow level in code enables better integration and reuse, also giving you the possibility of implementing things below the UI when that is easier, without disrupting the higher level descriptions. It does, however, require people with programming knowledge to automate that level.

An interesting approach one team I interviewed took was to train testers to write enough code to implement the user activity level in code as well. This doesn’t require advanced programming knowledge, and developers are there anyway to help if someone gets stuck.

Things to remember

To avoid shooting yourself in the foot with UI tests, remember these things:

  • Think about UI test automation at three levels: business rules, user interface workflow and technical activity
  • Even if the user interface workflow automation gets implemented in plain text, make sure to put one level of abstraction above it and describe business rules directly. Don’t describe rules as workflows (unless they genuinely deal with workflow decisions – and even then it’s often good to describe individual decisions as state machines).
  • Even if the user interface workflow automation gets implemented in code, make sure to separate technical activities required to fulfil a step into a separate layer. Reuse these step definitions to get stability and easy maintenance later.
  • Beware of programming in plain text.

I'm Gojko Adzic, author of Impact Mapping and Specification by Example. My latest book is Fifty Quick Ideas to Improve Your User Stories.

16 thoughts on “How to implement UI testing without shooting yourself in the foot”

  1. Great post! I fully agree that UI level testing is sometimes needed and your guidelines help avoiding the biggest traps.

    You wrote that when the workflow level is implemented using plain text “there is no automated refactoring, syntax checking or anything like that at the user interface automation level”. With Robot Framework this situation is getting better as RIDE [1] matures. It already has a very useful keyword completion feature, and I expect it to get its first refactoring features (rename keyword, extract keyword) and syntax checking later this year.

    [1] http://code.google.com/p/robotframework-ride

  2. I thought I’d mention that the next generation of your WebTest framework, Selenesse, is going strong in at least one company. And if I’m not mistaken, another company has ported it to .NET and will be making that public soon.

  3. Could this also be applied to tests with a different scope than the UI? E.g. when testing the TCP protocol, you could have roughly:

    Business rule: create a TCP connection.
    Workflow: send a TCP packet with the SYN flag ON
    receive a TCP packet with the SYN and ACK flags ON
    send a TCP packet with the ACK flag ON.
    Technical level: these messages are created using a protocol tester.

  4. @Ismo

    I don’t see why not. Other teams I interviewed had similar maintenance problems with technical tests involving databases – that could have been solved similarly.

    Although I don’t completely understand your example – is establishing a TCP connection something that really delivers value, or is it just part of a larger business rule (e.g. under these conditions a TCP connection happens; under these other conditions it doesn’t)?

  5. Pingback: Adam Goucher » Blog Archive » A Smattering of Selenium #15

  6. In case you are writing the TCP/IP protocol stack itself (not an application on top of it), establishing a TCP connection might be a business rule?

    Do I make any sense? :)

  7. Hi Gojko, really useful one indeed. And the picture really catches the mood :-)

    In general, I realized that for some strange unknown reason developers tend not to use abstraction when writing tests. Can’t really say why though…

    Thanks

  8. Hi Gojko,

    I just came across this post. Extremely insightful information!! We’ve implemented something along the lines you’ve described, leveraging the Page Object pattern, Selenium and FitNesse. So far it has worked out rather well for us. We’ve tried to strictly enforce clear and concise FitNesse pages by recently incorporating the Given/When/Then syntax along with an abbreviated FitNesse test table – a kind of hybrid test page. This approach seems to get the message across quite well. We are currently investigating the use of either the GivWenZen library for FitNesse or possibly moving towards Cucumber, where the Given/When/Then syntax is supported out of the box. Here is an example of one of our FitNesse pages:

    Given an External Manager has accessed the Performance Reports page
    When the External Manager selects to run a Daily Summary report
    Then the report table is successfully returned
    And the report’s criteria message is displayed

    !|fixtures.reports.DailySummaryReport |
    |role |is the report table returned?|is the report criteria present?|
    |External Manager|true |true |

  9. Sorry, but I don’t see the advantage. Even with the three levels of tests, if you change the CSS you will still need to change too many tests at the technical level. So what’s the gain? Can you explain it?

  10. no, you don’t have to change any of your tests if you change the CSS. CSS changes only affect related technical components (bottom level). none of the business test specifications (level 1) need to change.

  11. I understand that level 1 tests don’t change for a CSS change.

    But I’d like to understand the gain from a level 3 viewpoint.

    If I, as a level 3 tester, have 30 Selenium tests that depend on the CSS, and the CSS changes, that’s still 30 changes, isn’t it?

  12. there are no separate level 1/level 3 tests or testers. each test spans all three levels, with the specification in business language at level 1 and the automation spanning levels 2 and 3, sharing and reusing workflows and technical components with other tests.

  13. Pingback: Eine Tour durch die Testpyramide – Namics Weblog

  14. Pingback: User Story Acceptance « Tales from a Trading Desk

  15. Great article. I fully agree with all of your observations and recommendations. I’d like to add only one thought:
    Abstraction should decrease from the technical layer towards the business rules. I know my statement might be confusing at first. There is a natural tendency to detach the technical layer from the concrete properties of the application under test, e.g. navigating an explorer-like tree or finding/modifying a cell in a table. So far so good.
    At the business layer, on the other hand, test cases should be domain-specific and use concrete example values where possible.
    Trying to keep the business rule tests abstracted from the specific domain results in test cases that are hard to understand, because the test data is hidden in variables, data files or database setup.
    When software developers alone are responsible for implementing all three layers, the probability of them trying to keep everything generic is high.

  16. Pingback: Automation applied to an efficient operation will magnify the efficiency* | Trivento Improve
