An introduction to Behaviour-Driven Testing with Cucumber / SpecFlow

BDD (Behaviour-Driven Development) evolved out of TDD (Test-Driven Development), and is an approach to testing and to development in general.  I have found it has benefits even if you just use it as a way of doing testing, rather than having tests drive what you develop.  There are a few different frameworks that allow you to do BDD – for instance Jasmine and Cucumber.  SpecFlow is an open source port of Cucumber to the C# world.

The benefits I’ve found to testing with SpecFlow are:

  • A test is broken up into more than one level of abstraction – at least 2, often 3 or 4;
  • Tests can be written concisely (while still being understandable and effective);
  • Tests are robust against changes in low-level details like API calls.

Different Levels of Abstraction

Documentation as part of the test

In the past, I have read and written test documents that can help you navigate the tests, understand what the various bits do and so on.  However, they can easily not exist, have gaps in them, or be out of date.  With Cucumber, the highest level of abstraction of your test is in a natural language like English, albeit in a slightly structured form such as:

Scenario: the client successfully places an order for something that’s in stock
Given there are 17 widgets in stock
When the client places an order for 4 widgets
Then an order is created for 4 widgets
And the stock of widgets is 13

This really is part of your test – it will be in source control, and typos might mean your test doesn’t work correctly.  This is actually a language called Gherkin, but it reads like English (or 60 or so other natural languages).  This highest level is suitable to share with pretty much everyone – customers, management, documentation, support, development, ops etc.  The idea is that there can be a common understanding based on a common language.

However, even if the tests stay within the world of development, this high level lets programmers understand what a test’s trying to achieve, what the tests cover (and what they don’t), and so on.  You can divide and conquer the problem of testing, by worrying about only a part of it at a time – the high-level stuff, the choreography, or the low-level stuff – rather than having to wade through a mass of detail and try to mentally reverse-engineer the higher-level parts.

If you use SpecFlow inside Visual Studio, the Gherkin will be coloured automatically to highlight its key parts and to indicate how (if at all) the lower levels of the test will interpret each line.

The first line – Scenario – is just a text label for the rest of the test.  The guts of the test are the remaining lines, and they follow the common testing pattern:

  1. Arrange (Given) = Get the system under test into the correct starting state ready for the test to begin;
  2. Act (When) = Do some action that you expect to move the system under test from its starting state to some other state;
  3. Assert (Then) = Test that the system under test is in the new state that you expect.

Each section can be more than one line, as in the example above.  The first line in each section starts with Given, When or Then, and any subsequent lines start with And.  In theory that could be confusing, but if you see a line that starts with And you just read up to the first line that doesn’t start with And to get the name of the section.  The sections are all optional, so if there is no starting state to worry about you can go straight to When.
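For example, an invented fragment with a two-line Given section reads like this – the And continues the Given, and the first line of each section names it:

```gherkin
Given there are 17 widgets in stock
And there are 3 flanges in stock
When the client places an order for 2 flanges
Then the stock of flanges is 1
```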

Actually doing stuff

This high level text is very useful for understanding but not enough to actually do stuff, like get the system under test into the correct starting state, do the key action, or test the results.

This is where the other levels of abstraction come in.  From here on down things are written in code, e.g. C#, and the join between the world of code and the world of Gherkin is made using regular expressions and C# attributes.

So, in another file you will have some C# like this:

[Given(@"^there are (\d+) (\w+) in stock$")]
public void CreateInitialStock(int stockLevel, string productType)
{
    AddProductToInventory(productType, stockLevel);
}

[When(@"^the client places an order for (\d+) (\w+)$")]
public void PlaceOrder(int orderAmount, string productType)
{
    SubmitOrder(orderAmount, productType);
}

[Then(@"^an order is created for (\d+) (\w+)$")]
public void IsOrderCreated(int orderAmount, string productType)
{
    Assert.IsTrue(OrderExists(productType, orderAmount));
}

[Then(@"^the stock of (\w+) is (\d+)$")]
public void TestStockLevel(string productType, int stockLevel)
{
    Assert.AreEqual(stockLevel, GetStockLevel(productType));
}

This is missing out some details, such as handling singular / plural nouns nicely.  These are normal C# methods, each with an attribute (Given, When or Then) above them.  The attributes each have a regular expression, and if the type of line matches the attribute’s name and the regular expression matches the rest of the line, then the method underneath the attribute is called.  The method’s parameters are passed values taken from the brackets in the regular expression that matched, in the order they matched, and each value is converted to the corresponding parameter’s type.
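As a rough sketch of the mechanism (not SpecFlow’s actual implementation), the matching and conversion amount to something like this:

```csharp
using System;
using System.Text.RegularExpressions;

class BindingSketch
{
    static void Main()
    {
        // A step line from the Gherkin, and the binding's regular expression.
        string stepText = "there are 17 widgets in stock";
        var pattern = new Regex(@"^there are (\d+) (\w+) in stock$");

        Match match = pattern.Match(stepText);
        if (match.Success)
        {
            // Groups 1 and 2 are the bracketed captures, in order;
            // each is converted to the bound method's parameter type.
            int stockLevel = int.Parse(match.Groups[1].Value);
            string productType = match.Groups[2].Value;
            Console.WriteLine($"CreateInitialStock({stockLevel}, \"{productType}\")");
        }
    }
}
```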

You would then need to write AddProductToInventory, SubmitOrder, OrderExists and GetStockLevel, but hopefully at least some of these exist already.  The ones that don’t will just be normal C# stuff.
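As an illustration only – a real suite would drive the actual system under test – the four helpers could be backed by a trivial in-memory store:

```csharp
using System.Collections.Generic;

// Illustrative in-memory stand-ins for the helpers named above; in a
// real test suite these would talk to the system under test instead.
static class InventoryHelpers
{
    static readonly Dictionary<string, int> Stock = new Dictionary<string, int>();
    static readonly List<(string Product, int Amount)> Orders = new List<(string, int)>();

    public static void AddProductToInventory(string productType, int stockLevel)
        => Stock[productType] = stockLevel;

    public static void SubmitOrder(int orderAmount, string productType)
    {
        Orders.Add((productType, orderAmount));
        Stock[productType] -= orderAmount;   // naive: no out-of-stock check
    }

    public static bool OrderExists(string productType, int orderAmount)
        => Orders.Contains((productType, orderAmount));

    public static int GetStockLevel(string productType)
        => Stock[productType];
}
```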

So, you have to have two abstraction levels for the test to work:

  1. Gherkin
  2. C# code to implement the Gherkin

However, the C# level can often be made more manageable if you split it up, giving levels like this:

  1. Gherkin
  2. C# code that controls the choreography of the tests’ implementation
  3. C# code that handles the low-level details of interacting with things outside the test

(or even more, if you split up level 3.)

Concise tests

The main thing that drives conciseness isn’t things being terse, rather it’s removing duplication by factoring out common things, and also taking advantage of any patterns or structures that already exist in the world of the system under test.

Factoring out common things

The magic here is the use of attributes to bind bits of Gherkin to bits of C#.  Because of this, your Gherkin files and your C# files can have a many-to-many relationship that is resolved automatically for you by SpecFlow.

It might make sense to split up your Gherkin tests into files by functional area, for instance:

  • Users;
  • Stock;
  • Orders;
  • Bills.

Each of these files will contain tests that do things like:

  • Make API calls to manipulate the system under test;
  • Check details of responses to the APIs;
  • Check for the presence (and possibly contents) of files in the file system.

You can organise your C# files along the lines of this last list, without having to worry about how these bits will be assembled to test a given functional area.  So you can have one file that just creates the HTTP request to make an API call, another file that just worries about parsing and deserialising JSON, another to look at files on the file system and so on.  These lumps of C# can be general-purpose, with the details being filled in via parameters whose values come from the different tests.  One test in one Gherkin file can pull in code from any number of different C# files, as long as the code is linked to the Gherkin via attributes and regular expressions.
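For instance, the file that only worries about JSON might contain nothing but small, general-purpose helpers like this (a sketch using System.Text.Json; the helper name is made up):

```csharp
using System.Text.Json;

// One narrowly-focused lump of C#: it knows how to pull a field out of
// a JSON response body, and nothing about which test is asking or why.
static class JsonHelpers
{
    public static string GetStringField(string json, string fieldName)
        => JsonDocument.Parse(json).RootElement.GetProperty(fieldName).GetString();
}
```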

The alternative might be lots of copy-paste-edit, which is unpleasant for lots of reasons.

This diagram shows a fictitious but realistic example of how tests and code can be brought together.  There are different bits of low-level code that each concentrate on one area, e.g. talking to a database or to an external API.  Each test will select those bits of low-level code it needs, e.g. logging a user into a website, checking the state of the database and calling an external API.

[Diagram: SpecFlow assembling bits of code into tests]

Taking advantage of patterns and structures

Things can get even more concise if you’re doing things like REST APIs because of the conventions behind REST.  For instance, to get a list of widgets the URL will be something like /widgets, and to get the widget with the id 12 the URL will be /widgets/12.  Big deal, you might think.  But to get a list of flanges will be /flanges and to get flange 20833 will be /flanges/20833 – there is a pattern across many (possibly all) API calls.

So instead of your Gherkin, attributes and code being

  • When the client requests a list of flanges + [When(@"^the client requests a list of flanges$")] -> MakeRequest("/flanges");
  • When the client requests a list of widgets + [When(@"^the client requests a list of widgets$")] -> MakeRequest("/widgets");
  • Etc.

It can be just

  • When the client requests a list of whatever + [When(@"^the client requests a list of (\w+)$")] -> MakeRequest($"/{listName}")
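The heart of that generic binding is just building the URL from the captured resource name – sketched here as plain C# (the method names are invented):

```csharp
using System;

class RestPatternSketch
{
    // One pattern covers every resource: /widgets, /flanges, /flanges/20833...
    public static string ListUrl(string listName) => $"/{listName}";
    public static string ItemUrl(string listName, int id) => $"/{listName}/{id}";

    static void Main()
    {
        Console.WriteLine(ListUrl("widgets"));        // /widgets
        Console.WriteLine(ItemUrl("flanges", 20833)); // /flanges/20833
    }
}
```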

You might not be doing REST API calls, but your problem domain could well have other structures and patterns that you can make use of, like conventions behind database table names etc.

Robust Tests

If you have tests, and then the details of the world outside the test change – such as API calls, XML schema, database table column names, etc. – then your tests won’t work until you update them to cope with the new reality.

If you have the details spread throughout your tests, with the minor differences between tests hard-coded in amongst the details that change only when e.g. API signatures change, this can be painful.  In fact, it can be so painful that you don’t bother, and your tests rot so that a shrinking fraction of them still work (and the rest throw up errors or even prevent the good bits from running).

With SpecFlow it can be much easier to fight this kind of entropy.  The factoring out common things mentioned above means that the part that has to change can be restricted to a narrow slice of the tests, e.g. just the part dealing with the database and not the parts dealing with files or HTTP.  If you prevent low-level details from leaking up into higher levels, then this narrow slice is also short and sitting at the bottom of the abstraction levels.

You could write your Gherkin to say

When the client makes a GET request to /orders/12345/lines/4

However, this is bad for at least two reasons:

  1. It hides the intent of the test – is it this way because you’re referring back to the last line of the order you just submitted, or are you referring to a non-existent line of a non-existent order, or something else?
  2. It puts lots of unnecessary low-level detail up at a higher level than is needed or helpful.  Someone who doesn’t need to know this detail will have a harder time understanding the Gherkin because of it.  And, when the details of the API call change, you have lots of places to change in your tests.

So instead you should put something more like this in your Gherkin

Given the system contains an order for 12 widgets, 8 flanges, 51 hippopotami, and 18 walruses
When the client requests the last line of the order

And then the details of working out what the last line means in this particular context, and how to request the last line can be kept in lower-level parts of your test.  The part that depends on the context – working out what the previous order is, how many lines it has and so on – will need help from the higher levels of C# code.  However, the details of getting an order, getting lines of it and so on can be restricted to the lowest level of the C#.

This means changing the domain name, e.g. to localhost, needs to affect only one place (the depths of the API code, where hopefully it is read in once from e.g. a config file and used many times).  When the API changes from v3 to v4 this will likewise affect only one place, and similarly when the API call to get the lines of an order changes.
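A sketch of that single place, assuming the base URL is supplied by an environment variable (the variable name and default are made up):

```csharp
using System;

// The one place that knows the domain and API version; everything else
// builds URLs through Url().
static class ApiConfig
{
    public static readonly string BaseUrl =
        Environment.GetEnvironmentVariable("TEST_API_BASE_URL")
        ?? "http://localhost/api/v3";

    public static string Url(string path) => BaseUrl + path;
}
```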

Costs and problems

As with any tool, it’s good to have an honest and full idea of it, so that you know what it’s not for as well as what it is for.  Also, knowing potential problems and costs of doing things is very useful.

There are two or three different abstraction levels, implemented in different ways, that need to be kept in sync.  I think that the levels exist in all code and tests, but they’re often not as explicit as this.  Some changes will break out of one level and affect lots of stuff – SpecFlow can limit the scope of some changes but not all of them.  Regular expressions aren’t everyone’s cup of tea, and they’re a fundamental bit of glue in this set-up.

By default, everything is global, so you will need to take care that you don’t inadvertently match something from a file you weren’t expecting.  You can make things have a smaller scope, but that’s not the default behaviour.
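SpecFlow’s Scope attribute is the usual way to narrow things down – for instance, restricting a binding class to one feature (the class and step here are invented):

```csharp
using TechTalk.SpecFlow;

[Binding]
[Scope(Feature = "Orders")]   // these steps only match within the Orders feature
public class OrderSteps
{
    [When(@"^the client cancels the order$")]
    public void CancelOrder()
    {
        // ... drive the system under test here ...
    }
}
```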

The way things are broken up and then glued together at run-time can cause problems, but I cover these in another post on gluing bits of the test together.

Having many combinations of things can quickly explode in a nasty way, but I cover this in a post on taming combinatorial explosions.

UPDATE 2018/05/27: Added the table showing an example mapping between tests and low-level code.
