In this post I will try to describe an approach to unit testing database queries. It happened to be inspired by some testing I was doing of some C# code, but you could apply the same ideas to other languages. Note that if you’re unit testing stored procedures then I recommend using something like tSQLt, which I have written about already.
Query under test
I’m assuming that you have a query with a predicate of several parts, for instance
```csharp
from shipment in context.Shipments
where shipment.Weight <= 100
    && shipment.Weight > 10
    && shipment.BiggestDimension <= 80
    && shipment.BiggestDimension > 5
select shipment.TrackingId;
```
This is in LINQ, but I hope it’s close enough to SQL for you to understand. This can be generalized to something like this diagram:
Your query’s predicate might produce a different shape, but I hope you get the idea.
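If LINQ is unfamiliar, the same query can be sketched in Python as a filter over an in-memory list. The `Shipment` class and function name here are my own stand-ins, not part of any real data-access API:

```python
from dataclasses import dataclass

@dataclass
class Shipment:
    tracking_id: int
    weight: int
    biggest_dimension: int

def find_tracking_ids(shipments):
    """Rough Python equivalent of the LINQ query above."""
    return [
        s.tracking_id
        for s in shipments
        if 10 < s.weight <= 100 and 5 < s.biggest_dimension <= 80
    ]
```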
Pinning down the code
The obvious thing to do when testing the code is to check that it does what you expect. In the case of our query, that means checking that if there’s a shipment with a weight of 30 and a biggest dimension of 40, it will be returned by the query. This is good but not enough. If I wrote code that returned everything in the shipment table, it would pass that test – not what we want. To counter this, we also need to check that the code doesn’t do what we don’t expect.
So, our test needs test data that is a combination of things we expect to be returned and things we expect not to be returned.
There are at least two ways we could approach this – one is a big bang, heroic, approach and the other is several smaller steps that each address part of the task. For me, heroism is a design and code smell. By that I mean: it’s not necessarily a sign that a design or some code is bad, but it’s enough to draw my attention, as it’s often the wrong approach.
In the heroic approach, you are aiming for a single run of the query. This run will exclude several bits of data, and include several other bits (hopefully the right bits in both cases). In such tests, the setting up of the test data before the single test run is non-trivial – you will be setting up data inside the inner rectangle above, plus data outside it. My example is relatively simple and can be drawn in 2D; real life examples can get much more complex and so the test data is correspondingly harder to set up.
Checking that exactly the right data has been returned is also tricky, and it’s often hard to see why a particular bit of data should be included or excluded. All this means:
- The test is hard to understand, and tests should be a valuable source of documentation for your code, so the test is not fulfilling that duty;
- It’s hard to debug the test if it doesn’t do what you expect;
- It’s hard to change the test should the query under test change in the future.
If running the query were expensive, then this might be a good approach. However, I’m assuming that it’s not because, for instance, you are using a mock version of the database.
One reason why this approach feels wrong for unit tests is that you’re burdening yourself with the data set-up demands you often experience for integration or system tests. There’s a reason why the testing pyramid is smaller the higher up you go – the higher-level tests are more expensive to create and maintain, so you tend to have fewer of them. This is OK, because they are usually each of higher value than the tests in the level below in the test pyramid.
Having tricky and brittle data set-up for a unit test is coupling a high cost to a low value. Individual unit tests are low value; their usefulness as a form of testing comes from their large numbers – they can cover all the edge cases that are trickier to test at higher levels.
The approach I prefer produces tests that are stronger – they are simpler to understand, easier to debug and easier to change. Instead of trying to pin everything down in a single test run, you do a separate test run per element of your query’s predicate. For each test run you need only two bits of test data to be set up: one that you expect to be returned and one that you expect to not be returned.
This is like taking a single part of the inner rectangle above that defines the data that you want, e.g. its right-hand edge, and putting one data point just inside that line and one data point just outside it. You repeat this process for each line that defines the inner rectangle – one pair for the top, another for the left and yet another for the bottom.
An example set of tests based on the query above is:
| Test name | Bit of predicate tested | Data to return | Data to ignore |
| --- | --- | --- | --- |
| Heavy shipments are ignored | shipment.Weight <= 100 | Weight = 100 | Weight = 101 |
| Light shipments are ignored | shipment.Weight > 10 | Weight = 11 | Weight = 10 |
| Too big shipments are ignored | shipment.BiggestDimension <= 80 | BiggestDimension = 80 | BiggestDimension = 81 |
| Too small shipments are ignored | shipment.BiggestDimension > 5 | BiggestDimension = 6 | BiggestDimension = 5 |
You might have also noticed that the values in the two right-hand columns test exactly at the boundary, e.g. the first row has the largest valid value and the smallest invalid value. If you don’t do this then things like the details of the operator (<= rather than <, for instance) could trip you up. It’s also the kind of thing highlighted by mutation testing.
You might have also noticed that the data in the two right-hand columns has only some of its values described. This is because I’m assuming that you’re using a test method to create the data, that uses default values for its parameters. The default values will be for something that you expect to be returned by the query, e.g. weight = 80 and biggest dimension = 50.
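As a sketch of such a test-data method in Python (the in-memory `shipments` list and the function name are my own assumptions, not a real data-access layer – the defaults are the values suggested above):

```python
from dataclasses import dataclass

@dataclass
class Shipment:
    tracking_id: int
    weight: int
    biggest_dimension: int

# In-memory stand-in for the shipment table.
shipments = []

def create_shipment(tracking_id, weight=80, biggest_dimension=50):
    """Create a shipment whose default values fall inside the query's predicate.

    A test overrides only the parameter relevant to the part of the
    predicate it is exercising; everything else stays returnable by default.
    """
    shipments.append(Shipment(tracking_id, weight, biggest_dimension))
```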
This means each test has the following structure:
- Call test data method to create data to return
- Call test data method to create data to ignore
- Run code under test
- Check that the returned data is exactly the data you expect
You could simplify this a bit further if you use conventions. You might have two outputs, such as shipment tracking ids, defined: e.g. TRACKING_ID_A = 879, TRACKING_ID_B = 564. You could arrange the data creation calls such that TRACKING_ID_A is always on the data you expect to be returned, and TRACKING_ID_B is always on the data you expect to be ignored. You could then wrap steps 3 and 4 above into a single shared method that is used by all tests. This shared method will run the code under test and check that a single result is returned with tracking id == TRACKING_ID_A.
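A sketch of that convention in Python, including the shared method that runs the query and checks the result (the constants, helper names and in-memory query are all illustrative assumptions, with the first row of the table above as the example test):

```python
from dataclasses import dataclass

TRACKING_ID_A = 879  # always on the data we expect to be returned
TRACKING_ID_B = 564  # always on the data we expect to be ignored

@dataclass
class Shipment:
    tracking_id: int
    weight: int = 80              # default inside the predicate
    biggest_dimension: int = 50   # default inside the predicate

def run_query(shipments):
    """The code under test: rough Python equivalent of the LINQ query."""
    return [
        s.tracking_id
        for s in shipments
        if 10 < s.weight <= 100 and 5 < s.biggest_dimension <= 80
    ]

def assert_only_a_returned(shipments):
    """Shared steps 3 and 4: run the query, expect exactly TRACKING_ID_A."""
    assert run_query(shipments) == [TRACKING_ID_A]

def test_heavy_shipments_are_ignored():
    assert_only_a_returned([
        Shipment(TRACKING_ID_A, weight=100),  # largest valid value
        Shipment(TRACKING_ID_B, weight=101),  # smallest invalid value
    ])
```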
If you need to change the query so that e.g. the upper weight limit changes from 100 to 300, you will need to make two small changes:
- Only one test needs to change – to be precise, its two data creation calls will need to each have one parameter changed;
- You need to make sure that the default weight in the method that creates test data stays in the range of data you expect to be returned.
This approach covers much, but not all, of the unit testing of database queries. You might still need other tests, such as returning zero rows or returning many rows; this approach won’t help with those, but it will take care of checking the edges of the predicate.
Heroism in design or coding should ring alarm bells. In the case of unit testing database queries, a big bang, heroic, approach produces tests that are poor now and in the future. In contrast, a simpler, divide-and-conquer, approach produces tests that are better now and in the future too. If you really like crafting intricate test data, then I recommend you point that enthusiasm towards integration or system tests.